Willy Tarreau [Fri, 18 Jun 2010 07:57:45 +0000 (09:57 +0200)]
[BUG] stick_table: the fix for the memory leak caused a regression
(cherry picked from commit 61ba936e6858dfcf9964d25870726621d8188fb9)
[ note: the bug was finally not present in 1.5-dev but at least we
have to reset store_count to be compatible with 1.4 ]
Commit d6e9e3b5e320b957e6c491bd92d91afad30ba638 caused recently created
entries to be removed as soon as they were created, breaking stickiness.
It is not clear whether a use-after-free was possible or not in this case.
This bug was reported by Ben Congleton and narrowed down by Hervé Commowick,
both of whom also tested the fix. Thanks to them !
Willy Tarreau [Mon, 14 Jun 2010 17:09:21 +0000 (19:09 +0200)]
[MINOR] config: provide a function to quote args in a more friendly way
The quote_arg() function can be used to quote an argument or indicate
"end of line" if it's null or empty. It should be useful to more precisely
report location of problems in the configuration.
Willy Tarreau [Sun, 6 Jun 2010 15:58:34 +0000 (17:58 +0200)]
[MEDIUM] stick_table: separate storage and update of session entries
When an entry already exists, we just need to update its expiration
timer. Let's have a dedicated function for that instead of spreading
open code everywhere.
This change also ensures that an update of an existing sticky session
really leads to an update of its expiration timer, which was apparently
not the case till now. This point needs to be checked in 1.4.
This change makes use of the stick-tables to keep track of any source
address activity. Two ACLs make it possible to check the count of an
entry or update it and act accordingly. The typical usage will be to
reject a TCP request upon match of an excess value.
Willy Tarreau [Sun, 6 Jun 2010 13:38:59 +0000 (15:38 +0200)]
[MEDIUM] stick_table: don't overwrite data when storing an entry
Till now sticky sessions only held server IDs. Now there are other
data types so it is not acceptable anymore to overwrite the server ID
when writing something. The server ID must then only be written from
the caller when appropriate. Doing this has also led to separate
lookup and storage.
Willy Tarreau [Sun, 6 Jun 2010 12:30:13 +0000 (14:30 +0200)]
[MINOR] stick_table: add support for "conn_cum" data type.
This one can be parsed on the "stick-table" after with the "store"
keyword. It will hold the number of connections matching the entry,
for use with ACLs or anything else.
Willy Tarreau [Sun, 6 Jun 2010 11:34:54 +0000 (13:34 +0200)]
[MEDIUM] stick_table: add room for extra data types
The stick_tables will now be able to store extra data for a same key.
A limited set of extra data types will be defined and for each of them
an offset in the sticky session will be assigned at startup time. All
of this information will be stored in the stick table.
The extra data types will have to be specified after the new "store"
keyword of the "stick-table" directive, which will reserve some space
for them.
Willy Tarreau [Sun, 6 Jun 2010 11:22:23 +0000 (13:22 +0200)]
[CLEANUP] stick_table: move pattern to key functions to stick_table.c
pattern.c depended on stick_table while in fact it should be the opposite.
So we move from pattern.c everything related to stick_tables and invert the
dependency. That way the code becomes more logical and intuitive.
Willy Tarreau [Sun, 6 Jun 2010 10:57:10 +0000 (12:57 +0200)]
[CLEANUP] stick_table: rename some stksess struct members to avoid confusion
The name 'exps' and 'keys' in struct stksess was confusing because it was
the same name as in the table which holds all of them, while they only hold
one node each. Remove the trailing 's' to more clearly identify who's who.
Willy Tarreau [Sun, 6 Jun 2010 10:11:37 +0000 (12:11 +0200)]
[MINOR] stick_table: add support for variable-sized data
Right now we're only able to store a server ID in a sticky session.
The goal is to be able to store anything whose size is known at startup
time. For this, we store the extra data before the stksess pointer,
using a negative offset. It will then be easy to cumulate multiple
data provided they each have their own offset.
It's very disturbing to see the "denied req" counter increase without
any other session counter moving. In fact, we can't count a rejected
TCP connection as "denied req" as we have not yet instanciated any
session at all. Let's use a new counter for that.
Willy Tarreau [Sat, 5 Jun 2010 08:49:41 +0000 (10:49 +0200)]
[MEDIUM] frontend: count the incoming connection earlier
The frontend's connection was accounted for once the session was
instanciated. This was problematic because the early ACLs weren't
able to correctly account for the number of concurrent connections.
Now we count the connection once it is assigned to the frontend.
It also brings the nice advantage of being more symmetrical, because
the stream_sock's accept() does not have to account for that anymore,
only the session's accept() does.
Willy Tarreau [Fri, 4 Jun 2010 18:59:39 +0000 (20:59 +0200)]
[MINOR] session: differenciate between accepted connections and received connections
Now we're able to reject connections very early, so we need to use a
different counter for the connections that are received and the ones
that are accepted and converted into sessions, so that the rate limits
can still apply to the accepted ones. The session rate must still be
used to compute the rate limit, so that we can reject undesired traffic
without affecting the rate.
Willy Tarreau [Fri, 4 Jun 2010 10:25:31 +0000 (12:25 +0200)]
[MINOR] buffer: refine the flags that may wake an analyser up.
Analysers don't care (and must not care) about a few flags such as
BF_AUTO_CLOSE or BF_AUTO_CONNECT, so those flags should not be listed
in the BF_MASK_STATIC bitmask.
We should also recheck if some buffer flags should be ignored or not
in process_session() when deciding if we must loop again or not.
Willy Tarreau [Tue, 1 Jun 2010 15:45:26 +0000 (17:45 +0200)]
[MAJOR] frontend: split accept() into frontend_accept() and session_accept()
A new function session_accept() is now called from the lower layer to
instanciate a new session. Once the session is instanciated, the upper
layer's frontent_accept() is called. This one can be service-dependant.
That way, we have a 3-phase accept() sequence :
1) protocol-specific, session-less accept(), which is pointed to by
the listener. It defaults to the generic stream_sock_accept().
2) session_accept() which relies on a frontend but not necessarily
for use in a proxy (eg: stats or any future service).
3) frontend_accept() which performs the accept for the service
offerred by the frontend. It defaults to frontend_accept() which
is really what is used by a proxy.
The TCP/HTTP proxies have been moved to this mode so that we can now rely on
frontend_accept() for any type of session initialization relying on a frontend.
The next step will be to convert the stats to use the same system for the stats.
Willy Tarreau [Tue, 1 Jun 2010 15:12:40 +0000 (17:12 +0200)]
[MAJOR] frontend: reorder the session initialization upon accept
This will be needed for the last factoring step which adds support
for application-level accept(). The tcp/http accept() code has now
been isolated and will have to move to a separate function.
Willy Tarreau [Tue, 1 Jun 2010 08:56:34 +0000 (10:56 +0200)]
[MINOR] frontend: rely on the frontend and not the backend for INDEPSTR
Till now, the frontend relied on the backend's options for INDEPSTR,
while at the time of accept, the frontend and backend are the same.
So we now use the frontend's pointer instead of the backend and we
don't have any dependency on the backend anymore in the frontend's
accept code.
Willy Tarreau [Tue, 1 Jun 2010 08:36:43 +0000 (10:36 +0200)]
[MEDIUM] session: don't assign conn_retries upon accept() anymore
The conn_retries attribute is now assigned when switching from SI_ST_INI
to SI_ST_REQ. This eliminates one of the last dependencies on the backend
in the frontend's accept() function.
Willy Tarreau [Tue, 1 Jun 2010 07:51:00 +0000 (09:51 +0200)]
[MEDIUM] session: move the conn_retries attribute to the stream interface
The conn_retries still lies in the session and its initialization depends
on the backend when it may not yet be known. Let's first move it to the
stream interface.
Willy Tarreau [Mon, 31 May 2010 17:17:12 +0000 (19:17 +0200)]
[MAJOR] frontend: don't initialize the server-side stream_int anymore
The frontend has no reason to initialize the server-side stream_interface.
It's a leftover from old times which now makes no sense due to the fact
that we don't know in the frontend whether the other side will be a socket,
a task or anything else. Removing this part is possible due to previous
patches which perform the initialization at the proper place. We'll still
have to be able to register an I/O handler for situations where everything
is known only to the frontend (eg: unix stats socket), before merging the
various instanciations of this accept() function.
Willy Tarreau [Mon, 31 May 2010 15:44:19 +0000 (17:44 +0200)]
[MEDIUM] backend: initialize the server stream_interface upon connect()
It's not normal to initialize the server-side stream interface from the
accept() function, because it may change later. Thus, we introduce a new
stream_sock_prepare_interface() function which is called just before the
connect() and which sets all of the stream_interface's callbacks to the
default ones used for real sockets. The ->connect function is also set
at the same instant so that we can easily add new server-side protocols
soon.
Willy Tarreau [Mon, 31 May 2010 10:31:35 +0000 (12:31 +0200)]
[MEDIUM] session: initialize server-side timeouts after connect()
It was particularly embarrassing that the server timeout was assigned
to buffers during an accept() just to be potentially changed later in
case of a use_backend rule. The frontend side has nothing to do with
server timeouts.
Now we initialize them right after the connect() succeeds. Later this
should change for a unique stream-interface timeout setting only.
Willy Tarreau [Mon, 31 May 2010 09:57:51 +0000 (11:57 +0200)]
[MEDIUM] session: finish session establishment sequence in with I/O handlers
Calling sess_establish() upon a successful connect() was essential, but
it was not clearly stated whether it was necessary for an access to an
I/O handler or not. While it would be desired, having it automatically
add the response analyzers is quite a problem, and it breaks HTTP stats.
The solution is thus not to call it for now and to perform the few response
initializations as needed.
For the long term, we need to find a way to specify the analyzers to install
during a stream_int_register_handler() if any.
Willy Tarreau [Mon, 31 May 2010 09:27:58 +0000 (11:27 +0200)]
[CLEANUP] buffer->cto is not used anymore
The connection timeout stored in the buffer has not been used since the
stream interface were introduced. Let's get rid of it as it's one of the
things that complicate factoring of the accept() functions.
Willy Tarreau [Mon, 31 May 2010 08:56:17 +0000 (10:56 +0200)]
[MINOR] frontend: only check for monitor-net rules if LI_O_CHK_MONNET is set
We can disable the monitor-net rules on a listener if this flag is not
set in the listener's options. This will be useful when we don't want
to check that fe->addr is set or not for non-TCP frontends.
Willy Tarreau [Mon, 31 May 2010 08:30:33 +0000 (10:30 +0200)]
[MEDIUM] frontend: check for LI_O_TCP_RULES in the listener
The new LI_O_TCP_RULES listener option indicates that some TCP rules
must be checked upon accept on this listener. It is now checked by
the frontend and the L4 rules are evaluated only in this case. The
flag is only set when at least one tcp-req rule is present in the
frontend.
The L4 rules check function has now been moved to proto_tcp.c where
it ought to be.
Willy Tarreau [Sun, 23 May 2010 20:59:00 +0000 (22:59 +0200)]
[MEDIUM] tcp: check for pure layer4 rules immediately after accept()
The tcp inspection rules were fast but were only processed after a
schedule had occurred and all resources were allocated. When defending
against DDoS, it's important to be able to apply some protection the
earliest possible instant.
Thus we introduce a new set of rules : tcp-request rules which act
on pure layer4 information (no content). They are evaluated even
before the buffers are allocated for the session, saving as much
time as possible. That way it becomes possible to check an incoming
connection's source IP address against a list of authorized/blocked
networks, and immediately drop the connection.
The rules are checked even before we perform any socket-specific
operation, so that we can optimize the reject case, which will be the
problematic one during a DDoS. The second stream interface and s->txn
are also now initialized after the rules are parsed for the same
reason. All these optimisations have permitted to reach up to 212000
connnections/s with a real rule rejecting based on the source IP
address.
Willy Tarreau [Fri, 28 May 2010 16:46:57 +0000 (18:46 +0200)]
[MEDIUM] separate protocol-level accept() from the frontend's
For a long time we had two large accept() functions, one for TCP
sockets instanciating proxies, and another one for UNIX sockets
instanciating the stats interface.
A lot of code was duplicated and both did not work exactly the same way.
Now we have a stream_sock layer accept() called for either TCP or UNIX
sockets, and this function calls the frontend-specific accept() function
which does the rest of the frontend-specific initialisation.
Some code is still duplicated (session & task allocation, stream interface
initialization), and might benefit from having an intermediate session-level
accept() callback to perform such initializations. Still there are some
minor differences that need to be addressed first. For instance, the monitor
nets should only be checked for proxies and not for other connection templates.
Last, we renamed l->private as l->frontend. The "private" pointer in
the listener is only used to store a frontend, so let's rename it to
eliminate this ambiguity. When we later support detached listeners
(eg: FTP), we'll add another field to avoid the confusion.
Willy Tarreau [Mon, 24 May 2010 19:02:37 +0000 (21:02 +0200)]
[CLEANUP] rename client -> frontend
The 'client.c' file now only contained frontend-specific functions,
so it has naturally be renamed 'frontend.c'. Same for client.h. This
has also been an opportunity to remove some cross references from
files that should not have depended on it.
In the end, this file should contain a protocol-agnostic accept()
code, which would initialize a session, task, etc... based on an
accept() from a lower layer. Right now there are still references
to TCP.
Willy Tarreau [Mon, 24 May 2010 18:55:15 +0000 (20:55 +0200)]
[CLEANUP] client: move some ACLs away to their respective locations
Some ACLs in the client ought to belong to proto_tcp, or protocols.
This file should only contain frontend-specific information and will
be renamed that way in next commit.
Willy Tarreau [Mon, 24 May 2010 18:27:29 +0000 (20:27 +0200)]
[CLEANUP] tcp: move some non tcp-specific layer6 processing out of proto_tcp
Some functions which act on generic buffer contents without being
tcp-specific were historically in proto_tcp.c. This concerns ACLs
and RDP cookies. Those have been moved away to more appropriate
locations. Ideally we should create some new files for each layer6
protocol parser. Let's do that later.
Willy Tarreau [Sun, 23 May 2010 15:27:44 +0000 (17:27 +0200)]
[MINOR] accept: count the incoming connection earlier
Right now we count the incoming connection only once everything has
been allocated. Since we're planning on considering early ACL rules,
we need to count the connection earlier.
Willy Tarreau [Sun, 23 May 2010 10:24:38 +0000 (12:24 +0200)]
[CLEANUP] acl: use 'L6' instead of 'L4' in ACL flags relying on contents
Just like we do on health checks, we should consider that ACLs that make
use of buffer data are layer 6 and not layer 4, because we'll soon have
to distinguish between pure layer 4 ACLs (without any buffer) and these
ones.
Willy Tarreau [Sun, 6 Jun 2010 16:28:49 +0000 (18:28 +0200)]
[BUG] stick_table: fix possible memory leak in case of connection error
If a "stick store-request" rule is present, an entry is preallocated during
the request. However, if there is no response due to an error or to a redir
mode server, we never release it.
Willy Tarreau [Mon, 7 Jun 2010 12:06:08 +0000 (14:06 +0200)]
[BUG] debug: correctly report truncated messages
By using msg->sol as the beginning of a message, wrong messages were
displayed in debug mode when they were truncated on the last line,
because msg->sol points to the beginning of the last line. Use
data+msg->som instead.
Willy Tarreau [Mon, 7 Jun 2010 11:57:32 +0000 (13:57 +0200)]
[BUG] debug: wrong pointer was used to report a status line
This would only be wrong when the server has not completely responded yet.
Fix two other occurrences of wrong rsp<->sl associations which were harmless
but wrong anyway.
Willy Tarreau [Mon, 7 Jun 2010 08:40:48 +0000 (10:40 +0200)]
[BUG] proxy: connection rate limiting was eating lots of CPU
The rate-limit feature relied on a timer to define how long a frontend
must remain idle. It was not considering the pending connections, so it
was almost always ready to be used again and only the accept's limit was
preventing new connections from coming in. By accounting for the pending
connection, we can compute a correct delay and effectively make the
frontend go idle for that (short) time.
Willy Tarreau [Mon, 7 Jun 2010 20:27:41 +0000 (22:27 +0200)]
[BUG] http: automatically close response if req is aborted
Latest BF_READ_ATTACHED fix has unveiled a nice issue with the way
HTTP requests and responses are forwarded. The case where the request
aborts after the response has responded (POST with early response)
forgot to re-enable auto-close on the response. In fact it still
worked thanks to a side effect as long as BF_READ_ATTACHED was there
to force the states to be resynced (and the flags). Since last fix,
the missing auto-close causes CLOSE_WAIT connections when the client
aborts too late during a data transfer.
The right fix consists in considering the situation where the client
experiences an error and to explicitly abort the transfer. There is
no need to wake the response analysers up for that since they'd have
no added value and the analysers flags are cleared. However for a
future usage, that might help (eg: stickiness, ...).
This fix should be backported to 1.4 if the previous one is backported
too. After all the non-reg tests, the risks to see a problem arise
without both patches seems low, and both patches touch sensible areas
of the code. So there's no hurry.
Willy Tarreau [Fri, 4 Jun 2010 09:40:20 +0000 (11:40 +0200)]
[BUG] session: clear BF_READ_ATTACHED before next I/O
The BF_READ_ATTACHED flag was created to wake analysers once after
a connection was established. It turns out that this flag is never
cleared once set, so even if there is no event, some analysers are
still evaluated for no reason.
The bug was introduced with commit ea38854d34675d5472319c453b7027af42fe8aab.
It may cause slightly increased CPU usages during data transfers, maybe
even quite noticeable once when transferring transfer-encoded data,
due to the fact that the request analysers are being checked for every
chunk.
This fix must be backported in 1.4 after all non-reg tests have been
completed.
Willy Tarreau [Tue, 1 Jun 2010 17:45:06 +0000 (19:45 +0200)]
[BUG] client: always ensure to zero rep->analysers
The response analyser was not emptied upon creation of a new session. In
fact it was always zero just because last session leaved it in a zero state,
but in case of shared pools this cannot be guaranteed. The net effect is
that it was possible to have some HTTP (or any other) analysers on the
response path of a stats unix socket, which would reject the response.
Willy Tarreau [Mon, 31 May 2010 15:01:36 +0000 (17:01 +0200)]
[BUG] http: the transaction must be initialized even in TCP mode (part 2)
Commit 4605e3b641cebbdbb2ee5726e5bbc3c03a2d7b5e was not enough, because
connections passing from a TCP frontend to an HTTP backend without any
ACL and via a "default_backend" statement were still working on non-initialized
data. An initialization was missing in the session_set_backend() function, next
to the initialization of hdr_idx.
Willy Tarreau [Thu, 27 May 2010 16:17:30 +0000 (18:17 +0200)]
[CONTRIB] halog: report per-server status codes, errors and response times
It's sometimes very useful to be able to monitor a production status in real
time by comparing servers behaviours. Now halog is able to do this when called
with "-srv". It reports various fields for each server found in a log, including
statuses, total reqs, valid reqs, percent of valid reqs, average connection time,
average response time.
Willy Tarreau [Tue, 25 May 2010 21:03:02 +0000 (23:03 +0200)]
[BUG] consistent hash: balance on all servers, not only 2 !
It was once reported at least by Dirk Taggesell that the consistent
hash had a very poor distribution, making use of only two servers.
Jeff Persch analysed the code and found the root cause. Consistent
hash makes use of the server IDs, which are completed after the chash
array initialization. This implies that each server which does not
have an explicit "id" parameter will be merged at the same place in
the chash tree and that in the end, only the first or last servers
may be used.
The now obvious fix (thanks to Jeff) is to assign the missing IDs
earlier. However, it should be clearly understood that changing a
hash algorithm on live systems will rebalance the whole system.
Anyway, the only affected users will be the ones for which the
system is quite unbalanced already. The ones who fix their IDs are
not affected at all.
Kudos to Jeff for spotting that bug which got merged 3 days after
the consistent hashing !
Willy Tarreau [Thu, 20 May 2010 14:17:07 +0000 (16:17 +0200)]
[BUG] http: the transaction must be initialized even in TCP mode
When running in pure TCP mode with a traffic inspection rule to detect
HTTP protocol, we have to initialize the HTTP transaction too. The
effect of not doing this was that some incoming connections could have
been matched as carrying HTTP protocol eventhough this was not the case.
Willy Tarreau [Thu, 20 May 2010 09:49:03 +0000 (11:49 +0200)]
[BUG] http: dispatch and http_proxy modes were broken for a long time
Both dispatch and http_proxy modes were broken since 1.4-dev5 when
the adjustment of server health based on response codes was introduced.
In fact, in these modes, s->srv == NULL. The result is a plain segfault.
It should have been noted critical, but the fact that it remained 6
months without being noticed indicates that almost nobody uses these
modes anymore. Also, the crash is immediate upon first request.
Further versions should not be affected anymore since it's planned to
have a dummy server instead of these annoying NULL pointers.
Willy Tarreau [Sun, 16 May 2010 20:34:28 +0000 (22:34 +0200)]
[RELEASE] Released version 1.4.6
Released version 1.4.6 with the following main changes :
- [BUILD] ebtree: update to v6.0.1 to remove references to dprintf()
- [CLEANUP] acl: make use of eb_is_empty() instead of open coding the tree's emptiness test
- [MINOR] acl: add srv_is_up() to check that a specific server is up or not
- [DOC] add a few precisions about the use of RDP cookies
Willy Tarreau [Sun, 16 May 2010 20:31:05 +0000 (22:31 +0200)]
[DOC] add a few precisions about the use of RDP cookies
RDP cookies are not necessarily easy to implement because they require
some configuration on the servers. Add a few hints so that people know
what to check on their servers.
Willy Tarreau [Sun, 16 May 2010 20:18:27 +0000 (22:18 +0200)]
[MINOR] acl: add srv_is_up() to check that a specific server is up or not
This ACL was missing in complex setups where the status of a remote site
has to be considered in switching decisions. Until there, using a server's
status in an ACL required to have a dedicated backend, which is a bit heavy
when multiple servers have to be monitored.
Willy Tarreau [Thu, 13 May 2010 20:17:08 +0000 (22:17 +0200)]
[RELEASE] Released version 1.4.5
Released version 1.4.5 with the following main changes :
- [DOC] report minimum kernel version for tproxy in the Makefile
- [MINOR] add the "ignore-persist" option to conditionally ignore persistence
- [DOC] add the "ignore-persist" option to conditionally ignore persistence
- [DOC] fix ignore-persist/force-persist documentation
- [BUG] cttproxy: socket fd leakage in check_cttproxy_version
- [DOC] doc/configuration.txt: fix typos
- [MINOR] option http-pretend-keepalive is both for FEs and BEs
- [MINOR] fix possible crash in debug mode with invalid responses
- [MINOR] halog: add support for statisticts on status codes
- [OPTIM] halog: use a faster zero test in fgets()
- [OPTIM] halog: minor speedup by using unlikely()
- [OPTIM] halog: speed up fgets2-64 by about 10%
- [DOC] refresh the README file and merge the CONTRIB file into it
- [MINOR] acl: support loading values from files
- [MEDIUM] ebtree: upgrade to version 6.0
- [MINOR] acl trees: add flags and union members to store values in trees
- [MEDIUM] acl: add ability to insert patterns in trees
- [MEDIUM] acl: add tree-based lookups of exact strings
- [MEDIUM] acl: add tree-based lookups of networks
- [MINOR] acl: ignore empty lines and comments in pattern files
- [MINOR] stick-tables: add support for "stick on hdr"
Willy Tarreau [Wed, 12 May 2010 06:08:50 +0000 (08:08 +0200)]
[MINOR] stick-tables: add support for "stick on hdr"
It is now possible to stick on an IP address found in a HTTP header. Right
now only the last occurrence of the header can be used, which is generally
enough for most uses. Also, the header extraction rule only knows how to
convert the header to IP. Later it will be usable as a plain string with
an implicit conversion, and the syntax will not change.
Willy Tarreau [Thu, 13 May 2010 20:07:43 +0000 (22:07 +0200)]
[MINOR] acl: ignore empty lines and comments in pattern files
Most often, pattern files used by ACLs will be produced by tools
which emit some comments (eg: geolocation lists). It's very annoying
to have to clean the files before using them, and it does not make
much sense to be able to support patterns we already can't input in
the config file. So this patch makes the pattern file loader skip
lines beginning with a sharp and the empty ones, and strips leading
spaces and tabs.
Willy Tarreau [Thu, 13 May 2010 18:03:41 +0000 (20:03 +0200)]
[MEDIUM] acl: add tree-based lookups of networks
Networks patterns loaded from files for longest match ACL testing
will now be arranged into a prefix tree. This is possible thanks to
the new prefix features in ebtree v6.0. Longest match testing is
slightly slower than exact data maching. However, the measured impact
of running at 42000 requests per second and testing whether the IP
address found in a header belongs to a list of 52000 networks or
not is 3% CPU (increase from 66% to 69%). This is low enough to
permit true geolocation based on huge tables.
Willy Tarreau [Mon, 10 May 2010 21:42:40 +0000 (23:42 +0200)]
[MEDIUM] acl: add tree-based lookups of exact strings
Now if some ACL patterns are loaded from a file and the operation is
an exact string match, the data will be arranged in a tree, yielding
a significant performance boost on large data sets. Note that this
only works when case is sensitive.
A new dedicated function, acl_lookup_str(), has been created for this
matching. It is called for every possible input data to test and it
looks the tree up for the data. Since the keywords are loosely typed,
we would have had to add a new columns to all keywords to adjust the
function depending on the type. Instead, we just compare on the match
function. We call acl_lookup_str() when we could use acl_match_str().
The tree lookup is performed first, then the remaining patterns are
attempted if the tree returned nothing.
A quick test shows that when matching a header against a list of 52000
network names, haproxy uses 68% of one core on a core2-duo 3.2 GHz at
42000 requests per second, versus 66% without any rule, which means
only a 2% CPU increase for 52000 rules. Doing the same test without
the tree leads to 100% CPU at 6900 requests/s. Also it was possible
to run the same test at full speed with about 50 sets of 52000 rules
without any measurable performance drop.
Willy Tarreau [Tue, 11 May 2010 21:25:05 +0000 (23:25 +0200)]
[MEDIUM] acl: add ability to insert patterns in trees
The code is now ready to support loading pattern from filesinto trees. For
that, it will be required that the ACL keyword has a flag ACL_MAY_LOOKUP
and that the expr is case sensitive. When that is true, the pattern will
have a flag ACL_PAT_F_TREE_OK to indicate that it is possible to feed the
tree instead of a usual pattern if the parsing function is able to do this.
The tree's root is pre-initialized in the pattern's value so that the
function can easily find it. At that point, if the parsing function decides
to use the tree, it just sets ACL_PAT_F_TREE in the return flags so that
the caller knows the tree has been used and the pattern can be recycled.
That way it will be possible to load some patterns into the tree when it
is compatible, and other ones as linear linked lists. A good example of
this might be IPv4 network entries : right now we support holes in masks,
but this very rare feature is not compatible with binary lookup in trees.
So the parser will be able to decide itself whether the pattern can go to
the tree or not.
Willy Tarreau [Mon, 10 May 2010 20:29:06 +0000 (22:29 +0200)]
[MINOR] acl trees: add flags and union members to store values in trees
If we want to be able to match ACLs against a lot of possible values, we
need to put those values in trees. That will only work for exact matches,
which is normally just what is needed.
Right now, only IPv4 and string matching are planned, but others might come
later.
Willy Tarreau [Sun, 9 May 2010 17:29:23 +0000 (19:29 +0200)]
[MEDIUM] ebtree: upgrade to version 6.0
This version adds support for prefix-based matching of memory blocks,
as well as some code-size and performance improvements on the generic
code. It provides a prefix insertion and longest match which are
compatible with the rest of the common features (walk, duplicates,
delete, ...). This is typically used for network address matching. The
longest-match code is a bit slower than the original memory block
handling code, so they have not been merged together into generic
code. Still it's possible to perform about 10 million networks lookups
per second in a set of 50000, so this should be enough for most usages.
This version also fixes some bugs in parts that were not used, so there
is no need to backport them.
Willy Tarreau [Sun, 9 May 2010 21:45:24 +0000 (23:45 +0200)]
[MINOR] acl: support loading values from files
The "acl XXX -f <file>" syntax was supported but nothing was read from
the file. This is now possible. All lines are merged verbatim, even if
they contain spaces (useful for user-agents). There are shortcomings
though. The worst one is that error reporting is too approximative.
Willy Tarreau [Sun, 9 May 2010 20:37:12 +0000 (22:37 +0200)]
[DOC] refresh the README file and merge the CONTRIB file into it
Patrick Mézard reported that it was a bit awkward to have the CONTRIB
and contrib entries in the source archive since those can conflict on
case-insensitive file systems. That made a good opportunity to refresh
the README file and to remove that old outdated file.
Delta Yeh [Mon, 3 May 2010 14:08:33 +0000 (22:08 +0800)]
[BUG] cttproxy: socket fd leakage in check_cttproxy_version
in cttproxy.c check_cttproxy_version socket is not closed before function
returned. Although it is called only once, I think it is better to close
the socket.
Willy Tarreau [Tue, 4 May 2010 08:47:57 +0000 (10:47 +0200)]
[OPTIM] halog: use a faster zero test in fgets()
A new idea came up to detect the presence of a null byte in a word.
It saves several operations compared to the previous one, and eliminates
the jumps (about 6 instructions which can run 2-by-2 in parallel).
This sole optimisation improved the line count speed by about 30%.
[MINOR] fix possible crash in debug mode with invalid responses
When trying to display an invalid request or response we received,
we must at least check that we have identified something looking
like a start of message, otherwise we can dereference a NULL pointer.
Shame on me, I didn't correctly document the "ignore-persist" statement
(convinced I used it like this in my tests, which is not the case at all...)
This fixes the doc and updates the proxy keyword matrix to add "force-persist".
[MINOR] add the "ignore-persist" option to conditionally ignore persistence
This is used to disable persistence depending on some conditions (for
example using an ACL matching static files or a specific User-Agent).
You can see it as a complement to "force-persist".
In the configuration file, the force-persist/ignore-persist declaration
order define the rules priority.
Used with the "appsesion" keyword, it can also help reducing memory usage,
as the session won't be hashed the persistence is ignored.
Released version 1.4.4 with the following main changes :
- [BUG] appsession should match the whole cookie name
- [CLEANUP] proxy: move PR_O_SSL3_CHK to options2 to release one flag
- [MEDIUM] backend: move the transparent proxy address selection to backend
- [MINOR] add very fast IP parsing functions
- [MINOR] add new tproxy flags for dynamic source address binding
- [MEDIUM] add ability to connect to a server from an IP found in a header
- [BUILD] config: last patch breaks build without CONFIG_HAP_LINUX_TPROXY
- [MINOR] http: make it possible to pretend keep-alive when doing close
- [MINOR] config: report "default-server" instead of "(null)" in error messages
[BUG] appsession should match the whole cookie name
I met a strange behaviour with appsession.
I firstly thought this was a regression due to one of my previous patch
but after testing with a 1.3.15.12 version, I also could reproduce it.
To illustrate, the configuration contains :
appsession PHPSESSID len 32 timeout 1h
Then I call a short PHP script containing :
setcookie("P", "should not match")
When calling this script thru haproxy, the cookie "P" matches the appsession rule :
Dumping hashtable 0x11f05c8
table[1572]: should+not+match
Shouldn't it be ignored ?
If you confirm, I'll send a patch for 1.3 and 1.4 branches to check that the
cookie length is equal to the appsession name length.
This is due to the comparison length, where the cookie length is took into
account instead of the appsession name length. Using the appsession name
length would allow ASPSESSIONIDXXX (+ check that memcmp won't go after the
buffer size).
Also, while testing, I noticed that HEAD requests where not available for
URIs containing the appsession parameter. 1.4.3 patch fixes an horrible
segfault I missed in a previous patch when appsession is not in the
configuration and HAProxy is compiled with DEBUG_HASH.
[MINOR] http: make it possible to pretend keep-alive when doing close
Some servers do not completely conform with RFC2616 requirements for
keep-alive when they receive a request with "Connection: close". More
specifically, they don't bother using chunked encoding, so the client
never knows whether the response is complete or not. One immediately
visible effect is that haproxy cannot maintain client connections alive.
The second issue is that truncated responses may be cached on clients
in case of network error or timeout.
Óscar Frías Barranco reported this issue on Tomcat 6.0.20, and
Patrik Nilsson with Jetty 6.1.21.
Cyril Bonté proposed this smart idea of pretending we run keep-alive
with the server and closing it at the last moment as is already done
with option forceclose. The advantage is that we only change one
emitted header but not the overall behaviour.
Since some servers such as nginx are able to close the connection
very quickly and save network packets when they're aware of the
close negociation in advance, we don't enable this behaviour by
default.
"option http-pretend-keepalive" will have to be used for that, in
conjunction with "option http-server-close".