The two variants now do exactly the same (appending at the tail of the
buffer) so let's not keep the distinction between these classes of
functions and have generic ones for this. It's also worth noting that
b{i,o}_putchk() wasn't used at all and was removed.
MINOR: buffer: replace bi_fast_delete() with b_del()
There's no distinction between in and out data now. The latter covers
the needs of the former and supports wrapping. The extra cost is
negligible given the locations where it's used.
Olivier Houchard [Fri, 22 Jun 2018 17:26:39 +0000 (19:26 +0200)]
MEDIUM: buffers: move "output" from struct buffer to struct channel
Since we never access this field directly anymore, but only through the
channel's wrappers, it can now move to the channel. The buffers are now
completely free from the distinction between input and output data.
Since we use "_data" for the amount of data at many places, as opposed to
"_space" for the amount of space, let's rename the "data" field to "area"
so that we can reuse "data" later for the amount of data in the buffer
(currently called "len" despite not being contigous).
MINOR: buffer: b_set_data() doesn't truncate output data anymore
b_set_data() is used :
- in proto_http and hlua to trim input data (b_set_data(co_data()))
- in SPOE to append data to a buffer while building a message
In no case will this truncate a buffer so we can safely remove the
test for len < b->output.
MINOR: buffer: remove the check for output on b_del()
b_del() is used in :
- mux_h2 with the demux buffer : always processes input data
- checks with output data though output is not considered at all there
- b_eat() which is not used anywhere
- co_skip() where the len is always <= output
Thus the distinction for output data is not needed anymore and the
decrement can be made inconditionally in co_skip().
Willy Tarreau [Fri, 29 Jun 2018 16:42:02 +0000 (18:42 +0200)]
MAJOR: start to change buffer API
This is intentionally the minimal and safest set of changes, some cleanups
area still required. These changes are quite tricky and cannot be
independantly tested, so it's important to keep this patch as bisectable
as possible.
buf_empty and buf_wanted were changed and are now exactly similar since
there's no <p> member in the structure anymore. Given that no test is
ever made in the code to check that buf == &buf_wanted, it may be possible
that we don't need to have two anymore, unless some buf_empty tests have
precedence. This will have to be investigated.
A significant part of this commit affects the HTTP compression code,
which used to deeply manipulate the input and output buffers without
any reasonable solution for a better abstraction. For this reason, if
any regression is met and designates this patch as the culprit, it is
important to run tests which specifically involve compression or which
definitely don't use it in order to spot the issue.
Willy Tarreau [Fri, 15 Jun 2018 16:07:57 +0000 (18:07 +0200)]
MINOR: lua: use the wrappers instead of directly manipulating buffer states
This replaces chn->buf->p with ci_head(chn), chn->buf->o with co_data(chn)
and chn->buf->i with ci_data(chn). This is in order to help porting to the
new buffer API.
Olivier Houchard [Fri, 29 Jun 2018 16:16:31 +0000 (18:16 +0200)]
MEDIUM: compression: start to move to the new buffer API
This part is tricky, it passes a channel where we used to have a buffer,
in order to reduce the API changes during the big switch. This way all
the channel's wrappers to distinguish between input and output are
available. It also makes sense given that the compression applies on
a channel since it's in the forwarding path.
Willy Tarreau [Tue, 19 Jun 2018 06:03:19 +0000 (08:03 +0200)]
MEDIUM: h1: port to new buffer API.
The parser now uses the channel exclusively to access the data. In order
to avoid the cost of indirection, a local variable "input" was added to
the function that replaces buf->p. Given that this part is on the critical
path, it will have to be tested again for any visible performance loss.
Willy Tarreau [Fri, 15 Jun 2018 16:31:02 +0000 (18:31 +0200)]
MEDIUM: http: use wrappers instead of directly manipulating buffers states
This is aimed at easing the transition to the new API. There are a few places
which deserve some simplifications afterwards because ci_head() is called
often and may be placed into a local pointer.
Willy Tarreau [Tue, 19 Jun 2018 05:03:14 +0000 (07:03 +0200)]
MINOR: stream-int: use the new buffer API
A few locations still accessing ->i and ->o directly were changed to
use ci_data() and co_data() respectively. A call to b_del() was replaced
with co_set_data() in si_cs_send() so that ->o will is automatically be
decremented after the migration.
Willy Tarreau [Fri, 15 Jun 2018 17:37:42 +0000 (19:37 +0200)]
MEDIUM: spoe: use the new buffer API for the SPOE buffer
The buffer is not used as a forwarding buffer so we can simply map ->i
to ->len and ->p to b_head(). It *seems* that p is never modified, so
that we could even always use b_orig(). This needs to be rechecked.
Willy Tarreau [Mon, 18 Jun 2018 11:33:09 +0000 (13:33 +0200)]
MEDIUM: h2: update to the new buffer API
There is no more distinction between ->i and ->o for the mux's buffers,
we always use b_data() to know the buffer's length since only one side
is used for each direction.
Willy Tarreau [Mon, 18 Jun 2018 09:11:07 +0000 (11:11 +0200)]
MINOR: checks: adapt to the new buffer API
The code exclusively used ->i for data received and ->o for data sent. Now
it always uses b_data(), b_head() and b_tail() so that there is no more
distinction between ->i and ->o.
MINOR: buffer: replace buffer_empty() with b_empty() or c_empty()
For the same consistency reasons, let's use b_empty() at the few places
where an empty buffer is expected, or c_empty() if it's done on a channel.
Some of these places were there to realign the buffer so
{b,c}_realign_if_empty() was used instead.
Willy Tarreau [Fri, 15 Jun 2018 11:59:36 +0000 (13:59 +0200)]
MINOR: buffer: use b_room() to determine available space in a buffer
We used to have variations around buffer_total_space() and
size-buffer_len() or size-b_data(). Let's simplify all this. buffer_len()
was also removed as not used anymore.
Willy Tarreau [Fri, 15 Jun 2018 11:45:17 +0000 (13:45 +0200)]
MINOR: buffer: get rid of b_ptr() and convert its last users
Now the new API functions are being used everywhere, we can get rid
of b_ptr(). A few last users like bi_istput() and bo_istput() appear
to only differ by what part of the buffer they're increasing, but
that should quickly be merged.
Willy Tarreau [Tue, 19 Jun 2018 04:23:38 +0000 (06:23 +0200)]
MINOR: connection: add a new receive flag : CO_RFL_BUF_WET
With this flag we introduce the notion of "dry" vs "wet" buffers : some
demultiplexers like the H2 mux require as much room as possible for some
operations that are not retryable like decoding a headers frame. For this
they need to know if the buffer is congested with data scheduled for
leaving soon or not. Since the new API will not provide this information
in the buffer itself, the caller must indicate it. We never need to know
the amount of such data, just the fact that the buffer is not in its
optimal condition to be used for receipt. This "CO_RFL_BUF_WET" flag is
used to mention that such outgoing data are still pending in the buffer
and that a sensitive receiver should better let it "dry" before using it.
Willy Tarreau [Tue, 19 Jun 2018 04:15:17 +0000 (06:15 +0200)]
MINOR: connection: add a flags argument to rcv_buf()
The mux and transport rcv_buf() now takes a "flags" argument, just like
the snd_buf() one or like the equivalent syscall lower part. The upper
layers will use this to pass some information such as indicating whether
the buffer is free from outgoing data or if the lower layer may allocate
the buffer itself.
MEDIUM: mux: make mux->rcv_buf() take a size_t for the count
It also returns a size_t. This is in order to clean the API. Note
that the H2 mux still uses some ints in the functions called from
h2_rcv_buf(), though it's not really a problem given that H2 frames
are smaller. It may deserve a general cleanup later though.
MEDIUM: connection: make xprt->rcv_buf() use size_t for the count
Just like we have a size_t for xprt->snd_buf(), we adjust to use size_t
for rcv_buf()'s count argument and return value. It also removes the
ambiguity related to the possibility to see a negative value there.
Willy Tarreau [Thu, 14 Jun 2018 16:38:55 +0000 (18:38 +0200)]
MEDIUM: mux: make mux->snd_buf() take the byte count in argument
This way the mux doesn't need to modify the buffer's metadata anymore
nor to know the output's size. The mux->snd_buf() function now takes a
const buffer and it's up to the caller to update the buffer's state.
The return type was updated to return a size_t to comply with the count
argument.
Willy Tarreau [Thu, 14 Jun 2018 16:31:46 +0000 (18:31 +0200)]
MEDIUM: connection: make xprt->snd_buf() take the byte count in argument
This way the senders don't need to modify the buffer's metadata anymore
nor to know about the output's split point. This way the functions can
take a const buffer and it's clearer who's in charge of updating the
buffer after a send. That's why the buffer realignment is now performed
by the caller of the transport's snd_buf() functions.
The return type was updated to return a size_t to comply with the count
argument.
Willy Tarreau [Thu, 14 Jun 2018 13:27:31 +0000 (15:27 +0200)]
MINOR: buffer: make b_getblk_nc() take const pointers
Now that there are no more users requiring to modify the buffer anymore,
switch these ones to const char and const buffer. This will make it more
obvious next time send functions are tempted to modify the buffer's output
count. Minor adaptations were necessary at a few call places which were
using char due to the function's previous prototype.
Willy Tarreau [Fri, 15 Jun 2018 09:51:32 +0000 (11:51 +0200)]
MEDIUM: h2: don't use b_ptr() nor b_end() anymore
The few places where they were still used were replaced with b_peek() and
b_wrap() respectively. The parts making use of ->i and ->o should now be
convertible to the new API.
Willy Tarreau [Thu, 14 Jun 2018 11:33:30 +0000 (13:33 +0200)]
MEDIUM: h2: prevent the various mux encoders from modifying the buffer
Functions h2s_frt_make_resp_headers() and h2s_frt_make_resp_data() used
to modify the buffer's output data count. This is problematic for the
buffer's rework as we don't want to rely on this anymore. This commit
modifies these functions to take an offset (relative to the buffer's
head) and a maximum byte count. Thus h2_snd_buf() now calls them with
buf->o and takes care of removing deleted data itself. The send functions
now almost support being passed const buffers (except for the data part
which is still embedded).
Willy Tarreau [Thu, 14 Jun 2018 11:21:28 +0000 (13:21 +0200)]
MINOR: h2: clarify the fact that the send functions are unsigned
There's no more error return combined with the send output, though
the comments were misleading. Let's fix this as well as the functions'
prototypes. h2_snd_buf()'s return value wasn't changed yet since it
has to match the ->snd_buf prototype.
Willy Tarreau [Fri, 15 Jun 2018 08:28:05 +0000 (10:28 +0200)]
MINOR: buffer: replace bi_del() and bo_del() with b_del()
Till now the callers had to know which one to call for specific use cases.
Let's fuse them now since a single one will remain after the API migration.
Given that bi_del() may only be used where o==0, just combine the two tests
by first removing output data then only input.
Willy Tarreau [Thu, 14 Jun 2018 12:38:11 +0000 (14:38 +0200)]
MINOR: buffer: replace bo_getblk_nc() with b_getblk_nc() which takes an offset
This will be important so that we can parse a buffer without touching it.
Now we indicate where from the buffer's head we plan to start to copy, and
for how many bytes. This will be used by send functions to loop at the end
of the buffer without having to update the buffer's output byte count.
Willy Tarreau [Fri, 15 Jun 2018 12:20:26 +0000 (14:20 +0200)]
MINOR: buffer: replace bo_getblk() with direction agnostic b_getblk()
This new functoin limits itself to the amount of data available in the
buffer and doesn't care about the direction anymore. It's only called
from co_getblk() which already checks that no more than the available
output bytes is requested.
Willy Tarreau [Thu, 7 Jun 2018 16:58:07 +0000 (18:58 +0200)]
MINOR: buffer: merge b{i,o}_contig_space()
These ones were merged into a single b_contig_space() that covers both
(the bo_ case was a simplified version of the other one). The function
doesn't use ->i nor ->o anymore.
Willy Tarreau [Wed, 6 Jun 2018 14:55:45 +0000 (16:55 +0200)]
MINOR: buffer: split bi_contig_data() into ci_contig_data and b_config_data()
This function was sometimes used from a channel and sometimes from a buffer.
In both cases it requires knowledge of the size of the output data (to skip
them). Here the split ensures the channel can deal with this point, and that
other places not having output data can continue to work.
Willy Tarreau [Wed, 6 Jun 2018 05:13:22 +0000 (07:13 +0200)]
MINOR: channel/buffer: replace b_{adv,rew} with c_{adv,rew}
These ones manipulate the output data count which will be specific to
the channel soon, so prepare the call points to use the channel only.
The b_* functions are now unused and were removed.
MEDIUM: channel: make channel_slow_realign() take a swap buffer
The few call places where it's used can use the trash as a swap buffer,
which is made for this exact purpose. This way we can rely on the
generic b_slow_realign() call.
Willy Tarreau [Wed, 6 Jun 2018 04:53:15 +0000 (06:53 +0200)]
MINOR: channel/buffer: replace buffer_slow_realign() with channel_slow_realign() and b_slow_realign()
Where relevant, the channel version is used instead. The buffer version
was ported to be more generic and now takes a swap buffer and the output
byte count to know where to set the alignment point. The H2 mux still
uses buffer_slow_realign() with buf->o but it will change later.
Willy Tarreau [Wed, 6 Jun 2018 13:09:28 +0000 (15:09 +0200)]
MINOR: channel: add a few basic functions for the new buffer API
This adds :
- c_orig() : channel buffer's origin
- c_size() : channel buffer's size
- c_wrap() : channel buffer's wrapping location
- c_data() : channel buffer's total data count
- c_room() : room left in channel buffer's
- c_empty() : true if channel buffer is empty
- c_full() : true if channel buffer is full
- c_ptr() : pointer to an offset relative to input data in the buffer
- c_adv() : advances the channel's buffer (bytes become part of output)
- c_rew() : rewinds the channel's buffer (output bytes not output anymore)
- c_realign_if_empty() : realigns the buffer if it's empty
- co_data() : # of output data
- co_head() : beginning of output data
- co_tail() : end of output data
- ci_data() : # of input data
- ci_head() : beginning of input data
- ci_tail() : end of input data
- ci_stop() : location after ci_tail()
- ci_next() : pointer to next input byte
And for the ci_* / co_* functions above, the "__*" variants which disable
wrapping checks, and the "_ofs" variants which return an offset relative to
the buffer's origin instead.
Willy Tarreau [Fri, 15 Jun 2018 15:50:15 +0000 (17:50 +0200)]
MINOR: buffer: introduce b_realign_if_empty()
Many places deal with buffer realignment after data removal. The method
is always the same : if the buffer is empty, set its pointer to the origin.
Let's have a function for this so that we have less code to change with the
new API.
Olivier Houchard [Thu, 28 Jun 2018 17:10:25 +0000 (19:10 +0200)]
MINOR: buffer: Add b_set_data().
Add a new function that lets you set the amount of input in a buffer.
For now it extends/truncates b->i except if the total length is
below b->o in which case it clears i and adjusts o.
Willy Tarreau [Wed, 6 Jun 2018 12:30:50 +0000 (14:30 +0200)]
MINOR: buffer: add a few basic functions for the new API
Here's the list of newly introduced functions :
- b_data(), returning the total amount of data in the buffer (currently i+o)
- b_orig(), returning the origin of the storage area, that is, the place of
position 0.
- b_wrap(), pointer to wrapping point (currently data+size)
- b_size(), returning the size of the buffer
- b_room(), returning the amount of bytes left available
- b_full(), returning true if the buffer is full, otherwise false
- b_stop(), pointer to end of data mark (currently p+i), used to compute
distances or a stop pointer for a loop.
- b_peek(), this one will help make the transition to the new buffer model.
It returns a pointer to a position in the buffer known from an offest
relative to the beginning of the data in the buffer. Thus, we can replace
the following occurrences :
- b_head(), pointer to the beginning of data (currently bo_ptr())
- b_tail(), pointer to first free place (currently bi_ptr())
- b_next() / b_next_ofs(), pointer to the next byte, taking wrapping
into account.
- b_dist(), returning the distance between two pointers belonging to a buffer
- b_reset(), which resets the buffer
- b_space_wraps(), indicating if the free space wraps around the buffer
- b_almost_full(), indicating if 3/4 or more of the buffer are used
Some of these are provided with the unchecked variants using the "__"
prefix, or with the "_ofs" suffix indicating they return a relative
position to the buffer's origin instead of a pointer.
MINOR: buffer: switch buffer sizes and offsets to size_t
Passing unsigned ints everywhere is painful, and will cause some headache
later when we'll want to integrate better with struct ist which already
uses size_t. Let's switch buffers to use size_t instead.
MINOR: buffer: implement a new file for low-level buffer manipulation functions
The buffer code currently depends on pools and other stuff and is not
really autonomous anymore. The rewrite of the new API is an opportunity
to clean this up. This patch creates a new file (buf.h) which does not
depend on other elements and which will only contain what is needed to
perform the most basic buffer operations. The new API will be introduced
in this file and the conversion will be finished once buffer.h is empty.
The definition of struct buffer was moved to this new file, using more
explicity stdint types for the sizes and offsets.
Most new functions will be implemented in two variants :
__b_something() : unchecked variant, no wrapping is expected
b_something() : wrapping-checked variant
This way callers will be able to select which one to use depending on
the use cases.
Willy Tarreau [Wed, 13 Jun 2018 12:24:56 +0000 (14:24 +0200)]
BUG/MEDIUM: h2: make sure the last stream closes the connection after a timeout
If a timeout strikes on the connection side with some active streams,
there is a corner case which can sometimes cause the following sequence
to happen :
- There are active streams but there are data in the mux buffer
(eg: a client suddenly disconnected during a download with pending
requests). The timeout is active.
- The timeout strikes, h2_timeout_task() is called, kills the task and
doesn't close the connection since there are streams left ; The
connection is marked in H2_CS_ERROR ;
- the streams are woken up and closed ;
- when the last stream closes, calling h2_detach(), it sees the
tree list is empty, but there is no condition allowing the
connection to be closed (mbuf->o > 0), thus it does nothing ;
- since the task is dead, there's no more hope to clear this
situation later
For now we can take care of this by adding a test for the presence of
H2_CS_ERROR and !task, implying the timeout task triggered already
and will not be able to handle this again.
Over the long term it seems like a more reliable test on should be
made, so that it is possible to know whether or not someone is still
able to close this connection.
A big thanks to Janusz Dziemidowicz and Milan Petruzelka for providing
many details helping in figuring this bug.
BUG/MEDIUM: h2: never leave pending data in the output buffer on close
We currently don't process trailers on H2, but this has an impact : on
chunked HTTP/1 responses, we decide to emit the ES bit once we see the
0CRLF. From this point the stream switches to the CLOSED state, which
aborts processing of the remaining bytes. Thus the extra CRLF which ends
trailers is not processed and remains in the buffer. This prevents the
stream from being notified about end of transmission, which in turn keeps
the mux busy and prevents the connection from quitting.
The case of the trailers is not the root cause of this issue, though it
is what triggers it. The root cause is that upon error and/or close, once
we know we're not going to process any more data, we must absolutely flush
any remaining bytes from the output buffer, otherwise there is no way the
stream can quit. This is what this patch does.
It looks very likely related to the issues reported and debugged by
Janusz Dziemidowicz and Milan Petruzelka.
One way to reproduce it is to chain two proxies with the last one emitting
chunked data (typically using the stats page) :
When curl is sent to the first one, "show sess" issued to the CLI will
show a remaining session during the client timeout. When curl is aimed at
port 4444 (px2), there is no such remaining session.
BUG/MEDIUM: h2: don't accept new streams if conn_streams are still in excess
The streams bookkeeping made in H2 is used for protocol compliance only
but it doesn't consider the number of conn_streams still attached to the
mux. It causes an issue when http-request set-nice rules are applied on
H2 requests processed on a saturated machine. Indeed, in this case, the
requests are accepted and assigned a default nice value of zero. When
they are processed, their nice value changes to a higher one (say 1024).
The response is sent through the H2 mux, which detects the end of stream
and decrements the protocol-level stream count (h2c->nb_streams). The
client may then send a new request. But the conn_stream is still attached
and will require a new call to process_stream() to finish, which is made
through the scheduler. Given that the machine is saturated, it is assumed
that many tasks are present in the scheduler. Thus the closing tasks holding
a higher nice value will pass after the new stream creations. If the client
is fast enough with a low latency link, it may add a lot of new stream
creations before the stream terminations have a chance to disappear due
to their high nice value, resulting in a huge amount of memory being used.
The solution consists in letting a mux always monitor its conn_streams and
refrain from creating new ones when it is full. Here the H2 mux checks the
nb_cs counter and sets a new blocked flag (H2_CF_DEM_TOOMANY) if the limit
was reached, so that the frame parser requests a pause in the new stream
creation, leaving some time for the pending conn_streams to vanish.
Several experiments were made using varying thresholds to see if
overbooking would provide any benefit here but it turned out not to be
the case, so the conn_stream limit remains set to the exact streams
limit. Interestingly various performance measurements showed that the
code tends to be slightly faster now than without the limit, probably
due to the smoother memory usage.
This commit requires previous patch ("MINOR: h2: keep a count of the number
of conn_streams attached to the mux"). It needs to be backported to 1.8.
MINOR: h2: keep a count of the number of conn_streams attached to the mux
The h2 mux only knows about the number of H2 streams which are not in a
CLOSED state. This is used for protocol compliance. But it doesn't hold
the number of really attached streams. It is a problem because depending
on scheduling, it is possible that more streams are attached to the mux
than the ones seen at the protocol level, due to some streams taking some
time to be detached. Let's add this count based on the conn_streams.
Note: this patch is part of a series of fixes which will have to be
backported to 1.8.
BUG/MINOR: ssl: properly ref-count the tls_keys entries
Commit 200b0fa ("MEDIUM: Add support for updating TLS ticket keys via
socket") introduced support for updating TLS ticket keys from the CLI,
but missed a small corner case : if multiple bind lines reference the
same tls_keys file, the same reference is used (as expected), but during
the clean shutdown, it will lead to a double free when destroying the
bind_conf contexts since none of the lines knows if others still use
it. The impact is very low however, mostly a core and/or a message in
the system's log upon old process termination.
Let's introduce some basic refcounting to prevent this from happening,
so that only the last bind_conf frees it.
Thanks to Janusz Dziemidowicz and Thierry Fournier for both reporting
the same issue with an easy reproducer.
With certain curl versions URLs which contain brackets may be interpreted
by the "URL globbing parser". This patch ensures that such brackets
are escaped.
Thank you to Ilya Shipitsin for having reported this issue.
Baptiste Assmann [Fri, 22 Jun 2018 13:04:43 +0000 (15:04 +0200)]
MINOR: dns: new DNS options to allow/prevent IP address duplication
By default, HAProxy's DNS resolution at runtime ensure that there is no
IP address duplication in a backend (for servers being resolved by the
same hostname).
There are a few cases where people want, on purpose, to disable this
feature.
This patch introduces a couple of new server side options for this purpose:
"resolve-opts allow-dup-ip" or "resolve-opts prevent-dup-ip".
Baptiste Assmann [Fri, 22 Jun 2018 11:03:50 +0000 (13:03 +0200)]
MINOR: dns: fix wrong score computation in dns_get_ip_from_response
dns_get_ip_from_response() is used to compare the caller current IP to
the IP available in the records returned by the DNS server.
A scoring system is in place to get the best IP address available.
That said, in the current implementation, there are a couple of issues:
1. a comment does not match what the code does
2. the code does not match what the commet says (score value is not
incremented with '2')
Ilya Shipitsin reported that with some curl versions this reg test
may fail due to a wrong URI syntax with ::1 ipv6 local address in
this varnishtest script. This patch fixes this syntax issue and
replaces the iteration of "procees" commands by a "shell" command
to start curl processes (must be faster).
Thanks to Ilya Shipitsin for having reported this VTC file bug.
Vincent Bernat [Fri, 22 Jun 2018 18:57:03 +0000 (20:57 +0200)]
MINOR: systemd: consider exit status 143 as successful
The master process will exit with the status of the last worker. When
the worker is killed with SIGTERM, it is expected to get 143 as an
exit status. Therefore, we consider this exit status as normal from a
systemd point of view. If it happens when not stopping, the systemd
unit is configured to always restart, so it has no adverse effect.
This has mostly a cosmetic effect. Without the patch, stopping HAProxy
leads to the following status:
MINOR: startup: change session/process group settings
Change the way the process groups are set. Indeed setsid() was called
for every processes which caused the worker to have a different process
group than the master.
This patch behave in a better way:
- In daemon mode only, each child do a setsid()
- In master worker + daemon mode, the setsid() is done in the master before
forking the children
- In any foreground mode, we don't do a setsid()
Could be backported in 1.8 but the master-worker mode is mostly used
with systemd which rely on cgroups so that won't affect much people.