Willy Tarreau [Mon, 12 Nov 2018 15:11:08 +0000 (16:11 +0100)]
BUG/MEDIUM: stream-int: change the way buffer room is requested by a stream-int
Subsequent to the recent stream-int updates, we started to consider that
SI_FL_WANT_PUT needs to be set when receipt is enabled, but this is wrong
and results in 100% CPU when an HTTP client stays idle after a keep-alive
request because the stream-int has nothing to provide and nothing to send.
In fact just like for applets this flag should reflect the continuation
of an attempt. So it's si_cs_recv() which should set the flag, and clear
it if it has nothing more to provide. This function is called the first
time in process_stream()), and called again during transfers, so it will
always be up to date during stream_int_update() and stream_int_notify().
As a special case, it should also be set when a connection switches to
the established state. And we should absolutely refrain from calling
si_cs_recv() to re-enable reading, normally just setting this flag
(from within the stream-int's handler or prior to calling si_chk_rcv())
is expected to be OK.
A corner case remains where it was observed that in stream_int_notify() we
can sometimes be called with an empty output channel with SI_FL_WAIT_ROOM
and no CF_WRITE_PARTIAL, so there's no way to detect that we should
re-enable receiving. It's easy to also take care of this condition
there for the time it takes to figure if this situation is expected
or not.
Now it becomes more obvious that relying on a single flag to request
room (or on two flags to arbiter activity) is not workable given the
autonomy of both sides. The mux_h2 has taught us that blocking flags
are much more reliable, require much less condition and are much easier
to deal with. That's probably something to consider quickly in this
area.
Willy Tarreau [Mon, 12 Nov 2018 10:57:56 +0000 (11:57 +0100)]
MEDIUM: log: add a new "raw" format
This format is pretty similar to the previous "short" format except
that it also removes the severity level. Thus only the raw message is
sent. This is suitable for use in containers, where only the raw
information is expected and where the severity is supposed to come
from the file descriptor used.
Willy Tarreau [Mon, 12 Nov 2018 07:45:00 +0000 (08:45 +0100)]
MEDIUM: log: support a new "short" format
This format is meant to be used with local file descriptors. It emits
messages only prefixed with a level, removing all the process name,
system name, date and so on. It is similar to the printk() format used
on Linux. It's suitable to be sent to a local logger compatible with
systemd's output format.
Note that the facility is still required but not used, hence it is
suggested to use "daemon" to remind that it's a local logger.
Example :
log stdout format short daemon # send everything to stdout
log stderr format short daemon notice # send important events to stderr
Willy Tarreau [Mon, 12 Nov 2018 06:34:59 +0000 (07:34 +0100)]
MEDIUM: log: add support for logging to existing file descriptors
In certain situations it would be desirable to log to an existing file
descriptor, the most common case being a pipe between containers or
processes. The main issue with pipes is that using write() on them will
randomly truncate messages. But there is a trick. By using writev(), we
can atomically deliver or drop a message, which perfectly fits the
purpose. The only caveat is that large messages (4096 bytes on modern
operating systems) may be interleaved with messages from other processes
if using nbproc for example. In practice such messages are rare and most
of the time when users need such type of logging, the load is low enough
for a single process to be running so this is not really a problem.
This logging method thus uses unbuffered writev() calls and is uses more
CPU than if it used its own buffer with large writes at once, though this
is not a problem for moderate loads.
Logging to a file descriptor attached to a file also works with the side
effect that the process is significantly slowed down during disk accesses
and that it's not possible to rotate the file without restarting the
process. For this reason this option is not offered as a configuration
option, since it would confuse most users, but one could decide to
redirect haproxy's output to a file during debugging sessions. Two aliases
"stdout" and "stderr" are provided, but keep in mind that these are closed
by default in daemon mode.
When logging to a pipe or socket at a high enough rate, some logs will be
dropped and the number of dropped messages is reported in "show info".
Willy Tarreau [Mon, 12 Nov 2018 06:25:28 +0000 (07:25 +0100)]
MINOR: log: report the number of dropped logs in the stats
It's easy to detect when logs on some paths are lost as sendmsg() will
return EAGAIN. This is particularly true when sending to /dev/log, which
often doesn't support a big logging capacity. Let's keep track of these
and report the total number of dropped messages in "show info".
Willy Tarreau [Mon, 12 Nov 2018 06:56:13 +0000 (07:56 +0100)]
DOC: logs: the format directive was missing from the second log part
The "log" statement appears both in the global section and in listeners.
The "format" directive was only documented in the first one. Maybe we
should think about moving this definition to the log section by now.
Willy Tarreau [Mon, 12 Nov 2018 06:00:11 +0000 (07:00 +0100)]
MINOR: log: slightly improve error message syntax on log failure
The error messages used to say something along "socket logger 2 failed"
or "sendmsg logger 2 failed" which are confusing. Let's rephrase this
"sendmsg() failed for logger 2".
Joseph Herlant [Sat, 10 Nov 2018 01:44:10 +0000 (17:44 -0800)]
DOC: Fix typos in README and CONTRIBUTING
Few typos detected by misspell in the README and CONTRIBUTING.
Even if one of them is on a listing of commits. I'm assuming that
if we want to enforce less typos in the commits, having one in the
contributing guide is not the best example.
Joseph Herlant [Sat, 10 Nov 2018 03:00:24 +0000 (19:00 -0800)]
CLEANUP: fix typos in comments for contrib/wireshark-dissectors
This fixes a typo in the README of the peers section of this subsystem
and 2 typos in code comments. Groupped together as cleanup to avoid too
many 1 char patches.
Willy Tarreau [Sun, 11 Nov 2018 13:00:28 +0000 (14:00 +0100)]
BUILD: cache: fix a build warning regarding too large an integer for the age
Building on 32 bit gives this :
src/cache.c: In function 'http_action_store_cache':
src/cache.c:466:4: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c:467:5: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c: In function 'cache_channel_append_age_header':
src/cache.c:578:2: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
src/cache.c:579:3: warning: this decimal constant is unsigned only in ISO C90 [enabled by default]
It's because of the definition below added in commit e7a770c ("MINOR:
cache: Add "Age" header.") :
Willy Tarreau [Sun, 11 Nov 2018 09:43:39 +0000 (10:43 +0100)]
[RELEASE] Released version 1.9-dev6
Released version 1.9-dev6 with the following main changes :
- BUG/MEDIUM: tools: fix direction of my_ffsl()
- BUG/MINOR: cli: forward the whole command on master CLI
- BUG/MEDIUM: auth/threads: use of crypt() is not thread-safe
- MINOR: compat: automatically detect support for crypt_r()
- MEDIUM: auth/threads: make use of crypt_r() on systems supporting it
- DOC: split the http-request actions in their own section
- DOC: split the http-response actions in their own section
- BUG/MAJOR: stream-int: don't call si_cs_recv() in stream_int_chk_rcv_conn()
- BUG/MINOR: tasks: make sure wakeup events are properly reported to subscribers
- MINOR: stats: report the number of active jobs and listeners in "show info"
- MINOR: stats: report the number of active peers in "show info"
- MINOR: stats: report the number of currently connected peers
- MINOR: cli: show the number of reload in 'show proc'
- MINOR: cli: can't connect to the target CLI
- MEDIUM: mworker: does not create the CLI proxy when no listener
- MINOR: mworker: displays more information when leaving
- MEDIUM: mworker: exit with the incriminated exit code
- MINOR: mworker: displays a message when a worker is forked
- MEDIUM: mworker: leave when the master die
- CLEANUP: stream-int: retro-document si_cs_io_cb()
- BUG/MEDIUM: mworker: does not abort() in mworker_pipe_register()
- BUG/MEDIUM: stream-int: don't wake up for nothing during SI_ST_CON
- BUG/MEDIUM: cli: crash when trying to access a worker
- DOC: restore note about "independant" typo
- MEDIUM: stream: implement stream_buf_available()
- MEDIUM: appctx: check for allocation attempts in buffer allocation callbacks
- MINOR: stream-int: rename si_applet_{want|stop|cant}_{get|put}
- MINOR: stream-int: add si_done_{get,put} to indicate that we won't do it anymore
- MINOR: stream-int: use si_cant_put() instead of setting SI_FL_WAIT_ROOM
- MINOR: stream-int: make use of si_done_{get,put}() in shut{w,r}
- MINOR: stream-int: make it clear that si_ops cannot be null
- MEDIUM: stream-int: temporarily make si_chk_rcv() take care of SI_FL_WAIT_ROOM
- MINOR: stream-int: factor the SI_ST_EST state test into si_chk_rcv()
- MEDIUM: stream-int: make SI_FL_WANT_PUT reflect CF_DONT_READ
- MEDIUM: stream-int: always call si_chk_rcv() when we make room in the buffer
- MEDIUM: stream-int: make si_chk_rcv() check that SI_FL_WAIT_ROOM is cleared
- MINOR: stream-int: replace si_update() with si_update_both()
- MEDIUM: stream-int: make stream_int_update() aware of the lower layers
- CLEANUP: stream-int: remove the now unused si->update() function
- MEDIUM: stream-int: Rely only on SI_FL_WAIT_ROOM to stop data receipt
- MEDIUM: stream-int: Try to read data even if channel's buffer seems to be full
- BUG/MINOR: config: better detect the presence of the h2 pattern in npn/alpn
Willy Tarreau [Sun, 11 Nov 2018 09:36:25 +0000 (10:36 +0100)]
BUG/MINOR: config: better detect the presence of the h2 pattern in npn/alpn
In 1.8, commit 45a66cc ("MEDIUM: config: ensure that tune.bufsize is at
least 16384 when using HTTP/2") tried to avoid an annoying issue making
H2 fail when haproxy is built with default buffer sizes smaller than 16kB,
which used to be the case for a very long time. Sadly, the test only sees
when NPN/ALPN exactly match "h2" and not when it's combined like
"h2,http/1.1" nor "http/1.1,h2". We can safely use strstr() there because
the string is prefixed by the token's length (0x02) which is unambiguous
as it cannot be part of any other token.
This fix should be backported to 1.8 as a safety guard against bad
configurations.
MEDIUM: stream-int: Try to read data even if channel's buffer seems to be full
Before calling the mux to get incoming data, we get the amount of space
available at the input of the buffer. If there is no space, we don't try to read
more data. This is good enough when raw data are stored in the buffer. But this
info has no meaning when structured data are stored. Because with the HTTP
refactoring, such kind of data will be stored in buffers, it is a bit annoying.
So, to avoid any problems, we always call the mux. It is the mux's responsiblity
to notify the stream interface it needs more space to store more data. This must
be done by setting the flag CS_FL_RCV_MORE on the conn_stream.
This is exactly what we do in the pass-through mux when <count> is null.
MEDIUM: stream-int: Rely only on SI_FL_WAIT_ROOM to stop data receipt
This flag is set on the stream interface when we should wait for more space in
the channel's buffer to store more incoming data. This means we should wait some
outgoing data are sent before retrying to receive more data.
But in stream interface functions, at many places, instead of checking this
flag, we use the function channel_may_recv to know if we can (re)start
reading. This currently works but it is not really consistent. And, it works
because only raw data are stored in buffers. But it will be a problem when we
start to store structured data in buffers.
So to avoid any problems with futur implementations, we now rely only on
SI_FL_WAIT_ROOM. The function channel_may_recv can still be called, but only
when we are sure to handle raw data (for instance in functions ci_put*). To do
so, among other things, we must be sure to unset SI_FL_WAIT_ROOM and offer an
opportunity to call chk_rcv() on a stream interface when some data are sent
on the other end, which is now granted by the previous patch series.
Willy Tarreau [Fri, 9 Nov 2018 13:56:01 +0000 (14:56 +0100)]
CLEANUP: stream-int: remove the now unused si->update() function
We exclusively use stream_int_update() now, the lower layers are not
called anymore so let's remove them, as well as si_update() which used
to be their wrapper.
Willy Tarreau [Fri, 9 Nov 2018 13:59:25 +0000 (14:59 +0100)]
MEDIUM: stream-int: make stream_int_update() aware of the lower layers
It's far from being clean, but at least it allows to resync both CS and
applets from the same place, taking into account the fact that CS are
processed synchronously for the send side while appletx are processed
outside of the process_stream() loop. The arrangement is optimised to
minimize the amount of iteration by handling send first, then updating
the SI_FL_WAIT_ROOM flags and only then dealing with si_chk_rcv() on
both sides. The SI_FL_WANT_PUT flag is set if needed before calling
si_chk_rcv() since this is done prior to calling stream_int_update().
Now there's no risk that stream_int_notify() is called anymore during
such operations, thus we cannot have any spurious wake-up anymore. The
case where a successful send() could complete a pending connect() is
handled by taking any stream-int state changes into account at the
call place, which is normal since process_stream() is designed to
iterate till stabilisation.
Doing this solves most of the remaining inconsistencies between CS and
applets.
Willy Tarreau [Thu, 8 Nov 2018 17:15:29 +0000 (18:15 +0100)]
MINOR: stream-int: replace si_update() with si_update_both()
The function used to be called in turn for each side of the stream, but
since it's called exclusively from process_stream(), it prevents us from
making use of the knowledge we have of the operations in progress for
each side, resulting in having to go all the way through functions like
stream_int_notify() which are not appropriate there.
That patch creates a new function, si_update_both() which takes two
stream interfaces expected to belong to the same stream, and processes
their flags in a more suitable order, but for now doesn't change the
logic at all.
The next step will consist in trying to reinsert the rest of the socket
layer-specific update code to ultimately update the flags correctly at
the end of the operation.
Willy Tarreau [Fri, 9 Nov 2018 15:21:43 +0000 (16:21 +0100)]
MEDIUM: stream-int: make si_chk_rcv() check that SI_FL_WAIT_ROOM is cleared
After careful inspection, it now seems OK to call si_chk_rcv() only when
SI_FL_WAIT_ROOM is cleared and SI_FL_WANT_PUT is set, since all identified
call places have already taken care of this.
Willy Tarreau [Wed, 7 Nov 2018 17:53:29 +0000 (18:53 +0100)]
MEDIUM: stream-int: always call si_chk_rcv() when we make room in the buffer
Instead of clearing the SI_FL_WAIT_ROOM flag and losing the information
about the need from the producer to be woken up, we now call si_chk_rcv()
immediately. This is cheap to do and it could possibly be further improved
by only doing it when SI_FL_WAIT_ROOM was still set, though this will
require some extra auditing of the code paths.
The only remaining place where the flag was cleared without a call to
si_chk_rcv() is si_alloc_ibuf(), but since this one is called from a
receive path woken up from si_chk_rcv() or not having failed, the
clearing was not necessary anymore either.
And there was one place in stream_int_notify() where si_chk_rcv() was
called with SI_FL_WAIT_ROOM still explicitly set so this place was
adjusted in order to clear the flag prior to calling si_chk_rcv().
Now we don't have any situation where we randomly clear SI_FL_WAIT_ROOM
without trying to wake the other side up, nor where we call si_chk_rcv()
with the flag set, so this flag should accurately represent a failed
attempt at putting data into the buffer.
Willy Tarreau [Wed, 7 Nov 2018 14:07:35 +0000 (15:07 +0100)]
MEDIUM: stream-int: make SI_FL_WANT_PUT reflect CF_DONT_READ
When CF_DONT_READ is set, till now we used to set SI_FL_WAIT_ROOM, which
is not appropriate since it would lose the subscribe status. Instead let's
clear SI_FL_WANT_PUT (just like applets do), and set the flag only when
CF_DONT_READ is cleared.
We have to do this in stream_int_update(), and in si_cs_io_cb() after
returning from si_cs_recv() since it would be a bit invasive to hack
this one for now. It must not be done in stream_int_notify() otherwise
it would re-enable blocked applets.
Last, when si_chk_rcv() is called, it immediately clears the flag before
calling ->chk_rcv() so that we are not tempted to uselessly loop on the
same call until the receive function is called. This is the same principle
as what is done with the applet scheduler.
Willy Tarreau [Wed, 7 Nov 2018 10:55:54 +0000 (11:55 +0100)]
MEDIUM: stream-int: temporarily make si_chk_rcv() take care of SI_FL_WAIT_ROOM
This flag should already be cleared before calling the *chk_rcv() functions.
Before adapting all call places, let's first make sure si_chk_rcv() clears
it before calling them so that these functions do not have to check it again
and so that they do not adjust it. This function will only call the lower
layers if the SI_FL_WANT_PUT flag is present so that the endpoint can decide
not to be called (as done with applets).
Willy Tarreau [Wed, 7 Nov 2018 10:28:12 +0000 (11:28 +0100)]
MINOR: stream-int: make it clear that si_ops cannot be null
There was an ambiguity in which functions of the si_ops struct could be
null or not. only ->update doesn't exist in one of the si_ops (the
embedded one), all others are always defined. ->shutr and ->shutw were
never tested. However ->chk_rcv() and ->chk_snd() were tested, causing
confusion about the proper way to wake the other side up if undefined
(which never happens).
Let's update the comments to state these functions are mandatory and
remove the offending checks.
Willy Tarreau [Tue, 6 Nov 2018 18:10:53 +0000 (19:10 +0100)]
MINOR: stream-int: use si_cant_put() instead of setting SI_FL_WAIT_ROOM
We now do this on the si_cs_recv() path so that we always have
SI_FL_WANT_PUT properly set when there's a need to receive and
SI_FL_WAIT_ROOM upon failure.
It doesn't make sense to limit this code to applets, as any stream
interface can use it. Let's rename it by simply dropping the "applet_"
part of the name. No other change was made except updating the comments.
Willy Tarreau [Tue, 6 Nov 2018 16:32:37 +0000 (17:32 +0100)]
MEDIUM: appctx: check for allocation attempts in buffer allocation callbacks
The buffer allocation callback appctx_res_wakeup() used to rely on old
tricks to detect if a buffer was already granted to an appctx, namely
by checking the task's state. Not only this test is not valid anymore,
but it's inaccurate.
Let's solely on SI_FL_WAIT_ROOM that is now set on allocation failure by
the functions trying to allocate a buffer. The buffer is now allocated on
the fly and the flag removed so that the consistency between the two
remains granted. The patch also fixes minor issues such as the function
being improperly declared inline(!) and the fact that using appctx_wakeup()
sets the wakeup reason to TASK_WOKEN_OTHER while we try to use TASK_WOKEN_RES
when waking up consecutive to a ressource allocation such as a buffer.
Willy Tarreau [Tue, 6 Nov 2018 14:50:21 +0000 (15:50 +0100)]
MEDIUM: stream: implement stream_buf_available()
This function replaces stream_res_available(), which is used as a callback
for the buffer allocator. It now carefully checks which stream interface
was blocked on a buffer allocation, tries to allocate the input buffer to
this stream interface, and wakes the task up once such a buffer was found.
It will automatically remove the SI_FL_WAIT_ROOM flag upon success since
the info this flag indicates becomes wrong as soon as the buffer is
allocated.
The code is still far from being perfect because if a call to si_cs_recv()
fails to allocate a buffer, we'll still end up passing via process_stream()
again, but this could be improved in the future by using finer-grained
wake-up notifications.
Lukas Tribus [Thu, 8 Nov 2018 11:41:42 +0000 (12:41 +0100)]
DOC: restore note about "independant" typo
The independant -> independent error was fixed in 801a0a35 ("DOC: fix
name for "option independant-streams"), but the note about the wrong
name was erroneously fixed in 0e82b92a ("DOC: fix a few config typos").
Restore the "wrong" name so that when reasearching this option people
can actually find it.
BUG/MEDIUM: cli: crash when trying to access a worker
When using the CLI proxy of the master and trying to access a worker
with the @ prefix, the worker just crash.
The commit 7216032 ("MEDIUM: mworker: leave when the master die")
reintroduced the old code of the pipe, which was not trying to access
the pointers before. The owner of the FD was modified to a different
value, this is a problem since we call listener_accept() in most cases
now from the mworker_accept_wrapper() and it casts the owner variable to
get the listener.
This patch fix the issue by setting back the previous owner of the FD.
Willy Tarreau [Thu, 8 Nov 2018 13:32:16 +0000 (14:32 +0100)]
BUG/MEDIUM: stream-int: don't wake up for nothing during SI_ST_CON
Commit eafd8ebcf ("MEDIUM: stream-int: call si_cs_process() in
stream_int_update_conn") uncovered a sleeping bug. By calling
si_cs_process() within si_update(), we end up calling stream_int_notify().
We rely on it to update the stream-int before quitting as a hack, but
it happens to immediately wake the task up while the stream int's
state is still SI_ST_CON (during the connection establishment). The
observable effect is that an unreachable server causes haproxy to
use 100% CPU until the connection timeout strikes.
This patch fixes this by not causing the wake up for the SI_ST_CON
state. It would equally be possible to check for states higher than
SI_ST_EST as is done in other places, but for now better stay on the
safe side by covering the only issue that can be triggered. It's
suspected that this issue slightly affects older versions by causing
one extra call to process_stream() during the connection setup for
each activity change on the other side, but this should not have
any observable effect.
Willy Tarreau [Wed, 7 Nov 2018 06:47:52 +0000 (07:47 +0100)]
CLEANUP: stream-int: retro-document si_cs_io_cb()
It took me 17 minutes this morning to figure where si->wait_event was
set (it's in si_reset() which should now probably be renamed since it
doesn't just perform a reset anymore but also an allocation) and what
its task was assigned to (si_cs_io_cb() even for applets and empty SI).
This is too confusing and not intuitive enough, let's at least add a
few comments for now to help figure how this stuff works next time.
When the master die, the worker should exit too, this is achieved by
checking if the FD of the socketpair/pipe was closed between the master
and the worker.
In the former architecture of the master-worker, there was only a pipe
between the master and the workers, and it was easy to check an EOF on
the pipe FD to exit() the worker.
With the new architecture, we use a socketpair by process, and this
socketpair is also used to accept new connections with the
listener_accept() callback.
This accept callback can't handle the EOF and the exit of the process,
because it's very specific to the master worker. This is why we
transformed the mworker_pipe_handler() function in a wrapper which check
if there is an EOF and exit the process, and if not call
listener_accept() to achieve the accept.
Willy Tarreau [Mon, 5 Nov 2018 16:12:27 +0000 (17:12 +0100)]
MINOR: stats: report the number of currently connected peers
The active peers output indicates both the number of established peers
connections and the number of peers connection attempts. The new counter
"ConnectedPeers" also indicates the number of currently connected peers.
This helps detect that some peers cannot be reached for example. It's
worth mentioning that this value changes over time because unused peers
are often disconnected and reconnected. Most of the time it should be
equal to ActivePeers.
Willy Tarreau [Mon, 5 Nov 2018 15:31:22 +0000 (16:31 +0100)]
MINOR: stats: report the number of active peers in "show info"
Peers are the last type of activity which can maintain a job present, so
it's important to report that such an entity is still active to explain
why the job count may be higher than zero. Here by "ActivePeers" we report
peers sessions, which include both established connections and outgoing
connection attempts.
Willy Tarreau [Mon, 5 Nov 2018 13:38:13 +0000 (14:38 +0100)]
MINOR: stats: report the number of active jobs and listeners in "show info"
When an haproxy process doesn't stop after a reload, it's because it
still has some active "jobs", which mainly are active sessions, listeners,
peers or other specific activities. Sometimes it's difficult to troubleshoot
the cause of these issues (which generally are the result of a bug) only
because some indicators are missing.
This patch add the number of listeners, the number of jobs, and the stopping
status to the output of "show info". This way it becomes a bit easier to try
to narrow down the cause of such an issue should it happen. A typical use
case is to connect to the CLI before reloading, then issuing the "show info"
command to see what happens. In the normal situation, stopping should equal
1, jobs should equal 1 (meaning only the CLI is still active) and listeners
should equal zero.
The patch is so trivial that it could make sense to backport it to 1.8 in
order to help with troubleshooting.
Willy Tarreau [Mon, 5 Nov 2018 14:09:47 +0000 (15:09 +0100)]
BUG/MINOR: tasks: make sure wakeup events are properly reported to subscribers
The tasks API was changed in 1.9-dev1 with commit 9f6af3322 ("MINOR: tasks:
Change the task API so that the callback takes 3 arguments."), causing the
task's state not to be usable anymore and to have been replaced with an
explicit argument in the callee. The task's state doesn't contain any trace
of the wakeup cause anymore. But there were two places where the old task's
state remained in use :
- sessions, used to more accurately report timeouts in logs when seeing
TASK_WOKEN_TIMEOUT ;
- peers, used to finish resynchronization when seeing TASK_WOKEN_SIGNAL
This commit fixes both occurrences by making sure we don't access task->state
directly (should we rename it by the way ?).
Willy Tarreau [Tue, 30 Oct 2018 10:01:08 +0000 (11:01 +0100)]
BUG/MAJOR: stream-int: don't call si_cs_recv() in stream_int_chk_rcv_conn()
This one causes some events to be lost. It has already been tested in
an experimental branch but was not merged until being certain it was
needed. Fred figured that requesting /?k=1&s=447392 from httpterm through
haproxy-master was enough to stall the transfer.
No backport is needed, this only affects 1.9-dev5.
Cyril Bonté [Tue, 16 Oct 2018 22:14:50 +0000 (00:14 +0200)]
DOC: split the http-request actions in their own section
Since http-request was first introduced, more and more actions have been
added over time. This makes the "http-request" difficult to read and some
actions were forgotten in the list.
This is an attempt to make the documenation cleaner. In future steps, it
would be great to provide at least one example for each action.
While I couldn't reproduce the issue, it's obvious that there is a
problem with the use of the non-reentrant crypt() function there.
On Linux systems there's crypt_r() but not on the vast majority of
other ones. Thus a first approach consists in placing a lock around
this crypt() call. Another patch may relax it when crypt_r() is
available.
This fix must be backported to 1.8. Thanks to Ryan O'Hara for the
quick notification.
BUG/MINOR: cli: forward the whole command on master CLI
A bug occurs when the CLI proxy of the master received a command which
is prefixed by some spaces but without a routing prefix (@).
In this case the pcli_parse_request() was returning a wrong number of
data to forward.
The response analyzer was called twice and the prompt displayed twice.
Willy Tarreau [Mon, 29 Oct 2018 15:09:57 +0000 (16:09 +0100)]
BUG/MEDIUM: tools: fix direction of my_ffsl()
Commit 27346b01a ("OPTIM: tools: optimize my_ffsl() for x86_64") optimized
my_ffsl() for intensive use cases in the scheduler, but as half of the times
I got it wrong so it counted bits the reverse way. It doesn't matter for the
scheduler nor fd cache but it broke cpu-map with threads which heavily relies
on proper ordering.
We should probably consider dropping support for gcc < 3.4 and switching
to builtins for these ones, though often they are as ambiguous.
Willy Tarreau [Sun, 28 Oct 2018 19:39:31 +0000 (20:39 +0100)]
[RELEASE] Released version 1.9-dev5
Released version 1.9-dev5 with the following main changes :
- BUILD: Makefile: add the new ERR variable to force -Werror
- MINOR: freq_ctr: add swrate_add_scaled() to work with large samples
- MINOR: stream_interface: Avoid calling si_cs_send/recv if not needed.
- CLEANUP: http: Remove the unused function http_find_header
- MINOR: h1: Export some functions parsing the value of some HTTP headers
- BUG/MEDIUM: stream-int: don't set SI_FL_WAIT_ROOM on CF_READ_DONTWAIT
- MINOR: proxy: add a new option "http-use-htx"
- BUG/MEDIUM: pools: fix the minimum allocation size
- MINOR: shctx: Shared objects block by block allocation.
- MINOR: cache: Larger HTTP objects caching.
- MINOR: shctx: Add a maximum object size parameter.
- MINOR: cache: Add "max-object-size" option.
- DOC: Update about the cache support for big objects.
- BUG/MINOR: cache: Crashes with "total-max-size" > 2047(MB).
- BUG/MINOR: cache: Wrong usage of shctx_init().
- BUG/MINOR: ssl: Wrong usage of shctx_init().
- MINOR: cache: Avoid usage of atoi() when parsing "max-object-size".
- MINOR: shctx: Change max. object size type to unsigned int.
- DOC: cache: Missing information about "total-max-size" and "max-object-size"
- CLEANUP: tools: fix misleading comment above function LIM2A
- MEDIUM: channel: merge back flags CF_WRITE_PARTIAL and CF_WRITE_EVENT
- BUG/MINOR: only mark connections private if NTLM is detected
- BUG/MINOR: only auto-prefer last server if lb-alg is non-deterministic
- MINOR: stream: don't prune variables if the list is empty
- MINOR: stream-int: add si_alloc_ibuf() to ease input buffer allocation
- MEDIUM: stream-int: replace channel_alloc_buffer() with si_alloc_ibuf() everywhere
- MEDIUM: stream: always call si_cs_recv() after a failed buffer allocation
- MEDIUM: stream: don't try to send first in process_stream()
- MEDIUM: stream-int: make si_update() synchronize flag changes before the I/O
- MEDIUM: stream-int: call si_cs_process() in stream_int_update_conn
- MINOR: stream-int: don't needlessly call tasklet_wakeup() in stream_int_chk_snd_conn()
- MINOR: stream-int: make stream_int_notify() not wake the tasklet up
- MINOR: stream-int: don't needlessly call si_cs_send() in si_cs_process()
- MINOR: mworker: number of reload in the life of a worker
- MEDIUM: mworker: each worker socketpair is a CLI listener
- REORG: mworker: move struct mworker_proc to global.h
- MINOR: server: export new_server() function
- MEDIUM: mworker: move proc_list gen before proxies startup
- MEDIUM: mworker: add proc_list in global.h
- MEDIUM: mworker: proxy for the master CLI
- MEDIUM: mworker: create CLI listeners from argv[]
- MEDIUM: cli: disable some keywords in the master
- MEDIUM: mworker: find the server ptr using a CLI prefix
- MEDIUM: cli: 'show proc' displays processus
- MEDIUM: cli: implement 'mode cli' proxy analyzers
- MINOR: cli: displays sockpair@ in "show cli sockets"
- MEDIUM: cli: enable "show cli sockets" for the master
- MINOR: cli: put @master @<relative pid> @!<pid> in the help
- MEDIUM: listeners: set O_CLOEXEC on the accepted FDs
- MEDIUM: mworker: stop the master proxy in the workers
- MEDIUM: channel: reorder the channel analyzers for the cli
- MEDIUM: cli: write a prompt for the CLI proxy of the master
- MINOR: cli: helper to write an response message and close
- MINOR: cache: Add "Age" header.
- REGTEST: make the IP+port logging test more reliable
- BUG/MINOR: memory: make the thread-local cache allocator set the debugging link
- BUG/MAJOR: http: http_txn_get_path() may deference an inexisting buffer
- BUG/MINOR: backend: assign the wait list after the error check
Willy Tarreau [Sun, 28 Oct 2018 19:36:00 +0000 (20:36 +0100)]
BUG/MINOR: backend: assign the wait list after the error check
Commit 85b73e9 ("BUG/MEDIUM: stream: Make sure polling is right on retry.")
introduced a possible null dereference on the error path detected by gcc-7.
Let's simply assign srv_conn after checking the error and not before.
Willy Tarreau [Sun, 28 Oct 2018 19:13:12 +0000 (20:13 +0100)]
BUG/MAJOR: http: http_txn_get_path() may deference an inexisting buffer
When the "path" sample fetch function is called without any path, the
function doesn't check that the request buffer is allocated. While this
doesn't happen with the request during processing, it can definitely
happen when mistakenly trying to reference a path from the response
since the request channel is not allocated anymore.
It's certain that this bug was emphasized by the buffer changes that
went in 1.9 and the HTTP refactoring, but at first glance, 1.8 doesn't
seem 100% safe either so it's possible that older version are affected
as well.
Thanks to PiBa-NL for reporting this bug with a reproducer.
Willy Tarreau [Sun, 28 Oct 2018 19:09:12 +0000 (20:09 +0100)]
BUG/MINOR: memory: make the thread-local cache allocator set the debugging link
When building with DEBUG_MEMORY_POOLS, an element returned from the
cache would not have its pool link initialized unless it's allocated
using pool_alloc(). This is problematic for buffer allocators which
use pool_alloc_dirty(), as freeing this object will make the code
think it was allocated from another pool. This patch does two things :
- make __pool_get_from_cache() set the link
- remove the extra initialization from pool_alloc() since it's always
done in either __pool_get_first() or __pool_refill_alloc()
This patch is marked MINOR since it only affects code explicitly built
for debugging. No backport is needed.
Willy Tarreau [Sun, 28 Oct 2018 18:19:48 +0000 (19:19 +0100)]
REGTEST: make the IP+port logging test more reliable
On my machine, test log/b00000.vtc fails ~9/10 times. Apparently, the
connection is often marked as reset before the timeout strikes, so the
log shows "CD" flags instead of "cD". This fix does two things :
1) shorten the client timeout to 1 millisecond instead of 5
2) accept both "cD" and "CD" as valid termination states since the
purpose is to validate the source address and port, and not the
status itself.
This patch makes the cache capable of adding an "Age" header as defined by
rfc7234.
During the storage of new HTTP objects we memorize ->eoh value and
the value of the "Age" header coming from the origin server.
These information may then be reused to return the cached HTTP objects
with a new "Age" header.
This patch implements analysers for parsing the CLI and extra features
for the master's CLI.
For each command (sent alone, or separated by ; or \n) the request
analyser will determine to which server it should send the request.
The 'mode cli' proxy is able to parse a prefix for each command which is
used to select the apropriate server. The prefix start by @ and is
followed by "master", the PID preceded by ! or the relative PID. (e.g.
@master, @1, @!1234). The servers are not round-robined anymore.
The command is sent with a SHUTW which force the server to close the
connection after sending its response. However the proxy allows a
keepalive connection on the client side and does not close.
The response analyser does not do much stuff, it only reinits the
connection when it received a close from the server, and forward the
response. It does not analyze the response data.
The only guarantee of the end of the response is the close of the
server, we can't rely on the double \n since it's not send by every
command.
This patch introduces mworker_cli_proxy_new_listener() which allows the
creation of new listeners for the CLI proxy.
Using this function it is possible to create new listeners from the
program arguments with -Sa <unix_socket>. It is allowed to create
multiple listeners with several -Sa.
This patch implements a listen proxy within the master. It uses the
sockpair of all the workers as servers.
In the current state of the code, the proxy is only doing round robin on
the CLI of the workers. A CLI mode will be needed to know to which CLI
send the requests.
Willy Tarreau [Thu, 25 Oct 2018 12:02:47 +0000 (14:02 +0200)]
MINOR: stream-int: don't needlessly call si_cs_send() in si_cs_process()
There's a call there to si_cs_send() while we're supposed to come from
si_cs_io_cb() which has just done it. But in fact we can also come here
as a lower layer callback from ->wake() after a connection is established.
Since most of the time we'll end up here with either no data in the buffer
or a blocked output, let's simply check if we're already susbcribed to send
events before calling si_cs_send().
Willy Tarreau [Thu, 25 Oct 2018 11:55:20 +0000 (13:55 +0200)]
MINOR: stream-int: make stream_int_notify() not wake the tasklet up
stream_int_notify() is I/O agnostic and should not wake up the tasklet,
it's up to si_cs_process() to do that, just like si_applet_wake_cb()
does it for the applet.
Willy Tarreau [Thu, 25 Oct 2018 11:49:49 +0000 (13:49 +0200)]
MINOR: stream-int: don't needlessly call tasklet_wakeup() in stream_int_chk_snd_conn()
This one was added by commit 53216e7db ("MEDIUM: connections: Don't
directly mess with the polling from the upper layers.") after the
removal of the conditional cs_want_send() call. But after analysis
it turned out that it's not needed since the si_cs_send() call will
either succeed or subscribe.
Willy Tarreau [Sun, 28 Oct 2018 12:32:08 +0000 (13:32 +0100)]
MEDIUM: stream-int: call si_cs_process() in stream_int_update_conn
Calling si_cs_send() alone is always dangerous because it can result
in the loss of an event if it manages to empty the buffer. Indeed, in
this case it's critical to call si_chk_rcv() on the opposite stream-int.
Given that si_cs_process() takes care of all this, let's call it instead.
All this code could possibly be refined soon to avoid redoing the whole
stream_int_notify() and do it only after a send(), but at the moment it's
not important.
Willy Tarreau [Thu, 25 Oct 2018 09:06:57 +0000 (11:06 +0200)]
MEDIUM: stream-int: make si_update() synchronize flag changes before the I/O
With the new synchronous si_cs_send() at the end of process_stream(),
we're seeing re-appear the I/O layer specific part of the stream interface
which is supposed to deal with I/O event subscription. The only difference
is that now we subscribe to I/Os only after having attempted (and failed)
them.
This patch brings a cleanup in this by reintroducing stream_int_update_conn()
with the send code from process_stream(). However this alone would not be
enough because the flags which are cleared afterwards would result in the
loss of the possible events (write events only at the moment). So the flags
clearing and stream-int state updates are also performed inside si_update()
between the generic code and the I/O specific code. This definitely makes
sense as after this call we can simply check again for channel and SI flag
changes and decide to loop once again or not.
Willy Tarreau [Thu, 25 Oct 2018 08:42:39 +0000 (10:42 +0200)]
MEDIUM: stream: don't try to send first in process_stream()
The rationale here is that we should never need to try to send() at the
beginning of process_stream() because :
- if something was pending, it's very unlikely that it was unblocked
and not sent just between the last poll() and the wakeup instant.
- if something pending was recently sent, then we don't have anything
to send anymore.
So at first glance it doesn't seem like there could be any valid case
where trying to send before entering the function brings any benefit.
Willy Tarreau [Thu, 25 Oct 2018 08:28:27 +0000 (10:28 +0200)]
MEDIUM: stream: always call si_cs_recv() after a failed buffer allocation
If a buffer allocation failed, we have SI_FL_WAIT_ROOM set and c_size(buf)
being zero. It's the only moment where we have a new opportunity to try to
allocate this buffer. However we don't want to waste our time trying this
if both are non-null since it indicates missing room without any changed
condition.
Willy Tarreau [Thu, 25 Oct 2018 08:21:41 +0000 (10:21 +0200)]
MEDIUM: stream-int: replace channel_alloc_buffer() with si_alloc_ibuf() everywhere
Well that's only 3 places (applet.c, stream_interface.c, hlua.c). This
ensures we always clear SI_FL_WAIT_ROOM before setting it on failure,
so that it is granted that SI_FL_WAIT_ROOM always indicates a lack of
room for doing an operation, including the inability to allocate a
buffer for this.
Willy Tarreau [Thu, 25 Oct 2018 08:16:07 +0000 (10:16 +0200)]
MINOR: stream-int: add si_alloc_ibuf() to ease input buffer allocation
This will supersed channel_alloc_buffer() while relying on it. It will
automatically adjust SI_FL_WAIT_ROOM on the stream-int depending on
success or failure to allocate this buffer.
It's worth noting that it could make sense to also set SI_FL_WANT_PUT
each time we do this to further simplify the code at user places such
as applets, but it would possibly not be easy to clean this flag
everywhere an rx operation stops.
Willy Tarreau [Sun, 28 Oct 2018 12:44:36 +0000 (13:44 +0100)]
MINOR: stream: don't prune variables if the list is empty
The vars_prune() and vars_init() functions involve locking while most of
the time there is no variable at all in streams nor sessions. Let's check
for emptiness before calling these functions. Simply doing this has
increased the multithreaded performance from 1.5 to 5% depending on the
workload.
Lukas Tribus [Sat, 27 Oct 2018 18:07:40 +0000 (20:07 +0200)]
BUG/MINOR: only auto-prefer last server if lb-alg is non-deterministic
While "option prefer-last-server" only applies to non-deterministic load
balancing algorithms, 401/407 responses actually caused haproxy to prefer
the last server unconditionally.
As this breaks deterministic load balancing algorithms like uri, this
patch applies the same condition here.
Should be backported to 1.8 (together with "BUG/MINOR: only mark
connections private if NTLM is detected").
Lukas Tribus [Sat, 27 Oct 2018 18:06:59 +0000 (20:06 +0200)]
BUG/MINOR: only mark connections private if NTLM is detected
Instead of marking all connections that see a 401/407 response private
(for connection reuse), this patch detects a RFC4559/NTLM authentication
scheme and restricts the private setting to those connections.
This is so we can reuse connections with 401/407 responses with
deterministic load balancing algorithms later (which requires another fix).
This fixes the problem reported here by Elliot Barlas :
Willy Tarreau [Wed, 24 Oct 2018 15:17:56 +0000 (17:17 +0200)]
MEDIUM: channel: merge back flags CF_WRITE_PARTIAL and CF_WRITE_EVENT
The behaviour of the flag CF_WRITE_PARTIAL was modified by commit 95fad5ba4 ("BUG/MAJOR: stream-int: don't re-arm recv if send fails") due
to a situation where it could trigger an immediate wake up of the other
side, both acting in loops via the FD cache. This loss has caused the
need to introduce CF_WRITE_EVENT as commit c5a9d5bf, to replace it, but
both flags express more or less the same thing and this distinction
creates a lot of confusion and complexity in the code.
Since the FD cache now acts via tasklets, the issue worked around in the
first patch no longer exists, so it's more than time to kill this hack
and to restore CF_WRITE_PARTIAL's semantics (i.e.: there has been some
write activity since we last left process_stream).
This patch mostly reverts the two commits above. Only the part making
use of CF_WROTE_DATA instead of CF_WRITE_PARTIAL to detect the loss of
data upon connection setup was kept because it's more accurate and
better suited.
MINOR: cache: Avoid usage of atoi() when parsing "max-object-size".
With this patch we avoid parsing "max-object-size" with atoi() and we store its
value as an unsigned int to prevent bad implicit conversion issues especially
when we compare it with others unsigned value (content length).