Willy Tarreau [Fri, 30 Aug 2019 09:17:01 +0000 (11:17 +0200)]
MEDIUM: ring: implement a wait mode for watchers
Now it is possible for a reader to subscribe and wait for new events
sent to a ring buffer. When new events are written to a ring buffer,
the applets that are subscribed are woken up to display new events.
For now we only support this with the CLI applet called by "show events"
since the I/O handler is indeed a CLI I/O handler. But it's not
complicated to add other mechanisms to consume events and forward them
to external log servers for example. The wait mode is enabled by adding
"-w" after "show events <sink>". An extra "-n" was added to directly
seek to new events only.
Willy Tarreau [Fri, 30 Aug 2019 06:05:03 +0000 (08:05 +0200)]
MINOR: cli: extend the CLI context with a list and two offsets
Some CLI parsers are currently abusing the CLI context types such as
pointers to stuff longs into them by lack of room. But the context is
80 bytes while cli is only 48, thus there's some room left. This patch
adds a list element and two size_t usable as various offsets. The list
element is initialized.
Willy Tarreau [Fri, 30 Aug 2019 08:16:14 +0000 (10:16 +0200)]
BUG/MINOR: ring: fix the way watchers are counted
There are two problems with the way we count watchers on a ring:
- the test for >=255 was accidently kept to 1 used during QA
- if the producer deletes all data up to the reader's position
and the reader is called, cannot write, and is called again,
it will have a zero offset, causing it to initialize itself
multiple times and each time adding a new refcount.
Let's change this and instead use ~0 as the initial offset as it's
not possible to have it as a valid one. We now store it into the
CLI context's size_t o0 instead of casting it to a void*.
Willy Tarreau [Fri, 30 Aug 2019 05:28:24 +0000 (07:28 +0200)]
MINOR: mux-h2/trace: report h2s->id before h2c->dsi for the stream ID
h2c->dsi is only for demuxing, and needed while decoding a new request.
But if we already have a valid stream ID (e.g. response or outgoing
request), we should use it instead. This avoids seeing [0] in front of
the responses at user level.
Willy Tarreau [Fri, 30 Aug 2019 05:11:30 +0000 (07:11 +0200)]
MINOR: mux-h2/trace: only decode the start-line at verbosity other than "minimal"
This is as documented in "trace h2 verbosity", level "minimal" only
features flags and doesn't perform any decoding at all, "simple" does,
just like "clean" which is the default for end uesrs.
Willy Tarreau [Fri, 30 Aug 2019 05:21:18 +0000 (07:21 +0200)]
MINOR: mux-h2/trace: add a new verbosity level "clean"
The "clean" output will be suitable for user and proto-level output
where the internal stuff (state, pointers, etc) is not desired but
just the basic protocol elements.
Willy Tarreau [Fri, 30 Aug 2019 05:07:08 +0000 (07:07 +0200)]
MINOR: mux-h2: add functions to convert an h2c/h2s state to a string
We need this all the time in traces, let's have it now. For the sake
of compact outputs, the strings are all 3-chars long. The "show fd"
output was improved to make use of this.
Willy Tarreau [Mon, 12 Aug 2019 16:42:03 +0000 (18:42 +0200)]
MEDIUM: mux-h2/trace: add lots of traces all over the code
All functions of the h2 data path were updated to receive one or multiple
TRACE() calls, at least one pair of TRACE_ENTER()/TRACE_LEAVE(), and those
manipulating protocol elements have been improved to report frame types,
special state transitions or detected errors. Even with careful tests, no
performance impact was measured when traces are disabled.
They are not completely exploited yet, the callback function tries to
dump a lot about them, but still doesn't provide buffer dumps, nor does
it indicate the stream or connection error codes.
The first argument is always set to the connection when known. The second
argument is set to the h2s when known, sometimes a 3rd argument is set to
a buffer, generally the rxbuf or htx, and occasionally the 4th argument
points to an integer (number of bytes read/sent, error code).
Retrieving a 10kB object produces roughly 240 lines when at developer
level, 35 lines at data level, 27 at state level, and 10 at proto level
and 2 at user level.
For now the headers are not dumped, but the start line are emitted in
each direction at user level.
The patch is marked medium because it touches lots of places, though
it takes care not to change the execution path.
Willy Tarreau [Mon, 19 Aug 2019 15:56:27 +0000 (17:56 +0200)]
MINOR: mux-h2/trace: add the default decoding callback
The new function h2_trace() is called when relevant by the trace subsystem
in order to provide extra information about the trace being produced. It
can for example display the connection pointer, the stream pointer, etc.
It is declared in the trace source as the default callback as we expect
it to be versatile enough to enrich most traces.
In addition, for requests and responses, if we have a buffer and we can
decode it as an HTX buffer, we can extract user-friendly information from
the start line.
Willy Tarreau [Thu, 8 Aug 2019 16:23:12 +0000 (18:23 +0200)]
MINOR: mux-h2/trace: register a new trace source with its events
For now the traces are not used. Supported events are categorized by
where the activity comes from (h2c, h2s, stream, etc), a direction
(send/recv/wake), and a list of possibilities for each of them (frame
types, errors, shut, ...). This results in ~50 different events that
try to cover a lot of possibilities when it's needed to filter on
something specific. Special events like protocol error are handled.
A few aggregate events like "rx_frame" or "tx_frame" are planed to
cover all frame types at once by being placed all the time with any
of the other ones.
We also state that the first argument is always the connection. This way
the trace subsystem will be able to safely retrieve some useful info, and
we'll still be able to get the h2c from there (conn->ctx) in a pretty
print function. The second argument will always be an h2s, and in order
to propose it for tracking, we add its description. We also define 4
verbosity levels, which seems more than enough.
Willy Tarreau [Thu, 29 Aug 2019 06:24:16 +0000 (08:24 +0200)]
MINOR: trace: change the detail_level to per-source verbosity
The detail level initially based on syslog levels is not used, while
something related is missing, trace verbosity, to indicate whether or
not we want to call the decoding callback and what level of decoding
we want (raw captures etc). Let's change the field to "verbosity" for
this. A verbosity of zero means that the decoding callback is not
called, and all other levels are handled by this callback and are
source-specific. The source is now prompted to list the levels that
are proposed to the user. When the source doesn't define anything,
"quiet" and "default" are available.
Willy Tarreau [Thu, 29 Aug 2019 07:08:36 +0000 (09:08 +0200)]
MINOR: trace: also report the trace level in the output
It's very convenient to know the trace level in the output, at least to
grep/grep -v on it. The main usage is to filter/filter out the developer
traces at level DEVEL. For this we now add the numeric level (0 to 4) just
after the source name in the brackets. The output now looks like this :
Willy Tarreau [Thu, 29 Aug 2019 06:40:59 +0000 (08:40 +0200)]
MINOR: trace: prepend the function name for developer level traces
Working on adding traces to mux-h2 revealed that the function names are
manually copied a lot in developer traces. The reason is that they are
not preprocessor macros and as such cannot be concatenated. Let's
slightly adjust the trace() function call to take a function name just
after the file:line argument. This argument is only added for the
TRACE_DEVEL and 3 new TRACE_ENTER, TRACE_LEAVE, and TRACE_POINT macros
and left NULL for others. This way the function name is only reported
for traces aimed at the developers. The pretty-print callback was also
extended to benefit from this. This will also significantly shrink the
data segment as the "entering" and "leaving" strings will now be merged.
One technical point worth mentioning is that the function name is *not*
passed as an ist to the inline function because it's not considered as
a builtin constant by the compiler, and would lead to strlen() being
run on it from all call places before calling the inline function. Thus
instead we pass the const char * (that the compiler knows where to find)
and it's the __trace() function that converts it to an ist for internal
consumption and for the pretty-print callback. Doing this avoids losing
5-10% peak performance.
Willy Tarreau [Thu, 29 Aug 2019 06:01:48 +0000 (08:01 +0200)]
MINOR: trace: change the "payload" level to "data" and move it
The "payload" trace level was ambigous because its initial purpose was
to be able to dump received data. But it doesn't make sense to force to
report data transfers just to be able to report state changes. For
example, all snd_buf()/rcv_buf() operations coming from the application
layer should be tagged at this level. So here we move this payload level
above the state transitions and rename it to avoid the ambiguity making
one think it's only about request/response payload. Now it clearly is
about any data transfer and is thus just below the developer level. The
help messages on the CLI and the doc were slightly reworded to help
remove this ambiguity.
Willy Tarreau [Thu, 29 Aug 2019 07:32:21 +0000 (09:32 +0200)]
MINOR: tools: add a generic struct "name_desc" for name-description pairs
In prompts on the CLI we now commonly need to propose a keyword name
and a description and it doesn't make sense to define a new struct for
each such pairs. Let's simply have a generic "name_desc" for this.
Willy Tarreau [Wed, 28 Aug 2019 08:35:16 +0000 (10:35 +0200)]
MINOR: trace: change the TRACE() calling convention to put the args and cb last
Previously the callback was almost mandatory so it made sense to have it
before the message. Now that it can default to the one declared in the
trace source, most TRACE() calls contain series of empty args and callbacks,
which make them suitable for being at the end and being totally omitted.
This patch thus reverses the TRACE arguments so that the message appears
first, then the mask, then arg1..arg4, then the callback. In practice
we'll mostly see 1 arg, or 2 args and nothing else, and it will not be
needed anymore to pass long series of commas in the middle of the
arguments. However if a source is enforced, the empty commas will still
be needed for all omitted arguments.
Willy Tarreau [Wed, 28 Aug 2019 08:08:58 +0000 (10:08 +0200)]
MINOR: trace: show thread number and source name in the trace
Traces were missing the thread number and the source name, which was
really annoying. Now the thread number is emitted on two digits inside
the square brackets, followed by the source name then the line location,
each delimited with a vertical bar, such as below :
Willy Tarreau [Wed, 28 Aug 2019 05:03:58 +0000 (07:03 +0200)]
MINOR: trace: support a default callback for the source
It becomes apparent that most traces will use a single trace pretty
print callback, so let's allow the trace source to declare a default
one so that it can be omitted from trace calls, and will be used if
no other one is specified.
Willy Tarreau [Tue, 27 Aug 2019 14:41:06 +0000 (16:41 +0200)]
MINOR: sink: now report the number of dropped events on output
The principle is that when emitting a message, if some dropped events
were logged, we first attempt to report this counter before going
further. This is done under an exclusive lock while all logs are
produced under a shared lock. This ensures that the dropped line is
accurately reported and doesn't accidently arrive after a later
event.
Willy Tarreau [Fri, 23 Aug 2019 13:47:49 +0000 (15:47 +0200)]
MINOR: sink: add support for ring buffers
This now provides sink_new_buf() which allocates a ring buffer. One such
ring ("buf0") of 1 MB is created already, and may be used by sink_write().
The sink's creation should probably be moved somewhere else later.
Willy Tarreau [Tue, 27 Aug 2019 09:55:39 +0000 (11:55 +0200)]
MINOR: ring: add a generic CLI io_handler to dump a ring buffer
The three functions (attach, IO handler, and release) are meant to be
called by any CLI command which requires to dump the contents of a ring
buffer. We do not implement anything generic to dump any ring buffer on
the CLI since it's meant to be used by other functionalities above.
However these functions deal with locking and everything so it's trivial
to embed them in other code.
Willy Tarreau [Tue, 27 Aug 2019 09:44:13 +0000 (11:44 +0200)]
MINOR: ring: add a ring_write() function
This function tries to write to the ring buffer, possibly removing enough
old messages to make room for the new one. It takes two arrays of fragments
on input to ease the insertion of prefixes by the caller. It atomically
writes the message, possibly truncating it if desired, and returns the
operation's status.
Willy Tarreau [Thu, 8 Aug 2019 13:28:52 +0000 (15:28 +0200)]
MINOR: ring: add a new mechanism for retrieving/storing ring data in buffers
Our circular buffers are well suited for being used as ring buffers for
not-so-structured data. The machanism here consists in making room in a
buffer before inserting a new record which is prefixed by its size, and
looking up next record based on the previous one's offset and size. We
can have up to 255 consumers watching for data (dump in progress, tail)
which guarantee that entrees are not recycled while they're being dumped.
The complete representation is described in the header file. For now only
ring_new(), ring_resize() and ring_free() are created.
Willy Tarreau [Tue, 27 Aug 2019 12:21:02 +0000 (14:21 +0200)]
MINOR: sink: now call the generic fd write function
Let's not mess up with fd-specific code, locking nor message formating
here, and use the new generic function instead. This substantially
simplifies the sink_write() code and makes it more agnostic to the
output representation and storage.
Willy Tarreau [Tue, 27 Aug 2019 09:08:17 +0000 (11:08 +0200)]
MINOR: fd: add fd_write_frag_line() to send a fragmented line to an fd
Currently both logs and event sinks may use a file descriptor to
atomically emit some output contents. The two may use the same FD though
nothing is done to make sure they use the same lock. Also there is quite
some redundancy between the two. Better make a specific function to send
a fragmented message to a file descriptor which will take care of the
locking via the fd's lock. The function is also able to truncate a
message and to enforce addition of a trailing LF when building the
output message.
Willy Tarreau [Fri, 23 Aug 2019 15:06:12 +0000 (17:06 +0200)]
MINOR: buffer: add functions to read/write varints from/to buffers
The new functions are :
__b_put_varint() : inserts a varint when it's known that it fits
b_put_varint() : tries to insert a varint at the tail
b_get_varint() : tries to get a varint from the head
b_peek_varint() : tries to peek a varint at a specific offset
Wrapping is supported so that they are expected to be safe to use to
manipulate varints with buffers anywhere.
Willy Tarreau [Fri, 23 Aug 2019 17:02:26 +0000 (19:02 +0200)]
MINOR: tools: add a function varint_bytes() to report the size of a varint
It will sometimes be useful to encode varints to know the output size in
advance. Two versions are provided, one inline using a switch/case construct
which will be trivial for use with constants (and will be very fast albeit
huge) and one function iterating on the number which is 5 times smaller,
for use with variables.
Nenad Merdanovic [Mon, 26 Aug 2019 23:58:13 +0000 (01:58 +0200)]
MINOR: backend: Add srv_queue converter
The converter can be useful to look up a server queue from a dynamic value.
It takes an input value of type string, either a server name or
<backend>/<server> format and returns the number of queued sessions
on that server. Can be used in places where we want to look up
queued sessions from a dynamic name, like a cookie value (e.g.
req.cook(SRVID),srv_queue) and then make a decision to break
persistence or direct a request elsewhere.
Jerome Magnin [Mon, 26 Aug 2019 09:44:21 +0000 (11:44 +0200)]
BUG/MEDIUM: url32 does not take the path part into account in the returned hash.
The url32 sample fetch does not take the path part of the URL into
account. This is because in smp_fetch_url32() we erroneously modify
path.len and path.ptr before testing their value and building the
path based part of the hash.
This fixes issue #235
This must be backported as far as 1.9, when HTX was introduced.
Willy Tarreau [Mon, 26 Aug 2019 08:55:52 +0000 (10:55 +0200)]
BUG/MEDIUM: listener/threads: fix an AB/BA locking issue in delete_listener()
The delete_listener() function takes the listener's lock before taking
the proto_lock, which is contrary to what other functions do, possibly
causing an AB/BA deadlock. In practice the two only places where both
are taken are during protocol_enable_all() and delete_listener(), the
former being used during startup and the latter during stop. In practice
during reload floods, it is technically possible for a thread to be
initializing the listeners while another one is stopping. While this
is too hard to trigger on 2.0 and above due to the synchronization of
all threads during startup, it's reasonably easy to do in 1.9 by having
hundreds of listeners, starting 64 threads and flooding them with reloads
like this :
$ while usleep 50000; do killall -USR2 haproxy; done
Usually in less than a minute, all threads will be deadlocked. The fix
consists in always taking the proto_lock before the listener lock. It
seems to be the only place where these two locks were reversed. This
fix needs to be backported to 2.0, 1.9, and 1.8.
Willy Tarreau [Mon, 26 Aug 2019 08:37:39 +0000 (10:37 +0200)]
BUG/MINOR: mworker: disable SIGPROF on re-exec
If haproxy is built with profiling enabled with -pg, it is possible to
see the master quit during a reload while it's re-executing itself with
error code 155 (signal 27) saying "Profile timer expired)". This happens
if the SIGPROF signal is delivered during the execve() call while the
handler was already unregistered. The issue itself is not directly inside
haproxy but it's easy to address. This patch disables this signal before
calling execvp() during a master reload. A simple test for this consists
in running this little script with haproxy started in master-worker mode :
$ while usleep 50000; do killall -USR2 haproxy; done
This fix should be backported to all versions using the master-worker
model.
Willy Tarreau [Fri, 23 Aug 2019 07:29:29 +0000 (09:29 +0200)]
BUG/MEDIUM: mux-h1: do not report errors on transfers ending on buffer full
If a receipt ends with the HTX buffer full and everything is completed except
appending the HTX EOM block, we end up detecting an error because the H1
parser did not switch to H1_MSG_DONE yet while all conditions for an end of
stream and end of buffer are met. This can be detected by retrieving 31532
or 31533 chunk-encoded bytes over H1 and seeing haproxy log "SD--" at the
end of a successful transfer.
Ideally the EOM part should be totally independent on the H1 message state
since the block was really parsed and finished. So we should switch to a
last state requiring to send only EOM. However this needs a few risky
changes. This patch aims for simplicity and backport safety, thus it only
adds a flag to the H1 stream indicating that an EOM is still needed, and
excludes this condition from the ones used to detect end of processing. A
cleaner approach needs to be studied, either by adding a state before DONE
or by setting DONE once the various blocks are parsed and before trying to
send EOM.
This fix must be backported to 2.0. The issue does not seem to affect 1.9
though it is not yet known why, probably that it is related to the different
encoding of trailers which always leaves a bit of room to let EOM be stored.
Willy Tarreau [Fri, 23 Aug 2019 06:11:36 +0000 (08:11 +0200)]
BUG/MEDIUM: mux-h1: do not truncate trailing 0CRLF on buffer boundary
The H1 message parser calls the various message block parsers with an
offset indicating where in the buffer to start from, and only consumes
the data at the end of the parsing. The headers and trailers parsers
have a condition detecting if a headers or trailers block is too large
to fit into the buffer. This is detected by an incomplete block while
the buffer is full. Unfortunately it doesn't take into account the fact
that the block may be parsed after other blocks that are still present
in the buffer, resulting in aborting some transfers early as reported
in issue #231. This typically happens if a trailers block is incomplete
at the end of a buffer full of data, which typically happens with data
sizes multiple of the buffer size minus less than the trailers block
size. It also happens with the CRLF that follows the 0-sized chunk of
any transfer-encoded contents is itself on such a boundary since this
CRLF is technically part of the trailers block. This can be reproduced
by asking a server to retrieve exactly 31532 or 31533 bytes of static
data using chunked encoding with curl, which reports:
transfer closed with outstanding read data remaining
This issue was revealed in 2.0 and does not affect 1.9 because in 1.9
the trailers block was processed at once as part of the data block
processing, and would simply give up and wait for the rest of the data
to arrive.
It's interesting to note that the headers block parsing is also affected
by this issue but in practice it has a much more limited impact since a
headers block is normally only parsed at the beginning of a buffer. The
only case where it seems to matter is when dealing with a response buffer
full of 100-continue header blocks followed by a regular header block,
which will then be rejected for the same reason.
This fix must be backported to 2.0 and partially to 1.9 (the headers
block part).
Willy Tarreau [Thu, 22 Aug 2019 18:26:28 +0000 (20:26 +0200)]
BUILD: trace: make the lockon_ptr const to silence a warning without threads
I forgot to fix this one before pushing, despite my tests. lockon_ptr is
only used to compare pointers, it doesn't need to point to a writable
location. Without threads the atomic store is turned into an assignment
and rightfully complains.
Willy Tarreau [Thu, 22 Aug 2019 18:06:04 +0000 (20:06 +0200)]
DOC: management: document the "trace" and "show trace" commands
At the moment the subsystem is still not complete and the various modules
do not yet produce traces (some dirty experimental code for H2 exists) but
this aims at easing a broad adoption.
Among the missing elements, we can enumerate the lack of configuration
of the sinks (e.g. it's still not possible to change their output format
nor enable/disable timestamps) and since timestamps are not availalbe in
the sinks, they are not collected nor passed by the traces.
Willy Tarreau [Wed, 21 Aug 2019 15:05:46 +0000 (17:05 +0200)]
MINOR: trace: retrieve useful pointers and enforce lock-on
Now we try to find frontend, listener, backend, server, connection,
session, stream, from the presented argument of type connection,
stream or session. Various combinations and bounces allow to
retrieve most of them almost all the time. The extraction is
performed early so that we'll be able to apply filters later.
The lock-on is set if it was not there while the trace is running and
a valid pointer is available. If it was already set and doesn't match,
no trace is produced.
Willy Tarreau [Tue, 20 Aug 2019 17:24:10 +0000 (19:24 +0200)]
MINOR: trace: parse the "lock" argument to trace
When no criterion is provided, it carefully enumerates all available
ones, including those provided by the source itself. Otherwise it sets
the new criterion and resets the lockon pointer.
Willy Tarreau [Tue, 20 Aug 2019 17:22:53 +0000 (19:22 +0200)]
MINOR: trace: add the possibility to lock on some arguments
Given that we can pass typed arguments to the trace() function, let's
add provisions for tracking them. They are source-specific so we need
to let the source fill their name and description. Only those with a
non-null name will be proposed.
Willy Tarreau [Tue, 20 Aug 2019 16:57:48 +0000 (18:57 +0200)]
MINOR: trace: make sure to always stop the locking when stopping or pausing
When we stop or pause a trace (either on a matching event or by hand),
we must also stop the lock-on feature so that we don't follow any
further activity on this pointer even if it is recycled. For now this
is not exploited.
Willy Tarreau [Tue, 20 Aug 2019 16:42:52 +0000 (18:42 +0200)]
MINOR: trace: add a definition of typed arguments to trace()
With a few macros it's possible for a trace source to commit to only
using a certain type for a given argument (or set of). This will be
particularly useful to let the trace subsystem retrieve some precious
information such as a connection, session, listener, source address or
so, and enable/disable filtering and/or locking.
Willy Tarreau [Mon, 12 Aug 2019 15:27:09 +0000 (17:27 +0200)]
MINOR: trace: add per-level macros to produce traces
The new TRACE_<level>() macros take a mask, 4 args, a callback and a
static message. From this they also inherit the TRACE_SOURCE macro from
the caller, which contains the pointer to the trace source (so that it's
not required to paste it everywhere), and an ist string is also made by
the concatenation of the file name and the line number. This uses string
concatenation by the preprocessor, and turns it into an ist by the compiler
so that there is no operation at all to perform to adjust the data length
as the compiler knows where to cut during the optimization phase. Last,
the message is also automatically turned into an ist so that it's trivial
to put it into an iovec without having to run strlen() on it.
All arguments and the callback may be empty and will then automatically
be replaced with a NULL pointer. This makes the TRACE calls slightly
lighter especially since arguments are not always used. Several other
options were considered to use variadic macros but there's no outstanding
rule that justifies to place an argument before another one, and it still
looks convenient to have the message be the last one to encourage copy-
pasting of the trace statements.
A generic TRACE() macro takes TRACE_LEVEL in from the source file as the
trace level instead of taking it from its name. This may slightly simplify
the production of traces that always run at the same level (internal core
parts may probably only be called at developer level).
Willy Tarreau [Mon, 19 Aug 2019 14:28:07 +0000 (16:28 +0200)]
MINOR: trace: implement a call to a decode function
The trace() call will support an optional decoding callback and 4
arguments that this function is supposed to know how to use to provide
extra information. The output remains unchanged when the function is
NULL. Otherwise, the message is pre-filled into the thread-local
trace_buf, and the function is called with all arguments so that it
completes the buffer in a readable form depending on the expected
level of detail.
Willy Tarreau [Mon, 12 Aug 2019 15:57:57 +0000 (17:57 +0200)]
MINOR: trace: make trace() now also take a level in argument
This new "level" argument will allow the trace sources to label the
traces for different purposes, and filter out some of them if they
are not relevant to the current target. Right now we have 5 different
levels:
- USER : the least verbose one, only a few functional information
- PAYLOAD: like user but also displays some payload-related information
- PROTO: focuses on the protocol's framing
- STATE: also indicate state internal transitions or non-transitions
- DEVELOPER: adds extra info about branches taken in the code (break
points, return points)
Willy Tarreau [Mon, 12 Aug 2019 15:27:09 +0000 (17:27 +0200)]
MINOR: trace: add the file name and line number in the prefix
We now pass an extra argument "where" to the trace() call, which
is supposed to be an ist made of the concatenation of the filename
and the line number. We only keep the last 10 chars from this string
since the end of file names is most often easy to recognize. This
gives developers useful information at very low cost.
Willy Tarreau [Mon, 12 Aug 2019 13:51:58 +0000 (15:51 +0200)]
MINOR: trace: implement a very basic trace() function
For now it remains quite basic. It performs a few state checks, calls
the source's sink if defined, and performs the transitions between
RUNNING, STOPPED and WAITING when the configured events match.
Willy Tarreau [Mon, 12 Aug 2019 14:44:33 +0000 (16:44 +0200)]
MINOR: trace/cli: add "show trace" to report trace state and statistics
The new "show trace" CLI command lists available trace sources and
indicates their status, their sink, and number of dropped packets.
When "show trace <source>" is used, the list of known events is
also listed with their status per action (report/start/stop/pause).
Willy Tarreau [Mon, 12 Aug 2019 15:57:57 +0000 (17:57 +0200)]
MINOR: trace/cli: parse the "level" argument to configure the trace verbosity
The "level" keyword allows to indicate the expected level of verbosity
in the traces, among "user" (least verbose, just synthetic info) to
"developer" (very detailed, including function entry/leaving). It's only
displayed and set but not used yet.
Willy Tarreau [Thu, 8 Aug 2019 16:48:12 +0000 (18:48 +0200)]
MINOR: trace/cli: register the "trace" CLI keyword to list the sources
For now it lists the sources if one is not provided, and checks
for the source's existence. It lists the events if not provided,
checks for their existence if provided, and adjusts reported
events/start/stop/pause events, and performs state transitions.
It lists sinks and adjusts them as well. Filters, lock, and
level are not implemented yet.
Willy Tarreau [Thu, 8 Aug 2019 16:21:26 +0000 (18:21 +0200)]
MINOR: trace: start to create a new trace subsystem
The principle of this subsystem will be to support taking live traces
at various places in the code with conditional triggers, filters, and
ability to lock on some elements. The traces will support typed events
and will be sent into sinks made of ring buffers, file descriptors or
remote servers.
Willy Tarreau [Tue, 20 Aug 2019 09:57:52 +0000 (11:57 +0200)]
MINOR: sink: add a support for file descriptors
This is the most basic type of sink. It pre-registers "stdout" and
"stderr", and is able to use writev() on them. The writev() operation
is locked to avoid mixing outputs. It's likely that the registration
should move somewhere else to take into account the fact that stdout
and stderr are still opened or are closed.
Willy Tarreau [Sun, 11 Aug 2019 14:38:56 +0000 (16:38 +0200)]
MINOR: sink: create definitions a minimal code for event sinks
The principle will be to be able to dispatch events to various destinations
called "sinks". This is already done in part in logs where log servers can
be either a UDP socket or a file descriptor. This will be needed with the
new trace subsystem where we may also want to add ring buffers. And it turns
out that all such destinations make sense at all places. Logs may need to be
sent to a TCP server via a ring buffer, or consulted from the CLI. Trace
events may need to be sent to stdout/stderr as well as to remote log servers.
This patch creates a new structure "sink" aiming at addressing these similar
needs. The goal is to merge together what is common to all of them, such as
the output format, the dropped events count, etc, and also keep separately
the target identification (network address, file descriptor). Provisions
were made to have a "waiter" on the sink. For a TCP log server it will be
the task to wake up after writing to the log buffer. For a ring buffer, it
could be the list of watchers on the CLI running a "tail" operation and
waiting for new events. A lock was also placed in the struct since many
operations will require some locking, including the FD ones. The output
formats covers those in use by logs and two extra ones prepending the ISO
time in front of the message (convenient for stdio/buffer).
For now only the generic infrastructure is present, no type-specific
output is implemented. There's the sink_write() function which prepares
and formats a message to be sent, trying hard to avoid copies and only
using pointer manipulation, where the type-specific code just has to be
added. Dropped messages are already counted (for now 100% drop). The
message is put into an iovec array as it will be trivial to use with
file descriptors and sockets.
Willy Tarreau [Thu, 8 Aug 2019 14:53:40 +0000 (16:53 +0200)]
REORG: trace: rename trace.c to calltrace.c and mention it's not thread-safe
The function call tracing code is a quite old and was never ported to
support threads. It's not even sure whether it still works well, but
at least its presence creates confusion for future work so let's rename
it to calltrace.c and add a comment about its lack of thread-safety.
Willy Tarreau [Mon, 19 Aug 2019 15:39:33 +0000 (17:39 +0200)]
MINOR: tools: add a DEFNULL() macro to use NULL for empty args
It's sometimes convenient for debugging macros not to be forced to
explicitly pass NULL in an unused argument. This macro does this, it
replaces a missing arg with NULL.
Olivier Houchard [Thu, 22 Aug 2019 16:34:25 +0000 (18:34 +0200)]
BUG/MEDIUM: h1: Always try to receive more in h1_rcv_buf().
In h1_rcv_buf(), wake the h1c tasklet as long as we're not done reading the
request/response, and the h1c is not already subscribed for receiving. Now
that we no longer subscribe in h1_recv() if we managed to read data, we
rely on h1_rcv_buf() calling us again, but h1_process_input() may have
returned 0 if we only received part of the request, so we have to wake
the tasklet to be sure to get more data again.
Willy Tarreau [Wed, 21 Aug 2019 12:16:02 +0000 (14:16 +0200)]
MEDIUM: debug: make the thread dump code show Lua backtraces
When we dump a thread's state (show thread, panic) we don't know if
anything is happening in Lua, which can be problematic especially when
calling external functions. With this patch, the thread dump code can
now detect if we're running in a global Lua task (hlua_process_task),
or in a TCP or HTTP Lua service (task_run_applet and applet.fct ==
hlua_applet_tcp_fct or http_applet_http_fct), or a fetch/converter
from an analyser (s->hlua != NULL). In such situations, it's able to
append a formatted Lua backtrace of the Lua execution path with
function names, file names and line numbers.
Note that a shorter alternative could be to call "luaL_where(hlua->T,0)"
which only prints the current location, but it's not necessarily sufficient
for complex code.
Willy Tarreau [Wed, 21 Aug 2019 11:17:37 +0000 (13:17 +0200)]
MINOR: tools: add append_prefixed_str()
This is somewhat related to indent_msg() except that this one places a
known prefix at the beginning of each line, allows to replace the EOL
character, and not to insert a prefix on the first line if not desired.
It works with a normal output buffer/chunk so it doesn't need to allocate
anything nor to modify the input string. It is suitable for use in multi-
line backtraces.
Ilya Shipitsin [Mon, 19 Aug 2019 16:32:53 +0000 (21:32 +0500)]
BUILD: travis-ci: trigger non-mainstream configurations only on daily builds.
Let us save some electricity of Travis-CI.
The following configurations are built on every push:
- linux-glibc with threads+openssl 1.1.1 on x86 + clang
- linux-glibc with threads+libressl 2.9.2 on x86 + clang
- linux-glibc without SSL nor threads on x86 + clang
- osx with openssl-1.1.1 + clang
The following configurations are built daily:
- linux-ppc64le + openssl-1.0.2
- linux + openssl-1.1.0 + 51d/tree
- linux + libressl-2.8.3
- linux + libressl-2.7.5 + prometheus
- linux + boringssl
- cygwin
- coverity scan
Olivier Houchard [Sat, 10 Aug 2019 21:56:16 +0000 (23:56 +0200)]
BUG/MEDIUM: mux_pt: Don't call unsubscribe if we did not subscribe.
In mux_pt_attach(), don't inconditionally call unsubscribe, and only do so
if we were subscribed. The idea was that at this point we would always be
subscribed, as for the mux_pt attach would only be called after at least one
request, after which the mux_pt would have subscribed, but this is wrong.
We can also be called if for some reason the connection failed before the
xprt was created. And with no xprt, attempting to call unsubscribe will
probably lead to a crash.
Willy Tarreau [Fri, 16 Aug 2019 14:06:14 +0000 (16:06 +0200)]
MINOR: fd: make sure to mark the thread as not stuck in fd_update_events()
When I/O events are being processed, we want to make sure to mark the
thread as not stuck. The reason is that some pollers (like poll()) which
do not limit the number of FDs they report could possibly report a huge
amount of FD all having to perform moderately expensive operations in
the I/O callback (e.g. via mux-pt which forwards to the upper layers),
making the watchdog think the thread is stuck since it does not schedule.
Of course this must never happen but if it ever does we must be liberal
about it.
This should be backported to 2.0, where the situation may happen more
easily due to the FD cache which can start to collect a large amount of
events. It may be related to the report in issue #201 though nothing is
certain about it.
BUG/MINOR: stats: Wait the body before processing POST requests
The stats applet waits to have a full body to process POST requests. Because
when it is waiting for the end of a request it does not produce anything, the
applet may be blocked. The client side is blocked because the stats applet does
not consume anything and the applet is waiting because all the body is not
received. Registering the analyzer AN_REQ_HTTP_BODY when a POST request is sent
for the stats applet solves the issue.
BUG/MEDIUM: lua: Fix test on the direction to set the channel exp timeout
This bug was introduced by the commit bfab2ddd ("MINOR: hlua: Add a flag on the
lua txn to know in which context it can be used"). The wrong test was done. So
the timeout was always set on the response channel. It may lead to an infinite
loop.
This patch must be backported everywhere the commit bfab2ddd is. For now, at
least to 2.0, 1.9 and 1.8.
BUG/MINOR: Fix prometheus '# TYPE' and '# HELP' headers
Prometheus protocol defines HELP and TYPE as a token after the '#' and
the space after the '#' is necessary.
This is expected in the prometheus python client for example
(https://github.com/prometheus/client_python/blob/a8f5c80f651ea570577c364203e0edbef67db727/prometheus_client/parser.py#L194)
and the missing space is breaking the parsing of metrics' type.
Lukas Tribus [Sun, 11 Aug 2019 16:03:45 +0000 (18:03 +0200)]
BUG/MINOR: lua: fix setting netfilter mark
In the REORG of commit 1a18b5414 ("REORG: connection: centralize the
conn_set_{tos,mark,quickack} functions") a bug was introduced by
calling conn_set_tos instead of conn_set_mark.
BUG/MEDIUM: proxy: Don't use cs_destroy() when freeing the conn_stream.
When we upgrade the mux from TCP to H2/HTX, don't use cs_destroy() to free
the conn_stream, use cs_free() instead. Using cs_destroy() would call the
mux detach method, and at that point of time the mux would be the H2 mux,
which knows nothing about that conn_stream, so bad things would happen.
This should eventually make upgrade from TCP to H2/HTX work, and fix
the github issue #196.
BUG/MEDIUM: proxy: Don't forget the SF_HTX flag when upgrading TCP=>H1+HTX.
In stream_end_backend(), if we're upgrading from TCP to H1/HTX, as we don't
destroy the stream, we have to add the SF_HTX flag on the stream, or bad
things will happen.
This was broken when attempting to fix github issue #196.
Willy Tarreau [Fri, 9 Aug 2019 09:21:01 +0000 (11:21 +0200)]
CLEANUP: cli: replace all occurrences of manual handling of return messages
There were 221 places where a status message or an error message were built
to be returned on the CLI. All of them were replaced to use cli_err(),
cli_msg(), cli_dynerr() or cli_dynmsg() depending on what was expected.
This removed a lot of duplicated code because most of the times, 4 lines
are replaced by a single, safer one.
These functions perform all the boring filling of the appctx's
cli struct needed by CLI parsers to return a message or an error,
and they return 1 so that they can be used as a single-line return
statement. They may be used for const messages or dynamic messages.
Willy Tarreau [Fri, 9 Aug 2019 07:57:36 +0000 (09:57 +0200)]
MINOR: cli: add two new states to print messages on the CLI
Right now we used to have extremely inconsistent states to report output,
one is CLI_ST_PRINT which prints constant message cli->msg with the
assigned severity, and CLI_ST_PRINT_FREE which prints dynamically
allocated cli->err with severity LOG_ERR, and nothing in between,
eventhough it's useful to be able to report dynamically allocated
messages as well as constant error messages.
This patch adds two extra states, which are not particularly well named
given the constraints imposed by existing ones. One is CLI_ST_PRINT_ERR
which prints a constant error message. The other one is CLI_ST_PRINT_DYN
which prints a dynamically allocated message. By doing so we maintain
the compatibility with current code.
It is important to keep in mind that we cannot pre-initialize pointers
and automatically detect what message type it is based on the assigned
fields, because the CLI's context is in a union shared with all other
users, thus unused fields contain anything upon return. This is why we
have no choice but using 4 states. Keeping the two fields <msg> and
<err> remains useful because one is const and not the other one, and
this catches may copy-paste mistakes. It's just that <err> is pretty
confusing here, it should be renamed.
Willy Tarreau [Thu, 8 Aug 2019 08:09:08 +0000 (10:09 +0200)]
CLEANUP: task: move the cpu_time field to the task-only part
The CPU time accounting field called "cpu_time" is used only by tasks
and not tasklets, yet it used to be stored into the TASK_COMMON part,
which doesn't make sense and wastes tasklet memory. In addition, moving
it to tasks also helps better group the various parts in cache lines.
Willy Tarreau [Thu, 8 Aug 2019 06:06:27 +0000 (08:06 +0200)]
CLEANUP: buffer: replace b_drop() with b_free()
Since last commit there's no point anymore in having two variants of the
same function, let's switch to b_free() only. __b_drop() was renamed to
__b_free() for obvious consistency reasons.
Willy Tarreau [Thu, 8 Aug 2019 05:53:20 +0000 (07:53 +0200)]
BUG/MINOR: buffers/threads: always clear a buffer's head before releasing it
A small race exists in buffers with "show sess all". This one wants to show
some information grabbed from the buffer (especially in HTX mode). But the
thread owning this buffer might just be releasing its area, right after a
free() or munmap() call, resulting in a head that is not seen as empty yet
though the area was released. It may then be dereferenced by "show sess all"
causing a crash. Note that in practice it only happens in debug mode with
UAF enabled, but it's tricky enough to fix it right now.
This should be backported to stable versions which support threads and a
store barrier. It's worth noting that by performing the clearing first,
b_free() and b_drop() now become two exact equivalent.
Willy Tarreau [Thu, 8 Aug 2019 05:38:19 +0000 (07:38 +0200)]
BUG/MINOR: pools: don't mark the thread harmless if already isolated
Commit 85b2cae63 ("MINOR: pools: make the thread harmless during the
mmap/munmap syscalls") was used to relax the pressure experienced by
other threads when running in debug mode with UAF enabled. It places
a pair of thread_harmless_now()/thread_harmless_end() around the call
to mmap(), assuming callers are not sensitive to parallel activity.
But there are a few cases like "show sess all" where this happens in
isolated threads, and marking the thread as harmless there is a very
bad idea, even worse when arriving to thread_harmless_end() which loops
forever.
Let's only do that when the thread is not isolated. No backport is
needed as the patch above was only in 2.1-dev.
Emmanuel Hocdet [Mon, 5 Aug 2019 16:04:16 +0000 (18:04 +0200)]
BUG/MINOR: ssl: fix 0-RTT for BoringSSL
Since BoringSSL commit 777a2391 "Hold off flushing NewSessionTicket until write.",
0-RTT doesn't work. It appears that half-RTT data (response from 0-RTT) never
worked before the BoringSSL fix. For HAProxy the regression come from 010941f8
"BUG/MEDIUM: ssl: Use the early_data API the right way.": the problem is link to
the logic of CO_FL_EARLY_SSL_HS used for OpenSSL. With BoringSSL, handshake is
done before reading early data, 0-RTT data and half-RTT data are processed as
normal data: CO_FL_EARLY_SSL_HS/CO_FL_EARLY_DATA is not needed, simply remove
it.
This should be backported to all versions till 1.8.
Allow HAProxy to cache responses to OPTIONS HTTP requests.
This is useful in the use case of "Cross-Origin Resource Sharing" (cors)
to cache CORS responses from API servers.
Since HAProxy does not support Vary header for now, this would be only
useful for "access-control-allow-origin: *" use case.
Current HTTP cache hash contains only the Host header and the url path.
That said, request method should also be added to the mix to support
caching other request methods on the same URL. IE GET and OPTIONS.
Willy Tarreau [Wed, 7 Aug 2019 12:25:20 +0000 (14:25 +0200)]
CLEANUP: mux-h2: move the demuxed frame check code in its own function
The frame check code in the demuxer was moved to its own function to
keep the demux function clean enough. This also simplifies the test
case as we can now simply call this function once in H2_CS_FRAME_P
state.
When parsing references to stick-tables declared as backends, they are added to
a list of proxies (they are proxies!) which refer to this stick-tables.
Before this patch we added them to these list without checking they were already
present, making the silly hypothesis the actions/sample were checked/resolved in the same
order the proxies are parsed.
This patch implement a simple inline function to in_proxies_list() to test
the presence of a proxy in a list of proxies. We use this function when resolving
/checking samples/actions.
Willy Tarreau [Tue, 6 Aug 2019 14:26:31 +0000 (16:26 +0200)]
BUG/MEDIUM: checks: make sure to close nicely when we're the last to speak
In SMTP, MySQL and PgSQL checks, we're supposed to finish with a message
to politely quit the server, otherwise some of them will log some errors.
This is the case with Postfix as reported in GH issue #187. Since commit fe4abe6 ("BUG/MEDIUM: connections: Don't call shutdown() if we want to
disable linger.") we are a bit more aggressive on outgoing connection
closure and checks were not prepared for this.
This patch makes the 3 checks above disable the linger_risk for these
checks so that we close cleanly, with the side effect that it will leave
some TIME_WAIT connections behind (hence why it should not be generalized
to all checks). It's worth noting that in issue #187 it's mentioned that
this patch doesn't seem to be sufficient for Postfix, however based only
on local network activity this looks OK, so maybe this will need to be
improved later.
Given that the patch above was backported to 2.0 and 1.9, this one should
as well.
Willy Tarreau [Tue, 6 Aug 2019 13:49:51 +0000 (15:49 +0200)]
BUG/MINOR: mux-h2: always reset rcvd_s when switching to a new frame
In Patrick's trace it was visible that after a stream had been missed,
the next stream would receive a WINDOW_UPDATE with the first one's
credit added to its own. This makes sense because in case of error
h2c->rcvd_s is not reset. Given that this counter is per frame, better
reset it when starting to parse a new frame, it's easier and safer.
Willy Tarreau [Tue, 6 Aug 2019 13:39:32 +0000 (15:39 +0200)]
BUG/MINOR: mux-h2: always send stream window update before connection's
In h2_process_mux() if we have some room and an attempt to send a window
update for the connection was pending, it's done first. But it's not done
for the stream, which will have for effect of postponing this attempt
till next pass into h2_process_demux(), at the risk of seeing the send
buffer full again. Let's always try to send both pending frames as soon
as possible.
Willy Tarreau [Tue, 6 Aug 2019 13:21:45 +0000 (15:21 +0200)]
BUG/MEDIUM: mux-h2: do not recheck a frame type after a state transition
Patrick Hemmer reported a rare case where the H2 mux emits spurious
RST_STREAM(STREAM_CLOSED) that are triggered by the send path and do
not even appear to be associated with a previous incoming frame, while
the send path never emits such a thing.
The problem is particularly complex (hence its rarity). What happens is
that when data are uploaded (POST) we must refill the sending stream's
window by sending a WINDOW_UPDATE message (and we must refill the
connection's too). But in a highly bidirectional traffic, it is possible
that the mux's buffer will be full and that there is no more room to build
this WINDOW_UPDATE frame. In this case the demux parser switches to the
H2_CS_FRAME_A state, noting that an "acknowledgement" is needed for the
current frame, and it doesn't change the current stream nor frame type.
But the stream's state was possibly updated (typically OPEN->HREM when
a DATA frame carried the ES flag).
Later the data can leave the buffer, wake up h2_io_cb(), which calls
h2_send() to send pending data, itself calling h2_process_mux() which
detects that there are unacked data in the connection's window so it
emits a WINDOW_UPDATE for the connection and resets the counter. so it
emits a WINDOW_UPDATE for the connection and resets the counter. Then
h2_process() calls h2_process_demux() which continues the processing
based on the current frame type and the current state H2_CS_FRAME_A.
Unfortunately the protocol compliance checks matching the frame type
against the current state are still present. These tests are designed
for new frames only, not for those in progress, but they are not
limited by frame types. Thus the current DATA frame is checked again
against the current stream state that is now HREM, and fails the test
with a STREAM_CLOSED error.
The quick and backportable solution consists in adding the test for
this ACK and bypass all these checks that were already validated prior
to the state transition. A better long-term solution would consist in
having a new state between H and P indicating the frame is new and
needs to be checked ("N" for new?) and apply the protocol tests only
in this state. In addition everywhere we decide to send a window
update, we should send a stream WU first if there are unacked data
for the current stream. Last, rcvd_s should always be reset when
transitioning to FRAME_H (and a BUGON for this in dev would help).
The bug will be way harder to trigger on 2.0 than on 1.8/1.9 because
we have a ring buffer for the connection so the buffer full situations
are extremely rare.
This fix must be backpored to all versions having H2 (as far as 1.8).
Special thanks to Patrick for providing exploitable traces.
Willy Tarreau [Tue, 6 Aug 2019 08:30:58 +0000 (10:30 +0200)]
BUG/MINOR: mux-h2: do not send REFUSED_STREAM on aborted uploads
If the server decides to close early, we don't want to send a
REFUSED_STREAM error but a CANCEL, so that the client doesn't want
to retry. The test in h2_do_shutw() was wrong for this as it would
handle the HLOC case like the case where nothing had been sent for
this stream, which is wrong. Now h2_do_shutw() does nothing in this
case and lets h2_do_shutr() decide.
Note that this partially undoes f983d00a1 ("BUG/MINOR: mux-h2: make
the do_shut{r,w} functions more robust against retries").
This must be backported to 2.0. The patch above was not backported to
1.9 for being too risky there, but if it eventually gets to it, this
one will be needed as well.
Willy Tarreau [Tue, 6 Aug 2019 08:11:02 +0000 (10:11 +0200)]
BUG/MINOR: mux-h2: use CANCEL, not STREAM_CLOSED in h2c_frt_handle_data()
There is a test on the existence of the conn_stream when receiving data,
to be sure to have somewhere to deliver it. Right now it responds with
STREAM_CLOSED, which is not correct since from an H2 point of view the
stream is not closed and a peer could be upset to see this. After some
analysis, it is important to keep this test to be sure not to fill the
rxbuf then stall the connection. Another option could be to modiffy
h2_frt_transfer_data() to silently discard any contents but the CANCEL
error code is designed exactly for this and to save the peer from
continuing to stream data that will be discarded, so better switch to
using this.
Willy Tarreau [Tue, 6 Aug 2019 08:01:40 +0000 (10:01 +0200)]
BUG/MINOR: mux-h2: don't refrain from sending an RST_STREAM after another one
The test in h2s_send_rst_stream() is excessive, it refrains from sending an
RST_STREAM if *the last frame* was an RST_STREAM, regardless of the stream
ID. In a context where both clients and servers abort a lot, it could happen
that one RST_STREAM is dropped from responses from time to time, causing
delays to the client.