BUG/MAJOR: mux-h2: Handle HEADERS frames received after a RST_STREAM frame
As stated in the RFC7540#5.1, an endpoint that receives any frame other than
PRIORITY after receiving a RST_STREAM MUST treat that as a stream error of type
STREAM_CLOSED. However, frames carrying compression state must still be
processed before being dropped to keep the HPACK decoder synchronized. This had
to be the purpose of the commit 8d9ac3ed8b ("BUG/MEDIUM: mux-h2: do not abort
HEADERS frame before decoding them"). But, the test on the frame type was
inverted.
This bug is major because desynchronizing the HPACK decoder leads to mixup
indexed headers in messages. From the time an HEADERS frame is received and
ignored for a closed stream, wrong headers may be sent to the following streams.
This patch may fix several bugs reported on github (#116, #290, #292). It must
be backported to 2.0 and 1.9.
This multiplexer is only available on the backend side. It may handle
multiplexed connections if the FCGI application supports it. A FCGI application
must be configured on the backend to be used. If not redefined during the
request processing by the FCGI filter, this mux handles all mandatory
parameters.
There is a limitation on the way the requests are processed. The parameters must
be encoded into a uniq PARAMS record. It means, once encoded, all HTTP headers
and FCGI parameters must small enough to be store in a buffer. Otherwise, an
internal processing error is returned.
The FCGI application handles all the configuration parameters used to format
requests sent to an application. The configuration of an application is grouped
in a dedicated section (fcgi-app <name>) and referenced in a backend to be used
(use-fcgi-app <name>). To be valid, a FCGI application must at least define a
document root. But it is also possible to set the default index, a regex to
split the script name and the path-info from the request URI, parameters to set
or unset... In addition, this patch also adds a FCGI filter, responsible for
all processing on a stream.
MINOR: muxes/htx: Ignore pseudo header during message formatting
When an HTX message is formatted to an H1 or H2 message, pseudo-headers (with
header names starting by a colon (':')) are now ignored. In fact, for now, only
H2 messages have such headers, and the H2 mux already skips them when it creates
the HTX message. But in the futur, it may be useful to keep these headers in the
HTX message to help the message analysis or to do some processing during the
HTTP formatting. It would also be a good idea to have scopes for pseudo-headers
(:h1-, :h2-, :fcgi-...) to limit their usage to a specific mux.
MINOR: h1-htx: Use the same function to copy message payload in all cases
This function will try to do a zero-copy transfer. Otherwise, it adds a data
block. The same is used for messages with a content-length, chunked messages and
messages with unknown body length.
MEDIUM: mux-h1/h1-htx: move HTX convertion of H1 messages in dedicated file
To avoid code duplication in the futur mux FCGI, functions parsing H1 messages
and converting them into HTX have been moved in the file h1_htx.c. Some
specific parts remain in the mux H1. But most of the parsing is now generic.
MINOR: log: Provide a function to emit a log for an application
Application is a generic term here. It is a modules which handle its own log
server list, with no dependency on a proxy. Such applications can now call the
function app_log() to log messages, passing a log server list and a tag as
parameters. Internally, the function __send_log() has been adapted accordingly.
MINOR: http_fetch: Add sample fetches to get auth method/user/pass
Now, following sample fetches may be used to get information about
authentication:
* http_auth_type : returns the auth method as supplied in Authorization header
* http_auth_user : returns the auth user as supplied in Authorization header
* http_auth_pass : returns the auth pass as supplied in Authorization header
MINOR: config: Support per-proxy and per-server post-check functions callbacks
Most of times, when a keyword is added in proxy section or on the server line,
we need to have a post-parser callback to check the config validity for the
proxy or the server which uses this keyword.
It is possible to register a global post-parser callback. But all these
callbacks need to loop on the proxies and servers to do their job. It is neither
handy nor efficient. Instead, it is now possible to register per-proxy and
per-server post-check callbacks.
MINOR: config: Support per-proxy and per-server deinit functions callbacks
Most of times, when any allocation is done during configuration parsing because
of a new keyword in proxy section or on the server line, we must add a call in
the deinit() function to release allocated ressources. It is now possible to
register a post-deinit callback because, at this stage, the proxies and the
servers are already releases.
Now, it is possible to register deinit callbacks per-proxy or per-server. These
callbacks will be called for each proxy and server before releasing them.
MINOR: http-ana: Handle HTX errors first during message analysis
When an error occurred in a mux, most of time, an error is also reported on the
conn-stream, leading to an error (read and/or write) on the channel. When a
parsing or a processing error is reported for the HTX message, it is better to
handle it first.
MINOR: htx: Add a flag on HTX message to report processing errors
This new flag may be used to report unexpected error because of not well
formatted HTX messages (not related to a parsing error) or our incapactity to
handle the processing because we reach a limit (ressource exhaustion, too big
headers...). It should result to an error 500 returned to the client when
applicable.
BUG/MEDIUM: stick-table: Properly handle "show table" with a data type argument
Since the commit 1b8e68e8 ("MEDIUM: stick-table: Stop handling stick-tables as
proxies."), the target field into the table context of the CLI applet was not
anymore a pointer to a proxy. It was replaced by a pointer to a stktable. But,
some parts of the code was not updated accordingly. the function
table_prepare_data_request() still tries to cast it to a pointer to a proxy. The
result is totally undefined. With a bit of luck, when the "show table" command
is used with a data type, we failed to find a table and the error "Data type not
stored in this table" is returned. But crashes may also be experienced.
This patch fixes the issue #262. It must be backported to 2.0.
Recently Lua code which uses Proxy class (get_stats method) stopped
working ("table index is nil from [C] method 'get_stats'")
It probably affects other codepaths too.
BUG/MINOR: backend: Fix a possible null pointer dereference
In the function connect_server(), when we are not able to reuse a connection and
too many FDs are opened, the variable srv must be defined to kill an idle
connection.
This patch fixes the issue #257. It must be backported to 2.0
BUG/MINOR: acl: Fix memory leaks when an ACL expression is parsed
This only happens during the configuration parsing. First leak is the string
representing the last converter parsed, if any. The second one is on the error
path, when the allocation of the ACL expression failed. In this case, the sample
was not released.
This patch fixes the issue #256. It must be backported to all stable versions.
CLEANUP: mux-h2: Remove unused flag H2_SF_DATA_CHNK
Since the legacy HTTP mode has been removed, this flag is not necessary
anymore. Removing this flag, a test on the HTX message at the end of the
function h2c_decode_headers() has also been removed fixing the github
issue #244.
BUG/MINOR: filters: Properly set the HTTP status code on analysis error
When a filter returns an error during the HTTP analysis, an error must be
returned if the status code is not already set. On the request path, an error
400 is returned. On the response path, an error 502 is returned. The status is
considered as unset if its value is not strictly positive.
If needed, this patch may be backported to all versions having filters (as far
as 1.7). Because nobody have never report any bug, the backport to 2.0 is
probably enough.
It is now possible to export stats using the JSON format from the HTTP stats
page. Like for the CSV export, to export stats in JSON, you must add the option
";json" on the stats URL. It is also possible to dump the JSON schema with the
option ";json-schema". Corresponding Links have been added on the HTML page.
BUG/MINOR: ssl: always check for ssl connection before getting its XPRT context
In several SSL functions, the XPRT context is retrieved before any check on the
connection. In the function ssl_sock_is_ssl(), a test suggests the connection
may be null. So, it is safer to test the ssl connection before retrieving its
XPRT context. It removes any ambiguities and prevents possible null pointer
dereferences.
This patch fixes the issue #265. It must be backported to 2.0.
MINOR: stats: report the number of idle connections for each server
This adds two extra fields to the stats, one for the current number of idle
connections and one for the configured limit. A tooltip link now appears on
the HTML page to show these values in front of the active connection values.
This should be backported to 2.0 and 1.9 as it's the only way to monitor
the idle connections behaviour.
BUG/MEDIUM: connection: don't keep more idle connections than ever needed
When using "http-reuse safe", which is the default, a new incoming connection
does not automatically reuse an existing connection for the first request, as
we don't want to risk to lose the contents if we know the client will not be
able to replay the request. A side effect to this is that when dealing with
mostly http-close traffic, the reuse rate is extremely low and we keep
accumulating server-side connections that may even never be reused. At some
point we're limited to a ratio of file descriptors, but when the system is
configured with very high FD limits, we can still reach the limit of outgoing
source ports and make the system significantly slow down trying to find an
available port for outgoing connections. A simple test on my laptop with
ulimit 100000 and with the following config results in the load immediately
dropping after a few seconds :
listen l1
bind :4445
mode http
server s1 127.0.0.1:8000
As can be seen, the load falls from 38k cps to 400 cps during the first 200ms
(in fact when the source port table is full and connect() takes ages to find
a spare port for a new connection):
At this point, lots of connections are indeed in use, for only 10 connections
on the frontend side:
$ ss -ant state established | wc -l
39022
This patch makes sure we never keep more idle connections than we've ever
had outstanding requests on a server. This way the total number of idle
connections will never exceed the sum of maximum connections. Thus highly
loaded servers will be able to get many connections and slightly loaded
servers will keep less. Ideally we should apply similar limits per process
and the per backend, but in practice this already addresses the issues
pretty well:
MEDIUM: fd: do not use the FD_POLL_* flags in the pollers anymore
As mentioned in previous commit, these flags do not map well to
modern poller capabilities. Let's use the FD_EV_*_{R,W} flags instead.
This first patch only performs a 1-to-1 mapping making sure that the
previously reported flags are still reported identically while using
the closest possible semantics in the pollers.
It's worth noting that kqueue will now support improvements such as
returning distinctions between shut and errors on each direction,
though this is not exploited for now.
MINOR: fd: add two flags ERR and SHUT to describe FD states
There's currently a big ambiguity on our use of POLLHUP because we
currently map POLLHUP and POLLRDHUP to FD_POLL_HUP. The first one
indicates a close in *both* directions while the second one indicates
a unidirectional close. Since we don't know from the resulting flag
we always have to read when reported. Furthermore kqueue only reports
unidirectional responses which are mapped to FD_POLL_HUP as well, and
their write closes are mapped to a general error.
We could add a new FD_POLL_RDHUP flag to improve the mapping, or
switch only to the POLL* flags, but that further complicates the
portability for operating systems like FreeBSD which do not have
POLLRDHUP but have its semantics.
Let's instead directly use the per-direction flag values we already
have, and it will be a first step in the direction of finer states.
Thus we introduce an ERR and a SHUT status for each direction, that
the pollers will be able to compute and pass to fd_update_events().
It's worth noting that FD_EV_STATUS already sees the two new flags,
but they are harmless since used only by fd_{recv,send}_state() which
are never called. Thus in its current state this patch must be totally
transparent.
MEDIUM: connection: enable reading only once the connection is confirmed
In order to address the absurd polling sequence described in issue #253,
let's make sure we disable receiving on a connection until it's established.
Previously with bottom-top I/Os, we were almost certain that a connection
was ready when the first I/O was confirmed. Now we can enter various
functions, including process_stream(), which will attempt to read
something, will fail, and will then subscribe. But we don't want them
to try to receive if we know the connection didn't complete. The first
prerequisite for this is to mark the connection as not ready for receiving
until it's validated. But we don't want to mark it as not ready for sending
because we know that attempting I/Os later is extremely likely to work
without polling.
Once the connection is confirmed we re-enable recv readiness. In order
for this event to be taken into account, the call to tcp_connect_probe()
was moved earlier, between the attempt to send() and the attempt to recv().
This way if tcp_connect_probe() enables reading, we have a chance to
immediately fall back to this and read the possibly pending data.
Now the trace looks like the following. It's far from being perfect
but we've already saved one recvfrom() and one epollctl():
MINOR: fd: add two new calls fd_cond_{recv,send}()
These two functions are used to enable recv/send but only if the FD is
not marked as active yet. The purpose is to conditionally mark them as
tentatively usable without interfering with the polling if polling was
already enabled, when it's supposed to be likely true.
MEDIUM: fd: mark the FD as ready when it's inserted
Given that all our I/Os are now directed from top to bottom and not the
opposite way around, and the FD cache was removed, it doesn't make sense
anymore to create FDs that are marked not ready since this would prevent
the first accesses unless the caller explicitly does an fd_may_recv()
which is not expected to be its job (which conn_ctrl_init() has to do
by the way). Let's move this into fd_insert() instead, and have a single
atomic operation for both directions via fd_may_both().
BUG/MAJOR: ssl: ssl_sock was not fully initialized.
'ssl_sock' wasn't fully initialized so a new session can inherit some
flags from an old one.
This causes some fetches, related to client's certificate presence or
its verify status and errors, returning erroneous values.
This issue could generate other unexpected behaviors because a new
session could also inherit other flags such as SSL_SOCK_ST_FL_16K_WBFSIZE,
SSL_SOCK_SEND_UNLIMITED, or SSL_SOCK_RECV_HEARTBEAT from an old session.
This must be backported to 2.0 but it's useless for previous.
BUG/MINOR: lb/leastconn: ignore the server weights for empty servers
As discussed in issue #178, the change brought around 1.9-dev11 by commit 1eb6c55808 ("MINOR: lb: make the leastconn algorithm more accurate")
causes some harm in the situation it tried to improve. By always applying
the server's weight even for no connection, we end up always picking the
same servers for the first connections, so under a low load, if servers
only have either 0 or 1 connections, in practice the same servers will
always be picked.
This patch partially restores the original behaviour but still keeping
the spirit of the aforementioned patch. Now what is done is that servers
with no connections will always be picked first, regardless of their
weight, so they will effectively follow round-robin. Only servers with
one connection or more will see an accurate weight applied.
This patch was developed and tested by @malsumis and @jaroslawr who
reported the initial issue. It should be backported to 2.0 and 1.9.
BUG/MINOR: mux-h1: Fix a UAF in cfg_h1_headers_case_adjust_postparser()
When an error occurs in the post-parser callback which checks configuration
validity of the option outgoing-headers-case-adjust-file, the error message is
freed too early, before being used.
No backport needed. It fixes the github issue #258.
BUG/MINOR: checks: do not uselessly poll for reads before the connection is up
It's pointless to start to perform a recv() call on a connection that is
not yet established. The only purpose used to be to subscribe but that
causes many extra syscalls when we know we can do it later.
This patch only attempts a read if the connection is established or if
there is no write planed, since we want to be certain to be called. And
in wake_srv_chk() we continue to attempt to read if the reader was not
subscribed, so as to perform the first read attempt. In case a first
result is provided, __event_srv_chk_r() will not do anything anyway so
this is totally harmless in this case.
This fix requires that commit "BUG/MINOR: checks: make __event_chk_srv_r()
report success before closing" is applied before, otherwise it will break
some checks (notably SSL) by doing them again after the connection is shut
down. This completes the fixes on the checks described in issue #253 by
roughly cutting the number of syscalls in half. It must be backported to
2.0.
BUG/MINOR: checks: make __event_chk_srv_r() report success before closing
On a plain TCP check, this function will do nothing except shutting the
connection down and will not even update the status. This prevents it
from being called again, which is the reason why we attempt to do it
once too early. Let's first fix this function to make it report success
on plain TCP checks before closing, as it does for all other ones.
This must be backported to 2.0. It should be safe to backport to older
versions but it doesn't seem it would fix anything there.
BUG/MINOR: checks: start sending the request right after connect()
Since the change of I/O direction, we must not wait for an empty connect
callback before sending the request, we must attempt to send it as soon
as possible so that we don't uselessly poll. This is what this patch
does. This reduces the total check duration by a complete poll loop
compared to what is described in issue #253.
BUG/MINOR: checks: stop polling for write when we have nothing left to send
Since the change of I/O direction, we perform the connect() call and
the send() call together from the top. But the send call must at least
disable polling for writes once it does not have anything left to send.
This bug is partially responsible for the waste of resources described
in issue #253.
CONTRIB: debug: add new program "poll" to test poll() events
This simple program prepares a TCP connection between two ends and
allows to perform various operations on them such as send, recv, poll,
shutdown, close, reset, etc. It takes care of remaining particularly
silent to help inspection via strace, though it can also be verbose
and report status, errno, and poll events. It delays acceptation of
the incoming server-side connection so that it's even possible to
test the poll status on a listener with a pending connection, or
to close the connection without accepting it and inspect the effect
on the client.
Actions are executed in the command line order as they are parsed,
they may be grouped using commas when they are performed on the same
socket.
Example showing a successful recv() of pending data before a pending error:
$ ./poll -v -l pol,acc,pol -c snd,shw -s pol,rcv,pol,rcv,pol,snd,lin,clo -c pol,rcv,pol,rcv,pol
#### BEGIN ####
cmd #1 stp #1: do_pol(3): ret=1 ev=0x1 (IN)
cmd #1 stp #2: do_acc(3): ret=5
cmd #1 stp #3: do_pol(3): ret=0 ev=0
cmd #2 stp #1: do_snd(4): ret=3
cmd #2 stp #2: do_shw(4): ret=0
cmd #3 stp #1: do_pol(5): ret=1 ev=0x2005 (IN OUT RDHUP)
cmd #3 stp #2: do_rcv(5): ret=3
cmd #3 stp #3: do_pol(5): ret=1 ev=0x2005 (IN OUT RDHUP)
cmd #3 stp #4: do_rcv(5): ret=0
cmd #3 stp #5: do_pol(5): ret=1 ev=0x2005 (IN OUT RDHUP)
cmd #3 stp #6: do_snd(5): ret=3
cmd #3 stp #7: do_lin(5): ret=0
cmd #3 stp #8: do_clo(5): ret=0
cmd #4 stp #1: do_pol(4): ret=1 ev=0x201d (IN OUT ERR HUP RDHUP)
cmd #4 stp #2: do_rcv(4): ret=3
cmd #4 stp #3: do_pol(4): ret=1 ev=0x201d (IN OUT ERR HUP RDHUP)
cmd #4 stp #4: do_rcv(4): ret=-1 (Connection reset by peer)
cmd #4 stp #5: do_pol(4): ret=1 ev=0x2015 (IN OUT HUP RDHUP)
#### END ####
MEDIUM: fd: simplify the fd_*_{recv,send} functions using BTS/BTR
Now that we don't have to update FD_EV_POLLED_* at the same time as
FD_EV_ACTIVE_*, we don't need to use a CAS anymore, a bit-test-and-set
operation is enough. Doing so reduces the code size by a bit more than
1 kB. One function was special, fd_done_recv(), whose comments and doc
were inaccurate for the part related to the lack of polling.
Since commit 7ac0e35f2 in 1.9-dev1 ("MAJOR: fd: compute the new fd polling
state out of the fd lock") we've started to update the FD POLLED bit a
bit more aggressively. Lately with the removal of the FD cache, this bit
is always equal to the ACTIVE bit. There's no point continuing to watch
it and update it anymore, all it does is create confusion and complicate
the code. One interesting side effect is that it now becomes visible that
all fd_*_{send,recv}() operations systematically call updt_fd_polling(),
except fd_cant_recv()/fd_cant_send() which never saw it change.
BUG/MEDIUM: cache: Don't cache objects if the size of headers is too big
HTTP responses with headers than impinge upon the reserve must not be
cached. Otherwise, there is no warranty to have enough space to add the header
"Age" when such cached responses are delivered.
This patch must be backported to 2.0 and 1.9. For these versions, the same must
be done for the legacy HTTP mode.
BUG/MEDIUM: cache: Properly copy headers splitted on several shctx blocks
In the cache, huge HTTP headers will use several shctx blocks. When a response
is returned from the cache, these headers must be properly copied in the
corresponding HTX message by updating the pointer where to copied a header
part.
BUG/MINOR: mux-h1: Don't stop anymore input processing when the max is reached
The loop is now stopped only when nothing else is consumed from the input buffer
or if a parsing error is encountered. This will let a chance to detect cases
when we fail to add the EOM.
For instance, when the max is reached after the headers parsing and all the
message is received. In this case, we may have the flag H1S_F_REOS set without
the flag H1S_F_APPEND_EOM and no pending input data, leading to an error because
we think it is an abort.
This patch must be backported to 2.0. This bug does not affect 1.9.
BUG/MINOR: h1: Properly reset h1m when parsing is restarted
Otherwise some processing may be performed twice. For instance, if the header
"Content-Length" is parsed on the first pass, when the parsing is restarted, we
skip it because we think another header with the same value was already seen. In
fact, it is currently the only existing bug that can be encountered. But it is
safer to reset all the h1m on restart to avoid any future bugs.
BUG/MINOR: http-ana: Reset response flags when 1xx messages are handled
Otherwise, the following final response could inherit of some of these
flags. For instance, because informational responses have no body, the flag
HTTP_MSGF_BODYLESS is set for 1xx messages. If it is not reset, this flag will
be kept for the final response.
One of visible effect of this bug concerns the HTTP compression. When the final
response is preceded by an 1xx message, the compression is not performed. This
was reported in github issue #229.
This patch must be backported to 2.0 and 1.9. Note that the file http_ana.c does
not exist for these branches, the patch must be applied on proto_htx.c instead.
BUILD: connection: silence gcc warning with extra parentheses
Commit 8a4ffa0a ("MINOR: send-proxy-v2: sends authority TLV according
to TLV received") is missing parentheses around a variable assignment
used as condition in an if statement, and gcc isn't happy about it.
This bug came with 015e4d7 commit: "MINOR: stick-tables: Add peers process
binding computing" where the "stick" rules cases were missing when computing
the peer local listener process binding. At parsing time we store in the
stick-table struct ->proxies_list the proxies which refer to this stick-table.
The process binding is computed after having parsed the entire configuration file
with this simple loop in cfgparse.c:
/* compute the required process bindings for the peers from <stktables_list>
* for all the stick-tables, the ones coming with "peers" sections included.
*/
for (t = stktables_list; t; t = t->next) {
struct proxy *p;
for (p = t->proxies_list; p; p = p->next_stkt_ref) {
if (t->peers.p && t->peers.p->peers_fe) {
t->peers.p->peers_fe->bind_proc |= p->bind_proc;
}
}
}
Note that if this process binding is not correctly initialized, the child forked
by the master-worker stops the peer local listener. Should be also the case
when daemonizing haproxy.
Emmanuel Hocdet [Thu, 29 Aug 2019 09:54:51 +0000 (11:54 +0200)]
MINOR: send-proxy-v2: sends authority TLV according to TLV received
Since patch "7185b789", the authority TLV in a PROXYv2 header from a
client connection is stored. Authority TLV sends in PROXYv2 should be
taken into account to allow chaining PROXYv2 without droping it.
Willy Tarreau [Fri, 30 Aug 2019 13:24:59 +0000 (15:24 +0200)]
MEDIUM: log: add support for logging to a ring buffer
Now by prefixing a log server with "ring@<name>" it's possible to send
the logs to a ring buffer. One nice thing is that it allows multiple
sessions to consult the logs in real time in parallel over the CLI, and
without requiring file system access. At the moment, ring0 is created as
a default sink for tracing purposes and is available. No option is
provided to create new rings though this is trivial to add to the global
section.
Willy Tarreau [Fri, 30 Aug 2019 12:18:44 +0000 (14:18 +0200)]
MINOR: log: add a target type instead of hacking the address family
Instead of detecting an AF_UNSPEC address family for a log server and
to deduce a file descriptor, let's create a target type field and
explicitly mention that the socket is of type FD.
Willy Tarreau [Fri, 30 Aug 2019 12:05:35 +0000 (14:05 +0200)]
MEDIUM: log: use the new generic fd_write_frag_line() function
When logging to a file descriptor, we'd rather use the unified
fd_write_frag_line() which uses the FD's lock than perform the
writev() ourselves and use a per-server lock, because if several
loggers point to the same output (e.g. stdout) they are still
not locked and their logs may interleave. The function above
instead relies on the fd's lock so this is safer and will even
protect against concurrent accesses from other areas (e.g traces).
The function also deals with the FD's non-blocking mode so we do
not have to keep specific code for this anymore in the logs.
Willy Tarreau [Fri, 30 Aug 2019 12:41:47 +0000 (14:41 +0200)]
MINOR: fd/log/sink: make the non-blocking initialization depend on the initialized bit
Logs and sinks were resorting to dirty hacks to initialize an FD to
non-blocking mode. Now we have a bit for this in the fd tab so we can
do it on the fly on first use of the file descriptor. Previously it was
set per log server by writing value 1 to the port, or during a sink
initialization regardless of the usage of the fd.
Willy Tarreau [Fri, 30 Aug 2019 12:36:10 +0000 (14:36 +0200)]
MINOR: fd: add a new "initialized" bit in the fdtab struct
The purpose is to be able to remember that initialization was already
done for a file descriptor. This will allow to get rid of some dirty
hacks performed in the logs or fd sinks where the init state of the
fd has to be guessed.
Willy Tarreau [Fri, 30 Aug 2019 12:33:11 +0000 (14:33 +0200)]
CLEANUP: fd: remove leftovers of the fdcache
The "cache" entry was still present in the fdtab struct and it was
reported in "show sess". Removing it broke the cache-line alignment
on 64-bit machines which is important for threads, so it was fixed
by adding an attribute(aligned()) when threads are in use. Doing it
only in this case allows 32-bit thread-less platforms to see the
struct fit into 32 bytes.
Willy Tarreau [Fri, 30 Aug 2019 13:02:22 +0000 (15:02 +0200)]
BUG/MEDIUM: mux-h2/trace: fix missing braces added with traces
Ilya reported in issue #242 that h2c_handle_priority() was having
unreachable code... Obviously, I missed the braces around the "if",
leaving an unconditional return.
Willy Tarreau [Fri, 30 Aug 2019 09:17:01 +0000 (11:17 +0200)]
MEDIUM: ring: implement a wait mode for watchers
Now it is possible for a reader to subscribe and wait for new events
sent to a ring buffer. When new events are written to a ring buffer,
the applets that are subscribed are woken up to display new events.
For now we only support this with the CLI applet called by "show events"
since the I/O handler is indeed a CLI I/O handler. But it's not
complicated to add other mechanisms to consume events and forward them
to external log servers for example. The wait mode is enabled by adding
"-w" after "show events <sink>". An extra "-n" was added to directly
seek to new events only.
Willy Tarreau [Fri, 30 Aug 2019 06:05:03 +0000 (08:05 +0200)]
MINOR: cli: extend the CLI context with a list and two offsets
Some CLI parsers are currently abusing the CLI context types such as
pointers to stuff longs into them by lack of room. But the context is
80 bytes while cli is only 48, thus there's some room left. This patch
adds a list element and two size_t usable as various offsets. The list
element is initialized.
Willy Tarreau [Fri, 30 Aug 2019 08:16:14 +0000 (10:16 +0200)]
BUG/MINOR: ring: fix the way watchers are counted
There are two problems with the way we count watchers on a ring:
- the test for >=255 was accidently kept to 1 used during QA
- if the producer deletes all data up to the reader's position
and the reader is called, cannot write, and is called again,
it will have a zero offset, causing it to initialize itself
multiple times and each time adding a new refcount.
Let's change this and instead use ~0 as the initial offset as it's
not possible to have it as a valid one. We now store it into the
CLI context's size_t o0 instead of casting it to a void*.
Willy Tarreau [Fri, 30 Aug 2019 05:28:24 +0000 (07:28 +0200)]
MINOR: mux-h2/trace: report h2s->id before h2c->dsi for the stream ID
h2c->dsi is only for demuxing, and needed while decoding a new request.
But if we already have a valid stream ID (e.g. response or outgoing
request), we should use it instead. This avoids seeing [0] in front of
the responses at user level.
Willy Tarreau [Fri, 30 Aug 2019 05:11:30 +0000 (07:11 +0200)]
MINOR: mux-h2/trace: only decode the start-line at verbosity other than "minimal"
This is as documented in "trace h2 verbosity", level "minimal" only
features flags and doesn't perform any decoding at all, "simple" does,
just like "clean" which is the default for end uesrs.
Willy Tarreau [Fri, 30 Aug 2019 05:21:18 +0000 (07:21 +0200)]
MINOR: mux-h2/trace: add a new verbosity level "clean"
The "clean" output will be suitable for user and proto-level output
where the internal stuff (state, pointers, etc) is not desired but
just the basic protocol elements.
Willy Tarreau [Fri, 30 Aug 2019 05:07:08 +0000 (07:07 +0200)]
MINOR: mux-h2: add functions to convert an h2c/h2s state to a string
We need this all the time in traces, let's have it now. For the sake
of compact outputs, the strings are all 3-chars long. The "show fd"
output was improved to make use of this.
Willy Tarreau [Mon, 12 Aug 2019 16:42:03 +0000 (18:42 +0200)]
MEDIUM: mux-h2/trace: add lots of traces all over the code
All functions of the h2 data path were updated to receive one or multiple
TRACE() calls, at least one pair of TRACE_ENTER()/TRACE_LEAVE(), and those
manipulating protocol elements have been improved to report frame types,
special state transitions or detected errors. Even with careful tests, no
performance impact was measured when traces are disabled.
They are not completely exploited yet, the callback function tries to
dump a lot about them, but still doesn't provide buffer dumps, nor does
it indicate the stream or connection error codes.
The first argument is always set to the connection when known. The second
argument is set to the h2s when known, sometimes a 3rd argument is set to
a buffer, generally the rxbuf or htx, and occasionally the 4th argument
points to an integer (number of bytes read/sent, error code).
Retrieving a 10kB object produces roughly 240 lines when at developer
level, 35 lines at data level, 27 at state level, and 10 at proto level
and 2 at user level.
For now the headers are not dumped, but the start line are emitted in
each direction at user level.
The patch is marked medium because it touches lots of places, though
it takes care not to change the execution path.
Willy Tarreau [Mon, 19 Aug 2019 15:56:27 +0000 (17:56 +0200)]
MINOR: mux-h2/trace: add the default decoding callback
The new function h2_trace() is called when relevant by the trace subsystem
in order to provide extra information about the trace being produced. It
can for example display the connection pointer, the stream pointer, etc.
It is declared in the trace source as the default callback as we expect
it to be versatile enough to enrich most traces.
In addition, for requests and responses, if we have a buffer and we can
decode it as an HTX buffer, we can extract user-friendly information from
the start line.
Willy Tarreau [Thu, 8 Aug 2019 16:23:12 +0000 (18:23 +0200)]
MINOR: mux-h2/trace: register a new trace source with its events
For now the traces are not used. Supported events are categorized by
where the activity comes from (h2c, h2s, stream, etc), a direction
(send/recv/wake), and a list of possibilities for each of them (frame
types, errors, shut, ...). This results in ~50 different events that
try to cover a lot of possibilities when it's needed to filter on
something specific. Special events like protocol error are handled.
A few aggregate events like "rx_frame" or "tx_frame" are planed to
cover all frame types at once by being placed all the time with any
of the other ones.
We also state that the first argument is always the connection. This way
the trace subsystem will be able to safely retrieve some useful info, and
we'll still be able to get the h2c from there (conn->ctx) in a pretty
print function. The second argument will always be an h2s, and in order
to propose it for tracking, we add its description. We also define 4
verbosity levels, which seems more than enough.
Willy Tarreau [Thu, 29 Aug 2019 06:24:16 +0000 (08:24 +0200)]
MINOR: trace: change the detail_level to per-source verbosity
The detail level initially based on syslog levels is not used, while
something related is missing, trace verbosity, to indicate whether or
not we want to call the decoding callback and what level of decoding
we want (raw captures etc). Let's change the field to "verbosity" for
this. A verbosity of zero means that the decoding callback is not
called, and all other levels are handled by this callback and are
source-specific. The source is now prompted to list the levels that
are proposed to the user. When the source doesn't define anything,
"quiet" and "default" are available.
Willy Tarreau [Thu, 29 Aug 2019 07:08:36 +0000 (09:08 +0200)]
MINOR: trace: also report the trace level in the output
It's very convenient to know the trace level in the output, at least to
grep/grep -v on it. The main usage is to filter/filter out the developer
traces at level DEVEL. For this we now add the numeric level (0 to 4) just
after the source name in the brackets. The output now looks like this :
Willy Tarreau [Thu, 29 Aug 2019 06:40:59 +0000 (08:40 +0200)]
MINOR: trace: prepend the function name for developer level traces
Working on adding traces to mux-h2 revealed that the function names are
manually copied a lot in developer traces. The reason is that they are
not preprocessor macros and as such cannot be concatenated. Let's
slightly adjust the trace() function call to take a function name just
after the file:line argument. This argument is only added for the
TRACE_DEVEL and 3 new TRACE_ENTER, TRACE_LEAVE, and TRACE_POINT macros
and left NULL for others. This way the function name is only reported
for traces aimed at the developers. The pretty-print callback was also
extended to benefit from this. This will also significantly shrink the
data segment as the "entering" and "leaving" strings will now be merged.
One technical point worth mentioning is that the function name is *not*
passed as an ist to the inline function because it's not considered as
a builtin constant by the compiler, and would lead to strlen() being
run on it from all call places before calling the inline function. Thus
instead we pass the const char * (that the compiler knows where to find)
and it's the __trace() function that converts it to an ist for internal
consumption and for the pretty-print callback. Doing this avoids losing
5-10% peak performance.
Willy Tarreau [Thu, 29 Aug 2019 06:01:48 +0000 (08:01 +0200)]
MINOR: trace: change the "payload" level to "data" and move it
The "payload" trace level was ambigous because its initial purpose was
to be able to dump received data. But it doesn't make sense to force to
report data transfers just to be able to report state changes. For
example, all snd_buf()/rcv_buf() operations coming from the application
layer should be tagged at this level. So here we move this payload level
above the state transitions and rename it to avoid the ambiguity making
one think it's only about request/response payload. Now it clearly is
about any data transfer and is thus just below the developer level. The
help messages on the CLI and the doc were slightly reworded to help
remove this ambiguity.
Willy Tarreau [Thu, 29 Aug 2019 07:32:21 +0000 (09:32 +0200)]
MINOR: tools: add a generic struct "name_desc" for name-description pairs
In prompts on the CLI we now commonly need to propose a keyword name
and a description and it doesn't make sense to define a new struct for
each such pairs. Let's simply have a generic "name_desc" for this.