OPTIM: hpack-huff: reduce the cache footprint of the huffman decoder
Some tables are currently used to decode bit blocks and lengths. We do
see such lookups in perf top. We have 4 512-byte tables and one 64-byte
one. Looking closer, the second half of the table (length) has so few
variations that most of the time it will be computed in a single "if",
and never more than 3. This alone allows to cut the tables in half. In
addition, one table (bits 15-11) is only 32-element long, while another
one (bits 11-4) starts at 0x60, so we can merge the two as they do not
overlap, and further save size. We're now down to 4 256-entries tables.
This is visible in h3 and h2 where the max request rate is slightly higher
(e.g. +1.6% for h2). The huff_dec() function got slightly larger but the
overall code size shrunk:
$ nm --size haproxy-before | grep huff_dec 000000000000029e T huff_dec
$ nm --size haproxy-after | grep huff_dec 0000000000000345 T huff_dec
$ size haproxy-before haproxy-after
text data bss dec hex filename 7591126 569268 276134810921742 a6a70e haproxy-before 7591082 568180 276134810920610 a6a2a2 haproxy-after
Miroslav Zagorac [Mon, 19 Sep 2022 10:18:53 +0000 (12:18 +0200)]
CLEANUP: httpclient: deleted unused variables
The locally defined static variables 'httpclient_srv_raw' and
'httpclient_srv_ssl' are not used anywhere in the source code,
except that they are set in the httpclient_precheck() function.
nb_hreq is a counter on qcc for active HTTP requests. It is incremented
for each qcs where a full HTTP request was received. It is decremented
when the stream is closed locally :
- on HTTP response fully transmitted
- on stream reset
A bug will occur if a stream is resetted without having processed a full
HTTP request. nb_hreq will be decremented whereas it was not
incremented. This will lead to a crash when building with
DEBUG_STRICT=2. If BUG_ON_HOT are not active, nb_hreq counter will wrap
which may break the timeout logic for the connection.
This bug was triggered on haproxy.org. It can be reproduced by
simulating the reception of a STOP_SENDING frame instead of a STREAM one
by patching qc_handle_strm_frm() :
To fix this bug, a qcs is now flagged with a new QC_SF_HREQ_RECV. This
is set when the full HTTP request is received. When the stream is closed
locally, nb_hreq will be decremented only if this flag was set.
Released version 2.7-dev6 with the following main changes :
- MINOR: Revert part of clarifying samples support per os commit
- BUILD: makefile: enable crypt(3) for NetBSD
- BUG/MINOR: quic: Retransmitted frames marked as acknowledged
- BUG/MINOR: quic: Possible crash with "tls-ticket-keys" on QUIC bind lines
- MINOR: http-check: Remove support for headers/body in "option httpchk" version
- BUG/MINOR: h1: Support headers case adjustment for TCP proxies
- BUG/MINOR: quic: Possible crash when verifying certificates
- BUILD: quic: add some ifdef around the SSL_ERROR_* for libressl
- BUILD: ssl: fix ssl_sock_switchtx_cbk when no client_hello_cb
- BUILD: quic: temporarly ignore chacha20_poly1305 for libressl
- BUILD: quic: enable early data only with >= openssl 1.1.1
- BUILD: ssl: fix the ifdef mess in ssl_sock_initial_ctx
- BUILD: quic: fix the #ifdef in ssl_quic_initial_ctx()
- MINOR: quic: add QUIC support when no client_hello_cb
- MINOR: quic: Add traces about sent or resent TX frames
- MINOR: quic: No TRACE_LEAVE() in retrieve_qc_conn_from_cid()
- BUG/MINOR: quic: Wrong connection ID to thread ID association
- BUG/MINOR: task: always reset a new tasklet's call date
- BUG/MINOR: task: make task_instant_wakeup() work on a task not a tasklet
- MINOR: task: permanently enable latency measurement on tasklets
- CLEANUP: task: rename ->call_date to ->wake_date
- BUG/MINOR: sched: properly account for the CPU time of dying tasks
- MINOR: sched: store the current profile entry in the thread context
- BUG/MINOR: stream/sched: take into account CPU profiling for the last call
- MINOR: tasks: do not keep cpu and latency times in struct task
- MINOR: tools: add generic pointer hashing functions
- CLEANUP: activity: make memprof use the generic ptr_hash() function
- CLEANUP: activity: make taskprof use ptr_hash()
- MINOR: debug: add struct ha_caller to describe a calling location
- CLEANUP: debug: use struct ha_caller for memstat
- DEBUG: task: define a series of wakeup types for tasks and tasklets
- DEBUG: task: use struct ha_caller instead of arrays of file:line
- DEBUG: applet: instrument appctx_wakeup() to log the caller's location
- DEBUG: task: simplify the caller recording in DEBUG_TASK
- CLEANUP: task: move tid and wake_date into the common part
- CLEANUP: sched: remove duplicate code in run_tasks_from_list()
- CLEANUP: activity: make the number of sched activity entries more configurable
- DEBUG: resolvers: unstatify process_resolvers() to make it appear in profiling
- DEBUG: quic: export the few task handlers that often appear in task dumps
- MEDIUM: tasks/activity: combine the called function with the caller
- MINOR: tasks/activity: improve the caller-callee activity hash
- MINOR: activity/cli: support aggregating task profiling outputs
- MINOR: activity/cli: support sorting task profiling by total CPU time
- BUG/MINOR: signals/poller: set the poller timeout to 0 when there are signals
- BUG/MINOR: quic: Speed up the handshake completion only one time
- BUG/MINOR: quic: Trace fix about packet number space information.
- BUG/MINOR: h3: Crash when h3 trace verbosity is "minimal"
- MINOR: h3: Add the quic_conn object to h3 traces
- MINOR: h3: Missing connection argument for a TRACE_LEAVE() argument
- MINOR: h3: Send the h3 settings with others streams (requests)
- MINOR: dev/udp: Apply the corruption to both directions
- BUILD: udp-perturb: Add a make target for udp-perturb tool
- BUG/MINOR: signals/poller: ensure wakeup from signals
- CI: cirrus-ci: bump FreeBSD image to 13-1
- DEV: flags: fix usage message to reflect available options
- DEV: flags: add missing CO_FL_FDLESS connection flag
- MINOR: flags: add a new file to host flag dumping macros
- MINOR: flags: implement a macro used to dump enums inside masks
- MINOR: flags/channel: use flag dumping for channel flags and analysers
- MINOR: flags/connection: use flag dumping for connection flags
- MINOR: flags/stconn: use flag dumping for stconn and sedesc flags
- MINOR: flags/stream: use flag dumping for stream error type
- MINOR: flags/stream: use flag dumping for stream flags
- MINOR: flags/task: use flag dumping for task state
- MINOR: flags/http_ana: use flag dumping for txn flags
- DEV: flags: remove the now unused SHOW_FLAG() definition
- DEV: flags: remove the now useless intermediary functions
- MINOR: flags/htx: use flag dumping to show htx and start-line flags
- MINOR: flags/http_ana: use flag dumping to show http msg states
- BUG/MEDIUM: proxy: ensure pause_proxy() and resume_proxy() own PROXY_LOCK
- MINOR: listener: small API change
- MINOR: proxy/listener: support for additional PAUSED state
- BUG/MINOR: stats: fixing stat shows disabled frontend status as 'OPEN'
- BUILD: flags: fix build warning in some macros used by show_flags
- BUILD: flags: fix the fallback macros for missing stdio
- CLEANUP: pollers: remove dead code in the polling loop
- BUG/MINOR: mux-h1: Increment open_streams counter when H1 stream is created
- REGTESTS: healthcheckmail: Relax matching on the healthcheck log message
- CLEANUP: listener: function comment typo in stop_listener()
- BUG/MINOR: listener: null pointer dereference suspected by coverity
- MINOR: flags/fd: decode FD flags states
- REORG: mux-h2: extract flags and enums into mux_h2-t.h
- MINOR: flags/mux-h2: decode H2C and H2S flags
- REGTESTS: log: test the log-forward feature
- BUG/MEDIUM: sink: bad init sequence on tcp sink from a ring.
- REGTESTS: ssl/log: test the log-forward with SSL
- MEDIUM: httpclient: httpclient_create_proxy() creates a proxy for httpclient
- MEDIUM: httpclient: allow to use another proxy
- DOC: fix TOC in starter guide for subsection 3.3.8. Statistics
- MINOR: httpclient: export httpclient_create_proxy()
- MEDIUM: quic: separate path for rx and tx with set_encryption_secrets
- BUG/MEDIUM: mux-quic: fix crash on early app-ops release
- REORG: mux-h1: extract flags and enums into mux_h1-t.h
- MINOR: flags/mux-h1: decode H1C and H1S flags
- CLEANUP: mux-quic: remove stconn usage in h3/hq
- BUG/MINOR: mux-quic: do not remotely close stream too early
- CLEANUP: exclude udp-perturb with .gitignore
- BUG/MEDIUM: server: segv when adding server with hostname from CLI
- CLEANUP: quic,ssl: fix tiny typos in C comments
- BUG/MEDIUM: captures: free() an error capture out of the proxy lock
- BUILD: fd: fix a build warning on the DWCAS
- MINOR: anon: add new macros and functions to anonymize contents
- MINOR: anon: store the anonymizing key in the global structure
- MINOR: anon: store the anonymizing key in the CLI's appctx
- MINOR: cli: anonymize commands 'show sess' and 'show sess all'
- MINOR: cli: anonymize 'show servers state' and 'show servers conn'
- MINOR: config: add command-line -dC to dump the configuration file
- SCRIPTS: announce-release: update some URLs to https
SCRIPTS: announce-release: update some URLs to https
Some components like Discourse were already redirecting to https. Other
ones like docs and git are covered by the certificate, and finally
switching the advertised scheme for www should increase the ratio of
H2 and H3 in the stats (resp 8.9 and 1.9%) and possibly help spot new
issues.
MINOR: config: add command-line -dC to dump the configuration file
This commit adds a new command line option -dC to dump the configuration
file. An optional key may be appended to -dC in order to produce an
anonymized dump using this key. The anonymizing process uses the same
algorithm as the CLI so that the same key will produce the same hashes
for the same identifiers. This way an admin may share an anonymized
extract of a configuration to match against live dumps. Note that key 0
will not anonymize the output. However, in any case, the configuration
is dumped after tokenizing, thus comments are lost.
MINOR: cli: anonymize 'show servers state' and 'show servers conn'
Modify proxy.c in order to anonymize the following confidential data on
commands 'show servers state' and 'show servers conn':
- proxy name
- server name
- server address
MINOR: cli: anonymize commands 'show sess' and 'show sess all'
Modify stream.c in order to hash the following confidential data if the
anonymized mode is enabled:
- configuration elements such as frontend/backend/server names
- IP addresses
MINOR: anon: store the anonymizing key in the CLI's appctx
In order to allow users to dump internal states using a specific key
without changing the global one, we're introducing a key in the CLI's
appctx. This key is preloaded from the global one when "set anon on"
is used (and if none exists, a random one is assigned). And the key
can optionally be assigned manually for the whole CLI session.
A "show anon" command was also added to show the anon state, and the
current key if the users has sufficient permissions. In addition, a
"debug dev hash" command was added to test the feature.
MINOR: anon: store the anonymizing key in the global structure
Add a uint32_t key in global to hash words with it. A new CLI command
'set global-key <key>' was added to change the global anonymizing key.
The global may also be set in the configuration using the global
"anonkey" directive. For now this key is not used.
MINOR: anon: add new macros and functions to anonymize contents
These macros and functions will be used to anonymize strings by producing
a short hash. This will allow to match config elements against dump elements
without revealing the original data. This will later be used to anonymize
configuration parts and CLI commands output. For now only string, identifiers
and addresses are supported, but the model is easily extensible.
Ilya reported in issue #1816 a build warning on armhf (promoted to error
here since -Werror):
src/fd.c: In function fd_rm_from_fd_list:
src/fd.c:209:87: error: passing argument 3 of __ha_cas_dw discards volatile qualifier from pointer target type [-Werror=discarded-array-qualifiers]
209 | unlikely(!_HA_ATOMIC_DWCAS(((long *)&fdtab[fd].update), (uint32_t *)&cur_list.u32, &next_list.u32))
| ^~~~~~~~~~~~~~
This happens only on such an architecture because the DWCAS requires the
pointer not the value, and gcc seems to be needlessly picky about reading
a const from a volatile! This may safely be backported to older versions.
BUG/MEDIUM: captures: free() an error capture out of the proxy lock
Ed Hein reported in github issue #1856 some occasional watchdog panics
in 2.4.18 showing extreme contention on the proxy's lock while the libc
was in malloc()/free(). One cause of this problem is that we call free()
under the proxy's lock in proxy_capture_error(), which makes no sense
since if we can free the object under the lock after it's been detached,
we can also free it after releasing the lock (since it's not referenced
anymore).
This should be backported to all relevant versions, likely all
supported ones.
BUG/MEDIUM: server: segv when adding server with hostname from CLI
When calling 'add server' with a hostname from the cli (runtime),
str2sa_range() does not resolve hostname because it is purposely
called without PA_O_RESOLVE flag.
This leads to 'srv->addr_node.key' being NULL. According to Willy it
is fine behavior, as long as we handle it properly, and is already
handled like this in srv_set_addr_desc().
This patch fixes GH #1865 by adding an extra check before inserting
'srv->addr_node' into 'be->used_server_addr'. Insertion and removal
will be skipped if 'addr_node.key' is NULL.
udp-perturb is a tool which can be used as a UDP gateway. It can be used
to reorder, remove or corrupt datagrams. It is compiled in dev/
directory and added to .gitignore to not clutter git status output.
BUG/MINOR: mux-quic: do not remotely close stream too early
A stream is considered as remotely closed once we have received all the
data with the FIN bit set.
The condition to close the stream was wrong. In particular, if we
receive an empty STREAM frame with FIN bit set, this would have close
the stream even if we do not have yet received all the data. The
condition is now adjusted to ensure that Rx buffer contains all the data
up to the stream final size.
In most cases, this bug is harmless. However, if compiled with
DEBUG_STRICT=2, a BUG_ON_HOT crash would have been triggered if close is
done too early. This was most notably the case sometimes on interop test
suite with quinn or kwik clients. This can also be artificially
reproduced by simulating reception of an empty STREAM frame with FIN bit
set in qc_handle_strm_frm() :
Small cleanup on snd_buf for application protocol layer.
* do not export h3_snd_buf
* replace stconn by a qcs argument. This is better as h3/hq-interop only
uses the qcs instance.
REORG: mux-h1: extract flags and enums into mux_h1-t.h
The same was performed for the H2 multiplexer. H1C and H1S flags are moved
in a dedicated header file. It will be mainly used to be able to decode
mux-h1 flags from the flags utility.
In this patch, we only move the flags to mux_h1-t.h.
BUG/MEDIUM: mux-quic: fix crash on early app-ops release
H3 SETTINGS emission has recently been delayed. The idea is to send it
with the first STREAM to reduce sendto syscall invocation. This was
implemented in the following patch : 3dd79d378c86b3ebf60e029f518add5f1ed54815
MINOR: h3: Send the h3 settings with others streams (requests)
This patch works fine under nominal conditions. However, it will cause a
crash if a HTTP/3 connection is released before having sent any data,
for example when receiving an invalid first request. In this case,
qc_release will first free qcc.app_ops HTTP/3 application protocol layer
via release callback. Then qc_send is called to emit any closing frames
built by app_ops release invocation. However, in qc_send, as no data has
been sent, it will try to complete application layer protocol
intialization, with a SETTINGS emission for HTTP/3. Thus, qcc.app_ops is
reused, which is invalid as it has been just freed. This will cause a
crash with h3_finalize in the call stack.
This bug can be reproduced artificially by generating incomplete HTTP/3
requests. This will in time trigger http-request timeout without any
data send. This is done by editing qc_handle_strm_frm function.
- ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len,
+ ret = qcc_recv(qc->qcc, strm_frm->id, strm_frm->len - 1,
strm_frm->offset.key, strm_frm->fin,
(char *)strm_frm->data);
To fix this, application layer closing API has been adjusted to be done
in two-steps. A new shutdown callback is implemented : it is used by the
HTTP/3 layer to generate GOAWAY frame in qc_release prologue.
Application layer context qcc.app_ops is then freed later in qc_release
via the release operation which is now only used to liberate app layer
ressources. This fixes the problem as the intermediary qc_send
invocation will be able to reuse app_ops before it is freed.
This patch fixes the crash, but it would be better to adjust H3 SETTINGS
emission in case of early connection closing : in this case, there is no
need to send it. This should be implemented in a future patch.
This should fix the crash recently experienced by Tristan in github
issue #1801.
MEDIUM: quic: separate path for rx and tx with set_encryption_secrets
With quicTLS the set_encruption_secrets callback is always called with
the read_secret and the write_secret.
However this is not the case with libreSSL, which uses the
set_read_secret()/set_write_secret() mecanism. It still provides the
set_encryption_secrets() callback, which is called with a NULL
parameter for the write_secret during the read, and for the read_secret
during the write.
The exchange key was not designed in haproxy to be called separately for
read and write, so this patch allow calls with read or write key to
NULL.
DOC: fix TOC in starter guide for subsection 3.3.8. Statistics
This subsection has been moved from 3.4.9 to 3.3.8 somewhere along
2.4, but the TOC has not been updated - resulting in a invalid
anchor in the HTML version.
MEDIUM: httpclient: httpclient_create_proxy() creates a proxy for httpclient
httpclient_create_proxy() is a function which creates a proxy that could
be used for the httpclient. It will allocate a proxy, a raw server and
an ssl server.
This patch moves most of the code from httpclient_precheck() into a
generic function httpclient_create_proxy().
The proxy will have the PR_CAP_HTTPCLIENT capability.
This could be used for specifics httpclient instances that needs
different proxy settings.
BUG/MEDIUM: sink: bad init sequence on tcp sink from a ring.
The init of tcp sink, particularly for SSL, was done
too early in the code, during parsing, and this can cause
a crash specially if nbthread was not configured.
This was detected by William using ASAN on a new regtest
on log forward.
This patch adds the 'struct proxy' created for a sink
to a list and this list is now submitted to the same init
code than the main proxies list or the log_forward's proxies
list. Doing this, we are assured to use the right init sequence.
It also removes the ini code for ssl from post section parsing.
This patch should be backported as far as v2.2
Note: this fix uses 'goto' labels created by commit
'BUG/MAJOR: log-forward: Fix log-forward proxies not fully initialized'
but this code didn't exist before v2.3 so this patch needs to be
adapted for v2.2.
REORG: mux-h2: extract flags and enums into mux_h2-t.h
Originally in 1.8 we wanted to have an independent mux that could possibly
be disabled and would not impose dependencies on the outside. Everything
would fit into a single C file and that was fine.
Nowadays muxes are unavoidable, and not being able to easily inspect them
from outside is sometimes a bit of a pain. In particular, the flags utility
still cannot be used to decode their flags.
As a first step towards this, this patch moves the flags and enums to
mux_h2-t.h, as well as the two state decoding inline functions. It also
dropped the H2_SS_*_BIT defines that nobody uses. The mux_h2.c file remains
the only one to include that for now.
BUG/MINOR: listener: null pointer dereference suspected by coverity
Please refer to GH #1859 for more info.
Coverity suspected improper proxy pointer handling.
Without the fix it is considered safe for the moment, but it might not
be the case in the future as we want to keep the ability to have
isolated listeners.
Making sure stop_listener(), pause_listener(), resume_listener()
and listener_release() functions make proper use
of px pointer in that context.
No need for backport except if multi-connection protocols (ie:FTP)
were to be backported as well.
REGTESTS: healthcheckmail: Relax matching on the healthcheck log message
Depending on the timing, the conneciton on lisrv listener may be fully
accepted before any reject. Thus, instead of getting a socket error, an
invalid L7 response is reported. There is no reason to be strick on the
error type. Any failure is good here, because we just want to test the
email-alert feature.
This patch should fix issue #1857. It may be backported as far as 2.2.
BUG/MINOR: mux-h1: Increment open_streams counter when H1 stream is created
Since this counter was added, it was incremented at the wrong place for
client streams. It was incremented when the stream-connector (formely the
conn-stream) was created while it should be done when the H1 stream is
created. Thus, on parsing error, on H1>H2 upgrades or TCP>H1 upgrades, the
counter is not incremented. However, it is always decremented when the H1
stream is destroyed.
CLEANUP: pollers: remove dead code in the polling loop
As reported by Ilya and Coverity in issue #1858, since recent commit eea152ee6 ("BUG/MINOR: signals/poller: ensure wakeup from signals")
which removed the test for the global signal flag from the pollers'
loop, the remaining "wake" flag doesn't need to be tested since it
already participates to zeroing the wait_time and will be caught
on the previous line.
BUILD: flags: fix the fallback macros for missing stdio
The fallback macros for when stdio is not there didn't have the "..."
and were causing build issues on platforms with stricter dependencies
between includes.
BUG/MINOR: stats: fixing stat shows disabled frontend status as 'OPEN'
This patch adresses the issue #1626.
Adding support for PR_FL_PAUSED flag in the function stats_fill_fe_stats().
The command 'show stat' now properly reports a disabled frontend
using "PAUSED" state label.
This patch depends on the following commits:
- 7d00077fd5 "BUG/MEDIUM: proxy: ensure pause_proxy()
and resume_proxy() own PROXY_LOCK".
- 001328873c "MINOR: listener: small API change"
- d46f437de6 "MINOR: proxy/listener: support for additional PAUSED state"
MINOR: proxy/listener: support for additional PAUSED state
This patch is a prerequisite for #1626.
Adding PAUSED state to the list of available proxy states.
The flag is set when the proxy is paused at runtime (pause_listener()).
It is cleared when the proxy is resumed (resume_listener()).
BUG/MEDIUM: proxy: ensure pause_proxy() and resume_proxy() own PROXY_LOCK
There was a race involving hlua_proxy_* functions
and some proxy management functions.
pause_proxy() and resume_proxy() can be used directly from lua code,
but that could lead to some race as lua code didn't make sure PROXY_LOCK
was owned before calling the proxy functions.
This patch makes sure it won't happen again elsewhere in the code
by locking PROXY_LOCK directly in resume and pause proxy functions
so that it's not the caller's responsibility anymore.
(based on stop_proxy() behavior that was already safe prior to the patch)
This should be backported to stable series.
Note that the API will likely differ < 2.4
DEV: flags: remove the now useless intermediary functions
There's no more point keeping functions that are just wrappers around
other ones, let's directly call them from the main entry point. It helps
visually control the mapping between output formats and their definition
and doesn't require to invent long names. For a bit more readability, the
tmpbuf and its size adopted slightly shorter names.
MINOR: flags/stream: use flag dumping for stream error type
The new function is strm_et_show_flags(). Only the error type is
handled at the moment, as a bit more complex logic is needed to
mix the values and enums present in some fields.
MINOR: flags/stconn: use flag dumping for stconn and sedesc flags
The two new functions are se_show_flags() and sc_show_flags().
Maybe something could be done for SC_ST_* values but as it's a
small enum, a simple switch/case should work fine.
MINOR: flags/channel: use flag dumping for channel flags and analysers
The two new functions are chn_show_analysers() and chn_show_flags().
They work on an existing buffer so one was declared in flags.c for this
purpose. File flags.c does not have to know about channel flags anymore.
MINOR: flags: implement a macro used to dump enums inside masks
Some of our flags have enums inside a mask. The new macro __APPEND_ENUM
is able to deal with that by comparing the flag's value against an exact
one under the mask. One needs to take care of eliminating the zero value
though, otherwise delimiters will not always be properly placed (e.g. if
some flags were dumped before and what remains is exactly zero). The
bits of the mask are cleared only upon exact matches.
MINOR: flags: add a new file to host flag dumping macros
The "flags" utility is useful but painful to maintain up to date. This
commit aims at providing a low-maintenance solution to keep flags up to
date, by proposing some macros that build a string from a set of flags
in a way that requires the least possible verbosity.
The idea will be to add an inline function dedicated to this just after
the flags declaration, and enumerate the flags one is interested in, and
that function will fill a string based on them.
Placing this inside the type files allows both haproxy and external tools
like "flags" to use it, but comes with a few constraints. First, the
files will be slightly less readable if these functions are huge, so they
need to stay as compact as possible. Second, the function will need
anprintf() and we don't want to include stdio.h in type files as it
proved to be particularly heavy and to cause definition headaches in
the past.
As such the file here only contains a macro enclosed in #ifdef EOF (that
is defined in stdio), and provides an alternate empty one when no stdio
is defined. This way it's the caller that has to include stdio first or
it won't get anything back, and in practice the locations relying on
this always have it.
The macro has to be used in 3 steps:
- prologue: dumps 0 and exits if the value is zero
- flags: the macro can be recursively called and it will push the
flag from bottom to top so that they appear in the same order as
today without requiring to be declared the other way around
- epilogue: dump remaining flags that were not identified
The macro was arranged so that a single character can be used with no
other argument to declare all flags at once. Example:
#define _(n, ...) __APPEND_FLAG(buf, len, del, flg, n, #n, __VA_ARGS__)
_(0);
_(X_FLAG1, _(X_FLAG2, _(X_FLAG3, _(X_FLAG4))));
_(~0);
#undef _
Existing files will have to be updated to rely on it, and more files
could come soon.
DEV: flags: add missing CO_FL_FDLESS connection flag
This was added in 2.6 by commit c78a9698e ("MINOR: connection: add a new
flag CO_FL_FDLESS on fd-less connections") but forgotten in flags.c.
This must be backported to 2.6.
DEV: flags: fix usage message to reflect available options
The proposed decoding options were not updated after the changes in 2.6,
let's fix that by taking the names from the existing declaration. This
should be backported to 2.6.
BUG/MINOR: signals/poller: ensure wakeup from signals
Add self-wake in signal_handler() to fix a race condition with a signal
coming in between checking signal_queue_len and entering polling sleep.
The changes in commit 43c891dda ("BUG/MINOR: signals/poller: set the
poller timeout to 0 when there are signals") were insufficient.
Move the signal_queue_len check from the poll implementations to
run_poll_loop() to keep that logic in one place.
The poll loops are terminated either by the parameter wake being set or
wake up due to a write to their poller_wr_pipe by wake_thread() in
signal_handler().
MINOR: dev/udp: Apply the corruption to both directions
Harden the UDP datagram corruption applying it on both sides. This approaches
the conditions of some tests run by the QUIC interop runner developed by
Marten Seeman.
MINOR: h3: Send the h3 settings with others streams (requests)
This is the ->finalize application callback which prepares the unidirectional STREAM
frames for h3 settings and wakeup the mux I/O handler to send them. As haproxy is
at the same time always waiting for the client request, this makes haproxy
call sendto() to send only about 20 bytes of stream data. Furthermore in case
of heavy loss, this give less chances to short h3 requests to succeed.
Drawback: as at this time the mux sends its streams by their IDs ascending order
the stream 0 is always embedded before the unidirectional stream 3 for h3 settings.
Nevertheless, as these settings may be lost and received after other h3 request
streams, this is permitted by the RFC.
Perhaps there is a better way to do. This will have to be checked with Amaury.
BUG/MINOR: h3: Crash when h3 trace verbosity is "minimal"
This was due to a missing check in h3_trace() about the first argument
presence (connection) and h3_parse_settings_frm() which calls TRACE_LEAVE()
without any argument. Then this argument was dereferenced.
BUG/MINOR: quic: Trace fix about packet number space information.
<qc> variable was confused with <qel>. The consequence was that it was
always the same packet number space which was displayed: the first one (or
the Initial packet number space).
BUG/MINOR: quic: Speed up the handshake completion only one time
It is possible to speed up the handshake completion but only one time
by connection as mentionned in RFC 9002 "6.2.3. Speeding up Handshake Completion".
Add a flag to prevent this process to be run several times
(see https://www.rfc-editor.org/rfc/rfc9002#name-speeding-up-handshake-compl).
BUG/MINOR: signals/poller: set the poller timeout to 0 when there are signals
When receiving a signal before entering the poller, and without any
activity in the process, the poller will be entered with a timeout
calculated without checking the signals.
Since commit 4f59d3 ("MINOR: time: increase the minimum wakeup interval
to 60s") the issue is much more visible because it could be stuck for
60s.
When in mworker mode, if a worker quits and the SIGCHLD signal deliver
at the right time to the master, this one could be stuck for the time of
the timeout.
MINOR: activity/cli: support sorting task profiling by total CPU time
The new "bytime" sorting criterion uses the reported CPU time instead of
the usage. This is convenient to spot tasks that are mostly reponsible
for the CPU usage in a running process. It supports both the detailed
and the aggregated format. The output looks like this:
MINOR: activity/cli: support aggregating task profiling outputs
By default we now dump stats between caller and callee, but by
specifying "aggr" on the command line, stats get aggregated by
callee again as it used to be before the feature was available.
It may sometimes be helpful when comparing total call counts,
though that's about all.
MINOR: tasks/activity: improve the caller-callee activity hash
The previous dump already showed that the "other" category was getting
a few entries. Let's proceed like for the memory profiling, by scanning
a limited range of adjacent slots to find a spare one (16 max). That's
pretty fast since close and likely prefetched and the comparison is
cheap. The new dump now shows up to 45 entries below without "other":
MEDIUM: tasks/activity: combine the called function with the caller
Now instead of getting aggregate stats per called function, we have
them per function AND per call place. The "byaddr" sort considers
the function pointer first, then the call count, so that dominant
callers of a given callee are instantly spotted. This allows to get
sorted outputs like this:
It already becomes visible that some tasks have different very costs
depending where they're called (e.g. process_stream). The method used
to wake them up is also shown. Applets are handled specially and shown
as appctx_wakeup.
CLEANUP: activity: make the number of sched activity entries more configurable
This removes all the hard-coded 8-bit and 256 entries to use a pair of
macros instead so that we can more easily experiment with larger table
sizes if needed.
CLEANUP: sched: remove duplicate code in run_tasks_from_list()
Now that ->wake_date is common to tasks and tasklets, we don't need
anymore to carry a duplicate control block to read and update it for
tasks and tasklets. And given that this code was present early in the
if/else fork between tasks and tasklets, taking it out of the block
allows to move the task part into a more visible "else" branch that
also allows to factor the epilogue that resets th_ctx->current and
updates profile_entry->cpu_time, which also used to be duplicated.
Overall, doing just that saved 253 bytes in the function, or ~1/6,
which is not bad considering that it's on a hot path. And the code
got much ore readable.
CLEANUP: task: move tid and wake_date into the common part
There used to be one tid for tasklets and a thread_mask for tasks. Since
2.7, both tasks and tasklets now use a tid (albeit with a very slight
semantic difference for the negative value), to in order to limit code
duplication and to ease debugging it makes sense to move tid into the
common part. One limitation is that it will leave a hole in the structure,
but we now have the wake_date that is always present and can move there as
well to plug the hole.
This results in something overall pretty clean (and cleaner than before),
with the low-level stuff (state,tid,process,context) appearing first, then
the caller stuff (caller,wake_date,calls,debug) next, and finally the
type-specific stuff (rq/wq/expire/nice).
DEBUG: task: simplify the caller recording in DEBUG_TASK
Instead of storing an index that's swapped at every call, let's use the
two pointers as a shifting history. Now we have a permanent "caller"
field that records the last caller, and an optional prev_caller in the
debug section enabled by DEBUG_TASK that keeps a copy of the previous
caller one. This way, not only it's much easier to follow what's
happening during debugging, but it saves 8 bytes in the struct task in
debug mode and still keeps it under 2 cache lines in nominal mode, and
this will finally be usable everywhere and later in profiling.
The caller_idx was also used as a hint that the entry was freed, in order
to detect wakeup-after-free. This was changed by setting caller to -1
instead and preserving its value in caller[1].
Finally, the operations were made atomic. That's not critical but since
it's used for debugging and race conditions represent a significant part
of the issues in multi-threaded mode, it seems wise to at least eliminate
some possible factors of faulty analysis.
DEBUG: applet: instrument appctx_wakeup() to log the caller's location
appctx_wakeup() relies on task_wakeup(), but since it calls it from a
function, the calling place is always appctx_wakeup() itself, which is
not very useful.
Let's turn it to a macro so that we can log the location of the caller
instead. As an example, the cli_io_handler() which used to be seen as
this:
(gdb) p *appctx->t.debug.caller[0]
$10 = {
func = 0x9ffb78 <__func__.37996> "appctx_wakeup",
file = 0x9b336a "include/haproxy/applet.h",
line = 110,
what = 1 '\001',
arg8 = 0 '\000',
arg32 = 0
}
Now shows the more useful:
(gdb) p *appctx->t.debug.caller[0]
$6 = {
func = 0x9ffe80 <__func__.38641> "sc_app_chk_snd_applet",
file = 0xa00320 "src/stconn.c",
line = 996,
what = 6 '\006',
arg8 = 0 '\000',
arg32 = 0
}
DEBUG: task: use struct ha_caller instead of arrays of file:line
This reduces the task struct by 8 bytes, reduces the code size a little
bit by simplifying the calling convention (one argument dropped), and
as a bonus provides the function name in the caller.
The memstats code currently defines its own file/function/line number,
type and extra pointer. We don't need to keep them separate and we can
easily replace them all with just a struct ha_caller. Note that the
extra pointer could be converted to a pool ID stored into arg8 or
arg32 and be dropped as well, but this would first require to define
IDs for pools (which we currently do not have).
MINOR: debug: add struct ha_caller to describe a calling location
The purpose of this structure is to assemble all constant parts of a
generic calling point for a specific event. These ones are created by
the compiler as a static const element outside of the code path, so
they cost nothing in terms of CPU, and a pointer to that descriptor
can be passed to the place that needs it. This is very similar to what
is being done for the mem_stat stuff.
This will be useful to simplify and improve DEBUG_TASK.
There are a few places where it's convenient to hash a pointer to compute
a statistics bucket. Here we're basically reusing the hash that was used
by memory profiling with a minor update that the multiplier was corrected
to be prime and stand by its promise to have equal numbers of 1 and 0,
and that 32-bit platforms won't lose range anymore.
MINOR: tasks: do not keep cpu and latency times in struct task
It was a mistake to put these two fields in the struct task. This
was added in 1.9 via commit 9efd7456e ("MEDIUM: tasks: collect per-task
CPU time and latency"). These fields are used solely by streams in
order to report the measurements via the lat_ns* and cpu_ns* sample
fetch functions when task profiling is enabled. For the rest of the
tasks, this is pure CPU waste when profiling is enabled, and memory
waste 100% of the time, as the point where these latencies and usages
are measured is in the profiling array.
Let's move the fields to the stream instead, and have process_stream()
retrieve the relevant info from the thread's context.
The struct task is now back to 120 bytes, i.e. almost two cache lines,
with 32 bit still available.
BUG/MINOR: stream/sched: take into account CPU profiling for the last call
When task profiling is enabled, the reported CPU time for short requests
and responses (e.g. redirect) is always zero in the logs, because
process_stream() is only called once and the CPU time is measured after
it returns. This is particuarly annoying when dealing with denies and in
general anything that deals with parasitic traffic because it can be
difficult to figure where the CPU is spent.
The solution taken in this patch consists in having process_stream()
update the cpu time itself before logging and quitting. It's very simple.
It will not take into account the time taken to produce the log nor
freeing the stream, but that's marginal compared to always logging zero.
The task's wake_date is also reset so that the scheduler doesn't have to
perform these operations again. This is dependent on the following patch:
MINOR: sched: store the current profile entry in the thread context
It should be backported to 2.6 as it does help for troubleshooting.
MINOR: sched: store the current profile entry in the thread context
The profile entry that corresponds to the current task/tasklet being
profiled is now stored into the thread's context. This will allow it
to be accessed from the tasks themselves. This is needed for an upcoming
fix.
BUG/MINOR: sched: properly account for the CPU time of dying tasks
When task profiling is enabled, the scheduler can measure and report
the cumulated time spent in each task and their respective latencies. But
this was wrong for tasks with few wakeups as well as for self-waking ones,
because the call date needed to measure how long it takes to process the
task is retrieved in the task itself (->wake_date was turned to the call
date), and we could face two conditions:
- a new wakeup while the task is executing would reset the ->wake_date
field before returning and make abnormally low values being reported;
that was likely the case for taskèrun_applet for self-waking applets;
- when the task dies, NULL is returned and the call date couldn't be
retrieved, so that CPU time was not being accounted for. This was
particularly visible with process_stream() which is usually called
only twice per request, and whose time was systematically halved.
The cleanest solution here is to keep in mind that the scheduler already
uses quite a bit of local context in th_ctx, and place the intermediary
values there so that they cannot vanish. The wake_date has to be reset
immediately once read, and only its copy is used along the function. Note
that this must be done both for tasks and tasklet, and that until recently
tasklets were also able to report wrong values due to their sole dependency
on TH_FL_TASK_PROFILING between tests.
One nice benefit for future improvements is that such information will now
be available from the task without having to be stored into the task itself
anymore.
Since the tasklet part was computed on wrapping 32-bit arithmetics and
the task one was on 64-bit, the values were now consistently moved to
32-bit as it's already largely sufficient (4s spent in a task is more
than twice what the watchdog would tolerate). Some further cleanups might
be necessary, but the patch aimed at staying minimal.
Task profiling output after 1 million HTTP request previously looked like
this:
This feature is essentially used for debugging so it has extremely limited
impact. However it's used quite a bit more in bug reports and it would be
desirable that at least 2.6 gets this fix backported. It depends on at least
these two previous patches which will then also have to be backported:
MINOR: task: permanently enable latency measurement on tasklets
CLEANUP: task: rename ->call_date to ->wake_date
This field is misnamed because its real and important content is the
date the task was woken up, not the date it was called. It temporarily
holds the call date during execution but this remains confusing. In
fact before the latency measurements were possible it was indeed a call
date. Thus is will now be called wake_date.
This change is necessary because a subsequent fix will require the
introduction of the real call date in the thread ctx.
MINOR: task: permanently enable latency measurement on tasklets
When tasklet latency measurement was enabled in 2.4 with commit b2285de04
("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set"),
the feature was conditionned on DEBUG_TASK because the field would add 8
bytes to the struct tasklet.
This approach was not a very good idea because the struct ends on an int
anyway thus it does finish with a 32-bit hole regardless of the presence
of this field. What is true however is that adding it turned a 64-byte
struct to 72-byte when caller debugging is enabled.
This patch revisits this with a minor change. Now only the lowest 32
bits of the call date are stored, so they always fit in the remaining
hole, and this allows to remove the dependency on DEBUG_TASK. With
debugging off, we're now seeing a 48-byte struct, and with debugging
on it's exactly 64 bytes, thus still exactly one cache line. 32 bits
allow a latency of 4 seconds on a tasklet, which already indicates a
completely dead process, so there's no point storing the upper bits at
all. And even in the event it would happen once in a while, the lost
upper bits do not really add any value to the debug reports. Also, now
one tasklet wakeup every 4 billion will not be sampled due to the test
on the value itself. Similarly we just don't care, it's statistics and
the measurements are not 9-digit accurate anyway.
BUG/MINOR: task: make task_instant_wakeup() work on a task not a tasklet
There's a subtle (harmless) bug in task_instant_wakeup(). As it uses
some tasklet code instead of some task code, the debug part also acts
on the tasklet equivalent, and the call_date is only set when DEBUG_TASK
is set instead of inconditionally like with tasks. As such, without this
debugging macro, call dates are not updated for tasks woken this way.
There isn't any impact yet because this function was introduced in 2.6 to
solve certain classes of issues and is not used yet, and in the worst case
it would only affect the reported latency time.
This may be backported to 2.6 in case a future fix would depend on it but
currently will not fix existing code.
BUG/MINOR: task: always reset a new tasklet's call date
The tasklet's call date was not reset, so if profiling was enabled while
some tasklets were in the run queue, their initial random value could be
used to preload a bogus initial latency value into the task profiling bin.
Let's just zero the initial value.
This should be backported to 2.4 as it was brought with initial commit b2285de04 ("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK
is set"). The impact is very low though.
BUG/MINOR: quic: Wrong connection ID to thread ID association
To work, quic_pin_cid_to_tid() must set cid[0] to a value with <target_id>
as <global.nbthread> modulo. For each integer n, (n - (n % m)) + d has always
d as modulo m (with d < m).
except when n wraps or when another modulo is applied to the addition result.
Here, for 8bit modulo arithmetic, if m does not divides 256, this cannot
works for values which wraps when we increment them by d.
For instance n=255 m=3 and d=1 the formula result is 0 (should be d).
To fix this, we first limit c[0] to 255 - <target_id> to prevent c[0] from wrapping.
Thank you to @esb for having reported this issue in GH #1855.
MINOR: quic: add QUIC support when no client_hello_cb
Add QUIC support to the ssl_sock_switchctx_cbk() variant used only when
no client_hello_cb is available.
This could be used with libreSSL implementation of QUIC for example.
It also works with quictls when HAVE_SSL_CLIENT_HELLO_CB is removed from
openss-compat.h