Willy Tarreau [Thu, 11 May 2023 10:02:21 +0000 (12:02 +0200)]
MINOR: stats: report the total number of warnings issued
Now in "show info" we have a TotalWarnings field that reports the total
number of warnings issued since the process started. It's also reported
in the the stats page next to the uptime.
Willy Tarreau [Thu, 11 May 2023 09:33:35 +0000 (11:33 +0200)]
DEBUG: list: add DEBUG_LIST to purposely corrupt list heads after delete
LIST_DELETE doesn't affect the previous pointers of the stored element.
This can sometimes hide bugs when such a pointer is reused by accident
in a LIST_NEXT() or equivalent after having been detached for example, or
ia another LIST_DELETE is performed again, something that LIST_DEL_INIT()
is immune to. By compiling with -DDEBUG_LIST, we'll replace a freshly
detached list element with two invalid pointers that will cause a crash
in case of accidental misuse. It's not enabled by default.
BUG/MINOR: quic: Buggy acknowlegments of acknowlegments function
qc_treat_ack_of_ack() must remove ranges of acknowlegments from an ebtree which
have been acknowledged. This is done keeping track of the largest acknowledged
packet number which has been acknowledged and sent with an ack-eliciting packet.
But due to the data structure of the acknowledgement ranges used to build an ACK frame,
one must leave at least one range in such an ebtree which must at least contain
a unique one-element range with the largest acknowledged packet number as element.
This issue was revealed by @Tristan971 in GH #2140.
queue:pop_wait() was broken during late refactor prior to merge.
(Due to small modifications to ensure that pop() returns nil on empty
queue instead of nothing)
Because of this, pop_wait() currently behaves exactly as pop(), resulting
in 100% active CPU when used in a while loop.
Indeed, _hlua_queue_pop() should explicitly return 0 when the queue is
empty since pop_wait logic relies on this and the pushnil should be
handled directly in queue:pop() function instead.
BUG/MEDIUM: filters: Don't deinit filters for disabled proxies during startup
During the startup stage, if a proxy was disabled in config, all filters
were released and removed. But it may be an issue if some info are shared
between filters of the same type. Resources may be released too early.
It happens with ACLs defined in SPOE configurations. Pattern expressions can
be shared between filters. To fix the issue, filters for disabled proxies
are no longer released during the startup stage but only when HAProxy is
stopped.
This commit depends on the previous one ("MINOR: spoe: Don't stop disabled
proxies"). Both must be backported to all stable versions.
SPOE register a signal handler to be able to stop SPOE applets ASAP during
soft-stop. Disabled proxies must be ignored at this staged because they are
not fully configured.
For now, it is useless but this change is mandatory to fix a bug.
src/mjson.c:196:21: fatal error: expected ';' at end of declaration
int __maybe_unused n = 0;
and
src/mjson.c:727:17: fatal error: variable 'n' set but not used [-Wunused-but-set-variable]
int sign = 1, n = 0;
An issue was created on the project, but it was not fixed for now:
https://github.com/cesanta/mjson/issues/51
So for now, to fix the build issue, these variables are declared as unused.
Of course, if there is any update on this library, be careful to review this
patch first to be sure it is always required.
This patch should fix the issue #1868. It be backported as far as 2.4.
Willy Tarreau [Thu, 11 May 2023 03:33:21 +0000 (05:33 +0200)]
[RELEASE] Released version 2.8-dev11
Released version 2.8-dev11 with the following main changes :
- BUILD: debug: do not check the isolated_thread variable in non-threaded builds
- BUILD: quic: fix build warning when threads are disabled
- CI: more granular failure on generating build matrix
- CLEANUP: quic: No more used q_buf structure
- CLEANUP: quic: Rename several <buf> variables in quic_frame.(c|h)
- CLEANUP: quic: Typo fix for quic_connection_id pool
- BUG/MINOR: quic: Wrong key update cipher context initialization for encryption
- BUG/MEDIUM: cache: Don't request more room than the max allowed
- MEDIUM: stconn: Be sure to always be able to unblock a SC that needs room
- EXAMPLES: fix IPV6 support for lua mailers script
- BUILD: ssl: buggy -Werror=dangling-pointer since gcc 13.0
- DOC: stconn: Update comments about ABRT/SHUT for stconn structure
- BUG/MEDIUM: stats: Require more room if buffer is almost full
- DOC: configuration: add info about ssl-engine for 2.6
- BUG/MINOR: mux-quic: fix transport VS app CONNECTION_CLOSE
- BUG/MEDIUM: mux-quic: wakeup tasklet to close on error
- DEV: flags: add a script to decode most flags in the "show sess all" output
- BUG/MINOR: quic: Possible crash when dumping version information
- BUG/MINOR: config: make compression work again in defaults section
- BUG/MEDIUM: stream: Forward shutdowns when unhandled errors are caught
- MEDIUM: stream: Resync analyzers at the end of process_stream() on change
- DEV: flags: add missing stream flags to show-sess-to-flags
- DEV: flags/show-sess-to-flags: only retrieve hex digits from hex fields
- DEV: flags/show-sess-to-flags: add support for color output
- CLEANUP: src/listener.c: remove redundant NULL check
"Originally the listeners were intended to work without a bind_conf
(e.g. for FTP processing) hence these tests, but over time the
bind_conf has become omnipresent"
Willy Tarreau [Wed, 10 May 2023 15:48:00 +0000 (17:48 +0200)]
DEV: flags/show-sess-to-flags: add support for color output
Highlighting a few fields helps spot them, but only if there are not too
many. What is done here is the following:
- the first line of each stream is highlighted in white (helps find
beginning/end in long dumps
- fields in the form name=value where value starts with upper case
letters are considered as a state dump (e.g. stconn state) and are
also highlighted. This results in ~20 pairs. In this case the name
and value use two different colors (cyan vs yellow) to further help
find what is being looked for
This is only done when the output is a terminal or when --color=always
is passed. It's also possible to disable it with --color=never or
--no-color.
Willy Tarreau [Wed, 10 May 2023 15:46:10 +0000 (17:46 +0200)]
DEV: flags/show-sess-to-flags: only retrieve hex digits from hex fields
Some fields are followed by a comma or a closing parenthesis and we
take them because we read everything that's not a space. Better be
stricter, we're causing warnings about incorrect hex format when
they're passed to printf.
MEDIUM: stream: Resync analyzers at the end of process_stream() on change
At the end of process_stream(), if there was any change on request/response
analyzers, we now trigger a resync. It is performed if any analyzer is added
but also removed. It should help to catch internal changes on a stream and
eventually avoid it to be frozen.
There is no reason to backport this patch. But it may be good to keep an eye
on it, just in case.
BUG/MEDIUM: stream: Forward shutdowns when unhandled errors are caught
In process_stream(), after request and response analyzers evaluation,
unhandled errors are processed, if any. In this case, depending on the case,
remaining request or response analyzers may be removed, unlesse the last one
about end of filters. However, auto-close is not reenabled in same
time. Thus it is possible to not forward the shutdown for a side to the
other one while no analyzer is there to do so or at least to make evolved
the situation.
In theory, it is thus possible to freeze a stream if no wakeup happens. And
it seems possible because it explain a freeze we've oberseved.
This patch could be backported to every stable versions but only after a
period of observation and if it may match an unexplained bug. It should not
lead to any loop but at worst and eventually to truncated messages.
Willy Tarreau [Wed, 10 May 2023 14:39:00 +0000 (16:39 +0200)]
BUG/MINOR: config: make compression work again in defaults section
When commit ead43fe4f ("MEDIUM: compression: Make it so we can compress
requests as well.") added the test for the direction flags to select the
compression, it implicitly broke compression defined in defaults sections
because the flags from the default proxy were not recopied, hence the
compression was enabled but in no direction.
BUG/MINOR: quic: Possible crash when dumping version information
->others member of tp_version_information structure pointed to a buffer in the
TLS stack used to parse the transport parameters. There is no garantee that this
buffer is available until the connection is released.
Do not dump the available versions selected by the client anymore, but displayed the
chosen one (selected by the client for this connection) and the negotiated one.
Willy Tarreau [Tue, 9 May 2023 18:17:18 +0000 (20:17 +0200)]
DEV: flags: add a script to decode most flags in the "show sess all" output
This ugly script decodes most flags in the "show sess all" output and
summarizes them after each stream in an aligned format that makes it
relatively easy to spot the ones of interest. It relies on "flags", and
if not found in various places, will replace its output with a copy-
pastable command that will produce it. h3/quic are not covered yet.
In order to compact the output format, the front and back stream conns
and connections are noted "f.sc", "f.co", "f.h1s", "b.co" etc. That's
sufficiently understandable and entries are grouped by context anyway.
BUG/MEDIUM: mux-quic: wakeup tasklet to close on error
A recent series of commit have been introduced to rework error
generation on QUIC MUX side. Now, all MUX/APP functions uses
qcc_set_error() to set the flag QC_CF_ERRL on error. Then, this flag is
converted to QC_CF_ERRL_DONE with a CONNECTION_CLOSE emission by
qc_send().
This has the advantage of centralizing the CONNECTION_CLOSE generation
in one place and reduces the link between MUX and quic-conn layer.
However, we must now ensure that every qcc_set_error() call is followed
by a QUIC MUX tasklet to invoke qc_send(). This was not the case, thus
when there is no active transfer, no CONNECTION_CLOSE frame is emitted
and the connection remains opened.
To fix this, add a tasklet_wakeup() directly in qcc_set_error(). This is
a brute force solution as this may be unneeded when already in the MUX
tasklet context. However, it is the simplest solution as it is too
tedious for the moment to list all qcc_set_error() invocation outside of
the tasklet.
BUG/MINOR: mux-quic: fix transport VS app CONNECTION_CLOSE
A recent series of patch were introduced to streamline error generation
by QUIC MUX. However, a regression was introduced : every error
generated by the MUX was built as CONNECTION_CLOSE_APP frame, whereas it
should be only for H3/QPACK errors.
Fix this by adding an argument <app> in qcc_set_error. When false, a
standard CONNECTION_CLOSE is used as error.
This bug was detected by QUIC tracker with the following tests
"stop_sending" and "server_flow_control" which requires a
CONNECTION_CLOSE frame.
BUG/MEDIUM: stats: Require more room if buffer is almost full
This was lost with commit f4258bdf3 ("MINOR: stats: Use the applet API to
write data"). When the buffer is almost full, the stats applet gives up.
When this happens, the applet must require more room. Otherwise, data in the
channel buffer are sent to the client but the applet is not woken up in
return.
DOC: stconn: Update comments about ABRT/SHUT for stconn structure
The comment for the stconn structure was still referencing the SHUTR/SHUTW
flags. These flags were replaced and we now use ABRT/SHUT flags in
comments. The comment itself was slightly updated to be accurate.
While this used to work fine with legacy mailers, IPV6 server support
for lua mailers script was overlooked so it is currently broken.
Indeed, within the lua script, server address was parsed as an IPV4
address to extract both ip and port and pass them to smtp_send_email()
function from Thierry FOURNIER.
From lua point of view: when fetching server address from
ProxyMailers.mailservers, server ip and port are not separated. Each
server address is represented using haproxy server address custom-format
(the one used to specify server addresses within haproxy config,
see 11. Address formats in haproxy configuration manual):
It is a string that contains both proto hint, ip and port.
(Such addresses are manipulated using str2sa_range() and sa2str()
in haproxy's code)
Parsing these custom-format addresses from lua to support multiple address
families is feasible since the format is properly documented in haproxy
configuration.
However, to keep things simple, and given that smtp_send_email() relies
on Socket.connect() function to set-up the tcp connection:
Socket.connect() already supports the full server address custom-format
when no explicit port argument is provided. Thus with minor code changes
we're able to pass the server string as it is.
With this, IPV6 smtp servers from mailers section are now automatically
supported when using lua mailers script.
MEDIUM: stconn: Be sure to always be able to unblock a SC that needs room
When sc_need_room() is called, the caller cannot request more free space
than a minimum value to be sure it is always possible to unblock it. it is a
safety guard to not freeze any SC on NEED_ROOM condition. At worse it will
lead to some wakeups un excess at the edge.
To keep things simple, the following minimum is used:
BUG/MEDIUM: cache: Don't request more room than the max allowed
Since a recent change on the SC API, a producer must specify the amount of
free space it needs to progress when it is blocked. But, it must take care
to never exceed the maximum size allowed in the buffer. Otherwise, the
stream is freezed because it cannot reach the condition to unblock the
producer.
In this context, there is a bug in the cache applet when it fails to dump a
message. It may request more space than allowed. It happens when the cached
object is too big.
BUG/MINOR: quic: Wrong key update cipher context initialization for encryption
As noticed by Miroslav, there was a typo in quic_tls_key_update() which lead
a cipher context for decryption to be initialized and used in place of a cipher
context for encryption. Surprisingly, this did not prevent the key update
from working. Perhaps this is due to the fact that the underlying cryptographic
algorithms used by QUIC are all symetric algorithms.
CLEANUP: quic: Rename several <buf> variables in quic_frame.(c|h)
Most of the function in quic_frame.c and quic_frame.h manipulate <buf> buffer
position variables which have nothing to see with struct buffer variables.
Rename them to <pos>
Willy Tarreau [Sun, 7 May 2023 13:06:22 +0000 (15:06 +0200)]
BUILD: quic: fix build warning when threads are disabled
Commit e83f937cc ("MEDIUM: quic: use a global CID trees list") uses a
local variable "tree" used only for locks, but when threads are disabled
it spews a warning about this unused variable.
Willy Tarreau [Sun, 7 May 2023 13:02:30 +0000 (15:02 +0200)]
BUILD: debug: do not check the isolated_thread variable in non-threaded builds
The build without thread support was broken by commit b30ced3d8 ("BUG/MINOR:
debug: fix incorrect profiling status reporting in show threads") because
it accesses the isolated_thread variable that is not defined when threads
are disabled. In fact both the test on harmless and this one make no sense
without threads, so let's comment out the block and mark the related
variables as unused.
This may have to be backported to 2.7 if the commit above is.
Willy Tarreau [Sun, 7 May 2023 05:31:54 +0000 (07:31 +0200)]
[RELEASE] Released version 2.8-dev10
Released version 2.8-dev10 with the following main changes :
- BUG/MINOR: stats: fix typo in `TotalSplicedBytesOut` field name
- REGTESTS: add success test, "set server" via fqdn
- MINOR: ssl: disable CRL checks with WolfSSL when no CRL file
- BUG/MINOR: stream/cli: fix stream age calculation in "show sess"
- MINOR: debug: clarify "debug dev stream" help message
- DEBUG: cli: add "debug dev task" to show/wake/expire/kill tasks and tasklets
- BUG/MINOR: ssl/sample: x509_v_err_str converter output when not found
- REGTESTS: ssl: simplify X509_V code check in ssl_client_auth.vtc
- BUILD: cli: fix build on Windows due to isalnum() implemented as a macro
- MINOR: activity: use a single macro to iterate over all fields
- MINOR: activity: show the line header inside the SHOW_VAL macro
- MINOR: activity: iterate over all fields in a main loop for dumping
- MINOR: activity: allow "show activity" to restart dumping on any line
- MINOR: activity: allow "show activity" to restart in the middle of a line
- DEV: haring: automatically disable DEBUG_STRICT
- DEV: haring: update readme to suggest using the same build options for haring
- BUG/MINOR: debug: fix incorrect profiling status reporting in show threads
- MINOR: debug: permit the "debug dev loop" to run under isolation
- BUG/MEDIUM: mux-h2: Properly handle end of request to expect data from server
- BUG/MINOR: mux-quic: prevent quic_conn error code to be overwritten
- MINOR: mux-quic: add trace event for local error
- MINOR: mux-quic: wake up after recv only if avail data
- MINOR: mux-quic: adjust local error API
- MINOR: mux-quic: report local error on stream endpoint asap
- MINOR: mux-quic: close connection asap on local error
- BUG/MINOR: debug: do not emit empty lines in thread dumps
- BUG/MINOR: mux-h2: Also expect data when waiting for a tunnel establishment
- BUG/MINOR: time: fix NS_TO_TV macro
- MEDIUM: debug: simplify the thread dump mechanism
- MINOR: debug: write panic dump to stderr one thread at a time
- MINOR: debug: make "show threads" properly iterate over all threads
- CLEANUP: debug: remove the now unused ha_thread_dump_all_to_trash()
- MINOR: ssl: allow to change the server signature algorithm
- MINOR: ssl: allow to change the signature algorithm for client authentication
- MINOR: cli: Use applet API to write output message
- MINOR: stats: Use the applet API to write data
- MINOR: peers: Use the applet API to send message
- MINOR: stconn: Add a field to specify the room needed by the SC to progress
- MEDIUM: tree-wide: Change sc API to specify required free space to progress
- BUG/MEDIUM: stconn: Unblock SC from stream if there is enough room to progrees
- MEDIUM: applet: Check room needed to unblock opposite SC when data was consumed
- MEDIUM: stconn: Check room needed to unblock SC on fast-forward
- MEDIUM: stconn: Check room needed to unblock opposite SC when data was sent
- MINOR: hlua_fcn: fix Server.is_draining() return type
- MINOR: hlua_fcn: add Server.is_backup()
- MINOR: hlua_fcn: add Server.is_dynamic()
- MINOR: hlua_fcn: add Server.tracking()
- MINOR: hlua_fcn: add Server.get_trackers()
- MINOR: hlua_fcn: add Server.get_proxy()
- MINOR: hlua_fcn: add Server.get_pend_conn() and Server.get_cur_sess()
- MINOR: hlua_fcn: add Proxy.get_srv_act() and Proxy.get_srv_bck()
- DOC: lua/event: add ServerEvent class header
- MINOR: server/event_hdl: publish macro helper
- MINOR: server/event_hdl: add SERVER_STATE event
- OPTIM: server: publish UP/DOWN events from STATE change
- MINOR: hlua: expose SERVER_STATE event
- MINOR: server/event_hdl: add SERVER_ADMIN event
- MINOR: hlua: expose SERVER_ADMIN event
- MINOR: checks/event_hdl: SERVER_CHECK event
- MINOR: hlua/event_hdl: expose SERVER_CHECK event
- MINOR: mailers/hlua: disable email sending from lua
- MINOR: hlua: expose proxy mailers
- EXAMPLES: add lua mailers script to replace tcpcheck mailers
- BUG/MINOR: hlua: spinning loop in hlua_socket_handler()
- MINOR: server: fix message report when IDRAIN is set and MAINT is cleared
- CLEANUP: hlua: hlua_register_task() may longjmp
- REGTESTS: use lua mailer script for mailers tests
- MINOR: hlua: declare hlua_{ref,pushref,unref} functions
- MINOR: hlua: declare hlua_gethlua() function
- MINOR: hlua: declare hlua_yieldk() function
- MINOR: hlua_fcn: add Queue class
- EXAMPLES: mailqueue for lua mailers script
- MINOR: quic: add format argument for "show quic"
- MINOR: quic: implement oneline format for "show quic"
- MINOR: config: allow cpu-map to take commas in lists of ranges
- CLEANUP: fix a few reported typos in code comments
- DOC: fix a few reported typos in the config and install doc
Willy Tarreau [Fri, 5 May 2023 14:10:05 +0000 (16:10 +0200)]
MINOR: config: allow cpu-map to take commas in lists of ranges
The function that cpu-map uses to parse CPU sets, parse_cpu_set(), was
etended in 2.4 with commit a80823543 ("MINOR: cfgparse: support the
comma separator on parse_cpu_set") to support commas between ranges.
But since it was quite late in the development cycle, by then it was
decided not to add a last-minute surprise and not to magically support
commas in cpu-map, hence the "comma_allowed" argument.
Since then we know that it was not the best choice, because the comma
is silently ignored in the cpu-map syntax, causing all sorts of
surprises in field with threads running on a single node for example.
In addition it's quite common to copy-paste a taskset line and put it
directly into the haproxy configuration.
This commit relaxes this rule an finally allows cpu-map to support
commas between ranges. It simply consists in removing the comma_allowed
argument in the parse_cpu_set() function. The doc was updated to
reflect this.
MINOR: quic: implement oneline format for "show quic"
Add a new output format "oneline" for "show quic" command. This prints
one connection per line with minimal information. The objective is to
have an equivalent of the netstat/ss tools with just enough information
to quickly find connection which are misbehaving.
A legend is printed on the first line to describe the field columns
starting with a dash character.
Add an extra optional argument for "show quic" to specify desired output
format. Its objective is to control the verbosity per connections. For
the moment, only "full" is supported, which is the already implemented
output with maximum information.
Lua mailers scripts now leverages Queue class to implement a mailqueue.
Thanks to the mailqueue, emails are still sent asynchronously, but with
respect to their generation order (better ordering consistency).
That is, previous script limitation (see below) is no longer true:
"
Current known script limitation: if multiple events are generated
simultaneously it is possible that emails could be received out of
order since emails are sent asynchronously using smtp_send_email()
and there is no sending queue. Relying on the email "date" should
help to know which email was generated first..
"
For a given server, email-alerts will be sent in the same order as the
events were generated. However this does not apply to events between 2
distinct servers, since each server is tracked independently within
the lua script. (1 subscription per server, to make sure only relevant
servers x events are being tracked and prevent useless wakeups)
This class provides a generic FIFO storage mechanism that may be shared
between multiple lua contexts to easily pass data between them, as stock
Lua doesn't provide easy methods for passing data between multiple
coroutines.
New Queue object may be obtained using core.queue()
(it works like core.concat() for a concat Class)
Lua documentation was updated (including some usage examples)
Since mailers/healthcheckmail.vtc already requires lua to emulate the
SMTP server for the test, force it to use lua mailers example script
to send email-alerts so we don't rely anymore on legacy tcpcheck
mailers implementation.
This is done by simply loading examples/mailers.lua (as a symlink) from
haproxy config file.
MINOR: server: fix message report when IDRAIN is set and MAINT is cleared
Remaining in drain mode after removing one of server admins flags leads
to this message being generated:
"Server name/backend is leaving forced drain but remains in drain mode."
However this is not necessarily true: the server might just be leaving
MAINT with the IDRAIN flag set, so the report is incorrect in this case.
(FDRAIN was not set so it cannot be cleared)
To prevent confusion around this message and to comply with the code
comment above it: we remove the "leaving forced drain" precision to
make the report suitable for multiple transitions.
BUG/MINOR: hlua: spinning loop in hlua_socket_handler()
Since 3157222 ("MEDIUM: hlua/applet: Use the sedesc to report and detect
end of processing"), hlua_socket_handler() might spin loop if the hlua
socket is destroyed and some data was left unconsumed in the applet.
Prior to the above commit, the stream was explicitly KILLED
(when ctx->die == 1) so the app couldn't spinloop on unconsumed data.
But since the refactor this is no longer the case.
To prevent unconsumed data from waking the applet indefinitely, we consume
pending data when either one of EOS|ERROR|SHR|SHW flags are set, as it is
done everywhere else this check is performed in the code. Hence it was
probably overlooked in the first place during the refacto.
This bug is 2.8 specific only, so no backport needed.
EXAMPLES: add lua mailers script to replace tcpcheck mailers
Legacy mailers implemented using tcpchecks may now be replaced using a
pure lua implementation.
Simply loading the file "mailers.lua" using lua-load directive is enough
to disable the legacy mailer implementation and make the lua script
subscribe to server events in order to generate messages and send email
alerts to configured mailservers according to the mailers configuration
specified by the user in the config file.
lua-load-per-thread directive is supported, the script will automatically
force itself on a single thread to prevent multiple mails from being sent
for the same event.
The email sending from lua in itself is handled with smtp_send_email()
function from Thierry FOURNIER.
(see: https://www.arpalert.org/how-to-send-email.html)
The function was slightly adapted to send HELO instead of EHLO when
beginning the SMTP handshake to make it more compatible with basic SMTP
stacks and to comply with the legacy SMTP handshake performed in
mailers.c.
Authentication is not yet handled by smtp_send_email(), but it may be
further improved to do so.
Note that existing lua libraries may also support sending emails (even
with authentication support maybe?), I did not do much researchs about
this, so I'm not aware of existing solutions at the time of writing this
script.
The only restriction is that when using an external library, the library
function calls must not be blocking, since haproxy relies on lua
executions to be yieldable and rescheduled.
As long as the script complies with this limitation, it may be customized
or improved in many ways, including templating, making calls to APIs
services.. (ie: triggering automation flows, sending SMS alerts...
you name it)
The purpose of this script is to provide a basic working replacement for
legacy mailers implementation based on tcpchecks, which is planned for
removal in the future. tcpcheck based mailers is kind of a hack which is
not suitable as a long term solution.
(hard to maintain and not customizable)
Note: Email content for email alerts sent using this script might slightly
differ from the legacy implementation (some optional info might be missing
such as server's check dynamic description, or included statistics such as
currently active servers may appear out of sync) due the email generation
now being performed asynchronously. However the output format complies with
the original one and essential informations are consistently reported.
Current known script limitation: if multiple events are generated
simultaneously it is possible that emails could be received out of
order since emails are sent asynchronously using smtp_send_email()
and there is no sending queue. Relying on the email "date" should
help to know which email was generated first..
Proxy mailers, which are configured using "email-alert" directives
in proxy sections from the configuration, are now being exposed
directly in lua thanks to the proxy:get_mailers() method which
returns a class containing the various mailers settings if email
alerts are configured for the given proxy (else nil is returned).
Both the class and the proxy method were marked as LEGACY since this
feature relies on mailers configuration, and the goal here is to provide
the last missing bits of information to lua scripts in order to make
them capable of sending email alerts instead of relying on the soon-to-
be deprecated mailers implementation based on checks (see src/mailers.c)
MINOR: mailers/hlua: disable email sending from lua
Exposing a new hlua function, available from body or init contexts, that
forcefully disables the sending of email alerts even if the mailers are
defined in haproxy configuration.
This will help for sending email directly from lua.
(prevent legacy email sending from intefering with lua)
SERVER_ADMIN is implemented as an advanced server event.
It is published each time the administrative state changes.
(when s->cur_admin changes)
SERVER_ADMIN data is an event_hdl_cb_data_server_admin struct that
provides additional info related to the admin state change, but can
be casted as a regular event_hdl_cb_data_server struct if additional
info is not needed.
OPTIM: server: publish UP/DOWN events from STATE change
Reuse cb_data from STATE event to publish UP and DOWN events.
This saves some CPU time since the event is only constructed
once to publish STATE, STATE+UP or STATE+DOWN depending on the
state change.
SERVER_STATE is implemented as an advanced server event.
It is published each time the server's effective state changes.
(when s->cur_state changes)
SERVER_STATE data is an event_hdl_cb_data_server_state struct that
provides additional info related to the server state change, but can
be casted as a regular event_hdl_cb_data_server struct if additional
info is not needed.
add a macro helper to help publish server events to global and
per-server subscription list at once since all server events
support both subscription modes.
MEDIUM: stconn: Check room needed to unblock opposite SC when data was sent
After a sending attempt, we check the opposite SC to see if it is waiting
for a minimum free space to receive more data. If the condition is
respected, it is unblocked. 0 is special case where the SC is
unconditionally unblocked.
MEDIUM: stconn: Check room needed to unblock SC on fast-forward
During fast-forward, if the SC is waiting for a minimum free space to
receive more data and some data was sent, it is only unblock is the
condition is respected. 0 is special case where the SC is unconditionally
unblocked.
MEDIUM: applet: Check room needed to unblock opposite SC when data was consumed
If the opposite SC is waiting for a minimum free space to receive more data,
it is only unblock is the condition is respected. 0 is a special cases where
the opposite SC is always unblocked.
BUG/MEDIUM: stconn: Unblock SC from stream if there is enough room to progrees
At the end of process_stream(), in sc_update_rx(), the SC is now unblocked
if it was waiting for room and the free space in the input buffer is large
enough. This patch should fix an issue with the compression filter that can
leave the channel's buffer empty while the endpoint is waiting for room to
progress. Indeed, in this case, because the buffer is empty, there is no
send attempt and no other way to unblock the SE.
This commit depends on following commits:
* MEDIUM: tree-wide: Change sc API to specify required free space to progress
* MINOR: stconn: Add a field to specify the room needed by the SC to progress
* MINOR: peers: Use the applet API to send message
* MINOR: stats: Use the applet API to write data
* MINOR: cli: Use applet API to write output message
It should fix a regression introduced with the commit 341a5783b
("BUG/MEDIUM: stconn: stop to enable/disable reads from streams via
si_update_rx").
It must be backported iff the commit above is also backported. It was not
backported yet and it is thus probably a good idea to not do so to avoid to
backport too many change..
MEDIUM: tree-wide: Change sc API to specify required free space to progress
sc_need_room() now takes the required free space to receive more data as
parameter. All calls to this function are updated accordingly. For now, this
value is set but not used. When we are waiting for a buffer, 0 is used. So
we expect to be unblocked ASAP. However this must be reviewed because
SC_FL_NEED_BUF is probably enough in this case and this flag is already set
if the input buffer allocation fails.
MINOR: stconn: Add a field to specify the room needed by the SC to progress
When the SC is blocked because it is waiting for room in the input buffer,
it will be responsible to specify the minimum free space required to
progress. In this commit, we only introduce the field in the stconn
structure that will be used to store this value. It is a signed value with
the following meaning:
* -1: The SC is waiting for room but not based on the buffer state. It
will be typically used during splicing when the pipe is full. In
this case, only a successful send can unblock the SC.
* >= 0; The minimum free space in the input buffer to unblock the SC. 0 is
a special value to specify the SC must be unblocked ASAP, by the
stream, at the end of process_stream() or when output data are
consumed on the opposite side.
The peers applet now use the applet API to send message instead of the
channel API. This way, it does not need to take care to request more room if
it fails to put data into the channel's buffer.
stats_putchk() is updated to use the applet API instead of the channel API
to write data. To do so, the appctx is passed as parameter instead of the
channel. This way, the applet does not need to take care to request more
room it it fails to put data into the channel's buffer.
MINOR: cli: Use applet API to write output message
Instead of using the channel API to to write output message from the CLI
applet, we use the applet API. This way, the applet does not need to take
care to request more room it it fails to put its message into the channel's
buffer.
MINOR: ssl: allow to change the server signature algorithm
This patch introduces the "sigalgs" keyword for the bind line, which
allows to configure the list of server signature algorithms negociated
during the handshake. Also available as "ssl-default-bind-sigalgs" in
the default section.
Willy Tarreau [Thu, 4 May 2023 17:07:56 +0000 (19:07 +0200)]
MINOR: debug: make "show threads" properly iterate over all threads
Previously it would re-dump all threads to the same trash if the output
buffer was full, which it never was since the trash is of the same size.
Now it dumps one thread, copies it to the buffer and yields until it can
continue. Showing 256 threads works as expected.
Willy Tarreau [Thu, 4 May 2023 16:52:51 +0000 (18:52 +0200)]
MINOR: debug: write panic dump to stderr one thread at a time
Currently large setups cannot dump all their threads because they're
first dumped to the trash buffer, then copied to stderr. Here we can
now change this, instead we dump one thread at a time into the trash
and immediately send it to stderr. We also keep a copy into a local
trash chunk that's assigned to thread_dump_buffer so that a core file
still contains a copy of a large number of threads, which is generally
sufficient for the vast majority of situations.
It was verified that dumping 256 threads now produces ~55kB of output
and all of them are properly dumped.
Willy Tarreau [Thu, 4 May 2023 14:23:51 +0000 (16:23 +0200)]
MEDIUM: debug: simplify the thread dump mechanism
The thread dump mechanism that is used by "show threads" and by the
panic dump is overly complicated due to an initial misdesign. It
firsts wakes all threads, then serializes their dumps, then releases
them, while taking extreme care not to face colliding dumps. In fact
this is not what we need and it reached a limit where big machines
cannot dump all their threads anymore due to buffer size limitations.
What is needed instead is to be able to dump *one* thread, and to let
the requester iterate on all threads.
That's what this patch does. It adds the thread_dump_buffer to the
struct thread_ctx so that the requester offers the buffer to the
thread that is about to be dumped. This buffer also serves as a lock.
A thread at rest has a NULL, a valid pointer indicates the thread is
using it, and 0x1 (NULL+1) is used by the dumped thread to tell the
requester it's done. This makes sure that a given thread is dumped
once at a time. In addition to this, the calling thread decides
whether it accesses the thread by itself or via the debug signal
handler, in order to get a backtrace. This is much saner because the
calling thread is free to do whatever it wants with the buffer after
each thread is dumped, and there is no dependency between threads,
once they've dumped, they're free to continue (and possibly to dump
for another requester if needed). Finally, when the THREAD_DUMP
feature is disabled and the debug signal is not used, the requester
accesses the thread by itself like before.
For now we still have the buffer size limitation but it will be
addressed in future patches.
As such, NS_TO_TV usage in hlua_now() is currently incorrect and this
results in unexpected values being passed to lua.
In this patch, we're adding an extra parenthesis around 't' in NS_TO_TV()
macro to make it safe against such usages. (that is: ensure proper
argument expansion as if NS_TO_TV was implemented as a function)
BUG/MINOR: mux-h2: Also expect data when waiting for a tunnel establishment
When a client H2 stream is waiting for a tunnel establishment, it must state
it expects data from server. It is the second fix that should fix
regressions of the commit 2722c04b ("MEDIUM: mux-h2: Don't expect data from
server as long as request is unfinished")
Willy Tarreau [Thu, 4 May 2023 14:28:30 +0000 (16:28 +0200)]
BUG/MINOR: debug: do not emit empty lines in thread dumps
In 2.3, commit 471425f51 ("BUG/MINOR: debug: Don't dump the lua stack
if it is not initialized") introduced the possibility to emit an empty
line when there's no Lua info to dump. The problem is that doing this
on the CLI in "show threads" marks the end of the output, and it may
affect some external tools. We need to make sure that LFs are only
emitted if there's something on the line and that all lines properly
start with the prefix.
This may be backported as far as 2.0 since the commit above was
backported there.
MINOR: mux-quic: close connection asap on local error
With the change for QUIC MUX local error API, the new flag QC_CF_ERRL is
now checked on qc_detach(). If set, qcs instance is freed even though
transfer is not finished. This should help to quickly release qcs and
eventually all MUX instance resources.
To further accelerate this, a specific check has been added in
qc_shutw(). It is skipped if local error flag is set to prevent noisy
reset stream invocation. In the same way, QUIC MUX is not rescheduled on
qc_recv_buf() operation if local error flag set.
MINOR: mux-quic: report local error on stream endpoint asap
If an error a detected at the MUX layer, all remaining stream endpoints
should be closed asap with error set. This is now done by checking for
QC_CF_ERRL flag on qc_wake_some_streams() and qc_send_buf(). To complete
this, qc_wake_some_streams() is called by qc_process() if needed.
This should help to quickly release streams as soon as a new error is
detected locally by the MUX or APP layer. This allows to in turn free
the MUX instance itself. Previously, error would not have been
automatically reported until the transport layer closure would occur
on CONNECTION_CLOSE emission.
When a fatal error is detected by the QUIC MUX or H3 layer, the
connection should be closed with a CONNECTION_CLOSE with an error code
as the reason.
Previously, a direct call was used to the quic_conn layer to try to
close the connection. This API was adjusted to be more flexible. Now,
when an error is detected, the function qcc_set_error() is called. This
set the flag QC_CF_ERRL with the error code stored by the MUX. The
connection will be closed soon so most of the operations are not
conducted anymore. Connection is then finally closed during qc_send()
via quic_conn layer if QC_CF_ERRL is set. This will set the flag
QC_CF_ERRL_DONE which indicates that the MUX instance can be freed.
This model is cleaner and brings the following improvments :
- interaction with quic_conn layer for closure is centralized on a
single function
- CO_FL_ERROR is not set anymore. This was incorrect as this should be
reserved to errors reported by the transport layer to be similar with
other haproxy components. As a consequence, qcc_is_dead() has been
adjusted to check for QC_CF_ERRL_DONE to release the MUX instance.
MINOR: mux-quic: wake up after recv only if avail data
When HTX content is transferred from qcs instance to upper stream
endpoint, a wakeup is conducted for MUX tasklet. However, this is only
necessary if demux was interrupted due to a full QCS HTX buffer.
BUG/MINOR: mux-quic: prevent quic_conn error code to be overwritten
When MUX performs a graceful shutdown, quic_conn error code is set to a
"no error" code which depends on the application layer used. However,
this may overwrite a previous error code if quic_conn layer has detected
an error on its side.
In practice, this behavior has not been seen on production. In fact, it
may have undesirable effect only if this error code modification happens
between the quic_conn error detection and the emission of the
CONNECTION_CLOSE, so it should be pretty rare. However, there is still a
tiny possibility it may happen.
To prevent this, first check that quic_conn error code is not set before
setting it. Ideally, transport layer API should be adjusted to be able
to set this without fiddling with the quic_conn directly.
BUG/MEDIUM: mux-h2: Properly handle end of request to expect data from server
The commit 2722c04b ("MEDIUM: mux-h2: Don't expect data from server as long
as request is unfinished") introduced a regression in the H2 multiplexer.
The end of the request is not systematically handled to state a H2 stream on
client side now expexts data from the server.
Indeed, while the client is uploading its request, the H2 stream warns it
does not expect data from the server. This way, no server timeout is applied
at this stage. When end of the request is detected, the H2 stream must state
it now expects the server response. This enables the server timeout.
However, it was only performed at one place while the end of the request can
be handled at different places. First, during a zero-copy in
h2_rcv_buf(). Then, when the SC is created with the full request. Because of
this bug, it is possible to totally disable the server timeout for H2
streams.
In h2_rcv_buf(), we now rely on h2s flags to detect the end of the request,
but only when the rxbuf was emptied.
Willy Tarreau [Thu, 4 May 2023 09:50:26 +0000 (11:50 +0200)]
MINOR: debug: permit the "debug dev loop" to run under isolation
Sometimes it's convenient to test the effect of tasks running under
isolation, e.g. to validate the contents of the crash dumps. Let's
add an optional "isolated" keyword to "debug dev loop" for this.
Willy Tarreau [Thu, 4 May 2023 09:30:55 +0000 (11:30 +0200)]
BUG/MINOR: debug: fix incorrect profiling status reporting in show threads
Thread dumps include a field "prof" for each thread that reports whether
task profiling is currently active or not. It turns out that in 2.7-dev1,
commit 680ed5f28 ("MINOR: task: move profiling bit to per-thread")
mistakenly replaced it with a check for the current thread's bit in the
thread dumps, which basically is the only place where another thread is
being watched. The same mistake was done a few lines later by confusing
threads_want_rdv_mask with the profiling mask. This mask disappeared
in 2.7-dev2 with commit 598cf3f22 ("MAJOR: threads: change thread_isolate
to support inter-group synchronization"), though instead we know the ID
of the isolated thread. This commit fixes this and now reports "isolated"
instead of "wantrdv".
Willy Tarreau [Thu, 4 May 2023 06:09:02 +0000 (08:09 +0200)]
DEV: haring: automatically disable DEBUG_STRICT
Ideally haring should be compiled with the same options as haproxy so
that ring headers have the same size (e.g. with/without locks, with/
without lock debugging). But when enabling DEBUG_STRICT, BUG_ON() is
enabled and breaks the build by making references to complain() and
ha_backtrace_to_stderr().
Let's just disable DEBUG_STRICT before opening include files. This is
sufficient to address the problem.
This may be backorted to older versions that include haring.
Willy Tarreau [Wed, 3 May 2023 14:18:30 +0000 (16:18 +0200)]
MINOR: activity: allow "show activity" to restart in the middle of a line
16kB buffers are not enough to dump 4096 threads with up to 10 bytes value
on each line. By storing the column number in the applet's context, we can
now restart from the last attempted column. This requires to dump all values
as they are produced, but it doesn't cost that much: a 4096-thread output
from a fesh process produces 300kB of output in ~8ms, or ~400us per call
(19*16kB), most of which are spent in vfprintf(). Given that we don't print
more than needed, it doesn't really change anything.
The main caveat is that when interrupted on such large lines, there's a
great possibility that the total or average on the first column doesn't
match anymore the sum or average of all dumped values. In order to avoid
this whenever possible (typically less than ~1500 threads), we first try
to dump entire lines and only proceed one column at a time when we have
to retry a failed dump. This is already the same for other stats that are
dumped in an interruptible way anyway and there's little that can be done
about it at this point (and not much immediately perceived benefit in
doing this with extreme accuracy for >1500 threads).