BUG/MINOR: signals/poller: set the poller timeout to 0 when there are signals
When receiving a signal before entering the poller, and without any
activity in the process, the poller will be entered with a timeout
calculated without checking the signals.
Since commit 4f59d3 ("MINOR: time: increase the minimum wakeup interval
to 60s") the issue is much more visible because it could be stuck for
60s.
When in mworker mode, if a worker quits and the SIGCHLD signal deliver
at the right time to the master, this one could be stuck for the time of
the timeout.
MINOR: activity/cli: support sorting task profiling by total CPU time
The new "bytime" sorting criterion uses the reported CPU time instead of
the usage. This is convenient to spot tasks that are mostly reponsible
for the CPU usage in a running process. It supports both the detailed
and the aggregated format. The output looks like this:
MINOR: activity/cli: support aggregating task profiling outputs
By default we now dump stats between caller and callee, but by
specifying "aggr" on the command line, stats get aggregated by
callee again as it used to be before the feature was available.
It may sometimes be helpful when comparing total call counts,
though that's about all.
MINOR: tasks/activity: improve the caller-callee activity hash
The previous dump already showed that the "other" category was getting
a few entries. Let's proceed like for the memory profiling, by scanning
a limited range of adjacent slots to find a spare one (16 max). That's
pretty fast since close and likely prefetched and the comparison is
cheap. The new dump now shows up to 45 entries below without "other":
MEDIUM: tasks/activity: combine the called function with the caller
Now instead of getting aggregate stats per called function, we have
them per function AND per call place. The "byaddr" sort considers
the function pointer first, then the call count, so that dominant
callers of a given callee are instantly spotted. This allows to get
sorted outputs like this:
It already becomes visible that some tasks have different very costs
depending where they're called (e.g. process_stream). The method used
to wake them up is also shown. Applets are handled specially and shown
as appctx_wakeup.
CLEANUP: activity: make the number of sched activity entries more configurable
This removes all the hard-coded 8-bit and 256 entries to use a pair of
macros instead so that we can more easily experiment with larger table
sizes if needed.
CLEANUP: sched: remove duplicate code in run_tasks_from_list()
Now that ->wake_date is common to tasks and tasklets, we don't need
anymore to carry a duplicate control block to read and update it for
tasks and tasklets. And given that this code was present early in the
if/else fork between tasks and tasklets, taking it out of the block
allows to move the task part into a more visible "else" branch that
also allows to factor the epilogue that resets th_ctx->current and
updates profile_entry->cpu_time, which also used to be duplicated.
Overall, doing just that saved 253 bytes in the function, or ~1/6,
which is not bad considering that it's on a hot path. And the code
got much ore readable.
CLEANUP: task: move tid and wake_date into the common part
There used to be one tid for tasklets and a thread_mask for tasks. Since
2.7, both tasks and tasklets now use a tid (albeit with a very slight
semantic difference for the negative value), to in order to limit code
duplication and to ease debugging it makes sense to move tid into the
common part. One limitation is that it will leave a hole in the structure,
but we now have the wake_date that is always present and can move there as
well to plug the hole.
This results in something overall pretty clean (and cleaner than before),
with the low-level stuff (state,tid,process,context) appearing first, then
the caller stuff (caller,wake_date,calls,debug) next, and finally the
type-specific stuff (rq/wq/expire/nice).
DEBUG: task: simplify the caller recording in DEBUG_TASK
Instead of storing an index that's swapped at every call, let's use the
two pointers as a shifting history. Now we have a permanent "caller"
field that records the last caller, and an optional prev_caller in the
debug section enabled by DEBUG_TASK that keeps a copy of the previous
caller one. This way, not only it's much easier to follow what's
happening during debugging, but it saves 8 bytes in the struct task in
debug mode and still keeps it under 2 cache lines in nominal mode, and
this will finally be usable everywhere and later in profiling.
The caller_idx was also used as a hint that the entry was freed, in order
to detect wakeup-after-free. This was changed by setting caller to -1
instead and preserving its value in caller[1].
Finally, the operations were made atomic. That's not critical but since
it's used for debugging and race conditions represent a significant part
of the issues in multi-threaded mode, it seems wise to at least eliminate
some possible factors of faulty analysis.
DEBUG: applet: instrument appctx_wakeup() to log the caller's location
appctx_wakeup() relies on task_wakeup(), but since it calls it from a
function, the calling place is always appctx_wakeup() itself, which is
not very useful.
Let's turn it to a macro so that we can log the location of the caller
instead. As an example, the cli_io_handler() which used to be seen as
this:
(gdb) p *appctx->t.debug.caller[0]
$10 = {
func = 0x9ffb78 <__func__.37996> "appctx_wakeup",
file = 0x9b336a "include/haproxy/applet.h",
line = 110,
what = 1 '\001',
arg8 = 0 '\000',
arg32 = 0
}
Now shows the more useful:
(gdb) p *appctx->t.debug.caller[0]
$6 = {
func = 0x9ffe80 <__func__.38641> "sc_app_chk_snd_applet",
file = 0xa00320 "src/stconn.c",
line = 996,
what = 6 '\006',
arg8 = 0 '\000',
arg32 = 0
}
DEBUG: task: use struct ha_caller instead of arrays of file:line
This reduces the task struct by 8 bytes, reduces the code size a little
bit by simplifying the calling convention (one argument dropped), and
as a bonus provides the function name in the caller.
The memstats code currently defines its own file/function/line number,
type and extra pointer. We don't need to keep them separate and we can
easily replace them all with just a struct ha_caller. Note that the
extra pointer could be converted to a pool ID stored into arg8 or
arg32 and be dropped as well, but this would first require to define
IDs for pools (which we currently do not have).
MINOR: debug: add struct ha_caller to describe a calling location
The purpose of this structure is to assemble all constant parts of a
generic calling point for a specific event. These ones are created by
the compiler as a static const element outside of the code path, so
they cost nothing in terms of CPU, and a pointer to that descriptor
can be passed to the place that needs it. This is very similar to what
is being done for the mem_stat stuff.
This will be useful to simplify and improve DEBUG_TASK.
There are a few places where it's convenient to hash a pointer to compute
a statistics bucket. Here we're basically reusing the hash that was used
by memory profiling with a minor update that the multiplier was corrected
to be prime and stand by its promise to have equal numbers of 1 and 0,
and that 32-bit platforms won't lose range anymore.
MINOR: tasks: do not keep cpu and latency times in struct task
It was a mistake to put these two fields in the struct task. This
was added in 1.9 via commit 9efd7456e ("MEDIUM: tasks: collect per-task
CPU time and latency"). These fields are used solely by streams in
order to report the measurements via the lat_ns* and cpu_ns* sample
fetch functions when task profiling is enabled. For the rest of the
tasks, this is pure CPU waste when profiling is enabled, and memory
waste 100% of the time, as the point where these latencies and usages
are measured is in the profiling array.
Let's move the fields to the stream instead, and have process_stream()
retrieve the relevant info from the thread's context.
The struct task is now back to 120 bytes, i.e. almost two cache lines,
with 32 bit still available.
BUG/MINOR: stream/sched: take into account CPU profiling for the last call
When task profiling is enabled, the reported CPU time for short requests
and responses (e.g. redirect) is always zero in the logs, because
process_stream() is only called once and the CPU time is measured after
it returns. This is particuarly annoying when dealing with denies and in
general anything that deals with parasitic traffic because it can be
difficult to figure where the CPU is spent.
The solution taken in this patch consists in having process_stream()
update the cpu time itself before logging and quitting. It's very simple.
It will not take into account the time taken to produce the log nor
freeing the stream, but that's marginal compared to always logging zero.
The task's wake_date is also reset so that the scheduler doesn't have to
perform these operations again. This is dependent on the following patch:
MINOR: sched: store the current profile entry in the thread context
It should be backported to 2.6 as it does help for troubleshooting.
MINOR: sched: store the current profile entry in the thread context
The profile entry that corresponds to the current task/tasklet being
profiled is now stored into the thread's context. This will allow it
to be accessed from the tasks themselves. This is needed for an upcoming
fix.
BUG/MINOR: sched: properly account for the CPU time of dying tasks
When task profiling is enabled, the scheduler can measure and report
the cumulated time spent in each task and their respective latencies. But
this was wrong for tasks with few wakeups as well as for self-waking ones,
because the call date needed to measure how long it takes to process the
task is retrieved in the task itself (->wake_date was turned to the call
date), and we could face two conditions:
- a new wakeup while the task is executing would reset the ->wake_date
field before returning and make abnormally low values being reported;
that was likely the case for taskèrun_applet for self-waking applets;
- when the task dies, NULL is returned and the call date couldn't be
retrieved, so that CPU time was not being accounted for. This was
particularly visible with process_stream() which is usually called
only twice per request, and whose time was systematically halved.
The cleanest solution here is to keep in mind that the scheduler already
uses quite a bit of local context in th_ctx, and place the intermediary
values there so that they cannot vanish. The wake_date has to be reset
immediately once read, and only its copy is used along the function. Note
that this must be done both for tasks and tasklet, and that until recently
tasklets were also able to report wrong values due to their sole dependency
on TH_FL_TASK_PROFILING between tests.
One nice benefit for future improvements is that such information will now
be available from the task without having to be stored into the task itself
anymore.
Since the tasklet part was computed on wrapping 32-bit arithmetics and
the task one was on 64-bit, the values were now consistently moved to
32-bit as it's already largely sufficient (4s spent in a task is more
than twice what the watchdog would tolerate). Some further cleanups might
be necessary, but the patch aimed at staying minimal.
Task profiling output after 1 million HTTP request previously looked like
this:
This feature is essentially used for debugging so it has extremely limited
impact. However it's used quite a bit more in bug reports and it would be
desirable that at least 2.6 gets this fix backported. It depends on at least
these two previous patches which will then also have to be backported:
MINOR: task: permanently enable latency measurement on tasklets
CLEANUP: task: rename ->call_date to ->wake_date
This field is misnamed because its real and important content is the
date the task was woken up, not the date it was called. It temporarily
holds the call date during execution but this remains confusing. In
fact before the latency measurements were possible it was indeed a call
date. Thus is will now be called wake_date.
This change is necessary because a subsequent fix will require the
introduction of the real call date in the thread ctx.
MINOR: task: permanently enable latency measurement on tasklets
When tasklet latency measurement was enabled in 2.4 with commit b2285de04
("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK is set"),
the feature was conditionned on DEBUG_TASK because the field would add 8
bytes to the struct tasklet.
This approach was not a very good idea because the struct ends on an int
anyway thus it does finish with a 32-bit hole regardless of the presence
of this field. What is true however is that adding it turned a 64-byte
struct to 72-byte when caller debugging is enabled.
This patch revisits this with a minor change. Now only the lowest 32
bits of the call date are stored, so they always fit in the remaining
hole, and this allows to remove the dependency on DEBUG_TASK. With
debugging off, we're now seeing a 48-byte struct, and with debugging
on it's exactly 64 bytes, thus still exactly one cache line. 32 bits
allow a latency of 4 seconds on a tasklet, which already indicates a
completely dead process, so there's no point storing the upper bits at
all. And even in the event it would happen once in a while, the lost
upper bits do not really add any value to the debug reports. Also, now
one tasklet wakeup every 4 billion will not be sampled due to the test
on the value itself. Similarly we just don't care, it's statistics and
the measurements are not 9-digit accurate anyway.
BUG/MINOR: task: make task_instant_wakeup() work on a task not a tasklet
There's a subtle (harmless) bug in task_instant_wakeup(). As it uses
some tasklet code instead of some task code, the debug part also acts
on the tasklet equivalent, and the call_date is only set when DEBUG_TASK
is set instead of inconditionally like with tasks. As such, without this
debugging macro, call dates are not updated for tasks woken this way.
There isn't any impact yet because this function was introduced in 2.6 to
solve certain classes of issues and is not used yet, and in the worst case
it would only affect the reported latency time.
This may be backported to 2.6 in case a future fix would depend on it but
currently will not fix existing code.
BUG/MINOR: task: always reset a new tasklet's call date
The tasklet's call date was not reset, so if profiling was enabled while
some tasklets were in the run queue, their initial random value could be
used to preload a bogus initial latency value into the task profiling bin.
Let's just zero the initial value.
This should be backported to 2.4 as it was brought with initial commit b2285de04 ("MINOR: tasks: also compute the tasklet latency when DEBUG_TASK
is set"). The impact is very low though.
BUG/MINOR: quic: Wrong connection ID to thread ID association
To work, quic_pin_cid_to_tid() must set cid[0] to a value with <target_id>
as <global.nbthread> modulo. For each integer n, (n - (n % m)) + d has always
d as modulo m (with d < m).
except when n wraps or when another modulo is applied to the addition result.
Here, for 8bit modulo arithmetic, if m does not divides 256, this cannot
works for values which wraps when we increment them by d.
For instance n=255 m=3 and d=1 the formula result is 0 (should be d).
To fix this, we first limit c[0] to 255 - <target_id> to prevent c[0] from wrapping.
Thank you to @esb for having reported this issue in GH #1855.
MINOR: quic: add QUIC support when no client_hello_cb
Add QUIC support to the ssl_sock_switchctx_cbk() variant used only when
no client_hello_cb is available.
This could be used with libreSSL implementation of QUIC for example.
It also works with quictls when HAVE_SSL_CLIENT_HELLO_CB is removed from
openss-compat.h
BUILD: ssl: fix ssl_sock_switchtx_cbk when no client_hello_cb
When building HAProxy with USE_QUIC and libressl 3.6.0, the
ssl_sock_switchtx_cbk symbol is not found because libressl does not
implement the client_hello_cb.
A ssl_sock_switchtx_cbk version for the servername callback is available
but wasn't exported correctly.
BUG/MINOR: quic: Possible crash when verifying certificates
This verification is done by ssl_sock_bind_verifycbk() which is set at different
locations in the ssl_sock.c code . About QUIC connections, there are a lot of chances
the connection object is not initialized when entering this function. What must
be accessed is the SSL object to retrieve the connection or quic_conn objects,
then the bind_conf object of the listener. If the connection object is not found,
we try to find the quic_conn object.
Modify ssl_sock_dump_errors() interface which takes a connection object as parameter
to also passed a quic_conn object as parameter. Again this function try first
to access the connection object if not NULL or the quic_conn object if not.
There is a remaining thing to do for QUIC: store the certificate verification error
code as it is currently stored in the connection object. This error code is at least
used by the "bc_err" and "fc_err" sample fetches.
There are chances this bug is in relation with GH #1851. Thank you to @tasavis
for the report.
BUG/MINOR: h1: Support headers case adjustment for TCP proxies
On frontend side, "h1-case-adjust-bogus-client" option is now supported in
TCP mode. It is important to be able to adjust the case of response headers
when a connection is routed to an HTTP backend. In this case, the client
connection is upgraded to H1.
On backend side, "h1-case-adjust-bogus-server" option is now also supported
in TCP mode to be able to perform HTTP health-checks with a case adjustment
of the request headers.
MINOR: http-check: Remove support for headers/body in "option httpchk" version
This trick is deprecated since the health-check refactoring, It is now
invalid. It means the following line will trigger an error during the
configuration parsing:
BUG/MINOR: quic: Possible crash with "tls-ticket-keys" on QUIC bind lines
ssl_tlsext_ticket_key_cb() is called when "tls-ticket-keys" option is used on a
"bind" line. It needs to have an access to the TLS ticket keys which have been
stored into the listener bind_conf struct. The fix consists in nitializing the
<ref> variable (references to TLS secret keys) the correct way when this callback
is called for a QUIC connection. The bind_conf struct is store into the quic_conn
object (QUIC connection).
This issue may be in relation with GH #1851. Thank you for @tasavis for the report.
BUG/MINOR: quic: Retransmitted frames marked as acknowledged
Obviously, frames which are duplicated from others must not be retransmitted if
the original frame they were duplicated from was already acknowledged.
This should have been detected by qc_build_frms() which skips such frames,
except if the QUIC xprt does really bad things which are not supported by
the upper layer. This will have to be checked with Amaury.
To prevent the retransmision of these frames which leads to crashes as reported by
hpn0t0ad this gdb backtrace in GH #1809 where the frame builder tries to copy a huge
number of bytes to the packet buffer:
Thread 7 (Thread 0x7fddf373a700 (LWP 13)):
#0 __memmove_sse2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:520
No locals.
#1 0x000055b17435705e in quic_build_stream_frame (buf=0x7fddf372ef78, end=<optimized out>, frm=0x7fdde08d3470, conn=<optimized out>) at src/quic_frame.c:515
to_copy = 18446697703428890384
stream = 0x7fdde08d3490
wrap = <optimized out>
Note that fc_rtt and fc_rttvar are supported on any OS that has TCP_INFO,
not just linux/freebsd/netbsd, so we continue to expose them unconditionally.
Released version 2.7-dev5 with the following main changes :
- BUG/MINOR: mux-quic: Fix memleak on QUIC stream buffer for unacknowledged data
- BUG/MEDIUM: cpu-map: fix thread 1's affinity affecting all threads
- MINOR: cpu-map: remove obsolete diag warning about combined ranges
- BUG/MAJOR: mworker: fix infinite loop on master with no proxies.
- REGTESTS: launch http_reuse_always in mworker mode
- BUG/MINOR: quix: Memleak for non in flight TX packets
- BUG/MINOR: quic: Wrong list_for_each_entry() use when building packets from qc_do_build_pkt()
- BUG/MINOR: quic: Safer QUIC frame builders
- MINOR: quic: Replace MT_LISTs by LISTs for RX packets.
- BUG/MEDIUM: applet: fix incorrect check for abnormal return condition from handler
- BUG/MINOR: applet: make the call_rate only count the no-progress calls
- MEDIUM: peers: limit the number of updates sent at once
- BUILD: tcp_sample: fix build of get_tcp_info() on OpenBSD
- BUG/MINOR: resolvers: return the correct value in resolvers_finalize_config()
- BUG/MINOR: mworker: does not create the "default" resolvers in wait mode
- BUG/MINOR: tcpcheck: Disable QUICKACK only if data should be sent after connect
- REGTESTS: Fix prometheus script to perform HTTP health-checks
- MINOR: resolvers: shut the warning when "default" resolvers is implicit
- Revert "BUG/MINOR: quix: Memleak for non in flight TX packets"
- BUG/MINOR: quic: Leak in qc_release_lost_pkts() for non in flight TX packets
- BUG/MINOR: quic: Stalled connections (missing I/O handler wakeup)
- CLEANUP: quic: No more use ->rx_list MT_LIST entry point (quic_rx_packet)
- CLEANUP: quic: Remove a useless check in qc_lstnr_pkt_rcv()
- MINOR: quic: Remove useless traces about references to TX packets
- Revert "MINOR: quic: Remove useless traces about references to TX packets"
- DOC: configuration: do-resolve doesn't work with a port in the string
- MINOR: sample: add the host_only and port_only converters
- BUG/MINOR: httpclient: fix resolution with port
- DOC: configuration.txt: do-resolve must use host_only to remove its port.
- BUG/MINOR: quic: Null packet dereferencing from qc_dup_pkt_frms() trace
- BUG/MINOR: quic: Frames added to packets even if not built.
- BUG/MEDIUM: spoe: Properly update streams waiting for a ACK in async mode
- BUG/MEDIUM: peers: Add connect and server timeut to peers proxy
- BUG/MEDIUM: peers: Don't use resync timer when local resync is in progress
- BUG/MEDIUM: peers: Don't start resync on reload if local peer is not up-to-date
- BUG/MINOR: hlua: Rely on CF_EOI to detect end of message in HTTP applets
- BUG/MEDIUM: mux-h1: do not refrain from signaling errors after end of input
- BUG/MINOR: epoll: do not actively poll for Rx after an error
- MINOR: raw-sock: don't try to send if an error was already reported
- BUG/MINOR: quic: Missing header protection AES cipher context initialisations (draft-v2)
- MINOR: quic: Add a trace to distinguish the datagram from the packets inside
- BUG/MINOR: ssl: fix deinit of the ca-file tree
- BUG/MINOR: ssl: leak of ckch_inst_link in ckch_inst_free()
- BUG/MINOR: tcpcheck: Disable QUICKACK for default tcp-check (with no rule)
- BUG/MEDIUM: ssl: Fix a UAF when old ckch instances are released
- BUG/MINOR: ssl: revert two wrong fixes with ckhi_link
- BUG/MINOR: dev/udp: properly preset the rx address size
- BUILD: debug: make sure debug macros are never empty
- MINOR: quic: Move traces about RX/TX bytes from QUIC_EV_CONN_PRSAFRM event
- BUG/MINOR: quic: TX frames memleak
- BUG/MINOR: ssl: leak of ckch_inst_link in ckch_inst_free() v2
- MINOR: sink/ring: rotate non-empty file-backed contents only
- BUG/MINOR: regex: Properly handle PCRE2 lib compiled without JIT support
- REGTESTS: http_request_buffer: Add a barrier to not mix up log messages
- BUG/MEDIUM: mux-h1: always use RST to kill idle connections in pools
- MINOR: backend: always satisfy the first req reuse rule with l7 retries
- BUG/MINOR: quic: Do not ack when probing
- MINOR: quic: Add TX frames addresses to traces to several trace events
- MINOR: quic: Trace typo fix in qc_release_frm()
- BUG/MINOR: quic: Frames leak during retransmissions
- BUG/MINOR: h2: properly set the direction flag on HTX response
- BUG/MEDIUM: httpclient: always detach the caller before self-killing
- BUG/MINOR: httpclient: only ask for more room on failed writes
- BUG/MINOR: httpclient: keep-alive was accidentely disabled
- MEDIUM: httpclient: enable ALPN support on outgoing https connections
- BUG/MINOR: mux-h2: fix the "show fd" dest buffer for the subscriber
- BUG/MINOR: mux-h1: fix the "show fd" dest buffer for the subscriber
- BUG/MINOR: mux-fcgi: fix the "show fd" dest buffer for the subscriber
- DEBUG: stream: minor rearrangement of a few fields in struct stream.
- MINOR: debug: report applet pointer and handler in crashes when known
- MINOR: mux-h2: extract the stream dump function out of h2_show_fd()
- MINOR: mux-h2: extract the connection dump function out of h2_show_fd()
- MINOR: muxes: add a "show_sd" helper to complete "show sess" dumps
- MINOR: mux-h2: provide a "show_sd" helper to output stream debugging info
- MINOR: mux-h2: insert line breaks in "show sess all" output for legibility
- MINOR: mux-quic: provide a "show_sd" helper to output stream debugging info
- MINOR: mux-h1: split "show_fd" into connection and stream
- MINOR: mux-h1: provide a "show_sd" helper to output stream debugging info
- BUG/MINOR: http-act: initialize http fmt head earlier
BUG/MINOR: http-act: initialize http fmt head earlier
In github issue #1850, Christian Ruppert reported a case of crash in
2.6 when failing to parse some http rules. This started to happen
with 2.6 commit dd7e6c6 ("BUG/MINOR: http-rules: completely free
incorrect TCP rules on error") but has some of its roots in 2.2
commit 2eb539687 ("MINOR: http-rules: Add release functions for
existing HTTP actions").
The cause is that when the release function is set for HTTP actions,
the rule->arg.http.fmt list head is not yet initialized, hence is
NULL, thus the release function crashes when it tries to iterate over
it. In fact this code was initially not written with the perspective
of releasing such elements upon error, so the arg list initialization
happened after error checking.
This patch just moves the list initialization just after setting the
release pointer and that's OK.
This patch must be backported to 2.6 since the problem is visible
there. It could be backported to 2.5 but the issue is not triggered
there without the first mentioned patch above that landed in 2.6, so
it will not bring any obvious benefit.
MINOR: mux-h1: split "show_fd" into connection and stream
We now have two functions, one for dumping connections and the other
one for dumping the streams. This will permit to use it from show_sd.
A few optional line breaks were inserted where relevant to keep lines
homogenous when a prefix is passed.
MINOR: mux-h2: insert line breaks in "show sess all" output for legibility
h2s and h2c were extremely long in the "show sess all" output, around 300
chars each. This adds a few line breaks to improve legibility, there are
now 3 lines for each, which are around the same length as the other ones
while keeping a natural arrangement. E.g (lines highlighted with '>'):
MINOR: mux-h2: provide a "show_sd" helper to output stream debugging info
With this, it now becomes possible to see the state of each H2 stream from
"show sess all". Lines are still too long and need to be split, but that's
for another patch.
MINOR: muxes: add a "show_sd" helper to complete "show sess" dumps
This helper will be called for muxes that provide it and will be used
to let the mux provide extra information about the stream attached to
a stream descriptor. A line prefix is passed in argument so that the
mux is free to break long lines without breaking indent. No prefix
means no line breaks should be produced (e.g. for short dumps).
MINOR: mux-h2: extract the stream dump function out of h2_show_fd()
The function will be reusable to dump streams, so let's extract it.
Note that due to "last_h2s" being originally printed as a prefix for
the stream dump, now the pointer is displayed by the caller instead.
MINOR: debug: report applet pointer and handler in crashes when known
When an appctx is found looping over itself, we report a number of info
but not the pointers to the definition nor the handler, which can be quite
handy in some cases. Let's add them and try to decode the symbol.
DEBUG: stream: minor rearrangement of a few fields in struct stream.
Some recent traces started to show confusing stream pointers ending with
0xe. The reason was that the stream's obj_type was almost unused in the
past and was stuffed in a hole in the structure. But now it's present in
all "show sess all" outputs and having to mentally match this value against
another one that's 0x17e lower is painful. The solution here is to move the
obj_type at the top, like in almost every other structure, but without
breaking the efficient layout.
This patch moves a few fields around and manages to both plug some holes
(16 bytes saved, 976 to 960) and avoid channels needlessly crossing cache
boundaries (res was spread over 3 lines vs 2 now).
Nothing else was changed. It would be desirable to backport this to 2.6
since it's where dumps are currently being processed the most.
BUG/MINOR: mux-fcgi: fix the "show fd" dest buffer for the subscriber
Commit 1776ffb97 ("MINOR: mux-fcgi: make the "show fd" helper also decode
the fstrm subscriber when known") improved the output of "show fd" for the
FCGI mux, but the output is sent to the trash buffer instead of the msg
argument. It turns out that this has no effect right now as the caller
passes the trash but this is risky.
BUG/MINOR: mux-h1: fix the "show fd" dest buffer for the subscriber
Commit 150c4f8b7 ("MINOR: mux-h1: make the "show fd" helper also decode
the h1s subscriber when known") improved the output of "show fd" for the
H1 mux, but the output is sent to the trash buffer instead of the msg
argument. It turns out that this has no effect right now as the caller
passes the trash but this is risky.
BUG/MINOR: mux-h2: fix the "show fd" dest buffer for the subscriber
Commit 98e40b981 ("MINOR: mux-h2: make the "show fd" helper also decode
the h2s subscriber when known") improved the output of "show fd" for the
H2 mux, but the output is sent to the trash buffer instead of the msg
argument. It turns out that this has no effect right now as the caller
passes the trash but this is risky.
MEDIUM: httpclient: enable ALPN support on outgoing https connections
Since everything is available for this, let's enable ALPN with the
usual "h2,http/1.1" on the https server. This will allow HTTPS requests
to use HTTP/2 when available.
It may be needed to permit to disable this (or to set the string) in
case some client code explicitly checks for the "HTTP/1.1" string, but
since httpclient is quite young it's unlikely that such code already
exists.
Willy Tarreau [Fri, 19 Aug 2022 15:20:43 +0000 (17:20 +0200)]
BUG/MINOR: httpclient: keep-alive was accidentely disabled
The servers were not set with default settings, meaning that a few
settings including the pool_max_delay were not set, thus disabling
connection pools, which is the cause of the fact that keep-alive was
disabled as reported in issue #1831. There might possibly be other
issues pending since all these fields were left to zero.
Note that this patch alone will not fix keep-alive because the applet
does not enforce SE_FL_NOT_FIRST and relies on the default http-reuse
safe, thus if servers are not shared, all requests are considered
first ones and do not reuse existing connections.
In 2.7, commit ecb40b2c3 ("MINOR: backend: always satisfy the first
req reuse rule with l7 retries") addressed this in a more elegant way
by fixing http-reuse to take into account the fact that properly
configured l7 retries provide exactly the capability that reuse safe
was trying to cover, and this patch is suitable for backporting.
BUG/MINOR: httpclient: only ask for more room on failed writes
There's a tiny issue in the I/O handler by which both a failed request
emission and missing response data will want to subscribe for more room
on output. That's not correct in that only the case where the request
buffer is full should cause this, the other one should just wait for
incoming data. This could theoretically cause spurious wakeups at
certain key points (e.g. connect() time maybe) though this could not
be reproduced but better fix this while it's easy enough.
It doesn't seem necessary to backport it right now, though this may
have to in case a concrete reproducible case is discovered.
BUG/MEDIUM: httpclient: always detach the caller before self-killing
If the caller dies before the server responds, the httpclient can crash
in hc_cli_res_end_cb() when unregistering because it dereferences
hc->caller which was already freed during the caller's unregistration.
The easiest way to reproduce it is by sending twice the following
request on the same CLI connection in expert mode, with httpterm
running on local port 8000:
httpclient GET http://127.0.0.1:8000/?t=600
Note the 600ms delay that's larger than socat's default 500.
The code checks for a NULL everywhere hc->caller is used, but the NULL
was forgotten in this specific case. It must be placed in the second
half of httpclient_stop_and_destroy() which is responsible for signaling
the client that the caller leaves.
BUG/MINOR: h2: properly set the direction flag on HTX response
In 1.9-dev, a new flag was introduced on the start line with commit f1ba18d7b ("MEDIUM: htx: Don't rely on h1_sl anymore except during H1
header parsing") to designate a response message: HTX_SL_F_IS_RESP.
Unfortunately as it was done in parallel to the mux_h2 support for
the backend, it was never integrated there. It was not used by then
so this remained unnoticed for a while.
However the http_client now uses it, and missing that flag prevents
it from using the H2 mux, so let's properly add it.
There's no point in backporting this far away, but since the http_client
is fully operational in 2.6 it would make sense to backport this fix at
least there to secure the code.
BUG/MINOR: quic: Frames leak during retransmissions
The frame which are retransmitted by qc_dgrams_retransmit() are duplicated
from sent but not acknowledged packets and added to local frames lists.
Some may not have been sent. If not replaced somewhere (linked to the
connection) they are lost for ever (leak). We splice the list remaining
contents to the packets number space frame list to avoid such a situation.
<force_ack> boolean variable passed to qc_do_build_pkt() which builds a clear
packet is there to force this function to build an ACK frame regardless of
others conditions. This is used during handshake, when we acknowledge every
handshake packets received.
This variable was already taken into an account by the local variable <must_ack>
which is there at least to ignore any other conditions than this one: "are
we building a probing packet?". Indeed we do not want to add ACK frames when
we probe the peers. This is to have more chances to embed the new duplicated frames
into another packets without splitting them. So, the test on <force_ack> boolean
value is useless, silly and brakes the rule which consists in not acknowledging
when probing.
MINOR: backend: always satisfy the first req reuse rule with l7 retries
The "first req" rule consists in not delivering a connection's first
request to a connection that's not known for being safe so that we
don't deliver a broken page to a client if the server didn't intend to
keep it alive. That's what's used by "http-reuse safe" particularly.
But the reason this rule was created was precisely because haproxy was
not able to re-emit the request to the server in case of connection
breakage, which is precisely what l7 retries later brought. As such,
there's no reason for enforcing this rule when l7 retries are properly
enabled because such a blank page will trigger a retry and will not be
delivered to the client.
This patch simply checks that the l7 retries are enabled for the 3 cases
that can be triggered on a dead or dying connection (failure, empty, and
timeout), and if all 3 are enabled, then regular idle connections can be
reused.
This could almost be marked as a bug fix because a lot of users relying
on l7 retries do not necessarily think about using http-reuse always due
to the recommendation against it in the doc, while the protection that
the safe mode offers is never used in that mode, and it forces the http
client not to reuse existing persistent connections since it never sets
the "not first" flag.
It could also be decided that the protection is not used either when
the origin is an applet, as in this case this is internal code that
we can decide to let handle the retry by itself (all info are still
present). But at least the httpclient will be happy with this alone.
It would make sense to backport this at least to 2.6 in order to let
the httpclient reuse connections, maybe to older releases if some
users report low reuse counts.
BUG/MEDIUM: mux-h1: always use RST to kill idle connections in pools
When idle H1 connections cannot be stored into a server pool or are later
evicted, they're often seen closed with a FIN then an RST. The problem is
that this is sufficient to leave them in TIME_WAIT in the local sockets
table and port exhaustion may happen.
The reason is that in h1_release() we rely on h1_shutw_conn() which itself
decides whether to close in silent or normal mode only based on the
H1C_F_ST_SILENT_SHUT flag. This flag is only set by h1_shutw() based on
the requested mode. But when the connection is in the idle list, the mode
ought to always be silent.
What this patch does is to set the flag before trying to add to the idle
list, and remove it after removing from the idle list. This way if the
connection fails to be added or has to be killed, it's closed with an
RST.
This must be backported as far as 2.4. It's not sure whether older
versions need an equivalent.
REGTESTS: http_request_buffer: Add a barrier to not mix up log messages
Depending on the timing, time to time, the log messages can be mixed. A
client can start and be fully handled by HAProxy (including its log message)
before the log message of the previous client was emitted or received. To
fix the issue, a barrier was added to be sure to eval the "expect" rule on
logs before starting the next client.
This patch should fix the issue #1847. It may be backported to all branches
containing this reg-tests.
BUG/MINOR: regex: Properly handle PCRE2 lib compiled without JIT support
The PCRE2 JIT support is buggy. If HAProxy is compiled with USE_PCRE2_JIT
option while the PCRE2 library is compiled without the JIT support, any
matching will fail because pcre2_jit_compile() return value is not properly
handled. We must fall back on pcre2_match() if PCRE2_ERROR_JIT_BADOPTION
error is returned.
This patch should fix the issue #1848. It must be backported as far as 2.4.
Willy Tarreau [Wed, 31 Aug 2022 16:52:17 +0000 (18:52 +0200)]
MINOR: sink/ring: rotate non-empty file-backed contents only
If the service is rechecked before a reload, that may cause the config
to be parsed twice and file-backed rings to be lost.
Here we make sure that such a ring does contain information before
deciding to rotate it. This way the first process starting after some
writes will cause a rotate but not subsequent ones until new writes
are applied.
An attempt was also made to disable rotations on checks but this was a
bad idea, as the ring is still initialized and this causes the contents
to be lost. The choice of initializing the ring during parsing is
questionable but the config check ought to be as close as possible to a
real start, and we could imagine that the ring is used by some code
during startup (e.g. lua). So this approach was abandonned and config
checks also cause a rotation, as the purpose of this rotation is to
preserve latest information against accidental removal.
BUG/MINOR: ssl: leak of ckch_inst_link in ckch_inst_free() v2
ckch_inst_free() unlink the ckch_inst_link structure but never free it.
It can't be fixed simply because cli_io_handler_commit_cafile_crlfile()
is using a cafile_entry list to iterate a list of ckch_inst entries
to free. So both cli_io_handler_commit_cafile_crlfile() and
ckch_inst_free() would modify the list at the same time.
In order to let the caller manipulate the ckch_inst_link,
ckch_inst_free() now checks if the element is still attached before
trying to detach and free it.
For this trick to work, the caller need to do a LIST_DEL_INIT() during
the iteration over the ckch_inst_link.
list_for_each_entry was also replace by a while (!LIST_ISEMPTY()) on the
head list in cli_io_handler_commit_cafile_crlfile() so the iteration
works correctly, because it could have been stuck on the first detached
element. list_for_each_entry_safe() is not enough to fix the issue since
multiple element could have been removed.
MINOR: quic: Move traces about RX/TX bytes from QUIC_EV_CONN_PRSAFRM event
Move these traces to QUIC_EV_CONN_SPPKTS trace event. They were displayed
at a useless location. Make them displayed just after having sent a packet
and when checking the anti-amplication limit.
Useful to diagnose issues in relation with the recovery.
Willy Tarreau [Wed, 31 Aug 2022 08:52:25 +0000 (10:52 +0200)]
BUILD: debug: make sure debug macros are never empty
As outlined in commit f7ebe584d7 ("BUILD: debug: Add braces to if
statement calling only CHECK_IF()"), the BUG_ON() family of macros
is incorrectly defined to be empty when debugging is disabled, and
that can lead to trouble. Make sure they always fall back to the
usual "do { } while (0)". This may be backported to 2.6 if needed,
though no such issue was met there to date.
The architecture is ambiguous here: ckch_inst_free() is detaching and
freeing the "ckch_inst_link" linked list which must be free'd only from
the cafile_entry side.
The problem was also hidden by the fix ddd480c ("BUG/MEDIUM: ssl: Fix a
UAF when old ckch instances are released") which change the ckchi_link
inner loop by a safe one. However this can't fix entirely the problem
since both __ckch_inst_free_locked() could remove several nodes in the
ckchi_link linked list.
This revert is voluntary reintroducing a memory leak before really fixing
the problem.
BUG/MEDIUM: ssl: Fix a UAF when old ckch instances are released
When old chck instances is released at the end of "commit ssl ca-file" or
"commit ssl crl-file" commands, the link is released. But we walk through
the list using the unsafe macro. list_for_each_entry_safe() must be used.
This bug was introduced by commit 056ad01d5 ("BUG/MINOR: ssl: leak of
ckch_inst_link in ckch_inst_free()"). Thus this patch must be backported as
far as 2.5.
BUG/MINOR: tcpcheck: Disable QUICKACK for default tcp-check (with no rule)
The commit 871dd8211 ("BUG/MINOR: tcpcheck: Disable QUICKACK only if data
should be sent after connect") introduced a regression. It removes the test
on the next rule to be able to disable TCP_QUICKACK when only a connect is
performed (so no next rule).
This bug arrived with this commit:
"MINOR: quic: Add reusable cipher contexts for header protection"
haproxy could crash because of missing cipher contexts initializations for
the header protection and draft-v2 Initial secrets. This was due to the fact
that these initialization both for RX and TX secrets were done outside of
qc_new_isecs(). The role of this function is definitively to initialize these
cipher contexts in addition to the derived secrets. Indeed this function is called
by qc_new_conn() which initializes the connection but also by qc_conn_finalize()
which also calls qc_new_isecs() in case of a different QUIC version was negotiated
by the peers from the one used by the client for its first Initial packet.
This was reported by "v2" QUIC interop test with at least picoquic as client.
Willy Tarreau [Mon, 29 Aug 2022 14:48:14 +0000 (16:48 +0200)]
MINOR: raw-sock: don't try to send if an error was already reported
There's no point trying to send() on a socket on which an error was already
reported. This wastes syscalls. Till now it was possible to occasionally
see an attempt to sendto() after epoll_wait() had reported EPOLLERR.
Willy Tarreau [Mon, 29 Aug 2022 16:15:11 +0000 (18:15 +0200)]
BUG/MINOR: epoll: do not actively poll for Rx after an error
In 2.2, commit 5d7dcc2a8 ("OPTIM: epoll: always poll for recv if neither
active nor ready") was added to compensate for the fact that our iocbs
are almost always asynchronous now and do not have the opportunity to
update the FD correctly. As such, they just perform a wakeup, the FD is
turned to inactive, the tasklet wakes up, performs the I/O, updates the
FD, most of the time this is done withing the same polling loop, and the
update cancels itself in the poller without having to switch the FD off
then on.
The issue was that when deciding to claim an FD was active for reads
if it was active for writes, we forgot one situation that unfortunately
causes excessive wakeups: dealing with errors. Indeed, errors are
reported and keep ringing as long as the FD is active for sending even
if the consumer disabled the FD for receiving. Usually this only causes
one extra wakeup for the time it takes to consider a potential write
subscriber and to call it, though with many tasks in a run queue, it
can last a bit longer and be reported more often.
The fix consists in checking that we really want to get more receive
events on this FD, that is:
- that no prevous EPOLLERR was reported
- that the FD doesn't carry a sticky error
- that the FD is not shut for reads
With this, after the last epoll_wait() reports EPOLLERR, one last recv()
is performed to flush pending data and the FD is immediately unregistered.
It's probably not needed to backport this as its effects are not much
visible, though it should not harm.
Willy Tarreau [Mon, 29 Aug 2022 08:22:56 +0000 (10:22 +0200)]
BUG/MEDIUM: mux-h1: do not refrain from signaling errors after end of input
In 2.6-dev4, a fix for truncated response was brought with commit 99bbdbcc2
("BUG/MEDIUM: mux-h1: only turn CO_FL_ERROR to CS_FL_ERROR with empty ibuf"),
trying to address the situation where an error is present at the connection
level but some data are still pending to be read by the stream. However,
this patch did not consider the case where the stream was no longer willing
to read the pending data, resulting in a situation where some aborted
transfers could lead to excessive CPU usage by causing constant stream
wakeups for which no error was reported. This perfectly matches what was
observed and reported in github issue #1842. It's not trivial to reproduce,
but aborting HTTP/1 pipelining in the middle of transfers seems to give
good results (using h2load and Ctrl-C in the middle).
The fix was incorrct as the error should be held only if there were data
that the stream was able to read. This is the approach taken by this patch,
which also checks via SE_FL_EOI | SE_FL_EOS that the stream will be able
to consume the pending data.
Note that the loop was provoked by the attempt by sc_conn_io_cb() itself
to call sc_conn_send() which resulted in a write subscription in
h1_subscribe() which immediately calls a tasklet_wakeup() since the
event is ready, and that it is now stopped by the presence of SE_FL_ERROR
that is checked in sc_conn_io_cb(). It seems that an extra check down the
send() path to refrain from subscribing when the connection is in error
could speed up error detection or at least avoid a risk of loops in this
case, but this is tricky. In addition, there's already SE_FL_ERR_PENDING
that seems more suitable for reporting when there are pending data, but
similarly, it probably isn't checked well enough to be suitable for
backports.
FWIW the issue may (unreliably) be reproduced by chaining haproxy to
httpterm and issuing:
It's necessary to play with the size of the head command that's supposed
to trigger the error at some point. A variant involving h2load in h1 mode
and multiple pipelined streams, that is stopped with Ctrl-C also tends to
work.
As the fix above was backported as far as 2.0, it would be tempting to
backport this one as far. However tests have shown that the oldest
version that can trigger this issue is 2.5, maybe due to subtle
differences in older ones, so it's probably not worth going further
until an issue is reported. Note that in 2.5 and older, the SE_FL_*
flags are applied on the conn_stream instead, as CS_FL_*.
Special thanks go to Felipe W Damasio for providing lots of detailed data
allowing to quickly spot the root cause of the problem.
BUG/MINOR: hlua: Rely on CF_EOI to detect end of message in HTTP applets
applet:getline() and applet:receive() functions for HTTP applets must rely
on the channel flags to detect the end of the message and not on HTX
flags. It means CF_EOI must be used instead of HTX_FL_EOM.
It is important because the HTX flag is transient. Because there is no flag
on HTTP applets to save the info, it is not reliable. However CF_EOI once
set is never removed. So it is safer to rely on it. Otherwise, the call to
these functions hang.
BUG/MEDIUM: peers: Don't start resync on reload if local peer is not up-to-date
On a reload, if the previous resync was not finished, the freshly old worker
must not try to start a new resync. Otherwise, it will compete with the
older wokers, slowing down or blocking the resync. Only an up-to-date woker
must try to perform a local resync.
This patch must be backported as far as 2.0 (and maybe to 1.8 too).
BUG/MEDIUM: peers: Don't use resync timer when local resync is in progress
When a worker is stopped, the resync timer is used to limit in time the
connection stage to the new worker to perform the local resync. However,
this timer must be stopped when the resync is in progress and it must be
re-armed if the resync is interrupted (for instance because another
reload). Otherwise, if the resync is a bit long, an old worker may be killed
too early.
This bug was introduce by the commit 160fff665 ("BUG/MEDIUM: peers: limit
reconnect attempts of the old process on reload"). It must be backported as
far as 2.0.
BUG/MEDIUM: peers: Add connect and server timeut to peers proxy
Only the client timeout was set. Nothing prevent a peer applet to stall
during a connect or waiting a message from a remote peer. To avoid any
issue, it is important to also set connection and server timeouts. The
connect timeout is set to 1s and the server timeout is set to 5s.
This patch must be backported to all supported versions.
BUG/MEDIUM: spoe: Properly update streams waiting for a ACK in async mode
A bug was introduced by the commit b042e4f6f ("BUG/MAJOR: spoe: properly
detach all agents when releasing the applet"). The fix is not correct. We
really want to known if the released appctx is the last one or not. It is
important when async mode is used. If there are still running applets, we
just need to remove the reference on the current applet from streams in the
async waiting queue.
With the commit above, in async mode, if there are still running applets, it
will work as expected. Otherwise a processing timeout will be reported for
all these streams. So it is not too bad. But for other modes (sync and
pipelining), the async waiting queue is always empty. If at least one stream
is waiting to send a message, a new applet is created. It is an issue if the
SPOA is unhealthy because the number of running applets may explode.
However, the commit above tried to fix an issue. The bug is in fact when an
new SPOE applet is created. On success, we must remove reference on the
current appctx from the streams in the async waiting queue.
BUG/MINOR: quic: Frames added to packets even if not built.
Several frames could remain as not build into <frm_list> built by qc_build_frms()
after having stopped at the first building error. So only one frame was reinserted in
the frame list passed as parameter to qc_do_build_pkt(). Then <frm_list> was
spliced to the packet frame list even its frames were not built, nor attached to
any packet. Such frames had their ->pkt member set to NULL, but considered as
built, then sent leading to a crash in qc_release_frm() where ->pkt is dereferenced.
This issue was again reported by useful traces provided by Tristan in GH #1808.
BUG/MINOR: quic: Null packet dereferencing from qc_dup_pkt_frms() trace
This function must duplicate frames be resent from packets. Some of
them are still in flight, others have already been detected as lost.
In this case the original frame ->pkt member is NULL.
Add a trace to distinguish these cases.
Thank you to Tristan for having reported this issue in GH #1808.
Revert "MINOR: quic: Remove useless traces about references to TX packets"
This reverts commit f61398a7caa35d780639a5961d2b1ea427f710b6.
After having checked a version with more traces and reproduced the issue
as reported by Tristan in GH #1808, there are remaining cases where
a duplicated but not already sent frame have to be marked as acked because
the frame it was copied from was acknowledeged before its copied was sent.