Willy Tarreau [Tue, 3 May 2022 15:08:29 +0000 (17:08 +0200)]
CLEANUP: stats/cli: take the "show stat" context definition out of the appctx
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing (both in the CLI and HTTP).
The change looks large but it's particularly mechanical. The context
initialization appears in stats.c and http_ana.c. The context is used
in stats.c and resolvers.c since "show stat resolvers" points there.
That's the reason why the definition moved to stats.h. "show info"
and "show stat" continue to share the same state definition for now.
Willy Tarreau [Tue, 3 May 2022 16:05:23 +0000 (18:05 +0200)]
CLEANUP: promex: stop using appctx->st2
Now that we have out own context, there's no need to use the cryptic st2
struct member. It's solely used to carry a field number here, so let's
add this into the context.
Willy Tarreau [Tue, 3 May 2022 15:33:00 +0000 (17:33 +0200)]
CLEANUP: promex: make the applet use its own context
The prometheus applet used to map to the stats context since it was not
convenient to have one's own context, and to reuse the fields with its
own values and enums. The obj1 pointer was used both for proxies and
stick-tables; obj2 was used both for servers and listeners.
This change makes use of the generic command context allocation so that
the there's now a properly typed context for prometheus, defined in the
code itself and independent on the stats or appctx ones. For clarity,
the types are correctly set and there's one proxy, one table, one server
and one listener. Some could be compacted using unions but that's not
necessary since the context is reasonably compact. The stats' st_code
field was used as the object state so the new field name is obj_state.
An attempt was made to change the types to const for what us only visited
but some calls pass through the stats code to retrieve info and that code
uses non-const variables due to internal API limitations (read_freq_ctr()
being used and requiring variable). That could change in the future,
though.
Willy Tarreau [Tue, 3 May 2022 15:02:03 +0000 (17:02 +0200)]
CLEANUP: cli: initialize the whole appctx->ctx, not just the stats part
Historically the CLI was a second access to the stats and we've continued
to initialize only the stats part when initializing the CLI. Let's make
sure we do that on the whole ctx instead. It's probably not more needed
at all nowadays but better stay on the safe side.
Willy Tarreau [Tue, 3 May 2022 12:58:47 +0000 (14:58 +0200)]
CLEANUP: peers/cli: stop using appctx->st2 for the dump state
Let's instead define a 4-state enum solely for this use case, and place
it into the command's context. Note that END and FIN were already
aliases, which is why they were merged.
Willy Tarreau [Tue, 3 May 2022 12:26:31 +0000 (14:26 +0200)]
CLEANUP: peers/cli: take the "show peers" context definition out of the appctx
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. The code also uses st2 which deserves being
addressed in separate commit.
Willy Tarreau [Tue, 3 May 2022 13:42:07 +0000 (15:42 +0200)]
CLEANUP: map/cli: always detach the backref from the list after "show map"
There's no point checking the state before deciding to detach the backref
on "show map", it should always be done if the list is not empty. Note
that being empty guarantees that it's not linked into the list, and
conversely not being empty guarantees that it's in the list, hence the
test doesn't need to be performed under the lock.
Willy Tarreau [Tue, 3 May 2022 12:12:56 +0000 (14:12 +0200)]
CLEANUP: map/cli: stop using cli.i0/i1 to store the generation numbers
The "show map"/"clear map" code used to rely on the cli's i0/i1 fields
to store the generation numbers to work with. That's particularly dirty
because it's done at places where ctx.map is also manipulated while they
are part of the same union, and the reason why this didn't cause trouble
is because cli.i0/i1 are at offset 216/224 while the map parts end at
204, so luckily there was no overlap.
Willy Tarreau [Tue, 3 May 2022 09:54:47 +0000 (11:54 +0200)]
CLEANUP: map/cli: take the "show map" context definition out of the appctx
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. Many commands, including pure parsers, use this
context but that's not a problem as it's designed to be used this way.
Due to this, many lines are changed but that's in fact a replacement of
"appctx->ctx.map" with "ctx->". Note that the code also uses st2 which
deserves being addressed in separate commit.
Willy Tarreau [Tue, 3 May 2022 10:02:36 +0000 (12:02 +0200)]
CLEANUP: stick-table/cli: remove the unneeded STATE_INIT for "show table"
This state is pointless now, it just serves to initialize the initial
table pointer while this can be done more easily in the parser, so let's
do that and drop that state.
Willy Tarreau [Tue, 3 May 2022 09:45:02 +0000 (11:45 +0200)]
CLEANUP: stick-table/cli: stop using appctx->st2 for the dump state
Let's instead define a 4-state enum solely for this use case, and place
it into the command's context. Note that END and FIN were already
aliases, which is why they were merged.
Willy Tarreau [Tue, 3 May 2022 09:35:07 +0000 (11:35 +0200)]
CLEANUP: stick-table/cli: take the "show table" context definition out of the appctx
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing. The code also uses st2 which deserves being
addressed in separate commit.
Willy Tarreau [Tue, 3 May 2022 09:24:24 +0000 (11:24 +0200)]
CLEANUP: proxy/cli: take the "show errors" context definition out of the appctx
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing.
The code still has room for improvement, such as in the "flags" field
where bits are hard-coded, but they weren't modified.
Willy Tarreau [Tue, 3 May 2022 09:17:35 +0000 (11:17 +0200)]
CLEANUP: stream/cli: remove the now unneeded dump state from "show sess"
The state was a constant, let's remove what remains of the switch/case.
The code from the "case" statement was only reindented as can be checked
with "git show -b".
Willy Tarreau [Tue, 3 May 2022 09:10:19 +0000 (11:10 +0200)]
CLEANUP: stream/cli: remove the unneeded STATE_FIN state from "show sess"
This state is only an alias for "thr >= global.nbthread" which is the
sole condition that indicates the end and reaches it. Let's just drop
this state. There's only the STATE_LIST that's left.
Willy Tarreau [Tue, 3 May 2022 09:05:39 +0000 (11:05 +0200)]
CLEANUP: stream/cli: remove the unneeded init state from "show sess"
This state was only used to preset the list element. Now that we can
guarantee that the context can be properly preset during the parsing
we don't need this state anymore. The first pointer has to be set to
point to the first stream during the initial call which is detected
by the pointer not yet being set (null). Thanks to this we can also
remove one state check on the abort path.
Willy Tarreau [Tue, 3 May 2022 08:49:00 +0000 (10:49 +0200)]
CLEANUP: stream/cli: take the "show sess" context definition out of the appctx
This makes use of the generic command context allocation so that the
appctx doesn't have to declare a specific one anymore. The context is
created during parsing.
Willy Tarreau [Thu, 5 May 2022 17:55:03 +0000 (19:55 +0200)]
CLEANUP: applet: make appctx_new() initialize the whole appctx
Till now, appctx_new() used to allocate an entry from the pool and
pass it through appctx_init() to initialize a debatable part of it that
didn't correspond anymore to the comments, and fill other fields. It's
hard to say what is fully initialized and what is not.
Let's get rid of that, and always zero the initialization (appctx are
not that big anyway, even with the cache there's no difference in
performance), and initialize what remains. this is cleaner and more
resistant to new field additions.
Willy Tarreau [Mon, 2 May 2022 12:57:03 +0000 (14:57 +0200)]
MINOR: applet: reserve some generic storage in the applet's context
Instead of using existing fields and having to put keyword-specific
contexts in the applet definition, let's have the appctx provide a
generic storage area that's currently large enough for existing CLI
commands and small applets, and a function to allocate that storage.
The function will be responsible for verifying that the requested size
fits in the area so that the caller doesn't need to add specific checks;
it is validated during development as this size is static and will
not change at runtime. In addition the caller doesn't even need to
free() the area since it's part of an existing context. For the
caller's convenience, a context pointer "svcctx" for the command is
also provided so that the allocated area can be placed there (or
possibly any other one in case a larger area is needed).
The struct's layout has been temporarily complicated by adding one
level of anonymous union on top of the "ctx" one. This will allow us
to preserve "ctx" during 2.6 for compatibility with possible external
code and get rid of it in 2.7. This explains why the diff extends to
the whole "ctx" union, but a "git show -b" shows that only one extra
layer was added. In order to make both the svcctx pointer and its
storage accessible without further enlarging the appctx structure,
both svcctx and the storage share the same storage as the ctx part.
This is done by having them placed in the union with a protected
overlapping area for svcctx, for which a shadow member is also
present in the storage area:
union {
void* svcctx; // variable accessed by services
struct {
void *shadow; // shadow of svcctx;
char storage[]; // where most services store their data
};
union { // older commands store here and ignore svcctx
...
} ctx;
};
I.e. new applications will use appctx->svcctx while older ones will be
able to continue to use appctx->ctx.*
The whole area (including the pointer's context) is zeroed before any
applet is initialized, and before CLI keyword processor's first invocation,
as it is an important part of the existing keyword processors, which makes
CLI keywords effectively behave like applets.
Willy Tarreau [Thu, 5 May 2022 11:50:46 +0000 (13:50 +0200)]
CLEANUP: ssl/cli: do not loop on unknown states in "add ssl crt-list" handler
The io_handler in "add ssl crt_list" is built around a "while" loop that
only makes forward progress and that doesn't handle its final state as
it's not supposed to be called again once reached. This makes the code
confusing because its construct implies an infinite loop for such a
state (or any other unhandled one). Let's just remove that unneeded loop.
Willy Tarreau [Wed, 4 May 2022 14:11:50 +0000 (16:11 +0200)]
BUG/MINOR: ssl/cli: fix "show ssl cert" not to mix cli+ssl contexts
The "show ssl cert" command mixes some generic pointers from the
"ctx.cli" struct with context-specific ones from "ctx.ssl" while both
are in a union. Amazingly, despite the use of both p0 and i0 to store
respectively a pointer to the current ckchs and a transaction id, there
was no overlap with the other pointers used during these operations,
but should these fields be reordered or slightly updated this will break.
Comments were added above the faulty functions to indicate which fields
they are using.
Willy Tarreau [Wed, 4 May 2022 14:01:24 +0000 (16:01 +0200)]
BUG/MINOR: ssl/cli: fix "show ssl crl-file" not to mix cli+ssl contexts
The "show ssl crl-file" command mixes some generic pointers from the
"ctx.cli" struct with context-specific ones from "ctx.ssl" while both
are in a union. It's fortunate that the p1 pointer in use is located
before the first one used (it overlaps with old_cafile_entry). But
should these fields be reordered or slightly updated this will break.
Willy Tarreau [Wed, 4 May 2022 13:57:30 +0000 (15:57 +0200)]
BUG/MINOR: ssl/cli: fix "show ssl ca-file <name>" not to mix cli+ssl contexts
The "show ssl ca-file <name>" command mixes some generic pointers from
the "ctx.cli" struct and context-specific ones from "ctx.ssl" while both
are in a union. The i0 integer used to store the current ca_index overlaps
with new_crlfile_entry which is thus harmless for now but is at the mercy
of any reordering or addition of these fields. Let's add dedicated fields
into the ssl structure for this.
Comments were added on top of the affected functions to indicate what they
use.
Willy Tarreau [Wed, 4 May 2022 13:47:39 +0000 (15:47 +0200)]
BUG/MINOR: ssl/cli: fix "show ssl ca-file/crl-file" not to mix cli+ssl contexts
The "show ca-file" and "show crl-file" commands mix some generic pointers
from the "ctx.cli" struct and context-specific ones from "ctx.ssl" while
both are in a union. It's fortunate that the p0 pointer in use is located
immediately before the first one used (it overlaps with next_ckchi_link,
and old_cafile_entry is safe). But should these fields be reordered or
slightly updated this will break.
Comments were added on top of the affected functions to indicate what they
use.
Willy Tarreau [Tue, 3 May 2022 13:26:27 +0000 (15:26 +0200)]
BUG/MINOR: map/cli: make sure patterns don't vanish under "show map"'s init
When "show map" initializes itself, it first takes the reference to the
starting point under a lock, then releases it before switching to state
STATE_LIST, and takes the lock again. The problem is that it is possible
for another thread to remove the first element during this unlock/lock
sequence, and make the list run anywhere. This is of course extremely
unlikely but not impossible.
Let's initialize the pointer in the STATE_LIST part under the same lock,
which is simpler and more reliable.
Willy Tarreau [Tue, 3 May 2022 13:19:49 +0000 (15:19 +0200)]
BUG/MINOR: map/cli: protect the backref list during "show map" errors
In case of write error in "show map", the backref is detached but
the list wasn't locked when this is done. The risk is very low but
it may happen that two concurrent "show map" one of which would fail
or one "show map" failing while the same entry is being updated could
cause a crash.
Willy Tarreau [Thu, 5 May 2022 15:10:03 +0000 (17:10 +0200)]
BUG/MINOR: proxy/cli: don't enumerate internal proxies on "show backend"
Commit e7f74623e ("MINOR: stats: don't output internal proxies (PR_CAP_INT)")
in 2.5 ensured we don't dump internal proxies on the stats page, but the
same is needed for "show backend", as since the addition of the HTTP client
it now appears there:
$ socat /tmp/sock1 - <<< "show backend"
# name
<HTTPCLIENT>
Willy Tarreau [Thu, 5 May 2022 16:29:25 +0000 (18:29 +0200)]
BUG/MEDIUM: cli: make "show cli sockets" really yield
This command was introduced in 1.8 with commit eceddf722 ("MEDIUM: cli:
'show cli sockets' list the CLI sockets") but its yielding doesn't work.
Each time it enters, it restarts from the last bind_conf but enumerates
all listening sockets again, thus it loops forever. The risk that it
happens in field is low but it easily triggers on port ranges after
400-500 sockets depending on the length of their addresses:
$ socat /tmp/sock1 - <<< "show cli sockets"
(...)
ipv4@192.168.8.176:30426 operator all
ipv4@192.168.8.176:30427 operator all
ipv4@192.168.8.176:30428 operator all
ipv4@192.168.8.176:30000 operator all
ipv4@192.168.8.176:30001 operator all
ipv4@192.168.8.176:30002 operator all
^C
This patch adds the minimally needed restart point for the listener so
that it can easily be backported. Some more cleanup is needed though.
Willy Tarreau [Thu, 5 May 2022 14:00:45 +0000 (16:00 +0200)]
BUG/MEDIUM: resolvers: make "show resolvers" properly yield
The "show resolvers" command is bogus, it tries to implement a yielding
mechanism except that if it yields it restarts from the beginning, until
it manages to fill the buffer with only line breaks, and faces error -2
that lets it reach the final state and exit.
The risk is low since it requires about 50 name servers to reach that
state, but it's not impossible, especially when using multiple sections.
In addition, the extraneous line breaks, if sent over an interactive
connection, will desynchronize the commands and make the client believe
the end was reached after the first nameserver. This cannot be fixed
separately because that would turn this bug into an infinite loop since
it's the line feed that manages to fill the buffer and stop it.
The fix consists in saving the current resolvers section into ctx.cli.p1
and the current nameserver into ctx.cli.p2.
This should be backported, but that code moved a lot since it was
introduced and has always been bogus. It looks like it has mostly
stabilized in 2.4 with commit c943799c86 so the fix might be backportable
to 2.4 without too much effort.
MEDIUM: resolvers: create a "default" resolvers section at startup
Try to create a "default" resolvers section at startup, but does not
display any error nor warning. This section is initialized using the
/etc/resolv.conf of the system.
This is opportunistic and with no guarantee that it will work (but it should
on most systems).
This is useful for the httpclient as it allows to use the DNS resolver
without any configuration in most of the cases.
The function is called from the httpclient_pre_check() function to
ensure than we tried to create the section before trying to initiate the
httpclient. But it is also called from the resolvers.c to ensure the
section is created when the httpclient init was disabled.
MINOR: resolvers: move the resolv.conf parser in parse_resolv_conf()
Move the resolv.conf parser from the cfg_parse_resolvers so it could be
used separately.
Some changes were made in the memprintf in order to use a char **
instead of a char *. Also the variable is tested before each memprintf
so could skip them if no warnmsg nor errmsg were set.
DOC: config: Update doc for PR/PH session states to warn about rewrite failures
When an HTTP header rewrite failure is triggered, and 500-internal-error
response is returned. A "PR" termination state is logged if the error
occurred on the request and "PH" if the error is reported for the response.
BUG/MEDIUM: mux-h1: Be able to handle trailers when C-L header was specified
The commit 2eb5243e7 ("BUG/MEDIUM: mux-h1: Set outgoing message to DONE when
payload length is reached") introduced a regression. An internal error is
reported when we try to forward a message with trailers while the
content-length header was specified. Indeed, this case does not exist for H1
messages but it is possible in H2.
This patch should solve the issue #1684. It must be backported as far as
2.4.
BUG/MEDIUM: mux-fcgi: Be sure to never set EOM flag on an empty HTX message
This bug was already fixed at many places (stats, promex, lua) but the FCGI
multiplexer is also affected. When there is no content-length specified in
the response and when the END_REQUEST record is delayed, the response may be
truncated because an abort is erroneously detected. If the connection is not
closed because "keep-conn" option is set, the response is aborted at the end
of the server timeout.
This bug is a design issue with the HTX. It should be addressed. But it will
probably not be possible to backport them as far as 2.4. So, for now, the
only solution is to explicitly add an EOT block with the EOM flag in this
case.
This patch should fix the issue #1682. It must be backported as far as 2.4.
BUG/MEDIUM: conn-stream: Only keep app layer flags of the endpoint on reset
The commit a6c4a4834 ("BUG/MEDIUM: conn-stream: Don't erase endpoint flags
on reset") was too laxy on reset. Only app layer flags must be preserved. On
reset, the endpoint is detached. Thus all flags set by the endpoint itself
or concerning its type must be removed.
Without this fix, we can experienced crashes when a stream is released while
a server connection attempt failed. Indeed, in this case, endpoint of the
backend conn-stream is reset. But the endpoint type is still set. Thus when
the stream is released, the endpoint is detached again.
This patch is 2.6-specific. No backport needed. This commit depends on the
previous one ("MINOR: conn-stream: Add mask from flags set by endpoint or
app layer").
MINOR: conn-stream: Add mask from flags set by endpoint or app layer
In flags set on the endpoints, some are set by endpoints itself and some are
set by the app layer. To help flags manipulations, 2 masks have been
added. The first one, CS_EP_ENDP_MASK, for all flags that an endpoint may
set. The other one, CS_EP_APP_MASK, for flags that the app layer may set.
MINOR: httpclient: allow ipv4 or ipv6 preference for resolving
The httpclient.resolvers.prefer global keyword allows to configure an
ipv4 or ipv6 preference when resolving. This could be useful in
environment where the ipv6 is not supported.
MINOR: httpclient: configure the resolvers section to use
By default the httpclient uses the resolvers section whose ID is
"default", the httpclient.resolvers.id global option allows to configure
another section to use.
MINOR: httpclient: cleanup the error handling in init
Cleanup the error handling in the initialization so we rely on the
ERR_CODE and use memprintf() to set the errmsg before printing it at the
end of the functions.
MINOR: httpclient: handle unix and other socket types in dst
httpclient_set_dst() allows one to set the destination address instead
of using the one in the URL or resolving one from the host.
This function also support other types of socket like sockpair@, unix@,
anything that could be used on a server line.
In order to still support this behavior, the address must be set on the
backend in this particular case because the frontend connection does not
support anything other than ipv4 or ipv6.
BUG/MINOR: mux_quic: Dropped packet upon retransmission for closed streams
We rely on the largest ID which was used to open streams to know if the
stream we received STREAM frames for is closed or not. If closed, we return the
same status as the one for a STREAM frame which was a already received one for
on open stream.
It is possible that we continue to receive retransmitted STREAM frames after
the mux have been released. We rely on the ->rx.streams[].nb_streams counter
to check the stream was closed. If not, at this time we drop the packet.
MINOR: quic: Make the quic_conn be aware of the number of streams
This is required when the retransmitted frame types when the mux is released.
We add a counter for the number of streams which were opened or closed by the mux.
After the mux has been released, we can rely on this counter to know if the STREAM
frames are retransmitted ones or not.
Willy Tarreau [Mon, 2 May 2022 15:51:51 +0000 (17:51 +0200)]
MINOR: session: get rid of the now unused SESS_FL_ADDR_*_SET flags
That's similar to what was done for conn_streams and connections. The
flags were only set exactly when the relevant pointers were allocated,
so better test the pointer than the flag and stop setting the flag.
Willy Tarreau [Mon, 2 May 2022 15:47:46 +0000 (17:47 +0200)]
MINOR: connection: get rid of the CO_FL_ADDR_*_SET flags
Just like for the conn_stream, now that these addresses are dynamically
allocated, there is no single case where the pointer is set without the
corresponding flag, and the flag is used as a permission to dereference
the pointer. Let's just replace the test of the flag with a test of the
pointer and remove all flag assignment. This makes the code clearer
(especially in "if" conditions) and saves the need for future code to
think about properly setting the flag after setting the pointer.
Willy Tarreau [Mon, 2 May 2022 15:45:12 +0000 (17:45 +0200)]
CLEANUP: protocol: make sure the connect_* functions always receive a dst
Some of the protocol-level ->connect() functions currently dereference
the connection's destination address while others test it and return an
error. There's normally no more non-bogus code path that calls such
functions without a valid destination address on the connection, so
let's unify these functions and just place a BUG_ON() there, and drop
the useless test that's supposed to return an internal error.
Willy Tarreau [Mon, 2 May 2022 15:27:34 +0000 (17:27 +0200)]
MINOR: conn_stream: remove the now unused CS_FL_ADDR_*_SET flags
These flags indicate that the ->src or ->dst field in the conn_stream
is not null, which is something the caller already sees (and even tests
from the two sets of functions that set them). They maintain some burden
because an agent trying to set a source or destination has to manually
set the flags in addition to setting the pointer, so they provide no
value anymore, let's drop them.
Willy Tarreau [Mon, 2 May 2022 14:36:47 +0000 (16:36 +0200)]
MEDIUM: stream: remove the confusing SF_ADDR_SET flag
This flag is no longer needed now that it must always match the presence
of a destination address on the backend conn_stream. Worse, before previous
patch, if it were to be accidently removed while the address is present, it
could result in a leak of that address since alloc_dst_address() would first
be called to flush it.
Its usage has a long history where addresses were stored in an area shared
with the connection, but as this is no longer the case, there's no reason
for putting this burden onto application-level code that should not focus
on setting obscure flags.
The only place where that made a small difference is in the dequeuing code
in case of queue redistribution, because previously the code would first
clear the flag, and only later when trying to deal with the queue, would
release the address. It's not even certain whether there would exist a
code path going to connect_server() without calling pendconn_dequeue()
first (e.g. retries on queue timeout maybe?).
Now the pendconn_dequeue() code will rely on SF_ASSIGNED to decide to
clear and release the address, since that flag is always set while in
a server's queue, and its clearance implies that we don't want to keep
the address. At least it remains consistent and there's no more risk of
leaking it.
Willy Tarreau [Mon, 2 May 2022 14:20:36 +0000 (16:20 +0200)]
CLEANUP: backend: make alloc_{bind,dst}_address() idempotent
These functions dynamically allocate a source or destination address but
start by clearing the previous one. There's a non-null risk of leaking
addresses there in case of misuse. Better have them do nothing if the
address was already allocated.
BUG/MINOR: h3: fix parsing of unknown frame type with null length
HTTP/3 implementation must ignore unknown frame type to support protocol
evolution. Clients can deliberately use unknown type to test that the
server is conformant : this principle is called greasing.
Quiche client uses greasing on H3 frame type with a zero length frame.
This reveals a bug in H3 parsing code which causes the transfer to be
interrupted. Fix this by removing the break statement on ret variable.
Now the parsing loop is only interrupted if input buffer is empty or the
demux is blocked.
This should fix http/3 freeze transfers with the quiche client. Thanks
to Lucas Pardue from Cloudflare for his report on the bug. Frédéric
Lecaille quickly found the source of the problem which helps me to write
this patch.
MINOR: mux-quic: support full request channel buffer
If the request channel buffer is full, H3 demuxing must be interrupted
on the stream until some read is performed. This condition is reported
if the HTX stream buffer qcs.rx.app_buf is full.
In this case, qcs instance is marked with a new flag QC_SF_DEM_FULL.
This flag cause the H3 demuxing to be interrupted. It is cleared when
the HTX buffer is read by the conn-stream layer through rcv_buf
operation.
When the flag is cleared, the MUX tasklet is woken up. However, as MUX
iocb does not treat Rx for the moment, this is useless. It must be fix
to prevent possible freeze on POST transfers.
In practice, for the moment the HTX buffer is never full as the current
Rx code is limited by the quic-conn receive buffer size and the
incomplete flow-control implementation. So for now this patch is not
testable under the current conditions.
Released version 2.6-dev8 with the following main changes :
- BUG/MINOR: quic: fix use-after-free with trace on ACK consume
- BUG/MINOR: rules: Forbid captures in defaults section if used by a backend
- BUG/MEDIUM: rules: Be able to use captures defined in defaults section
- BUG/MINOR: rules: Fix check_capture() function to use the right rule arguments
- BUG/MINOR: http-act: make release_http_redir() more robust
- BUG/MINOR: sample: add missing use_backend/use-server contexts in smp_resolve_args
- MINOR: sample: don't needlessly call c_none() in sample_fetch_as_type()
- MINOR: sample: make the bool type cast to bin
- MEDIUM: backend: add new "balance hash <expr>" algorithm
- MINOR: init: add global setting "fd-hard-limit" to bound system limits
- BUILD: pollers: use an initcall to register the pollers
- BUILD: xprt: use an initcall to register the transport layers
- BUILD: thread: use initcall instead of a constructor
- BUILD: http: remove the two unused constructors in rules and ana
- CLEANUP: compression: move the default setting of maxzlibmem to defaults
- MINOR: tree-wide: always consider EWOULDBLOCK in addition to EAGAIN
- BUG/MINOR: connection: "connection:close" header added despite 'close-spread-time'
- MINOR: fd: add functions to set O_NONBLOCK and FD_CLOEXEC
- CLEANUP: tree-wide: use fd_set_nonblock() and fd_set_cloexec()
- CLEANUP: tree-wide: remove 25 occurrences of unneeded fcntl.h
- REGTESTS: fix the race conditions in be2dec.vtc ad field.vtc
- REGTESTS: webstats: remove unused stats socket in /tmp
- MEDIUM: httpclient: disable SSL when the ca-file couldn't be loaded
- BUG/MINOR: httpclient/lua: error when the httpclient_start() fails
- BUG/MINOR: ssl: free the cafile entries on deinit
- BUG/MINOR: ssl: memory leak when trying to load a directory with ca-file
- MEDIUM: httpclient: re-enable the verify by default
- BUG/MEDIUM: ssl/cli: fix yielding in show_cafile_detail
- BUILD: compiler: properly distinguish weak and global symbols
- MINOR: connection: Add way to disable active connection closing during soft-stop
- BUG/MEDIUM: http-ana: Fix memleak in redirect rules with ignore-empty option
- CLEANUP: Destroy `http_err_chunks` members during deinit
- BUG/MINOR: resolvers: Fix memory leak in resolvers_deinit()
- MINOR: Call deinit_and_exit(0) for `haproxy -vv`
- BUILD: fd: disguise the fd_set_nonblock/cloexec result
- BUG/MINOR: pools: make sure to also destroy shared pools in pool_destroy_all()
- MINOR: ssl: add a new global option "tune.ssl.hard-maxrecord"
- CLEANUP: errors: also call deinit_errors_buffers() on deinit()
- CLEANUP: chunks: release trash also in deinit
- CLEANUP: deinit: release the pre-check callbacks
- CLEANUP: deinit: release the config postparsers
- CLEANUP: listeners/deinit: release accept queue tasklets on deinit
- CLEANUP: connections/deinit: destroy the idle_conns tasks
- BUG/MINOR: mux-quic: fix build in release mode
- MINOR: mux-quic: adjust comment on emission function
- MINOR: mux-quic: remove unused bogus qcc_get_stream()
- BUG/MINOR: mux-quic: fix leak if cs alloc failure
- MINOR: mux-quic: count local flow-control stream limit on reception
- BUG/MINOR: h3: fix incomplete POST requests
- BUG/MEDIUM: h3: fix use-after-free on mux Rx buffer wrapping
- MINOR: mux-quic: partially copy Rx frame if almost full buf
- MINOR: h3: change frame demuxing API
- MINOR: mux-quic: add a app-layer context in qcs
- MINOR: h3: implement h3 stream context
- MINOR: h3: support DATA demux if buffer full
- MINOR: quic: decode as much STREAM as possible
- MINOR: quic: Improve qc_prep_pkts() flexibility
- MINOR: quic: Prepare quic_frame struct duplication
- MINOR: quic: Do not retransmit frames from coalesced packets
- MINOR: quic: Add traces about TX frame memory releasing
- MINOR: quic: process_timer() rework
- MEDIUM: quic: New functions for probing rework
- MEDIUM: quic: Retransmission functions rework
- MEDIUM: quic: qc_requeue_nacked_pkt_tx_frms() rework
- MINOR: quic: old data distinction for qc_send_app_pkt()
- MINOR: quic: Mark packets as probing with old data
- MEDIUM: quic: Mark copies of acknowledged frames as acknowledged
- MEDIUM: quic: Enable the new datagram probing process
- MINOR: quic: Do not send ACK frames when probing
- BUG/MINOR: quic: Wrong returned status by qc_build_frms()
- BUG/MINOR: quic: Avoid sending useless PADDING frame
- BUG/MINOR: quic: Traces fix about remaining frames upon packet build failure
- MINOR: quic: Wake up the mux to probe with new data
- BUG/MEDIUM: quic: Possible crash on STREAM frame loss
- BUG/MINOR: quic: Missing Initial packet length check
- CLEANUP: quic: Rely on the packet length set by qc_lstnr_pkt_rcv()
- MINOR: quic: Drop 0-RTT packets if not allowed
- BUG/MINOR: httpclient/ssl: use the correct verify constant
- BUG/MEDIUM: conn-stream: Don't erase endpoint flags on reset
- BUG/MEDIUM: httpclient: Fix loop consuming HTX blocks from the response channel
- BUG/MINOR: httpclient: Count metadata in size to transfer via htx_xfer_blks()
- MINOR: httpclient: Don't use co_set_data() to decrement output
- BUG/MINOR: conn_stream: do not confirm a connection from the frontend path
- MEDIUM: quic: do not ACK packet with STREAM if MUX not present
- MEDIUM: quic: do not ack packet with invalid STREAM
- MINOR: quic: Drop 0-RTT packets without secrets
- CLEANUP: quic: Remaining fprintf() debug trace
- MINOR: quic: moving code for QUIC loss detection
- BUG/MINOR: quic: Missing time threshold multiplifier for loss delay computation
- CI: github actions: update LibreSSL to 3.5.2
- SCRIPTS: announce-release: add URL of dev packages
MEDIUM: quic: do not ack packet with invalid STREAM
If the MUX cannot handle immediately nor buffer a STREAM frame, the
packet containing it must not be acknowledge. This is in conformance
with the RFC9000.
qcc_recv() return codes have been adjusted to differentiate an invalid
frame with an already fully received offset which must be acknowledged.
MEDIUM: quic: do not ACK packet with STREAM if MUX not present
If a packet contains a STREAM frame but the MUX is not allocated, the
frame cannot be enqueued. According to the RFC9000, we must not
acknowledge the packet under this condition.
This may prevents a bug with firefox which keeps trying on refreshing
the web page. This issue has already been detected before closing state
implementation : haproxy wasn't emitted CONNECTION_CLOSE and keeps
acknowledge STREAM frames despite not handle them.
In the future, it might be necessary to respond with a CONNECTION_CLOSE
if the MUX has already been freed.
BUG/MINOR: conn_stream: do not confirm a connection from the frontend path
In issue #1468 it was reported that sometimes server-side connection
attempts were only validated after the "timeout connect" value, and
that would only happen with an H2 client. A long code analysis with the
output dumps showed only one possible call path: an I/O event on the
frontend while reading had just been disabled calls h2_wake() which in
turns wakes cs_conn_io_cb(), which tries cs_conn_process() and cs_notify(),
which sees that the other side is not blocked (already in CS_ST_CON)
and tries cs_chk_snd() on it. But on that side the connection had just
finished to be set up and not yet woken the stream up, cs_notify()
would then call cs_conn_send() which succeeds and passes the connection
to CS_ST_RDY. The problem is that nothing new happened on the frontend
side so there's no reason to wake the stream up and the backend-side
conn_stream remains in CS_ST_RDY state with the stream never being
woken up.
Once the "timeout connect" strikes, process_stream() is woken up and
finds the connection finally setup, so it ignores the timeout and goes
on.
The number of conditions to meet to reproduce this is huge, which also
explains why the reporter says it's "occasional" and we were never able
to reproduce it in the lab. It needs at least reads to be disabled and
immediately re-enabled on the frontend side (e.g. buffer full) with an
I/O even reported before the poller had an opportunity to be disabled
but with no subscribe being reinstalled, so that sock_conn_iocb() has
no other choice but calling h2_wake(), and exactly at the same time
the backend connection must finish to set up so that it was not yet
reported by the poller, the data were sent and the polling for writes
disabled.
Several factors are to be considered here:
- h2_wake() should probably not call h2_wake_some_streams() for
ret >= 0 (common case), but only if some special event is reported
for at least one stream; that part is sensitive though as in the
past we managed to lose some rare cases (e.g. restart processing
after a pause), and such wakeups are extremely rare so we'd better
make that effort once in a while.
- letting a lazy forward attempt on the frontend confirm a backend
connection establishment is too smart to be reliable. That wasn't
in fact the intent and it's inherited from the very old code where
muxes didn't exist and where it was guaranteed that an even at this
layer would wake everyone up.
Here the best thing to do is to refrain from attempting to forward data
until the connection is confirmed. This will let the poller report the
connect() event to the backend side which will process it as it should
and does in all other cases.
Thanks to Jimmy Crutchfield for having reported useful traces and
tested patches.
This will have to be backported to all stable branches after some
observation. Before 2.6 the function is stream_int_chk_snd_conn(),
and the flag to remove is SI_SB_CON.
MINOR: httpclient: Don't use co_set_data() to decrement output
The use of co_set_data() should be strictly limited to setting the amount of
existing data to be transmitted. It ought not be used to decrement the
output after the data have left the buffer, because doing so involves
performing incorrect calculations using co_data() that still comprises data
that are not in the buffer anymore. Let's use c_rew() for this, which is
made exactly for this purpose, i.e. decrement c->output by as much as
requested.
BUG/MINOR: httpclient: Count metadata in size to transfer via htx_xfer_blks()
When HTX blocks are transfer from the HTTP client context to the request
channel, via htx_xfer_blks() function, the metadata must also be counted, in
addition to the data size. Otherwise, expected payload size will not be
copied because the metadata of an HTX block (8 bytes) will be reserved. And
if the payload size is lower than 8 bytes, nothing will be copied. Thus only
a zero-copy will be able to copy the payload.
This issue is 2.6-specific, no backport is needed.
BUG/MEDIUM: httpclient: Fix loop consuming HTX blocks from the response channel
When the HTTP client consumes the response, it loops on the HTX message to
copy blocks content and it removes blocks by calling htx_remove_blk(). But
this function removes a block and returns the next one in the HTX
message. The result must be used instead of using htx_get_next(). It is
especially important because the block used in htx_get_next() loop was
removed. It only works because the message is not defragmented during the
loop.
In addition, the loop on the response was simplified to iter on blocks
instead of positions.
BUG/MEDIUM: conn-stream: Don't erase endpoint flags on reset
Only CS_EP_ERROR flag is now removed from the endpoint when a reset is
performed. When a new the endpoint is allocated, flags are preserved. It is
the caller responsibility to remove other flags, depending on its need.
Concretly, during a connection retry or a L7 retry, we must preserve
flags. In tcpcheck and the CLI, we reset flags.
CLEANUP: quic: Rely on the packet length set by qc_lstnr_pkt_rcv()
This function is used to parse the QUIC packets carried by a UDP datagram.
When a correct packet could be found, the ->len RX packet structure value
is set to the packet length value. On the contrary, it is set to the remaining
number of bytes in the UDP datagram if no correct QUIC packet could be found.
So, there is no need to make this function return a status value. It allows
the caller to parse any QUIC packet carried by a UDP datagram without this.
Any client Initial packet carried in a datagram smaller than QUIC_INITIAL_PACKET_MINLEN(200)
bytes must be discarded. This does not mean we must discard the entire datagram.
So we must at least try to parse the packet length before dropping the packet
and return its length from qc_lstnr_pkt_rcv().
BUG/MEDIUM: quic: Possible crash on STREAM frame loss
A crash is possible under such circumtances:
- The congestion window is drastically reduced to its miniaml value
when a quic listener is experiencing extreme packet loss ;
- we enqueue several STREAM frames to be resent and some of them could not be
transmitted ;
- some of the still in flight are acknowledged and trigger the
stream memory releasing ;
- when we come back to send the remaing STREAM frames, haproxy
crashes when it tries to build them.
To fix this issue, we mark the STREAM frame as lost when detected as lost.
Then a lookup if performed for the stream the STREAM frames are attached
to before building them. They are released if the stream is no more available
or the data range of the frame is consumed.
MINOR: quic: Wake up the mux to probe with new data
When we have to probe the peer, we must first try to send new data. This is done
here waking up the mux after having set the number of maximum number of datagrams
to send to QUIC_MAX_NB_PTO_DGRAMS (2). Of course, this is only the case if the
mux was subscribed to SEND events.
This may happen in rare cases with extreme packet loss (30% for both TX and RX)
which leads the congestion window to decrease down to its minimal value (two
datagrams). Under such circumtances, no ack-eliciting frame can be added to
a packet by qc_build_frms(). In this case we must cancel the packet building
process if there is no ACK or probe (PING frame) to send.
BUG/MINOR: quic: Wrong returned status by qc_build_frms()
This function must return a successful status as soon as it could be build
a frame to be embedded by a packet. This behavior was broken by the last
modifications. This was due to a dangerous "ret = 1" statement inside
a loop. This statement must be reach only if we go out of a switch/case
after a "break" statement.
Add comments to mention this information.
When we are probing, we do not receive packets, furthermore all ACK frames have
already been sent. This is useless to send ACK when probing the peer. This
modification does not reset the flag which marks the connection as requiring an
ACK frame to be sent. If this is the case, this will be taken into an account
by after the probing process.
MEDIUM: quic: Enable the new datagram probing process
Make the two I/O handlers quic_conn_io_cb() and quic_conn_app_io_cb() call
qc_dgrams_retransmit() after probing retransmissions need was detected by
the timer task (qc_process_timer()).
We must modify qc_prep_pkts() to support QUIC_TLS_ENC_LEVEL_NONE as <next_tel>
parameter when called from qc_dgrams_retransmit().
MINOR: quic: Mark packets as probing with old data
When probing retranmissions with old data are needed for the connection we
mark the packets as probing with old data to track them when acknowledged:
we do not resend frames with old data when lost.
MINOR: quic: old data distinction for qc_send_app_pkt()
Modify qc_send_app_pkt() to distinguish the case where it sends new data
against the case where it sends old data during probing retransmissions.
We add <old_data> boolean parameter to this function to do so. The mux
never directly send old data when probing retransmissions are needed by
the connection.
This function is used to requeue the TX frames from TX packets which have
been detected as lost. The modifications consist in avoiding resending frames from
duplicated frames used to probe the peer. This is useless. Only the original
frames loss must be taken into an account because detected as lost before
the retransmitted frames. If these latter are also detected as lost, other
duplicated frames would have been retransmitted before their loss detection.
qc_prep_fast_retrans() and qc_prep_hdshk_fast_retrans() are modified to
take two list of frames as parameters. Two lists are needed for
qc_prep_hdshk_fast_retrans() to build datagrams with two packets during
handshake. qc_prep_fast_retrans() needs two lists of frames to be used
to send two datagrams with one list by datagram.