MINOR: tcp-rules: Make tcp-request capture a custom action
Now, this action is use its own dedicated function and is no longer handled "in
place" during the TCP rules evaluation. Thus the action name ACT_TCP_CAPTURE is
removed. The action type is set to ACT_CUSTOM and a check function is used to
know if the rule depends on request contents while there is no inspect-delay.
MINOR: http-rule/tcp-rules: Make track-sc* custom actions
Now, these actions use their own dedicated function and are no longer handled
"in place" during the TCP/HTTP rules evaluation. Thus the action names
ACT_ACTION_TRK_SC0 and ACT_ACTION_TRK_SCMAX are removed. The action type is now
the tracking index. Thus the function trk_idx() is no longer needed.
MEDIUM: http-rules: Make early-hint custom actions
Now, the early-hint action uses its own dedicated action and is no longer
handled "in place" during the HTTP rules evaluation. Thus the action name
ACT_HTTP_EARLY_HINT is removed. In additionn, http_add_early_hint_header() and
http_reply_103_early_hints() are also removed. This part is now handled in the
new action_ptr callback function.
MINOR: http-rules: Group all processing of early-hint rule in its case clause
To know if the 103 response start-line must be added, we test if it is the first
rule of the ruleset or if the previous rule is not an early-hint rule. And at
the end, to know if the 103 response must be terminated, we test if it is the
last rule of the ruleset or if the next rule is not an early-hint rule. This
way, all the code dealing with early-hint rules is grouped in its case clause.
MINOR: http-rules: Make set/del-map and add/del-acl custom actions
Now, these actions use their own dedicated function and are no longer handled
"in place" during the HTTP rules evaluation. Thus the action names
ACT_HTTP_*_ACL and ACT_HTTP_*_MAP are removed. The action type is now mapped as
following: 0 = add-acl, 1 = set-map, 2 = del-acl and 3 = del-map.
MINOR: http-rules: Make set-header and add-header custom actions
Now, these actions use their own dedicated function and are no longer handled
"in place" during the HTTP rules evaluation. Thus the action names
ACT_HTTP_SET_HDR and ACT_HTTP_ADD_VAL are removed. The action type is now set to
0 to set a header (so remove existing ones if any and add a new one) or to 1 to
add a header (add without remove).
MINOR: http-rules: Make replace-header and replace-value custom actions
Now, these actions use their own dedicated function and are no longer handled
"in place" during the HTTP rules evaluation. Thus the action names
ACT_HTTP_REPLACE_HDR and ACT_HTTP_REPLACE_VAL are removed. The action type is
now set to 0 to evaluate the whole header or to 1 to evaluate every
comma-delimited values.
The function http_transform_header_str() is renamed to http_replace_hdrs() to be
more explicit and the function http_transform_header() is removed. In fact, this
last one is now more or less the new action function.
The lua code has been updated accordingly to use http_replace_hdrs().
MINOR: http-rules: Use a specific action type for some custom HTTP actions
For set-method, set-path, set-query and set-uri, a specific action type is
used. The same as before but no longer stored in <arg.http.i>. Same is done for
replace-path and replace-uri. The same types are used than the "set-" versions.
MINOR: actions: Use an integer to set the action type
<action> field in the act_rule structure is now an integer. The act_name values
are used for all actions without action function (but it is not a pre-requisit
though) or the action will have no effect. But for all other actions, any
integer value may used, only the action function will take care of it. The
default for such actions is ACT_CUSTOM.
MINOR: actions: Add flags to configure the action behaviour
Some flags can now be set on an action when it is registered. The flags are
defined in the act_flag enum. For now, only ACT_FLAG_FINAL may be set on an
action to specify if it stops the rules evaluation. It is set on
ACT_ACTION_ALLOW, ACT_ACTION_DENY, ACT_HTTP_REQ_TARPIT, ACT_HTTP_REQ_AUTH,
ACT_HTTP_REDIR and ACT_TCP_CLOSE actions. But, when required, it may also be set
on custom actions.
Consequently, this flag is checked instead of the action type during the
configuration parsing to trigger a warning when a rule inhibits all the
following ones.
MINOR: actions: Rename the act_flag enum into act_opt
The flags in the act_flag enum have been renamed act_opt. It means ACT_OPT
prefix is used instead of ACT_FLAG. The purpose of this patch is to reserve the
action flags for the actions configuration.
MINOR: http-rules/tcp-rules: Call the defined action function first if defined
When TCP and HTTP rules are evaluated, if an action function (action_ptr field
in the act_rule structure) is defined for a given action, it is now always
called in priority over the test on the action type. Concretly, for now, only
custom actions define it. Thus there is no change. It just let us the choice to
extend the action type beyond the existing ones in the enum.
MINOR: actions: Regroup some info about HTTP rules in the same struct
Info used by HTTP rules manipulating the message itself are splitted in several
structures in the arg union. But it is possible to group all of them in a unique
struct. Now, <arg.http> is used by most of these rules, which contains:
* <arg.http.i> : an integer used as status code, nice/tos/mark/loglevel or
action id.
* <arg.http.str> : an IST used as header name, reason string or auth realm.
* <arg.http.fmt> : a log-format compatible expression
* <arg.http.re> : a regular expression used by replace rules
MINOR: actions: Add a function pointer to release args used by actions
Arguments used by actions are never released during HAProxy deinit. Now, it is
possible to specify a function to do so. ".release_ptr" field in the act_rule
structure may be set during the configuration parsing to a specific deinit
function depending on the action type.
REGTEST: Fix format of set-uri HTTP request rule in h1or2_to_h1c.vtc
First, concat() is a converter, not a sample fetch. So use str() sample fetch
with no string and call concat on it. Then, the argument of the set-uri rule
must be a log format string. So it must be inside %[] to be evaluated.
MEDIUM: http-rules: Enable the strict rewriting mode by default
Now, by default, when a rule performing a rewrite on an HTTP message fails, an
internal error is triggered. Before, the failure was ignored. But most of users
are not aware of this behavior. And it does not happen very often because the
buffer reserve space in large enough. So it may be surprising. Returning an
internal error makes the rewrite failure explicit. If it is acceptable to
silently ignore it, the strict rewriting mode can be disabled.
MINOR: http-rules: Add a rule to enable or disable the strict rewriting mode
It is now possible to explicitly instruct rewriting rules to be strict or not
towards errors. It means that in this mode, an internal error is trigger if a
rewrite rule fails. The HTTP action "strict-mode" can be used to enable or
disable the strict rewriting mode. It can be used in an http-request and an
http-response ruleset.
For now, by default the strict rewriting mode is disabled. Because it is the
current behavior. But it will be changed in another patch.
MINOR: http-rules: Handle all message rewrites the same way
In HTTP rules, error handling during a rewrite is now handle the same way for
all rules. First, allocation errors are reported as internal errors. Then, if
soft rewrites are allowed, rewrite errors are ignored and only the
failed_rewrites counter is incremented. Otherwise, when strict rewrites are
mandatory, interanl errors are returned.
For now, only soft rewrites are supported. Note also that the warning sent to
notify a rewrite failure was removed. It will be useless once the strict
rewrites will be possible.
MINOR: http-ana: Add a txn flag to support soft/strict message rewrites
the HTTP_MSGF_SOFT_RW flag must now be set on the HTTP transaction to ignore
rewrite errors on a message, from HTTP rules. The mode is called the soft
rewrites. If thes flag is not set, strict rewrites are performed. In this mode,
if a rewrite error occurred, an internal error is reported.
For now, HTTP_MSGF_SOFT_RW is always set and there is no way to switch a
transaction in strict mode.
MINOR: counters: Review conditions to increment counters from analysers
Now, for these counters, the following rules are followed to know if it must be
incremented or not:
* if it exists for a frontend, the counter is incremented
* if stats must be collected for the session's listener, if the counter exists
for this listener, it is incremented
* if the backend is already assigned, if the counter exists for this backend,
it is incremented
* if a server is attached to the stream, if the counter exists for this
server, it is incremented
It is not hardcoded rules. Some counters are still handled in a different
way. But many counters are incremented this way now.
MINOR: counters: Remove failed_secu counter and use denied_resp instead
The failed_secu counter is only used for the servers stats. It is used to report
the number of denied responses. On proxies, the same info is stored in the
denied_resp counter. So, it is more consistent to use the same field for
servers.
MINOR: stats: Report internal errors in the proxies/listeners/servers stats
The stats field ST_F_EINT has been added to report internal errors encountered
per proxy, per listener and per server. It appears in the CLI export and on the
HTML stats page.
MINOR: http-rules: Handle denied/aborted/invalid connections from HTTP rules
The new possible results for a custom action (deny/abort/invalid) are now handled
during HTTP rules evaluation. These codes are mapped on HTTP rules ones :
MINOR: tcp-rules: Handle denied/aborted/invalid connections from TCP rules
The new possible results for a custom action (deny/abort/invalid) are now
handled during TCP rules evaluation. For L4/L5 rules, the session is
rejected. For L7 rules, the right counter is incremented, then the connections
killed.
MINOR: http-rules: Add more return codes to let custom actions act as normal ones
When HTTP/TCP rules are evaluated, especially HTTP ones, some results are
possible for normal actions and not for custom ones. So missing return codes
(ACT_RET_) have been added to let custom actions act as normal ones. Concretely
following codes have been added:
* ACT_RET_DENY : deny the request/response. It must be handled by the caller
* ACT_RET_ABRT : abort the request/response, handled by action itsleft.
* ACT_RET_INV : invalid request/response
MINOR: http-rules: Handle internal errors during HTTP rules evaluation
The HTTP_RULE_RES_ERROR code is now used by HTTP analyzers to handle internal
errors during HTTP rules evaluation. It is used instead of HTTP_RULE_RES_BADREQ,
used for invalid requests/responses. In addition, the SF_ERR_RESOURCE flag is
set on the stream when an allocation failure happens.
Note that the return value of http-response rules evaluation is now tested in
the same way than the result of http-request rules evaluation.
MINOR: http-rules: Add a rule result to report internal error
Now, when HTTP rules are evaluated, HTTP_RULE_RES_ERROR must be returned when an
internal error is catched. It is a way to make the difference between a bad
request or a bad response and an error during its processing.
Now, processing errors are properly handled. Instead of returning an error 400
or 502, depending where the error happens, an error 500 is now returned. And the
processing_errors counter is incremented. By default, when such error is
detected, the SF_ERR_INTERNAL stream error is used. When the error is caused by
an allocation failure, and when it is reasonnably possible, the SF_ERR_RESOURCE
stream error is used. Thanks to this patch, bad requests and bad responses
should be easier to detect.
MINOR: counters: Add a counter to report internal processing errors
This counter, named 'internal_errors', has been added in frontend and backend
counters. It should be used when a internal error is encountered, instead for
failed_req or failed_resp.
MINOR: http-rules: Return an error when custom actions return ACT_RET_ERR
Thanks to the commit "MINOR: actions: Use ACT_RET_CONT code to ignore an error
from a custom action", it is now possible to trigger an error from a custom
action in http rules. Now, when a custom action returns the ACT_RET_ERR code
from an http-request rule, an error 400 is returned. And from an http-response
rule, an error 502 is returned.
Be careful if this patch is backported. The other mentioned patch must be
backported first.
MINOR: tcp-rules: Kill connections when custom actions return ACT_RET_ERR
Thanks to the commit "MINOR: actions: Use ACT_RET_CONT code to ignore an error
from a custom action", it is now possible to trigger an error from a custom
action in tcp-content rules. Now, when a custom action returns the ACT_RET_ERR
code, it has the same behavior than a reject rules, the connection is killed.
Be careful if this patch is backported. The other mentioned patch must be
backported first.
MINOR: actions: Use ACT_RET_CONT code to ignore an error from a custom action
Some custom actions are just ignored and skipped when an error is encoutered. In
that case, we jump to the next rule. To do so, most of them use the return code
ACT_RET_ERR. Currently, for http rules and tcp content rules, it is not a
problem because this code is handled the same way than ACT_RET_CONT. But, it
means there is no way to handle the error as other actions. The custom actions
must handle the error and return ACT_RET_DONE. For instance, when http-request
rules are processed, an error when we try to replace a header value leads to a
bad request and an error 400 is returned to the client. But when we fail to
replace the URI, the error is silently ignored. This difference between the
custom actions and the others is an obstacle to write new custom actions.
So, in this first patch, ACT_RET_CONT is now returned from custom actions
instead of ACT_RET_ERR when an error is encoutered if it should be ignored. The
behavior remains the same but it is now possible to handle true errors using the
return code ACT_RET_ERR. Some actions will probably be reviewed to determine if
an error is fatal or not. Other patches will be pushed to trigger an error when
a custom action returns the ACT_RET_ERR code.
This patch is not tagged as a bug because it is just a design issue. But others
will depends on it. So be careful during backports, if so.
MINOR: tcp-rules: Always set from which ruleset a rule comes from
The ruleset from which a TCP rule comes from (the <from> field in the act_rule
structure) is only set when a rule is created from a registered keyword and not
for all TCP rules. But this information may be useful to check the configuration
validity or during the rule evaluation. So now, we systematically set it.
MEDIUM: http-rules: Register an action keyword for all http rules
There are many specific http actions that don't use the action registration
mechanism (allow, deny, set-header...). Instead, the parsing of these actions is
inlined in the functions responsible to parse the http-request/http-response
rules. There is no reason to not register an action keyword for all these
actions. It it the purpose of this patch. The new functions responsible to parse
these http actions are defined in http_act.c
BUG/MINOR: stick-table: Use MAX_SESS_STKCTR as the max track ID during parsing
During the parsing of the sc-inc-gpc0, sc-inc-gpc1 and sc-inc-gpt1 actions, the
maximum stick table track ID allowed is tested against ACT_ACTION_TRK_SCMAX. It
is the action number and not the maximum number of stick counters. Instead,
MAX_SESS_STKCTR must be used.
This patch must be backported to all stable versions.
BUG/MINOR: http-rules: Remove buggy deinit functions for HTTP rules
Functions to deinitialize the HTTP rules are buggy. These functions does not
check the action name to release the right part in the arg union. Only few info
are released. For auth rules, the realm is released and there is no problem
here. But the regex <arg.hdr_add.re> is always unconditionally released. So it
is easy to make these functions crash. For instance, with the following rule
HAProxy crashes during the deinit :
For now, These functions are simply removed and we rely on the deinit function
used for TCP rules (renamed as deinit_act_rules()). This patch fixes the
bug. But arguments used by actions are not released at all, this part will be
addressed later.
This patch must be backported to all stable versions.
BUG/MINOR: http-ana/filters: Wait end of the http_end callback for all filters
Filters may define the "http_end" callback, called at the end of the analysis of
any HTTP messages. It is called at the end of the payload forwarding and it can
interrupt the stream processing. So we must be sure to not remove the XFER_BODY
analyzers while there is still at least filter in progress on this callback.
Unfortunatly, once the request and the response are borh in the DONE or the
TUNNEL mode, we consider the XFER_BODY analyzer has finished its processing on
both sides. So it is possible to prematurely interrupt the execution of the
filters "http_end" callback.
To fix this bug, we switch a message in the ENDING state. It is then switched in
DONE/TUNNEL mode only after the execution of the filters "http_end" callback.
This patch must be backported (and adapted) to 2.1, 2.0 and 1.9. The legacy HTTP
mode shoud probaly be fixed too.
MINOR: mux-h1: Inherit send flags from the upper layer
Send flags (CO_SFL_*) used when xprt->snd_buf() is called, in h1_send(), are now
inherited from the upper layer, when h1_snd_buf() is called. First, the flag
CO_SFL_MSG_MORE is no more set if the output buffer is full, but only if the
stream-interface decides to set it. It has more info to do it than the
mux. Then, the flag CO_SFL_STREAMER is now also handled this way. It was just
ignored till now.
DOC: Add a section to document the internal sample fetches
The section 7.3.7. is now dedicated to internal sample fetches. For now, only
HTX sample fetches are referenced in this section. But it should contain the
documentation of all sample fetches reserved to an internal use, for debugging
or testing purposes.
Ben51Degrees [Mon, 20 Jan 2020 11:25:11 +0000 (11:25 +0000)]
BUG/MINOR: 51d: Fix bug when HTX is enabled
When HTX is enabled, the sample flags were set too early. When matching for
multiple HTTP headers, the sample is fetched more than once, meaning that the
flags would need to be set again. Instead, the flags are now set last (just
before the outermost function returns). This could be further improved by
passing around the message without calling prefetch again.
This patch must be backported as far as 1.9. it should fix bug #450.
Tim Duesterhus [Sat, 18 Jan 2020 00:46:18 +0000 (01:46 +0100)]
BUG/MINOR: cache: Fix leak of cache name in error path
This issue was introduced in commit 99a17a2d91f9044ea20bba6617048488aed80555
which first appeared in tag v1.9-dev11. This bugfix should be backported
to HAProxy 1.9+.
Elliot Otchet [Wed, 15 Jan 2020 13:12:14 +0000 (08:12 -0500)]
MINOR: ssl: Add support for returning the dn samples from ssl_(c|f)_(i|s)_dn in LDAP v3 (RFC2253) format.
Modifies the existing sample extraction methods (smp_fetch_ssl_x_i_dn,
smp_fetch_ssl_x_s_dn) to accommodate a third argument that indicates the
DN should be returned in LDAP v3 format. When the third argument is
present, the new function (ssl_sock_get_dn_formatted) is called with
three parameters including the X509_NAME, a buffer containing the format
argument, and a buffer for the output. If the supplied format matches
the supported format string (currently only "rfc2253" is supported), the
formatted value is extracted into the supplied output buffer using
OpenSSL's X509_NAME_print_ex and BIO_s_mem. 1 is returned when a dn
value is retrieved. 0 is returned when a value is not retrieved.
Argument validation is added to each of the related sample
configurations to ensure the third argument passed is either blank or
"rfc2253" using strcmp. An error is returned if the third argument is
present with any other value.
Documentation was updated in configuration.txt and it was noted during
preliminary reviews that a CLEANUP patch should follow that adjusts the
documentation. Currently, this patch and the existing documentation are
copied with some minor revisions for each sample configuration. It
might be better to have one entry for all of the samples or entries for
each that reference back to a primary entry that explains the sample in
detail.
Special thanks to Chris, Willy, Tim and Aleks for the feedback.
Author: Elliot Otchet <degroens@yahoo.com> Reviewed-by: Tim Duesterhus <tim@bastelstu.be>
Willy Tarreau [Fri, 17 Jan 2020 06:52:13 +0000 (07:52 +0100)]
MINOR: connection: make the last arg of subscribe() a struct wait_event*
The subscriber used to be passed as a "void *param" that was systematically
cast to a struct wait_event*. By now it appears clear that the subscribe()
call at every layer is well defined and always takes a pointer to an event
subscriber of type wait_event, so let's enforce this in the functions'
prototypes, remove the intermediary variables used to cast it and clean up
the comments to clarify what all these functions do in their context.
Willy Tarreau [Thu, 16 Jan 2020 16:55:37 +0000 (17:55 +0100)]
MEDIUM: mux-fcgi: merge recv_wait and send_wait event notifications
This is the last of the "recv_wait+send_wait merge" patches and is
functionally equivalent to previous commit "MEDIUM: mux-h2: merge
recv_wait and send_wait event notifications" but for FCGI this time.
The principle is pretty much the same, since the code is very similar.
We use a single wait_event for both recv and send and rely on the
subscribe flags to know the desired notifications.
Willy Tarreau [Fri, 10 Jan 2020 10:12:48 +0000 (11:12 +0100)]
MEDIUM: mux-h2: merge recv_wait and send_wait event notifications
This is the continuation of the recv+send event notifications merge
that was started. This patch is less trivial than the previous ones
because the existence of a send event subscription is also used to
decide to put a stream back into the send list.
Willy Tarreau [Thu, 16 Jan 2020 10:03:19 +0000 (11:03 +0100)]
MEDIUM: mux-h1: merge recv_wait and send_wait
This is the same principle as previous commit, but for the H1 mux this
time. The checks in the subscribe()/unsubscribe() calls were factored
and some BUG_ON() were added to detect unexpected cases.
h1_wake_for_recv() and h1_wake_for_send() needed to be refined to
consider the current subscription before deciding to wake up.
Willy Tarreau [Fri, 10 Jan 2020 06:06:05 +0000 (07:06 +0100)]
MEDIUM: connection: merge the send_wait and recv_wait entries
In practice all callers use the same wait_event notification for any I/O
so instead of keeping specific code to handle them separately, let's merge
them and it will allow us to create new events later.
Willy Tarreau [Fri, 10 Jan 2020 05:17:03 +0000 (06:17 +0100)]
MEDIUM: backend: move the connection finalization step to back_handle_st_con()
Currently there's still lots of code in conn_complete_server() that performs
one half of the connection setup, which is then checked and finalized in
back_handle_st_con(). There isn't a valid reason for this anymore, we can
simplify this and make sure that conn_complete_server() only wakes the stream
up to inform it about the fact the whole connection stack is set up so that
back_handle_st_con() finishes its job at the stream-int level.
It looks like the there could even be further simplified, but for now it
was moved straight out of conn_complete_server() with no modification.
Willy Tarreau [Thu, 9 Jan 2020 17:43:15 +0000 (18:43 +0100)]
REORG: stream/backend: move backend-specific stuff to backend.c
For more than a decade we've kept all the sess_update_st_*() functions
in stream.c while they're only there to work in relation with what is
currently being done in backend.c (srv_redispatch_connect, connect_server,
etc). Let's move all this pollution over there and take this opportunity
to try to find slightly less confusing names for these old functions
whose role is only to handle transitions from one specific stream-int
state:
The last one remained in stream.c because it's more or less a completion
function which does all the initialization expected on a connection
success or failure, can set analysers and emit logs.
The other ones could possibly slightly benefit from being modified to
take a stream-int instead since it's really what they're working with,
but it's unimportant here.
Willy Tarreau [Thu, 16 Jan 2020 16:20:57 +0000 (17:20 +0100)]
MEDIUM: mux-fcgi: do not make an fstrm subscribe to itself on deferred shut
This is the port to FCGI of previous commit "MEDIUM: mux-h2: do not make
an h2s subscribe to itself on deferred shut".
The purpose is to avoid subscribing to the send_wait list when trying to
close, because we'll soon have to merge both recv and send lists. Basic
testing showed no difference (performance nor issues).
Willy Tarreau [Fri, 10 Jan 2020 14:16:57 +0000 (15:16 +0100)]
MEDIUM: mux-h2: do not make an h2s subscribe to itself on deferred shut
The logic handling the deferred shutdown is a bit complex because it
involves a wait_event struct in each h2s dedicated to subscribing to
itself when shutdowns are not immediately possible. This implies that
we will not be able to support a shutdown and a receive subscription
in the future when we merge all wait events.
Let's solely rely on the H2_SF_WANT_SHUT_{R,W} flags instead and have
an autonomous tasklet for this. This requires to add a few controls
in the code because now when waking up a stream we need to check if it
is for I/O or just a shut, but since sending and shutting are exclusive
it's not difficult.
One point worth noting is that further resources could be shaved off
by only allocating the tasklet when failing to shut, given that in the
vast majority of streams it will never be used. In fact the sole purpose
of the tasklet is to support calling this code from outside the H2 mux
context. Looking at the code, it seems that not too many adaptations
would be required to have the send_list walking code deal with sending
the shut bits itself and further simplify all this.
Willy Tarreau [Thu, 16 Jan 2020 15:59:45 +0000 (16:59 +0100)]
MEDIUM: mux-fcgi: do not try to stop sending streams on blocked mux
This is essentially the same change as applied to mux-h2 in previous commit
"MEDIUM: mux-h2: do not try to stop sending streams on blocked mux". The
goal is to make sure we don't need to keep the item in the send_wait list
until it's executed so that we can later merge it with the recv_wait list.
No performance changes were observed.
Willy Tarreau [Fri, 10 Jan 2020 17:25:07 +0000 (18:25 +0100)]
MEDIUM: mux-h2: do not try to stop sending streams on blocked mux
This partially reverts commit d846c267 ("MINOR: h2: Don't run tasks that
are waiting to send if mux in full"). This commit was introduced to
limit the start/stop overhead incurred by waking many streams to let
only a few work. But since commit 9c218e7521 ("MAJOR: mux-h2: switch
to next mux buffer on buffer full condition."), this situation occurs
way less (typically 2000 to 4000 times less often) and the benefits of
the patch above do not outweigh its shortcomings anymore. And commit c7ce4e3e7f ("BUG/MEDIUM: mux-h2: don't stop sending when crossing a
buffer boundary") addressed a root cause of many unexpected sleeps and
wakeups.
The main problem it's causing is that it requires to keep the element
in the send_wait list until it's executed, leaving the entry in an
uncertain state, and significantly complicating the coexistence of this
list and the wait list dedicated to shutdown. Also it happens that this
call to tasklet_remove_from_task_list() will not be usable anymore once
we start to support streams on different threads. And finally, some of
the other streams that we remove might very well have managed to find
their way to the h2_snd_buf() with an unblocked condition as well so it
is possible that some of these removals were not welcome.
So this patch now makes sure that send_wait is immediately nulled when
the task is woken up, and that we don't have to play with it afterwards.
Since we don't need to stop the tasklets anymore, we don't need the
sending_list that we can remove.
However one very useful benefit of the sending_list was that it used to
provide the information about the fact that the stream already tried to
send and failed. This was an important factor to improve fairness because
late arrived streams should not be allowed to send if others are already
scheduled. So this patch introduces a new per-stream flag H2_SF_NOTIFIED
to distinguish such streams.
With this patch the fairness is preserved, and the ratio of aborted
h2_snd_buf() due to other streams already sending remains quite low
(~0.3-2.1% measured depending on object size, this is within
expectations for 100 independent streams).
If the contention issue the patch above used to address comes up again
in the future, a much better (though more complicated) solution would
be to switch to per-connection buffer pools to distribute between the
connection and the streams so that by default there are more buffers
available for the mux and the streams only have some when the mux's are
unused, i.e. it would push the memory pressure back to the data layer.
One observation made while developing this patch is that when dealing
with large objects we still spend a huge amount of time scanning the
send_list with tasks that are already woken up every time a send()
manages to purge a bit more data. Indeed, by removing the elements
from the list when H2_SF_NOTIFIED is set, the netowrk bandwidth on
1 MB objects fetched over 100 streams per connection increases by 38%.
This was not done here to preserve fairness but is worth studying (e.g.
by keeping a restart pointer on the list or just having a flag indicating
if an entry was added since last scan).
Jerome Magnin [Fri, 17 Jan 2020 17:01:20 +0000 (18:01 +0100)]
BUILD: pattern: include errno.h
Commit 3c79d4bdc introduced the use of errno in pattern.c without
including errno.h.
If we build haproxy without any option errno is not defined and the
build fails.
Willy Tarreau [Fri, 17 Jan 2020 16:39:35 +0000 (17:39 +0100)]
MEDIUM: connection: get rid of CO_FL_CURR_* flags
These ones used to serve as a set of switches between CO_FL_SOCK_* and
CO_FL_XPRT_*, and now that the SOCK layer is gone, they're always a
copy of the last know CO_FL_XPRT_* ones that is resynchronized before
I/O events by calling conn_refresh_polling_flags(), and that are pushed
back to FDs when detecting changes with conn_xprt_polling_changes().
While these functions are not particularly heavy, what they do is
totally redundant by now because the fd_want_*/fd_stop_*() actions
already perform test-and-set operations to decide to create an entry
or not, so they do the exact same thing that is done by
conn_xprt_polling_changes(). As such it is pointless to call that
one, and given that the only reason to keep CO_FL_CURR_* is to detect
changes there, we can now remove them.
Even if this does only save very few cycles, this removes a significant
complexity that has been responsible for many bugs in the past, including
the last one affecting FreeBSD.
All tests look good, and no performance regressions were observed.
Willy Tarreau [Fri, 17 Jan 2020 16:24:30 +0000 (17:24 +0100)]
MINOR: stream-int: remove dependency on CO_FL_WAIT_ROOM for rcv_buf()
The only case where this made sense was with mux_h1 but Since we
introduced CS_FL_MAY_SPLICE, we don't need to rely on this anymore,
thus we don't need to clear it either when we do not splice.
There is a last check on this flag used to determine if the rx channel
is full and that cannot go away unless it's changed to use the CS
instead but for now this wouldn't add any benefit so better not do
it yet.
Willy Tarreau [Fri, 17 Jan 2020 08:59:40 +0000 (09:59 +0100)]
MINOR: connection: move the CO_FL_WAIT_ROOM cleanup to the reader only
CO_FL_WAIT_ROOM is set by the splicing function in raw_sock, and cleared
by the stream-int when splicing is disabled, as well as in
conn_refresh_polling_flags() so that a new call to ->rcv_pipe() could
be attempted by the I/O callbacks called from conn_fd_handler(). This
clearing in conn_refresh_polling_flags() makes no sense anymore and is
in no way related to the polling at all.
Since we don't call them from there anymore it's better to clear it
before attempting to receive, and to set it again later. So let's move
this operation where it should be, in raw_sock_to_pipe() so that it's
now symmetric. It was also placed in raw_sock_to_buf() so that we're
certain that it gets cleared if an attempt to splice is replaced with
a subsequent attempt to recv(). And these were currently already achieved
by the call to conn_refresh_polling_flags(). Now it could theorically be
removed from the stream-int.
Jerome Magnin [Fri, 17 Jan 2020 15:09:33 +0000 (16:09 +0100)]
BUG/MINOR: pattern: handle errors from fgets when trying to load patterns
We need to do some error handling after we call fgets to make sure everything
went fine. If we don't users can be fooled into thinking they can load pattens
from directory because cfgparse doesn't flinch. This applies to acl patterns
map files.
This should be backported to all supported versions.
Willy Tarreau [Fri, 17 Jan 2020 15:19:34 +0000 (16:19 +0100)]
BUG/MEDIUM: connection: add a mux flag to indicate splice usability
Commit c640ef1a7d ("BUG/MINOR: stream-int: avoid calling rcv_buf() when
splicing is still possible") fixed splicing in TCP and legacy mode but
broke it badly in HTX mode.
What happens in HTX mode is that the channel's to_forward value remains
set to CHN_INFINITE_FORWARD during the whole transfer, and as such it is
not a reliable signal anymore to indicate whether more data are expected
or not. Thus, when data are spliced out of the mux using rcv_pipe(), even
when the end is reached (that only the mux knows about), the call to
rcv_buf() to get the final HTX blocks completing the message were skipped
and there was often no new event to wake this up, resulting in transfer
timeouts at the end of large objects.
All this goes down to the fact that the channel has no more information
about whether it can splice or not despite being the one having to take
the decision to call rcv_pipe() or not. And we cannot afford to call
rcv_buf() inconditionally because, as the commit above showed, this
reduces the forwarding performance by 2 to 3 in TCP and legacy modes
due to data lying in the buffer preventing splicing from being used
later.
The approach taken by this patch consists in offering the muxes the ability
to report a bit more information to the upper layers via the conn_stream.
This information could simply be to indicate that more data are awaited
but the real need being to distinguish splicing and receiving, here
instead we clearly report the mux's willingness to be called for splicing
or not. Hence the flag's name, CS_FL_MAY_SPLICE.
The mux sets this flag when it knows that its buffer is empty and that
data waiting past what is currently known may be spliced, and clears it
when it knows there's no more data or that the caller must fall back to
rcv_buf() instead.
The stream-int code now uses this to determine if splicing may be used
or not instead of looking at the rcv_pipe() callbacks through the whole
chain. And after the rcv_pipe() call, it checks the flag again to decide
whether it may safely skip rcv_buf() or not.
All this bitfield dance remains a bit complex and it starts to appear
obvious that splicing vs reading should be a decision of the mux based
on permission granted by the data layer. This would however increase
the API's complexity but definitely need to be thought about, and should
even significantly simplify the data processing layer.
The way it was integrated in mux-h1 will also result in no more calls
to rcv_pipe() on chunked encoded data, since these ones are currently
disabled at the mux level. However once the issue with chunks+splice
is fixed, it will be important to explicitly check for curr_len|CHNK
to set MAY_SPLICE, so that we don't call rcv_buf() after each chunk.
Jerome Magnin [Thu, 16 Jan 2020 16:37:21 +0000 (17:37 +0100)]
BUG/MINOR: stream: don't mistake match rules for store-request rules
In process_sticking_rules() we only want to apply the first store-request
rule for a given table, but when doing so we need to make sure we only
count actual store-request rules when we list the sticking rules.
Failure to do so leads to not being able to write store-request and match
sticking rules in any order as a match rule after a store-request rule
will be ignored.
The following configuration reproduces the issue:
global
stats socket /tmp/foobar
defaults
mode http
frontend in
bind *:8080
default_backend bar
backend bar
server s1 127.0.0.1:21212
server s2 127.0.0.1:21211
stick store-request req.hdr(foo)
stick match req.hdr(foo)
stick-table type string size 10
Since the fix 5fd3b28 ("BUG/MEDIUM: cli: _getsocks must send the peers
sockets") for bug #443. The code which sends the socket for the peers
and the proxies is duplicated. This patch move this code in a separated
function.
BUG/MEDIUM: cli: _getsocks must send the peers sockets
This bug prevents to reload HAProxy when you have both the seamless
reload (-x / expose-fd listeners) and the peers.
Indeed the _getsocks command does not send the FDs of the peers
listeners, so if no reuseport is possible during the bind, the new
process will fail to bind and exits.
With this feature, it is not possible to fallback on the SIGTTOU method
if we didn't receive all the sockets, because you can't close() the
sockets of the new process without closing those of the previous
process, they are the same.
Willy Tarreau [Wed, 15 Jan 2020 10:31:01 +0000 (11:31 +0100)]
REGTEST: add sample_fetches/hashes.vtc to validate hashes
This regtest validates all hashes that we support, on all input bytes from
0x00 to 0xFF. Those supporting avalanche are tested as well. It also tests
len(), hex() and base64(). It purposely does not enable sha2() because this
one relies on OpenSSL and there's no point in validating that OpenSSL knows
how to hash, what matters is that we can test our hashing functions in all
cases. However since the tests were written, they're still present and
commented out in case that helps.
It may be backported to supported versions, possibly dropping a few algos
that were not supported (e.g. crc32c requires 1.9 minimum).
Note that this test will fail on crc32/djb2/sdbm/wt6 unless patches
"BUG/MINOR: stream: init variables when the list is empty" and
"BUG/MAJOR: hashes: fix the signedness of the hash inputs" are included.
Willy Tarreau [Wed, 15 Jan 2020 09:54:42 +0000 (10:54 +0100)]
BUG/MAJOR: hashes: fix the signedness of the hash inputs
Wietse Venema reported in the thread below that we have a signedness
issue with our hashes implementations: due to the use of const char*
for the input key that's often text, the crc32, sdbm, djb2, and wt6
algorithms return a platform-dependent value for binary input keys
containing bytes with bit 7 set. This means that an ARM or PPC
platform will hash binary inputs differently from an x86 typically.
Worse, some algorithms are well defined in the industry (like CRC32)
and do not provide the expected result on x86, possibly causing
interoperability issues (e.g. a user-agent would fail to compare the
CRC32 of a message body against the one computed by haproxy).
Fortunately, and contrary to the first impression, the CRC32c variant
used in the PROXY protocol processing is not affected. Thus the impact
remains very limited (the vast majority of input keys are text-based,
such as user-agent headers for exmaple).
This patch addresses the issue by fixing all hash functions' prototypes
(even those not affected, for API consistency). A reg test will follow
in another patch.
The vast majority of users do not use these hashes. And among those
using them, very few will pass them on binary inputs. However, for the
rare ones doing it, this fix MAY have an impact during the upgrade. For
example if the package is upgraded on one LB then on another one, and
the CRC32 of a binary input is used as a stick table key (why?) then
these CRCs will not match between both nodes. Similarly, if
"hash-type ... crc32" is used, LB inconsistency may appear during the
transition. For this reason it is preferable to apply the patch on all
nodes using such hashes at the same time. Systems upgraded via their
distros will likely observe the least impact since they're expected to
be upgraded within a short time frame.
And it is important for distros NOT to skip this fix, in order to avoid
distributing an incompatible implementation of a hash. This is the
reason why this patch is tagged as MAJOR, eventhough it's extremely
unlikely that anyone will ever notice a change at all.
This patch must be backported to all supported branches since the
hashes were introduced in 1.5-dev20 (commit 98634f0c). Some parts
may be dropped since implemented later.
Link to Wietse's report:
https://marc.info/?l=postfix-users&m=157879464518535&w=2
rate-limits are valid for both frontend and listen, but not backend; so
we can simplify this check in a similar manner as it is done in e.g
max-keep-alive-queue.
this should fix github issue #449
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Olivier Houchard [Wed, 15 Jan 2020 18:16:23 +0000 (19:16 +0100)]
BUG/MEDIUM: raw_sock: Make sur the fd and conn are sync.
Commit 08fa16e397ffb1c6511b98ade2a3bfff9435e521 made sure
we let the fd layer we didn't want to poll anymore if
we failed to send and sendto() returne EAGAIN.
However, just disabling the polling with fd_stop_send()
while not notifying the connection layer means the
connection layer may believe the polling is activated
and nothing needs to be done when it is wrong.
A better fix is to revamp that whole code, for the
time being, just make sure the fd and connection
layer are properly synchronised.
This should fix the problem recently reported on FreeBSD.
Olivier Houchard [Wed, 15 Jan 2020 18:13:32 +0000 (19:13 +0100)]
BUG/MEDIUM: mux_h1: Don't call h1_send if we subscribed().
In h1_snd_buf(), only attempt to call h1_send() if we haven't
already subscribed.
It makes no sense to do it if we subscribed, as we know we failed
to send before, and will create a useless call to sendto(), and
in 2.2, the call to raw_sock_from_buf() will disable polling if
it is enabled.
This should be backported to 2.2, 2.1, 2.0 and 1.9.
BUG/MEDIUM: mworker: remain in mworker mode during reload
If you reload an haproxy started in master-worker mode with
"master-worker" in the configuration, and no "-W" argument,
the new process lost the fact that is was in master-worker mode
resulting in weird behaviors.
The bigest problem is that if it is reloaded with an bad configuration,
the master will exits instead of remaining in waitpid mode.
This problem was discovered in bug #443.
Should be backported in every version using the master-worker mode.
(as far as 1.8)
BUG/MINOR: cli/mworker: can't start haproxy with 2 programs
When trying to start HAProxy with the master CLI and more than one
program in the configuration, it refuses to start with:
[ALERT] 013/132926 (1378) : parsing [cur--1:0] : proxy 'MASTER', another server named 'cur--1' was already defined at line 0, please use distinct names.
[ALERT] 013/132926 (1378) : Fatal errors found in configuration.
The problem is that haproxy tries to create a server for the MASTER
proxy but only the worker are supposed to be in the server list.
Willy Tarreau [Tue, 14 Jan 2020 10:42:59 +0000 (11:42 +0100)]
BUG/MEDIUM: mux-h2: don't stop sending when crossing a buffer boundary
In version 2.0, after commit 9c218e7521 ("MAJOR: mux-h2: switch to next
mux buffer on buffer full condition."), the H2 mux started to use a ring
buffer for the output data in order to reduce competition between streams.
However, one corner case was suboptimally covered: when crossing a buffer
boundary, we have to shrink the outgoing frame size to the one left in
the output buffer, but this shorter size is later used as a signal of
incomplete send due to a buffer full condition (which used to be true when
using a single buffer). As a result, function h2s_frt_make_resp_data()
used to return less than requested, which in turn would cause h2_snd_buf()
to stop sending and leave some unsent data in the buffer, and si_cs_send()
to subscribe for sending more later.
But it goes a bit further than this, because subscribing to send again
causes the mux's send_list not to be empty anymore, hence extra streams
can be denied the access to the mux till the first stream is woken again.
This causes a nasty wakeup-sleep dance between streams that makes it
totally impractical to try to remove the sending list. A test showed
that it was possible to observe 3 million h2_snd_buf() giveups for only
100k requests when using 100 concurrent streams on 20kB objects.
It doesn't seem likely that a stream could get blocked and time out due
to this bug, though it's not possible either to demonstrate the opposite.
One risk is that incompletely sent streams do not have any blocking flags
so they may not be identified as blocked. However on first scan of the
send_list they meet all conditions for a wakeup.
This patch simply allows to continue on a new frame after a partial
frame. with only this change, the number of failed h2_snd_buf() was
divided by 800 (4% of calls). And by slightly increasing the H2C_MBUF_CNT
size, it can go down to zero.
Willy Tarreau [Tue, 14 Jan 2020 08:59:38 +0000 (09:59 +0100)]
MEDIUM: lua: don't call the GC as often when dealing with outgoing connections
In order to properly close connections established from Lua in case
a Lua context dies, the context currently automatically gets a flag
HLUA_MUST_GC set whenever an outgoing connection is used. This causes
the GC to be enforced on the context's death as well as on yield. First,
it does not appear necessary to do it when yielding, since if the
connections die they are already cleaned up. Second, the problem with
the flag is that even if a connection gets properly closed, the flag is
not removed and the GC continues to be called on the Lua context.
The impact on performance looks quite significant, as noticed and
diagnosed by Sadasiva Gujjarlapudi in the following thread:
This patch changes the flag for a counter so that each created
connection increments it and each cleanly closed connection decrements
it. That way we know we have to call the GC on the context's death only
if the count is non-null. As reported in the thread above, the Lua
performance gain is now over 20% by doing this.
Thanks to Sada and Thierry for the design discussion and tests that
led to this solution.
William Dauchy [Mon, 13 Jan 2020 16:52:49 +0000 (17:52 +0100)]
CLEANUP: ssl: remove opendir call in ssl_sock_load_cert
Since commit 3180f7b55434 ("MINOR: ssl: load certificates in
alphabetical order"), `readdir` was replaced by `scandir`. We can indeed
replace it with a check on the previous `stat` call.
This micro cleanup can be a good benefit when you have hundreds of bind
lines which open TLS certificates directories in terms of syscall,
especially in a case of frequent reloads.
Signed-off-by: William Dauchy <w.dauchy@criteo.com>
Willy Tarreau [Fri, 10 Jan 2020 17:20:15 +0000 (18:20 +0100)]
BUG/MEDIUM: mux-h2: fix missing test on sending_list in previous patch
Previous commit 989539b048 ("BUG/MINOR: mux-h2: use a safe
list_for_each_entry in h2_send()") accidently lost its sending_list test,
resulting in some elements to be woken up again while already in the
sending_list and h2_unsubscribe() crashing on integrity tests (only
when built with DEBUG_DEV).
If the fix above is backported this one must be as well.
Willy Tarreau [Fri, 10 Jan 2020 16:01:29 +0000 (17:01 +0100)]
BUG/MINOR: mux-h2: use a safe list_for_each_entry in h2_send()
h2_send() uses list_for_each_entry() to scan paused streams and resume
them, but happily deletes any leftover from a previous failed unsubscribe,
which is obviously not safe and would corrupt the list. In practice this
is a proof that this doesn't happen, but it's not the best way to prove it.
In order to fix this and reduce the maintenance burden caused by code
duplication (this list walk exists at 3 places), let's introduce a new
function h2_resume_each_sending_h2s() doing exactly this and use it at
all 3 places.
This bug was introduced as a side effect of fix 998410a41b ("BUG/MEDIUM:
h2: Revamp the way send subscriptions works.") so it should be backported
as far as 1.9.
BUG/MINOR: stream-int: Don't trigger L7 retry if max retries is already reached
When an HTTP response is received, at the stream-interface level, if a L7 retry
must be triggered because of the status code, the response is trashed and a read
error is reported on the response channel. Then the stream handles this error
and perform the retry. Except if the maximum connection retries is reached. In
this case, an error is reported. Because the server response was already trashed
by the stream-interface, a generic 502 error is returned to the client instead
of the server's one.
Now, the stream-interface triggers a L7 retry only if the maximum connection
retries is not already reached. Thus, at the end, the last server's response is
returned.
This patch must be backported to 2.1 and 2.0. It should fix the issue #439.
Willy Tarreau [Wed, 8 Jan 2020 18:15:07 +0000 (19:15 +0100)]
BUG/MAJOR: listener: do not schedule a task-less proxy
Apparently seamingless commit 0591bf7deb ("MINOR: listener: make the
wait paths cleaner and more reliable") caused a nasty regression and
revealed a rare race that hits regtest stickiness/lb-services.vtc
about 4% of the times for 8 threads.
The problem is that when a multi-threaded listener wakes up on an
incoming connection, several threads can receive the event, especially
when idle. And all of them will race to accept the connections in
parallel, adjusting the listener's nbconn and proxy's feconn until
one reaches the proxy's limit and declines. At this step the changes
are cancelled, the listener is marked "limited", and when the threads
exit the function, one of them will unlimit the listener/proxy again
so that it can accept incoming connections again.
The problem happens when many threads connect to a small peers section
because its maxconn is very limited (typically 6 for 2 peers), and it's
sometimes possible for enough competing threads to hit the limit and
one of them will limit the listener and queue the proxy's task... except
that peers do not initialize their proxy task since they do not use rate
limiting. Thus the process crashes when doing task_schedule(p->task).
Prior to the cleanup patch above, this didn't happen because the error
path that was dedicated to only limiting the listener did not call
task_schedule(p->task).
Given that the proxy's task is optional, and that the expire value
passed there is always TICK_ETERNITY, it's sufficient and reasonable to
avoid calling this task_schedule() when expire is not set. And for long
term safety we can also avoid to do it when the task is not set. A first
fix consisted in allocating a task for the peers proxies but it's never
used and would eat resources for reason.
No backport is needed as this commit was only merged into 2.2.
What happens is that the failed connect() creates an FD update for pollout,
but the successful synchronous send() doesn't disable it because polling was
only disabled in the FD handler. But a successful synchronous connect()
cancellation is a good opportunity to disable polling before it's effectively
enabled in the next loop, so better disable it when reaching the end. The
cost is very low if it was already disabled anyway (one atomic op).
This only affects local connections but with this the typical number of
epoll_ctl() calls per connection dropped from ~4.2 to ~3.8 for plain TCP
and 10k transfers.
Willy Tarreau [Fri, 20 Dec 2019 10:18:54 +0000 (11:18 +0100)]
MEDIUM: dns: implement synchronous send
In dns_send_query(), there's no point in first waking up the FD, to get
called back by the poller to send the request and sleep. Instead let's
simply send the request as soon as it's known and only subscribe to the
poller when the socket buffers are full and it's required to poll (i.e.
almost never).
This significantly reduces the number of calls to the poller. A large
config sees the number of epoll_ctl() calls reduced from 577 to 7 over
10 seconds, the number of recvfrom() from 1533 to 582 and the number of
sendto() from 369 to 162.
It also has the extra benefit of building each requests only once per
resolution and sending it to multiple resolvers instead of rebuilding
it for each and every resolver.
This will reduce the risk of seeing situations similar to bug #416 in
the future.
Willy Tarreau [Tue, 7 Jan 2020 17:03:09 +0000 (18:03 +0100)]
BUG/MEDIUM: session: do not report a failure when rejecting a session
In session_accept_fd() we can perform a synchronous call to
conn_complete_session() and if it succeeds the connection is accepted
and turned into a session. If it fails we take it as an error while it
is not, in this case, it's just that a tcp-request rule has decided to
reject the incoming connection. The problem with reporting such an event
as an error is that the failed status is passed down to the listener code
which decides to disable accept() for 100ms in order to leave some time
for transient issues to vanish, and that's not what we want to do here.
This fix must be backported as far as 1.7. In 1.7 the code is a bit
different as tcp_exec_l5_rules() is called directly from within
session_new_fd() and ret=0 must be assigned there.
BUG/MINOR: channel: inject output data at the end of output
In co_inject(), data must be inserted at the end of output, not the end of
input. For the record, this function does not take care of input data which are
supposed to not exist. But the caller may reset input data after or before the
call. It is its own choice.
This bug, among other effects, is visible when a redirect is performed on
the response path, on legacy HTTP mode (so for HAProxy < 2.1). The redirect
response is appended after the server response when it should overwrite it.
Thanks to Kevin Zhu <ip0tcp@gmail.com> to report the bug. It must be backported
as far as 1.9.