MINOR: mux-h1: clean up conditions to enable and disable splicing
First, there is no reason to announce the splicing support at the
conn-stream level when it is created, at least for now. GTUNE_USE_SPLICE
option is already handled at the stream level.
Second, in h1_rcv_buf(), there is no reason to test the message state to
switch the H1C into splicing mode (via the H1C_F_WANT_SPLICE flag).
h1_process_input() already takes care of setting the CS_FL_MAY_SPLICE flag
on the conn-stream when appropriate. Thus, in h1_rcv_buf(), we can rely on this
flag to change the H1C state.
Finally, if h1_rcv_pipe() is called, it means the H1C is already in
splicing mode. The H1C_F_WANT_SPLICE flag is necessarily already set, so
there is no reason to force it.
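For illustration, the simplified logic in h1_rcv_buf() could look like the
hedged sketch below, relying on the existing CS_FL_MAY_SPLICE and
H1C_F_WANT_SPLICE flags (not the verbatim patch):

  /* in h1_rcv_buf(): follow the conn-stream's hint instead of re-testing
   * the message state; CS_FL_MAY_SPLICE is maintained by h1_process_input()
   */
  if (cs->flags & CS_FL_MAY_SPLICE)
      h1c->flags |= H1C_F_WANT_SPLICE;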
This script tests the abortonclose option for HTTP/1 clients only. It may be
backported as far as 2.0. But on 2.2 and prior, the syslog part must be
adapted to catch log messages emitted by the proxy during HAProxy
startup. The following lines must be added :
BUG/MEDIUM: mux-h1: Properly report client close if abortonclose option is set
On the client side, if the CO_RFL_KEEP_RECV flag is set when h1_rcv_buf() is
called, we force a subscription for reads to be able to catch read0. This
way, the event will be reported to the upper layer to let the stream abort the
request.
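A hedged sketch of the idea in h1_rcv_buf(), using the usual mux-h1
subscription plumbing (the exact condition and location in the patch may
differ):

  /* the caller asked to keep receiving (abortonclose on the frontend):
   * stay subscribed for read events so a client read0 is caught even
   * though the request is already fully parsed.
   */
  if ((flags & CO_RFL_KEEP_RECV) && !(h1c->wait_event.events & SUB_RETRY_RECV))
      h1c->conn->xprt->subscribe(h1c->conn, h1c->conn->xprt_ctx,
                                 SUB_RETRY_RECV, &h1c->wait_event);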
This patch fixes the abortonclose option for H1 connections. It depends on
the following patches :
* MEDIUM: mux-h1: Don't block reads when waiting for the other side
* MINOR: conn-stream: Force mux to wait for read events if abortonclose is set
But to be sure the event is handled by the stream, the following patches are
also required :
* BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive
* MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()
The whole series must be backported with caution as far as 2.0, and only after
a period of observation to be sure nothing broke.
MEDIUM: mux-h1: Don't block reads when waiting for the other side
When we are waiting for the other side to read more data, or to read the
next request, we must only stop the processing of input data, not the data
receipt. This patch doesn't change anything about the subscriptions for
reads, so it should not change the current behavior. The only difference is
that the H1 connection will try to read data if it is woken up for an I/O
event and if
it was subscribed for reads.
This patch is required to fix abortonclose option for H1 client connections.
MINOR: conn-stream: Force mux to wait for read events if abortonclose is set
When the abortonclose option is enabled, the frontend conn-stream must make
sure the mux will wait for read events, so that a shutdown received from the
client is reported immediately. To do so, the CO_RFL_KEEP_RECV flag is set
when mux->rcv_buf() is called. This new flag instructs the mux to wait for
read events, regardless of its internal state.
This patch is required to fix abortonclose option for H1 client connections.
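A minimal sketch of the caller side, assuming the flag is derived from the
frontend's PR_O_ABRT_CLOSE option just before invoking mux->rcv_buf() (the
names exist in haproxy, but the exact spot and condition are assumptions):

  /* ask the mux to keep waiting for read events when the frontend runs
   * with "option abortonclose", so a client shutdown is seen even while
   * we are only waiting for the server side.
   */
  if (strm_fe(si_strm(si))->options & PR_O_ABRT_CLOSE)
      flags |= CO_RFL_KEEP_RECV;

  ret = conn->mux->rcv_buf(cs, &ic->buf, max, flags);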
BUG/MINOR: stream-int: Don't block reads in si_update_rx() if chn may receive
In the si_update_rx() function, the reads may be blocked because we
explicitly don't want to read or because of a lack of room in the input
buffer. The first condition is valid. However the second one only tests
whether the channel is empty or not. It means the reads are blocked if there
are still some output data in the input channel, in its buffer or its pipe.
This condition is not accurate. The reads must not be blocked if the channel
can still receive data. Thus instead of relying on the channel_is_empty()
function, we now call channel_may_recv().
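Schematically, the change boils down to the condition used in si_update_rx()
to block further reads (an illustration of the principle, not the verbatim
diff):

  /* before: reads stayed blocked as long as any output data remained */
  if (!channel_is_empty(ic))
      si_rx_room_blk(si);

  /* after: only block reads once the channel really cannot receive more */
  if (!channel_may_recv(ic))
      si_rx_room_blk(si);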
This patch is especially useful to be able to catch read0 on the client side
when we are waiting for a connection to the server and the abortonclose
option is enabled. Otherwise, the client abort is not detected.
This patch depends on "MINOR: channel: Rely on HTX version if appropriate in
channel_may_recv()". Both must be backported as far as 2.0 after a period of
observation to be sure nothing broke.
MINOR: channel: Rely on HTX version if appropriate in channel_may_recv()
When channel_may_recv() is called for an HTX stream, the HTX version,
channel_htx_may_recv(), is called instead. This patch is mandatory to fix a bug
related to the abortonclose option.
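A hedged sketch of the dispatch performed at the top of channel_may_recv();
IS_HTX_STRM(), chn_strm() and htxbuf() are existing helpers, but the exact
test used by the patch may differ:

  /* for HTX streams the raw-buffer arithmetic below would be meaningless,
   * so defer to the HTX-aware variant.
   */
  if (IS_HTX_STRM(chn_strm(chn)))
      return channel_htx_may_recv(chn, htxbuf(&chn->buf));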
Willy Tarreau [Wed, 5 May 2021 16:17:39 +0000 (18:17 +0200)]
BUILD: makefile: add new option USE_MEMORY_PROFILING
It is not enabled by default, and may only work on linux-glibc for now,
though maybe other platforms could adopt it, possibly with certain
restrictions.
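For reference, enabling it follows the usual USE_* convention of haproxy's
makefile, e.g.:

  $ make TARGET=linux-glibc USE_MEMORY_PROFILING=1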
Willy Tarreau [Wed, 5 May 2021 16:07:02 +0000 (18:07 +0200)]
MINOR: activity: make "show profiling" also dump the memoery usage
Now the memory usage stats are dumped. They are first sorted by total
alloc+free so that the first ones are always the most relevant, and
that most symmetric alloc/free pairs appear next to each other. This
way it becomes convenient to only show a small part of them such as:
show profiling memory 20
It's worth noting that the sorting is performed upon each call to the
I/O handler, so it is technically possible that an entry could appear
twice or be dropped if the ordering changes between two calls. In
practice it is not an issue but it's worth mentioning.
Willy Tarreau [Wed, 5 May 2021 15:33:27 +0000 (17:33 +0200)]
MINOR: activity: clean up the show profiling io_handler a little bit
Let's rearrange it to make it more configurable: allow iterating over
multiple parts (header, tasks, memory etc), restarting from a given line
number (previously it didn't work, though fortunately it never happened),
and dumping only certain parts and a given number of lines. A few entries
from ctx.cli are now used to store a
restart point and the current step.
Willy Tarreau [Wed, 5 May 2021 15:07:09 +0000 (17:07 +0200)]
MEDIUM: activity: collect memory allocator statistics with USE_MEMORY_PROFILING
When built with USE_MEMORY_PROFILING the main memory allocation functions
are diverted to collect statistics per caller. It is a bit tricky because
the only way to call the original ones is to find their pointer, which
requires dlsym(), and which is not available everywhere.
Thus all functions are designed to call their fallback function (the
original one), which is preset to an initialization function that is
supposed to call dlsym() to resolve the missing symbols, and vanish.
This saves expensive tests in the critical path.
A second problem is that dlsym() calls calloc() to initialize some
error messages. After plenty of tests with posix_memalign(), valloc()
and friends, it turns out that returning NULL still makes it happy.
Thus we currently use a visit counter (in_memprof) to detect if we're
reentering, in which case all allocation functions return NULL.
In order to convert a return address to an entry in the stats, we
perform a cheap hash consisting of multiplying the pointer by a balanced
number (as many zeros as ones) and keeping the middle bits. The hash is
already pretty good like this, managing to store up to 638 entries in a
2048-entry table without collision. But in order to further refine this and
improve the fill ratio of the table, in case of collision we move up to 16
adjacent entries to find a free place. This remains quite cheap and manages
to store all of these inside a 1024-entry hash table with even less risk of
collision.
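As an illustration only (the constant, names and widths below are
assumptions, not the committed code), the scheme can be sketched like this:

  #include <stdint.h>
  #include <stddef.h>

  #define MEMPROF_HASH_BITS    10
  #define MEMPROF_HASH_BUCKETS (1U << MEMPROF_HASH_BITS)

  /* hash a return address: multiply by a roughly balanced 64-bit constant
   * and keep bits from the middle of the product as the table index.
   */
  static unsigned int memprof_hash_ptr(const void *ra)
  {
      uint64_t x = (uint64_t)(uintptr_t)ra;

      x *= 0x9F46A753069713B3ULL; /* example multiplier, not the real one */
      return (unsigned int)(x >> 32) & (MEMPROF_HASH_BUCKETS - 1);
  }

  /* on collision, probe up to 16 adjacent slots; a negative return means
   * the caller falls back to the final catch-all entry of the table.
   */
  static int memprof_find_slot(const void *caller_tab[], const void *ra)
  {
      unsigned int bin = memprof_hash_ptr(ra);
      int retries = 16;

      while (caller_tab[bin] != ra && caller_tab[bin] != NULL) {
          if (retries-- <= 0)
              return -1;
          bin = (bin + 1) & (MEMPROF_HASH_BUCKETS - 1);
      }
      return (int)bin;
  }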
Also, free(NULL) does not produce any stats. By doing so we reduce
from 638 to 208 the average number of entries needed for a basic
config using SSL. free(NULL) not only provides no information as it's
a NOP, but keeping it is pure pollution as it happens all the time.
When DEBUG_MEM_STATS is enabled, malloc/calloc/realloc are redefined as
macros, preventing the code from compiling. Thus, when this option is
detected, the macros are undefined as they are pointless there anyway.
The functions are optimized to quickly jump to the fallback and as such
become almost invisible in terms of processing time, except for an extra
"if" on a read_mostly variable and a jump. Considering that this only
happens for pool misses and library routines, this remains acceptable.
Performance tests in SSL (the most stressful test) show less than 1%
performance loss when profiling is enabled on 2c4t.
The code was written in a way to ease backporting to modern versions
(2.2+) if needed, so it keeps the long names for integers and doesn't
use the _INC version of the atomic ops.
Willy Tarreau [Wed, 5 May 2021 14:50:40 +0000 (16:50 +0200)]
MINOR: activity: declare the storage for memory usage statistics
We'll need to store, for each call place, the pointer to the caller
(the return address to be more exact, as with free() it's not uncommon
to see tail calls), the number of calls to alloc/free and the total
alloc/free bytes. realloc() will be counted either as alloc or free
depending on the balance of the size before vs after.
We store 1024+1 entries. The first ones are used as hashes and the
last one for collisions.
When profiling is enabled via the CLI, all the stats are reset.
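A hedged sketch of such a record and its storage (the field names are
assumptions; only the set of counters follows the description above):

  #include <stdint.h>

  struct memprof_stats {
      const void *caller;   /* return address of the alloc/free call site */
      uint64_t alloc_calls; /* number of calls accounted as allocations */
      uint64_t free_calls;  /* number of calls accounted as releases */
      uint64_t alloc_tot;   /* total bytes allocated from this place */
      uint64_t free_tot;    /* total bytes released from this place */
      /* realloc() adds to either the alloc or the free side depending on
       * whether the new size is larger or smaller than the old one.
       */
  };

  /* 1024 hashed entries plus one final catch-all entry for collisions */
  static struct memprof_stats memprof_stats[1024 + 1];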
Willy Tarreau [Wed, 5 May 2021 14:28:31 +0000 (16:28 +0200)]
CLEANUP: activity: mark the profiling and task_profiling_mask __read_mostly
These ones are only read by the scheduler and occasionally written to
by the CLI parser, so let's move them to read_mostly so that they do
not risk suffering from cache line pollution.
Willy Tarreau [Wed, 5 May 2021 07:06:21 +0000 (09:06 +0200)]
MINOR: tools: add functions to retrieve the address of a symbol
get_sym_curr_addr() will return the address of the first occurrence of
the given symbol while get_sym_next_addr() will return the address of
the next occurrence of the symbol. Both return NULL on non-Linux,
non-ELF, or non-USE_DL builds.
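Hypothetical usage sketch only: it assumes the functions take the symbol
name and return a plain pointer (check the real prototypes in the tools
header). The memory profiler described above would use something along these
lines to reach the libc allocator hidden behind its own wrapper:

  /* "curr" would resolve to our own malloc() wrapper, "next" to the one
   * that comes after it in symbol lookup order, i.e. the libc one.
   */
  #include <stddef.h>

  static void *(*sys_malloc)(size_t size);

  static void memprof_resolve(void)
  {
      sys_malloc = (void *(*)(size_t))get_sym_next_addr("malloc");
  }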
MEDIUM: connection: close front idling connection on soft-stop
Implement a safe mechanism to close front idling connections which
prevent the soft-stop from completing. Every h1/h2 front connection is
added to a new per-thread list instance. On shutdown, a new task is
woken up which calls the wake mux operation on every connection still
present in the new list.
A new stopping_list attach point has been added in the connection
structure. As this member is only used for frontend connections, it
shares the same union as the session_list member reserved for backend
connections.
MEDIUM: mux_h1: release idling frontend conns on soft-stop
In h1_process, if the proxy of a frontend connection is disabled,
release the connection.
This commit is in preparation for properly closing idling front connections
on soft-stop. h1_process must still be called; this will be done via a
dedicated task which monitors the global variable 'stopping'.
MINOR: connection: move session_list member in a union
Move the session_list attach point into an anonymous union. This member is
only used for backend connections. This commit is in preparation for the
support of stopping frontend idling connections, which will add another
member to the union.
This change means that special care must be taken to be sure that only
backend connections manipulate the session_list. A few BUG_ON() have been
added as special guards to prevent misuse.
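A hedged sketch of the resulting layout inside struct connection (abridged;
the real structure holds many more members and the exact types may differ):

  union {
      struct list session_list;  /* backend conns: attach point in the owning session */
      struct list stopping_list; /* frontend conns: attach point in the per-thread soft-stop list */
  };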
MINOR: srv: close all idle connections on shutdown
Implement a function to close all server idle connections. This function
is called via a global deinit server handler.
The main objective is to prevent leaving sockets in TIME_WAIT
state. To limit the set of operations on shutdown and prevent
task rescheduling, only the ctrl stack closing is done.
The purpose of this debugging option was to prevent certain pools from
masking other ones when they were shared. For example, task, http_txn,
h2s, h1s, h1c, session, fcgi_strm, and connection are all 192 bytes and
would normally be merged, but not with this option. The problem is that
certain pools are declared multiple times with various parameters, which
are often very close, and due to the way the option works, they're not
shared either. Good examples of this are captures and stick tables. Some
configurations have large numbers of stick-tables of pretty similar types
and it's very common to end up with the following when the option is
enabled:
In addition to not being convenient, it can have important effects on the
memory usage because these pools will not share their entries, so one stick
table cannot allocate from another one's pool.
This patch solves this by going back to the initial goal which was not to
have different pools in the same list. Instead of masking the MEM_F_SHARED
flag, it simply adds a test on the pool's name, and disables pool sharing
if the names differ. This way pools are not shared unless they're of the
same name and size, which doesn't hinder debugging. The same test above
now returns this:
Willy Tarreau [Tue, 4 May 2021 16:40:50 +0000 (18:40 +0200)]
MINOR: debug: add a new "debug dev sym" command in expert mode
This command attempts to resolve a pointer to a symbol name. This is
convenient during development as it's easier to get such pointers live
than by firing up a debugger or calling addr2line.
Willy Tarreau [Tue, 4 May 2021 14:27:45 +0000 (16:27 +0200)]
BUG/MEDIUM: cli: prevent memory leak on write errors
Since the introduction of payload support on the CLI in 1.9-dev1 by
commit abbf60710 ("MEDIUM: cli: Add payload support"), a chunk is
temporarily allocated for the CLI to support defragmenting a payload
passed with a command. However it's only released when passing via
the CLI_ST_END state (i.e. on clean shutdown), but not on errors.
Something as trivial as:
$ while :; do ncat --send-only -U /path/to/cli <<< "show stat"; done
with a few hundred servers is enough to see the number of allocated
trash chunks go through the roof in "show pools".
BUG/MINOR: hlua: Don't rely on top of the stack when using Lua buffers
When the lua buffers are used, a variable number of stack slots may be
used. Thus we cannot assume that we know where the top of the stack is. It
was not an issue for lua < 5.4.3 (at least for small buffers). But
'socket:receive()' now fails with lua 5.4.3 because a light userdata is
systematically pushed on the top of the stack when a buffer is initialized.
To fix the bug, in hlua_socket_receive(), we save the index of the top of
the stack before creating the buffer. This way, we can check the number of
arguments, regardless of whether anything was pushed on the stack or not.
Note that the other buffer usages seem to be safe.
This patch should solve issue #1240. It should be backported to all stable
branches.
Willy Tarreau [Sat, 1 May 2021 06:25:15 +0000 (08:25 +0200)]
[RELEASE] Released version 2.4-dev18
Released version 2.4-dev18 with the following main changes :
- DOC: Fix indentation for `path-strip-dot` normalizer
- DOC: Fix RFC reference for the percent-to-uppercase normalizer
- DOC: Add RFC references for the path-strip-dot(dot)? normalizers
- MINOR: uri_normalizer: Add a `percent-decode-unreserved` normalizer
- BUG/MINOR: mux-fcgi: Don't send normalized uri to FCGI application
- REORG: htx: Inline htx functions to add HTX blocks in a message
- CLEANUP: assorted typo fixes in the code and comments
- DOC: general: fix white spaces for HTML converter
- BUG/MINOR: ssl: ssl_sock_prepare_ssl_ctx does not return an error code
- BUG/MINOR: cpuset: move include guard at the very beginning
- BUG/MAJOR: fix build on musl with cpu_set_t support
- BUG/MEDIUM: cpuset: fix build on MacOS
- BUG/MINOR: htx: Preserve HTX flags when draining data from an HTX message
- MEDIUM: htx: Refactor htx_xfer_blks() to not rely on hdrs_bytes field
- CLEANUP: htx: Remove unused hdrs_bytes field from the HTX start-line
- BUG/MINOR: mux-h2: Don't encroach on the reserve when decoding headers
- MEDIUM: http-ana: handle read error on server side if waiting for response
- MINOR: htx: Limit length of headers name/value when a HTX message is dumped
- BUG/MINOR: applet: Notify the other side if data were consumed by an applet
- BUG/MINOR: hlua: Don't consume headers when starting an HTTP lua service
- BUG/MEDIUM: mux-h2: Handle EOM flag when sending a DATA frame with zero-copy
- CLEANUP: channel: No longer notify the producer in co_skip()/co_htx_skip()
- DOC: general: fix example in set-timeout
- CLEANUP: cfgparse: de-uglify early file error handling in readcfgfile()
- MINOR: config: add a new "default-path" global directive
- BUG/MEDIUM: peers: initialize resync timer to get an initial full resync
- BUG/MEDIUM: peers: register last acked value as origin receiving a resync req
- BUG/MEDIUM: peers: stop considering ack messages teaching a full resync
- BUG/MEDIUM: peers: reset starting point if peers appears longly disconnected
- BUG/MEDIUM: peers: reset commitupdate value in new conns
- BUG/MEDIUM: peers: re-work updates lookup during the sync on the fly
- BUG/MEDIUM: peers: reset tables stage flags stages on new conns
- MINOR: peers: add informative flags about resync process for debugging
- BUG/MEDIUM: time: fix updating of global_now upon clock drift
- CLEANUP: freq_ctr: make arguments of freq_ctr_total() const
- CLEANUP: hlua: rename hlua_appctx* appctx to luactx
- MINOR: server: fix doc/trace on lb algo for dynamic server creation
- REGTESTS: server: fix cli_add_server due to previous trace update
- REGTESTS: add minimal CLI "add map" tests
- DOC: management: move "set var" to the proper place
- CLEANUP: map: slightly reorder the add map function
- MINOR: map: get rid of map_add_key_value()
- MINOR: map: show the current and next pattern version in "show map"
- MINOR: map/acl: add the possibility to specify the version in "show map/acl"
- MINOR: pattern: support purging arbitrary ranges of generations
- MINOR: map/acl: add the possibility to specify the version in "clear map/acl"
- MINOR: map/acl: add the "prepare map/acl" CLI command
- MINOR: map/acl: add the "commit map/acl" CLI command
- MINOR: map/acl: make "add map/acl" support an optional version number
- CLEANUP: map/cli: properly align the map/acl help
- BUILD: compiler: do not use already defined __read_mostly on dragonfly
MINOR: map/acl: make "add map/acl" support an optional version number
By passing a version number to "add map/acl", it becomes possible to
atomically replace maps and ACLs. The principle is that a new version
number is first retrieved by calling "prepare map/acl", and this version
number is used with "add map" and "add acl". Newly added entries then
remain invisible to the matching mechanism but are visible in "show
map/acl" when the version number is specified, or may be cleard with
"clear map/acl". Finally when the insertion is complete, a
"commit map/acl" command must be issued, and the version is atomically
updated so that there is no intermediate state with incomplete entries.
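Put together, a typical atomic replacement over the CLI could look like the
hypothetical session below ("2" stands for whatever version number
"prepare map" returned, and the map file name is only an example):

  prepare map /etc/haproxy/hosts.map
  add map @2 /etc/haproxy/hosts.map example.com be_example
  add map @2 /etc/haproxy/hosts.map static.example.com be_static
  show map @2 /etc/haproxy/hosts.map
  commit map @2 /etc/haproxy/hosts.map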
MINOR: map/acl: add the "commit map/acl" CLI command
The command is used to atomically replace a map/acl with the pending
contents of the designated version. The new version must have been
allocated by "prepare map/acl" prior to this. At the moment it is not
possible to force the version when adding new entries, so this may only
be used to atomically clear an ACL/map.
MINOR: map/acl: add the "prepare map/acl" CLI command
This command allocates a new version for the map/acl, that will be usable
later to prepare the addition of new values to atomically replace existing
ones. Technically speaking the operation consists in atomically incrementing
the next version. There's no "undo" operation here, if a version is not
committed, it will automatically be trashed when committing a newer version.
MINOR: map/acl: add the possibility to specify the version in "clear map/acl"
This will ease maintenance of versioned maps by allowing one to clear old or
failed updates instead of the current version. Nothing was done to allow
clearing everything, though if there was a need for this, implementing "@all"
or something equivalent wouldn't require more than 3 lines of code.
MINOR: pattern: support purging arbitrary ranges of generations
Instead of being able to purge only values older than a specific value,
let's support arbitrary ranges and make pat_ref_purge_older() just be
one special case of this one.
MINOR: map/acl: add the possibility to specify the version in "show map/acl"
The maps and ACLs internally all have two versions, the "current" one,
which is the one being matched against, and the "next" one, the one being
filled during an atomic replacement. Till now the "show" commands only used
to show the current one but it can be convenient to be able to show other
ones as well, so let's add the ability to do this with "show map" and
"show acl". The method used here consists in passing the version number
as "@<ver>" before the map/acl name or ID. It would have been better after
it but that could create confusion with keys already using such a format.
MINOR: map: show the current and next pattern version in "show map"
The "show map" command wasn't updated when pattern generations were
added for atomic reloads, let's report them in the "show map" command
that lists all known maps. It will be useful for users.
MINOR: map: get rid of map_add_key_value()
This function was only used once in cli_parse_add_map(), and half of the
work it used to do was already known from the caller or testable outside
of the lock. Given that we'll need to modify it soon to pass a generation
number, let's remerge it in the caller instead, using pat_ref_load() which
is the one we'll need.
CLEANUP: map: slightly reorder the add map function
The function uses two distinct code paths for the single key/value pair
and multiple pairs inserted as payload, each with a copy-paste of the
error handling. Let's modify the loop to factor them out.
DOC: management: move "set var" to the proper place
Commit b8bd1ee89 ("MEDIUM: cli: add a new experimental "set var" command")
added "get var" and "set var" but "set var" was misplaced in the doc,
breaking the alphabetic ordering.
REGTESTS: add minimal CLI "add map" tests
The map_redirect test already tests for "show map", "del map" and
"clear map" but doesn't have any "add map" command. Let's add some
trivial ones involving one regular entry and two other ones added as
payload, checking they are properly returned.
REGTESTS: server: fix cli_add_server due to previous trace update
The error output for dynamic server creation with an invalid lb algo has
changed since the previous commit :
MINOR: server: fix doc/trace on lb algo for dynamic server creation
The vtest regex should have been updated as well to match it.
MINOR: server: fix doc/trace on lb algo for dynamic server creation
The text mentioned that only backends with a consistent hash method were
supported for dynamic servers. In fact, it is only required that the lb
algorithm is dynamic.
CLEANUP: hlua: rename hlua_appctx* appctx to luactx
There is some serious confusion in the lua interface code related to
sockets and services coming from the hlua_appctx structs being called
"appctx" everywhere, and where the real appctx is reached using
appctx->appctx. This part is a bit of a pain to debug so let's rename
all occurrences of this local variable to "luactx".
BUG/MEDIUM: time: fix updating of global_now upon clock drift
During commit 7e4a557f6 ("MINOR: time: change the global timeval and the
the global tick at once") the approach made sure that the new now_ms was
always higher than or equal to global_now_ms, but by forgetting the old
value. This can cause the first update to global_now_ms to fail if it's
already out of sync, going back into the loop, and the subsequent call
would then succeed due to commit 4d01f3dcd ("MINOR: time: avoid
overwriting the same values of global_now").
And if it goes out of sync, it will fail to update forever, as observed
by Ashley Penney in github issue #1194, causing incorrect freq counters
calculations everywhere. One possible trigger for this issue is one thread
spinning for a few milliseconds while the other ones continue to work.
The issue really is that old_now_ms ought not to be modified in the loop
as it's used for the CAS. But we don't need to structurally guarantee that
global_now_ms grows monotonically as it's computed from the new global_now
which is already verified for this via the __tv_islt() test. Thus, dropping
any corrections on global_now_ms in the loop is the correct way to proceed
as long as this one is always updated to follow global_now.
MINOR: peers: add informative flags about resync process for debugging
This patch adds miscellaneous informative flags raised during the initial
full resync process performed during the reload, for debugging purposes.
0x00000010: Timeout waiting for a full resync from a local node
0x00000020: Timeout waiting for a full resync from a remote node
0x00000040: Session aborted learning from a local node
0x00000080: Session aborted learning from a remote node
0x00000100: A local node taught us and was fully up to date
0x00000200: A remote node taught us and was fully up to date
0x00000400: A local node taught us but was only partially up to date
0x00000800: A remote node taught us but was only partially up to date
0x00001000: A local node was assigned for a full resync
0x00002000: A remote node was assigned for a full resync
0x00004000: A resync was explicitly requested
This patch could be backported on any supported branch
BUG/MEDIUM: peers: reset tables stage flags stages on new conns
Flags used as context to know the current status of each table pushing a
full resync to a peer were correctly reset when receiving a new resync
request or confirmation message, but in the case of a local peer sync during
reload the resync request is implicit and those flags were not
correctly reset in this case.
This could result in a partial initial resync of some tables after reload
if the connection with the old process was broken and retried.
This patch resets those flags at the end of the handshake for all new
connections, to be sure to push an entire full resync if needed.
This patch should be backported on all supported branches ( v >= 1.6 )
BUG/MEDIUM: peers: re-work updates lookup during the sync on the fly
Only entries between the opposite of the last 'local update' rotating
counter value and the 'local update' itself were considered to be pushed.
This processing worked in most cases because updates are continually pushed
trying to reach this point, but there remain some cases where update ids are
farther away in the past, appearing to be in the future, and the push of
updates is stuck until the head reaches the tail again, which could take a
very long time.
This patch re-works the lookup to consider that all positions on the
rotating counter are in the past until we reach exactly the 'local update'
value. Doing this, the push of updates won't be stuck anymore.
This patch should be backported on all supported branches ( >= 1.6 )
Emeric Brun [Tue, 23 Feb 2021 16:08:08 +0000 (17:08 +0100)]
BUG/MEDIUM: peers: reset commitupdate value in new conns
The commitupdate value of the table is used to check if the update
is still pending for a push for all peers. To be sure not to miss a
push, we reset it just after a handshake success.
This patch should be backported on all supported branches ( >= 1.6 )
Emeric Brun [Tue, 23 Feb 2021 15:50:53 +0000 (16:50 +0100)]
BUG/MEDIUM: peers: reset starting point if peers appears longly disconnected
If two peers are disconnected and during this period they continue to
process a large amount of local updates, after a reconnection they
may take a long time before restarting to push their updates, because
the last pushed update would internally appear to be in the future.
This patch fixes this by resetting the cursor on acked updates to the
maximum point considered in the past if it appears to be in the future, but
it means we may lose some updates. A clean fix would be to update the
protocol to be able to signal a remote peer that it was not updated for too
long a period and needs a full resync, but this is not yet supported by the
protocol.
This patch should be backported on all supported branches ( >= 1.6 )
Emeric Brun [Thu, 4 Mar 2021 09:27:10 +0000 (10:27 +0100)]
BUG/MEDIUM: peers: stop considering ack messages teaching a full resync
The re-connection cursor was updated upon receiving any ack message,
even while pushing a complete resync to a peer. This cursor
is reset at the end of the resync, but if the connection is broken
during the resync, we could re-start at an unwanted point.
With this patch, the peer stops considering ack messages while pushing
a resync, since the resync process has its own acknowledgement and
is always restarted from the beginning in case of a broken connection.
This patch should be backported on all supported branches ( >= 1.6 )
BUG/MEDIUM: peers: register last acked value as origin receiving a resync req
When receiving a resync request, the origins used to start the full sync
and to reset after the full resync were mistakenly computed based on
the last update on the table instead of the last update acked by the node
requesting the resync.
It could result in disordered or missing updates being pushed to the
requester.
This patch sets those origins correctly.
This patch should be backported on all supported branches ( >= 1.6 )
BUG/MEDIUM: peers: initialize resync timer to get an initial full resync
If a reload is performed and there are no incoming connections
from the old process to push a full resync, the new process
can be stuck waiting indefinitely for this connection and never tries a
fallback requesting a full resync from a remote peer, because the resync
timer was initialized to TICK_ETERNITY.
This patch forces a reset of the resync timer to its default value (5
seconds) if we detect the value is TICK_ETERNITY.
This patch should be backported on all supported branches ( >= 1.6 )
MINOR: config: add a new "default-path" global directive
By default haproxy loads all files designated by a relative path from the
location the process is started in. In some circumstances it might be
desirable to force all relative paths to start from a different location
just as if the process was started from such locations. This is what this
directive is made for. Technically it will perform a temporary chdir() to
the designated location while processing each configuration file, and will
return to the original directory after processing each file. It takes an
argument indicating the policy to use when loading files whose path does
not start with a slash ('/').
A few options are offered, "current" (the default), "config" (files
relative to config file's dir), "parent" (files relative to config file's
parent dir), and "origin" with an absolute path.
CLEANUP: cfgparse: de-uglify early file error handling in readcfgfile()
In readcfgfile() when malloc() fails to allocate a buffer for the
config line, it currently says "parsing[<file>]: out of memory" while
the error is unrelated to the config file and may make one think it has
to do with the file's size. The second test (fopen() returning error)
needs to release the previously allocated line. Both directly return -1
which is not even documented as a valid error code for the function.
Let's simply make sure that the few variables freed at the end are
properly preset, and jump there upon error, after having displayed a
meaningful error message. Now at least we can get this:
$ ./haproxy -f /dev/kmem
[NOTICE] 116/191904 (23233) : haproxy version is 2.4-dev17-c3808c-13
[NOTICE] 116/191904 (23233) : path to executable is ./haproxy
[ALERT] 116/191904 (23233) : Could not open configuration file /dev/kmem : Permission denied
Alex [Tue, 27 Apr 2021 10:57:07 +0000 (12:57 +0200)]
DOC: general: fix example in set-timeout
The alternative arguments are always in curly brackets, let's fix it for
set-timeout.
The example in set-timeout does not include one of the required arguments.
This commit makes the PR https://github.com/cbonte/haproxy-dconv/pull/34
obsolete.
CLEANUP: channel: No longer notify the producer in co_skip()/co_htx_skip()
Thanks to the commit "BUG/MINOR: applet: Notify the other side if data were
consumed by an applet", it is no longer necessary to notify the producer when an
applet skips output data. Now, it is the default applet handler's responsibility
to take care of that.
BUG/MEDIUM: mux-h2: Handle EOM flag when sending a DATA frame with zero-copy
When a DATA frame is sent, we must take care to properly detect the EOM flag
on the HTX message to set the ES flag on the frame when necessary, to finish
the stream. But it is only done when data are copied from the HTX message to
the mux buffer and not when the frame is sent via zero-copy. This patch
fixes this bug.
BUG/MINOR: hlua: Don't consume headers when starting an HTTP lua service
When an HTTP lua service is started, headers are consumed before calling the
script. When it was initialized, the headers were stored in a lua array,
thus they can be removed from the HTX message because the lua service will
no longer access them. But it is a problem with bodyless messages because
the EOM flag is lost. Indeed, once the headers are consumed, the message is
empty and the buffer is reset, including the flags.
Now, the headers are not immediately consumed. We will skip them when
applet:receive() or applet:getline() is called. This way, the EOM flag is
preserved.
At the end, when the script is finished, all output data are consumed, thus
this remains safe.
BUG/MINOR: applet: Notify the other side if data were consumed by an applet
If an applet consumed output data (the amount of output data has changed
between before and after the call to the applet), the producer is
notified. It means CF_WRITE_PARTIAL and CF_WROTE_DATA are set on the output
channel and the opposite stream interface is notified some room was made in
its input buffer. This way, it is no longer the applet's responsibility to
take care of it. However, it doesn't matter if the applet does the same.
Said like that, it looks like an improvement not a bug. But it really fixes
a bug in the lua, for HTTP applets. Indeed, applet:receive() and
applet:getline() are buggy for HTTP applets. Data are consumed but the
producer is not notified. It means that if the payload is not fully received
in one go, the applet may be blocked because the producer remains blocked (it
is time dependent).
This patch must be backported as far as 2.0 (only for the HTX part).
MEDIUM: http-ana: handle read error on server side if waiting for response
A read error on the server side is also reported as a write error on the
client side. It means that sometimes a server-side error is handled on the
client side. Among others, it is the case when the client side is waiting
for the response while the request processing is already finished. In this
case, the error is not handled as a server error, which is not accurate.
So now, when the request processing is finished but not the response
processing, and if a read error was encountered on the server side, the
error is not immediately processed on the client side, to give the response
analysers a chance to properly catch the error.
BUG/MINOR: mux-h2: Don't encroach on the reserve when decoding headers
Since the input buffer is transferred to the stream when it is created,
there is no longer control on the request size to be sure the buffer's
reserve is still respected. It was automatically performed in h2_rcv_buf()
because the caller took care to provide the correct available space in the
buffer. The control is still there but it is no longer applied on the
request headers. Now, we should take care of the reserve when the headers
are decoded, before the stream creation.
The test is performed for the request and the response.
MEDIUM: htx: Refactor htx_xfer_blks() to not rely on hdrs_bytes field
It is the only function using the hdrs_bytes start-line field. Thus the
function has been refactored to no longer rely on it. To do so, we first
copy HTX blocks to the destination message, without removing them from the
source message. If the copy is interrupted on headers or trailers, we roll
back. Otherwise, data are drained from the source buffer.
Most of the time, the copy will succeed. So the rollback is only performed
in the worst but very rare case.
BUG/MINOR: htx: Preserve HTX flags when draining data from an HTX message
When all data of an HTX message are drained, we rely on htx_reset() to
reinit the message state. However, the flags must be preserved. It is, among
other things, important to preserve processing or parsing errors.
BUG/MEDIUM: cpuset: fix build on MacOS
The new global variable cpu_map conflicted with a local variable of the
same name in the code path for the Apple platform when setting the
process affinity.
BUG/MAJOR: fix build on musl with cpu_set_t support
Move the cpu_map structure outside of the global struct to a global
variable defined in the cpuset.c compilation unit. This allows reorganizing
the includes without having to define _GNU_SOURCE everywhere for the
support of cpu_set_t.
This fixes the compilation with musl libc, most notably used for the
Alpine-based docker image.
This fixes github issue #1235.
No need to backport as this feature is new in the current
2.4-dev.
BUG/MINOR: ssl: ssl_sock_prepare_ssl_ctx does not return an error code
The return value check was wrongly based on error codes when the
function actually returns an error number.
This bug was introduced by f3eedfe19592ebcbaa5b97d8c68aa162e7f6f8fa,
which introduces a feature not present before the 2.4 branch.
REORG: htx: Inline htx functions to add HTX blocks in a message
The HTX functions used to add new HTX blocks in a message have been moved to
the header file to inline them in calling functions. These functions are
small enough.
BUG/MINOR: mux-fcgi: Don't send normalized uri to FCGI application
A normalized URI is the internal term used to specify that a URI is stored
using the absolute format (scheme + authority + path). For now, it is only
used for H2 clients. It is the default and recommended format for H2
requests. However, it is unusual for H1 servers to receive such URIs. So in
this case, we only send the path of the absolute URI. It is performed for
H1 servers, but not for FCGI applications. This patch fixes the difference.
Note that it is not a real bug, because FCGI applications should support
absolute URIs.
Note also that a normalized URI is only detected for H2 clients when a
request is received. There is no such test on the H1 side. It means an
absolute URI received from an H1 client will be sent without modification
to an H1 server or a FCGI application.
To make it possible, a dedicated function has been added to get the H1
URI. This function is called by the H1 and the FCGI multiplexer when a
request is sent to a server.
This patch should fix the issue #1232. It must be backported as far as 2.2.
Released version 2.4-dev17 with the following main changes :
- MINOR: mux-pt/trace: Register a new trace source with its events
- BUG/MINOR: mux-pt: Fix a possible UAF because of traces in mux_pt_io_cb
- CI: travis: Drastically clean up .travis.yml
- CLEANUP: pattern: make all pattern tables read-only
- MINOR: trace: replace the trace() inline function with an equivalent macro
- MINOR: initcall: uniformize the section names between MacOS and other unixes
- CLEANUP: initcall: rename HA_SECTION to HA_INIT_SECTION
- MINOR: compiler: add macros to declare section names
- CLEANUP: initcall: rely on HA_SECTION_* instead of defining its own
- MINOR: global: declare a read_mostly section
- MINOR: fd: move a few read-mostly variables to their own section
- MINOR: epoll: move epoll_fd to read_mostly
- MINOR: kqueue: move kqueue_fd to read_mostly
- MINOR: pool: move pool declarations to read_mostly
- MINOR: threads: mark all_threads_mask as read_mostly
- MINOR: server: move idle_conn_task to read_mostly
- MINOR: protocol: move __protocol_by_family to read_mostly
- MINOR: pattern: make the pat_lru_seed read_mostly
- MINOR: trace: make trace sources read_mostly
- MINOR: freq_ctr: add a generic function to report the total value
- MEDIUM: freq_ctr: make read_freq_ctr_period() use freq_ctr_total()
- MEDIUM: freq_ctr: reimplement freq_ctr_remain_period() from freq_ctr_total()
- MINOR: freq_ctr: add the missing next_event_delay_period()
- MINOR: freq_ctr: unify freq_ctr and freq_ctr_period into freq_ctr
- MEDIUM: freq_ctr: replace the per-second counters with the generic ones
- MINOR: freq_ctr: add cpu_relax in the rotation loop of update_freq_ctr_period()
- MINOR: freq_ctr: simplify and improve the update function
- CLEANUP: time: remove the now unused ms_left_scaled
- MINOR: time: move the time initialization out of tv_update_date()
- MINOR: time: remove useless variable copies in tv_update_date()
- MINOR: time: change the global timeval and the the global tick at once
- MEDIUM: time: make the clock offset global and no per-thread
- MINOR: atomic: reimplement the relaxed version of x86 BTS/BTR
- MINOR: trace: Add the checks as a possible trace source
- MINOR: checks/trace: Register a new trace source with its events
- MINOR: hlua: Add function to release a lua function
- BUG/MINOR: hlua: Fix memory leaks on error path when registering a task
- BUG/MINOR: hlua: Fix memory leaks on error path when registering a converter
- BUG/MINOR: hlua: Fix memory leaks on error path when registering a fetch
- BUG/MINOR: hlua: Fix memory leaks on error path when parsing a lua action
- BUG/MINOR: hlua: Fix memory leaks on error path when registering an action
- BUG/MINOR: hlua: Fix memory leaks on error path when registering a service
- BUG/MINOR: hlua: Fix memory leaks on error path when registering a cli keyword
- BUG/MINOR: cfgparse/proxy: Fix some leaks during proxy section parsing
- BUG/MINOR: listener: Handle allocation error when allocating a new bind_conf
- BUG/MINOR: cfgparse/proxy: Hande allocation errors during proxy section parsing
- MINOR: cfgparse/proxy: Group alloc error handling during proxy section parsing
- DOC: internals: update the SSL architecture schema
- BUG/MEDIUM: sample: Fix adjusting size in field converter
- MINOR: sample: add ub64dec and ub64enc converters
- CLEANUP: sample: align samples list in sample.c
- MINOR: ist: Add `istclear(struct ist*)`
- CI: cirrus: install "pcre" package
- MINOR: opentracing: correct calculation of the number of arguments in the args[]
- MINOR: opentracing: transfer of context names without prefix
- MINOR: sample: converter: Add mjson library.
- MINOR: sample: converter: Add json_query converter
- CI: travis-ci: enable weekly graviton2 builds
- DOC: ssl: Certificate hot update only works on fronted certificates
- DOC: ssl: Certificate hot update works on server certificates
- BUG/MEDIUM: threads: Ignore current thread to end its harmless period
- MINOR: threads: Only consider running threads to end a thread harmeless period
- BUG/MINOR: checks: Set missing id to the dummy checks frontend
- MINOR: logs: Add support of checks as session origin to format lf strings
- BUG/MINOR: connection: Fix fc_http_major and bc_http_major for TCP connections
- MINOR: connection: Make bc_http_major compatible with tcp-checks
- BUG/MINOR: ssl-samples: Fix ssl_bc_* samples when called from a health-check
- BUG/MINOR: http-fetch: Make method smp safe if headers were already forwarded
- MINOR: tcp_samples: Add samples to get src/dst info of the backend connection
- MINOR: tcp_samples: Be able to call bc_src/bc_dst from the health-checks
- BUG/MINOR: http_htx: Remove BUG_ON() from http_get_stline() function
- BUG/MINOR: logs: Report the true number of retries if there was no connection
- BUILD: makefile: Redirect stderr to /dev/null when probing options
- MINOR: uri_normalizer: Add uri_normalizer module
- MINOR: uri_normalizer: Add `enum uri_normalizer_err`
- MINOR: uri_normalizer: Add `http-request normalize-uri`
- MINOR: uri_normalizer: Add a `merge-slashes` normalizer to http-request normalize-uri
- MINOR: uri_normalizer: Add a `dotdot` normalizer to http-request normalize-uri
- MINOR: uri_normalizer: Add support for suppressing leading `../` for dotdot normalizer
- MINOR: uri_normalizer: Add a `sort-query` normalizer
- MINOR: uri_normalizer: Add a `percent-upper` normalizer
- MEDIUM: http_act: Rename uri-normalizers
- DOC: Add introduction to http-request normalize-uri
- DOC: Note that URI normalization is experimental
- BUG/MINOR: pools: maintain consistent ->allocated count on alloc failures
- BUG/MINOR: pools/buffers: make sure to always reserve the required buffers
- MINOR: pools: drop the unused static history of artificially failed allocs
- CLEANUP: pools: remove unused arguments to pool_evict_from_cache()
- MEDIUM: pools: move the cache into the pool header
- MINOR: pool: remove the size field from pool_cache_head
- MINOR: pools: rename CONFIG_HAP_LOCAL_POOLS to CONFIG_HAP_POOLS
- MINOR: pools: enable the fault injector in all allocation modes
- MINOR: pools: make the basic pool_refill_alloc()/pool_free() update needed_avg
- MEDIUM: pools: unify pool_refill_alloc() across all models
- CLEANUP: pools: re-merge pool_refill_alloc() and __pool_refill_alloc()
- MINOR: pools: call pool_alloc_nocache() out of the pool's lock
- CLEANUP: pools: move the lock to the only __pool_get_first() that needs it
- CLEANUP: pools: rename __pool_get_first() to pool_get_from_shared_cache()
- CLEANUP: pools: rename pool_*_{from,to}_cache() to *_local_cache()
- CLEANUP: pools: rename __pool_free() to pool_put_to_shared_cache()
- MINOR: tools: add statistical_prng_range() to get a random number over a range
- MINOR: pools: use cheaper randoms for fault injections
- MINOR: pools: move the fault injector to __pool_alloc()
- MINOR: pools: split the OS-based allocator in two
- MINOR: pools: always use atomic ops to maintain counters
- MINOR: pools: move pool_free_area() out of the lock in the locked version
- MINOR: pools: factor the release code into pool_put_to_os()
- MEDIUM: pools: make CONFIG_HAP_POOLS control both local and shared pools
- MINOR: pools: create unified pool_{get_from,put_to}_cache()
- MINOR: pools: evict excess objects using pool_evict_from_local_cache()
- MEDIUM: pools: make pool_put_to_cache() always call pool_put_to_local_cache()
- CLEANUP: pools: make the local cache allocator fall back to the shared cache
- CLEANUP: pools: merge pool_{get_from,put_to}_local_caches with generic ones
- CLEANUP: pools: uninline pool_put_to_cache()
- CLEANUP: pools: declare dummy pool functions to remove some ifdefs
- BUILD: pools: fix build with DEBUG_FAIL_ALLOC
- BUG/MINOR: server: make srv_alloc_lb() allocate lb_nodes for consistent hash
- CONTRIB: mod_defender: import the minimal number of includes
- CONTRIB: mod_defender: make the code build with the embedded includes
- CONTRIB: modsecurity: import the minimal number of includes
- CONTRIB: modsecurity: make the code build with the embedded includes
- CLEANUP: sample: Improve local variables in sample_conv_json_query
- CLEANUP: sample: Explicitly handle all possible enum values from mjson
- CLEANUP: sample: Use explicit return for successful `json_query`s
- CLEANUP: lists/tree-wide: rename some list operations to avoid some confusion
- CONTRIB: move spoa_example out of the tree
- BUG/MINOR: server: free srv.lb_nodes in free_server
- BUG/MINOR: logs: free logsrv.conf.file on exit
- BUG/MEDIUM: server: ensure thread-safety of server runtime creation
- MINOR: server: add log on dynamic server creation
- MINOR: server: implement delete server cli command
- CONTRIB: move spoa_server out of the tree
- CONTRIB: move modsecurity out of the tree
- BUG/MINOR: server: fix potential null gcc error in delete server
- BUG/MAJOR: mux-h2: Properly detect too large frames when decoding headers
- BUG/MEDIUM: mux-h2: Fix dfl calculation when merging CONTINUATION frames
- BUG/MINOR: uri_normalizer: Use delim parameter when building the sorted query in uri_normalizer_query_sort
- CLEANUP: uri_normalizer: Remove trailing whitespace
- MINOR: uri_normalizer: Add a `strip-dot` normalizer
- CONTRIB: move mod_defender out of the tree
- CLEANUP: contrib: remove the last references to the now dead contrib/ directory
- BUG/MEDIUM: config: fix cpu-map notation with both process and threads
- MINOR: config: add a diag for invalid cpu-map statement
- BUG/MINOR: mworker/init: don't reset nb_oldpids in non-mworker cases
- BUG/MINOR: mworker: don't use oldpids[] anymore for reload
- BUILD: makefile: fix the "make clean" target on strict bourne shells
- IMPORT: slz: import slz into the tree
- BUILD: compression: switch SLZ from out-of-tree to in-tree
- CI: github: do not build libslz any more
- CLEANUP: compression: remove calls to SLZ init functions
- BUG/MEDIUM: mux-h2: Properly handle shutdowns when received with data
- MINOR: cpuset: define a platform-independent cpuset type
- MINOR: cfgparse: use hap_cpuset for parse_cpu_set
- MEDIUM: config: use platform independent type hap_cpuset for cpu-map
- MINOR: thread: implement the detection of forced cpu affinity
- MINOR: cfgparse: support the comma separator on parse_cpu_set
- MEDIUM: cfgparse: detect numa and set affinity if needed
- MINOR: global: add option to disable numa detection
- BUG/MINOR: haproxy: fix compilation on macOS
- BUG/MINOR: cpuset: fix compilation on platform without cpu affinity
- MINOR: time: avoid unneeded updates to now_offset
- MINOR: time: avoid overwriting the same values of global_now
- CLEANUP: time: use __tv_to_ms() in tv_update_date() instead of open-coding
- MINOR: time: avoid u64 needlessly expensive computations for the 32-bit now_ms
- BUG/MINOR: peers: remove useless table check if initial resync is finished
- BUG/MEDIUM: peers: re-work connection to new process during reload.
- BUG/MEDIUM: peers: re-work refcnt on table to protect against flush
- BUG/MEDIUM: config: fix missing initialization in numa_detect_topology()
BUG/MEDIUM: config: fix missing initialization in numa_detect_topology()
The error path of the NUMA topology detection introduced in commit b56a7c89a ("MEDIUM: cfgparse: detect numa and set affinity if needed")
lacks an initialization resulting in possible crashes at boot. No
backport is needed since that was introduced in 2.4-dev.
BUG/MEDIUM: peers: re-work refcnt on table to protect against flush
In proxy.c, when the process is stopping, we try to flush table contents
using 'stktable_trash_oldest'. A check on a counter "table->syncing" was
made to verify that there is no pending resync in progress.
But with multiple threads this counter can be increased by another thread
only after some delay, so the content of some tables can be trashed earlier
and won't be pushed to the new process (after reload, some tables appear
reset and others don't).
This patch renames the counter "table->syncing" to "table->refcnt"; the
counter is increased during configuration parsing (registering a table to
a peer section) to protect tables during runtime and until the resync of a
new process has succeeded or failed.
The inc/dec operations are now made using atomic operations
because multiple peer sections could refer to the same table in the future.
This fix addresses github #1216.
This patch should be backported on all branches with multi-thread support
(v >= 1.8)
BUG/MEDIUM: peers: re-work connection to new process during reload.
The peers task handling the "stopping" could wake up multiple
times in stopping state with WOKEN_SIGNAL: the connection to the
local peer initiated on the first processing was immediately
shut down by the next processing of the task, and the old process
exits considering it is unable to connect. It results in
empty stick-tables after a reload.
This patch checks the flag 'PEERS_F_DONOTSTOP' to know if the
signal is considered and if remote peers connections shutdown
is already done or if a connection to the local peer must be
established.
This patch should be backported on all supported branches (v >= 1.6)
BUG/MINOR: peers: remove useless table check if initial resync is finished
The old process checked each table's resync status even if
the resync process was finished. This behavior had no known impact
except useless processing and was discovered while debugging
another issue.
This patch could be backported to all supported branches (v >= 1.6)
but once again, it has no impact except avoiding useless processing.
MINOR: time: avoid u64 needlessly expensive computations for the 32-bit now_ms
The compiler cannot guess that tv_sec or tv_usec might have unused
parts, so the multiply by 1000 and the divide by 1000 are both
performed using 64-bit constants to stick to the common type. This is
not needed since we only keep the final 32 bits, let's help the compiler
here by casting these fields to uint. The tv_update_date() code is much
cleaner (48 bytes smaller in the CAS loop) as it avoids some register
spilling at a location where that's really unwanted.
MINOR: time: avoid overwriting the same values of global_now
In tv_update_date(), we calculate the new global date based on the local
one. It's very likely that other threads will end up with the exact same
now_ms date (at 1 million wakeups/s it happens 99.9% of the time), and
even the microsecond was measured to remain unchanged ~70% of the time
with 16 threads, simply because sometimes another thread already updated
a more recent version of it.
In such cases, performing a CAS to the global variable requires a cache
line flush which brings nothing. By checking if they're changed before
writing, we can divide by about 6 the number of writes to the global
variables, hence the overall contention.
In addition, it's worth noting that all threads will want to update at
the same time, so let's place a cpu relax call before trying again, this
will spread attempts apart.
The time adjustment is very rare, even at high polling rates. Tests show
that only 0.2% of tv_update_date() calls require a change of offset. Such
concurrent writes to a shared variable have an important impact on future
loads, so let's only update the variable if it changed.
BUG/MINOR: cpuset: fix compilation on platform without cpu affinity
The compilation is currently broken on platform without USE_CPU_AFFINITY
set. An error has been reported by the cygwin build of the CI.
This does not need to be backported.
In file included from include/haproxy/global-t.h:27,
from include/haproxy/global.h:26,
from include/haproxy/fd.h:33,
from src/ev_poll.c:22:
include/haproxy/cpuset-t.h:32:3: error: #error "No cpuset support implemented on this platform"
32 | # error "No cpuset support implemented on this platform"
| ^~~~~
include/haproxy/cpuset-t.h:37:2: error: unknown type name ‘CPUSET_REPR’
37 | CPUSET_REPR cpuset;
| ^~~~~~~~~~~
make: *** [Makefile:944: src/ev_poll.o] Error 1
make: *** Waiting for unfinished jobs....
In file included from include/haproxy/global-t.h:27,
from include/haproxy/global.h:26,
from include/haproxy/fd.h:33,
from include/haproxy/connection.h:30,
from include/haproxy/ssl_sock.h:27,
from src/ssl_sample.c:30:
include/haproxy/cpuset-t.h:32:3: error: #error "No cpuset support implemented on this platform"
32 | # error "No cpuset support implemented on this platform"
| ^~~~~
include/haproxy/cpuset-t.h:37:2: error: unknown type name ‘CPUSET_REPR’
37 | CPUSET_REPR cpuset;
| ^~~~~~~~~~~
make: *** [Makefile:944: src/ssl_sample.o] Error 1
Fix the warning treated as error on the CI for the macOS compilation :
"src/haproxy.c:2939:23: error: unused variable 'set'
[-Werror,-Wunused-variable]"
Amaury Denoyelle [Fri, 26 Mar 2021 17:50:33 +0000 (18:50 +0100)]
MINOR: global: add option to disable numa detection
Render numa detection optional with a global configuration statement
'no numa-cpu-mapping'. This can be used if the applied affinity of the
algorithm is not optimal. Also complete the documentation with this new
keyword.
Amaury Denoyelle [Fri, 26 Mar 2021 17:20:47 +0000 (18:20 +0100)]
MEDIUM: cfgparse: detect numa and set affinity if needed
On process startup, the CPU topology of the machine is inspected. If a
multi-socket CPU machine is detected, automatically define the process
affinity on the first node with active cpus. This is done to prevent an
impact on the overall performance of the process in case the topology of
the machine is unknown to the user.
This step is not executed in the following conditions :
- a non-null nbthread statement is present
- a restrictive 'cpu-map' statement is present
- the process affinity is already restricted, for example via a taskset
call
For the record, benchmarks were executed on a machine with 2 CPUs
Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz. In both clear and SSL
scenarios, the performance was sub-optimal without the automatic
rebinding on a single node.
MINOR: cfgparse: support the comma separator on parse_cpu_set
Allow specifying multiple cpu ids/ranges in parse_cpu_set, separated by a
comma. This is optional and must be activated by a parameter.
The comma support is disabled for the parsing of the 'cpu-map' config
statement. However, it will be useful to parse files in sysfs when
inspecting the cpu topology for NUMA automatic process binding.
Amaury Denoyelle [Wed, 31 Mar 2021 14:57:39 +0000 (16:57 +0200)]
MINOR: thread: implement the detection of forced cpu affinity
Create a function thread_cpu_mask_forced. Its purpose is to report if a
restrictive cpu mask is active for the current process, for example due
to a taskset invocation. It is only implemented for the linux platform
currently.
MINOR: cfgparse: use hap_cpuset for parse_cpu_set
Replace the unsigned long parameter by a hap_cpuset. This allows addressing
CPUs with an index greater than LONGBITS.
This function is used to parse the 'cpu-map' statement. However at the
moment, the result is cast back to a long to store it in the global
structure. The next step is to replace the ulong fields of cpu_map in the
global structure with hap_cpuset.
MINOR: cpuset: define a platform-independent cpuset type
This module can be used to manipulate cpu sets in a platform-agnostic
way. Use the type cpu_set_t/cpuset_t if available on the platform, or
fall back to unsigned long, which de facto limits the maximum cpu index
to LONGBITS.
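A hedged sketch of the abstraction; the struct mirrors what the cpuset-t.h
build errors quoted earlier show, while the selection of CPUSET_REPR here is
a simplified assumption:

  #define _GNU_SOURCE
  #if defined(USE_CPU_AFFINITY) && defined(__linux__)
  #  include <sched.h>
  #  define CPUSET_REPR cpu_set_t     /* native Linux representation */
  #else
  #  define CPUSET_REPR unsigned long /* fallback: at most LONGBITS cpus */
  #endif

  struct hap_cpuset {
      CPUSET_REPR cpuset;
  };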
BUG/MEDIUM: mux-h2: Properly handle shutdowns when received with data
The H2_CF_RCVD_SHUT flag is used to report that a read0 was encountered. It
is used by the H2 mux to properly handle shutdowns. However, this flag is
only set when no data are received. If the shutdown is detected at the
socket level while some data are received, it is not handled. And because
the event was reported on the connection, any other read attempts are
blocked. In this case, we are unable to close the connection and release
the mux immediately. We must wait for the mux timeout to expire.
This patch should fix the issue #1231. It must be backported as far as 2.0.
CLEANUP: compression: remove calls to SLZ init functions
As we now embed the library we don't need to support the older 1.0 API
any more, so we can remove the explicit calls to slz_make_crc_table()
and slz_prepare_dist_table().