Maria Matejka [Sun, 15 Dec 2024 20:04:22 +0000 (21:04 +0100)]
Table prune inhibited during reconfiguration
When many changes are done during reconfiguration, the table may
start pruning old routes before everything is settled down, slowing
down not only the reconfiguration, but also the shutdown process.
Maria Matejka [Sat, 14 Dec 2024 22:21:07 +0000 (23:21 +0100)]
Disable multiple malloc arenas
In our usecase, these are impossibly greedy because we often
free memory in a different thread than where we allocate, forcing
the default allocator to scatter the used memory all over the place.
There was a suspicion that maybe the BIRD 3 version of ROA gets the
digesting wrong. This test covers the nastiest cornercases we could
think about, so now we can expect it to be right.
We have quite large critical sections and we need to allocate inside
them. This is something to revise properly later on, yet for now,
instead of slowly but surely growing the virtual memory address space,
it's better to optimize the cold page cache pickup and count situations
where this happened inside the critical section.
Lockfree journal: Cleanup hook runs only when needed.
The function lfjour_cleanup_hook() was scheduled each time any of the
journal recipients reached end of a block of journal items or read all
of journal items. Because lfjour_cleanup_hook() can clean only journal
items every recipient has processed, it was often called uselessly.
This commit restricts most of the unuseful scheduling. Only some
recipients are given a token alowing them to try to schedule the
cleanup hook. When a recipient wants to schedule the cleanup hook, it
checks whether it has a token. If yes, it decrements number of tokens
the journal has given (issued_tokens) and discards its own token. If
issued_tokens reaches zero, the recipient is allowed to schedule the
cleanup hook.
There is a maximum number of tokens a journal can give to its recipients
(max_tokens). A new recipient is given a token in its init, unless the
maximum number of tokens is reached. The rest of tokens is given to
customers in lfjour_cleanup_hook().
In the cleanup hook, the issued_tokens number is increased in order to
avoid calling the hook before it finishes. Then, tokens are given to the
slowest recipients (but never to more than max_token recipients). Before
leaving lfjour_cleanup_hook(), the issued_tokens number is decreased back.
If no other tokens are given, we have to make sure the
lfjour_cleanup_hook will be called again. If every item in journal was
read by every recipient, tokens are given to random recipients. If all
recipients with tokens managed to finish until now, we give the token to
the first unfinished customer we find, or we just call the hook again.
Ondrej Zajicek [Tue, 3 Dec 2024 00:19:44 +0000 (01:19 +0100)]
RPKI: Fix several errors in handling of Error PDU
Fix several errors including:
- Unaligned memory access to 'Length of Error Text' field
- No validation of 'Length of Encapsulated PDU' field
- No validation of 'Error Code' field
- No validation of characters in diagnostic message
Maria Matejka [Thu, 14 Nov 2024 19:43:35 +0000 (20:43 +0100)]
CLI: Dumping internal data structures to files, not to debug output
All the 'dump something' CLI commands now have a new mandatory
argument -- name of the file where to dump the data. This allows
for more flexible dumping even for production deployments where
the debug output is by default off.
Also the dump commands are now restricted (they weren't before)
to assure that only the appropriate users can run these time consuming
commands.
Maria Matejka [Thu, 14 Nov 2024 22:34:28 +0000 (23:34 +0100)]
Printf: impossible buffer overflow fix
When printing near the end of the buffer, there was an overflow in two cases:
(1) %c and size is zero
(2) %1N, %1I, %1I4, %1I6 (auto-fill field_width for Net or IP), size is
more than actual length of the net/ip but less than the auto-filled
field width.
Manual code examination showed that nothing could have ever triggered
this behavior. All older versions of BIRD, including BIRD 3 development
versions, are totally safe. This exact overflow has been found while
implementing a new feature in later commits.
Ondrej Zajicek [Sun, 1 Dec 2024 23:00:36 +0000 (00:00 +0100)]
BMP: Refactor route monitor message serialization
Instead of several levels of functions, just have two functions
(one for routes, the other for end-of-rib), this allows to create
messages in a simple linear fashion.
Also reduce three duplicite functions to construct BGP header for
BMP messages to just one.
Maria Matejka [Tue, 17 Sep 2024 14:27:54 +0000 (16:27 +0200)]
BMP: simplified update queuing and better memory performance
This commit is quite a substantial rework of the underlying layers in
BMP TX:
- several unnecessary layers of indirection dropped, including most of
the original BMP's buffer machinery
- all messages are now written directly into one protocol's buffer
allocated for the whole time big enough to fit every possible message
- output blocks are allocated by pages and immediately returned when
used, improving the overall memory footprint
- no intermediary allocation is done from the heap altogether
- there is a documented and configurable limit on the TX queue size
Maria Matejka [Thu, 28 Nov 2024 13:10:40 +0000 (14:10 +0100)]
RPKI: refactored pdu to host byte order conversion
We shouldn't convert bytes 2 and 3 of the PDU blindly, there are several
cases where these are used by bytes. Instead, the conversion is done
only where needed.
This fixes misinterpretation bug of ASPA PDU flags on little endian
architectures.