Maria Matejka [Thu, 24 Aug 2023 15:00:54 +0000 (17:00 +0200)]
Logging: fixed size logfiles behaving as mmapped ringbuffers
This variant of logging avoids calling write() for every log line,
allowing for waitless logging. This makes heavy logging less heavy
and more useful for race condition debugging.
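A minimal sketch of the idea, not the actual BIRD code: a fixed-size file is mapped into memory and each writer reserves its span with a single atomic add, so no write() call is needed per line. All names here (ring_log, ring_log_open, ring_log_write, ring_log_close) are hypothetical, and the sketch assumes a POSIX system and messages no longer than the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical structure; not BIRD's actual logging channel. */
struct ring_log {
  char *buf;          /* mmapped contents of the log file */
  size_t size;        /* fixed file size in bytes */
  atomic_size_t pos;  /* total number of bytes reserved so far */
};

/* Map a fixed-size file. Returns 0 on success, -1 on failure. */
static int ring_log_open(struct ring_log *rl, const char *path, size_t size)
{
  int fd = open(path, O_RDWR | O_CREAT, 0644);
  if (fd < 0)
    return -1;

  if (ftruncate(fd, (off_t) size) < 0) {
    close(fd);
    return -1;
  }

  void *m = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);  /* the mapping stays valid after the descriptor is closed */
  if (m == MAP_FAILED)
    return -1;

  rl->buf = m;
  rl->size = size;
  atomic_init(&rl->pos, 0);
  return 0;
}

/* Reserve space with one atomic add, then copy the message in place. */
static void ring_log_write(struct ring_log *rl, const char *msg, size_t len)
{
  assert(len <= rl->size);
  size_t start = atomic_fetch_add(&rl->pos, len) % rl->size;
  size_t first = rl->size - start;  /* bytes that fit before the wrap */
  if (first > len)
    first = len;
  memcpy(rl->buf + start, msg, first);
  memcpy(rl->buf, msg + first, len - first);  /* wrapped tail, may be empty */
}

static void ring_log_close(struct ring_log *rl)
{
  munmap(rl->buf, rl->size);
}
```

In this sketch the oldest data is silently overwritten once the file wraps, and a slow writer can be overtaken by a faster one if the buffer wraps completely in between; a real implementation has to decide how to handle both.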
Maria Matejka [Mon, 21 Aug 2023 16:44:10 +0000 (18:44 +0200)]
Logging now doesn't lock with each message
The original logging routines were locking a common mutex. This led to
massive underperformance and unwanted serialization when heavily logging
due to lock contention. Now the logging is lockless, though still
serializing on write() syscalls to the same file descriptor.
This change also brings in persistent logging channel structures and
thus avoids writing into active configuration data structures during
regular run.
Maria Matejka [Mon, 29 May 2023 17:32:26 +0000 (19:32 +0200)]
BFD: Fixed reconfiguration issues
After converting BFD to the new IO loop system, reconfiguration never
really worked. Sadly, we missed this case in our testing suite so it
passed under the radar for quite a while.
Thanks to Andrei Dinu <andrei.dinu@digitalit.ro> for reporting and
isolating this issue.
Maria Matejka [Sat, 13 May 2023 18:33:35 +0000 (20:33 +0200)]
Fixed abort when running in foreground but stdin is closed
A forgotten else-clause caused BIRD to treat some pseudo-random place in
memory as fd-pair. This was happening only on startup of the first
thread in group and the value there in memory was typically zero ... and
writing to stdin succeeded.
When running BIRD with stdin not present (like systemd does), it died on
this spurious write. Now it seems to work correctly.
Thanks to Daniel Suchy <danny@danysek.cz> for reporting.
Maria Matejka [Mon, 8 May 2023 11:09:02 +0000 (13:09 +0200)]
Properly protecting the route src global index by RCU read lock and atomic operations
There was a bug occurring when one thread looked up a src by its global ID
and another one was allocating another src with such an ID that it caused
route src global index reallocation. This brief moment of inconsistency
led to a rare use-after-free of the old global index block.
Maria Matejka [Fri, 5 May 2023 07:39:13 +0000 (09:39 +0200)]
Fixed a bug in hot page global storage
The original algorithm was suffering from an ABA race condition:
A: fp = page_stack
B: completely allocates the same page and writes into it some data
A: unsuspecting, loads (invalid) next = fp->next
B: finishes working with the page and returns it back to page_stack
A: compare-exchange page_stack: fp => next succeeds and writes garbage
to page_stack
Fixed this by using an implicit spinlock in hot page allocator.
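The race above disappears once the read of fp->next and the update of page_stack happen under mutual exclusion. The following is only a sketch under assumptions: it uses an explicit C11 atomic_flag spinlock rather than whatever implicit mechanism the commit uses, and the names (page, page_stack, ps_push, ps_pop) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical free page header; the link lives inside the page itself. */
struct page {
  struct page *next;
};

struct page_stack {
  atomic_flag lock;  /* spinlock guarding top */
  struct page *top;
};

static void ps_lock(struct page_stack *s)
{
  /* Spin until the flag was previously clear. */
  while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
    ;
}

static void ps_unlock(struct page_stack *s)
{
  atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

static void ps_push(struct page_stack *s, struct page *p)
{
  ps_lock(s);
  p->next = s->top;
  s->top = p;
  ps_unlock(s);
}

static struct page *ps_pop(struct page_stack *s)
{
  ps_lock(s);
  struct page *p = s->top;
  if (p)
    s->top = p->next;  /* safe: nobody else can pop and reuse p right now */
  ps_unlock(s);
  return p;
}
```

Because the page cannot be handed out to another thread between loading the top pointer and reading its next field, the stale-next scenario from the listing cannot occur in this sketch.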
Maria Matejka [Sun, 30 Apr 2023 20:17:42 +0000 (22:17 +0200)]
First try of loop balancing
If a thread encounters timeout == 0 for poll, it considers itself
"busy" and with some hysteresis it tries to drop loops for others to
pick and thus better distribute work between threads.
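One way to picture the hysteresis is a saturating counter that rises on every zero-timeout poll and falls otherwise. This is an illustrative sketch only; the threshold, the structure and the function name are assumptions, not taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>

#define BUSY_THRESHOLD 4  /* assumed value, chosen for illustration */

/* Hypothetical per-thread bookkeeping. */
struct thread_state {
  int busy_score;  /* saturating counter in 0..BUSY_THRESHOLD */
  int loops;       /* number of loops currently owned by the thread */
};

/* Call once per poll round; returns true if the thread should drop a loop. */
static bool thread_should_drop_loop(struct thread_state *t, int poll_timeout_ms)
{
  assert(t->loops >= 0);

  if (poll_timeout_ms == 0) {
    /* No time to sleep: the thread looks busy. */
    if (t->busy_score < BUSY_THRESHOLD)
      t->busy_score++;
  } else if (t->busy_score > 0) {
    /* The thread could sleep: let the score decay. */
    t->busy_score--;
  }

  /* Never drop the last loop; drop only after sustained busyness. */
  return t->busy_score >= BUSY_THRESHOLD && t->loops > 1;
}
```

A single busy round therefore does not trigger a drop, and a single idle round right after a busy streak stops further drops until the score climbs back.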
Maria Matejka [Fri, 21 Apr 2023 13:26:06 +0000 (15:26 +0200)]
Resource pools are now bound with domains.
Memory allocation is a fragile part of BIRD and we need to check that
everybody is using the resource pools in an appropriate way. To assure
this, all the resource pools are associated with locking domains and
every resource manipulation is thoroughly checked whether the
appropriate locking domain is locked.
With transitive resource manipulation like resource dumping or mass free
operations, domains are locked and unlocked on the go, thus we require
pool domains to have higher order than their parents to allow for these
transitive operations.
Adding pool locking revealed some cases of insecure memory manipulation
and this commit fixes that as well.
Maria Matejka [Thu, 20 Apr 2023 17:33:00 +0000 (19:33 +0200)]
Linpool: Fix lp_restore()
When lp_save() is called on an empty linpool, then some allocation is
done, then lp_restore() is called, the linpool is restored but the used
chunks are inaccessible. Fix it.
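The scenario can be illustrated with a toy linear pool. This is a simplified sketch, not BIRD's linpool: the types and the sk_ functions are hypothetical, chunks have a fixed size and alignment is ignored. The point is the restore branch for a state saved on an empty pool, which must leave the already allocated chunks reachable for reuse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE 256

struct chunk {
  struct chunk *next;
  size_t used;
  char data[CHUNK_SIZE];
};

struct sk_linpool {
  struct chunk *first;    /* all chunks ever allocated */
  struct chunk *current;  /* chunk being filled, NULL if the pool is empty */
};

struct sk_state {
  struct chunk *current;
  size_t used;
};

static void *sk_alloc(struct sk_linpool *p, size_t size)
{
  assert(size <= CHUNK_SIZE);
  struct chunk *c = p->current;
  if (!c || c->used + size > CHUNK_SIZE) {
    /* Reuse the following chunk if there is one, else get a new one. */
    struct chunk *n = c ? c->next : p->first;
    if (!n) {
      n = malloc(sizeof *n);
      if (!n)
        return NULL;
      n->next = NULL;
      if (c)
        c->next = n;
      else
        p->first = n;
    }
    n->used = 0;
    p->current = c = n;
  }
  void *out = c->data + c->used;
  c->used += size;
  return out;
}

static struct sk_state sk_save(struct sk_linpool *p)
{
  struct sk_state s = { p->current, p->current ? p->current->used : 0 };
  return s;
}

static void sk_restore(struct sk_linpool *p, struct sk_state s)
{
  /* Saved on an empty pool: rewind to "no current chunk" so that the
   * next allocation starts again from the first chunk. */
  p->current = s.current;
  if (s.current)
    s.current->used = s.used;
}
```

After a save on an empty pool, an allocation and a restore, the next allocation in this sketch returns the same address again instead of leaving the first chunk stranded.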
Maria Matejka [Wed, 19 Apr 2023 15:52:52 +0000 (17:52 +0200)]
Typed lists keep an explicit pointer to the list head.
This change adds one pointer worth of memory to every list node.
Keeping this information helps with auditing the lists, checking that a
node is indeed either outside any list or inside the right one.
The typed lists shouldn't be used anywhere with memory pressure anyway,
thus the one added pointer isn't significant.
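A sketch of what such an audit can look like, using hypothetical names (tlist_node, tlist_head, tlist_add_tail, tlist_remove) rather than BIRD's actual typed list macros: every node carries a pointer to its list head, and each operation asserts that the node is where the caller claims it is.

```c
#include <assert.h>
#include <stddef.h>

struct tlist_head;

struct tlist_node {
  struct tlist_node *prev, *next;
  struct tlist_head *list;  /* owning list, NULL when not enqueued */
};

struct tlist_head {
  struct tlist_node *first, *last;
};

static void tlist_add_tail(struct tlist_head *l, struct tlist_node *n)
{
  assert(n->list == NULL);  /* the node must not be in any list */

  n->prev = l->last;
  n->next = NULL;
  if (l->last)
    l->last->next = n;
  else
    l->first = n;
  l->last = n;
  n->list = l;
}

static void tlist_remove(struct tlist_head *l, struct tlist_node *n)
{
  assert(n->list == l);  /* the node must be in exactly this list */

  if (n->prev)
    n->prev->next = n->next;
  else
    l->first = n->next;
  if (n->next)
    n->next->prev = n->prev;
  else
    l->last = n->prev;

  n->prev = n->next = NULL;
  n->list = NULL;
}
```

Without the extra pointer, removing a node from the wrong list or adding it twice would silently corrupt the links; with it, both mistakes fail the assertion at the call site.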
Maria Matejka [Wed, 5 Apr 2023 19:59:01 +0000 (21:59 +0200)]
BFD: fixed a request pickup race condition
When several BGPs requested a BFD session in short time, chances were
that the second BGP would file a request while the pickup routine was
still running and it would get enqueued into the waiting list instead of
being picked up.
Fixed this by enforcing pickup loop restart when new requests got added,
and also by atomically moving the unpicked requests to a temporary list
to announce admin down before actually being added into the wait list.
Maria Matejka [Sun, 2 Apr 2023 17:15:22 +0000 (19:15 +0200)]
Sockets: Unified API for main and other loops
Now sk_open() requires an explicit IO loop to open the socket in. Also
specific functions for socket RX pause / resume are added to allow for
BGP corking.
And last but not least, socket reloop is now synchronous to resolve
weird cases of the target loop stopping before actually picking up the
relooped socket. Now the caller must ensure that both loops are locked
while relooping, and this way all sockets always have their respective
loop.
Maria Matejka [Fri, 24 Feb 2023 08:13:35 +0000 (09:13 +0100)]
More efficient IO loop event execution to avoid long loops
If there are lots of loops in a single thread and only some of the loops
are actually active, the other loops are now kept aside and not checked
until they actually get some timers, events or active sockets.
This should help with extreme loads like 100k tables and protocols.
Also, the ping and loop pickup mechanisms allowed subtle race
conditions. Collisions between loop ping and pickup are now handled properly.
Maria Matejka [Thu, 9 Mar 2023 15:34:17 +0000 (16:34 +0100)]
Fixed default table configuration
When changing the default table behavior, I missed that it made it possible
to configure multiple master4 and master6 tables. Now BIRD detects this
and fails properly.
Maria Matejka [Wed, 8 Mar 2023 20:38:18 +0000 (21:38 +0100)]
Fixed bad filter re-evaluation with import table if filtered->accepted
The import table feed wasn't resetting the table-specific route values
like REF_FILTERED and thus made the route look like filtered even though
it should have been re-evaluated as accepted.
Maria Matejka [Wed, 29 Mar 2023 16:55:46 +0000 (18:55 +0200)]
BGP: LLGR Staleness optimization dropped.
This brought unnecessary complexity into the decision procedures while the
performance aspects weren't worth it. It just saved one ea_list traversal
when many others are also done.
Ondrej Zajicek [Sun, 19 Feb 2023 02:59:10 +0000 (03:59 +0100)]
Conf: Fix too early free of old configuration
The change 371eb49043d225d2bab8149187b813a14b4b86d2 introduced early free
of old_config. Unfortunately, it did not properly check whether it is
still in use (blocked by an obstacle during reconfiguration). Fix that.
It also means that we still could have a short peak when three configs
are in use (when a new reconfig is requested while the previous one is
still active).