Maria Matejka [Sat, 24 May 2025 12:16:44 +0000 (14:16 +0200)]
Table: Do not attempt to prune an empty table
Table pruning is requested from multiple places, including
a timer. Sometimes this may happen in a race condition with
table shutdown, and therefore we explicitly refuse to schedule
pruning if the table is empty.
Also added a full-blown check that a shutting-down table
is indeed empty, and several more asserts to catch imminent crashes
before they happen in hard-to-debug places.
Maria Matejka [Fri, 23 May 2025 18:40:22 +0000 (20:40 +0200)]
Shutdown: Do not export routes
When shutting down, the tables start flushing routes before all protocols
are even officially aware of the shutdown. This fix allows for a fast check
whether the shutdown is already running or not, and if so, all exports are
ignored instead of processing them.
One notable exception is the kernel protocol which needs to process all
the exports normally to actually withdraw the routes.
This is a hotfix for #251 and #252. Proper fix will require protocol
state machine refactoring.
Maria Matejka [Fri, 23 May 2025 17:17:53 +0000 (19:17 +0200)]
Table: fix a race condition in export
The race condition happens as follows:
- channel A starts feeding
- channel B imports a route ahead of the feeding pointer
- channel A exports this route and continues feeding from the pointer
- no other import hits this specific prefix
- there is at least one channel C which has not cleared this export
- channel A computes ecnt=0 for this prefix because all exports
have been already cleared
- the condition e >= ecnt mistakenly triggers retry
If the birdloops involved get assigned to the same thread, this race
condition then can't recover and the thread is stuck in an infinite
loop.
Fixed the race condition by moving the consistency check after actually
checking eligibility of the export, not before.
Maria Matejka [Wed, 21 May 2025 11:11:10 +0000 (13:11 +0200)]
ASPA: fix the table subscription
With the partial reloads of ROA, the ASPA reload must be different.
When merging ASPA from 2.16, we forgot about that, and the ROA digesting
procedures simply do not work with ASPA, so we instead re-add the settle
timer back.
Maria Matejka [Wed, 21 May 2025 11:53:35 +0000 (13:53 +0200)]
ASPA: fix aspa_check_upstream and aspa_check_downstream parse crash
Due to internal differences between BIRD 2 and 3, there is
FI_CURRENT_ROUTE in BIRD 3 and `val.rte == NULL` is not supported
as a shortcut for fetching the route object.
When merging, we forgot to fix this difference in the BGP config parser,
thus the constant folder failed to recognize that this is not a constant
expression, leading to accidentally dereference the NULL route pointer
in config parse time.
Maria Matejka [Wed, 21 May 2025 09:06:01 +0000 (11:06 +0200)]
BGP: fix warnings for roa_check() with no import table and route refresh on
When roa_check() appears in import filters, the autoreload is switched
on but prohibited for BGP if no import table is available. By fixing the
route refresh feature in 9edc421148fe9c5e7d6038b667ba8fafb587a1eb, we
inadvertently exposed another bug where Nest wasn't distinguishing
between locally and remotely available reload.
Whereas on manual reconfiguration, the route refresh is expected to be
invoked, the autoreload should not trigger any remote actions. Therefore
now the channels actualy indicate whether their reload hook triggers
remote actions or not. This information can be then used to decide
whether to allow autoreload or not.
Maria Matejka [Wed, 21 May 2025 08:15:49 +0000 (10:15 +0200)]
All events are required to have their hook
This is quite an obvious requirement but it wasn't enforced properly,
leading to unnecessarily tedious debugging when an uninitialized event
is improperly enqueued.
Now BIRD is going to crash as soon as somebody tries to enqueue such
an event, not when trying to execute nothing from a clean context.
When recycling an existing configuration (configure undo), the found_old
flags were already set. Due to this oversight, log files failed to
reopen on configure undo.
Maria Matejka [Thu, 8 May 2025 21:03:57 +0000 (23:03 +0200)]
BGP: Fix repeated route refresh request
The previous approach was crashing on rapid successions of route refreshs
without even completing the previous ones. Now the newly requested refreshs
just queue and don't start multiple refreshs over and over again.
Igor Putovny [Mon, 5 May 2025 13:47:10 +0000 (15:47 +0200)]
BGP: Fix route refresh behavior
On import filter reconfiguration, the route refresh capability is now
honored and used instead of restarting the session.
On export filter reconfiguration, the enhanced route refresh capability
is used to indicate BoRR and EoRR, unless the export table is on. In
such cases, only relevant changes are sent as proper updates.
When route refresh request is received, the enhanced route refresh
capability is now honored.
Maria Matejka [Sun, 19 Jan 2025 00:06:24 +0000 (01:06 +0100)]
Doc: building singlepage version
Some minor changes were done in the original documentation to allow for
easier conversion, and also to make the documentation a little bit more
strictly valid.
This change caters for the new website automation and allows for future
online display of documentation even for development versions.
Maria Matejka [Wed, 7 May 2025 13:02:24 +0000 (15:02 +0200)]
BGP: Fix crash on too long export
When BGP route is short enough to be accepted but too long after local
changes, it is converted to withdraw. In these cases though, there was
a dangling pointer left from the prefix structure to the attribute
bucket.
That pointer is now pointing at the right place after the bucket gets
converted to withdraw.
Thanks to ix.br for catching and reporting this issue.
Maria Matejka [Tue, 17 Dec 2024 11:38:12 +0000 (12:38 +0100)]
CI: fix test collisions between branches
The build-netlab job was side-effecting the test-* jobs,
and if for some reason Gitlab scheduled build-netlab before
other pipeline's test-* jobs finished, these jobs got a wrong
binary, possibly failing. Solved by using explicit artifacts, which is
not the fastest way to do this (we could keep the binaries named there)
but it's the gitlab-right way to do this.
Ondrej Zajicek [Thu, 27 Mar 2025 16:43:56 +0000 (17:43 +0100)]
BFD: Fix crash related to reconfiguration and passwords
Any change in BFD iface configuration should trigger session
reconfiguration, as config is copied into the bfd_session structure
and not just accessed through the bfd_iface structure.
As bfd_session now contains a pointer to the password list allocated
from the configuration, forgetting to update the bfd_session causes
use-after-free.
Maria Matejka [Mon, 10 Mar 2025 19:20:32 +0000 (20:20 +0100)]
Table export: ignoring invalid routes before marking them in export maps
Fast subsequent updates on filtered routes made the code crash because
no flags were set while ignoring them. And if these routes flapped, the
squashed export update crashed on a consistency check.
We ignore them unconditionally so we don't have to mark them at all and
we can convert them to NULL even before export maps are touched.
Maria Matejka [Mon, 10 Mar 2025 08:15:53 +0000 (09:15 +0100)]
Table export: Relaxing too strict inconsistency assert
In case of refeeds, we may get old routes which we have not seen,
the table does not know that and the channel ingress is the right place
to detect it.
Maria Matejka [Mon, 3 Mar 2025 18:48:58 +0000 (19:48 +0100)]
Table export: Another inconsistency in refeeds
When a route has been already sent to the channel and the refeed
runs because of a filter change or just because requested, the
old and new routes are the same which was actually not anticipated
by rt_notify_basic().
Commit 69d1ffde4c72 ("Split route data structure to storage (ro) /
manipulation (rw) structures.") changed rte->net from a pointer to a
'struct network' to a 'struct net_addr', but kept the address-of (&)
operator before casting to 'net_addr_ip6_sadr *' when sending a
source-specific route to the kernel.
Because 'struct network' had an embedded struct member (struct
fib_node), the address-of was needed to get back to a pointer to the
data, but with the change in the commit mentioned above, e->net is now a
straight pointer to the address.
The bug meant that the source prefixes passed to the kernel were
essentially garbage, leading to routes in the kernel like:
default from b74:9e05:0:1:d8cf:c000::/86 via fe80::1 dev eth0 proto bird metric 32 pref medium
Fix this by getting rid of the address-of operator.
Note by commiter: used our TYPE_CAST macro instead of plain typecast
to avoid this kind of problem in future.
Fixes: 69d1ffde4c72 ("Split route data structure to storage (ro) / manipulation (rw) structures.") Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: Maria Matejka <mq@jmq.cz>
Table export: fixed inconsistency in export_rejected_map
When updates arrived in such an order that the first one was rejected and
the second one got accepted, the export_rejected_map flag mistakenly
stayed set, leaking the route ID.
In the RA_OPTIMAL channel mode, there are consistency checks that at
most one route for a net has been accepted or rejected. After some time,
the leaked ID and bit in export_rejected_map caused spurious crashes in
asserts later in channel_notify_basic().
Thanks to NIX-CZ and Maiyun Zhang for reporting this.
Maria Matejka [Wed, 12 Feb 2025 20:29:10 +0000 (21:29 +0100)]
BGP export table src fix
When exchanging routes in BGP export table, we forgot to update
the src in cases of add path off. This led to falsely claiming another
origin of that route in export table dump and also holding protocols
in the flush state because of their srcs being kept in the export tables.
Maria Matejka [Mon, 10 Feb 2025 11:29:51 +0000 (12:29 +0100)]
Fix channel restart sequence
If channel goes start -> pause -> start, the original code crashed
but it's a valid sequence for protocol half-restart, going from UP
to START and then back UP.
Maria Matejka [Mon, 13 Jan 2025 21:15:52 +0000 (22:15 +0100)]
BGP: fix shutdown crash when dynamic peer is just connected
In some edge cases, the dynamic BGP starts but doesn't yet pick up
the socket from the peer, when it gets shut down, typically on
a complete shutdown. Fixing this to just close the socket, not assert
it being already picked up.
Ondrej Zajicek [Thu, 9 Jan 2025 15:44:51 +0000 (16:44 +0100)]
lib: Unify alignment of allocators
Different internal allocators (memory blocks, linpools, and slabs) used
different way to compute alignment. Unify it to use alignment based on
standard max_align_t type.
On x86_64, this does not change alignment of memory blocks and linpools
(both old and new is 16), but it increases alignment of slabs from 8 to
16.
Maria Matejka [Wed, 8 Jan 2025 19:22:21 +0000 (20:22 +0100)]
Table: more best route refeed fixes
Best route refeed is tricky. The journal may include repeatedly the same
route in the old and/or in the new position in case of flaps. We don't
like checking that fully in the RCU critical section which is already
way too long, thus we filter out the repeated occurence of the current
best route while keeping possibly more old routes.
We also don't want to send spurious withdraws, and we need to check that
only one notification per net is sent for RA_OPTIMAL.
There was also missing a rejected map update in case of idempotent
squashed update, and last but not least, the best route journal should
not include invalid routes (import keep filtered).
Maria Matejka [Tue, 24 Dec 2024 15:16:55 +0000 (16:16 +0100)]
Allocate the normalization buckets on stack
Even though allocating from tmp_linpool is quite cheap,
it isn't cheap when the block is larger than a page, which is the case here.
Instead, we now allocate just the result which typically fits in a page,
avoiding a necessity of a malloc().
Maria Matejka [Tue, 24 Dec 2024 11:18:39 +0000 (12:18 +0100)]
Stonehenge: multi-slab allocator
To mid-term allocate and free lots of small blocks in a fast pace,
mb_alloc is too slow and causes heap bloating. We can already allocate
blocks from slabs, and if we allow for a little bit of inefficiency,
we can just use multiple slabs with stepped sizes.
This technique is already used in ea_list allocation which is gonna be
converted to Stonehenge.
Maria Matejka [Mon, 23 Dec 2024 10:58:05 +0000 (11:58 +0100)]
Kernel: feed only once during startup
There was an inefficiency in the initial scan state machine,
causing routes to be fed several times instead of just once.
Now the export startup is postponed until first krt_scan()
finishes and we actually can do the pruning with full information.
Ondrej Zajicek [Tue, 17 Dec 2024 08:00:42 +0000 (09:00 +0100)]
Nest: Fix handling of 64-bit rte_src.private_id
The commit 21213be523baa7f2cbf0feaa617f265c55e9b17a expanded private_id
in route source to u64, but forgot to modify function arguments, so it
was still cropped at 32-bit, which may cause some collisions for L3VPN.
This patch fixes that.
Ondrej Zajicek [Thu, 12 Dec 2024 03:04:07 +0000 (04:04 +0100)]
Netlink: Handle onlink flag on BSD-Netlink
On BSD, the onlink flag is not tracked or reported by kernel. We are
using an heuristic that assigns the onlink flag to routes scanned from
the kernel. We should use the same heuristic even in BSD-Netlink
case, as the onlink flag is not reported here too.
Fabian Bläse [Tue, 10 Dec 2024 01:14:06 +0000 (02:14 +0100)]
Babel: fix seqno wrapping on seqno request
The Babel seqno wraps around when reaching its maximum value (UINT16_MAX).
When comparing seqnos, this has to be taken into account. Therefore,
plain number comparisons do not work.
Maria Matejka [Sat, 21 Dec 2024 18:02:22 +0000 (19:02 +0100)]
BFD: Fix session reconfiguration locking order
The sessions have to be updated asynchronously to avoid
cross-locking between protocols.
Testsuite: cf-ibgp-bfd-switch, cf-ibgp-multi-bfd-auth Fixes: #139
Thanks to Daniel Suchy <danny@danysek.cz> for reporting:
https://trubka.network.cz/pipermail/bird-users/2024-December/017984.html
Maria Matejka [Fri, 20 Dec 2024 10:28:00 +0000 (11:28 +0100)]
BGP: fix locking order error on dynamic protocol spawn
We missed that the protocol spawner violates the prescribed
locking order. When the rtable level is locked, no new protocol can be
started, thus we need to:
* create the protocol from a clean mainloop context
* in protocol start hook, take the socket
Testsuite: cf-bgp-autopeer Fixes: #136
Thanks to Job Snijders <job@fastly.com> for reporting:
https://trubka.network.cz/pipermail/bird-users/2024-December/017980.html
Maria Matejka [Thu, 19 Dec 2024 11:28:27 +0000 (12:28 +0100)]
Kernel: when channel traces, we have to trace the final result
Otherwise it looks like we are sending too much traffic to netlink
every other while, which is not true. Now we can disambiguate between
in-kernel updates and ignored routes.
Maria Matejka [Thu, 19 Dec 2024 10:54:05 +0000 (11:54 +0100)]
Table: not feeding twice, once is enough
If there is no feed pending, the requested one should be
activated immediately, otherwise it is activated only after
the full run, effectively running first a full feed and
then the requested one.