Maria Matejka [Thu, 18 Sep 2025 16:01:37 +0000 (18:01 +0200)]
BGP: Fixed crash on Notification with a message
Due to wrong locking order, when a peer with an established BGP
session sent a Notification with a custom message, BIRD always
crashed when trying to allocate the memory for that message.
This is a minimal crashfix for stable branches; the development
branch will get a more systematic protocol allocation rework.
Maria Matejka [Mon, 22 Sep 2025 08:37:16 +0000 (10:37 +0200)]
BGP: Fixed invalid memory access in pending TX flush
When BGP is shutting down (or graceful-restarting), it must flush the
pending TX data. In quite rare cases, it may have happened that with the
export table on and shutting down a session with just the right amount
of unsent updates, the flush may have caused a step-down of the prefix
hash in the middle of walking it.
Usually, when downsizing, the prefix of the allocated block is used, but
if the block is large enough, it may have been re-used by another thread
early enough to cause some very unwanted out-of-buffer access.
Igor Putovny [Wed, 11 Jun 2025 15:44:38 +0000 (17:44 +0200)]
Hash: Assert that table is not resized during HASH_WALK
According to measurements of hash_test, hash table with this assertion added
was not found to be significantly slower than without it on average. Therefore
we conclude that this addition would not hamper the performance of HASH_WALK.
Igor Putovny [Wed, 11 Jun 2025 10:00:23 +0000 (12:00 +0200)]
Hash: fix buffer overflow in unit test
This bug manifested itself as segmentation fault of t_insert2_find test when
TEST_ORDER was increased from 13 to 14. When checking the validity of filled
table, the table is iterated from 0 to MAX_NUM. However, when order is an even
number, the size of the table is lower than MAX_NUM (due to table resizing),
which caused reading beyond the allocated memory.
Maria Matejka [Tue, 16 Sep 2025 10:04:21 +0000 (12:04 +0200)]
ROA Aggregator: Fix crash on multiwithdraw
Theoretically, multiple withdraw from the best feed should never happen
but apparently there is an opportunity. We are unable to reproduce that
but it's obvious that with the old code, if the last ROA to remove is at
the end of the list, an undefined memory is checked. If it accidentally
matches (which seems to be pretty rare), BIRD may call memcpy() with
a negative length and subsequently crash on segfault.
Protocol: State announcements must be always processed before leaving the loop
When using PROTO_LOCKED_FROM_MAIN or other birdloop_enter, there may be
deferred state announcements which have to be sent immediately,
otherwise the main loop would try to execute them out of the appropriate
locked context.
Maria Matejka [Thu, 18 Sep 2025 10:43:44 +0000 (12:43 +0200)]
Proto: deferring start from proto_enable
When the enable command is issued from CLI, we actually do not need
to enable the protocol right away, it's enough to run the rethink goal
function later from a deferred context. This allows us to change the
protocol's loop safely.
Nest: Function aspa_check() should return ASPA_INVALID for paths containing AS_SET
The aspa_check() uses as_path_getlen() to estimate the size of a buffer,
which does not work for AS_SET segments, because as_path_getlen() returns
length 1 for them regardless of their length. This may cause buffer
overflow and crash.
As AS_SET segments are not valid for ASPA verification, we can just
handle them explicitly. See https://datatracker.ietf.org/doc/html/draft-ietf-sidrops-aspa-verification#section-6
Co-Authored-By: Alarig <alarig@swordarmor.fr>
Minor changes by committer.
Maria Matejka [Thu, 19 Jun 2025 19:12:52 +0000 (21:12 +0200)]
CI: Limit log bloating for netlab runners
There is no useful configurable file size limit for netlab, allowing
the job to eat up all the disk. Thus we limit it directly in the script
by setting ulimit -f to 1G.
Maria Matejka [Thu, 12 Jun 2025 09:52:32 +0000 (11:52 +0200)]
CI: Fix netlab failure artifact collection
When implementing artifact collection for netlab,
I forgot that the coredumps and logs are generated outside the
repository. Moving the artifacts to the right place for collection.
Maria Matejka [Sat, 24 May 2025 12:16:44 +0000 (14:16 +0200)]
Table: Do not attempt to prune an empty table
Table pruning is requested from multiple places, including
a timer. Sometimes this may happen in a race condition with
table shutdown, and therefore we explicitly refuse to schedule
pruning if the table is empty.
Also added a full-blown check that a shutting-down table
is indeed empty, and several more asserts to catch imminent crashes
before they happen in hard-to-debug places.
Maria Matejka [Fri, 23 May 2025 18:40:22 +0000 (20:40 +0200)]
Shutdown: Do not export routes
When shutting down, the tables start flushing routes before all protocols
are even officially aware of the shutdown. This fix allows for a fast check
whether the shutdown is already running or not, and if so, all exports are
ignored instead of processing them.
One notable exception is the kernel protocol which needs to process all
the exports normally to actually withdraw the routes.
This is a hotfix for #251 and #252. Proper fix will require protocol
state machine refactoring.
Maria Matejka [Fri, 23 May 2025 17:17:53 +0000 (19:17 +0200)]
Table: fix a race condition in export
The race condition happens as follows:
- channel A starts feeding
- channel B imports a route ahead of the feeding pointer
- channel A exports this route and continues feeding from the pointer
- no other import hits this specific prefix
- there is at least one channel C which has not cleared this export
- channel A computes ecnt=0 for this prefix because all exports
have been already cleared
- the condition e >= ecnt mistakenly triggers retry
If the birdloops involved get assigned to the same thread, this race
condition then can't recover and the thread is stuck in an infinite
loop.
Fixed the race condition by moving the consistency check after actually
checking eligibility of the export, not before.
Maria Matejka [Wed, 21 May 2025 11:11:10 +0000 (13:11 +0200)]
ASPA: fix the table subscription
With the partial reloads of ROA, the ASPA reload must be different.
When merging ASPA from 2.16, we forgot about that, and the ROA digesting
procedures simply do not work with ASPA, so we instead re-add the settle
timer back.
Maria Matejka [Wed, 21 May 2025 11:53:35 +0000 (13:53 +0200)]
ASPA: fix aspa_check_upstream and aspa_check_downstream parse crash
Due to internal differences between BIRD 2 and 3, there is
FI_CURRENT_ROUTE in BIRD 3 and `val.rte == NULL` is not supported
as a shortcut for fetching the route object.
When merging, we forgot to fix this difference in the BGP config parser,
thus the constant folder failed to recognize that this is not a constant
expression, leading to accidentally dereference the NULL route pointer
in config parse time.
Maria Matejka [Wed, 21 May 2025 09:06:01 +0000 (11:06 +0200)]
BGP: fix warnings for roa_check() with no import table and route refresh on
When roa_check() appears in import filters, the autoreload is switched
on but prohibited for BGP if no import table is available. By fixing the
route refresh feature in 9edc421148fe9c5e7d6038b667ba8fafb587a1eb, we
inadvertently exposed another bug where Nest wasn't distinguishing
between locally and remotely available reload.
Whereas on manual reconfiguration, the route refresh is expected to be
invoked, the autoreload should not trigger any remote actions. Therefore
now the channels actualy indicate whether their reload hook triggers
remote actions or not. This information can be then used to decide
whether to allow autoreload or not.
Maria Matejka [Wed, 21 May 2025 08:15:49 +0000 (10:15 +0200)]
All events are required to have their hook
This is quite an obvious requirement but it wasn't enforced properly,
leading to unnecessarily tedious debugging when an uninitialized event
is improperly enqueued.
Now BIRD is going to crash as soon as somebody tries to enqueue such
an event, not when trying to execute nothing from a clean context.
When recycling an existing configuration (configure undo), the found_old
flags were already set. Due to this oversight, log files failed to
reopen on configure undo.
xtex [Fri, 9 May 2025 11:14:50 +0000 (07:14 -0400)]
Build: call tools/version with sh
As tools/version has a shebang line, it should be
fine to just call it without specifying bash.
Calling bash explicitly may lead to inconsistency,
as the first line of tools/version indicates /usr/bin/sh
but the script is always executed with bash.
And, it adds bash as a new build dependency.
Maria Matejka [Thu, 8 May 2025 21:03:57 +0000 (23:03 +0200)]
BGP: Fix repeated route refresh request
The previous approach was crashing on rapid successions of route refreshs
without even completing the previous ones. Now the newly requested refreshs
just queue and don't start multiple refreshs over and over again.
Igor Putovny [Mon, 5 May 2025 13:47:10 +0000 (15:47 +0200)]
BGP: Fix route refresh behavior
On import filter reconfiguration, the route refresh capability is now
honored and used instead of restarting the session.
On export filter reconfiguration, the enhanced route refresh capability
is used to indicate BoRR and EoRR, unless the export table is on. In
such cases, only relevant changes are sent as proper updates.
When route refresh request is received, the enhanced route refresh
capability is now honored.
Maria Matejka [Wed, 7 May 2025 13:02:24 +0000 (15:02 +0200)]
BGP: Fix crash on too long export
When BGP route is short enough to be accepted but too long after local
changes, it is converted to withdraw. In these cases though, there was
a dangling pointer left from the prefix structure to the attribute
bucket.
That pointer is now pointing at the right place after the bucket gets
converted to withdraw.
Thanks to ix.br for catching and reporting this issue.
and not only is obviously not relevant to that commit, it is also
obviously wrong because exactly the same nullification has been pushed
later to rt_notify_basic().
Maria Matejka [Sat, 19 Apr 2025 18:11:02 +0000 (20:11 +0200)]
CI: Tasks generated from a data file and template
Instead of directly editing .gitlab-ci.yml, the pipeline definition is
generated by a script from a much shorter file defining the required
tests, and templated.
Maria Matejka [Mon, 10 Mar 2025 19:20:32 +0000 (20:20 +0100)]
Table export: ignoring invalid routes before marking them in export maps
Fast subsequent updates on filtered routes made the code crash because
no flags were set while ignoring them. And if these routes flapped, the
squashed export update crashed on a consistency check.
We ignore them unconditionally so we don't have to mark them at all and
we can convert them to NULL even before export maps are touched.
cli/commands: Help for multiple word command did not show properly.
Possible commands are stored as keywords, each keyword has its own structure.
The last acceptable keyword structure contains string with hint. But when the hint was printed only direct child
of the base keyword was considered. If it was multi keyword command, the first child did not carry any hint to print,
so it was ignored.
Now, if we don't find a hint in a child, we recursively search in grandchildren.
Jana Babovakova [Wed, 2 Apr 2025 11:44:02 +0000 (13:44 +0200)]
Doc: Minor corrections in README and INSTALL
- Licence to License - also in code comments.
- copyright date until now.
- updated license text from gnu.org
- added command to build bird documentation.
Maria Matejka [Tue, 17 Dec 2024 11:38:12 +0000 (12:38 +0100)]
CI: fix test collisions between branches (again)
The build-netlab job was side-effecting the test-* jobs,
and if for some reason Gitlab scheduled build-netlab before
other pipeline's test-* jobs finished, these jobs got a wrong
binary, possibly failing. Solved by using explicit artifacts, which is
not the fastest way to do this (we could keep the binaries named there)
but it's the gitlab-right way to do this.
(Re-committed, was accidentally removed by one of previous commits)