Maria Matejka [Thu, 25 Dec 2025 17:19:48 +0000 (18:19 +0100)]
CI: Packaging cleanup
With the removal of APKG, we don't need to split out "-legacy" DEB with
old Python, and "-wa" RPM with an obscure sed applied to specfile which
never actually did anything since it was added in 2021.
Maria Matejka [Sat, 6 Dec 2025 21:43:54 +0000 (22:43 +0100)]
CI: No more APKG in packaging
We have had minor and subtle but repeating problems with the APKG
dependency chain and its overall usability. It has become apparent
that we actually don't need that kind of abstraction layer because
all our problems are actually solvable by just a bunch of short scripts.
With that, we are now using the (standard) dpkg-buildpackage and
rpmbuild tools directly from a bash script.
From now on, with just several exceptions, all our distribution builds
should be fully reproducible.
Maria Matejka [Mon, 22 Dec 2025 21:55:22 +0000 (22:55 +0100)]
Source package and documentation builds are now reproducible
We now explicitly set the PDF build datetime to commit datetime, and we
also clean all the file metadata in the TGZ archives, so that the
generated archives are now bit-identical.
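A minimal sketch of the archive half of this idea, written in Python rather than taken from the project's build scripts. The function name build_tgz, the file names and the timestamp value are hypothetical; the timestamp merely stands in for the commit datetime.

```python
import gzip
import io
import tarfile

def build_tgz(files, commit_time):
    """Pack {name: bytes} into a .tar.gz whose bytes depend only on the inputs."""
    raw = io.BytesIO()
    # mtime=0 keeps the current time out of the gzip header
    with gzip.GzipFile(fileobj=raw, mode="wb", mtime=0) as gz:
        with tarfile.open(fileobj=gz, mode="w", format=tarfile.GNU_FORMAT) as tar:
            for name in sorted(files):          # fixed member order
                info = tarfile.TarInfo(name)
                info.size = len(files[name])
                info.mtime = commit_time        # pinned timestamp
                info.mode = 0o644
                info.uid = info.gid = 0         # no builder identity
                info.uname = info.gname = ""
                tar.addfile(info, io.BytesIO(files[name]))
    return raw.getvalue()

COMMIT_TIME = 1700000000    # arbitrary example value, not a real commit time
files = {"README": b"hello\n", "src/main.c": b"int main(void) { return 0; }\n"}
first = build_tgz(files, COMMIT_TIME)
# Same content supplied in a different order must give the same bytes
second = build_tgz(dict(reversed(list(files.items()))), COMMIT_TIME)
```

Comparing first and second byte for byte is the same check one would run on two independent builds of the same commit.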
Maria Matejka [Mon, 22 Dec 2025 22:27:27 +0000 (23:27 +0100)]
Lib: Fix comments so that progdoc is deterministic
For some weird reason, the old Perl code behaves non-deterministically on
@foo() and there is no clear explanation why. The snails should not be there
anyway so removing them.
Maria Matejka [Thu, 18 Dec 2025 11:39:42 +0000 (12:39 +0100)]
IGP metric: Split out local_metric again
When removing BIRD 2 RTA, I joined rta->igp_metric with
ea_gen_igp_metric, while these two attributes actually have different
semantics. Reintroducing the distinction but with the first one
named local_metric.
This join caused quite some confusion as the igp_metric attribute set in
import filters gets rewritten by recursive nexthop resolution in BIRD 3
up to 3.1.x. Now the igp_metric stays intact and the local_metric is
the original rta->igp_metric which always meant "cost of the local part
of the whole route".
The local_metric attribute also represents the "interior cost" as used
in RFC 4271. When using BGP as an underlay / IGP, one should explicitly
set the igp_metric in the import filter, while the local_metric will be
updated by next hop resolver independently.
Ondrej Zajicek [Thu, 27 Nov 2025 16:59:44 +0000 (17:59 +0100)]
RAdv: Fix flags for deprecated prefixes
When a prefix is deprecated (valid_lifetime == 0), it should be
announced with the same flags as before. The old code announced it
without any flags, which leads to being ignored by recipients.
Note that a prefix could be deprecated for two reasons - it is removed
from the interface, or it is deconfigured in BIRD configuration.
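To illustrate what "the same flags as before" means on the wire, here is a small Python sketch that encodes the Prefix Information option as laid out in RFC 4861. It is not BIRD's code; the helper name prefix_option and the example prefix and lifetimes are assumptions made for illustration.

```python
import ipaddress
import struct

FLAG_ONLINK = 0x80      # L flag (RFC 4861)
FLAG_AUTONOMOUS = 0x40  # A flag (RFC 4861)

def prefix_option(prefix, flags, valid_lifetime, preferred_lifetime):
    """Encode an RFC 4861 Prefix Information option (type 3, 32 bytes)."""
    net = ipaddress.IPv6Network(prefix)
    return struct.pack(
        "!BBBBIII16s",
        3,                  # option type
        4,                  # length in units of 8 bytes
        net.prefixlen,
        flags,
        valid_lifetime,
        preferred_lifetime,
        0,                  # reserved
        net.network_address.packed,
    )

flags = FLAG_ONLINK | FLAG_AUTONOMOUS
active = prefix_option("2001:db8:1::/64", flags, 86400, 14400)
# Deprecation: lifetimes drop to zero, flags stay as previously announced
deprecated = prefix_option("2001:db8:1::/64", flags, 0, 0)
```

Only the lifetime fields differ between the two encodings; the flags byte is identical.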
The only use of the spinlock in `struct bfd_proto` was removed in 38acb415f
and now it should be fine to remove it.
The check for `pthread_spin*` in `aclocal.m4` is also removed
accordingly. This makes it possible to port BFD support to platforms
without spinlocks (e.g. Darwin).
Maria Matejka [Tue, 25 Nov 2025 17:22:26 +0000 (18:22 +0100)]
MRT: Fix crash when protocol mrtdump configured without a specified file
If somebody configured mrtdump option in the protocol block and had no
mrtdump file configured on toplevel, BIRD 3 would crash on first MRT
message to be written after that.
Fixed by passing the rfile structure through up to the actual place
where mrt_dump_message() checks whether anything is open. Theoretically
there could be a smaller fix involving just adding null-checks to
mrt_dump_bgp_message() and mrt_dump_bgp_state_change(), but this is more
future-proof.
Maria Matejka [Thu, 20 Nov 2025 11:03:02 +0000 (12:03 +0100)]
Filter: fix undefined memory access in nexthop handling
If nexthop is special (blackhole, unreachable, prohibit), directly
asking for anything else from there accessed undefined memory. This
manifested with checking for interface name where it actually
dereferenced a pointer from there, causing a crash.
In some cases, reading ifindex might have also caused a crash.
The other data (weight, gw, gw_mpls, onlink) is stored directly inside
the nexthop EA, and even though it's shorter, in most cases it "just"
returned garbage. The only exception would be probably if the
unreachable nexthop EA was allocated right at the end of tmp_pool or
locally on stack.
Fixing this by properly checking for reachability before reading the
nexthop data.
Reported-By: Lars Gierth <larsg@systemli.org>
Reproduced-By: David Petera <david.petera@nic.cz>
Fixes: #313
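The shape of the fix, checking reachability before touching any per-hop data, can be modelled in a few lines of Python. This is a toy model, not BIRD's filter code: the Dest and NextHop types and the nexthop_ifname helper are hypothetical, and returning None for special next hops is an assumption, since the commit message does not say what the real code reports.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Dest(Enum):
    UNICAST = "unicast"
    BLACKHOLE = "blackhole"
    UNREACHABLE = "unreachable"
    PROHIBIT = "prohibit"

@dataclass
class NextHop:
    dest: Dest
    ifname: Optional[str] = None   # only meaningful for unicast next hops
    gw: Optional[str] = None

def nexthop_ifname(nh):
    """Return the interface name, or None for special next hops."""
    if nh.dest is not Dest.UNICAST:
        return None                # nothing valid to read from here
    return nh.ifname

regular = NextHop(Dest.UNICAST, ifname="eth0", gw="192.0.2.1")
special = NextHop(Dest.BLACKHOLE)
```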
Maria Matejka [Sat, 22 Nov 2025 22:23:28 +0000 (23:23 +0100)]
Tools: Release initialization script
There are a bunch of things one needs to do in GitLab when releasing, and
this script simply checks whether everything needed is there and fixes
what is missing.
Maria Matejka [Fri, 21 Nov 2025 13:08:46 +0000 (14:08 +0100)]
MRT: Fix dumps with layered attributes
When a route is modified during the import and the import table is on,
it's stored with layered attributes, the bottom layer containing the
original data and over that there are one or more layers containing the
changes.
Dumping into MRT files calls bgp_encode_attr() on that attribute list,
and that function requires attributes to be normalized, i.e. squashed to
one layer which can be then walked. This was mistakenly done only on
uncached attribute lists, while in reality it should be done always
because of the layering.
This caused MRT to crash when the import table is on and the
import filter unsets certain BGP attributes, specifically any attribute
which is longer than int, e.g. bgp_path, bgp_cluster_list,
bgp_community or bgp_next_hop. That creates a pseudo-attribute in the
top layer with the undef flag set, and bgp_encode_attr() then crashes on
unexpected NULL pointer dereference.
Fixed by requiring the attributes to be normalized always.
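A toy Python model of the layering may help here. It is only a sketch under simplifying assumptions: attribute lists are plain dicts, the UNDEF sentinel stands in for a pseudo-attribute with the undef flag set, and normalize() and encode() are hypothetical stand-ins, not the real BIRD functions.

```python
UNDEF = object()   # stands in for an attribute marked with the undef flag

def normalize(layers):
    """Squash attribute layers (bottom first) into one flat dict."""
    flat = {}
    for layer in layers:
        for name, value in layer.items():
            if value is UNDEF:
                flat.pop(name, None)    # an unset in an upper layer removes it
            else:
                flat[name] = value
    return flat

def encode(attrs):
    """Toy encoder that expects every attribute to carry real data."""
    return {name: ",".join(str(v) for v in value).encode()
            for name, value in attrs.items()}

original = {"bgp_path": [65001, 65002], "bgp_community": [1, 2]}
import_changes = {"bgp_community": UNDEF}     # import filter unsets it

flat = normalize([original, import_changes])
encoded = encode(flat)
```

Calling encode(import_changes) directly, without normalizing first, raises a TypeError on the UNDEF marker, which loosely mirrors the NULL dereference described above.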
David Petera [Fri, 14 Nov 2025 13:07:38 +0000 (14:07 +0100)]
CI: fix broken debian-11-i386 packaging
The error occurred when trying to install 'markupsafe' v3.0.3 python package (dependency of 'apkg') on debian-11-i386 arch.
Fixed by preinstalling older version of 'markupsafe' package before the installation of 'apkg'.
Since the behavior is added to 'pkg-deb-legacy' it also affects packaging of ubuntu-18.04-amd64.
Also debian-11-amd64 is moved to 'pkg-deb-legacy' together with affected debian-11-i386 just for code clarity.
Maria Matejka [Fri, 14 Nov 2025 19:11:59 +0000 (20:11 +0100)]
Channel: Fix race condition of a new feed with an update in the table
In case an update arrived while another thread was feeding anew, there was
a race condition causing the old route to be counted against the export limit
despite never being reached.
In the long run, this caused limit counter underflow and a hard assertion failure.
Maria Matejka [Sat, 11 Oct 2025 18:16:57 +0000 (20:16 +0200)]
BGP: Decoupling of listen sockets
For performance reasons, every listening socket now gets its own loop so
that with strict bind one can accept connections by more threads than
just one.
Locking: Add another locking level between service and protocol
The BGP protocol needs a domain accessed from the protocols but
launching common services. This could have been done, theoretically,
by abusing the rtable and attrs levels, but that would require
having a loop on that level which we'd like to not do.
BGP: Sending the incoming socket to the receiving protocol by callback
Instead of processing the incoming sockets directly in the main loop,
this can be done in the protocol's loop without having to enter and exit
that loop in a complicated way.
BGP: Storing the protocol accept-matching data in the request
The common TCP accept routine should not touch the protocol structure
from outside, and it should get all the relevant information directly in
the listen request.
BGP: Forward-port better solution of listening socket requests
In master (upcoming v2.18), the listening socket creation has been
resolved better than in current v3, and thus we forward-port that
solution from the mq-bgp-multilisten branch before actually merging
master.
Maria Matejka [Thu, 3 Jul 2025 15:16:14 +0000 (17:16 +0200)]
BGP: Interface range bind
For dynamic onlink connections, we need to find out which interface
the connection came in on, and we need to pin that connection to
that interface. To achieve that, we create a listening socket
bound to each interface separately, and match the incoming connection
by the socket. Otherwise, the kernel would not give us any information
on where the connection came from.
Maria Matejka [Wed, 2 Jul 2025 14:40:34 +0000 (16:40 +0200)]
Socket: Warnings for link-local addresses without interfaces
In certain corner cases (e.g. mixed global and link-local IPv6 address)
the kernel fails to give us the interface ID. We log a warning for such
a case before a possibly misleading error message is spit out by BGP.
Also pass TCP interface information from parent to child on accept,
if the socket is bound to that interface.
Maria Matejka [Wed, 20 Aug 2025 13:35:32 +0000 (15:35 +0200)]
BGP: Fix dynamic instance reconfiguration
Every dynamic BGP was torn down on reconfig because the inherited
configuration is a little bit different than the parent one. Fixed this
by applying the same changes before the reconfiguration's memcmp().
Also fixed interface pattern reconfiguration which always restarted.
Added not only comparison but also actual reconfiguration of the pattern
itself so that one can update the pattern without restarting a running
BGP session.
Finally, extended documentation a bit to cover dynamic BGP scenarios a
little bit better. Yet, it probably deserves a separate section on
dynamic BGP.
Maria Matejka [Wed, 16 Jul 2025 08:45:10 +0000 (10:45 +0200)]
BGP: Fixed link-local connections with wildcard local iface
When BGP was configured to accept link-local connections
in combination with interface range, it failed to recognize
that the incoming connection is indeed for that protocol.
Maria Matejka [Wed, 20 Aug 2025 13:34:31 +0000 (15:34 +0200)]
BGP: Fix TCP-AO single key rejection
When one key fails but others are working OK, do not shut down the BGP,
just disable that one key. We intended to do it this way but it somehow
slipped through.
Also added key cleanup in cases where the key addition fails for just
some sockets but not for all.
Maria Matejka [Tue, 8 Jul 2025 18:28:03 +0000 (20:28 +0200)]
BGP: Fixed unnumbered connections with wildcard local IP.
When the BGP was configured onlink with a neighbor range, interface
range and wildcard local IP, the connections failed to establish
because the inferred local IP wasn't properly propagated.
Igor Putovny [Thu, 9 Oct 2025 14:08:47 +0000 (16:08 +0200)]
Move the interfaces locking domain from attrs to rtable
The interface table domain level has been "attrs" due to limitations
imposed in the early stages of development when all operations on
routing tables, most notably next hop resolution, were done with the
table locked.
Another possible problem was accessing the BGP listen socket structures
which are on the "rtable" level.
Yet, with the introduction of the "service" level and stabilization of
other structures, the interface table domain level does not need to be
at "attrs" anymore, and therefore we may simply move it to "rtable"
as it is actually the right place for that.
This collision also caused problems with external resource locks which
are at the "attrs" level, causing a crash in interface reconfiguration
of RIP, Babel and OSPF, when the routines tried to acquire a resource
lock with the interface table being locked. Due to a lack of autotests
for interface reconfiguration, we missed this problem in BIRD 3.
Maria Matejka [Thu, 18 Sep 2025 16:01:37 +0000 (18:01 +0200)]
BGP: Fixed crash on Notification with a message
Due to wrong locking order, when a peer with an established BGP
session sent a Notification with a custom message, BIRD always
crashed when trying to allocate the memory for that message.
This is a minimal crashfix for stable branches; the development
branch will get a more systematic protocol allocation rework.
Maria Matejka [Mon, 22 Sep 2025 08:37:16 +0000 (10:37 +0200)]
BGP: Fixed invalid memory access in pending TX flush
When BGP is shutting down (or graceful-restarting), it must flush the
pending TX data. In quite rare cases, it may have happened that with the
export table on and shutting down a session with just the right amount
of unsent updates, the flush may have caused a step-down of the prefix
hash in the middle of walking it.
Usually, when downsizing, the prefix of the allocated block is used, but
if the block is large enough, it may have been re-used by another thread
early enough to cause some very unwanted out-of-buffer access.
Igor Putovny [Wed, 11 Jun 2025 15:44:38 +0000 (17:44 +0200)]
Hash: Assert that table is not resized during HASH_WALK
According to measurements of hash_test, hash table with this assertion added
was not found to be significantly slower than without it on average. Therefore
we conclude that this addition would not hamper the performance of HASH_WALK.
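The idea of the assertion can be sketched with a toy Python table. This is not the BIRD HASH_WALK macro; the GuardedTable class, its growth policy and its walk() context manager are assumptions made for illustration.

```python
from contextlib import contextmanager

class GuardedTable:
    """Toy bucketed hash table that refuses to resize while being walked."""

    def __init__(self, order=2):
        self.buckets = [[] for _ in range(1 << order)]
        self.count = 0
        self.walking = 0            # number of walks currently in progress

    def _resize(self, size):
        assert self.walking == 0, "table resized during walk"
        items = [kv for bucket in self.buckets for kv in bucket]
        self.buckets = [[] for _ in range(size)]
        for key, value in items:
            self.buckets[hash(key) % size].append((key, value))

    def insert(self, key, value):
        # Grow once the average chain length reaches two
        if self.count >= 2 * len(self.buckets):
            self._resize(2 * len(self.buckets))
        self.buckets[hash(key) % len(self.buckets)].append((key, value))
        self.count += 1

    @contextmanager
    def walk(self):
        self.walking += 1
        try:
            yield (kv for bucket in self.buckets for kv in bucket)
        finally:
            self.walking -= 1

table = GuardedTable()
for i in range(8):
    table.insert(i, str(i))
```

With eight entries in four buckets, the next insert triggers a resize, so inserting inside a `with table.walk()` block trips the assertion instead of silently invalidating the iteration.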
Igor Putovny [Wed, 11 Jun 2025 10:00:23 +0000 (12:00 +0200)]
Hash: fix buffer overflow in unit test
This bug manifested itself as segmentation fault of t_insert2_find test when
TEST_ORDER was increased from 13 to 14. When checking the validity of filled
table, the table is iterated from 0 to MAX_NUM. However, when order is an even
number, the size of the table is lower than MAX_NUM (due to table resizing),
which caused reading beyond the allocated memory.
Protocol: State announcements must be always processed before leaving the loop
When using PROTO_LOCKED_FROM_MAIN or other birdloop_enter, there may be
deferred state announcements which have to be sent immediately,
otherwise the main loop would try to execute them out of the appropriate
locked context.
Maria Matejka [Thu, 18 Sep 2025 10:43:44 +0000 (12:43 +0200)]
Proto: deferring start from proto_enable
When the enable command is issued from CLI, we actually do not need
to enable the protocol right away, it's enough to run the rethink goal
function later from a deferred context. This allows us to change the
protocol's loop safely.
Nest: Function aspa_check() should return ASPA_INVALID for paths containing AS_SET
The aspa_check() uses as_path_getlen() to estimate the size of a buffer,
which does not work for AS_SET segments, because as_path_getlen() returns
length 1 for them regardless of their length. This may cause buffer
overflow and crash.
As AS_SET segments are not valid for ASPA verification, we can just
handle them explicitly. See https://datatracker.ietf.org/doc/html/draft-ietf-sidrops-aspa-verification#section-6
Co-Authored-By: Alarig <alarig@swordarmor.fr>
Minor changes by committer.
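The mismatch between path length and the number of ASNs can be shown with a short Python sketch. It is not the BIRD implementation: paths are modelled as lists of (segment type, ASN list) pairs, and path_len() and flatten_for_aspa() are hypothetical helpers. The segment type codes are those of RFC 4271.

```python
AS_SET, AS_SEQUENCE = 1, 2          # segment type codes from RFC 4271
ASPA_INVALID = "invalid"

def path_len(path):
    """BGP path length: an AS_SET counts as one hop however many ASNs it holds."""
    return sum(1 if kind == AS_SET else len(asns) for kind, asns in path)

def flatten_for_aspa(path):
    """Return the ASN list for verification, or ASPA_INVALID if an AS_SET is present."""
    if any(kind == AS_SET for kind, _ in path):
        return ASPA_INVALID
    asns = [asn for _, asns in path for asn in asns]
    assert len(asns) == path_len(path)   # without AS_SETs the estimate is exact
    return asns

plain = [(AS_SEQUENCE, [65001, 65002, 65003])]
with_set = [(AS_SEQUENCE, [65001, 65002]), (AS_SET, [65010, 65011, 65012])]
```

For with_set, path_len() gives 3 while the path holds 5 ASNs, which is the gap that would overrun a buffer sized by the length estimate.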
Maria Matejka [Tue, 16 Sep 2025 10:04:21 +0000 (12:04 +0200)]
ROA Aggregator: Fix crash on multiwithdraw
Theoretically, multiple withdraw from the best feed should never happen
but apparently there is an opportunity. We are unable to reproduce that
but it's obvious that with the old code, if the last ROA to remove is at
the end of the list, undefined memory is checked. If it accidentally
matches (which seems to be pretty rare), BIRD may call memcpy() with
a negative length and subsequently crash on segfault.
Ondrej Zajicek [Thu, 16 Oct 2025 15:03:38 +0000 (17:03 +0200)]
Conf: Add warning for symbol overriding keyword
In BIRD configuration, user-defined symbols can override keywords, which
could lead to unexpected behavior when one tries to use such a keyword
in its original meaning.
Ondrej Zajicek [Fri, 19 Sep 2025 16:46:41 +0000 (18:46 +0200)]
L3VPN: Add support for import/export target none and import target all
The patch adds support for 'import/export target none' (or '[]' to
specify an empty set). It can be used when we do not want to import/export
any route from/to the VRF, or if we prefer to set the RT in filters
(e.g., adding a different RT for different IP prefixes).
The patch also adds support for 'import target all', i.e. all VPN routes
are imported into the VRF IP table regardless of the RTs. This is useful when
a more complex policy is implemented in filters.
Based on patches from Sébastien Parisot <sparisot@iliad-free.fr>, thanks!
Maria Matejka [Tue, 26 Aug 2025 14:14:38 +0000 (16:14 +0200)]
Table: Optimal and Any Export refactoring
The original channel_notify_basic() function was so complicated that it
made more sense to split this one into two different functions, one for
RA_ANY, another for RA_OPTIMAL.
I also changed the export_filter() to not touch the rejected_map, and
just return true/false while modifying the route in place which was
already happening anyway.
In addition to this, I added more comments and I hope that now the code
is more approachable and understandable.
Last but not least, I changed several export flag consistency checks
to just error messages if these were harmless enough.
Maria Matejka [Tue, 29 Jul 2025 12:15:08 +0000 (14:15 +0200)]
BGP: Do not restart when next hop keep/self is changed
The change in dade7147eb6b62b2d58d478a370baef513d96975 forces BGP to restart
even when only next hop self or next hop keep changes. These two can be applied
just by reloading the export, while an explicit next hop address cannot.
Maria Matejka [Thu, 17 Jul 2025 22:19:14 +0000 (00:19 +0200)]
CI: Autotests for BGP setting changes
There are actually 144 test variants. Choosing 12 of them, such that:
- m2 may request no RR, basic RR or enhanced RR
- m2 may have any combination of import and export table
- import and export table settings for m1 are pseudorandomized
- the same for multiple variants how to get basic RR negotiated
This should cover all the code with not too much resource consumption.
Maria Matejka [Wed, 25 Jun 2025 11:00:11 +0000 (13:00 +0200)]
BGP: restart on outgoing next hop setting change
When next hop self / keep / address changed, BGP only reloaded
the exports but it didn't apply the changes. To fix this problem
before actually implementing a proper change detection algorithm,
we restart the protocol if this setting changes.
Maria Matejka [Sun, 29 Jun 2025 18:14:31 +0000 (20:14 +0200)]
CI: adding tests cf-bgp-unnumbered and cf-bgp-error-states
The unnumbered test checks the onlink neighbor scenarios,
and the cf-bgp-error-states checks a regression for BIRD 3
where BGP crashed when a listening socket failed to bind.