Ondrej Zajicek [Sat, 6 Jun 2026 16:04:03 +0000 (18:04 +0200)]
OSPF: Fix OOB read in Router-LSA validation
The missing check in lsa_validate_rt2() may lead to OOB read in OSPFv2
Router-LSA validation for malformed Router-LSAs. The OSPFv3 case is in
fact safe, but the patch improves these checks in a uniform way.
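As an illustration of this class of bug, here is a minimal sketch of a bounds-checked walk over an OSPFv2 Router-LSA body (RFC 2328 layout: 4 bytes of flags and link count, then 12-byte links, each followed by 4-byte TOS entries). The function name rt2_body_valid() and its interface are hypothetical; this is not the code of lsa_validate_rt2().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator, not BIRD code.
 * body points just past the 20-byte LSA header, len is the body length. */
static int rt2_body_valid(const uint8_t *body, size_t len)
{
  assert(body != NULL);

  if (len < 4)
    return 0;                       /* flags + link count must fit */

  unsigned links = ((unsigned) body[2] << 8) | body[3];
  size_t pos = 4;

  for (unsigned i = 0; i < links; i++)
  {
    /* Check that the whole 12-byte link fits BEFORE reading any field */
    if (len - pos < 12)
      return 0;

    unsigned tos = body[pos + 9];   /* number of TOS entries of this link */
    pos += 12;

    /* Each TOS entry is 4 bytes; division avoids overflow in the check */
    if ((len - pos) / 4 < tos)
      return 0;

    pos += 4 * (size_t) tos;
  }

  return pos == len;                /* no trailing garbage */
}
```

The point of the sketch is the order of operations: the remaining length is compared against the size of the next record before any field of that record is read.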
Ondrej Zajicek [Fri, 5 Jun 2026 13:48:46 +0000 (15:48 +0200)]
Fix several issues in Flowspec handling
The patch fixes several issues in Flowspec handling, namely:
- Out-of-bounds read during flowspec validation
- Rejection of NLRI for anomalies that MUST be ignored
- Incorrect check of operand lengths
- Broken label component construction
- Broken formatting of IPv6 prefixes with specific offsets
The first issue was recently reported by multiple people.
The second issue was found by Bronson Yen of Calif.io in collaboration
with Claude and Anthropic Research.
Gathering interface statistics can be a relatively expensive operation
on certain systems as it requires iterating over all the CPUs.
This patch instructs the kernel to omit device statistics,
when scanning devices, as the statistics aren't used by BIRD.
Kernel-side support for this was added in Linux v4.4 in commit d5566fd72ec1 ("rtnetlink: RTEXT_FILTER_SKIP_STATS support to avoid
dumping inet/inet6 stats"), this saves 368 bytes of statistics.
In the Linux v6.19 commit 105bae321862 ("rtnetlink: honor
RTEXT_FILTER_SKIP_STATS in IFLA_STATS"), it is expanded to skip more
statistics bringing the savings to 800 bytes per device.
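A minimal sketch of a link dump request carrying the filter, assuming the request is laid out as one flat struct (the struct and function names are hypothetical, and BIRD may build the message differently or set additional filter bits):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

/* Hypothetical flat layout of an RTM_GETLINK dump request with one attribute */
struct link_dump_req {
  struct nlmsghdr nh;
  struct ifinfomsg ifi;
  struct rtattr ext;       /* header of the IFLA_EXT_MASK attribute */
  uint32_t ext_mask;       /* payload of the IFLA_EXT_MASK attribute */
};

static void build_link_dump_req(struct link_dump_req *r, uint32_t seq)
{
  assert(r != NULL);
  memset(r, 0, sizeof(*r));

  r->nh.nlmsg_len = sizeof(*r);
  r->nh.nlmsg_type = RTM_GETLINK;
  r->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
  r->nh.nlmsg_seq = seq;
  r->ifi.ifi_family = AF_UNSPEC;

  /* Ask the kernel to leave device statistics out of the dump */
  r->ext.rta_type = IFLA_EXT_MASK;
  r->ext.rta_len = RTA_LENGTH(sizeof(r->ext_mask));
  r->ext_mask = RTEXT_FILTER_SKIP_STATS;
}
```

Kernels that predate the flag should simply return the statistics as before, so the request stays valid there.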
Edited-By: Ondrej Zajicek <santiago@crfreenet.org>
Ondrej Zajicek [Wed, 27 May 2026 04:07:04 +0000 (06:07 +0200)]
Netlink: Enable and decode extended acknowledgements
This patch adds support for extended acknowledgements. While
there are more attributes, this patch only adds support for the
extended error message, and uses it to augment the return code.
We don't check the return value of setsockopt(), as any message with
extended acknowledgements will have NLM_F_ACK_TLVS set in nlmsg_flags.
NETLINK_EXT_ACK / NLM_F_ACK_TLVS / NLMSGERR_ATTR_MSG were all added in
Linux v4.12 commit 2d4bc93368f5 ("netlink: extended ACK reporting"), so
AFAICT there's no need to add them to netlink-sys.h.
Based on the patch from Asbjørn Sloth Tønnesen <ast@2e8.dk>, thanks!
Set NETLINK_CAP_ACK, as we don't need a full copy of the request
payload, so it's "rather wasteful"[1] to not set NETLINK_CAP_ACK.
We don't check the return value of setsockopt(), as any capped
message will have NLM_F_CAPPED set in nlmsg_flags.
NETLINK_CAP_ACK was introduced in Linux v4.3 commit 0a6a3a23ea6e
("netlink: add NETLINK_CAP_ACK socket option"), and so AFAICT it
doesn't need to be added to netlink-sys.h.
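The two socket options described above can be sketched as follows. The helper names are hypothetical and the code is not taken from the patch; it only shows the best-effort setsockopt() calls and the per-message flag checks that make checking their return values unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef SOL_NETLINK
#define SOL_NETLINK 270    /* fallback for older libc headers */
#endif

/* Hypothetical helper: enable both options, ignoring failures on old kernels */
static void nl_enable_ack_options(int fd)
{
  int one = 1;
  (void) setsockopt(fd, SOL_NETLINK, NETLINK_EXT_ACK, &one, sizeof(one));
  (void) setsockopt(fd, SOL_NETLINK, NETLINK_CAP_ACK, &one, sizeof(one));
}

/* True if an error/ACK message carries extended acknowledgement TLVs */
static int nl_ack_has_tlvs(const struct nlmsghdr *h)
{
  assert(h != NULL);
  return h->nlmsg_type == NLMSG_ERROR && (h->nlmsg_flags & NLM_F_ACK_TLVS) != 0;
}

/* True if the kernel omitted the copy of the request payload */
static int nl_ack_is_capped(const struct nlmsghdr *h)
{
  assert(h != NULL);
  return h->nlmsg_type == NLMSG_ERROR && (h->nlmsg_flags & NLM_F_CAPPED) != 0;
}
```

Since each reply states through its flags whether it is capped or carries TLVs, the receiver can decide per message and never needs to remember whether the options were accepted.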
Enable strict checking of netlink messages on the nl_req connection,
so it is enabled on both connections.
Also rename the function to remove "_dump" suffix, as it's a generic
option.
Strict checking was originally called NETLINK_DUMP_STRICT_CHK,
but was renamed to NETLINK_GET_STRICT_CHK, as it should apply to
all calls, not only dumps (Linux commit d3e8869ec826).
When set on nl_req, we don't need to check the return code, as it will
fail on nl_scan as well, and so one log message should be enough and
unlike nl_scan, we don't need to alter the mode of operation.
Ondrej Zajicek [Mon, 6 Nov 2023 02:38:40 +0000 (03:38 +0100)]
EVPN: BGP/MPLS Ethernet VPNs using VXLAN tunnels - preliminary support
The EVPN protocol implements RFC 7432 BGP Ethernet VPNs using VXLAN overlays.
It works similarly to L3VPN. It connects ethernet table (one per VRF) with
(global) EVPN table. Routes passed from EVPN table to ethernet table are
stripped of RD and filtered by import targets, routes passed in the other
direction are extended with RD, MPLS/VNI labels, and export targets in
extended communities.
When VLANs are configured in EVPN protocol, vlan requests are sent to
Bridge protocol, which configures these VLANs on VXLAN interface in kernel.
Minor contributions by Igor Putovny
Thanks to Pim van Pelt and Tomáš Matuš for comments and patches!
Ondrej Zajicek [Mon, 30 Oct 2023 00:50:14 +0000 (01:50 +0100)]
Bridge: Linux bridge interface - preliminary support
The Bridge protocol synchronizes BIRD eth table with Linux kernel bridge
forwarding table. It works analogously to the Kernel protocol, but for
ethernet FDB entries instead of IP routes. The instance of Bridge
protocol is associated with the specific Linux bridge device.
The Bridge protocol handles not only bridge forwarding entries, but also
VXLAN forwarding entries, as Linux kernel VXLAN tunnel device manages its
own forwarding table, with IPs of remote endpoints.
Note that we use ethernet route next_hop to store VXLAN forwarding
address, instead of dedicated route attribute. That is preliminary and
will change in the future.
The Bridge protocol uses netlink to scan VLANs on associated interfaces,
listens to vlan requests from other protocols (EVPN), keeps state of
both desired and actual state of VLANs, and configures VLANs on managed
interfaces according to received requests.
Ondrej Zajicek [Tue, 16 Sep 2025 13:34:44 +0000 (15:34 +0200)]
Lib: Publish/subscribe queues
Implement a publish/subscribe messaging system with dynamic topic
management and resource tracking. The system allows multiple publishers
to send messages to named topics, which are then distributed to all
subscribers of those topics. Publishers and subscribers are managed as
resources and automatically cleaned up when destroyed.
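To make the idea concrete, here is a minimal publish/subscribe sketch with named topics. All names (struct pubsub, ps_subscribe(), ps_publish(), ...) are hypothetical and do not reflect the API added by this commit; the fixed-size table stands in for the dynamic topic management and resource tracking, which are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PS_MAX_SUBS 16

typedef void (*ps_callback)(const char *topic, const void *msg, void *ctx);

struct ps_subscriber {
  const char *topic;
  ps_callback cb;
  void *ctx;
  int active;
};

struct pubsub {
  struct ps_subscriber subs[PS_MAX_SUBS];
};

/* Register a callback for a topic; returns a slot id, or -1 if the table is full */
static int ps_subscribe(struct pubsub *ps, const char *topic, ps_callback cb, void *ctx)
{
  assert(ps != NULL && topic != NULL && cb != NULL);
  for (int i = 0; i < PS_MAX_SUBS; i++)
    if (!ps->subs[i].active)
    {
      ps->subs[i] = (struct ps_subscriber) { topic, cb, ctx, 1 };
      return i;
    }
  return -1;
}

static void ps_unsubscribe(struct pubsub *ps, int id)
{
  if (id >= 0 && id < PS_MAX_SUBS)
    ps->subs[id].active = 0;
}

/* Deliver msg to every subscriber of the topic; returns the number of deliveries */
static int ps_publish(struct pubsub *ps, const char *topic, const void *msg)
{
  int delivered = 0;
  for (int i = 0; i < PS_MAX_SUBS; i++)
    if (ps->subs[i].active && strcmp(ps->subs[i].topic, topic) == 0)
    {
      ps->subs[i].cb(topic, msg, ps->subs[i].ctx);
      delivered++;
    }
  return delivered;
}

/* Example callback: counts received messages in the int pointed to by ctx */
static void ps_count_cb(const char *topic, const void *msg, void *ctx)
{
  (void) topic;
  (void) msg;
  (*(int *) ctx)++;
}
```

A publisher does not need to know who listens: ps_publish() looks up subscribers by topic name at delivery time.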
Ondrej Zajicek [Thu, 25 Jan 2024 17:39:40 +0000 (18:39 +0100)]
Filter: Ethernet and EVPN support
Add mac filter type (for mac_addr) and various accessors for ethernet
and EVPN net types (mac, vlan_id, evpn_type, evpn_tag, evpn_esi mac, rd,
ip, router_ip).
Maria Matejka [Tue, 12 May 2026 12:54:06 +0000 (14:54 +0200)]
Fix brandom() to return u32
I missed that random() returns only positive results (31 bits), while
jrand48() returns 32 bits, half of them negative. Rectify that to
always return unsigned 32 bits.
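A minimal sketch of the conversion, assuming a caller-supplied jrand48() state buffer. The helper names are hypothetical and this is not the actual brandom() implementation.

```c
#define _XOPEN_SOURCE 700   /* expose jrand48() on strict POSIX toolchains */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* jrand48() yields a signed value in [-2^31, 2^31); converting it to
 * uint32_t wraps modulo 2^32, so all 32 bits are kept. */
static uint32_t u32_from_jrand(long v)
{
  return (uint32_t) v;
}

/* Hypothetical helper: one unsigned 32-bit random number from the given state */
static uint32_t random_u32(unsigned short state[3])
{
  assert(state != NULL);
  return u32_from_jrand(jrand48(state));
}
```

By contrast, random() only covers the range 0 to 2^31 - 1, so a single call cannot fill the top bit.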
Reported-By: Ondrej Zajicek <santiago@crfreenet.org>
Issue: #312
Igor Putovny [Tue, 3 Mar 2026 19:49:21 +0000 (20:49 +0100)]
Replace random() with jrand48()
In BIRD 3, every random() call locks. Converting these to jrand48()
with thread-local context buffers. The change starts in BIRD 2
though, to keep the code aligned better.
Igor Putovny [Wed, 22 Apr 2026 14:49:05 +0000 (16:49 +0200)]
Filter: Add missing operations for clists of ints
Although a clist can hold ints as its values, some operations
(specifically matching and ones with set arguments) were not properly
implemented.
Maria Matejka [Thu, 19 Mar 2026 11:01:15 +0000 (12:01 +0100)]
Log: Set a reasonable lower bound for the log file size limit
The log rotation needs a minimal file size. The 16 kB limit imposed
by this commit effectively allows about 150 lines to fit into one file,
and by that all the accompanying log messages (e.g. with debug latency)
fit into there and don't cause another rotation.
Maria Matejka [Mon, 23 Mar 2026 22:08:12 +0000 (23:08 +0100)]
CI: Templated gitlab dockerfiles
This allows adding new distributions and mass-modifying build environments
at a single place if any such need occurs. Also there is less risk that
some file is omitted if modifying multiple places in the same way.
There is also a check re-generating the templates in the CI and failing
immediately if they are not up-to-date.
Maria Matejka [Sun, 22 Mar 2026 01:54:36 +0000 (02:54 +0100)]
CI: Refactoring of pipeline job rules
Implemented:
- manual and scheduled pipeline run support
- inputs to explicitly choose which job categories to run
- docker rebuild only manually
- no packaging for development branches
Maria Matejka [Sun, 15 Mar 2026 17:39:28 +0000 (18:39 +0100)]
ASPA: Document our aspa_check() implementation.
There are certain design choices behind the implementation,
and as the ASPA algorithm is quite complex even in the specification,
we should add some explanation here.
Our approach is not directly following the specification, as checking
the authorized() function specified in the draft is performance-heavy.
Also, there are some more future plans with this, and they deserve
documenting as well.
Maria Matejka [Sat, 14 Mar 2026 20:57:46 +0000 (21:57 +0100)]
ASPA: Fix downstream check for two-point apex
The ASPA algorithm is quite complex if one wants to execute it fast.
Most notably, the performance-critical part is looking up the ASPA
records, and we are trying to reduce that to minimum.
Yet, in that effort, we missed the fact that in the downstream
algorithm, the down-ramp and up-ramp may touch, i.e. their top ends
have a lateral peering.
The original idea was to find the point where the down-ramp is
impossible to be extended, and from there on, the algorithm is basically
just the upstream algorithm. But it isn't, most notably with the lateral
peering scenario it is much more complex than this.
This issue was discovered by several people, and got a fix submitted by
Evann DREUMONT. That fix was correct but replaced the algorithm too
deeply. We don't want to do such large changes (including semantics)
inside the stable versions, and we have some more plans with all of this
considering performance, as soon as more ASPA records emerge.
This patch therefore simply removes the force_upstream shortcut from
where the down ramp is terminated, fixes the downstream code so that
it works without that shortcut, and explicitly allows the two-apex
downstream scenario.
Ondrej Zajicek [Tue, 24 Feb 2026 22:15:06 +0000 (23:15 +0100)]
BGP: Automatic peering based on discovered neighbors
Extend existing dynamic BGP code to support spawning of active BGP
instances for discovered neighbors.
The existence of such dynamic BGP instances is controlled by exporting
information from a table containing neighbor entries through a newly
introduced neighbor channel. This means that the feature will only work
if there is another protocol responsible for discovering and announcing
neighbor entries (e.g. RAdv with 'router discovery' enabled).
Based on the patch from Matteo Perin <matteo.perin@canonical.com>, thanks!
Matteo Perin [Tue, 24 Feb 2026 22:15:06 +0000 (23:15 +0100)]
RAdv: Router discovery based on incoming Router Advertisements
Up until this point not much use has been made of incoming RAs, this commit
tries to amend that by announcing peer-based routes to a new peer channel.
This will allow other protocols to use the information discovered about
remote routers.
RA staleness has also been taken into consideration and the routes are
withdrawn whenever the advertised router lifetime expires.
The feature is meant to be enabled via a new configuration option in
the RAdv protocol called router discovery [yes/no].
Matteo Perin [Tue, 24 Feb 2026 22:15:06 +0000 (23:15 +0100)]
Nest: Add net_addr_nbr route type to track discovered neighbors
The definition and helper functions for a new route-like object to track
peer discovery data have been added. It only contains the (v4 or v6)
neighbor address and the ingress iface index, for now.
The main intent of this is, currently, to enable BGP unnumbered auto
peer discovery via RAdv incoming advertisements, but in the future the
same data structure could be used to allow discovery coming from
different protocols.
Ondrej Zajicek [Thu, 26 Feb 2026 14:29:40 +0000 (15:29 +0100)]
Nest: Improve reconfiguration of dynamic BGP
During reconfiguration, first add all existing dynamic protocols to the
new BGP config to ensure that there is already a full set of protocols
when reconfiguration hooks for individual protocols are called.
Also, bgp_spawn() should not be called when parent BGP is not yet
configured, otherwise we would end up with an old proto_config linked
from the new configuration.
Joshua Rogers [Tue, 10 Feb 2026 00:10:28 +0000 (01:10 +0100)]
Netlink: Fix handling of RTAX_CC_ALGO netlink attribute
The kernel-provided congestion control algorithm (RTAX_CC_ALGO) is stored in
an EAF_TYPE_STRING adata blob without the terminating NUL. When exporting
metrics back to netlink, the value is treated as a C string and passed to
nl_add_attr_str(), which uses strlen(str)+1. This may read past the allocated
adata and leak adjacent memory or crash.
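The general remedy for this kind of bug is to use the stored length instead of strlen(). A minimal sketch follows, with a hypothetical struct blob standing in for the length-prefixed adata; it is not the code of the actual fix.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a length-prefixed blob without a terminator */
struct blob {
  size_t length;
  const char *data;
};

/* Copy the blob into dst and NUL-terminate it.
 * Returns the number of bytes written including the terminator,
 * or 0 if dst is too small. Never reads past b->length bytes. */
static size_t blob_to_cstr(const struct blob *b, char *dst, size_t dst_size)
{
  assert(b != NULL && dst != NULL);

  if (b->length >= dst_size)
    return 0;

  memcpy(dst, b->data, b->length);
  dst[b->length] = '\0';
  return b->length + 1;
}
```

Calling strlen() on b->data directly would keep scanning until it happens to hit a zero byte, which is exactly the over-read described above.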
Maria Matejka [Thu, 25 Dec 2025 17:19:48 +0000 (18:19 +0100)]
CI: Packaging cleanup
With the removal of APKG, we don't need to split out "-legacy" DEB with
old python, and "-wa" RPM with an obscure sed applied to specfile which
never actually did anything since added in 2021.
Maria Matejka [Sat, 6 Dec 2025 21:43:54 +0000 (22:43 +0100)]
CI: No more APKG in packaging
We have had minor and subtle but repeating problems with the APKG
dependency chain and its overall usability. It has become apparent
that we actually don't need that kind of abstraction layer because
all our problems are actually solvable by just a bunch of short scripts.
With that, we are now using the (standard) dpkg-buildpackage and
rpmbuild tools directly from bash script.
From now on, with just several exceptions, all our distribution builds
should be fully reproducible.
Maria Matejka [Mon, 22 Dec 2025 21:55:22 +0000 (22:55 +0100)]
Source package and documentation builds are now reproducible
We now explicitly set the PDF build datetime to commit datetime, and we
also clean all the file metadata in the TGZ archives, so that the
generated archives are now bit-identical.
Maria Matejka [Mon, 22 Dec 2025 22:27:27 +0000 (23:27 +0100)]
Lib: Fix comments so that progdoc is deterministic
For some weird reason, the old Perl code behaves non-deterministically on
@foo() and there is no clear explanation why. The snails should not be there
anyway so removing them.
Ondrej Zajicek [Thu, 27 Nov 2025 16:59:44 +0000 (17:59 +0100)]
RAdv: Fix flags for deprecated prefixes
When a prefix is deprecated (valid_lifetime == 0), it should be
announced with the same flags as before. The old code announced it
without any flags, which leads to being ignored by recipients.
Note that a prefix could be deprecated for two reasons: it is removed
from the interface, or it is deconfigured in BIRD configuration.
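The intended behavior can be sketched as follows. The struct and function names are hypothetical and the model of deprecation (both lifetimes set to zero) is a simplifying assumption; only the L and A flag values come from the Prefix Information option defined in RFC 4861.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PIO_FLAG_ONLINK     0x80   /* L flag, RFC 4861 */
#define PIO_FLAG_AUTONOMOUS 0x40   /* A flag, RFC 4861 */

/* Hypothetical per-prefix configuration */
struct prefix_cfg {
  int onlink;
  int autonomous;
  uint32_t valid_lifetime;
  uint32_t preferred_lifetime;
};

/* Hypothetical subset of the fields of a Prefix Information option */
struct pio_fields {
  uint8_t flags;
  uint32_t valid_lifetime;
  uint32_t preferred_lifetime;
};

static struct pio_fields pio_build(const struct prefix_cfg *cf, int deprecated)
{
  assert(cf != NULL);
  struct pio_fields p = {0};

  /* Flags are filled in regardless of deprecation */
  if (cf->onlink)
    p.flags |= PIO_FLAG_ONLINK;
  if (cf->autonomous)
    p.flags |= PIO_FLAG_AUTONOMOUS;

  /* A deprecated prefix keeps its flags; only the lifetimes stay zero */
  if (!deprecated)
  {
    p.valid_lifetime = cf->valid_lifetime;
    p.preferred_lifetime = cf->preferred_lifetime;
  }

  return p;
}
```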
Maria Matejka [Sat, 22 Nov 2025 22:23:28 +0000 (23:23 +0100)]
Tools: Release initialization script
There is a bunch of things one needs to do in gitlab when releasing and
this script simply checks whether there is everything needed and fixes
what is missing.
David Petera [Fri, 14 Nov 2025 13:07:38 +0000 (14:07 +0100)]
CI: fix broken debian-11-i386 packaging
The error occurred when trying to install 'markupsafe' v3.0.3 python package (dependency of 'apkg') on debian-11-i386 arch.
Fixed by preinstalling older version of 'markupsafe' package before the installation of 'apkg'.
Since the behavior is added to 'pkg-deb-legacy' it also affects packaging of ubuntu-18.04-amd64.
Also debian-11-amd64 is moved to 'pkg-deb-legacy' together with affected debian-11-i386 just for code clarity.
Maria Matejka [Wed, 20 Aug 2025 13:35:32 +0000 (15:35 +0200)]
BGP: Fix dynamic instance reconfiguration
Every dynamic BGP was torn down on reconfig because the inherited
configuration is a little bit different than the parent one. Fixed this
by applying the same changes before the reconfiguration's memcmp().
Also fixed interface pattern reconfiguration which always restarted.
Added not only comparison but also actual reconfiguration of the pattern
itself so that one can update the pattern without restarting a running
BGP session.
Finally, extended documentation a bit to cover dynamic BGP scenarios a
little bit better. Yet, it probably deserves a separate section on
dynamic BGP.
Maria Matejka [Wed, 20 Aug 2025 13:34:31 +0000 (15:34 +0200)]
BGP: Fix TCP-AO single key rejection
When one key fails but others are working OK, do not shut down the BGP,
just disable that one key. We intended to do it this way but it somehow
slipped through.
Also added key cleanup in cases where the key addition fails for just
some sockets but not for all.
Maria Matejka [Wed, 16 Jul 2025 08:45:10 +0000 (10:45 +0200)]
BGP: Fixed link-local connections with wildcard local iface
When BGP was configured to accept link-local connections
in combination with interface range, it failed to recognize
that the incoming connection is indeed for that protocol.
Maria Matejka [Tue, 8 Jul 2025 18:28:03 +0000 (20:28 +0200)]
BGP: Fixed unnumbered connections with wildcard local IP.
When the BGP was configured onlink with a neighbor range, interface
range and wildcard local IP, the connections failed to establish
because the inferred local IP wasn't properly propagated.
Maria Matejka [Thu, 3 Jul 2025 15:16:14 +0000 (17:16 +0200)]
BGP: Interface range bind
For dynamic onlink connections, we need to find out which interface
the connection came in, and we need to pin that connection to
that interface. To achieve that, we create a listening socket
bound to each interface separately, and match the incoming connection
by the socket. Otherwise, the kernel would not give us any information
on where the connection came from.
Maria Matejka [Wed, 2 Jul 2025 14:40:34 +0000 (16:40 +0200)]
Socket: Warnings for link-local addresses without interfaces
In certain corner cases (e.g. mixed global and link-local IPv6 address)
the kernel fails to give us the interface ID. We log a warning for such
a case before a possibly misleading error message is spit out by BGP.
Also pass TCP interface information from parent to child on accept,
if the socket is bound to that interface.