Maria Matejka [Wed, 31 Aug 2022 14:04:36 +0000 (16:04 +0200)]
Flowspec revalidate notification converted to an export hook
Instead of synchronous notifications, we use the asynchronous export
framework to notify flowspec src route updates. This allows us to
invoke flowspec revalidation without locking collisions.
Maria Matejka [Wed, 31 Aug 2022 12:01:59 +0000 (14:01 +0200)]
Hostcache update notification converted to an export hook
Instead of synchronous notifications, we use the asynchronous export
framework to notify also hostcache updates. This allows us to do the
hostcache update and the subsequent next hop update notification without
locking collisions.
Maria Matejka [Tue, 30 Aug 2022 16:05:00 +0000 (18:05 +0200)]
Tables: Requesting prune only after export cleanup
We can't free the network structures before the export has been cleaned
up, therefore it makes more sense to request prune only after export
cleanup. This change also reduces prune calls on table shutdown.
Had to fix route source locking inside BGP export table as we need to
keep the route sources properly allocated until even last BGP pending
update is sent out, therefore the export table printout is accurate.
Maria Matejka [Tue, 2 Aug 2022 15:58:14 +0000 (17:58 +0200)]
Merge branch 'ballygarvan' into HEAD
Replacing the old 3.0-alpha0 cork mechanism with another one inside the
routing table. This version should be simpler and also quite clear what
it does, why and when.
Maria Matejka [Thu, 28 Jul 2022 11:50:59 +0000 (13:50 +0200)]
Route table cork: Indicate whether the export queues are congested.
These routines detect the export congestion (as defined by configurable
thresholds) and propagate the state to readers. There are no readers for
now, they will be added in following commits.
Maria Matejka [Mon, 1 Aug 2022 13:17:41 +0000 (15:17 +0200)]
Fixed main birdloop init in unit tests
Some unit tests weren't initializing the birdloop, trying to write the
birdloop ping into stdin. Fixed this and also forced stdin close on
startup of every test just to be sure that CI and local build behave the
same in this. (CI was failing on this while local build not.)
Ondrej Zajicek [Tue, 26 Jul 2022 16:45:20 +0000 (18:45 +0200)]
Netlink: Restrict route replace for IPv6
Seems like the previous patch was too optimistic, as route replace is
still broken even in Linux 4.19 LTS (but fixed in Linux 5.10 LTS) for:
ip route add 2001:db8::/32 via fe80::1 dev eth0
ip route replace 2001:db8::/32 dev eth0
It ends with two routes instead of just the second.
The issue is limited to direct and special type (e.g. unreachable)
routes, the patch restricts route replace for cases when the new route
is a regular route (with a next hop address).
Ondrej Zajicek [Sun, 24 Jul 2022 22:11:40 +0000 (00:11 +0200)]
Netlink: Simplify handling of IPv6 ECMP routes
When IPv6 ECMP support first appeared in Linux kernel, it used different
API than IPv4 ECMP. Individual next hops were updated and announced
separately, instead of using RTA_MULTIPATH as in IPv4. This has several
drawbacks and requires complex code to merge received notifications to
one multipath route.
When Linux came with IPv6 RTA_MULTIPATH support, the initial versions
were somewhat buggy, so we kept using the old API for updates (splitting
multipath routes to sequences of route updates), while accepting both
old-style routes and RTA_MULTIPATH routes in scans / notifications.
As IPv6 RTA_MULTIPATH support is here for a long time, this patch fully
switches Netlink to the IPv6 RTA_MULTIPATH API and removes old complex
code for handling individual next hop announces.
The required Linux version is at least 4.11 for reliable operation.
Ondrej Zajicek [Sun, 24 Jul 2022 00:15:20 +0000 (02:15 +0200)]
KRT: Scan routing tables separetely on linux to avoid congestion
Remove compile-time sysdep option CONFIG_ALL_TABLES_AT_ONCE, replace it
with runtime ability to run either separate table scans or shared scan.
On Linux, use separate table scans by default when the netlink socket
option NETLINK_GET_STRICT_CHK is available, but retreat to shared scan
when it fails.
Running separate table scans has advantages where some routing tables are
managed independently, e.g. when multiple routing daemons are running on
the same machine, as kernel routing table modification performance is
significantly reduced when the table is modified while it is being
scanned.
Thanks Daniel Gröber for the original patch and Toke Høiland-Jørgensen
for suggestions.
Maria Matejka [Fri, 22 Jul 2022 13:37:21 +0000 (15:37 +0200)]
Revert "Export table: Delay freeing of old stored route."
This reverts commit cee0cd148c9b71bf47d007c850193b5fbf9486c1.
This change is not needed in version 2 and the surrounding code has
disappeared mostly in version 3.
Maria Matejka [Fri, 24 Jun 2022 17:53:34 +0000 (19:53 +0200)]
Event lists rewritten to a single linked list
In multithreaded environment, we need to pass messages between workers.
This is done by queuing events to their respective queues. The
double-linked list is not really useful for that as it needs locking
everywhere.
This commit rewrites the event subsystem to use a single-linked list
where events are enqueued by a single atomic instruction and the queue
is processed after atomically moving the whole queue aside.
Maria Matejka [Fri, 15 Jul 2022 12:57:02 +0000 (14:57 +0200)]
Merge commit 'c70b3198' into thread-next [lots of conflicts]
There were more conflicts that I'd like to see, most notably in route
export. If a bisect identifies this commit with something related, it
may be simply true that this commit introduces that bug. Let's hope it
doesn't happen.
Maria Matejka [Thu, 14 Jul 2022 09:09:23 +0000 (11:09 +0200)]
Fixed invalid routes handling
The invalid routes were filtered out before they could ever get
exported, yet some of the routines need them available, e.g. for
display or import reload.
Now the invalid routes are properly exported and dropped in channel
export routines instead.
Maria Matejka [Wed, 13 Jul 2022 09:19:00 +0000 (11:19 +0200)]
Fixed bug in repeated show route command
Introduced by 13ef5e53dd4a98c80261139b4c9ce4b1074cac40, the CLI was not
properly cleaned up when the command finished, causing BIRD to not parse
any other command after "show route".
Maria Matejka [Tue, 12 Jul 2022 10:40:18 +0000 (12:40 +0200)]
Removing the rte_modify API
For BGP LLGR purposes, there was an API allowing a protocol to directly
modify their stale routes in table before flushing them. This API was
called by the table prune routine which violates the future locking
requirements.
Instead of this, BGP now requests a special route export and reimports
these routes into the table, allowing for asynchronous execution without
locking the table on export.
Maria Matejka [Tue, 12 Jul 2022 08:36:10 +0000 (10:36 +0200)]
Route refresh in tables uses a stale counter.
Until now, we were marking routes as REF_STALE and REF_DISCARD to
cleanup old routes after route refresh. This needed a synchronous route
table walk at both beginning and the end of route refresh routine,
marking the routes by the flags.
We avoid these walks by using a stale counter. Every route contains:
u8 stale_cycle;
Every import hook contains:
u8 stale_set;
u8 stale_valid;
u8 stale_pruned;
u8 stale_pruning;
In base_state, stale_set == stale_valid == stale_pruned == stale_pruning
and all routes' stale_cycle also have the same value.
The route refresh looks like follows:
+ ----------- + --------- + ----------- + ------------- + ------------ +
| | stale_set | stale_valid | stale_pruning | stale_pruned |
| Base | x | x | x | x |
| Begin | x+1 | x | x | x |
... now routes are being inserted with stale_cycle == (x+1)
| End | x+1 | x+1 | x | x |
... now table pruning routine is scheduled
| Prune begin | x+1 | x+1 | x+1 | x |
... now routes with stale_cycle not between stale_set and stale_valid
are deleted
| Prune end | x+1 | x+1 | x+1 | x+1 |
+ ----------- + --------- + ----------- + ------------- + ------------ +
The pruning routine is asynchronous and may have high latency in
high-load environments. Therefore, multiple route refresh requests may
happen before the pruning routine starts, leading to this situation:
| Prune begin | x+k | x+k | x -> x+k | x |
... or even
| Prune begin | x+k+1 | x+k | x -> x+k | x |
... if the prune event starts while another route refresh is running.
In such a case, the pruning routine still deletes routes not fitting
between stale_set and and stale_valid, effectively pruning the remnants
of all unpruned route refreshes from before:
| Prune end | x+k | x+k | x+k | x+k |
In extremely rare cases, there may happen too many route refreshes
before any route prune routine finishes. If the difference between
stale_valid and stale_pruned becomes more than 128 when requesting for
another route refresh, the routine walks the table synchronously and
resets all the stale values to a base state, while logging a warning.
Implement BGP roles as described in RFC 9234. It is a mechanism for
route leak prevention and automatic route filtering based on common BGP
topology relationships. It defines role capability (controlled by 'local
role' option) and OTC route attribute, which is used for automatic route
filtering and leak detection.
Maria Matejka [Mon, 11 Jul 2022 15:04:52 +0000 (17:04 +0200)]
Dropped the internal kernel protocol table for learnt routes.
The learnt routes are now pushed all into the connected table, not only
the best one. This shouldn't do any damage in well managed setups, yet
it should be noted that it is a change of behavior.
If anybody misses a feature which they implemented by misusing this
internal learn table, let us know, we'll consider implementing it in a
better way.
Maria Matejka [Mon, 20 Jun 2022 17:10:49 +0000 (19:10 +0200)]
Export tables merged with BGP prefix hash
Until now, if export table was enabled, Nest was storing exactly the
route before rt_notify() was called on it. This was quite sloppy and
spooky and it also wasn't reflecting the changes BGP does before
sending. And as BGP is storing the routes to be sent anyway, we are
simply keeping the already-sent routes in there to better rule out
unneeded reexports.
Some of the route attributes (IGP metric, preference) make no sense in
BGP, therefore these will be probably replaced by something sensible.
Also the nexthop shown in the short output is the BGP nexthop.
Maria Matejka [Wed, 29 Jun 2022 11:22:40 +0000 (13:22 +0200)]
Hash: iterable now per partes by an iterator
It's now possible to pause iteration through hash. This requires
struct hash_iterator to be allocated somewhere handy.
The iteration itself is surrounded by HASH_WALK_ITER and
HASH_WALK_ITER_END. Call HASH_WALK_ITER_PUT to ask for pausing; it may
still do some more iterations until it comes to a suitable pausing
point. The iterator must be initalized to an empty structure. No cleanup
is needed if iteration is abandoned inbetween.
Filter: Improve handling of stack frames in filter bytecode
When f_line is done, we have to pop the stack frame. The old code just
removed nominal number of args/vars. Change it to use stored ventry value
modified by number of returned values. This allows to allocate variables
on a stack frame during execution of f_lines instead of just at start.
But we need to know the number of returned values for a f_line. It is 1
for term, 0 for cmd. Store that to f_line during linearization.
When a new variable used the same name as an existing symbol in an outer
scope, then offset number was defined based on a scope of the existing
symbol ($3) instead of a scope of the new symbol (sym_). That can lead
to two variables sharing the same memory slot.
Direct recursion almost worked, just crashed on function signature check.
Split function parsing such that function signature is saved before
function body is processed. Recursive calls are marked so they can be
avoided during f_same() and similar code walking.
Also, include tower of hanoi solver as a test case.
Filter: Ensure that all expressions declared return type
All instructions with a return value (i.e. expressions, ones with
non-zero outval, third argument in INST()) should declare their return
type. Check that automatically by M4 macros.
Set outval of FI_RETURN to 0. The instruction adds one value to stack,
but syntactically it is a statement, not an expression.
Add fake return type declaration to FI_CALL, otherwise the automatic
check would fail builds.
Pass instructions of function call arguments as vararg arguments to
FI_CALL instruction constructor and move necessary magic from parser
code to interpreter / instruction code.
Maria Matejka [Mon, 27 Jun 2022 17:04:22 +0000 (19:04 +0200)]
Preexport callback now takes the channel instead of protocol as argument
Passing protocol to preexport was in fact a historical relic from the
old times when channels weren't a thing. Refactoring that to match
current extensibility needs.