Maria Matejka [Wed, 4 May 2022 10:41:54 +0000 (12:41 +0200)]
Removing the route scope attribute. Use custom attributes instead.
The route scope attribute was used for simple user route marking. As
there is a better tool for this (custom attributes), the old and limited
way can be dropped.
Maria Matejka [Wed, 4 May 2022 10:24:30 +0000 (12:24 +0200)]
Conf: Allowing keyword redefinition
Some tokens are both keywords and symbols. For now, we allow only
specific keywords to be redefined; in future, more of the keywords may
be added to this category.
The redefinable keywords must be specified in any .Y file as follows:
Maria Matejka [Sat, 19 Mar 2022 15:23:42 +0000 (16:23 +0100)]
Explicit definition structures of route attributes
Changes in internal API:
* Every route attribute must be defined as struct ea_class somewhere.
* Registration of route attributes known at startup must be done by
ea_register_init() from protocol build functions.
* Every attribute has now its symbol registered in a global symbol table
defined as SYM_ATTRIBUTE
* All attribute ID's are dynamically allocated.
* Attribute value custom formatting hook is defined in the ea_class.
* Attribute names are the same for display and filters, always prefixed
by protocol name.
Also added some unit testing code for filters with route attributes.
Maria Matejka [Thu, 14 Apr 2022 14:51:18 +0000 (16:51 +0200)]
Enforcing certain data structure explicit paddings.
Implicit paddings have undefined values in C. We want the eattr blocks
to be comparable by memcmp and eattrs settable directly by structrure
literals. This check ensures that all paddings in eattr and bval are
explicit and therefore zeroed in all literals.
Maria Matejka [Sat, 26 Mar 2022 10:56:02 +0000 (11:56 +0100)]
Unified attribute and filter types
This commit removes the EAF_TYPE_* namespace completely and also for
route attributes, filter-based types T_* are used. This simplifies
fetching and setting route attributes from filters.
Also, there is now union bval which serves as an universal value holder
instead of private unions held separately by eattr and filter code.
Maria Matejka [Fri, 25 Mar 2022 18:51:35 +0000 (19:51 +0100)]
Filter: Bitfield eattrs reading / writing moved to filter code
Before this change, fetch-update-write and bitmasking was hardcoded in
attribute access code cased by the attribute type. Several filter
instructions are used to do it instead.
As this is certainly going to be a little bit slower than before, the
switch block in attribute access code should be completely removed in
near future, helping with both performance and code cleanliness.
Maria Matejka [Wed, 13 Apr 2022 15:05:12 +0000 (17:05 +0200)]
RIP: fixed the EA_RIP_FROM attribute
The interface pointer was improperly converted to u32 and back. Fixing
this by explicitly allocating an adata structure for it. It's not so
memory efficient, we'll optimize this later.
Maria Matejka [Tue, 5 Apr 2022 13:09:56 +0000 (15:09 +0200)]
BGP uses lp_save / lp_restore instead of linpool flushing
It is too cryptic to flush tmp_linpool in these cases and we don't want
anybody in the future to break this code by adding an allocation
somewhere which should persist over that flush.
Alignment of slabs should be at least sizeof(ptr) to avoid unaligned
pointers in slab structures. Fixme: Use proper way to choose alignment
for internal allocators.
After switching to 16-way tries, trie format ignored unaligned / internal
prefixes and only reported the primary prefix of a trie node.
Fix trie format by showing internal prefixes based on the 'local' bitmask
of a node. Also do basic (intra-node) reconstruction of prefix patterns
by finding common subtrees in 'local' bitmask.
In future, we could improve that by doing inter-node reconstruction, so
prefixes entered as one pattern for a subtree (e.g. 192.168.0.0/18+)
would be reported as such, like with aligned prefixes.
Nest: Implement locking of prefix tries during walks
The prune loop may may rebuild the prefix trie and therefore invalidate
walk state for asynchronous walks (used in 'show route in' cmd). Fix it
by adding locking that keeps the old trie in memory until current walks
are done.
In future this could be improved by rebuilding trie walk states (by
lookup for last found prefix) after the prefix trie rebuild.
When rtable is pruned and network fib nodes are removed, we also need to
prune prefix trie. Unfortunately, rebuilding prefix trie takes long time
(got about 400 ms for 1M networks), so must not be atomic, we have to
rebuild a new trie while current one is still active. That may require
some considerable amount of temporary memory, so we do that only if
we expect significant trie size reduction.
Implement flowspec validation procedure as described in RFC 8955 sec. 6
and RFC 9117. The Validation procedure enforces that only routers in the
forwarding path for a network can originate flowspec rules for that
network.
The patch adds new mechanism for tracking inter-table dependencies, which
is necessary as the flowspec validation depends on IP routes, and flowspec
rules must be revalidated when best IP routes change.
The validation procedure is disabled by default and requires that
relevant IP table uses trie, as it uses interval queries for subnets.
Nest: Avoid unnecessary net_format() in 'show route' command
When output of 'show route' command was generated, the net_format() was
called for each network prematurely, even if the result was not needed.
Fix the code to call net_format() only when needed. This makes queries
that process many networks but show only few (e.g. 'show route where ..',
or 'show route count') much faster (like 5x - 10x faster).
Nest: Attach prefix trie to rtable for faster LPM and interval queries
Attach a prefix trie to IP/VPN/ROA tables. Use it for net_route() and
net_roa_check(). This leads to 3-5x speedups for IPv4 and 5-10x
speedup for IPv6 of these calls.
TODO:
- Rebuild the trie during rt_prune_table()
- Better way to avoid trie_add_prefix() in net_get() for existing tables
- Make it configurable (?)
One of previous commits added error logging of invalid routes. This
also inadvertently caused error logging of route loops, which should
be ignored silently. Fix that.
Most error messages in attribute processing are in rx/decode step and
these use L_REMOTE log class. But there are few that are in tx/export
step and these should use L_ERR log class.
Use tx-specific macro (REJECT()) in tx/export code and rename field
err_withdraw to err_reject in struct bgp_export_state to ensure that
appropriate error reporting macros are called in proper contexts.
Therefore, since Linux 5.3 these route cache entries are dumped together
with regular routes during periodic KRT scans, which in some cases may be
huge amount of useless data. This can be avoided by using strict checking
for netlink dumps:
The patch mitigates the risk of receiving unknown and potentially large
number of FNHE records that would block BIRD I/O in each sync. There is a
known issue caused by the GRE tunnels on Linux that seems to be creating
one FNHE record for each destination IP address that is routed through
the tunnel, even when the PMTU equals to GRE interface MTU.
Kernel uses cloned routes to keep route cache entries, but reports them
together with regular routes. They were skipped implicitly as they
do not have rtm_protocol filled. Add explicit check for cloned flag
and skip such routes explicitly.
Add option to socket interface for nonlocal binding, i.e. binding to an
IP address that is not present on interfaces. This behaviour is enabled
when SKF_FREEBIND socket flag is set. For Linux systems, it is
implemented by IP_FREEBIND socket flag.
Currently, BIRD ignores dead routes to consider them absent. But it also
ignores its own routes and thus it can not correctly manage such routes
in some cases. This patch makes an exception for routes with proto bird
when ignoring dead routes, so they can be properly updated or removed.
Thanks to Alexander Zubkov for the original patch.
Lexer expression for bytestring was too loose, accepting also
full-length IPv6 addresses. It should be restricted such that
colon is used between every byte or never.
Fix the regex and also add some test cases for it.