Vojtech Vilimek [Wed, 10 Jan 2024 11:21:46 +0000 (12:21 +0100)]
SNMP: dead end
The code contains a hard-to-debug bug where we periodically hit some kind of
error. The problem is caused by unexpected values after the AgentX PDU header,
exactly where the first OID should lie. We expect a value less than 15 but we
found values like 0x00003b47. A possible cause is the assignment to the socket's
receive buffer position in proto/snmp/subagent.c:snmp_rx at line 1569. The
erroneous behavior may have been caused by an off-by-one error. More
investigation is needed to get the full picture.
To resolve this issue we use a much simpler approach: we set a maximum packet
size and wait until the whole packet has arrived.
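The "wait until the whole packet has arrived" check can be sketched as follows.
This is only an illustration, not the code from subagent.c: the function name
and the SKETCH_MAX_PKT_SIZE limit are hypothetical, while the 20-byte header,
the payload length field at offset 16 and the NETWORK_BYTE_ORDER flag (0x10)
follow the AgentX header layout in RFC 2741.

```c
#include <stddef.h>
#include <stdint.h>

#define AGENTX_HEADER_SIZE   20    /* fixed AgentX header size (RFC 2741) */
#define AGENTX_FLAG_NBO      0x10  /* NETWORK_BYTE_ORDER bit in the flags octet */
#define SKETCH_MAX_PKT_SIZE  4096  /* hypothetical limit, not BIRD's value */

/*
 * Returns the total PDU length if buf holds a complete PDU,
 * 0 if more data must be received, -1 if the PDU exceeds the limit.
 */
static long
sketch_pdu_complete(const uint8_t *buf, size_t have)
{
  if (have < AGENTX_HEADER_SIZE)
    return 0;

  /* The payload length is the last 32-bit field of the header */
  const uint8_t *p = buf + 16;
  uint32_t payload;

  if (buf[2] & AGENTX_FLAG_NBO)
    payload = ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
              ((uint32_t) p[2] << 8) | (uint32_t) p[3];
  else
    payload = ((uint32_t) p[3] << 24) | ((uint32_t) p[2] << 16) |
              ((uint32_t) p[1] << 8) | (uint32_t) p[0];

  if (payload > SKETCH_MAX_PKT_SIZE - AGENTX_HEADER_SIZE)
    return -1;

  size_t total = AGENTX_HEADER_SIZE + (size_t) payload;
  return (have >= total) ? (long) total : 0;
}
```

A receive hook built on this would keep appending to the buffer while the
function returns 0 and only start parsing once it returns a positive length.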
Vojtech Vilimek [Wed, 15 Nov 2023 14:03:55 +0000 (15:03 +0100)]
SNMP: Major code improvements
SNMP state changes are now handled by the snmp_set_state() function.
The registration structure and related variables are renamed to remove
confusion.
Manipulation of BGP peers, a reference to BGP protocol structures, is improved
by new functions that encapsulate raw hash table macros (moved from snmp.h).
IPv4 addresses are now used by bgp_mib.c because BGP4-MIB does not support IPv6
addresses.
Configuration grammar rules are revised.
We now use DBG() and TRACE() macros to output information about SNMP state
changes and about received and transmitted packets.
Pieces of old code are removed, minor bugfixes are included. Large debug string
arrays are removed.
Vojtech Vilimek [Wed, 15 Nov 2023 11:37:10 +0000 (12:37 +0100)]
SNMP: Use compile-time selected byte order
In AgentX communication, the subagent chooses the byte order used. To reduce
code complexity, we decided to use a compile-time selected byte order in the
PDUs we send.
Supported options are:
- SNMP_NATIVE: use native CPU byte order
- SNMP_NETWORK_BYTE_ORDER: use big endian in PDUs, i.e. network byte order
It is recommended not to use both options at the same time (even though it is
possible on big-endian platforms).
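A minimal sketch of what compile-time selection could look like. Only the two
option names come from this commit; the helper names, the fallback default and
the way the options are tested are assumptions made for illustration. The
0x10 flag is the NETWORK_BYTE_ORDER bit from the AgentX header (RFC 2741).

```c
#include <stdint.h>
#include <string.h>

#define AGENTX_FLAG_NBO 0x10  /* NETWORK_BYTE_ORDER bit in the header flags */

/* Default chosen for this sketch only; the real build may select differently */
#if !defined(SNMP_NATIVE) && !defined(SNMP_NETWORK_BYTE_ORDER)
#define SNMP_NETWORK_BYTE_ORDER
#endif

/* Hypothetical helper: store a 32-bit value into an outgoing PDU */
static inline void
sketch_put_u32(uint8_t *buf, uint32_t v)
{
#ifdef SNMP_NATIVE
  memcpy(buf, &v, sizeof v);        /* native CPU byte order, no swapping */
#else
  buf[0] = (uint8_t) (v >> 24);     /* big endian, i.e. network byte order */
  buf[1] = (uint8_t) (v >> 16);
  buf[2] = (uint8_t) (v >> 8);
  buf[3] = (uint8_t) v;
#endif
}

/* Hypothetical helper: flag bit announcing the byte order to the master agent */
static inline uint8_t
sketch_order_flag(void)
{
#ifdef SNMP_NATIVE
  const uint16_t probe = 1;
  /* First byte is 1 on little-endian CPUs, where the flag must stay clear */
  return (*(const uint8_t *) &probe) ? 0 : AGENTX_FLAG_NBO;
#else
  return AGENTX_FLAG_NBO;
#endif
}
```

Because the choice is made by the preprocessor, the code that builds PDUs
contains no run-time branches on byte order.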
The trie_walk_init() function now also supports searching a whole trie
subnet and all successor subnets (in lexicographic order). This behavior
is selected by setting @net to the subnet and @include_successors to a
non-zero value.
The member struct channel_class *channel was renamed to *class in struct channel
and struct channel_config. New pointers were added to the structures above
in both directions. This can simplify and speed up the process of finding
a channel (configuration).
Maria Matejka [Fri, 24 Jun 2022 17:53:34 +0000 (19:53 +0200)]
Event lists rewritten to a single linked list
In a multithreaded environment, we need to pass messages between workers.
This is done by queuing events to their respective queues. The
double-linked list is not really useful for that as it needs locking
everywhere.
This commit rewrites the event subsystem to use a single-linked list
where events are enqueued by a single atomic instruction and the queue
is processed after atomically moving the whole queue aside.
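The idea can be sketched with C11 atomics. This is not the BIRD event code:
all names are hypothetical, the enqueue is shown as a compare-and-swap retry
loop, and reversing the detached list to run events in send order is a choice
made for this sketch.

```c
#include <stdatomic.h>
#include <stddef.h>

struct sketch_event {
  struct sketch_event *next;
  void (*hook)(void *);
  void *data;
};

struct sketch_event_list {
  _Atomic(struct sketch_event *) head;
};

/* Enqueue: push onto the list head, no lock needed */
static void
sketch_ev_send(struct sketch_event_list *l, struct sketch_event *e)
{
  struct sketch_event *old = atomic_load_explicit(&l->head, memory_order_relaxed);
  do
    e->next = old;   /* old is refreshed by a failed compare-exchange */
  while (!atomic_compare_exchange_weak_explicit(&l->head, &old, e,
           memory_order_release, memory_order_relaxed));
}

/* Process: atomically move the whole queue aside, then run it privately */
static int
sketch_ev_run_list(struct sketch_event_list *l)
{
  struct sketch_event *e = atomic_exchange_explicit(&l->head, NULL, memory_order_acquire);
  struct sketch_event *rev = NULL, *next;
  int n = 0;

  /* The detached list is newest-first; reverse it to get send order */
  for (; e; e = next)
  {
    next = e->next;
    e->next = rev;
    rev = e;
  }

  for (e = rev; e; e = next, n++)
  {
    next = e->next;
    e->next = NULL;
    e->hook(e->data);
  }

  return n;
}

/* Example hook: stamps the order in which the event ran into *data */
static int sketch_run_counter;

static void
sketch_stamp(void *data)
{
  *(int *) data = ++sketch_run_counter;
}
```

Senders only ever touch the list head, and the worker owns the detached list
exclusively, so neither side needs a lock.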
Maria Matejka [Fri, 15 Jul 2022 12:57:02 +0000 (14:57 +0200)]
Merge commit 'c70b3198' into thread-next [lots of conflicts]
There were more conflicts than I'd like to see, most notably in route
export. If a bisect identifies this commit with something related, it
may be simply true that this commit introduces that bug. Let's hope it
doesn't happen.
Maria Matejka [Thu, 14 Jul 2022 09:09:23 +0000 (11:09 +0200)]
Fixed invalid routes handling
The invalid routes were filtered out before they could ever get
exported, yet some of the routines need them available, e.g. for
display or import reload.
Now the invalid routes are properly exported and dropped in channel
export routines instead.
Maria Matejka [Wed, 13 Jul 2022 09:19:00 +0000 (11:19 +0200)]
Fixed bug in repeated show route command
Introduced by 13ef5e53dd4a98c80261139b4c9ce4b1074cac40, the CLI was not
properly cleaned up when the command finished, causing BIRD to not parse
any other command after "show route".
Maria Matejka [Tue, 12 Jul 2022 10:40:18 +0000 (12:40 +0200)]
Removing the rte_modify API
For BGP LLGR purposes, there was an API allowing a protocol to directly
modify its stale routes in the table before flushing them. This API was
called by the table prune routine, which violates the future locking
requirements.
Instead of this, BGP now requests a special route export and reimports
these routes into the table, allowing for asynchronous execution without
locking the table on export.
Maria Matejka [Tue, 12 Jul 2022 08:36:10 +0000 (10:36 +0200)]
Route refresh in tables uses a stale counter.
Until now, we were marking routes as REF_STALE and REF_DISCARD to
clean up old routes after route refresh. This needed a synchronous route
table walk at both the beginning and the end of the route refresh routine,
marking the routes by the flags.
We avoid these walks by using a stale counter. Every route contains:
u8 stale_cycle;
Every import hook contains:
u8 stale_set;
u8 stale_valid;
u8 stale_pruned;
u8 stale_pruning;
In base_state, stale_set == stale_valid == stale_pruned == stale_pruning
and all routes' stale_cycle also have the same value.
The route refresh proceeds as follows:
+ ----------- + --------- + ----------- + ------------- + ------------ +
| | stale_set | stale_valid | stale_pruning | stale_pruned |
| Base | x | x | x | x |
| Begin | x+1 | x | x | x |
... now routes are being inserted with stale_cycle == (x+1)
| End | x+1 | x+1 | x | x |
... now table pruning routine is scheduled
| Prune begin | x+1 | x+1 | x+1 | x |
... now routes with stale_cycle not between stale_set and stale_valid
are deleted
| Prune end | x+1 | x+1 | x+1 | x+1 |
+ ----------- + --------- + ----------- + ------------- + ------------ +
The pruning routine is asynchronous and may have high latency in
high-load environments. Therefore, multiple route refresh requests may
happen before the pruning routine starts, leading to this situation:
| Prune begin | x+k | x+k | x -> x+k | x |
... or even
| Prune begin | x+k+1 | x+k | x -> x+k | x |
... if the prune event starts while another route refresh is running.
In such a case, the pruning routine still deletes routes not fitting
between stale_set and stale_valid, effectively pruning the remnants
of all unpruned route refreshes from before:
| Prune end | x+k | x+k | x+k | x+k |
In extremely rare cases, too many route refreshes may happen before any
route prune routine finishes. If the difference between stale_valid and
stale_pruned becomes more than 128 when requesting another route
refresh, the routine walks the table synchronously and resets all the
stale values to a base state, while logging a warning.
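
The counter transitions from the table above can be sketched as follows. This
is an illustration only: the function names are hypothetical, and the
wrap-around comparison (modulo 256, treating the kept range as stale_valid up
to stale_set inclusive) is an assumption derived from the table, not taken
from the actual table code.

```c
#include <stdint.h>

typedef uint8_t u8;

struct sketch_import_hook {
  u8 stale_set;
  u8 stale_valid;
  u8 stale_pruned;
  u8 stale_pruning;
};

/* Refresh begin: new routes are inserted with stale_cycle == stale_set */
static void sketch_refresh_begin(struct sketch_import_hook *h)
{ h->stale_set++; }

/* Refresh end: everything inserted during the refresh becomes valid */
static void sketch_refresh_end(struct sketch_import_hook *h)
{ h->stale_valid = h->stale_set; }

static void sketch_prune_begin(struct sketch_import_hook *h)
{ h->stale_pruning = h->stale_valid; }

static void sketch_prune_end(struct sketch_import_hook *h)
{ h->stale_pruned = h->stale_pruning; }

/* A route is kept by pruning if its cycle lies in [stale_valid, stale_set] */
static int
sketch_route_is_fresh(const struct sketch_import_hook *h, u8 stale_cycle)
{
  return (u8) (stale_cycle - h->stale_valid) <= (u8) (h->stale_set - h->stale_valid);
}

/* Too many unpruned refreshes: fall back to a synchronous reset */
static int
sketch_needs_reset(const struct sketch_import_hook *h)
{
  return (u8) (h->stale_valid - h->stale_pruned) > 128;
}
```

The unsigned subtraction keeps the range test correct when the u8 counters
wrap from 255 to 0, which is why the reset threshold has to stay well below
256.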