From: Maria Matejka <mq@ucw.cz>
Date: Mon, 7 Apr 2025 13:49:42 +0000 (+0200)
Subject: Some updates from Maria
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=b6f3355d02a32992e5d7f8774d5395b335f3c8a5;p=thirdparty%2Fbird.git

Some updates from Maria
---
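The aggregator_select_lowest_id_bucket() hunk in trie.c computes a bucket id as `u32_clz(word) + i * 32` and then asserts `BIT32R_TEST()` on it. That expression yields the lowest id only if ids are stored most-significant-bit first within each 32-bit word. Below is a minimal, self-contained sketch of that layout. It assumes MSB-first order. `BIT_MSB_FIRST`, `BUCKET_WORDS`, `bitmap_set()`, `bitmap_test()` and `lowest_id()` are hypothetical names, and the GCC/Clang builtin `__builtin_clz()` stands in for BIRD's `u32_clz()`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bucket bitmap: id lives in word id / 32 at bit position
 * 31 - id % 32, i.e. the most significant bit holds the smallest id. */
#define BUCKET_WORDS 4
#define BIT_MSB_FIRST(id) ((uint32_t) 1 << (31 - (id) % 32))

static void bitmap_set(uint32_t *map, uint32_t id)
{
  map[id / 32] |= BIT_MSB_FIRST(id);
}

static int bitmap_test(const uint32_t *map, uint32_t id)
{
  return (map[id / 32] & BIT_MSB_FIRST(id)) != 0;
}

/* Return the lowest id present in the bitmap, or -1 if it is empty */
static int lowest_id(const uint32_t *map)
{
  for (int i = 0; i < BUCKET_WORDS; i++)
  {
    /* __builtin_clz(0) is undefined, so empty words must be skipped */
    if (map[i] == 0)
      continue;

    /* With MSB-first order, the count of leading zeros is the offset
     * of the smallest id stored in this word */
    uint32_t id = (uint32_t) __builtin_clz(map[i]) + (uint32_t) i * 32;
    assert(bitmap_test(map, id));
    return (int) id;
  }
  return -1;
}
```

Setting ids 70, 45 and 3 one after another makes lowest_id() return 70, 45 and 3 in turn. A set bit in an earlier word, or nearer the top of the same word, always denotes a smaller id.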

diff --git a/proto/aggregator/aggregator.c b/proto/aggregator/aggregator.c
index c392ef7cd..11d3c911f 100644
--- a/proto/aggregator/aggregator.c
+++ b/proto/aggregator/aggregator.c
@@ -248,7 +248,12 @@ aggregator_bucket_update(struct aggregator_proto *p, struct aggregator_bucket *b
   }
 
   struct rte *new = rte_get_temp(rta, p->p.main_source);
+  /* Here we need _some_ net to run the filters properly
+   * TODO: use a temporary local net instead of a real one to avoid confusion */
   new->net = net;
+  //struct network *n = allocz(sizeof(*n) + sizeof(addr));
+  //net_copy(n->n.addr, &addr);
+
 
   if (p->logging)
   {
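The trie.c documentation and the reworked aggregator_propagate_and_merge() in this patch fold the first two ORTC passes into one recursive walk. The sketch below illustrates that control flow under simplifying assumptions:

- at most 32 buckets, held in one `uint32_t`;
- bucket ids used directly as bit positions;
- no px_origin or FIB bookkeeping.

`struct node` and `propagate_and_merge()` are hypothetical simplifications, not BIRD's types. The merge step applies the ORTC rule: take the intersection of the children's sets if it is non-empty, otherwise take their union.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified trie node (bucket id b is bit 1u << b) */
struct node {
  struct node *parent, *child[2];
  int is_original;      /* prefix exists in the input table */
  int original_bucket;  /* inherited from the parent if not original */
  uint32_t potential;   /* set of buckets eligible for selection */
};

static void propagate_and_merge(struct node *n)
{
  /* Downward propagation: non-original nodes inherit the parent's bucket */
  if (!n->is_original)
  {
    assert(n->parent != NULL);
    n->original_bucket = n->parent->original_bucket;
  }
  assert(n->original_bucket >= 0 && n->original_bucket < 32);

  struct node *left = n->child[0], *right = n->child[1];

  /* Leaf: the only eligible bucket is the original one */
  if (!left && !right)
  {
    n->potential = (uint32_t) 1 << n->original_bucket;
    return;
  }

  /* A missing child behaves like a leaf carrying this node's bucket */
  struct node imaginary = { .potential = (uint32_t) 1 << n->original_bucket };

  if (left)
    propagate_and_merge(left);
  else
    left = &imaginary;

  if (right)
    propagate_and_merge(right);
  else
    right = &imaginary;

  /* ORTC merge: intersection if non-empty, otherwise union */
  uint32_t both = left->potential & right->potential;
  n->potential = both ? both : (left->potential | right->potential);
}
```

Consider a root with original bucket 0 and a single original child with bucket 1. The missing right child is replaced by the imaginary node carrying bucket 0. The two sets {1} and {0} do not intersect, so the root's set becomes their union {0, 1}.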
diff --git a/proto/aggregator/trie.c b/proto/aggregator/trie.c
index fefa1b83f..f00fb8a6f 100644
--- a/proto/aggregator/trie.c
+++ b/proto/aggregator/trie.c
@@ -12,7 +12,6 @@
  *
  * Prefix aggregation implements the ORTC (Optimal Route Table Construction)
  * algorithm [1].
- * TODO: sources, references to literature
  *
  * This algorithm uses a binary tree representation of the routing table.
  * An edge from the parent node to its left child represents bit 0, and
@@ -24,10 +23,10 @@
  * Prefixes are therefore represented as a path through the trie, beginning at
  * the root node. The last node on this path is called prefix node.
  *
- * TODO: clearly distinguish the description of the original algorithm from what we actually do here
+ *** The Original Algorithm ***
  *
  * ORTC algorithm as described in the original paper consists of three passes
- * through the trie.
+ * through the trie. (This is not exactly how this is implemented here.)
  *
  * The first pass adds new nodes to the trie so that every node has either two
  * or zero children. During this pass, routing information is propagated to the
@@ -49,58 +48,75 @@
  *
  * Algorithm works with the assumption that there is a default route.
  *
- * The following is a description of this implementation.
+ *** Our Implementation ***
+ *
+ * Description of this implementation follows.
+ *
+ * Route attributes are represented as buckets. All routes with the same set of
+ * attributes matched by the "aggregate on" config clause get the same bucket.
  *
  * The trie contains three different kinds of nodes: original, aggregated and
- * fillers. Original nodes represent prefixes from the original (import)
- * routing table. Aggregated nodes represent prefixes that do not exist in the
- * original table but exist in the aggregated (export) table, as they are result
- * of the aggregation. Filler nodes are neither, they exist in the trie but do
- * not represent any prefixes in original or aggregated table.
+ * fillers.
+ *
+ * - Original nodes represent prefixes from the original (import) routing table.
+ * - Aggregated nodes represent prefixes that do not exist in the original table
+ *   but do exist in the aggregated (export) table.
+ * - Filler nodes exist in neither the original nor the aggregated table; they
+ *   represent prefixes "on the way" to the original or aggregated nodes.
  *
  * Each node has a FIB status flag signalling whether this prefix was exported
  * to the FIB (IN_FIB) or not (NON_FIB). It is clear that IN_FIB nodes can be
  * either original or aggregated, whereas NON_FIB nodes can be either original
  * or fillers.
  *
- * Every node contains pointer to its closest IN_FIB ancestor.
+ * Every node contains a pointer to its closest IN_FIB ancestor. If the node
+ * itself is IN_FIB, the ancestor pointer points to itself.
  *
  * After every aggregation, the following invariants are always satisfied:
- * 1. No original bucket can be null.
- * 2. No ancestor pointer can be null.
- * 3. If a node is IN_FIB, then
- *      a) its selected bucket must not be null,
- *      b) its ancestor pointer must point to itself,
- *      c) its origin must be ORIGINAL or AGGREGATED.
- * 4. If a node is NON_FIB, then
- *      a) its selected bucket must be null,
- *      b) its ancestor pointer must point to the nearest IN_FIB ancestor,
- *      c) its origin must be ORIGINAL or FILLER.
+ *
+ *   1. Every node has an original bucket (it is never null).
+ *   2. Every node has its IN_FIB ancestor pointer set.
+ *   3. If a node is IN_FIB, then
+ *        a) its selected bucket must not be null,
+ *        b) its ancestor pointer must point to itself,
+ *        c) it must be ORIGINAL or AGGREGATED.
+ *   4. If a node is NON_FIB, then
+ *        a) its selected bucket must be null,
+ *        b) its ancestor pointer must point to the closest IN_FIB ancestor,
+ *        c) it must be ORIGINAL or FILLER.
  *
  * Our implementation differs from the algorithm as described in the original
- * paper in several aspects. First, we do not normalize the trie by adding new
- * nodes so that every node has either zero or two children. Second, propagation
- * of original buckets, which was formerly done during first pass, is now done
- * in the second pass. First pass is completely omitted.
- * The two phases of aggregation are named propagate_and_merge() for first and
- * second pass and group_prefixes() for third pass.
+ * paper in several aspects:
  *
- * Aggregator is capable of processing incremental updates. After receiving
- * an update, which can be either announce or withdraw, corresponding node
- * is found in the trie and its original bucket is updated. Trie now needs to
- * be recomputed to reflect this update.
- * Trie is traversed from the updated node upwards until its closest IN_FIB
- * ancestor is found. This is the prefix node that covers an address space which
- * is affected by received update. This is followed by propagate_and_merge(),
- * which propagates potential buckets from the leaves upwards. Merging of sets
- * of potential buckets continues upwards until the node's set is not changed by
- * this operation. Finally, the third pass runs from this node, finishing the
- * aggregation. During the third pass, changes in prefix FIB status are detected
- * and routes are exported or removed from the routing table accordingly. All
- * new routes are exported immediately, whereas routes that are to be
- * withdrawn are pushed on the stack and removed after recomputing the trie.
+ * - We do not normalize the trie by adding new nodes. Consequently, a node
+ *   may have exactly one child (not only zero or two).
+ * - The first pass is merged into the second pass. Together, these two
+ *   passes are implemented by propagate_and_merge().
+ * - The third pass is implemented by group_prefixes().
+ *
+ * The Aggregator is capable of processing incremental updates. After receiving
+ * an update, which can be either an announcement or a withdrawal:
+ *
+ *    1. The corresponding node is found in the trie and its original bucket
+ *       is updated. The trie now needs to be recomputed to reflect this update.
+ *    2. The trie is traversed from the updated node upwards until its closest
+ *       IN_FIB ancestor is found. This is the prefix node that covers the
+ *       address space directly affected by the received update.
+ *    3. The propagate_and_merge() pass is started for the subtree rooted in
+ *       the node found in the previous step. This pass propagates buckets
+ *       eligible for selection from the leaves upwards.
+ *    4. The change may leak from the subtree upwards: if merging computes
+ *       a different set of eligible buckets for the node found in step 2,
+ *       we continue upwards until the computed set is equal to the previous
+ *       one.
+ *    5. Starting from the topmost node changed in the previous step,
+ *       group_prefixes() is run downwards.
+ *    6. When group_prefixes() changes a node's IN_FIB status or selected
+ *       bucket, new routes are exported immediately, whereas withdrawals are
+ *       deferred until after the recomputation to avoid short-term misrouting.
  *
  * References:
+ *
  * [1] R. P. Draves, C. King, S. Venkatachary and B. D. Zill. Constructing
  *     Optimal IP Routing Tables. In Proceedings of IEEE INFOCOM, volume 1,
  *     pages 88-97, 1999.
@@ -128,13 +144,6 @@
 
 #include <stdbool.h>
 
-/* TODO: comment what purpose this array has */
-/*
- * Nodes in the trie can be either original, representing input prefixes;
- * aggregated, representing prefixes created by aggregation; or fillers,
- * which are neither. This array maps these values to strings which are
- * printed when dumping the contents of the trie.
- */
 static const char *px_origin_str[] = {
   [FILLER]     = "filler",
   [ORIGINAL]   = "original",
@@ -170,13 +179,6 @@ aggregator_root_init(struct aggregator_bucket *bucket, struct slab *trie_slab)
   return root;
 }
 
-static inline int
-aggregator_is_leaf(const struct trie_node *node)
-{
-  ASSERT_DIE(node != NULL);
-  return !node->child[0] && !node->child[1];
-}
-
 /*
  * Unlink node from the trie by setting appropriate child of parent node to NULL
  * and free memory.
@@ -229,8 +231,6 @@ aggregator_node_add_potential_bucket(struct trie_node *node, const struct aggreg
 static inline int
 aggregator_is_bucket_potential(const struct trie_node *node, u32 id)
 {
-  /* TODO: general question: do we have to pass the bucket to these functions, or would the bucket id be enough? */
-
   ASSERT_DIE(node != NULL);
 
   ASSERT_DIE(id < MAX_POTENTIAL_BUCKETS_COUNT);
@@ -271,6 +271,9 @@ aggregator_select_lowest_id_bucket(const struct aggregator_proto *p, const struc
      */
     u32 id = u32_clz(node->potential_buckets[i]) + i * 32;
 
+    /* Sanity check; hopefully the compiler optimizes it out */
+    ASSERT_DIE(BIT32R_TEST(node->potential_buckets, id));
+
     struct aggregator_bucket *bucket = aggregator_get_bucket_from_id(p, id);
     ASSERT_DIE(bucket != NULL);
     ASSERT_DIE(bucket->id == id);
@@ -436,9 +439,7 @@ aggregator_create_route(struct aggregator_proto *p, ip_addr prefix, u32 pxlen, s
   struct net_addr addr = { 0 };
   net_fill_ipa(&addr, prefix, pxlen);
 
-  struct network *n = allocz(sizeof(*n) + sizeof(addr));
-  net_copy(n->n.addr, &addr);
-
+  /* TODO: Why do we actually pass struct network here? A net_addr should be enough. */
   aggregator_bucket_update(p, bucket, n);
 }
 
@@ -455,14 +456,17 @@ aggregator_prepare_rte_withdrawal(struct aggregator_proto *p, ip_addr prefix, u3
   struct rte_withdrawal_item *item = lp_allocz(p->rte_withdrawal_pool, sizeof(*item));
 
   /* Fill in net and bucket */
+  net_fill_ipa(&item->addr, prefix, pxlen);
+#if 0 // TODO
   struct net_addr addr = { 0 };
   net_fill_ipa(&addr, prefix, pxlen);
   net_copy(&item->addr, &addr);
+#endif
 
   item->bucket = bucket;
 
   /* Push item onto stack */
-  item->next = p->rte_withdrawal_stack,
+  item->next = p->rte_withdrawal_stack;
   p->rte_withdrawal_stack = item;
   p->rte_withdrawal_count++;
 }
@@ -526,7 +530,6 @@ aggregator_trie_remove_prefix(struct aggregator_proto *p, ip_addr prefix, u32 px
   ASSERT_DIE(node->px_origin == ORIGINAL);
   ASSERT_DIE((u32)node->depth == pxlen);
 
-  /* TODO: comment on why we must not clean up upwards here yet */
   /*
    * Even though this function is called to remove prefix from the trie, we
    * can only change its origin from original to filler. Node itself cannot be
@@ -599,18 +602,21 @@ aggregator_find_subtree_prefix(const struct trie_node *target, ip_addr *prefix,
 }
 
 /*
- * TODO:
- * - comment (preferably more extensively)
- * - rename (aggregator_propagate_update?)
- * - remove `recomputing`
- *
  * First and second pass of Optimal Route Table Construction (ORTC) algorithm
  *
- * This function performs two tasks. First, it propagates original buckets from
- * target node to the leaves. Original bucket from prefix node is assigned to
- * all his descendants in a downward direction until another original node is
- * reached. Second, it merges sets of potential buckets from leaves upward to
- * the target node.
+ * This function is called after the trie has changed, and it is recursive.
+ *
+ * First, this function propagates original bucket information from the node's
+ * parent to the current one. (This is basically the first pass in the original algorithm.)
+ *
+ * Then this function calls itself on its children.
+ *
+ * After the recursion returns, sets of potential buckets from the children are merged
+ * to form the potential_buckets bitmap.
+ *
+ * This way, the function propagates changes both down and up in a single pass.
+ *
+ * The argument is the node from which to descend.
  */
 static void
 aggregator_propagate_and_merge(struct trie_node *node)
@@ -619,74 +625,58 @@ aggregator_propagate_and_merge(struct trie_node *node)
   ASSERT_DIE(node->status != UNASSIGNED_FIB);
   ASSERT_DIE(node->potential_buckets_count <= MAX_POTENTIAL_BUCKETS_COUNT);
 
-  /* Propagate original buckets from original nodes to their descendants */
-  if (node->px_origin != ORIGINAL)
+  if (node->px_origin == ORIGINAL)
+    ASSERT_DIE(node->original_bucket != NULL);
+  else
   {
+    /* Non-original node needs to get the original bucket from its parent. */
+    ASSERT_DIE(node->parent->original_bucket != NULL);
     node->original_bucket = node->parent->original_bucket;
 
-    /*
-     * During initial aggregation, there are only original and filler nodes,
-     * thus this statement has no effect. When recomputing, aggregated nodes
-     * become fillers.
-     */
+    /* This node will be recalculated anyway, so for now we mark it as
+     * FILLER to indicate that the trie state is not consistent with the
+     * routes in the target routing table. */
     node->px_origin = FILLER;
   }
 
-  ASSERT_DIE(node->original_bucket != NULL);
+  /* Get children for traversal */
+  struct trie_node *left  = node->child[0];
+  struct trie_node *right = node->child[1];
 
-  if (aggregator_is_leaf(node))
+  /* Special case for leaf nodes */
+  if (!left && !right)
   {
-    /*
-     * When running aggregation for the first time, erasing sets is not
-     * necessary, because they are empty. However, when recomputing, sets
-     * of the leaf nodes must be cleared. Sets in internal nodes don't have
-     * to, because they will be overwritten by merging operation.
-     */
+    /* Reset the bucket bitmap to clean up any stale bucket information */
     node->potential_buckets_count = 0;
     memset(node->potential_buckets, 0, sizeof(node->potential_buckets));
 
-    ASSERT_DIE(node->potential_buckets_count == 0);
-
-    /* Original bucket of leaf nodes is their potential bucket */
+    /* For the leaf node, by definition, the only bucket in the bitmap is the
+     * original bucket. */
     aggregator_node_add_potential_bucket(node, node->original_bucket);
+
+    /* No children, no further work. Done! */
     return;
   }
 
-  struct trie_node *left  = node->child[0];
-  struct trie_node *right = node->child[1];
+  /* Prepare an imaginary node in case some children are missing.
+   * Its set of potential buckets contains only this node's original bucket.
+   * It stands in for the child that the first pass of the original algorithm
+   * would have added when normalizing the trie. */
+  struct trie_node imaginary_node = { 0 };
+  aggregator_node_add_potential_bucket(&imaginary_node, node->original_bucket);
 
-  /* Postorder traversal */
+  /* Process children */
   if (left)
     aggregator_propagate_and_merge(left);
+  else
+    left = &imaginary_node;
 
   if (right)
     aggregator_propagate_and_merge(right);
-
-  /*
-   * Merging sets of potential buckets obviously require node's two children as
-   * arguments. Since our implementation doesn't normalize the trie and therefore
-   * some nodes may have only one child, we simulate missing node by creating
-   * temporary node on stack and using it as an argument for merging.
-   */
-  struct trie_node imaginary_node = { 0 };
-
-  /* Imaginary node inherits potential bucket from its parent */
-  aggregator_node_add_potential_bucket(&imaginary_node, node->original_bucket);
-
-  /* Nodes with only one child */
-  if (left && !right)
+  else
     right = &imaginary_node;
-  else if (!left && right)
-    left = &imaginary_node;
-
-  ASSERT_DIE(left != NULL && right != NULL);
 
-  /*
-   * If there are no common buckets among children's buckets, parent's
-   * buckets are computed as union of its children's buckets.
-   * Otherwise, parent's buckets are computed as intersection of its
-   * children's buckets.
-   */
+  /* Merge sets of potential buckets */
   aggregator_merge_potential_buckets(node, left, right);
 }
 
@@ -867,7 +857,7 @@ aggregator_group_prefixes_helper(struct aggregator_proto *p, struct trie_node *n
     aggregator_process_one_child_nodes(node, inherited_bucket, p->trie_slab);
 
   /* Preorder traversal */
-  if (node->child[0])
+  if (node->child[0]) /* TODO: wouldn't left and right suffice here? As written, it looks like left and right may get overwritten behind our back. */
   {
     ASSERT_DIE((u32)node->depth == pxlen);
     ip6_clrbit(prefix, node->depth + ipa_shift[p->addr_type]);
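aggregator_prepare_rte_withdrawal() in this patch defers withdrawals by pushing them onto a singly linked stack. The stack is drained after the trie has been recomputed. The following is a minimal sketch of that pattern, with these simplifications:

- `struct withdrawal`, `struct withdrawal_stack`, `push_withdrawal()` and `pop_withdrawal()` are hypothetical names;
- it uses calloc()/free() instead of BIRD's linear pool;
- it stores an IPv4 prefix as a plain `uint32_t` instead of a `net_addr`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical pending-withdrawal record; BIRD's version stores a
 * net_addr and a bucket pointer instead of a raw IPv4 prefix. */
struct withdrawal {
  struct withdrawal *next;
  uint32_t prefix;
  unsigned pxlen;
};

struct withdrawal_stack {
  struct withdrawal *top;
  unsigned count;
};

/* Queue a route for withdrawal once the recomputation finishes */
static void push_withdrawal(struct withdrawal_stack *s, uint32_t prefix, unsigned pxlen)
{
  struct withdrawal *item = calloc(1, sizeof(*item));
  assert(item != NULL);

  item->prefix = prefix;
  item->pxlen = pxlen;

  /* Link the new item in front of the current top */
  item->next = s->top;
  s->top = item;
  s->count++;
}

/* Pop the most recently queued withdrawal; returns 0 when the stack is empty */
static int pop_withdrawal(struct withdrawal_stack *s, uint32_t *prefix, unsigned *pxlen)
{
  struct withdrawal *item = s->top;
  if (!item)
    return 0;

  s->top = item->next;
  s->count--;
  *prefix = item->prefix;
  *pxlen = item->pxlen;
  free(item);
  return 1;
}
```

The hunk also replaces a `,` with `;`. The old `item->next = stack, stack = item;` parses as `(item->next = stack), (stack = item);`, and the comma operator evaluates its left operand first. The old code therefore behaved the same, and the change only improves readability.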