git.ipfire.org Git - thirdparty/kernel/linux.git/commitdiff
rcu: Document separation of rcu_state and rnp's gp_seq
author Joel Fernandes <joelagnelf@nvidia.com>
Tue, 15 Jul 2025 20:01:54 +0000 (16:01 -0400)
committer Neeraj Upadhyay <neeraj.iitr10@gmail.com>
Tue, 22 Jul 2025 11:40:31 +0000 (17:10 +0530)
The details of this are subtle and were discussed recently. Add a
Quick Quiz about this and refer to it from the code, for more clarity.

Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Neeraj Upadhyay (AMD) <neeraj.upadhyay@kernel.org>
Documentation/RCU/Design/Data-Structures/Data-Structures.rst
kernel/rcu/tree.c

index 04e16775c752badece70c5a1f75bd55ec4500433..1b0aad184dd70ea44d8c2362d07e18a934cdc746 100644 (file)
@@ -286,6 +286,39 @@ in order to detect the beginnings and ends of grace periods in a
 distributed fashion. The values flow from ``rcu_state`` to ``rcu_node``
 (down the tree from the root to the leaves) to ``rcu_data``.
 
++-----------------------------------------------------------------------+
+| **Quick Quiz**:                                                       |
++-----------------------------------------------------------------------+
+| Given that the root rcu_node structure has a gp_seq field,            |
+| why does RCU maintain a separate gp_seq in the rcu_state structure?   |
+| Why not just use the root rcu_node's gp_seq as the official record    |
+| and update it directly when starting a new grace period?              |
++-----------------------------------------------------------------------+
+| **Answer**:                                                           |
++-----------------------------------------------------------------------+
+| On single-node RCU trees (where the root node is also a leaf),        |
+| updating the root node's gp_seq immediately would create unnecessary  |
+| lock contention. Here's why:                                          |
+|                                                                       |
+| If we did rcu_seq_start() directly on the root node's gp_seq:         |
+|                                                                       |
+| 1. In rcu_pending(), every CPU would immediately see that its node's  |
+|    gp_seq differs from its rdp's gp_seq, so it would invoke RCU core. |
+| 2. RCU core calls note_gp_changes(), which tries to lock the node.    |
+| 3. But rnp->qsmask is not yet initialized; that happens later, in     |
+|    rcu_gp_init().                                                     |
+| 4. So each CPU would acquire the lock, find it can't determine if it  |
+|    needs to report a quiescent state (qsmask is still zero), update   |
+|    rdp->gp_seq, and release the lock.                                 |
+| 5. Result: many lock acquisitions with no grace-period progress.      |
+|                                                                       |
+| By having a separate rcu_state.gp_seq, we can increment the official  |
+| grace period counter without immediately affecting what CPUs see in   |
+| their nodes. The hierarchical propagation in rcu_gp_init() then       |
+| updates the root node's gp_seq and qsmask together under the same     |
+| lock acquisition, avoiding this useless contention.                   |
++-----------------------------------------------------------------------+
+
 Miscellaneous
 '''''''''''''
 
index 384af6b5cf9c9dd66903f82252cc26a3d26f56bb..b206db119546ef46351e21ddc051f5490692573f 100644 (file)
@@ -1845,6 +1845,10 @@ static noinline_for_stack bool rcu_gp_init(void)
         * use-after-free errors. For a detailed explanation of this race, see
         * Documentation/RCU/Design/Requirements/Requirements.rst in the
         * "Hotplug CPU" section.
+        *
+        * Also note that the root rnp's gp_seq is kept separate from, and lags,
+        * the rcu_state's gp_seq, to avoid useless lock contention when the
+        * root rnp is also a leaf. See the Quick Quiz in Data-Structures.rst.
         */
        rcu_seq_start(&rcu_state.gp_seq);
        /* Ensure that rcu_seq_done_exact() guardband doesn't give false positives. */