From: Joel Fernandes
Date: Tue, 15 Jul 2025 20:01:54 +0000 (-0400)
Subject: rcu: Document separation of rcu_state and rnp's gp_seq
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=186779c036468038b0d077ec5333a51512f867e5;p=thirdparty%2Fkernel%2Flinux.git

The details of this are subtle and were discussed recently. Add a
Quick Quiz about this and refer to it from the code for more clarity.

Reviewed-by: "Paul E. McKenney"
Signed-off-by: Joel Fernandes
Signed-off-by: Neeraj Upadhyay (AMD)
---

diff --git a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
index 04e16775c752b..1b0aad184dd70 100644
--- a/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
+++ b/Documentation/RCU/Design/Data-Structures/Data-Structures.rst
@@ -286,6 +286,39 @@ in order to detect the beginnings and ends of grace periods in a
 distributed fashion. The values flow from ``rcu_state`` to ``rcu_node``
 (down the tree from the root to the leaves) to ``rcu_data``.
 
++-----------------------------------------------------------------------+
+| **Quick Quiz**:                                                       |
++-----------------------------------------------------------------------+
+| Given that the root rcu_node structure has a gp_seq field,            |
+| why does RCU maintain a separate gp_seq in the rcu_state structure?   |
+| Why not just use the root rcu_node's gp_seq as the official record    |
+| and update it directly when starting a new grace period?              |
++-----------------------------------------------------------------------+
+| **Answer**:                                                           |
++-----------------------------------------------------------------------+
+| On single-node RCU trees (where the root node is also a leaf),        |
+| updating the root node's gp_seq immediately would create unnecessary  |
+| lock contention. Here's why:                                          |
+|                                                                       |
+| If we did rcu_seq_start() directly on the root node's gp_seq:         |
+|                                                                       |
+| 1. All CPUs would immediately see their node's gp_seq from their rdp's|
+|    gp_seq, in rcu_pending(). They would all then invoke the RCU core. |
+| 2. Which calls note_gp_changes() and tries to acquire the node lock.  |
+| 3. But rnp->qsmask isn't initialized yet (happens later in            |
+|    rcu_gp_init()).                                                    |
+| 4. So each CPU would acquire the lock, find it can't determine if it  |
+|    needs to report a quiescent state (no qsmask), update rdp->gp_seq, |
+|    and release the lock.                                              |
+| 5. Result: Lots of lock acquisitions with no grace period progress.   |
+|                                                                       |
+| By having a separate rcu_state.gp_seq, we can increment the official  |
+| grace period counter without immediately affecting what CPUs see in   |
+| their nodes. The hierarchical propagation in rcu_gp_init() then       |
+| updates the root node's gp_seq and qsmask together under the same lock|
+| acquisition, avoiding this useless contention.                        |
++-----------------------------------------------------------------------+
+
 Miscellaneous
 '''''''''''''
 
diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
index 384af6b5cf9c9..b206db119546e 100644
--- a/kernel/rcu/tree.c
+++ b/kernel/rcu/tree.c
@@ -1845,6 +1845,10 @@ static noinline_for_stack bool rcu_gp_init(void)
 	 * use-after-free errors. For a detailed explanation of this race, see
 	 * Documentation/RCU/Design/Requirements/Requirements.rst in the
	 * "Hotplug CPU" section.
+	 *
+	 * Also note that the root rnp's gp_seq is kept separate from, and lags,
+	 * the rcu_state's gp_seq, for a reason. See the Quick Quiz on
+	 * single-node systems in Data-Structures.rst for more details.
 	 */
 	rcu_seq_start(&rcu_state.gp_seq);
 	/* Ensure that rcu_seq_done_exact() guardband doesn't give false positives. */
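
As an aside, the ordering that the Quick Quiz answer describes can be sketched
outside the kernel. Below is a minimal, self-contained C model, not kernel
code: the model_* names are invented for illustration, with model_state.gp_seq
standing in for rcu_state.gp_seq and model_node standing in for the root
rcu_node. It shows the official counter advancing first while the node's
gp_seq and qsmask only become visible together, which is what spares CPUs the
useless lock acquisitions described above.

#include <stdbool.h>
#include <stdio.h>

struct model_node {
	unsigned long gp_seq;	/* stands in for rnp->gp_seq */
	unsigned long qsmask;	/* stands in for rnp->qsmask */
};

struct model_state {
	unsigned long gp_seq;	/* stands in for rcu_state.gp_seq */
	struct model_node root;	/* single-node tree: the root is also the leaf */
};

/* Stands in for rcu_seq_start(): only the official counter advances. */
static void model_seq_start(struct model_state *s)
{
	s->gp_seq++;
}

/*
 * Stands in for the relevant part of rcu_gp_init(): the root node's gp_seq
 * and qsmask are published together, so a CPU that notices the new gp_seq
 * also finds a populated qsmask and can make real progress.
 */
static void model_gp_init(struct model_state *s, unsigned long online_mask)
{
	s->root.qsmask = online_mask;
	s->root.gp_seq = s->gp_seq;
}

/* Stands in for a CPU comparing rdp->gp_seq with its node in rcu_pending(). */
static bool model_cpu_sees_new_gp(const struct model_state *s,
				  unsigned long rdp_gp_seq)
{
	return s->root.gp_seq != rdp_gp_seq;
}

int main(void)
{
	struct model_state s = { 0 };
	unsigned long rdp_gp_seq = 0;

	model_seq_start(&s);
	printf("after seq_start: CPU sees new GP? %d\n",
	       model_cpu_sees_new_gp(&s, rdp_gp_seq));

	model_gp_init(&s, 0x1);
	printf("after gp_init:   CPU sees new GP? %d (qsmask=%#lx)\n",
	       model_cpu_sees_new_gp(&s, rdp_gp_seq), s.root.qsmask);
	return 0;
}

Built with any C compiler, the first printf() reports that the CPU does not
yet see a new grace period, and the second reports that it does once gp_seq
and qsmask have been published together; the real rcu_gp_init() does the
equivalent under the root node's lock.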