From: Martin Schwenke
Date: Thu, 15 May 2025 04:01:16 +0000 (+1000)
Subject: ctdb-daemon: Run "startipreallocate" event in SHUTDOWN runstate
X-Git-Tag: samba-4.21.6~5
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=ffe9e620cc9cd9b8bb9fb790e4a1f578dd0d309d;p=thirdparty%2Fsamba.git

ctdb-daemon: Run "startipreallocate" event in SHUTDOWN runstate

Even though all nodes may be shutting down there is still a very small
window for a race when multiple nodes are shut down.

For simplicity, assume 2 nodes.  Assume the shutdowns of nodes are
staggered, which is usual because they're usually initiated by a loop
(e.g. onnode -p all ctdb shutdown).  Although commands can continue in
parallel, some commands are started later than others.

Consider this sequence:

1. Node 0 reaches ctdb_shutdown_takeover() in ctdb_shutdown_sequence()
   and a takeover run starts

2. Node 1 has not yet set its runstate to SHUTDOWN in
   ctdb_shutdown_sequence()

3. The leader node asks node 1 which IPs it can host

4. Node 1 replies "all of them"

5. Node 1 now sets its runstate to SHUTDOWN in
   ctdb_shutdown_sequence()

6. The leader node continues with the takeover run, first asking all
   nodes to run "startipreallocate"

7. Node 0 runs "startipreallocate", so its NFS server starts grace

8. Node 1 does not run "startipreallocate" because it is not in
   RUNNING runstate, so its NFS server does not start grace

9. The leader node continues with the takeover run, next asking all
   nodes to run "releaseip" for IPs they can no longer hold

10. Node 0 releases all IPs, since it is in SHUTDOWN runstate (so
    can't host IPs)

11. As part of this, the NFS server on node 0 releases locks held
    against IPs it is releasing

12. A client connected to node 1, where the NFS server is not in
    grace, takes ("steals") one of those locks

This client is then permitted to reclaim the lock when nodes are
restarted.

BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858

Signed-off-by: Martin Schwenke
Reviewed-by: Amitay Isaacs
(cherry picked from commit 4877541cfd8f782f516f6471edc52629720963fb)
---

diff --git a/ctdb/server/ctdb_takeover.c b/ctdb/server/ctdb_takeover.c
index ad543452e62..b9196e3ff63 100644
--- a/ctdb/server/ctdb_takeover.c
+++ b/ctdb/server/ctdb_takeover.c
@@ -2510,8 +2510,9 @@ int32_t ctdb_control_start_ipreallocate(struct ctdb_context *ctdb,
 	struct start_ipreallocate_callback_state *state;
 
 	/* Nodes that are not RUNNING can not host IPs */
-	if (ctdb->runstate != CTDB_RUNSTATE_RUNNING) {
-		DBG_INFO("Skipping \"startipreallocate\" event, not RUNNING\n");
+	if (ctdb->runstate < CTDB_RUNSTATE_RUNNING) {
+		DBG_INFO("Skipping \"startipreallocate\" event, "
+			 "not RUNNING/SHUTDOWN\n");
 		return 0;
 	}
 
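
The effect of changing the comparison from "!=" to "<" can be sketched in isolation. This is a minimal standalone illustration, not CTDB code: the enum and the two function names are hypothetical, and the ordering of the values (startup states first, then RUNNING, then SHUTDOWN) is an assumption that the "<" comparison in the patch appears to rely on.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in for CTDB's runstate enum.  Only the relative
 * order matters here, and that order is an assumption: states that
 * precede RUNNING sort lower, SHUTDOWN sorts higher.
 */
enum sketch_runstate {
	SKETCH_RUNSTATE_STARTUP,
	SKETCH_RUNSTATE_RUNNING,
	SKETCH_RUNSTATE_SHUTDOWN,
};

/* The "<" check only lets SHUTDOWN through if it sorts after RUNNING */
static_assert(SKETCH_RUNSTATE_RUNNING < SKETCH_RUNSTATE_SHUTDOWN,
	      "SHUTDOWN must sort after RUNNING");

/* Old check: the event is skipped in every runstate except RUNNING */
static bool old_check_runs_event(enum sketch_runstate runstate)
{
	if (runstate != SKETCH_RUNSTATE_RUNNING) {
		return false;
	}
	return true;
}

/* New check: the event is skipped only in runstates before RUNNING */
static bool new_check_runs_event(enum sketch_runstate runstate)
{
	if (runstate < SKETCH_RUNSTATE_RUNNING) {
		return false;
	}
	return true;
}
```

Under these assumptions, a node in the position of node 1 at step 8 (already in SHUTDOWN runstate) skips the event with the old check but runs it with the new one, so its NFS server would enter grace before node 0 releases its locks. Nodes that have not yet reached RUNNING are skipped by both checks.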