Even though all nodes may be shutting down, there is still a very small
window for a race when multiple nodes are shut down. For simplicity,
assume 2 nodes. Assume the shutdowns of nodes are staggered, which is
usual because they're usually initiated by a loop (e.g. onnode -p all
ctdb shutdown). Although commands can continue in parallel, some
commands are started later than others.
Consider this sequence:
1. Node 0 reaches ctdb_shutdown_takeover() in
ctdb_shutdown_sequence() and a takeover run starts
2. Node 1 has not yet set its runstate to SHUTDOWN in
ctdb_shutdown_sequence()
3. The leader node asks node 1 which IPs it can host
4. Node 1 replies "all of them"
5. Node 1 now sets its runstate to SHUTDOWN in
ctdb_shutdown_sequence()
6. The leader node continues with the takeover run, first asking all
nodes to run "startipreallocate"
7. Node 0 runs "startipreallocate", so its NFS server starts grace
8. Node 1 does not run "startipreallocate" because it is not in
RUNNING runstate, so its NFS server does not start grace
9. The leader node continues with the takeover run, next asking all
nodes to run "releaseip" for IPs they can no longer hold
10. Node 0 releases all IPs, since it is SHUTDOWN runstate (so can't
host IPs)
11. As part of this, the NFS server on node 0 releases locks held
against IPs it is releasing
12. A client connected to node 1, where the NFS server is not in
grace, takes ("steals") one of those locks
This client is then permitted to reclaim the lock when nodes are
restarted.
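
The difference between the old and new runstate tests can be sketched
in isolation. This is a minimal illustration, not the daemon code: the
enum below is a hypothetical stand-in whose members before RUNNING are
assumed, and which assumes SHUTDOWN is ordered after RUNNING, as the
"<" comparison in the change implies. Both helper function names are
invented for the example.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the daemon's runstate enum. The only
 * property that matters here is the assumed ordering: every startup
 * state precedes RUNNING, and SHUTDOWN follows it.
 */
enum sketch_runstate {
	SKETCH_RUNSTATE_UNKNOWN,
	SKETCH_RUNSTATE_INIT,
	SKETCH_RUNSTATE_SETUP,
	SKETCH_RUNSTATE_FIRST_RECOVERY,
	SKETCH_RUNSTATE_STARTUP,
	SKETCH_RUNSTATE_RUNNING,
	SKETCH_RUNSTATE_SHUTDOWN,
};

/* Old test: skip "startipreallocate" unless exactly RUNNING */
static bool old_check_skips(enum sketch_runstate rs)
{
	return rs != SKETCH_RUNSTATE_RUNNING;
}

/* New test: skip only in runstates that precede RUNNING */
static bool new_check_skips(enum sketch_runstate rs)
{
	return rs < SKETCH_RUNSTATE_RUNNING;
}
```

Under these assumptions, a node in SHUTDOWN runstate is skipped by the
old test but not by the new one, so it still runs "startipreallocate"
and its NFS server can start grace before any IPs are released.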
BUG: https://bugzilla.samba.org/show_bug.cgi?id=15858
Signed-off-by: Martin Schwenke <mschwenke@ddn.com>
Reviewed-by: Amitay Isaacs <amitay@gmail.com>
 	struct start_ipreallocate_callback_state *state;
 	/* Nodes that are not RUNNING can not host IPs */
-	if (ctdb->runstate != CTDB_RUNSTATE_RUNNING) {
-		DBG_INFO("Skipping \"startipreallocate\" event, not RUNNING\n");
+	if (ctdb->runstate < CTDB_RUNSTATE_RUNNING) {
+		DBG_INFO("Skipping \"startipreallocate\" event, "
+			 "not RUNNING/SHUTDOWN\n");
 		return 0;
 	}