From: Martin Schwenke
Date: Mon, 10 Jan 2022 03:18:32 +0000 (+1100)
Subject: ctdb-doc: Update documentation for leader and cluster lock
X-Git-Tag: tdb-1.4.6~97
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=d752a92e1153fa355b0cbaa1f482fdc0d88e42f5;p=thirdparty%2Fsamba.git

ctdb-doc: Update documentation for leader and cluster lock

Signed-off-by: Martin Schwenke
Reviewed-by: Amitay Isaacs
---

diff --git a/ctdb/doc/ctdb.7.xml b/ctdb/doc/ctdb.7.xml
index 274d12c7002..6b5391e9d44 100644
--- a/ctdb/doc/ctdb.7.xml
+++ b/ctdb/doc/ctdb.7.xml
@@ -82,10 +82,30 @@
-Recovery Lock
+Cluster leader
-CTDB uses a recovery lock to avoid a
+CTDB uses a cluster leader and follower
+model of cluster management. All nodes in a cluster elect one
+node to be the leader. The leader node coordinates privileged
+operations such as database recovery and IP address failover.
+
+CTDB previously referred to the leader as the recovery
+master or recmaster. References
+to these terms may still be found in documentation and code.
+
+Cluster Lock
+
+CTDB uses a cluster lock to assert its privileged role in the
+cluster. This node takes the cluster lock when it becomes
+leader and holds the lock until it is no longer leader. The
+cluster lock helps CTDB to avoid a
 split brain, where a cluster becomes
 partitioned and each partition attempts to operate
 independently. Issues that can result from a split brain
@@ -94,34 +114,50 @@
-CTDB uses a cluster leader and follower
-model of cluster management. All nodes in a cluster elect one
-node to be the leader. The leader node coordinates privileged
-operations such as database recovery and IP address failover.
-CTDB refers to the leader node as the recovery
-master. This node takes and holds the recovery lock
-to assert its privileged role in the cluster.
+CTDB previously referred to the cluster lock as the
+recovery lock. The abbreviation
+reclock is still used - just "clock" would
+be confusing.
+
+CTDB is unable configure a default cluster
+lock, because this would depend on factors such as
+cluster filesystem mountpoints. However, running CTDB
+without a cluster lock is not recommended as there
+will be no split brain protection.
+
+When a cluster lock is configured it is used as the election
+mechanism. Nodes race to take the cluster lock and the winner
+is the cluster leader. This avoids problems when a node wins an
+election but is unable to take the lock - this can occur if a
+cluster becomes partitioned (for example, due to a communication
+failure) and a different leader is elected by the nodes in each
+partition, or if the cluster filesystem has a high failover
+latency.
 
-By default, the recovery lock is implemented using a file
-(specified by recovery lock in the
+By default, the cluster lock is implemented using a file
+(specified by cluster lock in the
 [cluster] section of
 ctdb.conf(5)) residing in shared
 storage (usually) on a cluster filesystem. To support a
-recovery lock the cluster filesystem must support lock
+cluster lock the cluster filesystem must support lock
 coherence. See
 ping_pong(1) for more details.
 
-The recovery lock can also be implemented using an arbitrary
+The cluster lock can also be implemented using an arbitrary
 cluster mutex helper (or call-out). This is indicated by using
 an exclamation point ('!') as the first character of the
-recovery lock parameter. For example, a
-value of !/usr/local/bin/myhelper recovery
+cluster lock parameter. For example, a
+value of !/usr/local/bin/myhelper cluster
 would run the given helper with the specified arguments. The
 helper will continue to run as long as it holds its mutex. See
 ctdb/doc/cluster_mutex_helper.txt in the
@@ -129,7 +165,7 @@
 
-When a file is specified for the recovery
+When a file is specified for the cluster
 lock parameter (i.e. no leading '!') the
 file lock is implemented by a default helper
 (/usr/local/libexec/ctdb/ctdb_mutex_fcntl_helper).
@@ -148,26 +184,9 @@
 
-If a cluster becomes partitioned (for example, due to a
-communication failure) and a different recovery master is
-elected by the nodes in each partition, then only one of these
-recovery masters will be able to take the recovery lock. The
-recovery master in the "losing" partition will not be able to
-take the recovery lock and will be excluded from the cluster.
-The nodes in the "losing" partition will elect each node in turn
-as their recovery master so eventually all the nodes in that
-partition will be excluded.
-
-CTDB does sanity checks to ensure that the recovery lock is held
+CTDB does sanity checks to ensure that the cluster lock is held
 as expected.
-
-CTDB can run without a recovery lock but this is not recommended
-as there will be no protection from split brains.
-
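
For illustration only, the two forms of the cluster lock parameter described in the updated text might appear in ctdb.conf roughly as sketched below. The lock file path /clusterfs/.ctdb/cluster.lock is a hypothetical mount point chosen for this example and is not part of the commit. The helper line reuses the example value given in the documentation itself.

```
[cluster]
    # File-based cluster lock. The path is an assumed example; it must
    # live on shared storage whose filesystem supports lock coherence.
    cluster lock = /clusterfs/.ctdb/cluster.lock

    # Alternative: a cluster mutex helper, indicated by the leading '!'.
    # This is the example value from the documentation text.
    # cluster lock = !/usr/local/bin/myhelper cluster
```

Only one of the two settings would be active at a time. As the text notes, leaving the parameter unset means there is no split brain protection.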