Ronnie Sahlberg [Thu, 18 Oct 2007 22:58:30 +0000 (08:58 +1000)]
add a new transport method so that when a node is marked as dead, we
shut down and restart the transport
otherwise, if we use the tcp transport, the tcp connection might try to
retransmit the queued data while the node is unavailable.
together with tcp's exponential backoff, this means the tcp connection
quickly reaches the maximum retransmission timeout (rto), which is often
60 or 120 seconds, so it could take up to 60/120 seconds before the tcp
layer detects that the connection is dead and it can be reestablished.
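a minimal sketch of what such a restart method could look like; the
struct and function names here are illustrative, not ctdb's actual
transport API:

    /* illustrative transport ops table, not ctdb's actual API */
    struct ctdb_node;   /* opaque: one remote node */

    struct transport_ops {
            int  (*start)(struct ctdb_node *node);
            void (*shutdown)(struct ctdb_node *node);
            void (*restart)(struct ctdb_node *node); /* new method */
    };

    /* called when a node is marked dead: tear the connection down and
     * reconnect, so queued data and the backed-off rto are discarded */
    static void node_dead(const struct transport_ops *ops,
                          struct ctdb_node *node)
    {
            if (ops->restart != NULL) {
                    ops->restart(node);
            }
    }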
Ronnie Sahlberg [Tue, 16 Oct 2007 05:27:07 +0000 (15:27 +1000)]
add back the test inside the daemon that, when someone asks us to drop
recovery mode back to NORMAL, verifies that we can NOT lock the reclock
file, since at this stage it MUST be locked by the recovery daemon.
to avoid the non-blocking fcntl() lock attempt from blocking and causing
"issues", we move the 'test that we can not lock the reclock file' into
a child process.
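a minimal sketch of the child-process trick, with illustrative names
(the real daemon would also arrange a timeout rather than waiting on
the child unconditionally):

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/wait.h>

    /* returns 0 if the reclock file is locked by someone else (the
     * expected state), -1 if we could lock it ourselves */
    static int reclock_is_held(int fd)
    {
            pid_t pid = fork();
            if (pid == -1) {
                    return -1;
            }
            if (pid == 0) {
                    struct flock lck = {
                            .l_type   = F_WRLCK,
                            .l_whence = SEEK_SET,
                            .l_start  = 0,
                            .l_len    = 1,
                    };
                    /* F_SETLK is non-blocking, but on cluster filesystems
                     * even this can stall, hence the child process */
                    _exit(fcntl(fd, F_SETLK, &lck) == -1 ? 0 : 1);
            }
            int status;
            if (waitpid(pid, &status, 0) == -1) {
                    return -1;
            }
            return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
    }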
Ronnie Sahlberg [Tue, 16 Oct 2007 02:15:02 +0000 (12:15 +1000)]
add a new tunable, DeterministicIPs, that makes the allocation of
public addresses to nodes deterministic.
Activate it by adding CTDB_SET_DeterministicIPs=1 in /etc/sysconfig/ctdb
When this is set, the first entry in /etc/ctdb/public_addresses will
always be hosted by node 0 when that node is available, the second
entry by node 1, and so on.
This tunable allows the allocation of addresses to become very
unbalanced and is only for debugging/testing use.
Beware, this feature requires that /etc/ctdb/public_addresses is
identical on all the nodes in the cluster.
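a minimal sketch of the deterministic mapping; the fallback for
unavailable nodes is an assumption here, not necessarily what the real
allocator does:

    /* entry i in public_addresses prefers node i; addresses of an
     * unavailable node fall through to the next available one */
    static int deterministic_node(int addr_index, int num_nodes,
                                  const int *node_available)
    {
            int i;
            for (i = 0; i < num_nodes; i++) {
                    int pnn = (addr_index + i) % num_nodes;
                    if (node_available[pnn]) {
                            return pnn;
                    }
            }
            return -1; /* no node can host the address */
    }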
Ronnie Sahlberg [Mon, 15 Oct 2007 23:50:31 +0000 (09:50 +1000)]
don't try to lock the reclock file from inside the ctdb daemon.
even though we ask for a non-blocking lock, it does appear that the
fcntl() call can block for a while if gpfs is in the process of
rebuilding itself after a node arrives in or leaves the cluster
Ronnie Sahlberg [Wed, 10 Oct 2007 20:16:36 +0000 (06:16 +1000)]
simplify election handling
make sure we read and update the flags from all remote nodes before we
reach the first codepath that can call do_recovery(), since
do_recovery() needs to know what the flags are.
Ronnie Sahlberg [Tue, 9 Oct 2007 23:42:32 +0000 (09:42 +1000)]
add a --single-public-ip argument to ctdbd to specify the ip address
used in single public ip address mode.
when using this argument, --public-interface must also be used.
add a vnn structure to the ctdb context to describe the single public ip
address
update the killtcp control in the daemon so that if a socket pair to be
killed does not match a normal public address, it checks whether the
destination address matches the single public ip address and, if so,
uses that vnn structure from the ctdb context
this allows killtcp to also kill connections to the single public ip,
not only those to normal public addresses
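a sketch of that lookup order; the structures here stand in for ctdb's
internal ones:

    #include <netinet/in.h>
    #include <stddef.h>

    struct vnn { struct in_addr addr; };   /* one public address */

    struct ctdb_ctx {
            struct vnn *vnns;              /* normal public addresses */
            int num_vnns;
            struct vnn *single_ip_vnn;     /* --single-public-ip, or NULL */
    };

    static struct vnn *find_vnn_for_kill(struct ctdb_ctx *ctdb,
                                         struct in_addr dst)
    {
            int i;
            for (i = 0; i < ctdb->num_vnns; i++) {
                    if (ctdb->vnns[i].addr.s_addr == dst.s_addr) {
                            return &ctdb->vnns[i];
                    }
            }
            /* no normal public address matched: try the single public ip */
            if (ctdb->single_ip_vnn != NULL &&
                ctdb->single_ip_vnn->addr.s_addr == dst.s_addr) {
                    return ctdb->single_ip_vnn;
            }
            return NULL;
    }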
Ronnie Sahlberg [Mon, 8 Oct 2007 04:05:22 +0000 (14:05 +1000)]
add an initial test version of an ip multiplex tool that allows us
to have one single public ip address for the entire cluster.
this ip address is attached to lo on all nodes, but only the recmaster
will respond to arp requests for this address.
the recmaster then runs an ipmux process that passes any incoming
packets to this ip address on to the other nodes in the cluster, based
on the ip address of the client host (see the sketch below).
to use this feature one must
1, have one fixed ip address in the customer's network permanently
attached to an interface
2, set CTDB_PUBLIC_INTERFACE=interface-name
to specify on which interface the clients attach to the node
3, set CTDB_SINGLE_PUBLIC_IP=ip-address
to specify which ip address should be the "single public ip address"
to test with only one single client, attach several ip addresses to
the client and ping the public address from the client with different -I
options. look in a network trace to see which node each packet is
passed on to.
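a toy sketch of the idea behind ipmux; how the real tool maps a client
address to a node may differ:

    #include <stdint.h>
    #include <arpa/inet.h>

    /* pick a node from the client's source address, so a given client
     * is always passed on to the same node */
    static int node_for_client(struct in_addr client, int num_nodes)
    {
            return (int)(ntohl(client.s_addr) % (uint32_t)num_nodes);
    }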
Ronnie Sahlberg [Sun, 7 Oct 2007 23:47:20 +0000 (09:47 +1000)]
add a function in the ctdb tool to determine whether the local node is
the recmaster or not.
return 0 if the node is the recmaster and 1 (true) if it is not, or if
we could not communicate with the ctdb daemon.
it is called 'isnotrecmaster' so that if the tool can not bind to the
socket to talk to the daemon, it will automatically return an error and
exit code 1
thus the tool will only return 0 if it could talk successfully to the
local daemon and the local daemon confirms this node is the recmaster
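a sketch of that exit-code convention; get_recmaster() is a
hypothetical stand-in for the control sent to the local daemon:

    struct ctdb_connection;   /* stand-in for the daemon handle */

    /* hypothetical stand-in: the real code would send a GET_RECMASTER
     * control to the daemon; here we just fail ("can't talk to it") */
    static int get_recmaster(struct ctdb_connection *conn, int *recmaster)
    {
            (void)conn;
            (void)recmaster;
            return -1;
    }

    /* 0 only when we could talk to the daemon and it confirmed that we
     * are the recmaster; 1 in every other case */
    static int cmd_isnotrecmaster(struct ctdb_connection *conn, int my_pnn)
    {
            int recmaster;
            if (get_recmaster(conn, &recmaster) != 0) {
                    return 1;
            }
            return (recmaster == my_pnn) ? 0 : 1;
    }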
Ronnie Sahlberg [Mon, 24 Sep 2007 00:52:26 +0000 (10:52 +1000)]
when we have a public ip address mismatch (i.e. we hold addresses we
shouldn't, or we are not holding addresses we should)
we must first freeze the local node before we set the recovery mode
Andrew Tridgell [Mon, 24 Sep 2007 00:00:14 +0000 (10:00 +1000)]
no longer wait at startup for services to become available, instead
set the node initially unhealthy and let the status monitoring bring the node online.
This fixes a problem with winbindd, where it refused to start because
secrets.tdb was not populated, but we could not populate it because the
net command would not run while ctdbd was still doing startup and thus
frozen
(This used to be ctdb commit 3a001b793dd76fb96addf1e2ccb74da326fbcfbc)
Ronnie Sahlberg [Fri, 21 Sep 2007 05:19:33 +0000 (15:19 +1000)]
in ctdb_control_persistent_store() we must talloc_steal() the pointer c
to prevent it from being freed immediately (and our persistent store
state with it) when we need to wait asynchronously for other nodes
before we can reply back to the client
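a minimal sketch of the talloc_steal() pattern; the struct names are
illustrative:

    #include <talloc.h>

    struct ctdb_req_control;   /* the client's request packet */

    struct persistent_store_state {
            struct ctdb_req_control *c;   /* replied to when all nodes ack */
    };

    /* move c onto our state so it lives until the delayed reply is sent,
     * instead of being freed when the control handler returns */
    static void hold_request(struct persistent_store_state *state,
                             struct ctdb_req_control *c)
    {
            state->c = talloc_steal(state, c);
    }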
Ronnie Sahlberg [Fri, 14 Sep 2007 00:16:36 +0000 (10:16 +1000)]
let each node verify that it has a correct assignment of public ip
addresses (i.e. it holds those it should hold and it doesn't hold any
of those it shouldn't hold)
if an inconsistency is found, mark the local node as recovery mode
active
and wait for the recovery master to trigger a full blown recovery
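a sketch of the per-node consistency check; the names are illustrative:

    #include <stdbool.h>

    struct public_ip {
            int  pnn;    /* node that should hold this address */
            bool held;   /* do we actually hold it on an interface? */
    };

    /* consistent only if we hold exactly the addresses assigned to us */
    static bool ip_assignment_ok(const struct public_ip *ips, int n,
                                 int my_pnn)
    {
            int i;
            for (i = 0; i < n; i++) {
                    bool should_hold = (ips[i].pnn == my_pnn);
                    if (should_hold != ips[i].held) {
                            return false; /* mismatch: go to recovery */
                    }
            }
            return true;
    }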
Ronnie Sahlberg [Thu, 13 Sep 2007 04:51:37 +0000 (14:51 +1000)]
when a ctdb_takeover_run() has failed, we must make sure that
need_takeover_run is set to true, or else we might forget to rerun it
during the next recovery
otherwise, need_takeover_run is only set to true if the node flags for
a remote node and the local node differ.
it is possible that a takeover run fails, leaving the reassignment of
ip addresses incomplete, but that before we get back to the test in
monitor_cluster(), the node flags of all nodes have converged and match
each other again, causing monitor_cluster() to fail to realize that a
takeover run is needed.
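a minimal sketch of the fix; takeover_run() stands in for
ctdb_takeover_run():

    #include <stdbool.h>

    struct recovery { bool need_takeover_run; };

    /* hypothetical stand-in for ctdb_takeover_run(); nonzero = failure */
    static int takeover_run(void)
    {
            return -1;
    }

    static void do_takeover(struct recovery *rec)
    {
            rec->need_takeover_run = false;
            if (takeover_run() != 0) {
                    /* remember the failure: node flags may converge again
                     * before the next monitor pass, which would otherwise
                     * hide the fact that a rerun is still needed */
                    rec->need_takeover_run = true;
            }
    }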
Andrew Tridgell [Wed, 12 Sep 2007 03:23:36 +0000 (13:23 +1000)]
- set arp_ignore to prevent replying to arp requests for addresses on loopback (see the sketch below)
- put removed IPs on loopback with scope host
- check for nul strings in ethtool call
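the real change lives in the eventscripts; a minimal C sketch of the
arp_ignore part, assuming the Linux /proc sysctl interface:

    #include <stdio.h>

    /* arp_ignore=1: only answer arp when the target ip is configured on
     * the interface the request arrived on, so addresses parked on lo
     * with scope host are never answered for */
    static int set_arp_ignore(void)
    {
            FILE *f = fopen("/proc/sys/net/ipv4/conf/all/arp_ignore", "w");
            if (f == NULL) {
                    return -1;
            }
            fprintf(f, "1\n");
            return fclose(f);
    }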
Andrew Tridgell [Wed, 12 Sep 2007 03:22:31 +0000 (13:22 +1000)]
- don't allow the registration of clients with IPs we don't hold
- change some debug levels to make tracking of IP release problems easier
(This used to be ctdb commit 5f9aed62adaf87750f953412c55b29c58e4bb6c0)