git.ipfire.org Git - thirdparty/samba.git/log
18 years agowait for ctdbd to finish cleanup before considering "service ctdb stop" to be done
Andrew Tridgell [Thu, 13 Sep 2007 23:25:11 +0000 (09:25 +1000)] 
wait for ctdbd to finish cleanup before considering "service ctdb stop" to be done
(This used to be ctdb commit 216eb4be7ec481cfe9aaeeada257b77cb394d2e4)

18 years agonicer use of testparm
Andrew Tridgell [Thu, 13 Sep 2007 23:24:34 +0000 (09:24 +1000)] 
nicer use of testparm
(This used to be ctdb commit a611ea930fb9dae6e56f6a74b2bdc9e08066d4d1)

18 years agoensure smbd and winbindd do die in 50.samba
Andrew Tridgell [Thu, 13 Sep 2007 04:36:23 +0000 (14:36 +1000)] 
ensure smbd and winbindd do die in 50.samba
(This used to be ctdb commit 6f23affedb626fc7a5ca86c4763f3045a5586231)

18 years agoprevent recursion in the calling of ctdb_takeover_run
Andrew Tridgell [Thu, 13 Sep 2007 04:08:18 +0000 (14:08 +1000)] 
prevent recursion in the calling of ctdb_takeover_run
(This used to be ctdb commit 0fbdeb7c91b965d9bc5ecc7b24e31070378d8f1d)
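
The commit above does not say how the recursion is prevented. One common way to get the same effect is a re-entrancy flag, sketched here in Python rather than ctdb's C. `TakeoverRunner` and its members are hypothetical names used only for illustration.

```python
class TakeoverRunner:
    """Hypothetical wrapper that refuses to re-enter a takeover run."""

    def __init__(self, do_run):
        self._do_run = do_run      # callable that performs the actual work
        self._in_progress = False  # guard flag, True while a run is active

    def run(self):
        if self._in_progress:
            return False           # nested call: skip instead of recursing
        self._in_progress = True
        try:
            self._do_run()
            return True
        finally:
            self._in_progress = False  # always clear the guard, even on error
```

A nested call made while a run is active returns `False` immediately, so the caller can tell that nothing was done.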

18 years agomore shell scripting fixes in 10.interface
Andrew Tridgell [Thu, 13 Sep 2007 01:57:42 +0000 (11:57 +1000)] 
more shell scripting fixes in 10.interface
(This used to be ctdb commit 4ee2230b3f2ae7437a9d0cf973eb4645d276accd)

18 years agoforce recovery if unable to tell a node to release an IP
Andrew Tridgell [Thu, 13 Sep 2007 01:19:49 +0000 (11:19 +1000)] 
force recovery if unable to tell a node to release an IP
(This used to be ctdb commit 6895788d2499344a03357e5c1103cb8383e9eaf7)

18 years agofixed script errors in 10.interface
Andrew Tridgell [Thu, 13 Sep 2007 01:19:30 +0000 (11:19 +1000)] 
fixed script errors in 10.interface
(This used to be ctdb commit 0c759614d27758cef3eba5942b2cccad54193cbb)

18 years agowe don't need the is_loopback logic in ctdb any more
Andrew Tridgell [Thu, 13 Sep 2007 00:45:06 +0000 (10:45 +1000)] 
we don't need the is_loopback logic in ctdb any more
(This used to be ctdb commit 4ecf29ade0099c7180932288191de9840c8d90a9)

18 years agoremove more cruft from the logs
Andrew Tridgell [Thu, 13 Sep 2007 00:39:05 +0000 (10:39 +1000)] 
remove more cruft from the logs
(This used to be ctdb commit b67f35c483b6cbb5facaa6380c7794709f44213a)

18 years agonew approach for killing TCP connections on IP release
Andrew Tridgell [Thu, 13 Sep 2007 00:24:48 +0000 (10:24 +1000)] 
new approach for killing TCP connections on IP release
(This used to be ctdb commit c33a0db29b5604966f582b1f8c5fd66760c72197)

18 years agoremove clutter from ctdb log file
Andrew Tridgell [Thu, 13 Sep 2007 00:03:18 +0000 (10:03 +1000)] 
remove clutter from ctdb log file
(This used to be ctdb commit 54d5dcaaee0498f40bbee5059cc72d0ca75d33b7)

18 years agofixed return code
Andrew Tridgell [Thu, 13 Sep 2007 00:02:56 +0000 (10:02 +1000)] 
fixed return code
(This used to be ctdb commit 30165b5a19f9bd9d1f62c9c222df0711c1c6a927)

18 years agohandle hung or slow ctdb daemons on shutdown
Andrew Tridgell [Wed, 12 Sep 2007 03:26:24 +0000 (13:26 +1000)] 
handle hung or slow ctdb daemons on shutdown
(This used to be ctdb commit a3089211782ab12387c1b04efa28914c94d89b30)
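
Waiting for a slow or hung daemon usually comes down to polling with an upper bound. The sketch below shows that pattern in Python; the real logic lives in ctdb's shell scripts, and `wait_for_exit`, `is_running`, and the timeout values are assumptions made for illustration.

```python
import time


def wait_for_exit(is_running, timeout, interval=1.0, sleep=time.sleep):
    """Poll is_running() until it returns False or timeout seconds pass.

    Returns True if the process exited, False if it still appears hung.
    """
    waited = 0.0
    while is_running():
        if waited >= timeout:
            return False   # still running: caller may escalate, e.g. kill
        sleep(interval)
        waited += interval
    return True
```

The caller decides what to do when `False` comes back; this sketch only reports that the wait timed out.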

18 years ago- set arp_ignore to prevent replying to arp requests for addresses on loopback
Andrew Tridgell [Wed, 12 Sep 2007 03:23:36 +0000 (13:23 +1000)] 
- set arp_ignore to prevent replying to arp requests for addresses on loopback
- put removed IPs on loopback with scope host
- check for nul strings in ethtool call

(This used to be ctdb commit e2df1d6d08e67a36ff05a590a34c56e900741287)

18 years ago- don't allow the registration of clients with IPs we don't hold
Andrew Tridgell [Wed, 12 Sep 2007 03:22:31 +0000 (13:22 +1000)] 
- don't allow the registration of clients with IPs we don't hold
- change some debug levels to make tracking of IP release problems easier
(This used to be ctdb commit 5f9aed62adaf87750f953412c55b29c58e4bb6c0)

18 years agochanged some debug levels
Andrew Tridgell [Wed, 12 Sep 2007 03:21:19 +0000 (13:21 +1000)] 
changed some debug levels
(This used to be ctdb commit ed764533e1c2f8982e1577ca5e7f5f4482a15345)

18 years agofixed location of arp_filter
Andrew Tridgell [Tue, 11 Sep 2007 06:38:32 +0000 (16:38 +1000)] 
fixed location of arp_filter
(This used to be ctdb commit ea239c82fca2b9a648d21e5c603e632011958452)

18 years agoget interface right
Andrew Tridgell [Mon, 10 Sep 2007 10:45:27 +0000 (20:45 +1000)] 
get interface right
(This used to be ctdb commit e0edc38d7e897f7de2850eb2cfd17fea75c16fcc)

18 years agofixed a pointer cast warning
Andrew Tridgell [Mon, 10 Sep 2007 05:16:17 +0000 (15:16 +1000)] 
fixed a pointer cast warning
(This used to be ctdb commit df0e7a4aa13112d613702d8ea0fb0e18510d293c)

18 years agoadded back --public-interface to startup script
Andrew Tridgell [Mon, 10 Sep 2007 05:09:28 +0000 (15:09 +1000)] 
added back --public-interface to startup script
(This used to be ctdb commit 9e9cb3c0da7251f522c655366ef0868037577a9c)

18 years ago- use struct sockaddr_in more consistently instead of string addresses
Andrew Tridgell [Mon, 10 Sep 2007 04:27:29 +0000 (14:27 +1000)] 
- use struct sockaddr_in more consistently instead of string addresses
- allow for public_address lines with a defaulting interface

(This used to be ctdb commit 29cb760f76e639a0f2ce1d553645a9dc26ee09e5)
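
A public address line with a defaulting interface could be parsed roughly as follows. This is a Python sketch, not the ctdb parser: the `ADDRESS/MASK [INTERFACE]` line layout and the function name `parse_public_address` are assumptions for illustration.

```python
def parse_public_address(line, default_iface=None):
    """Parse 'ADDRESS/MASK [INTERFACE]' into (address, mask_bits, interface).

    If the line names no interface, fall back to default_iface.
    """
    fields = line.split()
    if not fields:
        raise ValueError("empty line")
    address, sep, bits = fields[0].partition("/")
    if not sep:
        raise ValueError("missing netmask in %r" % line)
    iface = fields[1] if len(fields) > 1 else default_iface
    if iface is None:
        raise ValueError("no interface given and no default set")
    return address, int(bits), iface
```

With a default of `eth0`, a line that omits the interface resolves to `eth0`, while a line that names one keeps its own.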

18 years agoadd back in --public-interface as a default
Andrew Tridgell [Mon, 10 Sep 2007 04:26:35 +0000 (14:26 +1000)] 
add back in --public-interface as a default
(This used to be ctdb commit cdf56daf69b2c8381ee673943e982ad20f19affd)

18 years agomerge from ronnie
Andrew Tridgell [Mon, 10 Sep 2007 03:21:11 +0000 (13:21 +1000)] 
merge from ronnie
(This used to be ctdb commit 1f21d4d563232926c35d03c4d69eb69190823dc6)

18 years agoadd crontab and sysctl output
Andrew Tridgell [Mon, 10 Sep 2007 01:27:07 +0000 (11:27 +1000)] 
add crontab and sysctl output
(This used to be ctdb commit b1b59f3294ee7a5ed6d685f373bf19d3152170fa)

18 years agoupdate a comment
Ronnie Sahlberg [Sun, 9 Sep 2007 21:45:57 +0000 (07:45 +1000)] 
update a comment

(This used to be ctdb commit e7d3ef4443686529299e8f293398cc0522235627)

18 years agochange the signature to ctdb_sys_have_ip() to also return:
Ronnie Sahlberg [Sun, 9 Sep 2007 21:20:44 +0000 (07:20 +1000)] 
change the signature to ctdb_sys_have_ip() to also return:
 a bool that specifies whether the ip was held by a loopback adapter
 or not
 the name of the interface where the ip was held

when we release an ip address from an interface, move the ip address
over to the loopback interface

when we release an ip address, after we have moved it onto loopback,
use 60.nfs to kill off the server side (the local part) of the tcp
connection so that the tcp connections don't survive a
failover/failback

61.nfstickle: since we kill the tcp connections when we release an ip
address, we no longer need to restart the nfs service in 61.nfstickle

update ctdb_takeover to use the new signature for ctdb_sys_have_ip

when we add a tcp connection to kill in ctdb_killtcp_add_connection(),
check if either the source or destination address matches a known public
address

(This used to be ctdb commit f9fd2a4719c50f6b8e01d0a1b3a74b76b52ecaf3)

18 years agoset /proc/sys/net/ipv4/conf/all/arp_filter to 1 by default when
Ronnie Sahlberg [Fri, 7 Sep 2007 22:09:02 +0000 (08:09 +1000)] 
set /proc/sys/net/ipv4/conf/all/arp_filter to 1 by default when
10.interfaces starts up

this setting makes the system only respond to ARP requests from the NIC
that the ip address is tied to, and adds to the
"principle of least surprise" when using multihomed servers

(This used to be ctdb commit 39ddf347dc45f599964a4c17e67e71faed00e544)

18 years agoctdb ip must loop over all connected nodes to pull the public ip list
Ronnie Sahlberg [Fri, 7 Sep 2007 06:45:19 +0000 (16:45 +1000)] 
ctdb ip must loop over all connected nodes to pull the public ip list
and merge them into one big list, since with the decoupling of nodes
from public ip addresses the /etc/ctdb/public_addresses files can
differ between nodes and no node knows about all public addresses that
a cluster can use

(This used to be ctdb commit e208294fed183977cacc44b2cd1195c11d967c18)
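
The merge described in the commit above amounts to a set union over the per-node lists. A minimal Python sketch follows; `merge_public_ips` and the dictionary shape are assumptions, and the real tool also has to fetch each list from the node first.

```python
def merge_public_ips(per_node_lists):
    """Merge per-node public IP lists into one sorted, duplicate-free list.

    per_node_lists maps a node number to the addresses that node knows.
    """
    merged = set()
    for ips in per_node_lists.values():
        merged.update(ips)   # the union drops addresses seen on several nodes
    return sorted(merged)
```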

18 years agoremove the ctdb publicip command
Ronnie Sahlberg [Fri, 7 Sep 2007 05:39:26 +0000 (15:39 +1000)] 
remove the ctdb publicip  command
this command no longer makes sense when there is no one-to-one mapping
between a node and its default public ip

(This used to be ctdb commit 91280db7f6dd3d659edd86fae21ba347d6f9da9e)

18 years agoupdate web nfs with the new NFS_HOSTNAME variable we need to be able to
Ronnie Sahlberg [Fri, 7 Sep 2007 02:20:48 +0000 (12:20 +1000)] 
update web nfs with the new NFS_HOSTNAME variable we need to be able to
stat notify using the correct hostname

(This used to be ctdb commit 1498e33e48a4654e02b74a00ef7473fed3225d69)

18 years agoadd a short delay after stopping nfslock to make it less likely that
Ronnie Sahlberg [Fri, 7 Sep 2007 02:14:53 +0000 (12:14 +1000)] 
add a short delay after stopping nfslock   to make it less likely that
"weird" things happen

(This used to be ctdb commit 4934c083cbcc19714094e08a0b7da1fb6fdc8a5a)

18 years agomerge from tridge
Ronnie Sahlberg [Thu, 6 Sep 2007 23:21:40 +0000 (09:21 +1000)] 
merge from tridge

(This used to be ctdb commit 58c918b1bfe09c31049769dee266129cbad4cb20)

18 years ago60.nfs:
Ronnie Sahlberg [Thu, 6 Sep 2007 22:52:56 +0000 (08:52 +1000)] 
60.nfs:
we must always restart the lockmanager when the cluster has been
reconfigured and ip addresses have changed. This is to make sure we get a
clusterwide grace period for nfs locking.
if we don't do this and only restart locking on the nodes that were
directly affected, a different client can take out a conflicting lock
from a different node before affected clients have had a chance to
reclaim all the locks lost during the reconfigure.
the grace period on the rhel5 kernel has been increased to 90 seconds!

statd-callout:
we must restart the lockmanager to ensure a clusterwide grace period for
nfs. this makes locking "more correct" for nfs clients and prevents
other clients/nodes from taking out a conflicting lock while a different
client/node tries to reclaim lost locks.
This makes it "almost consistent" for NFS clients, but there is still
the possibility that a cifs client can take out a conflicting lock
before an nfs client has had a chance to reclaim an existing lock.
This can not be solved with anything less than making the kernel nfs
lock manager "samba aware" and making samba aware of the internal state
of the kernel lock manager so that they can cooperate.

we can not just stop/start the lockmanager back to back on rhel5, since
if the stop and start are too close to each other then, when the new
lockmanager sends out statd notifications upon starting up, two things
can happen:
1, the new lockmanager sends out notifications BEFORE it has registered
   with portmapper, leading to:
   lockmanager starts
   lockmanager sends notification to the client
   client tries to recover the lock and tries to portmap the lockmanager
   port on the server.
   server is not (yet) registered with portmapper and responds
   "no such program" to the client's request to discover where the
   lockmanager is.
   client then just completely gives up reclaiming the lock and doesn't
   even reattempt the portmapper call after some timeout.
   ==> lock reclaim failed.
2, if they are started back to back and a client tries to reclaim the
   lock, the lockmanager sometimes sends two responses back to back
   to the client: one with status NLM_GRANTED (== you got the lock
   reclaimed) and one with status NLM_DENIED (== you could not get the
   lock reclaimed).
   This confuses the client and leads to the server thinking that the
   client does have the lock and the client thinking it has not got the
   lock, and orphaned locks result.

We also send out additional notification messages in different formats
to allow more legacy clients to interoperate with locking.

(This used to be ctdb commit 13208c1aab2942e28dff87e38e6794bf0c026033)

18 years agowe dont need the rpc.statd on shared directory neither do we need
Ronnie Sahlberg [Thu, 6 Sep 2007 01:32:18 +0000 (11:32 +1000)] 
we dont need the rpc.statd on shared directory   neither do we need
PUBLIC_IP anymore

(This used to be ctdb commit fd571ac87f65928e92dde6977745083bf381df1a)

18 years agoimprove the handling of hosts to notify with statd
Ronnie Sahlberg [Thu, 6 Sep 2007 01:30:49 +0000 (11:30 +1000)] 
improve the handling of hosts to notify with statd

(This used to be ctdb commit cc87bda7e344bc777b9620a6211e62de4dce4e3b)

18 years agospecify the additional ports for nfs
Ronnie Sahlberg [Thu, 6 Sep 2007 00:26:44 +0000 (10:26 +1000)] 
specify the additional ports for nfs

(This used to be ctdb commit 1934163f0b393738615a05854082a7d488003e1c)

18 years agothe event scripts for nfs are called 60.nfs and 61.nfstickle
Ronnie Sahlberg [Thu, 6 Sep 2007 00:18:13 +0000 (10:18 +1000)] 
the event scripts for nfs are called 60.nfs and 61.nfstickle

(This used to be ctdb commit b15f1c25560320993b93aa3d943985dab4e47947)

18 years agodocument NFS_TICKLE_SHARED_DIRECTORY on our web page
Ronnie Sahlberg [Wed, 5 Sep 2007 22:21:11 +0000 (08:21 +1000)] 
document NFS_TICKLE_SHARED_DIRECTORY on our web page

(This used to be ctdb commit 40ec29f602897e9b01a6747806f502ab38423d54)

18 years agowe dont use 'sendip' any more so dont check for it and exit from the
Ronnie Sahlberg [Wed, 5 Sep 2007 05:39:51 +0000 (15:39 +1000)] 
we dont use 'sendip' any more   so dont check for it and exit from the
61.nfstickles script if it is missing from the host

(This used to be ctdb commit 8eac441e24f4ef33b55f9eaa4856b5c1e1c15213)

18 years agowe should always get data back from getnodemap
Ronnie Sahlberg [Wed, 5 Sep 2007 04:59:29 +0000 (14:59 +1000)] 
we should always get data back from getnodemap

(This used to be ctdb commit ff999a4b56f714c58c81baa454a2d39d04944136)

18 years agodont dereference vnn before we have assigned it a pointer value
Ronnie Sahlberg [Wed, 5 Sep 2007 04:29:44 +0000 (14:29 +1000)] 
dont dereference vnn before we have assigned it a pointer value

(This used to be ctdb commit 2a8fc69aea8527b22a3fe57427677e4caff57338)

18 years agoadded a diagnostics tool for ctdb
Andrew Tridgell [Wed, 5 Sep 2007 04:20:34 +0000 (14:20 +1000)] 
added a diagnostics tool for ctdb
(This used to be ctdb commit 032a2238caf688656b00e06bf363182368e037e1)

18 years agoallow different nodes in the cluster to use different public_addresses
Ronnie Sahlberg [Tue, 4 Sep 2007 13:15:23 +0000 (23:15 +1000)] 
allow different nodes in the cluster to use different public_addresses
files
so that we can partition the cluster into different subsets of nodes
which each serve a different subset of the public addresses

(This used to be ctdb commit 889e0fe69e4c88c6166282b12843b8d9727552d6)
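
Once each node has its own public_addresses file, the set of nodes that may host a given address becomes a per-address question. A Python sketch of that lookup follows; `candidate_nodes` and the `healthy` set are hypothetical, and the actual selection logic in ctdb may apply further rules.

```python
def candidate_nodes(ip, node_addresses, healthy):
    """Return the healthy nodes whose own address list contains ip.

    node_addresses maps a node number to the set of public addresses
    listed in that node's public_addresses file.
    """
    return sorted(
        node for node, ips in node_addresses.items()
        if ip in ips and node in healthy
    )
```

Nodes 0 and 1 can then serve one subnet while nodes 1 and 2 serve another, which is the partitioning the commit describes.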

18 years agoget rid of the ctdb_vnn_list structure and just use a single list of
Ronnie Sahlberg [Tue, 4 Sep 2007 08:20:29 +0000 (18:20 +1000)] 
get rid of the ctdb_vnn_list structure and just use a single list of
ctdb_vnn

(This used to be ctdb commit 7b9fd06321af17043136b1420b57284450ae7ba5)

18 years agowe cant have takeover_ctx hanging off ctdb since it is freed/recreated
Ronnie Sahlberg [Tue, 4 Sep 2007 04:36:52 +0000 (14:36 +1000)] 
we cant have takeover_ctx hanging off ctdb  since it is freed/recreated
every time we release an ip.
this context is used to hold all resources needed when sending out
gratuitous arps and tcp tickles during ip takeover.

we hang it off the vnn structure that manages that particular ip address
instead, so that we can have multiple ones going in parallel

this bug (or the same bug in a different shape) has probably been in ctdb
for a very long time, but is likely to be hard to trigger

(This used to be ctdb commit c58db1cadaba253b2659573673b28c235ef7db76)

18 years agofix typo in debug output
Ronnie Sahlberg [Tue, 4 Sep 2007 04:21:35 +0000 (14:21 +1000)] 
fix typo in debug output

(This used to be ctdb commit 011a777c6e538ca79f104c7884a4f0e222997382)

18 years agodont just always return 0 from the killtcp control.
Ronnie Sahlberg [Tue, 4 Sep 2007 04:19:18 +0000 (14:19 +1000)] 
dont just always return 0 from the killtcp control.
return 0 or -1 so that the ctdb tool knows whether the control succeeded
or not

(This used to be ctdb commit cace8b40090be5529ec6b463d3839d0e22f4039d)

18 years agochange vnn to pnn in the traverse structure
Ronnie Sahlberg [Tue, 4 Sep 2007 00:49:21 +0000 (10:49 +1000)] 
change vnn to pnn in the traverse structure

(This used to be ctdb commit d56ae0963b420edea6a2d5eeb408a9811af3f3f6)

18 years agochange debug output from vnn to pnn
Ronnie Sahlberg [Tue, 4 Sep 2007 00:47:02 +0000 (10:47 +1000)] 
change debug output from vnn to pnn

(This used to be ctdb commit 93a7cf759ae3f9af6671b9f8589e1399a669b46f)

18 years agochange debug output from vnn to pnn
Ronnie Sahlberg [Tue, 4 Sep 2007 00:45:41 +0000 (10:45 +1000)] 
change debug output from vnn to pnn

change ctdb_daemon_send_message to take pnn as parameter instead of vnn

(This used to be ctdb commit e352a2bbf9bb9a0b2c4f8329e8a529cf02414097)

18 years agochange ctdb_send_message to take pnn as parameter instead of vnn
Ronnie Sahlberg [Tue, 4 Sep 2007 00:42:20 +0000 (10:42 +1000)] 
change ctdb_send_message  to take pnn as parameter instead of vnn

(This used to be ctdb commit 93dd4fba2e0fa6a011d15406652836785a974880)

18 years agochange ctdb_ctrl_getvnn to ctdb_ctrl_getpnn
Ronnie Sahlberg [Tue, 4 Sep 2007 00:38:48 +0000 (10:38 +1000)] 
change ctdb_ctrl_getvnn to ctdb_ctrl_getpnn

(This used to be ctdb commit ef47cc4cd416065c69382e4d9e76c30a0a34e42f)

18 years agochange ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn
Ronnie Sahlberg [Tue, 4 Sep 2007 00:33:10 +0000 (10:33 +1000)] 
change ctdb_node_flags_change.vnn to ctdb_node_flags_changed.pnn

change ctdb_ban_info.vnn to ctdb_ban_info.pnn

(This used to be ctdb commit fcedd40e0493948829e1c921d4fe30e9196e398a)

18 years agochange server_id.vnn to server_id.pnn
Ronnie Sahlberg [Tue, 4 Sep 2007 00:21:51 +0000 (10:21 +1000)] 
change server_id.vnn to server_id.pnn

(This used to be ctdb commit 26f2ee2b754a9271454412f05111a19b3013c6eb)

18 years agochange ctdb_get_vnn to ctdb_get_pnn
Ronnie Sahlberg [Tue, 4 Sep 2007 00:18:44 +0000 (10:18 +1000)] 
change ctdb_get_vnn to ctdb_get_pnn

(This used to be ctdb commit 1e19930198c2bcc7ccb755e0ee51555fb823029a)

18 years agochange vnn to pnn in the ctdb tool
Ronnie Sahlberg [Tue, 4 Sep 2007 00:14:41 +0000 (10:14 +1000)] 
change vnn to pnn in the ctdb tool

(This used to be ctdb commit 822556a4d4ba23459be3a25cbd3f48d1f64ba95f)

18 years agochange ctdb_validate_vnn to ctdb_validate_pnn
Ronnie Sahlberg [Tue, 4 Sep 2007 00:09:58 +0000 (10:09 +1000)] 
change ctdb_validate_vnn to ctdb_validate_pnn

(This used to be ctdb commit a4a1f41b69475b9dc16d8fd7f8965c32e96c32f0)

18 years agochange ctdb->vnn to ctdb->pnn
Ronnie Sahlberg [Tue, 4 Sep 2007 00:06:36 +0000 (10:06 +1000)] 
change ctdb->vnn to ctdb->pnn

(This used to be ctdb commit 8c776e5707e503ec6586aae39ac6b3ea5a2fd2bc)

18 years agochange how we do public addresses and takeover so that we can have
Ronnie Sahlberg [Mon, 3 Sep 2007 23:50:07 +0000 (09:50 +1000)] 
change how we do public addresses and takeover so that we can have
multiple public addresses spread across multiple interfaces on each
node.

this is a massive patch since we have previously made the assumption that
we only have one public address per node.

get rid of the public_interface argument.  the public addresses file
now explicitly lists which interface the address belongs to

(This used to be ctdb commit 462ebbc791e906a6b874c862defea43235597ca8)

18 years agomerge from tridge
Ronnie Sahlberg [Sun, 2 Sep 2007 23:29:30 +0000 (09:29 +1000)] 
merge from tridge

(This used to be ctdb commit 5e2a9333363d76378d27f93231f217999a0c30e5)

18 years agoup the release number
Andrew Tridgell [Thu, 30 Aug 2007 07:51:05 +0000 (17:51 +1000)] 
up the release number
(This used to be ctdb commit 71a6213c92a12bf794c17c30ae4987149b68fe1b)

18 years agomerge from ronnie
Andrew Tridgell [Thu, 30 Aug 2007 07:16:23 +0000 (17:16 +1000)] 
merge from ronnie
(This used to be ctdb commit e8138d9375fc34ae0cb31cc0e6ca042baf83eff8)

18 years agowhen we start 60.nfs we must make sure that the shared storage
Ronnie Sahlberg [Thu, 30 Aug 2007 05:27:45 +0000 (15:27 +1000)] 
when we start 60.nfs   we must make sure that the shared storage
nfs-state directory actually exists (by creating it)
or else the lock manager will not start

(This used to be ctdb commit f2d15d04df842538c8d8331796a3c6fbe23463f2)

18 years agomerge from ronnie
Andrew Tridgell [Mon, 27 Aug 2007 08:04:53 +0000 (18:04 +1000)] 
merge from ronnie
(This used to be ctdb commit ab11fd70cf4d2165a5b55930cbad6fddf5397f54)

18 years agomerge from tridge
Ronnie Sahlberg [Mon, 27 Aug 2007 08:04:17 +0000 (18:04 +1000)] 
merge from tridge

(This used to be ctdb commit 7cb17a0752c683f9b244e6f61fa45a770593c68d)

18 years agoadd an extra debug statement when we send a SIGTERM to a process
Ronnie Sahlberg [Mon, 27 Aug 2007 07:33:46 +0000 (17:33 +1000)] 
add an extra debug statement when we send a SIGTERM to a process

(This used to be ctdb commit a9c1be9cf9efdc69bfc95657b70e9f8b8230cda8)

18 years agomake the ctdb shutdown command use the async _send() function to send
Ronnie Sahlberg [Mon, 27 Aug 2007 05:03:52 +0000 (15:03 +1000)] 
make the ctdb shutdown command use the async _send() function to send
the shutdown command
and return success to the caller if the _send() was successful

(This used to be ctdb commit 6bacaf8c7a96044708a6eda10cc8576adb7f5f79)

18 years agofixed segv when no public interface is set
Andrew Tridgell [Mon, 27 Aug 2007 01:49:42 +0000 (11:49 +1000)] 
fixed segv when no public interface is set
(This used to be ctdb commit 55b415f87bd3cba13c73ccd2fe661720754a6af7)

18 years agoadd async versions of the freeze node control and freeze all nodes in
Ronnie Sahlberg [Mon, 27 Aug 2007 00:31:22 +0000 (10:31 +1000)] 
add async versions of the freeze node control and freeze all nodes in
parallel

(This used to be ctdb commit f34e89f54d9f4380e76eb1b5b2385a4d8500b505)

18 years agochange the monitoring of recmode in the recovery daemon to use a fully
Ronnie Sahlberg [Sun, 26 Aug 2007 23:40:10 +0000 (09:40 +1000)] 
change the monitoring of recmode in the recovery daemon to use a fully
async event-driven api for controls

(This used to be ctdb commit 8d0e43428c507967a0d96e6a4c5c821ac269c546)

18 years agoadd a control to pull the server id list off a node
Ronnie Sahlberg [Sun, 26 Aug 2007 00:57:02 +0000 (10:57 +1000)] 
add a control to pull the server id list off a node

(This used to be ctdb commit 38aa759aa88a042c31b401551f6a713fb7bbe84e)

18 years agoadd an initial implementation of a service_id structure and three
Ronnie Sahlberg [Fri, 24 Aug 2007 05:53:41 +0000 (15:53 +1000)] 
add an initial implementation of a service_id structure and three
controls to register/unregister/check a server id.

a server id consists of TYPE:VNN:ID, where TYPE is specific to the
application, VNN is the node where the server id was registered, and ID
might be a node-unique identifier such as a pid or similar.

Clients can register a server id for themselves at the local ctdb daemon.
When a client disappears, or when the domain socket connection for the
client drops, any and all server ids registered across that domain
socket will also be automatically removed from the store.

clients can register as many server_ids as they want at the same time,
but each TYPE:VNN:ID must be globally unique.

Clients have the option of explicitly unregistering a server id by using
the UNREGISTER control.

Registration and unregistration can only be done by clients to the local
daemon. clients can not register their server id to a remote node.

clients can check if a server id exists on any ctdb node in the
network by using the check control

(This used to be ctdb commit d44798feec26147c5cc05922cb2186f0ef0307be)
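
The rules in the commit above (local-only registration, uniqueness, explicit unregister, automatic removal when the connection drops) can be modelled with a small store. This Python sketch covers a single node only; `ServerIdStore` and its method names are hypothetical, and the cluster-wide check across nodes is left out.

```python
class ServerIdStore:
    """Hypothetical per-node store of (TYPE, VNN, ID) server ids."""

    def __init__(self, local_vnn):
        self.local_vnn = local_vnn
        self._ids = {}   # (type, vnn, id) -> connection that registered it

    def register(self, conn, sid):
        _, vnn, _ = sid
        if vnn != self.local_vnn:
            return -1        # clients may only register at the local daemon
        if sid in self._ids:
            return -1        # each TYPE:VNN:ID must be unique
        self._ids[sid] = conn
        return 0

    def unregister(self, conn, sid):
        if self._ids.get(sid) != conn:
            return -1        # unknown id, or owned by another connection
        del self._ids[sid]
        return 0

    def check(self, sid):
        return sid in self._ids

    def connection_closed(self, conn):
        # Drop every id that was registered across this connection.
        for sid in [s for s, c in self._ids.items() if c == conn]:
            del self._ids[sid]
```

Keying the store by the full triple makes the uniqueness check a plain dictionary lookup, and remembering the registering connection is what allows the automatic cleanup.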

18 years agocleanup invoke_control_callback. we dont need to pass some of these
Ronnie Sahlberg [Fri, 24 Aug 2007 00:54:34 +0000 (10:54 +1000)] 
cleanup invoke_control_callback.   we dont need to pass some of these
parameters to _recv() since they are already set

(This used to be ctdb commit 2034dbebb26d7a2d51241943f6ccbe15bb6a5169)

18 years agochange the api for managing callbacks to controls so that instead of
Ronnie Sahlberg [Fri, 24 Aug 2007 00:42:06 +0000 (10:42 +1000)] 
change the api for managing callbacks to controls so that instead of
passing it as a parameter we set the callback function explicitly from
the caller if the ..._send() function returned a valid state pointer.

(This used to be ctdb commit aa939570662786455f63299b62c99882cff29d42)
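
The calling convention described above, where the caller attaches the callback to the state returned by a `..._send()` function, might look like this in outline. The sketch is in Python and every name in it (`ControlState`, `control_send`, `complete`) is hypothetical; it shows the pattern, not ctdb's C API.

```python
class ControlState:
    """Hypothetical in-flight control; the caller fills in the callback."""

    def __init__(self):
        self.callback = None   # set by the caller after a successful send
        self.status = None

    def complete(self, status):
        # Invoked when the reply (or a timeout) arrives.
        self.status = status
        if self.callback is not None:
            self.callback(self)


def control_send(connected):
    """Stand-in for a ..._send() call: a state object, or None on failure."""
    return ControlState() if connected else None


results = []
state = control_send(True)
if state is not None:            # only attach a callback to a valid state
    state.callback = lambda s: results.append(s.status)
    state.complete(0)            # simulate the reply arriving
```

Because the callback is a field on the state rather than a parameter, callers that do not need one can simply leave it unset.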

18 years agocomment why we do a talloc_steal
Ronnie Sahlberg [Thu, 23 Aug 2007 23:34:04 +0000 (09:34 +1000)] 
comment why we do a talloc_steal

(This used to be ctdb commit aba7972728307e0ae52ccf8c0dd5808110fb92d7)

18 years agoget rid of the explicit global timeout used in the previous example and
Ronnie Sahlberg [Thu, 23 Aug 2007 09:38:54 +0000 (19:38 +1000)] 
get rid of the explicit global timeout used in the previous example and
try this time by relying on the timeouts for the individual controls

(This used to be ctdb commit 448a0eb4fd896dc545aa0b4bb2ba4628491578be)

18 years agotry out a slightly different api for controls where you provide a
Ronnie Sahlberg [Thu, 23 Aug 2007 09:27:09 +0000 (19:27 +1000)] 
try out a slightly different api for controls where you provide a
callback function which is called upon completion (or timeout) of the
control.

modify scanning of recmaster in the monitoring_cluster code to try the
api out

(This used to be ctdb commit c37843f1d97b169afec910e7ddb4e5ac12c3015c)

18 years agobreak checking that the recoverymode on all nodes are ok out into its
Ronnie Sahlberg [Thu, 23 Aug 2007 03:48:39 +0000 (13:48 +1000)] 
break checking that the recoverymode on all nodes are ok  out into its
own function

(This used to be ctdb commit 813cf9a252af96da24122b80f24aabeed2911939)

18 years agohang the ctdb_req_control structure off the ctdb_client_control_state
Ronnie Sahlberg [Thu, 23 Aug 2007 03:00:10 +0000 (13:00 +1000)] 
hang the ctdb_req_control structure off the ctdb_client_control_state
struct  so that if we timeout a control we can print debug info such as
what opcode failed and to which node

we dont need the *status parameter to ctdb_client_control_state

create async versions of the getrecmaster control

pass a memory context to getrecmaster

(This used to be ctdb commit 558b680c82f830fba82c283c78c2de8a0b150b75)

18 years agoin ctdb_call_recv() we must check that state is non-NULL since
Ronnie Sahlberg [Thu, 23 Aug 2007 01:58:09 +0000 (11:58 +1000)] 
in ctdb_call_recv() we must check that state is non-NULL since
ctdb_call() may pass a null pointer to _recv() and this would cause a
segfault.
fortunately there appear to be no critical users of this codepath
right now, so the risk was more theoretical: IF clients start using this
call it could segfault.

change ctdb_control() to become fully async so we can later make the
recovery daemon do the expensive controls to nodes in parallel instead
of in sequence

(This used to be ctdb commit 379789cda6ef049f389f10136aaa1b37a4d063a9)

18 years agocreate an enum to describe the state of a control in flight instead of
Ronnie Sahlberg [Wed, 22 Aug 2007 23:53:10 +0000 (09:53 +1000)] 
create an enum to describe the state of a control in flight  instead of
using the enum that is for calls

(This used to be ctdb commit f9cf7076151af983a1c4ea56fbeb6d94ea508a34)

18 years agomerge from tridge
Ronnie Sahlberg [Wed, 22 Aug 2007 09:28:03 +0000 (19:28 +1000)] 
merge from tridge

(This used to be ctdb commit 3e17a62e7d9f2867d6f697d5dc5cdddf9fdc3497)

18 years agomerge from ronnie
Andrew Tridgell [Wed, 22 Aug 2007 07:31:29 +0000 (17:31 +1000)] 
merge from ronnie
(This used to be ctdb commit e0f1c1acb1188500674626d631e1a1b8726e72ad)

18 years agomerge from volker
Andrew Tridgell [Wed, 22 Aug 2007 07:18:55 +0000 (17:18 +1000)] 
merge from volker
(This used to be ctdb commit a5587b3c065f7115ad5e55429c2c9d9923d3b4dc)

18 years agomerge from volker
Andrew Tridgell [Wed, 22 Aug 2007 07:16:01 +0000 (17:16 +1000)] 
merge from volker
(This used to be ctdb commit 7007e4f2292aa96287b899d6b9e82c7b597ef58f)

18 years agowhen we receive a packet from the network, check explicitly that the
Ronnie Sahlberg [Wed, 22 Aug 2007 02:53:24 +0000 (12:53 +1000)] 
when we receive a packet from the network, check explicitly that the
node is not banned if the call is for a database record, i.e. a REQ/REPLY
CALL/DMASTER

if we get such a call while banned, ignore the packet and write an entry
in the logfile

(This used to be ctdb commit 79eb0863609fbb12e28ebf734101b1d3f359b330)
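
The filter described in the commit above can be sketched as a single predicate. This is Python for illustration only; the operation names are spelled out from the "REQ/REPLY CALL/DMASTER" shorthand, and `accept_packet` is a hypothetical helper.

```python
# Operations that touch database records (expanded from the commit text).
DB_RECORD_OPS = {"REQ_CALL", "REPLY_CALL", "REQ_DMASTER", "REPLY_DMASTER"}


def accept_packet(operation, node_banned, log=print):
    """Return False, and log, for database record packets on a banned node."""
    if node_banned and operation in DB_RECORD_OPS:
        log("ignoring %s packet: node is banned" % operation)
        return False
    return True
```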

18 years agocreate a define to represent the 'invalid' generation id we used in two
Ronnie Sahlberg [Wed, 22 Aug 2007 02:38:31 +0000 (12:38 +1000)] 
create a define to represent the 'invalid' generation id we used in two
places.

create a new helper function to generate new generation id values that
knows about the invalid id and avoids generating it.

update the ctdb status tool to know about the invalid generation id and
print the string INVALID instead

(This used to be ctdb commit 4fbcd189543cb8a92227fdcd3d158472e558ccda)
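
A helper that never hands out the reserved value can simply retry until it gets something else. The Python sketch below assumes the invalid id is 1, which matches the "reset the local generation id to 1" entry further down this log but is otherwise a guess; `new_generation` and `format_generation` are hypothetical names.

```python
import random

INVALID_GENERATION = 1   # assumed value of the reserved "invalid" id


def new_generation(rng=random):
    """Return a random 32-bit generation id that is never the invalid one."""
    while True:
        generation = rng.getrandbits(32)
        if generation != INVALID_GENERATION:
            return generation


def format_generation(generation):
    """Render a generation id the way a status tool might print it."""
    return "INVALID" if generation == INVALID_GENERATION else str(generation)
```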

18 years agoif the node is inactive i.e. banned or disconnected then that node is
Ronnie Sahlberg [Wed, 22 Aug 2007 01:34:48 +0000 (11:34 +1000)] 
if the node is inactive  i.e. banned or disconnected  then that node is
not participating in the cluster

if a client tries to attach to a database while the node is inactive,
return an error back to the client and fail the attach

(This used to be ctdb commit b26949f3c8e54f3bc60da04d7b4ac69f301068fc)

18 years agowhen a node becomes banned its databases are no longer part of ctdb
Ronnie Sahlberg [Wed, 22 Aug 2007 00:38:35 +0000 (10:38 +1000)] 
when a node becomes banned    its databases are no longer part of ctdb
and it should thus no longer serve any database access calls until it
has been reintroduced into the cluster.

when becoming banned,   reset the local generation id to 1   to prevent
any further database access calls from other nodes from being processed.

(This used to be ctdb commit b531021db43ebaa5f5d0ace28c59913d359bd8a8)

18 years agoif lockwait takes an excessive time to complete. log the time it took to
Ronnie Sahlberg [Tue, 21 Aug 2007 23:46:48 +0000 (09:46 +1000)] 
if lockwait takes an excessive time to complete. log the time it took to
complete and also the name of the database

(This used to be ctdb commit 221ef0348fd8113a017d229d8c2c7aa5c4dfb5c2)
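
Logging only the slow cases needs a timer around the wait and a threshold. In this Python sketch the one-second threshold, the message wording, and the name `timed_lockwait` are all assumptions; the commit does not state what counts as excessive.

```python
import time


def timed_lockwait(db_name, wait_for_lock, threshold=1.0,
                   log=print, clock=time.monotonic):
    """Run wait_for_lock() and log its duration if it exceeds threshold."""
    start = clock()
    result = wait_for_lock()
    elapsed = clock() - start
    if elapsed > threshold:
        log("lockwait took %.3f seconds for database %s" % (elapsed, db_name))
    return result
```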

18 years agochange the structure used for node flag change messages so that we can
Ronnie Sahlberg [Tue, 21 Aug 2007 07:25:15 +0000 (17:25 +1000)] 
change the structure used for node flag change messages so that we can
see both the old flags as well as the new flags (so we can tell which
flags changed)

send the CTDB_SRVID_RECONFIGURE messages to connected nodes only, not to
every node, connected or not, in the cluster.

in the handler inside the recovery daemon which is invoked for node flag
change messages, only do a takeover_run() and redistribute the ip
addresses IF it was the disabled or the unhealthy flags that changed.
Also send out the cluster reconfigured message in this case.
If any of the other flags changed we don't need to do the takeover_run()
here since that will be done during recovery.

(This used to be ctdb commit 5549b2058e2c148a8ca9d419123acf3247bb8829)
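
Carrying both the old and the new flags lets the handler compute exactly which bits changed with an XOR. The Python sketch below uses made-up bit values and names (`NODE_FLAGS_*`, `needs_takeover_run`); only the decision rule comes from the commit text.

```python
# Illustrative bit values; the real definitions live in ctdb's headers.
NODE_FLAGS_UNHEALTHY = 0x2
NODE_FLAGS_DISABLED = 0x4
NODE_FLAGS_BANNED = 0x8


def needs_takeover_run(old_flags, new_flags):
    """True if the disabled or unhealthy bit differs between old and new."""
    changed = old_flags ^ new_flags   # bits that flipped in either direction
    return bool(changed & (NODE_FLAGS_UNHEALTHY | NODE_FLAGS_DISABLED))
```

A change that only touches other bits, such as the banned flag, returns `False` here because recovery is expected to handle it.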

18 years agowhen we shutdown the service due to receiving a 'ctdb shutdown' command
Ronnie Sahlberg [Mon, 20 Aug 2007 23:46:27 +0000 (09:46 +1000)] 
when we shutdown the service due to receiving a 'ctdb shutdown' command
from the administrator, log this as 'Received SHUTDOWN command. Stopping
CTDB daemon.'   so that the administrator will know when looking at the
log 'why' the ctdb service was terminated.

Previously the only thing logged was 'shutting down' which is not
detailed enough.

(This used to be ctdb commit 5b818c1b72b6594a8d6e45e1865026e3ce33ae63)

18 years agoadd an atexit() that will print "CTDB daemon shutting down" in the log
Ronnie Sahlberg [Mon, 20 Aug 2007 23:43:53 +0000 (09:43 +1000)] 
add an atexit() that will print "CTDB daemon shutting down" in the log
when the main daemon exits

(This used to be ctdb commit f7422397be2e319bfbee5bf0670583c353eda86d)

18 years agosetup the logfile much earlier in the startup procedure for ctdbd
Ronnie Sahlberg [Mon, 20 Aug 2007 23:33:03 +0000 (09:33 +1000)] 
setup the logfile much earlier in the startup procedure for ctdbd

change initial errors that cause ctdb to fail to start from printf to
DEBUG(0

add a DEBUG(0 to log that the ctdb service is starting

(This used to be ctdb commit 680b4fbb283dd68567a62a83345f11a6cc1dd0e5)

18 years agomake sure that the event script is executable and just ignore it
Ronnie Sahlberg [Mon, 20 Aug 2007 23:22:14 +0000 (09:22 +1000)] 
make sure that the event script is executable and just ignore it
otherwise

(This used to be ctdb commit 65eb7845c70489d654acaaf99cd2c8eac7df11dc)
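
Skipping scripts that are not executable can be done by looking at the file mode before trying to run anything. This Python sketch checks the owner-execute bit; whether ctdb tests that bit or something else is an assumption, as is the name `runnable_event_scripts`.

```python
import os
import stat


def runnable_event_scripts(script_dir):
    """Return the names of regular, owner-executable files in script_dir."""
    scripts = []
    for name in sorted(os.listdir(script_dir)):
        mode = os.stat(os.path.join(script_dir, name)).st_mode
        if stat.S_ISREG(mode) and mode & stat.S_IXUSR:
            scripts.append(name)   # anything else is silently ignored
    return scripts
```

Sorting the names keeps the numeric prefixes (10., 50., 60.) in order, so the scripts would run in the sequence their names suggest.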

18 years agodont pollute the log with 'Registered PID XXX for client YYY' at log
Ronnie Sahlberg [Mon, 20 Aug 2007 22:42:42 +0000 (08:42 +1000)] 
dont pollute the log with 'Registered PID XXX for client YYY' at log
level 0.

change the log level to 3 for this information message

(This used to be ctdb commit f28d713d9cacd2312932b51175aa8402c96ef76b)

18 years agoif a public address has already been taken over by a node, then let that
Ronnie Sahlberg [Mon, 20 Aug 2007 04:16:58 +0000 (14:16 +1000)] 
if a public address has already been taken over by a node, then let that
public address remain at that node until either the node becomes
unhealthy or the original/primary node for that address becomes healthy
again.

Otherwise what will happen is
1, if we ban a node, the banning code immediately does a
takeover_run() and reassigns the public address to a different node in
the cluster.
2, a few seconds later (at most) the recovery daemon will detect that
the number of nodes has shrunk and will initiate a recovery.
During the recovery the public address would again be assigned to a
node, this time a different node.
(This used to be ctdb commit 30a6b7a648e22873d8ce6289a3d6dc42c4b9e3b3)
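
The "stay put unless forced" rule from the commit above can be written as a small decision function. This is a Python sketch; `choose_holder` is a hypothetical name, and picking the lowest-numbered healthy node as a last resort is an arbitrary stand-in for whatever ctdb actually does.

```python
def choose_holder(primary, current, healthy):
    """Pick the node that should hold one public address, or None.

    primary: the address's original node; current: the node holding it now;
    healthy: the set of nodes currently able to serve addresses.
    """
    if primary in healthy:
        return primary      # the primary node is healthy again: fail back
    if current in healthy:
        return current      # keep the address where it already is
    return min(healthy) if healthy else None   # placeholder fallback policy
```

With this rule the address moves once when its holder fails and does not bounce again during the recovery that follows.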

18 years agomerge from tridge
Ronnie Sahlberg [Mon, 20 Aug 2007 03:29:27 +0000 (13:29 +1000)] 
merge from tridge

(This used to be ctdb commit 42f38e787eaa3d8534ce24fca4f29d9ff5bdb9e6)

18 years agoremoved redundant debug message
Andrew Tridgell [Mon, 20 Aug 2007 01:13:38 +0000 (11:13 +1000)] 
removed redundant debug message
(This used to be ctdb commit 9ee742b7cc43be7da6b568308912a3f2cfe4f4d3)

18 years agomerged new event script calling code from ronnie
Andrew Tridgell [Mon, 20 Aug 2007 01:10:30 +0000 (11:10 +1000)] 
merged new event script calling code from ronnie
(This used to be ctdb commit bbacad61b3eee4276ffe44ed2a23949aca8152cf)