From: Martin Schwenke
Date: Tue, 18 Jun 2024 05:38:18 +0000 (+1000)
Subject: ctdb-daemon: Improve error handling when releasing all IPs
X-Git-Tag: tdb-1.4.13~994
X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=46f6b50f7a56942dcf444b0a7083ff0e5b4c2697;p=thirdparty%2Fsamba.git

ctdb-daemon: Improve error handling when releasing all IPs

Currently, event failures are completely ignored in favour of checking
if the IP is on an interface.  This misses the case where event
scripts up to and including 10.interface succeed, but something later
fails.  When that occurs, count is incremented, so the failure is
counted as a success in the summary that is logged.

Fail when releaseip fails even though 10.interface succeeded in
releasing the IP.  This may result in the IP address coming back, but
that's a different problem.

Underlying this is a design question about when releaseip is
successful.  Should releaseip be a distinct operation, with subsequent
reconfigurations considered separately?

Update logging to clearly identify each of the 3 possible errors.

Signed-off-by: Martin Schwenke
Reviewed-by: Anoop C S
---

diff --git a/ctdb/server/ctdb_takeover.c b/ctdb/server/ctdb_takeover.c
index 2176c6ab806..7780a95f05c 100644
--- a/ctdb/server/ctdb_takeover.c
+++ b/ctdb/server/ctdb_takeover.c
@@ -1730,6 +1730,9 @@ void ctdb_release_all_ips(struct ctdb_context *ctdb)
 	}
 
 	for (vnn = ctdb->vnn; vnn != NULL; vnn = next) {
+		bool have_ip;
+		int ret;
+
 		/* vnn can be freed below in release_ip_post() */
 		next = vnn->next;
 
@@ -1757,19 +1760,34 @@ void ctdb_release_all_ips(struct ctdb_context *ctdb)
 		       vnn->public_netmask_bits,
 		       ctdb_vnn_iface_string(vnn)));
 
-		ctdb_event_script_args(ctdb, CTDB_EVENT_RELEASE_IP, "%s %s %u",
-				       ctdb_vnn_iface_string(vnn),
-				       ctdb_addr_to_str(&vnn->public_address),
-				       vnn->public_netmask_bits);
-		/* releaseip timeouts are converted to success, so to
-		 * detect failures just check if the IP address is
-		 * still there...
+		/*
+		 * releaseip timeouts are converted to success, or IP
+		 * might be released but releaseip event failed (due
+		 * to failure of script after 10.interface), so try
+		 * hard to correctly report failures...
 		 */
-		if (ctdb_sys_have_ip(&vnn->public_address)) {
-			DEBUG(DEBUG_ERR,
-			      (__location__
-			       " IP address %s not released\n",
-			       ctdb_addr_to_str(&vnn->public_address)));
+		ret = ctdb_event_script_args(
+			ctdb,
+			CTDB_EVENT_RELEASE_IP,
+			"%s %s %u",
+			ctdb_vnn_iface_string(vnn),
+			ctdb_addr_to_str(&vnn->public_address),
+			vnn->public_netmask_bits);
+		have_ip = ctdb_sys_have_ip(&vnn->public_address);
+		if (have_ip) {
+			if (ret != 0) {
+				DBG_ERR("Error releasing IP %s\n",
+					ctdb_addr_to_str(&vnn->public_address));
+			} else {
+				DBG_ERR("IP %s not released (timed out?)\n",
+					ctdb_addr_to_str(&vnn->public_address));
+			}
+			vnn->update_in_flight = false;
+			continue;
+		}
+		if (ret != 0) {
+			DBG_ERR("Error releasing IP %s (but IP is gone!)\n",
+				ctdb_addr_to_str(&vnn->public_address));
 			vnn->update_in_flight = false;
 			continue;
 		}
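
Note on the three error cases: the new logic in the hunk above
combines two inputs, the return value of the releaseip event and
whether the address is still on an interface afterwards.  The sketch
below restates that decision as a standalone function.  It is only an
illustration: the enum, its values and both function names are
hypothetical and do not exist in CTDB.

```c
#include <stdbool.h>

/* Hypothetical names for illustration only; not part of CTDB. */
enum release_outcome {
	RELEASE_OK,           /* event succeeded and the IP is gone */
	RELEASE_EVENT_FAILED, /* event failed and the IP is still present */
	RELEASE_IP_REMAINS,   /* event reported success (perhaps a timeout
			       * converted to success) but the IP is still
			       * present */
	RELEASE_LATE_FAILURE  /* event failed although the IP is gone, e.g.
			       * a script after 10.interface failed */
};

/*
 * ret:     result of the releaseip event, 0 meaning success
 * have_ip: true if the address is still on an interface afterwards
 */
enum release_outcome classify_release(int ret, bool have_ip)
{
	if (have_ip) {
		return (ret != 0) ? RELEASE_EVENT_FAILED : RELEASE_IP_REMAINS;
	}
	return (ret != 0) ? RELEASE_LATE_FAILURE : RELEASE_OK;
}

/* Only a clean release should be counted in the logged summary. */
bool release_counts_as_success(enum release_outcome outcome)
{
	return outcome == RELEASE_OK;
}
```

A caller would log a distinct message for each of the three non-OK
values and skip the success count for them, which corresponds to the
"continue" paths in the patch.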