This can happen if an IKE_SA is terminated forcefully shortly before
terminating the daemon. The thread that handles the terminate command
will call checkin_and_destroy(), which unregisters the IKE_SA from the
manager before destroying it. The main thread that calls flush() on the
IKE_SA manager won't wait for this SA (its entry is already gone), so
the processor and in turn the watcher job/thread might get canceled
before the first thread has started deleting the VIP. That thread would
then wait indefinitely for a signal that can never be sent.
There is still a small chance that the thread hangs in wait() if the state
check happens right before the watcher is canceled and the watcher was not
yet able to deliver the event from the kernel. We counter that by using a
timed wait and rechecking the state after a while.
if (status == SUCCESS && wait)
{ /* wait until the address is really gone */
this->lock->write_lock(this->lock);
- while (is_known_vip(this, virtual_ip))
- {
- this->condvar->wait(this->condvar, this->lock);
+ while (is_known_vip(this, virtual_ip) &&
+ lib->watcher->get_state(lib->watcher) != WATCHER_STOPPED)
+ { /* don't wait during deinit when we can't get notified,
+ * re-evaluate watcher state if we have to wait longer */
+ this->condvar->timed_wait(this->condvar, this->lock, 1000);
}
this->lock->unlock(this->lock);
}