From: Vladimír Čunát
Date: Thu, 24 Apr 2025 08:10:44 +0000 (+0200)
Subject: daemon/worker send_waiting(): be more defensive
X-Git-Tag: v6.0.12~1^2~3
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=7210f16e65e3ed8794ace2ee1f57e4eac6c1a38f;p=thirdparty%2Fknot-resolver.git

daemon/worker send_waiting(): be more defensive

We encountered non-recoverable assertions due to popping from
an empty queue here, but I see no reason to block recovery here.
I'm still keeping it as a soft assertion until it's better understood.

I *suspect* what happened is that:
- multiple queries queued up before outgoing TCP handshake completed
- the session got into closing state for some reason
  *before* processing this whole queue
- during that the queue got emptied
---

diff --git a/NEWS b/NEWS
index fd589b89d..5ff5a7b90 100644
--- a/NEWS
+++ b/NEWS
@@ -11,6 +11,8 @@ Bugfixes
 Improvements
 ------------
 - /local-data/rpz/*/watchdog: new configuration to enable watchdog for RPZ files (!1665)
+- daemon: fix rare crashes with either of the lines below
+  [system] requirement "h && h->end > h->begin" failed in queue_pop_impl
 
 
 Knot Resolver 6.0.11 (2025-02-26)
diff --git a/daemon/worker.c b/daemon/worker.c
index 70963afdd..a58db0caf 100644
--- a/daemon/worker.c
+++ b/daemon/worker.c
@@ -732,7 +732,14 @@ static int send_waiting(struct session2 *session)
 			session2_close(session);
 			break;
 		}
-		session2_waitinglist_pop(session, true);
+		// Let's be a bit defensive and check that nothing's changed before _pop()
+		// and recover if it has, as qr_task_send() is rather complex.
+		// TODO: fully analyze why the waitinglist could get empty here.
+		if (session2_waitinglist_get(session) == t) {
+			session2_waitinglist_pop(session, true);
+		} else { // a normal assertion could kr_error_log() too much in some rarer cases
+			VERBOSE_MSG(NULL, "soft assertion: waitinglist mismatch in send_waiting()\n");
+		}
 	} while (!session2_waitinglist_is_empty(session));
 	defer_sample_stop(&defer_prev_sample_state, true);
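
The check-before-pop pattern from the worker.c hunk can be tried in isolation. The following is only a minimal sketch under stated assumptions, not the resolver's code: `struct waitlist`, the `wl_*()` helpers, `fake_send()` and `send_waiting_sketch()` are all hypothetical stand-ins invented for illustration. `fake_send()` simulates the scenario suspected in the commit message, where sending has the side effect of emptying the queue, and the loop is a plain `while` rather than the original `do ... while`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; these are NOT the knot-resolver types or API. */
#define WL_CAP 8

struct task { int id; };

struct waitlist {
	struct task *items[WL_CAP];
	size_t begin, end; /* valid entries are items[begin..end) */
};

static bool wl_is_empty(const struct waitlist *wl)
{
	return wl->end <= wl->begin;
}

/* Return the head task, or NULL if the list is empty. */
static struct task *wl_head(const struct waitlist *wl)
{
	return wl_is_empty(wl) ? NULL : wl->items[wl->begin];
}

static void wl_push(struct waitlist *wl, struct task *t)
{
	assert(wl->end < WL_CAP);
	wl->items[wl->end++] = t;
}

/* Popping from an empty list is a hard failure here, similar in spirit
 * to the requirement that failed in queue_pop_impl. */
static void wl_pop(struct waitlist *wl)
{
	assert(!wl_is_empty(wl));
	wl->begin++;
}

/* Simulated send. With drain_on_send set, it empties the list as a side
 * effect, modelling the *suspected* scenario (an assumption, not a fact). */
static int fake_send(struct waitlist *wl, struct task *t, bool drain_on_send)
{
	(void)t;
	if (drain_on_send)
		wl->begin = wl->end;
	return 0;
}

/* Send everything that is waiting; return how many mismatches were seen. */
static int send_waiting_sketch(struct waitlist *wl, bool drain_on_send)
{
	int mismatches = 0;
	while (!wl_is_empty(wl)) {
		struct task *t = wl_head(wl);
		if (fake_send(wl, t, drain_on_send) != 0)
			break;
		if (wl_head(wl) == t) {
			/* Nothing changed underneath us, so popping is safe. */
			wl_pop(wl);
		} else {
			/* Recover instead of hitting the hard assertion in wl_pop(). */
			fprintf(stderr, "soft assertion: waitlist changed during send\n");
			mismatches++;
		}
	}
	return mismatches;
}
```

With `drain_on_send` unset, every task is popped normally and no mismatch is reported. With it set, the first send empties the list, the head comparison fails, and the loop ends without ever calling `wl_pop()` on an empty list.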