From 33d6eed59ca38e18bb46a0f63075b53e679a60b9 Mon Sep 17 00:00:00 2001
From: Vsevolod Stakhov
Date: Tue, 4 Nov 2025 10:54:06 +0000
Subject: [PATCH] [Fix] Keep srv events active during shutdown to track auxiliary processes

When Rspamd shuts down with auxiliary processes running (e.g., neural
network training spawned by workers), the main process was stopping
srv_pipe event handlers immediately after sending SIGTERM to workers.

This prevented workers from sending RSPAMD_SRV_ON_FORK notifications when
their auxiliary child processes terminated, causing these children to
remain tracked indefinitely. The main process would then hang for 90
seconds waiting for already-dead processes that it couldn't properly
clean up from the workers hash table.

Root cause analysis:
- Direct workers have ev_child watchers and are removed via SIGCHLD handler
- Auxiliary processes (fork from workers) have NO ev_child watchers
- They are removed ONLY via srv_pipe notifications (RSPAMD_SRV_ON_FORK)
- Stopping srv events during shutdown breaks this notification channel

The original stop_srv_ev code was added in 2019 (commit eafdd221) to avoid
"false notifications" during a major refactoring. However, this is no
longer an issue because:
1. srv_ev handlers automatically stop on EOF when worker pipes close
2. There is no risk of duplicate notifications
3. Auxiliary processes critically need these events to report termination

Solution: Remove the stop_srv_ev call from rspamd_term_handler. This allows
workers to continue sending process termination notifications during
shutdown. The srv_ev handlers will stop naturally when workers close
their pipes.

Fixes: #5689, #5694
---
 src/rspamd.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/src/rspamd.c b/src/rspamd.c
index 2329d42108..a4eb2e0f2c 100644
--- a/src/rspamd.c
+++ b/src/rspamd.c
@@ -1061,8 +1061,13 @@ rspamd_term_handler(struct ev_loop *loop, ev_signal *w, int revents)
 	msg_info_main("catch termination signal, waiting for %d children for %.2f seconds",
 				  (int) g_hash_table_size(rspamd_main->workers),
 				  valgrind_mode ? shutdown_ts * 10 : shutdown_ts);
-	/* Stop srv events to avoid false notifications */
-	g_hash_table_foreach(rspamd_main->workers, stop_srv_ev, rspamd_main);
+	/*
+	 * Keep srv events active to process ON_FORK notifications from dying workers.
+	 * This is critical for tracking auxiliary processes (e.g., neural network training)
+	 * that may still be running when shutdown begins. Without this, child_dead
+	 * notifications get lost and the main process hangs waiting for already-terminated
+	 * processes. srv_ev will be stopped naturally when workers close their pipes.
+	 */
 	rspamd_pass_signal(rspamd_main->workers, SIGTERM);
 
 	if (control_fd != -1) {
-- 
2.47.3
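
The root cause described in the commit message can be illustrated with a
small standalone model. This is only a sketch and not Rspamd code: every
toy_* name, the message layout, and the pid value are hypothetical. A
plain POSIX pipe stands in for srv_pipe, and an array stands in for the
workers hash table. The model assumes, as the commit message states, that
an auxiliary process is untracked only when a notification is read from
the pipe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

#define TOY_MAX_TRACKED 8

/* Hypothetical stand-in for an ON_FORK "child is dead" notification. */
struct toy_on_fork_msg {
	int pid;
};

/* Hypothetical stand-in for the main process's table of tracked children. */
struct toy_tracker {
	int pids[TOY_MAX_TRACKED];
	size_t count;
};

static void
toy_track(struct toy_tracker *t, int pid)
{
	assert(t->count < TOY_MAX_TRACKED);
	t->pids[t->count++] = pid;
}

static void
toy_untrack(struct toy_tracker *t, int pid)
{
	for (size_t i = 0; i < t->count; i++) {
		if (t->pids[i] == pid) {
			/* Overwrite the removed slot with the last entry. */
			t->pids[i] = t->pids[--t->count];
			return;
		}
	}
}

/*
 * Model one shutdown. Returns the number of children still tracked
 * afterwards, or -1 if a system call fails.
 */
static int
toy_shutdown(bool keep_srv_events)
{
	struct toy_tracker tracker = {.count = 0};
	struct toy_on_fork_msg msg = {.pid = 4242}; /* made-up pid */
	int fds[2];

	if (pipe(fds) == -1) {
		return -1;
	}

	/* The auxiliary process is tracked but has no child watcher. */
	toy_track(&tracker, msg.pid);

	/* The dying worker reports the death, then closes its pipe end. */
	if (write(fds[1], &msg, sizeof(msg)) != (ssize_t) sizeof(msg)) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	close(fds[1]);

	if (keep_srv_events) {
		struct toy_on_fork_msg in;

		/* Drain until read() reports EOF; the reader then stops by itself. */
		while (read(fds[0], &in, sizeof(in)) == (ssize_t) sizeof(in)) {
			toy_untrack(&tracker, in.pid);
		}
	}
	/* Otherwise the reader was stopped early and the message is never seen. */

	close(fds[0]);
	return (int) tracker.count;
}
```

In this model, toy_shutdown(true) leaves nothing tracked, while
toy_shutdown(false) leaves the auxiliary process in the table, which
corresponds to the main process waiting out the shutdown timeout for a
child that has already exited.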