git.ipfire.org Git - thirdparty/rspamd.git/commitdiff
[Fix] Keep srv events active during shutdown to track auxiliary processes 5728/head
author Vsevolod Stakhov <vsevolod@rspamd.com>
Tue, 4 Nov 2025 10:54:06 +0000 (10:54 +0000)
committer Vsevolod Stakhov <vsevolod@rspamd.com>
Tue, 4 Nov 2025 10:54:06 +0000 (10:54 +0000)
When Rspamd shuts down with auxiliary processes running (e.g., neural network
training spawned by workers), the main process was stopping srv_pipe event
handlers immediately after sending SIGTERM to workers. This prevented workers
from sending RSPAMD_SRV_ON_FORK notifications when their auxiliary child
processes terminated, causing these children to remain tracked indefinitely.

The main process would then hang for 90 seconds waiting for already-dead
processes that it couldn't properly clean up from the workers hash table.

Root cause analysis:
- Direct workers have ev_child watchers and are removed via SIGCHLD handler
- Auxiliary processes (forked by workers) have NO ev_child watchers
- They are removed ONLY via srv_pipe notifications (RSPAMD_SRV_ON_FORK)
- Stopping srv events during shutdown breaks this notification channel
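
The root cause above can be reproduced outside Rspamd with plain POSIX calls. The sketch below is an illustration under stated assumptions, not Rspamd code: `struct fork_note`, `run_worker` and `tracked_aux_after_shutdown` are hypothetical names, and the record layout is not the real srv_pipe wire format. It shows that the main process cannot reap a grandchild itself (`waitpid` fails with `ECHILD`), so the pipe notification from the worker is its only way to stop tracking it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical notification record; NOT Rspamd's real srv_pipe wire format. */
struct fork_note {
    pid_t pid;   /* pid of the auxiliary process */
    int is_dead; /* 0 = auxiliary was forked, 1 = auxiliary has terminated */
};

/* Worker side: fork an auxiliary child, report its start and its death. */
static void run_worker(int notify_fd)
{
    pid_t aux = fork();
    if (aux == -1) {
        _exit(1);
    }
    if (aux == 0) {
        close(notify_fd); /* the auxiliary reports nothing itself */
        _exit(0);
    }

    struct fork_note note = { aux, 0 };
    if (write(notify_fd, &note, sizeof(note)) != (ssize_t) sizeof(note)) {
        _exit(1);
    }

    waitpid(aux, NULL, 0); /* only the worker, as the parent, can reap it */

    note.is_dead = 1;
    /* Fails (EPIPE or SIGPIPE) if main has already stopped listening. */
    if (write(notify_fd, &note, sizeof(note)) != (ssize_t) sizeof(note)) {
        _exit(1);
    }
    _exit(0);
}

/*
 * Main side: returns how many auxiliary processes are still tracked once
 * the worker is gone. With keep_listening == false the reader stops after
 * the first note, which mimics stopping srv events at shutdown.
 */
int tracked_aux_after_shutdown(bool keep_listening, int *direct_wait_errno)
{
    int srv_pipe[2];
    assert(direct_wait_errno != NULL);
    *direct_wait_errno = 0;

    if (pipe(srv_pipe) == -1) {
        return -1;
    }

    pid_t worker = fork();
    if (worker == -1) {
        close(srv_pipe[0]);
        close(srv_pipe[1]);
        return -1;
    }
    if (worker == 0) {
        close(srv_pipe[0]);
        run_worker(srv_pipe[1]); /* never returns */
    }

    close(srv_pipe[1]);
    int tracked = 0;
    struct fork_note note;

    while (read(srv_pipe[0], &note, sizeof(note)) == (ssize_t) sizeof(note)) {
        if (note.is_dead) {
            tracked--;
            continue;
        }
        tracked++;
        /* Main is not the parent of the auxiliary, so this cannot succeed. */
        if (waitpid(note.pid, NULL, WNOHANG) == -1) {
            *direct_wait_errno = errno;
        }
        if (!keep_listening) {
            break;
        }
    }

    close(srv_pipe[0]);
    waitpid(worker, NULL, 0); /* the direct worker is reaped normally */
    return tracked;
}
```

Calling `tracked_aux_after_shutdown(true, &err)` leaves nothing tracked, while passing `false` leaves one stale entry for a process that has already exited, which corresponds to the hang described in this commit.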

The original stop_srv_ev code was added in 2019 (commit eafdd221) to avoid
"false notifications" during a major refactoring. However, this is no longer
an issue because:
1. srv_ev handlers automatically stop on EOF when worker pipes close
2. There is no risk of duplicate notifications
3. Auxiliary processes critically need these events to report termination

Solution: Remove the stop_srv_ev call from rspamd_term_handler. This allows
workers to continue sending process termination notifications during shutdown.
The srv_ev handlers will stop naturally when workers close their pipes.
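
The "stop naturally on EOF" behaviour that the solution relies on can be sketched as follows. This is a simplified model, assuming a plain loop in place of the libev watcher; `struct srv_watcher`, `on_srv_readable` and `drain_until_worker_closes` are hypothetical names and do not appear in Rspamd. The reader keeps handling messages until `read` returns 0, which happens only after the writer has closed its end of the pipe.

```c
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical stand-in for a per-worker srv watcher; not Rspamd's struct. */
struct srv_watcher {
    int fd;       /* read end of the worker's pipe */
    bool active;  /* false once the watcher has stopped itself */
    int handled;  /* number of complete messages processed */
};

/* Invoked when fd is readable: handle one message, or stop on EOF/error. */
static void on_srv_readable(struct srv_watcher *w)
{
    int msg;
    ssize_t r = read(w->fd, &msg, sizeof(msg));

    if (r == (ssize_t) sizeof(msg)) {
        w->handled++;
    }
    else {
        /* r == 0 means the writer closed its end: stop without outside help. */
        close(w->fd);
        w->active = false;
    }
}

/*
 * Queues n_messages in a pipe, closes the write end (the "worker" exiting),
 * then drains the pipe. n_messages must be small enough to fit in the pipe
 * buffer, since nothing reads while the messages are being written.
 */
int drain_until_worker_closes(int n_messages)
{
    int fds[2];
    if (pipe(fds) == -1) {
        return -1;
    }

    for (int i = 0; i < n_messages; i++) {
        if (write(fds[1], &i, sizeof(i)) != (ssize_t) sizeof(i)) {
            close(fds[0]);
            close(fds[1]);
            return -1;
        }
    }
    close(fds[1]);

    struct srv_watcher w = { fds[0], true, 0 };
    while (w.active) {
        on_srv_readable(&w);
    }
    return w.handled;
}
```

Messages written before the close are all delivered ahead of the EOF, so no explicit stop call is needed to avoid losing the last notifications.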

Fixes: #5689, #5694
src/rspamd.c

index 2329d421086ff6985bd056e953f88275183630ba..a4eb2e0f2c438a655b28e1ec4a7d7616e9ce01c5 100644
@@ -1061,8 +1061,13 @@ rspamd_term_handler(struct ev_loop *loop, ev_signal *w, int revents)
                msg_info_main("catch termination signal, waiting for %d children for %.2f seconds",
                                          (int) g_hash_table_size(rspamd_main->workers),
                                          valgrind_mode ? shutdown_ts * 10 : shutdown_ts);
-               /* Stop srv events to avoid false notifications */
-               g_hash_table_foreach(rspamd_main->workers, stop_srv_ev, rspamd_main);
+               /*
+                * Keep srv events active to process ON_FORK notifications from dying workers.
+                * This is critical for tracking auxiliary processes (e.g., neural network training)
+                * that may still be running when shutdown begins. Without this, child_dead
+                * notifications get lost and the main process hangs waiting for already-terminated
+                * processes. srv_ev will be stopped naturally when workers close their pipes.
+                */
                rspamd_pass_signal(rspamd_main->workers, SIGTERM);
 
                if (control_fd != -1) {