When Rspamd shut down with auxiliary processes running (e.g., neural network
training spawned by workers), the main process stopped the srv_pipe event
handlers immediately after sending SIGTERM to workers. This prevented workers
from sending RSPAMD_SRV_ON_FORK notifications when their auxiliary child
processes terminated, so these children remained tracked indefinitely.
The main process then hung for 90 seconds waiting for already-dead
processes that it could not remove from the workers hash table.
Root cause analysis:
- Direct workers have ev_child watchers and are removed via SIGCHLD handler
- Auxiliary processes (forked from workers) have NO ev_child watchers
- They are removed ONLY via srv_pipe notifications (RSPAMD_SRV_ON_FORK)
- Stopping srv events during shutdown breaks this notification channel
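The failure mode above can be sketched in isolation. This is a minimal
illustration, not Rspamd code: the table, the message struct, the pid value
and every function name below are hypothetical stand-ins, and a plain pipe
with blocking reads replaces the real libev-driven srv_pipe handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_TRACKED 8

/* Hypothetical stand-in for the main process's workers hash table. */
struct tracked_table {
    pid_t pids[MAX_TRACKED];
    size_t count;
};

/* Hypothetical stand-in for an ON_FORK "child is dead" notification. */
struct on_fork_msg {
    pid_t pid;
};

static void track(struct tracked_table *t, pid_t pid)
{
    assert(t->count < MAX_TRACKED);
    t->pids[t->count++] = pid;
}

static void untrack(struct tracked_table *t, pid_t pid)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->pids[i] == pid) {
            /* Overwrite the removed slot with the last entry. */
            t->pids[i] = t->pids[--t->count];
            return;
        }
    }
}

/*
 * Simulate shutdown with one auxiliary process tracked.
 * Returns the number of entries still tracked afterwards,
 * or -1 if the pipe could not be set up or written.
 */
static int simulate_shutdown(bool keep_srv_events)
{
    struct tracked_table table = {.count = 0};
    struct on_fork_msg msg = {.pid = 4242}; /* arbitrary illustrative pid */
    int srv_pipe[2];

    if (pipe(srv_pipe) == -1) {
        return -1;
    }

    track(&table, msg.pid);

    /* Worker side: report that its auxiliary child has terminated. */
    if (write(srv_pipe[1], &msg, sizeof(msg)) != (ssize_t) sizeof(msg)) {
        close(srv_pipe[0]);
        close(srv_pipe[1]);
        return -1;
    }
    close(srv_pipe[1]); /* worker exits and its end of the pipe closes */

    if (keep_srv_events) {
        struct on_fork_msg in;

        /* Main side: drain notifications; read() returns 0 at EOF. */
        while (read(srv_pipe[0], &in, sizeof(in)) == (ssize_t) sizeof(in)) {
            untrack(&table, in.pid);
        }
    }
    /* With events stopped, the notification is never read. */

    close(srv_pipe[0]);
    return (int) table.count;
}
```

Under these assumptions, simulate_shutdown(true) leaves no tracked entries,
while simulate_shutdown(false) leaves the dead auxiliary process in the
table, which is the state the main process was waiting on.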
The original stop_srv_ev code was added in 2019 (commit eafdd221) to avoid
"false notifications" during a major refactoring. However, this is no longer
an issue because:
1. srv_ev handlers automatically stop on EOF when worker pipes close
2. There is no risk of duplicate notifications
3. Auxiliary processes critically need these events to report termination
Solution: Remove the stop_srv_ev call from rspamd_term_handler. This allows
workers to continue sending process termination notifications during shutdown.
The srv_ev handlers will stop naturally when workers close their pipes.
Fixes: #5689, #5694
msg_info_main("catch termination signal, waiting for %d children for %.2f seconds",
(int) g_hash_table_size(rspamd_main->workers),
valgrind_mode ? shutdown_ts * 10 : shutdown_ts);
- /* Stop srv events to avoid false notifications */
- g_hash_table_foreach(rspamd_main->workers, stop_srv_ev, rspamd_main);
+ /*
+ * Keep srv events active to process ON_FORK notifications from dying workers.
+ * This is critical for tracking auxiliary processes (e.g., neural network training)
+ * that may still be running when shutdown begins. Without this, child_dead
+ * notifications get lost and the main process hangs waiting for already-terminated
+ * processes. srv_ev will be stopped naturally when workers close their pipes.
+ */
rspamd_pass_signal(rspamd_main->workers, SIGTERM);
if (control_fd != -1) {