In cli_parse_delete_server(), we take care of checking that the server is
in MAINT and that the cur_sess counter is set to 0, in the hope that no
connection/stream ressources continue to point to the server, else we
refuse to delete it.
As shown in GH #2298, this is not sufficient.
Indeed, when the server option "on-marked-down shutdown-sessions" is not
used, server streams are not purged when srv enters maintenance mode.
As such, there could be remaining streams that point to the server. To
detect this, a secondary check on srv->cur_sess counter was performed in
cli_parse_delete_server(). Unfortunately, there are some code paths that
could lead to cur_sess being decremented, and not resulting in a stream
being actually shutdown. As such, if the delete_server cli is handled
right after cur_sess has been decremented with streams still pointing to
the server, we could face some nasty bugs where stream->srv_conn could
point to garbage memory area, as described in the original github report.
To make the check more reliable prior to deleting the server, we don't
rely exclusively on cur_sess and directly check that the server is not
used in any stream through the srv_has_stream() helper function.
Thanks to @capflam which found out the root cause for the bug and greatly
helped to provide the fix.
This should be backported up to 2.6.
return 0;
}
+/* Returns 1 if the server has streams pointing to it, and 0 otherwise.
+ *
+ * Must be called with the server lock held.
+ */
+static int srv_has_streams(struct server *srv)
+{
+ int thr;
+
+ for (thr = 0; thr < global.nbthread; thr++)
+ if (!MT_LIST_ISEMPTY(&srv->per_thr[thr].streams))
+ return 1;
+ return 0;
+}
+
/* Shutdown all connections of a server. The caller must pass a termination
* code in <why>, which must be one of SF_ERR_* indicating the reason for the
* shutdown.
* cleanup function should be implemented to be used here.
*/
if (srv->cur_sess || srv->curr_idle_conns ||
- !eb_is_empty(&srv->queue.head)) {
+ !eb_is_empty(&srv->queue.head) || srv_has_streams(srv)) {
cli_err(appctx, "Server still has connections attached to it, cannot remove it.");
goto out;
}