From: Mike Yuan
Date: Fri, 25 Oct 2024 23:51:04 +0000 (+0200)
Subject: core/service: introduce sd_notify() RESTART_RESET=1 for resetting restart counter
X-Git-Tag: v258-rc1~1118^2~1
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=406aeb5da691180d7f655564f6543576ec211909;p=thirdparty%2Fsystemd.git

core/service: introduce sd_notify() RESTART_RESET=1 for resetting restart counter

We have RestartMaxDelaySec= + RestartSteps= to exponentially increase
auto restart durations, but the backoff currently cannot be reset by the
service itself, which sometimes makes it awkward to use.

A typical pattern in real life is that a service was once down
(e.g. due to a temporary network interruption) and multiple restarts
were attempted. Then, future restarts would always wait for an
increased amount of time based on RestartMaxDelaySec=, even after
the original problem got resolved. Such "persistence" could result
in longer unavailability than there should be for failures that
come later. (Cf. https://utcc.utoronto.ca/~cks/space/blog/linux/SystemdResettingUnitBackoff)

Let's introduce a new sd_notify() notification for resetting
the restart counter. There were discussions about making this
timer-based, but I think it's more flexible to leave the
decision-making to the service. This enables services to do a
combination of N successful requests + uptime check, for instance.
---

diff --git a/man/sd_notify.xml b/man/sd_notify.xml
index c017f484870..746789e955e 100644
--- a/man/sd_notify.xml
+++ b/man/sd_notify.xml
@@ -333,7 +333,7 @@
 systemd.service5
 for information how to enable this functionality and
 sd_watchdog_enabled3
- for the details of how the service can check whether the watchdog is enabled.
+ for the details of how the service can check whether the watchdog is enabled.
@@ -345,7 +345,7 @@
 in time. Note that WatchdogSec= does not need to be enabled for
 WATCHDOG=trigger to trigger the watchdog action. See
 systemd.service5
- for information about the watchdog behavior.
+ for information about the watchdog behavior.
@@ -376,6 +376,18 @@
+
+ RESTART_RESET=1
+
+ Reset the restart counter of the service, which has the effect of restoring
+ the restart duration to RestartSec= if RestartSteps= and
+ RestartMaxDelaySec= are in use. For more information, refer to
+ systemd.service5.
+
+
+
+
+
 FDSTORE=1
diff --git a/src/core/service.c b/src/core/service.c
index 4bad026537d..ccfa439dd0a 100644
--- a/src/core/service.c
+++ b/src/core/service.c
@@ -4861,6 +4861,17 @@ static void service_notify_message(
                 service_override_watchdog_timeout(s, watchdog_override_usec);
         }
 
+        /* Interpret RESTART_RESET=1 */
+        if (strv_contains(tags, "RESTART_RESET=1") && IN_SET(s->state, SERVICE_RUNNING, SERVICE_STOP)) {
+                log_unit_struct(u, LOG_NOTICE,
+                                LOG_UNIT_MESSAGE(u, "Got RESTART_RESET=1, resetting restart counter from %u.", s->n_restarts),
+                                "N_RESTARTS=0",
+                                LOG_UNIT_INVOCATION_ID(u));
+
+                s->n_restarts = 0;
+                notify_dbus = true;
+        }
+
         /* Process FD store messages. Either FDSTOREREMOVE=1 for removal, or FDSTORE=1 for addition. In both cases,
          * process FDNAME= for picking the file descriptor name to use. Note that FDNAME= is required when removing
          * fds, but optional when pushing in new fds, for compatibility reasons. */
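
For illustration only, here is a minimal sketch of how a service might use the new notification, following the "N successful requests + uptime check" idea from the commit message. It is not part of this patch. The thresholds MIN_OK_REQUESTS and MIN_UPTIME_SEC and the helpers should_reset_restart_counter() and notify_state() are hypothetical names introduced here; notify_state() is a hand-rolled stand-in for libsystemd's sd_notify() that writes one datagram to $NOTIFY_SOCKET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical policy thresholds; a real service would pick its own. */
#define MIN_OK_REQUESTS 100u
#define MIN_UPTIME_SEC  300u

/* True once the service has been healthy long enough to forget earlier failures. */
static bool should_reset_restart_counter(unsigned ok_requests, uint64_t uptime_sec) {
        return ok_requests >= MIN_OK_REQUESTS && uptime_sec >= MIN_UPTIME_SEC;
}

/* Stand-in for sd_notify(): send one datagram to $NOTIFY_SOCKET.
 * Returns 1 if sent, 0 if $NOTIFY_SOCKET is unset, -1 on error. */
static int notify_state(const char *state) {
        assert(state);

        const char *path = getenv("NOTIFY_SOCKET");
        if (!path || !*path)
                return 0; /* not running under a service manager */

        struct sockaddr_un sa = { .sun_family = AF_UNIX };
        size_t len = strlen(path);

        /* Only absolute paths and abstract sockets ('@' prefix) are handled here. */
        if (len >= sizeof(sa.sun_path) || (path[0] != '/' && path[0] != '@'))
                return -1;

        memcpy(sa.sun_path, path, len);
        if (sa.sun_path[0] == '@')
                sa.sun_path[0] = '\0'; /* abstract namespace socket */

        int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
        if (fd < 0)
                return -1;

        size_t state_len = strlen(state);
        ssize_t n = sendto(fd, state, state_len, 0, (const struct sockaddr *) &sa,
                           (socklen_t) (offsetof(struct sockaddr_un, sun_path) + len));
        close(fd);

        return n == (ssize_t) state_len ? 1 : -1;
}
```

A service would call notify_state("RESTART_RESET=1") once should_reset_restart_counter() returns true; with libsystemd available, sd_notify(0, "RESTART_RESET=1") is the equivalent call. Per the service.c hunk above, the manager only acts on the message while the service is in SERVICE_RUNNING or SERVICE_STOP.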