[`FileDescriptorStorePreserve=`](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#FileDescriptorStorePreserve=)
setting in service unit files. If set to `yes` the fdstore will be kept as long
as the service definition is loaded into memory by the service manager, i.e. as
-long as at least one other loaded unit has a reference to it.
+long as at least one other loaded unit has a reference to it. If set to
+`on-success` the behaviour is the same as `yes`, except that the fdstore is
+discarded once the service enters the permanent `failed` state, i.e. after all
+automated restart attempts driven by `Restart=` have been exhausted.
The `systemctl clean --what=fdstore …` command may be used to explicitly clear
the fdstore of a service. This is only allowed when the service is fully
<varlistentry>
<term><varname>FileDescriptorStorePreserve=</varname></term>
<listitem><para>Takes one of <constant>no</constant>, <constant>yes</constant>,
- <constant>restart</constant> and controls when to release the service's file descriptor store
- (i.e. when to close the contained file descriptors, if any). If set to <constant>no</constant> the
- file descriptor store is automatically released when the service is stopped; if
- <constant>restart</constant> (the default) it is kept around as long as the unit is neither inactive
- nor failed, or a job is queued for the service, or the service is expected to be restarted. If
- <constant>yes</constant> the file descriptor store is kept around and garbage collection of the unit
- is disabled. The latter is useful to keep entries in the file descriptor store pinned until the unit
- is removed, the service manager exits, or the file descriptors get <constant>EPOLLHUP</constant> or
- <constant>EPOLLERR</constant>.</para>
-
- <para>When set to <constant>yes</constant>, and the service is itself running under another service
- manager (e.g. a service of <filename>user@.service</filename>, or a payload inside
+ <constant>restart</constant>, <constant>on-success</constant> and controls when to release the
+ service's file descriptor store (i.e. when to close the contained file descriptors, if any). If set
+ to <constant>no</constant> the file descriptor store is automatically released when the service is
+ stopped; if <constant>restart</constant> (the default) it is kept around as long as the unit is
+ neither inactive nor failed, or a job is queued for the service, or the service is expected to be
+ restarted. If <constant>yes</constant> the file descriptor store is kept around and garbage
+ collection of the unit is disabled. The latter is useful to keep entries in the file descriptor
+ store pinned until the unit is removed, the service manager exits, or the file descriptors get
+ <constant>EPOLLHUP</constant> or <constant>EPOLLERR</constant>. If <constant>on-success</constant>
+ the behaviour is identical to <constant>yes</constant>, except that the file descriptor store is
+ discarded if the unit enters the permanent <literal>failed</literal> state (i.e. once all automated
+ restart attempts driven by <varname>Restart=</varname> have been exhausted). The store is preserved
+ across the transitionary failed states that precede each individual auto-restart attempt.</para>
+
+ <para>When set to <constant>yes</constant> or <constant>on-success</constant>, and the service is
+ itself running under another service manager (e.g. a service of <filename>user@.service</filename>,
+ or a payload inside
<citerefentry><refentrytitle>systemd-nspawn</refentrytitle><manvolnum>1</manvolnum></citerefentry>),
file descriptors pushed into the store are also forwarded one level up via the enveloping manager's
<varname>$NOTIFY_SOCKET</varname>, tagged with the originating unit id, so that they are preserved
across restarts of the inner manager and handed back to the originating unit when it is started
again. For this to take effect, the enveloping unit must itself enable
- <varname>FileDescriptorStoreMax=</varname> and <varname>FileDescriptorStorePreserve=yes</varname>.
+ <varname>FileDescriptorStoreMax=</varname> and set <varname>FileDescriptorStorePreserve=</varname>
+ to <constant>yes</constant> or <constant>on-success</constant>.
See the <ulink url="https://systemd.io/FILE_DESCRIPTOR_STORE">File Descriptor Store</ulink>
overview for details.</para>
- <para>Setting this to <constant>yes</constant> also ensures the file descriptor store is kept loaded
- across a <literal>kexec</literal>-based reboot on kernels supporting the <ulink
+ <para>Setting this to <constant>yes</constant> or <constant>on-success</constant> also ensures the
+ file descriptor store is kept loaded across a <literal>kexec</literal>-based reboot on kernels
+ supporting the <ulink
url="https://docs.kernel.org/userspace-api/liveupdate.html">Live Update Orchestrator</ulink>,
so that compatible file descriptors (such as <citerefentry
project='man-pages'><refentrytitle>memfd_create</refentrytitle><manvolnum>2</manvolnum></citerefentry>)
DEFINE_STRING_TABLE_LOOKUP(exec_utmp_mode, ExecUtmpMode);
static const char* const exec_preserve_mode_table[_EXEC_PRESERVE_MODE_MAX] = {
- [EXEC_PRESERVE_NO] = "no",
- [EXEC_PRESERVE_YES] = "yes",
- [EXEC_PRESERVE_RESTART] = "restart",
+ [EXEC_PRESERVE_NO] = "no",
+ [EXEC_PRESERVE_YES] = "yes",
+ [EXEC_PRESERVE_RESTART] = "restart",
+ [EXEC_PRESERVE_ON_SUCCESS] = "on-success",
};
DEFINE_STRING_TABLE_LOOKUP_WITH_BOOLEAN(exec_preserve_mode, ExecPreserveMode, EXEC_PRESERVE_YES);
EXEC_PRESERVE_NO,
EXEC_PRESERVE_YES,
EXEC_PRESERVE_RESTART,
+ EXEC_PRESERVE_ON_SUCCESS,
_EXEC_PRESERVE_MODE_MAX,
_EXEC_PRESERVE_MODE_INVALID = -EINVAL,
} ExecPreserveMode;
s = SERVICE(u);
- if (s->fd_store_preserve_mode != EXEC_PRESERVE_YES)
+ if (!IN_SET(s->fd_store_preserve_mode, EXEC_PRESERVE_YES, EXEC_PRESERVE_ON_SUCCESS))
continue;
if (!s->fd_store)
* unit and the original fdname. This way fdstore persistence chains all the way up to whichever
* entity is ultimately responsible for surviving across kexec/restart, regardless of fdname
* length or charset constraints. */
- if (propagate_upstream && s->fd_store_preserve_mode == EXEC_PRESERVE_YES) {
+ if (propagate_upstream && IN_SET(s->fd_store_preserve_mode, EXEC_PRESERVE_YES, EXEC_PRESERVE_ON_SUCCESS)) {
Manager *m = ASSERT_PTR(UNIT(s)->manager);
char idx_str[STRLEN(SERVICE_FDSTORE_SUB_FDNAME_PREFIX) + DECIMAL_STR_MAX(uint64_t)];
LIST_PREPEND(fd_store, s->fd_store, TAKE_PTR(fs));
s->n_fd_store++;
- if (propagate_upstream && s->fd_store_preserve_mode == EXEC_PRESERVE_YES)
+ if (propagate_upstream && IN_SET(s->fd_store_preserve_mode, EXEC_PRESERVE_YES, EXEC_PRESERVE_ON_SUCCESS))
/* Refresh the JSON mapping memfd so the supervisor can resolve the new index. Do this
* after LIST_PREPEND so the new entry is visible to the helper. */
(void) service_propagate_fd_store_mapping_upstream(UNIT(s)->manager);
if (r > 0 &&
s->state == SERVICE_DEAD &&
s->deserialized_state == SERVICE_DEAD &&
- s->fd_store_preserve_mode == EXEC_PRESERVE_YES) {
+ IN_SET(s->fd_store_preserve_mode, EXEC_PRESERVE_YES, EXEC_PRESERVE_ON_SUCCESS)) {
service_set_state(s, SERVICE_DEAD_RESOURCES_PINNED);
s->deserialized_state = SERVICE_DEAD_RESOURCES_PINNED;
}
static ServiceState service_determine_dead_state(Service *s) {
assert(s);
- return SERVICE_FD_STORE_POPULATED(s) && s->fd_store_preserve_mode == EXEC_PRESERVE_YES ? SERVICE_DEAD_RESOURCES_PINNED : SERVICE_DEAD;
+ return SERVICE_FD_STORE_POPULATED(s) && IN_SET(s->fd_store_preserve_mode, EXEC_PRESERVE_YES, EXEC_PRESERVE_ON_SUCCESS) ? SERVICE_DEAD_RESOURCES_PINNED : SERVICE_DEAD;
}
static void service_enter_dead(Service *s, ServiceResult f, bool allow_restart) {
unit_destroy_runtime_data(UNIT(s), &s->exec_context, /* destroy_runtime_dir= */ true);
/* Also get rid of the fd store, if that's configured. */
- if (s->fd_store_preserve_mode == EXEC_PRESERVE_NO)
+ if (s->fd_store_preserve_mode == EXEC_PRESERVE_NO ||
+ (s->fd_store_preserve_mode == EXEC_PRESERVE_ON_SUCCESS && s->state == SERVICE_FAILED))
service_release_fd_store(s);
/* Get rid of the IPC bits of the user */
service_release_extra_fds(s);
s->root_directory_fd = asynchronous_close(s->root_directory_fd);
- if (s->fd_store_preserve_mode != EXEC_PRESERVE_YES)
+ if (IN_SET(s->fd_store_preserve_mode, EXEC_PRESERVE_NO, EXEC_PRESERVE_RESTART) ||
+ (s->fd_store_preserve_mode == EXEC_PRESERVE_ON_SUCCESS && s->state == SERVICE_FAILED))
service_release_fd_store(s);
if (s->state == SERVICE_DEAD_RESOURCES_PINNED && !SERVICE_FD_STORE_POPULATED(s))
ExecPreserveMode,
SD_VARLINK_DEFINE_ENUM_VALUE(no),
SD_VARLINK_DEFINE_ENUM_VALUE(yes),
- SD_VARLINK_DEFINE_ENUM_VALUE(restart));
+ SD_VARLINK_DEFINE_ENUM_VALUE(restart),
+ SD_VARLINK_DEFINE_ENUM_VALUE(on_success));
SD_VARLINK_DEFINE_ENUM_TYPE(
ExecKeyringMode,
run0 -u testuser -i "cat >.config/systemd/user/systemd-nspawn@fdstore.service.d/fdstore.conf <<EOF
[Service]
FileDescriptorStoreMax=8
-FileDescriptorStorePreserve=yes
+FileDescriptorStorePreserve=on-success
EOF"
run0 -u testuser systemctl --user daemon-reload
n_user_at_fds=$(systemctl show -P NFileDescriptorStore "user@${TESTUSER_UID}.service")
test "${n_user_at_fds}" -ge 2
-# 3) Stop the nspawn service: payload is gone but FileDescriptorStorePreserve=yes
+# 3) Stop the nspawn service: payload is gone but FileDescriptorStorePreserve=on-success
# must keep the fds in the user-side fdstore (and propagated copy in PID 1).
run0 -u testuser systemctl --user stop systemd-nspawn@fdstore.service
n_nspawn_fds=$(run0 -u testuser systemctl --user show -P NFileDescriptorStore systemd-nspawn@fdstore.service)
run0 -u testuser systemctl --user start systemd-nspawn@fdstore.service
run0 -u testuser systemctl is-active --user systemd-nspawn@fdstore.service
-run0 -u testuser systemctl --user stop systemd-nspawn@fdstore.service
+# 7) Failure case: with FileDescriptorStorePreserve=on-success, the fdstore must
+# be dropped once the unit enters the permanent failed state (i.e. once all
+# automated restart attempts driven by Restart= are exhausted). The
+# systemd-nspawn@.service template doesn't set Restart=, so killing the inner
+# payload with SIGKILL forces the unit straight into 'failed'.
+timeout 30s bash -c \
+ "until [[ \"\$(run0 -u testuser systemctl --user show -P NFileDescriptorStore systemd-nspawn@fdstore.service)\" -ge 2 ]]; do sleep 0.5; done"
+run0 -u testuser systemctl --user kill --kill-whom=all -s SIGKILL systemd-nspawn@fdstore.service
+timeout 30s bash -c \
+ "until [[ \"\$(run0 -u testuser systemctl --user show -P ActiveState systemd-nspawn@fdstore.service)\" == failed ]]; do sleep 0.5; done"
+# The fdstore must be discarded once the failed state is reached.
+assert_eq "$(run0 -u testuser systemctl --user show -P NFileDescriptorStore systemd-nspawn@fdstore.service)" 0
+assert_eq "$(run0 -u testuser systemctl --user show -P SubState systemd-nspawn@fdstore.service)" failed
+run0 -u testuser systemctl --user reset-failed systemd-nspawn@fdstore.service
+
machinectl terminate fdstore 2>/dev/null || true
loginctl disable-linger testuser
n_fds=$(systemctl show -P NFileDescriptorStore TEST-91-LIVEUPDATE-late-zerofds.service)
test "$n_fds" -eq 0
systemctl start TEST-91-LIVEUPDATE-late-zerofds.service
+
+ # Verify that with FileDescriptorStorePreserve=on-success the fdstore is
+ # discarded once the unit enters the permanent failed state, while still
+ # being preserved across the transitionary failed states that precede
+ # each auto-restart attempt. Use Restart=on-failure with
+ # StartLimitBurst=2 so the manager runs the helper twice before
+ # giving up. The helper:
+ # - on the first attempt pushes an fd into the fdstore, becomes ready,
+ # and then crashes,
+ # - on subsequent attempts asserts that the previously stored fd is
+ # handed back via $LISTEN_FDS (proving the fdstore survived the
+ # auto-restart) and then crashes again.
+ # When the start-limit is hit the unit lands in the permanent failed
+ # state, at which point the fdstore must be empty.
+ cat >/run/TEST-91-LIVEUPDATE-failure.sh <<'EOF'
+#!/usr/bin/env bash
+set -eux
+state_file=/run/TEST-91-LIVEUPDATE-failure.attempt
+attempt=$(cat "$state_file" 2>/dev/null || echo 0)
+attempt=$((attempt + 1))
+echo "$attempt" > "$state_file"
+if [[ "$attempt" -eq 1 ]]; then
+ systemd-notify --fd=0 --fdname=mem </dev/zero
+else
+ # On any restart attempt the fdstore must have been preserved across the
+ # transitionary failed state and handed back to us via $LISTEN_FDS. Drop a
+ # marker file when the invariant is broken so the outer test can detect it.
+ if [[ "${LISTEN_FDS:-0}" -lt 1 ]]; then
+ touch /run/TEST-91-LIVEUPDATE-failure.preserve-broken
+ fi
+fi
+systemd-notify --ready
+# Give PID 1 a chance to process the FDSTORE=1/READY=1 notifications before
+# we exit, so the fdstore add is observed by the manager.
+sleep 0.5
+exit 1
+EOF
+ chmod +x /run/TEST-91-LIVEUPDATE-failure.sh
+ rm -f /run/TEST-91-LIVEUPDATE-failure.attempt \
+ /run/TEST-91-LIVEUPDATE-failure.preserve-broken
+ cat >/run/systemd/system/TEST-91-LIVEUPDATE-failure.service <<EOF
+[Unit]
+StartLimitIntervalSec=60
+StartLimitBurst=2
+[Service]
+Type=notify
+NotifyAccess=all
+FileDescriptorStoreMax=4
+FileDescriptorStorePreserve=on-success
+Restart=on-failure
+RestartSec=100ms
+ExecStart=/run/TEST-91-LIVEUPDATE-failure.sh
+EOF
+ systemctl daemon-reload
+ systemctl start TEST-91-LIVEUPDATE-failure.service || true
+ timeout 60s bash -c \
+ "until [[ \"\$(systemctl show -P ActiveState TEST-91-LIVEUPDATE-failure.service)\" == failed ]]; do sleep 0.5; done"
+ # Sanity: the helper ran more than once, i.e. at least one auto-restart
+ # attempt actually took place.
+ test "$(cat /run/TEST-91-LIVEUPDATE-failure.attempt)" -ge 2
+ # And the in-flight preservation invariant must hold for every restart.
+ test ! -e /run/TEST-91-LIVEUPDATE-failure.preserve-broken
+ # And the fdstore must be empty now that the permanent failed state was
+ # reached, since FileDescriptorStorePreserve=on-success is set.
+ n_fds=$(systemctl show -P NFileDescriptorStore TEST-91-LIVEUPDATE-failure.service)
+ test "$n_fds" -eq 0
+ systemctl reset-failed TEST-91-LIVEUPDATE-failure.service
+ rm -f /run/systemd/system/TEST-91-LIVEUPDATE-failure.service \
+ /run/TEST-91-LIVEUPDATE-failure.sh \
+ /run/TEST-91-LIVEUPDATE-failure.attempt \
+ /run/TEST-91-LIVEUPDATE-failure.preserve-broken
+ systemctl daemon-reload
else
# Create memfds with known content and push them to our fd store.
# Also request a LUO session, store a memfd in it, and push the session fd to the fd store.