From: Luca Boccassi Date: Mon, 18 May 2026 14:47:32 +0000 (+0100) Subject: core: add FileDescriptorStorePreserve=on-success option X-Git-Tag: v261-rc1~43^2 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=67dea1409b7083b6fa78d7adb752d13ce3e0909b;p=thirdparty%2Fsystemd.git core: add FileDescriptorStorePreserve=on-success option Currently with FileDescriptorStorePreserve=yes the FD store is kept around regardless of what happens to a unit, which is useful in many cases. But in some cases, for example when complex services crash horribly, it's hard to reason about what was in the intermediate state, and it's better to start fresh. Add a new 'on-success' option for the FileDescriptorStorePreserve= setting that keeps it around only for as long as the unit doesn't go to a persistently failed state. This is especially useful in combination with LUO, where we don't want to keep around LUO sessions created by units that then proceeded to crash and burn, and might be in a bad state afterwards. --- diff --git a/docs/FILE_DESCRIPTOR_STORE.md b/docs/FILE_DESCRIPTOR_STORE.md index c3336453781..6e174451059 100644 --- a/docs/FILE_DESCRIPTOR_STORE.md +++ b/docs/FILE_DESCRIPTOR_STORE.md @@ -123,7 +123,10 @@ This behavior can be modified via the [`FileDescriptorStorePreserve=`](https://www.freedesktop.org/software/systemd/man/latest/systemd.service.html#FileDescriptorStorePreserve=) setting in service unit files. If set to `yes` the fdstore will be kept as long as the service definition is loaded into memory by the service manager, i.e. as -long as at least one other loaded unit has a reference to it. +long as at least one other loaded unit has a reference to it. If set to +`on-success` the behaviour is the same as `yes`, except that the fdstore is +discarded once the service enters the permanent `failed` state, i.e. after all +automated restart attempts driven by `Restart=` have been exhausted. The `systemctl clean --what=fdstore …` command may be used to explicitly clear the fdstore of a service. This is only allowed when the service is fully diff --git a/man/systemd.service.xml b/man/systemd.service.xml index 028c1144b08..0c5f222bb7a 100644 --- a/man/systemd.service.xml +++ b/man/systemd.service.xml @@ -1245,29 +1245,36 @@ RestartMaxDelaySec=160s FileDescriptorStorePreserve= Takes one of no, yes, - restart and controls when to release the service's file descriptor store - (i.e. when to close the contained file descriptors, if any). If set to no the - file descriptor store is automatically released when the service is stopped; if - restart (the default) it is kept around as long as the unit is neither inactive - nor failed, or a job is queued for the service, or the service is expected to be restarted. If - yes the file descriptor store is kept around and garbage collection of the unit - is disabled. The latter is useful to keep entries in the file descriptor store pinned until the unit - is removed, the service manager exits, or the file descriptors get EPOLLHUP or - EPOLLERR. - - When set to yes, and the service is itself running under another service - manager (e.g. a service of user@.service, or a payload inside + restart, on-success and controls when to release the + service's file descriptor store (i.e. when to close the contained file descriptors, if any). If set + to no the file descriptor store is automatically released when the service is + stopped; if restart (the default) it is kept around as long as the unit is + neither inactive nor failed, or a job is queued for the service, or the service is expected to be + restarted. If yes the file descriptor store is kept around and garbage + collection of the unit is disabled. The latter is useful to keep entries in the file descriptor + store pinned until the unit is removed, the service manager exits, or the file descriptors get + EPOLLHUP or EPOLLERR. If on-success + the behaviour is identical to yes, except that the file descriptor store is + discarded if the unit enters the permanent failed state (i.e. once all automated + restart attempts driven by Restart= have been exhausted). The store is preserved + across the transitionary failed states that precede each individual auto-restart attempt. + + When set to yes or on-success, and the service is + itself running under another service manager (e.g. a service of user@.service, + or a payload inside systemd-nspawn1), file descriptors pushed into the store are also forwarded one level up via the enveloping manager's $NOTIFY_SOCKET, tagged with the originating unit id, so that they are preserved across restarts of the inner manager and handed back to the originating unit when it is started again. For this to take effect, the enveloping unit must itself enable - FileDescriptorStoreMax= and FileDescriptorStorePreserve=yes. + FileDescriptorStoreMax= and a non-no/restart + value for FileDescriptorStorePreserve=. See the File Descriptor Store overview for details. - Setting this to yes also ensures the file descriptor store is kept loaded - across a kexec-based reboot on kernels supporting the Setting this to yes or on-success also ensures the + file descriptor store is kept loaded across a kexec-based reboot on kernels + supporting the Live Update Orchestrator, so that compatible file descriptors (such as memfd_create2) diff --git a/src/core/execute.c b/src/core/execute.c index 7935da74316..ff6ef2de045 100644 --- a/src/core/execute.c +++ b/src/core/execute.c @@ -3182,9 +3182,10 @@ static const char* const exec_utmp_mode_table[_EXEC_UTMP_MODE_MAX] = { DEFINE_STRING_TABLE_LOOKUP(exec_utmp_mode, ExecUtmpMode); static const char* const exec_preserve_mode_table[_EXEC_PRESERVE_MODE_MAX] = { - [EXEC_PRESERVE_NO] = "no", - [EXEC_PRESERVE_YES] = "yes", - [EXEC_PRESERVE_RESTART] = "restart", + [EXEC_PRESERVE_NO] = "no", + [EXEC_PRESERVE_YES] = "yes", + [EXEC_PRESERVE_RESTART] = "restart", + [EXEC_PRESERVE_ON_SUCCESS] = "on-success", }; DEFINE_STRING_TABLE_LOOKUP_WITH_BOOLEAN(exec_preserve_mode, ExecPreserveMode, EXEC_PRESERVE_YES); diff --git a/src/core/execute.h b/src/core/execute.h index 4553ce9d84d..109dd774fcc 100644 --- a/src/core/execute.h +++ b/src/core/execute.h @@ -61,6 +61,7 @@ typedef enum ExecPreserveMode { EXEC_PRESERVE_NO, EXEC_PRESERVE_YES, EXEC_PRESERVE_RESTART, + EXEC_PRESERVE_ON_SUCCESS, _EXEC_PRESERVE_MODE_MAX, _EXEC_PRESERVE_MODE_INVALID = -EINVAL, } ExecPreserveMode; diff --git a/src/core/luo.c b/src/core/luo.c index 929a0e3f495..91c3a01dad8 100644 --- a/src/core/luo.c +++ b/src/core/luo.c @@ -204,7 +204,7 @@ int manager_luo_serialize_fd_stores(Manager *m, FILE **ret_f, FDSet **ret_fds) { s = SERVICE(u); - if (s->fd_store_preserve_mode != EXEC_PRESERVE_YES) + if (!IN_SET(s->fd_store_preserve_mode, EXEC_PRESERVE_YES, EXEC_PRESERVE_ON_SUCCESS)) continue; if (!s->fd_store) diff --git a/src/core/service.c b/src/core/service.c index 6a02ca8d6f9..6997866f883 100644 --- a/src/core/service.c +++ b/src/core/service.c @@ -699,7 +699,7 @@ int service_add_fd_store(Service *s, int fd_in, const char *name, bool do_poll, * unit and the original fdname. This way fdstore persistence chains all the way up to whichever * entity is ultimately responsible for surviving across kexec/restart, regardless of fdname * length or charset constraints. */ - if (propagate_upstream && s->fd_store_preserve_mode == EXEC_PRESERVE_YES) { + if (propagate_upstream && IN_SET(s->fd_store_preserve_mode, EXEC_PRESERVE_YES, EXEC_PRESERVE_ON_SUCCESS)) { Manager *m = ASSERT_PTR(UNIT(s)->manager); char idx_str[STRLEN(SERVICE_FDSTORE_SUB_FDNAME_PREFIX) + DECIMAL_STR_MAX(uint64_t)]; @@ -721,7 +721,7 @@ int service_add_fd_store(Service *s, int fd_in, const char *name, bool do_poll, LIST_PREPEND(fd_store, s->fd_store, TAKE_PTR(fs)); s->n_fd_store++; - if (propagate_upstream && s->fd_store_preserve_mode == EXEC_PRESERVE_YES) + if (propagate_upstream && IN_SET(s->fd_store_preserve_mode, EXEC_PRESERVE_YES, EXEC_PRESERVE_ON_SUCCESS)) /* Refresh the JSON mapping memfd so the supervisor can resolve the new index. Do this * after LIST_PREPEND so the new entry is visible to the helper. */ (void) service_propagate_fd_store_mapping_upstream(UNIT(s)->manager); @@ -876,7 +876,7 @@ static int service_attach_external_fd_to_fdstore(Unit *u, int fd, const char *fd if (r > 0 && s->state == SERVICE_DEAD && s->deserialized_state == SERVICE_DEAD && - s->fd_store_preserve_mode == EXEC_PRESERVE_YES) { + IN_SET(s->fd_store_preserve_mode, EXEC_PRESERVE_YES, EXEC_PRESERVE_ON_SUCCESS)) { service_set_state(s, SERVICE_DEAD_RESOURCES_PINNED); s->deserialized_state = SERVICE_DEAD_RESOURCES_PINNED; } @@ -2350,7 +2350,7 @@ static bool service_will_restart(Unit *u) { static ServiceState service_determine_dead_state(Service *s) { assert(s); - return SERVICE_FD_STORE_POPULATED(s) && s->fd_store_preserve_mode == EXEC_PRESERVE_YES ? SERVICE_DEAD_RESOURCES_PINNED : SERVICE_DEAD; + return SERVICE_FD_STORE_POPULATED(s) && IN_SET(s->fd_store_preserve_mode, EXEC_PRESERVE_YES, EXEC_PRESERVE_ON_SUCCESS) ? SERVICE_DEAD_RESOURCES_PINNED : SERVICE_DEAD; } static void service_enter_dead(Service *s, ServiceResult f, bool allow_restart) { @@ -2454,7 +2454,8 @@ static void service_enter_dead(Service *s, ServiceResult f, bool allow_restart) unit_destroy_runtime_data(UNIT(s), &s->exec_context, /* destroy_runtime_dir= */ true); /* Also get rid of the fd store, if that's configured. */ - if (s->fd_store_preserve_mode == EXEC_PRESERVE_NO) + if (s->fd_store_preserve_mode == EXEC_PRESERVE_NO || + (s->fd_store_preserve_mode == EXEC_PRESERVE_ON_SUCCESS && s->state == SERVICE_FAILED)) service_release_fd_store(s); /* Get rid of the IPC bits of the user */ @@ -6119,7 +6120,8 @@ static void service_release_resources(Unit *u) { service_release_extra_fds(s); s->root_directory_fd = asynchronous_close(s->root_directory_fd); - if (s->fd_store_preserve_mode != EXEC_PRESERVE_YES) + if (IN_SET(s->fd_store_preserve_mode, EXEC_PRESERVE_NO, EXEC_PRESERVE_RESTART) || + (s->fd_store_preserve_mode == EXEC_PRESERVE_ON_SUCCESS && s->state == SERVICE_FAILED)) service_release_fd_store(s); if (s->state == SERVICE_DEAD_RESOURCES_PINNED && !SERVICE_FD_STORE_POPULATED(s)) diff --git a/src/shared/varlink-io.systemd.Unit.c b/src/shared/varlink-io.systemd.Unit.c index 2ed91121e48..0c63725c750 100644 --- a/src/shared/varlink-io.systemd.Unit.c +++ b/src/shared/varlink-io.systemd.Unit.c @@ -25,7 +25,8 @@ SD_VARLINK_DEFINE_ENUM_TYPE( ExecPreserveMode, SD_VARLINK_DEFINE_ENUM_VALUE(no), SD_VARLINK_DEFINE_ENUM_VALUE(yes), - SD_VARLINK_DEFINE_ENUM_VALUE(restart)); + SD_VARLINK_DEFINE_ENUM_VALUE(restart), + SD_VARLINK_DEFINE_ENUM_VALUE(on_success)); SD_VARLINK_DEFINE_ENUM_TYPE( ExecKeyringMode, diff --git a/test/units/TEST-13-NSPAWN.unpriv.sh b/test/units/TEST-13-NSPAWN.unpriv.sh index f14e3121d41..31a75eae3f1 100755 --- a/test/units/TEST-13-NSPAWN.unpriv.sh +++ b/test/units/TEST-13-NSPAWN.unpriv.sh @@ -343,7 +343,7 @@ run0 -u testuser mkdir -p ".config/systemd/user/systemd-nspawn@fdstore.service.d run0 -u testuser -i "cat >.config/systemd/user/systemd-nspawn@fdstore.service.d/fdstore.conf </dev/null || true loginctl disable-linger testuser diff --git a/test/units/TEST-91-LIVEUPDATE.sh b/test/units/TEST-91-LIVEUPDATE.sh index 802aaae1b5d..cd4bc3bc370 100755 --- a/test/units/TEST-91-LIVEUPDATE.sh +++ b/test/units/TEST-91-LIVEUPDATE.sh @@ -138,6 +138,78 @@ EOF n_fds=$(systemctl show -P NFileDescriptorStore TEST-91-LIVEUPDATE-late-zerofds.service) test "$n_fds" -eq 0 systemctl start TEST-91-LIVEUPDATE-late-zerofds.service + + # Verify that with FileDescriptorStorePreserve=on-success the fdstore is + # discarded once the unit enters the permanent failed state, while still + # being preserved across the transitionary failed states that precede + # each automated auto-restart attempt. Use Restart=on-failure with + # StartLimitBurst=2 so the manager runs the helper twice before + # giving up. The helper: + # - on the first attempt pushes an fd into the fdstore, becomes ready, + # and then crashes, + # - on subsequent attempts asserts that the previously stored fd is + # handed back via $LISTEN_FDS (proving the fdstore survived the + # auto-restart) and then crashes again. + # When the start-limit is hit the unit lands in the permanent failed + # state, at which point the fdstore must be empty. + cat >/run/TEST-91-LIVEUPDATE-failure.sh <<'EOF' +#!/usr/bin/env bash +set -eux +state_file=/run/TEST-91-LIVEUPDATE-failure.attempt +attempt=$(cat "$state_file" 2>/dev/null || echo 0) +attempt=$((attempt + 1)) +echo "$attempt" > "$state_file" +if [[ "$attempt" -eq 1 ]]; then + systemd-notify --fd=0 --fdname=mem /run/systemd/system/TEST-91-LIVEUPDATE-failure.service <