---
title: File Descriptor Store
category: Interfaces
layout: default
SPDX-License-Identifier: LGPL-2.1-or-later
---

# The File Descriptor Store

*TL;DR: The systemd service manager may optionally maintain a set of file
descriptors for each service. Those file descriptors are under control of the
service. Storing file descriptors in the manager makes it easier to restart
services without dropping connections or losing state.*

Since its inception `systemd` has supported the *socket* *activation*
mechanism: the service manager creates and listens on some sockets (and similar
UNIX file descriptors) on behalf of a service, and then passes them to the
service during activation of the service via UNIX file descriptor (short: *fd*)
passing over `execve()`. This is primarily exposed in the
[.socket](https://www.freedesktop.org/software/systemd/man/systemd.socket.html)
unit type.

The *file* *descriptor* *store* (short: *fdstore*) extends this concept, and
allows services to *upload* additional fds to the service manager during
runtime, which the manager then keeps on their behalf. File descriptors are
passed back to the service on subsequent activations, the same way as any
socket activation fds are passed.

If a service passes an fd to the fdstore logic of the service manager, the
manager only maintains a duplicate of it (in the sense of UNIX
[`dup(2)`](https://man7.org/linux/man-pages/man2/dup.2.html)). The fd also
remains in possession of the service itself, and the service may (and is
expected to) invoke any operations on it that it likes.

The primary use-case of this logic is to permit services to restart seamlessly
(for example to update them to a newer version), without losing execution
context, dropping pinned resources, terminating established connections or even
just momentarily losing connectivity. In fact, as the file descriptors can be
uploaded freely at any time during the service runtime, this can even be used
to implement services that robustly handle abnormal termination and can recover
from that without losing pinned resources.

Note that Linux supports the
[`memfd`](https://man7.org/linux/man-pages/man2/memfd_create.2.html) concept
that allows associating a memory-backed fd with arbitrary data. A service may
conveniently serialize its state into a memfd and then place that memfd in the
fdstore, in order to implement service restarts with full service state being
passed over.

## Basic Mechanism

The fdstore is enabled per-service via the
[`FileDescriptorStoreMax=`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#FileDescriptorStoreMax=)
service setting. It defaults to zero (which means the fdstore logic is turned
off), but can take an unsigned integer value that controls how many fds the
service may keep in the service manager simultaneously.

If set to a value greater than zero, the fdstore is enabled. Once invoked, the
service may (asynchronously) upload file descriptors to the fdstore via the
[`sd_pid_notify_with_fds()`](https://www.freedesktop.org/software/systemd/man/sd_pid_notify_with_fds.html)
API call (or an equivalent re-implementation). When uploading the fds it is
necessary to set the `FDSTORE=1` field in the message, to indicate what the fd
is intended for. It's recommended to also set the `FDNAME=…` field to any
string of choice, which may be used to identify the fd later.
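The upload step can be sketched as an "equivalent re-implementation" using
only the Python standard library. This is a minimal sketch, not the libsystemd
implementation: it assumes `$NOTIFY_SOCKET` names an `AF_UNIX` datagram socket
(with a leading `@` denoting the abstract namespace) and ignores other address
types. The helper names `fdstore_message()` and `notify_with_fds()` are
hypothetical, and the name check follows the constraints documented in
`sd_notify(3)` as understood here.

```python
import array
import os
import socket


def fdstore_message(name: str) -> bytes:
    """Build the notification payload for storing fds under `name`."""
    # Assumed FDNAME= rules: printable ASCII, no ":", at most 255 characters.
    if (not name or len(name) > 255 or ":" in name
            or not name.isascii() or not name.isprintable()):
        raise ValueError(f"invalid fd name: {name!r}")
    return f"FDSTORE=1\nFDNAME={name}".encode("ascii")


def notify_with_fds(message: bytes, fds, address=None) -> bool:
    """Send `message` plus `fds` to the notification socket.

    Returns False when no socket address is known, i.e. when the process
    is apparently not running under a service manager.
    """
    address = address or os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        # A leading "@" stands for an abstract-namespace socket.
        address = "\0" + address[1:]
    ancillary = []
    if fds:
        # The fds travel as SCM_RIGHTS ancillary data next to the text payload.
        ancillary.append(
            (socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds)))
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendmsg([message], ancillary, 0, address)
    return True
```

A service would call something like
`notify_with_fds(fdstore_message("connection"), [conn.fileno()])` right after
accepting a connection; the name `connection` is only an example.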

Whenever the service is restarted, the fds in its fdstore will be passed to the
new instance following the same protocol as for socket activation fds, i.e. the
`$LISTEN_FDS`, `$LISTEN_PID`, `$LISTEN_FDNAMES` environment variables will be
set (the latter will be populated from the `FDNAME=…` field mentioned
above). See
[`sd_listen_fds()`](https://www.freedesktop.org/software/systemd/man/sd_listen_fds.html)
for details on receiving such fds in a service. (Note that the name set in
`FDNAME=…` does not need to be unique, which is useful when operating with
multiple fully equivalent sockets or similar, for example for a service that
operates on both IPv4 and IPv6 and treats both more or less the same.)
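The receiving side can be sketched without libsystemd as well. The function
below is a simplified stand-in for `sd_listen_fds_with_names()`, not a
faithful port: it assumes passed fds start at fd 3, pads missing names with
`unknown`, and neither sets close-on-exec on the fds nor unsets the
environment variables. The function name and its `environ`/`pid` parameters
are hypothetical and exist only to keep the example testable.

```python
import os

SD_LISTEN_FDS_START = 3  # first passed fd, as documented in sd_listen_fds(3)


def listen_fds_with_names(environ=None, pid=None):
    """Return a list of (name, fd) pairs passed in by the service manager."""
    environ = os.environ if environ is None else environ
    pid = os.getpid() if pid is None else pid
    try:
        # The variables only apply to the process named in $LISTEN_PID.
        if int(environ.get("LISTEN_PID", "")) != pid:
            return []
        count = int(environ.get("LISTEN_FDS", ""))
    except ValueError:
        return []
    if count <= 0:
        return []
    raw = environ.get("LISTEN_FDNAMES")
    names = raw.split(":") if raw else []
    # Assumption of this sketch: pad or truncate so each fd gets one name.
    names = (names + ["unknown"] * count)[:count]
    return [(names[i], SD_LISTEN_FDS_START + i) for i in range(count)]
```

Since names need not be unique, callers should expect several pairs that
share the same name.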

And that's already the gist of it.

## Seamless Service Restarts

A system service that provides a client-facing interface and shall be able to
restart seamlessly can make use of this in a scheme like the following:
whenever a new connection comes in, it immediately uploads the connection's fd
into its fdstore. At appropriate times it also serializes its state into a
memfd that it uploads to the service manager, either whenever the state changed
sufficiently, or simply right before it terminates. (The latter of course means
that state only survives *clean* restarts, and abnormal termination implies the
state is lost completely, while the former means there's a good chance the
next restart after an abnormal termination can continue where it left off,
with only some context lost.)

Using the fdstore for such seamless service restarts is generally recommended
over implementations that attempt to leave a process from the old service
instance around until after the new instance already started, so that the old
then communicates with the new service instance, and passes the fds over
directly. Typically service restarts are a mechanism for implementing *code*
updates, hence leaving two versions of the service running at the same time is
generally problematic. It also collides with the systemd service manager's
general principle of guaranteeing a pristine execution environment, a pristine
security context, and a pristine resource management context for freshly
started services, without uncontrolled "leftovers" from previous runs. For
example: leaving processes from previous runs generally negatively affects
lifecycle management (i.e. `KillMode=none` must be set, which disables large
parts of the service manager's state tracking), resource management (as
resource counters cannot start at zero during service activation anymore,
since the remaining old processes skew them), security policies (as processes
remain that run under possibly out-of-date security policies, be it SELinux,
AppArmor, any other LSM, seccomp or BPF), and similar.

## File Descriptor Store Lifecycle

By default any file descriptor stored in the fdstore for which a `POLLHUP` or
`POLLERR` is seen is automatically closed and removed from the fdstore. This
behavior can be turned off by setting the `FDPOLL=0` field when uploading the
fd via `sd_pid_notify_with_fds()`.

The fdstore is automatically closed whenever the service is fully deactivated
and no jobs are queued for it anymore. This means that a restart job for a
service will leave the fdstore intact, but a separate stop and start job for
it, executed synchronously one after the other, will likely not.

This behavior can be modified via the
[`FileDescriptorStorePreserve=`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#FileDescriptorStorePreserve=)
setting in service unit files. If set to `yes` the fdstore will be kept as long
as the service definition is loaded into memory by the service manager, i.e. as
long as at least one other loaded unit has a reference to it.

The `systemctl clean --what=fdstore …` command may be used to explicitly clear
the fdstore of a service. This is only allowed when the service is fully
deactivated, and is hence primarily useful in case
`FileDescriptorStorePreserve=yes` is set (because the fdstore is otherwise
fully closed anyway in this state).

Individual file descriptors may be removed from the fdstore via the
`sd_notify()` mechanism, by sending an `FDSTOREREMOVE=1` message, accompanied
by an `FDNAME=…` string identifying the fds to remove. (The name does not have
to be unique, as mentioned, in which case *all* matching fds are
closed.) Generally it's a good idea to send such messages to the service
manager during initialization of the service whenever an unrecognized fd is
received, to make the service robust for code updates: if an old version
uploaded an fd that the new version doesn't recognize anymore, it's a good idea
to close it both in the service and in the fdstore.

Note that storing a duplicate of an fd in the fdstore means the resource pinned
by the fd remains pinned even if the service closes its duplicate of the
fd. This in particular means that peers on a connection socket uploaded this
way will not receive an automatic `POLLHUP` event anymore if the service code
issues `close()` on the socket. The service must accompany the `close()` with
an `FDSTOREREMOVE=1` notification to the service manager, so that the fd is
comprehensively closed.
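The removal rules above can be sketched as follows, again as a standard
library re-implementation rather than the libsystemd API. It assumes
`$NOTIFY_SOCKET` refers to an `AF_UNIX` datagram socket. The helper names
`remove_message()`, `notify()`, `close_everywhere()` and
`unrecognized_names()` are hypothetical.

```python
import os
import socket


def remove_message(name: str) -> bytes:
    """Payload asking the manager to drop all stored fds called `name`."""
    return f"FDSTOREREMOVE=1\nFDNAME={name}".encode("ascii")


def notify(message: bytes, address=None) -> bool:
    """Send a plain notification; False if no notification socket is known."""
    address = address or os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False
    if address.startswith("@"):
        address = "\0" + address[1:]  # abstract-namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(message, address)
    return True


def close_everywhere(fd: int, name: str, address=None) -> bool:
    """Close the service's copy and have the manager drop its duplicates.

    Every stored fd called `name` is removed, so per-connection names are
    advisable if connections are to be closed individually.
    """
    os.close(fd)
    return notify(remove_message(name), address)


def unrecognized_names(received, known_names):
    """Names in (name, fd) pairs that this version of the service ignores."""
    return sorted({name for name, _ in received if name not in known_names})
```

During initialization, a service could call `close_everywhere()` for every
pair whose name appears in `unrecognized_names(received, known)`, so that
leftovers from an older version do not stay pinned.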

## Access Control

Access to the fds in the file descriptor store is generally restricted to the
service code itself. Pushing fds into or removing fds from the fdstore is
subject to the access control restrictions of any other `sd_notify()` message,
which is controlled via
[`NotifyAccess=`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#NotifyAccess=).

Hence, by default only the main service process can push or remove fds, but by
setting `NotifyAccess=all` this may be relaxed to allow arbitrary service
child processes to do the same.

## Soft Reboot

The fdstore is particularly interesting in [soft
reboot](https://www.freedesktop.org/software/systemd/man/systemd-soft-reboot.service.html)
scenarios, as per `systemctl soft-reboot` (which restarts userspace like in a
real reboot, but leaves the kernel running). File descriptor stores that remain
loaded at the very end of the system cycle, just before the soft-reboot, are
passed over to the next system cycle, and propagated there to the services they
originate from. This enables updating the full userspace of a system during
runtime, fully replacing all processes without losing pinned resources,
interrupting connectivity or established connections and similar.

This mechanism can be enabled either by making sure the service survives until
the very end (i.e. by setting `DefaultDependencies=no` so that it keeps running
for the whole system lifetime without being regularly deactivated at shutdown)
or by setting `FileDescriptorStorePreserve=yes` (and referencing the unit
continuously).

For further details see [Resource
Pass-Through](https://www.freedesktop.org/software/systemd/man/systemd-soft-reboot.service.html#Resource%20Pass-Through).

## Initrd Transitions

The fdstore may also be used to pass file descriptors for resources from the
initrd context to the main system. Restarting all processes after the
transition is important as code running in the initrd should generally not
continue to run after the switch to the host file system, since that pins
backing files from the initrd, and the initrd might contain different versions
of programs than the host.

Any service that still runs during the initrd→host transition will have its
fdstore passed over the transition, where it will be passed back to any queued
services of the same name.

The soft reboot cycle transition and the initrd→host transition are
semantically very similar, hence similar rules apply, and in both cases it is
recommended to use the fdstore if pinned resources shall be passed over.

## Debugging

The
[`systemd-analyze`](https://www.freedesktop.org/software/systemd/man/systemd-analyze.html#systemd-analyze%20fdstore%20%5BUNIT...%5D)
tool may be used to list the current contents of the fdstore of any running
service.

The
[`systemd-run`](https://www.freedesktop.org/software/systemd/man/systemd-run.html)
tool may be used to quickly start a testing binary or similar as a service. Use
`-p FileDescriptorStoreMax=4711` to enable the fdstore from `systemd-run`'s
command line. By using the `-t` switch you can even interactively communicate
with processes spawned that way, via the TTY.