---
title: The File Descriptor Store
category: Interfaces
layout: default
SPDX-License-Identifier: LGPL-2.1-or-later
---

# The File Descriptor Store

*TL;DR: The systemd service manager may optionally maintain a set of file
descriptors for each service that are under the control of the service and that
help make service restarts without losing connectivity or context easier to
implement.*

Since its inception `systemd` has supported the *socket* *activation*
mechanism: the service manager creates and listens on some sockets (and similar
UNIX file descriptors) on behalf of a service, and then passes them to the
service during activation of the service via UNIX file descriptor (short: *fd*)
passing over `execve()`. This is primarily exposed in the
[.socket](https://www.freedesktop.org/software/systemd/man/systemd.socket.html)
unit type.

The *file* *descriptor* *store* (short: *fdstore*) extends this concept: it
allows services to *upload* additional fds to the service manager at runtime,
which the service manager then keeps on their behalf. These file descriptors
are passed back to the service on subsequent activations, the same way as any
socket activation fds are passed.

If a service passes an fd to the fdstore logic of the service manager, the
service manager only maintains a duplicate of it (in the sense of UNIX
[`dup(2)`](https://man7.org/linux/man-pages/man2/dup.2.html)). The fd also
remains in possession of the service itself, which may (and is expected to)
invoke any operations on it that it likes.

The primary use case of this logic is to permit services to restart seamlessly
(for example to update them to a newer version), without losing execution
context, dropping pinned resources, terminating established connections or even
just momentarily losing connectivity. In fact, as the file descriptors can be
uploaded freely at any time during the service runtime, this can even be used to
implement services that robustly handle abnormal termination and can recover
from that without losing pinned resources.

Note that Linux supports the
[`memfd`](https://man7.org/linux/man-pages/man2/memfd_create.2.html) concept
that allows associating a memory-backed fd with arbitrary data. A service may
conveniently serialize its state into such a memfd and then place it in the
fdstore, in order to implement service restarts with the full service state
being passed over.

# Basic Mechanism

The fdstore is enabled per-service via the
[`FileDescriptorStoreMax=`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#FileDescriptorStoreMax=)
service setting. It defaults to zero (which means the fdstore logic is turned
off), but can take an unsigned integer value that controls how many fds to
permit the service to upload to the service manager to keep simultaneously.

If set to a value greater than zero, the fdstore is enabled. Once invoked, the
service may (asynchronously) upload file descriptors to the fdstore via the
[`sd_pid_notify_with_fds()`](https://www.freedesktop.org/software/systemd/man/sd_pid_notify_with_fds.html)
API call (or an equivalent reimplementation). When uploading the fds it is
necessary to set the `FDSTORE=1` field in the message, to indicate what the fd
is intended for. It's recommended to also set the `FDNAME=…` field to any
string of choice, which may be used to identify the fd later.
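
For services that do not link against libsystemd, a reimplementation of the
upload can be fairly small. The following Python sketch (Python 3.9 or newer,
for `socket.send_fds()`) sends a datagram to the `AF_UNIX` socket named in
`$NOTIFY_SOCKET` and attaches the fd via `SCM_RIGHTS`. The function names and
the `address` parameter are hypothetical, and the sketch deliberately ignores
other socket address families and most error handling.

```python
import os
import socket
from typing import Optional, Sequence


def notify_with_fds(message: str, fds: Sequence[int],
                    address: Optional[str] = None) -> bool:
    """Send an sd_notify()-style datagram with fds attached via SCM_RIGHTS."""
    address = address or os.environ.get("NOTIFY_SOCKET")
    if not address:
        return False  # apparently not started by a service manager
    if address.startswith("@"):
        # A leading "@" denotes a Linux abstract namespace socket.
        address = "\0" + address[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(address)
        socket.send_fds(sock, [message.encode("utf-8")], list(fds))
    return True


def fdstore_upload(fd: int, name: str, address: Optional[str] = None) -> bool:
    """Upload one fd into the fdstore under the given name."""
    return notify_with_fds(f"FDSTORE=1\nFDNAME={name}", [fd], address)
```

A service would typically call something like `fdstore_upload(conn.fileno(),
"connection")` right after accepting a connection, and keep using its own copy
of the fd as before.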

Whenever the service is restarted the fds in its fdstore will be passed to the
new instance following the same protocol as for socket activation fds, i.e. the
`$LISTEN_FDS`, `$LISTEN_PID`, `$LISTEN_FDNAMES` environment variables will be
set (the last of these will be populated from the `FDNAME=…` field mentioned
above). See
[`sd_listen_fds()`](https://www.freedesktop.org/software/systemd/man/sd_listen_fds.html)
for details on receiving such fds in a service. (Note that the name set in
`FDNAME=…` does not need to be unique, which is useful when operating with
multiple fully equivalent sockets or similar, for example for a service that
operates on both IPv4 and IPv6 and treats both more or less the same.)
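
On the receiving side, the environment variables can be decoded roughly as in
the following Python sketch. It assumes the socket activation protocol as
documented for `sd_listen_fds()`: passed fds start at fd 3, and names are
separated by colons. The function name `listen_fds_with_names()` and its
`environ` and `pid` parameters are hypothetical and exist mainly to make the
logic easy to test; the `unknown` fallback name is an assumption modelled on
my understanding of libsystemd's behaviour.

```python
import os

SD_LISTEN_FDS_START = 3  # first fd passed in under the sd_listen_fds() protocol


def listen_fds_with_names(environ=os.environ, pid=None):
    """Return a list of (fd, name) pairs handed over by the service manager."""
    pid = os.getpid() if pid is None else pid
    if environ.get("LISTEN_PID") != str(pid):
        return []  # variables are absent or meant for another process
    count = int(environ.get("LISTEN_FDS", "0"))
    raw_names = environ.get("LISTEN_FDNAMES")
    names = raw_names.split(":") if raw_names else []
    # Pad with a placeholder if fewer names than fds were supplied.
    names += ["unknown"] * (count - len(names))
    return [(SD_LISTEN_FDS_START + i, names[i]) for i in range(count)]
```

Because names need not be unique, the result is kept as a list of pairs rather
than a dictionary keyed by name.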

And that's already the gist of it.

# Seamless Service Restarts

A system service that provides a client-facing interface that shall be able to
seamlessly restart can make use of this in a scheme like the following:
whenever a new connection comes in it uploads its fd immediately into its
fdstore. At appropriate times it also serializes its state into a memfd that it
uploads to the service manager, either whenever the state changed
sufficiently, or simply right before it terminates. (The latter of course means
that state only survives on *clean* restarts and abnormal termination implies
the state is lost completely, while the former would mean there's a good chance
the next restart after an abnormal termination could continue where it left off
with only some context lost.)

Using the fdstore for such seamless service restarts is generally recommended
over implementations that attempt to leave a process from the old service
instance around until after the new instance has already started, so that the
old instance then communicates with the new service instance and passes the fds
over directly. Typically service restarts are a mechanism for implementing
*code* updates, hence leaving two versions of the service running at the same
time is generally problematic. It also collides with the systemd service
manager's general principle of guaranteeing a pristine execution environment, a
pristine security context, and a pristine resource management context for
freshly started services, without uncontrolled "left-overs" from previous runs.
For example: leaving processes from previous runs generally negatively affects
lifecycle management (i.e. `KillMode=none` must be set, which disables large
parts of the service manager's state tracking), resource management (as
resource counters cannot start at zero during service activation anymore, since
the old processes remaining skew them), security policies (as processes with
possibly out-of-date security policies, be it SELinux, AppArmor, any LSM,
seccomp or BPF, effectively remain), and similar.

# File Descriptor Store Lifecycle

By default any file descriptor stored in the fdstore for which a `POLLHUP` or
`POLLERR` is seen is automatically closed and removed from the fdstore. This
behaviour can be turned off by setting the `FDPOLL=0` field when uploading the
fd via `sd_pid_notify_with_fds()`.
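
As a small illustration, the notification text for an upload with and without
polling could be assembled as in the following Python sketch. The helper name
`fdstore_message()` and its `poll` parameter are hypothetical; only the field
names come from the text above.

```python
def fdstore_message(name, poll=True):
    """Build the notification text for uploading one fd into the fdstore."""
    fields = ["FDSTORE=1", f"FDNAME={name}"]
    if not poll:
        # Ask the service manager not to drop the fd on POLLHUP/POLLERR.
        fields.append("FDPOLL=0")
    # Fields of a notification message are separated by newlines.
    return "\n".join(fields)
```

The resulting string is what would be passed as the message to
`sd_pid_notify_with_fds()` or an equivalent reimplementation, together with
the fd itself.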

The fdstore is automatically closed whenever the service is fully deactivated
and no jobs are queued for it anymore. This means that a restart job for a
service will leave the fdstore intact, but a separate stop and start job for
it (executed synchronously one after the other) will likely not.

This behaviour can be modified via the
[`FileDescriptorStorePreserve=`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#FileDescriptorStorePreserve=)
setting in service unit files. If set to `yes` the fdstore will be kept as long
as the service definition is loaded into memory by the service manager, i.e. as
long as at least one other loaded unit has a reference to it.

The `systemctl clean --what=fdstore …` command may be used to explicitly clear
the fdstore of a service. This is only allowed when the service is fully
deactivated, and is hence primarily useful in case
`FileDescriptorStorePreserve=yes` is set (because the fdstore is otherwise
fully closed anyway in this state).

Individual file descriptors may be removed from the fdstore via the
`sd_notify()` mechanism, by sending an `FDSTOREREMOVE=1` message, accompanied
by an `FDNAME=…` string identifying the fds to remove. (The name does not have
to be unique, as mentioned, in which case *all* matching fds are
closed.) Generally it's a good idea to send such messages to the service
manager during initialization of the service whenever an unrecognized fd is
received, to make the service robust for code updates: if an old version
uploaded an fd that the new version doesn't recognize anymore it's a good idea
to close it both in the service and in the fdstore.
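
The clean-up at initialization described above might look like the following
Python sketch. It takes the `(fd, name)` pairs received at startup, closes the
ones whose name is not recognized, and returns the `FDSTOREREMOVE=1` messages
that still need to be sent to the service manager. The function name
`drop_unrecognized()` and the names in `KNOWN_NAMES` are hypothetical.

```python
import os

KNOWN_NAMES = frozenset({"connection", "state"})  # hypothetical fd names


def drop_unrecognized(received, known=KNOWN_NAMES):
    """Close fds with unknown names; return (kept pairs, removal messages)."""
    kept = []
    stale_names = set()
    for fd, name in received:
        if name in known:
            kept.append((fd, name))
        else:
            os.close(fd)  # drop the service's own copy
            stale_names.add(name)
    # One message per name is enough: removal applies to all fds of that name.
    messages = [f"FDSTOREREMOVE=1\nFDNAME={name}" for name in sorted(stale_names)]
    return kept, messages
```

Each returned message would then be sent through the `sd_notify()` mechanism,
so that the duplicate held in the fdstore is released as well.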

Note that storing a duplicate of an fd in the fdstore means the fd remains
pinned even if the service closes it. This in particular means that peers on a
connection socket uploaded this way will not receive an automatic `POLLHUP`
event anymore if the service code issues `close()` on the socket. The service
must accompany the `close()` with an `FDSTOREREMOVE=1` notification to the
service manager, so that the fd is comprehensively closed.

# Access Control

Access to the fds in the file descriptor store is generally restricted to the
service code itself. Pushing fds into or removing fds from the fdstore is
subject to the same access control restrictions as any other `sd_notify()`
message, which are controlled via
[`NotifyAccess=`](https://www.freedesktop.org/software/systemd/man/systemd.service.html#NotifyAccess=).

By default only the main service process can hence push/remove fds, but by
setting `NotifyAccess=all` this may be relaxed to allow arbitrary service
child processes to do the same.

# Soft Reboot

The fdstore is particularly interesting in [soft
reboot](https://www.freedesktop.org/software/systemd/man/systemd-soft-reboot.service.html)
scenarios, as per `systemctl soft-reboot` (which restarts userspace like in a
real reboot, but leaves the kernel running). File descriptor stores that remain
loaded at the very end of the system cycle (just before the soft-reboot) are
passed over to the next system cycle, and propagated to the services they
originate from there. This enables updating the full userspace of a system
during runtime, fully replacing all processes without losing pinned resources,
interrupting connectivity or established connections and similar.

This mechanism can be enabled either by making sure the service survives until
the very end (i.e. by setting `DefaultDependencies=no` so that it keeps running
for the whole system lifetime without being regularly deactivated at shutdown)
or by setting `FileDescriptorStorePreserve=yes` (and referencing the unit
continuously).

# Debugging

The
[`systemd-analyze fdstore`](https://www.freedesktop.org/software/systemd/man/systemd-analyze.html#systemd-analyze%20fdstore%20%5BUNIT...%5D)
command may be used to list the current contents of the fdstore of any running
service.

The
[`systemd-run`](https://www.freedesktop.org/software/systemd/man/systemd-run.html)
tool may be used to quickly start a testing binary or similar as a service. Use
`-p FileDescriptorStoreMax=4711` to enable the fdstore from `systemd-run`'s
command line. By using the `-t` switch you can even interactively communicate
with processes spawned that way, via the TTY.