Commit | Line | Data |
---|---|---|
514094f9 | 1 | <?xml version='1.0'?> |
3a54a157 | 2 | <!DOCTYPE refentry PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" |
7a8aa0ec | 3 | "http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" [ |
5fadff33 ZJS |
4 | <!ENTITY % entities SYSTEM "custom-entities.ent" > |
5 | %entities; | |
7a8aa0ec | 6 | ]> |
db9ecf05 | 7 | <!-- SPDX-License-Identifier: LGPL-2.1-or-later --> |
8f7a3c14 | 8 | |
dfdebb1b | 9 | <refentry id="systemd-nspawn" |
798d3a52 ZJS |
10 | xmlns:xi="http://www.w3.org/2001/XInclude"> |
11 | ||
12 | <refentryinfo> | |
13 | <title>systemd-nspawn</title> | |
14 | <productname>systemd</productname> | |
798d3a52 ZJS |
15 | </refentryinfo> |
16 | ||
17 | <refmeta> | |
18 | <refentrytitle>systemd-nspawn</refentrytitle> | |
19 | <manvolnum>1</manvolnum> | |
20 | </refmeta> | |
21 | ||
22 | <refnamediv> | |
23 | <refname>systemd-nspawn</refname> | |
a7e2e50d | 24 | <refpurpose>Spawn a command or OS in a light-weight container</refpurpose> |
798d3a52 ZJS |
25 | </refnamediv> |
26 | ||
27 | <refsynopsisdiv> | |
28 | <cmdsynopsis> | |
29 | <command>systemd-nspawn</command> | |
30 | <arg choice="opt" rep="repeat">OPTIONS</arg> | |
31 | <arg choice="opt"><replaceable>COMMAND</replaceable> | |
32 | <arg choice="opt" rep="repeat">ARGS</arg> | |
33 | </arg> | |
34 | </cmdsynopsis> | |
35 | <cmdsynopsis> | |
36 | <command>systemd-nspawn</command> | |
4447e799 | 37 | <arg choice="plain">--boot</arg> |
798d3a52 ZJS |
38 | <arg choice="opt" rep="repeat">OPTIONS</arg> |
39 | <arg choice="opt" rep="repeat">ARGS</arg> | |
40 | </cmdsynopsis> | |
41 | </refsynopsisdiv> | |
42 | ||
43 | <refsect1> | |
44 | <title>Description</title> | |
45 | ||
b09c0bba LP |
46 | <para><command>systemd-nspawn</command> may be used to run a command or OS in a light-weight namespace |
47 | container. In many ways it is similar to <citerefentry | |
48 | project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>1</manvolnum></citerefentry>, but more powerful | |
49 | since it fully virtualizes the file system hierarchy, as well as the process tree, the various IPC subsystems and | |
50 | the host and domain name.</para> | |
51 | ||
5164c3b4 | 52 | <para><command>systemd-nspawn</command> may be invoked on any directory tree containing an operating system tree, |
b09c0bba | 53 | using the <option>--directory=</option> command line option. By using the <option>--machine=</option> option an OS |
5164c3b4 | 54 | tree is automatically searched for in a couple of locations, most importantly in |
3b121157 | 55 | <filename>/var/lib/machines/</filename>, the suggested directory to place OS container images installed on the |
b09c0bba LP |
56 | system.</para> |
57 | ||
58 | <para>In contrast to <citerefentry | |
59 | project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>1</manvolnum></citerefentry>, <command>systemd-nspawn</command> | |
60 | may be used to boot full Linux-based operating systems in a container.</para> | |
61 | ||
62 | <para><command>systemd-nspawn</command> limits access to various kernel interfaces in the container to read-only, | |
3b121157 | 63 | such as <filename>/sys/</filename>, <filename>/proc/sys/</filename> or <filename>/sys/fs/selinux/</filename>. The |
b09c0bba LP |
64 | host's network interfaces and the system clock may not be changed from within the container. Device nodes may not |
65 | be created. The host system cannot be rebooted and kernel modules may not be loaded from within the | |
798d3a52 ZJS |
66 | container.</para> |
67 | ||
b09c0bba LP |
68 | <para>Use a tool like <citerefentry |
69 | project='mankier'><refentrytitle>dnf</refentrytitle><manvolnum>8</manvolnum></citerefentry>, <citerefentry | |
70 | project='die-net'><refentrytitle>debootstrap</refentrytitle><manvolnum>8</manvolnum></citerefentry>, or | |
71 | <citerefentry project='archlinux'><refentrytitle>pacman</refentrytitle><manvolnum>8</manvolnum></citerefentry> to | |
72 | set up an OS directory tree suitable as file system hierarchy for <command>systemd-nspawn</command> containers. See | |
73 | the Examples section below for details on suitable invocation of these commands.</para> | |
74 | ||
75 | <para>As a safety check <command>systemd-nspawn</command> will verify the existence of | |
76 | <filename>/usr/lib/os-release</filename> or <filename>/etc/os-release</filename> in the container tree before | |
926f2a04 | 77 | booting a container (see |
b09c0bba LP |
78 | <citerefentry><refentrytitle>os-release</refentrytitle><manvolnum>5</manvolnum></citerefentry>). It might be |
79 | necessary to add this file to the container tree manually if the OS of the container is too old to contain this | |
798d3a52 | 80 | file out-of-the-box.</para> |
b09c0bba LP |
81 | |
82 | <para><command>systemd-nspawn</command> may be invoked directly from the interactive command line or run as a system | |
83 | service in the background. In this mode each container instance runs as its own service instance; a default | |
84 | template unit file <filename>systemd-nspawn@.service</filename> is provided to make this easy, taking the container | |
85 | name as instance identifier. Note that different default options apply when <command>systemd-nspawn</command> is | |
6dd6a9c4 | 86 | invoked by the template unit file than interactively on the command line. Most importantly the template unit file |
b47013fd BF |
87 | makes use of the <option>--boot</option> option which is not the default in case <command>systemd-nspawn</command> |
88 | is invoked from the interactive command line. Further differences with the defaults are documented along with the | |
b09c0bba LP |
89 | various supported options below.</para> |
90 | ||
91 | <para>The <citerefentry><refentrytitle>machinectl</refentrytitle><manvolnum>1</manvolnum></citerefentry> tool may | |
92 | be used to execute a number of operations on containers. In particular it provides easy-to-use commands to run | |
93 | containers as system services using the <filename>systemd-nspawn@.service</filename> template unit | |
94 | file.</para> | |
95 | ||
96 | <para>Along with each container a settings file with the <filename>.nspawn</filename> suffix may exist, containing | |
97 | additional settings to apply when running the container. See | |
98 | <citerefentry><refentrytitle>systemd.nspawn</refentrytitle><manvolnum>5</manvolnum></citerefentry> for | |
99 | details. Settings files override the default options used by the <filename>systemd-nspawn@.service</filename> | |
100 | template unit file, making it usually unnecessary to alter this template file directly.</para> | |
101 | ||
102 | <para>Note that <command>systemd-nspawn</command> will mount file systems private to the container to | |
3b121157 | 103 | <filename>/dev/</filename>, <filename>/run/</filename> and similar. These will not be visible outside of the |
b09c0bba LP |
104 | container, and their contents will be lost when the container exits.</para> |
105 | ||
106 | <para>Note that running two <command>systemd-nspawn</command> containers from the same directory tree will not make | |
107 | processes in them see each other. The PID namespace separation of the two containers is complete and the containers | |
3a9d9f2a | 108 | will share very few runtime objects except for the underlying file system. Rather use |
b09c0bba LP |
109 | <citerefentry><refentrytitle>machinectl</refentrytitle><manvolnum>1</manvolnum></citerefentry>'s |
110 | <command>login</command> or <command>shell</command> commands to request an additional login session in a running | |
111 | container.</para> | |
112 | ||
113 | <para><command>systemd-nspawn</command> implements the <ulink | |
53dc5fbc | 114 | url="https://systemd.io/CONTAINER_INTERFACE">Container Interface</ulink> specification.</para> |
b09c0bba LP |
115 | |
116 | <para>While running, containers invoked with <command>systemd-nspawn</command> are registered with the | |
117 | <citerefentry><refentrytitle>systemd-machined</refentrytitle><manvolnum>8</manvolnum></citerefentry> service that | |
118 | keeps track of running containers, and provides programming interfaces to interact with them.</para> | |
798d3a52 ZJS |
119 | </refsect1> |
120 | ||
121 | <refsect1> | |
122 | <title>Options</title> | |
123 | ||
b47013fd | 124 | <para>If option <option>--boot</option> is specified, the arguments |
3f2d1365 | 125 | are used as arguments for the init program. Otherwise, |
798d3a52 ZJS |
126 | <replaceable>COMMAND</replaceable> specifies the program to launch |
127 | in the container, and the remaining arguments are used as | |
b09c0bba | 128 | arguments for this program. If <option>--boot</option> is not used and |
ff9b60f3 | 129 | no arguments are specified, a shell is launched in the |
798d3a52 ZJS |
130 | container.</para> |
131 | ||
132 | <para>The following options are understood:</para> | |
133 | ||
134 | <variablelist> | |
d99058c9 LP |
135 | |
136 | <varlistentry> | |
137 | <term><option>-q</option></term> | |
138 | <term><option>--quiet</option></term> | |
139 | ||
140 | <listitem><para>Turns off any status output by the tool | |
141 | itself. When this switch is used, the only output from nspawn | |
142 | will be the console output of the container OS | |
143 | itself.</para></listitem> | |
144 | </varlistentry> | |
145 | ||
146 | <varlistentry> | |
147 | <term><option>--settings=</option><replaceable>MODE</replaceable></term> | |
148 | ||
149 | <listitem><para>Controls whether | |
150 | <command>systemd-nspawn</command> shall search for and use | |
151 | additional per-container settings from | |
152 | <filename>.nspawn</filename> files. Takes a boolean or the | |
153 | special values <option>override</option> or | |
154 | <option>trusted</option>.</para> | |
155 | ||
156 | <para>If enabled (the default), a settings file named after the | |
157 | machine (as specified with the <option>--machine=</option> | |
158 | setting, or derived from the directory or image file name) | |
159 | with the suffix <filename>.nspawn</filename> is searched in | |
160 | <filename>/etc/systemd/nspawn/</filename> and | |
161 | <filename>/run/systemd/nspawn/</filename>. If it is found | |
162 | there, its settings are read and used. If it is not found | |
163 | there, it is subsequently searched in the same directory as the | |
164 | image file or in the immediate parent of the root directory of | |
165 | the container. In this case, if the file is found, its settings | |
166 | will also be read and used, but potentially unsafe settings | |
167 | are ignored. Note that in both these cases, settings on the | |
168 | command line take precedence over the corresponding settings | |
169 | from loaded <filename>.nspawn</filename> files, if both are | |
170 | specified. Settings are considered unsafe if they | |
171 | elevate the container's privileges or grant access to | |
172 | additional resources such as files or directories of the | |
173 | host. For details about the format and contents of | |
174 | <filename>.nspawn</filename> files, consult | |
175 | <citerefentry><refentrytitle>systemd.nspawn</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para> | |
176 | ||
177 | <para>If this option is set to <option>override</option>, the | |
178 | file is searched, read and used the same way; however, the order of | |
179 | precedence is reversed: settings read from the | |
180 | <filename>.nspawn</filename> file will take precedence over | |
181 | the corresponding command line options, if both are | |
182 | specified.</para> | |
183 | ||
184 | <para>If this option is set to <option>trusted</option>, the | |
185 | file is searched, read and used the same way, but regardless | |
186 | of being found in <filename>/etc/systemd/nspawn/</filename>, | |
187 | <filename>/run/systemd/nspawn/</filename> or next to the image | |
188 | file or container root directory, all settings will take | |
189 | effect; however, command line arguments still take precedence | |
190 | over corresponding settings.</para> | |
191 | ||
192 | <para>If disabled, no <filename>.nspawn</filename> file is read | |
193 | and no settings except the ones on the command line are in | |
194 | effect.</para></listitem> | |
195 | </varlistentry> | |
196 | ||
197 | </variablelist> | |
198 | ||
199 | <refsect2> | |
200 | <title>Image Options</title> | |
201 | ||
202 | <variablelist> | |
203 | ||
798d3a52 ZJS |
204 | <varlistentry> |
205 | <term><option>-D</option></term> | |
206 | <term><option>--directory=</option></term> | |
207 | ||
208 | <listitem><para>Directory to use as file system root for the | |
209 | container.</para> | |
210 | ||
211 | <para>If neither <option>--directory=</option>, nor | |
212 | <option>--image=</option> is specified, the directory is | |
32b64cce RM |
213 | determined by searching for a directory named the same as the |
214 | machine name specified with <option>--machine=</option>. See | |
215 | <citerefentry><refentrytitle>machinectl</refentrytitle><manvolnum>1</manvolnum></citerefentry> | |
216 | section "Files and Directories" for the precise search path.</para> | |
217 | ||
218 | <para>If neither <option>--directory=</option>, | |
219 | <option>--image=</option>, nor <option>--machine=</option> | |
220 | are specified, the current directory will | |
221 | be used. May not be specified together with | |
798d3a52 ZJS |
222 | <option>--image=</option>.</para></listitem> |
223 | </varlistentry> | |
224 | ||
225 | <varlistentry> | |
226 | <term><option>--template=</option></term> | |
227 | ||
3f2fa834 LP |
228 | <listitem><para>Directory or <literal>btrfs</literal> subvolume to use as template for the |
229 | container's root directory. If this is specified and the container's root directory (as configured by | |
230 | <option>--directory=</option>) does not yet exist, it is created as a <literal>btrfs</literal> snapshot | |
231 | (if supported) or a plain directory (otherwise) and populated from this template tree. Ideally, the | |
232 | specified template path refers to the root of a <literal>btrfs</literal> subvolume, in which case a | |
233 | simple copy-on-write snapshot is taken, and populating the root directory is instant. If the | |
234 | specified template path does not refer to the root of a <literal>btrfs</literal> subvolume (or not | |
235 | even to a <literal>btrfs</literal> file system at all), the tree is copied (though possibly in a | |
236 | 'reflink' copy-on-write scheme — if the file system supports that), which can be substantially more | |
237 | time-consuming. Note that the snapshot taken is of the specified directory or subvolume, including | |
238 | all subdirectories and subvolumes below it, but excluding any sub-mounts. May not be specified | |
239 | together with <option>--image=</option> or <option>--ephemeral</option>.</para> | |
3fe22bb4 | 240 | |
38b38500 | 241 | <para>Note that this switch leaves hostname, machine ID and |
3fe22bb4 LP |
242 | all other settings that could identify the instance |
243 | unmodified.</para></listitem> | |
798d3a52 ZJS |
244 | </varlistentry> |
245 | ||
246 | <varlistentry> | |
247 | <term><option>-x</option></term> | |
248 | <term><option>--ephemeral</option></term> | |
249 | ||
0f3be6ca LP |
250 | <listitem><para>If specified, the container is run with a temporary snapshot of its file system that is removed |
251 | immediately when the container terminates. May not be specified together with | |
3fe22bb4 | 252 | <option>--template=</option>.</para> |
38b38500 | 253 | <para>Note that this switch leaves hostname, machine ID and all other settings that could identify |
3f2fa834 LP |
254 | the instance unmodified. Please note that — as with <option>--template=</option> — taking the |
255 | temporary snapshot is more efficient on file systems that support subvolume snapshots or 'reflinks' | |
256 | natively (<literal>btrfs</literal> or new <literal>xfs</literal>) than on more traditional file | |
257 | systems that do not (<literal>ext4</literal>). Note that the snapshot taken is of the specified | |
258 | directory or subvolume, including all subdirectories and subvolumes below it, but excluding any | |
259 | sub-mounts.</para> | |
b23f1628 LP |
260 | |
261 | <para>With this option no modifications of the container image are retained. Use | |
262 | <option>--volatile=</option> (described below) for other mechanisms to restrict persistency of | |
263 | container images during runtime.</para> | |
264 | </listitem> | |
798d3a52 ZJS |
265 | </varlistentry> |
266 | ||
267 | <varlistentry> | |
268 | <term><option>-i</option></term> | |
269 | <term><option>--image=</option></term> | |
270 | ||
271 | <listitem><para>Disk image to mount the root directory for the | |
272 | container from. Takes a path to a regular file or to a block | |
273 | device node. The file or block device must contain | |
274 | either:</para> | |
275 | ||
276 | <itemizedlist> | |
277 | <listitem><para>An MBR partition table with a single | |
278 | partition of type 0x83 that is marked | |
279 | bootable.</para></listitem> | |
280 | ||
281 | <listitem><para>A GUID partition table (GPT) with a single | |
282 | partition of type | |
283 | 0fc63daf-8483-4772-8e79-3d69d8477de4.</para></listitem> | |
284 | ||
285 | <listitem><para>A GUID partition table (GPT) with a marked | |
286 | root partition which is mounted as the root directory of the | |
287 | container. Optionally, GPT images may contain a home and/or | |
288 | a server data partition which are mounted to the appropriate | |
289 | places in the container. All these partitions must be | |
290 | identified by the partition types defined by the <ulink | |
db811444 | 291 | url="https://uapi-group.org/specifications/specs/discoverable_partitions_specification">Discoverable |
798d3a52 | 292 | Partitions Specification</ulink>.</para></listitem> |
58abb66f LP |
293 | |
294 | <listitem><para>No partition table, and a single file system spanning the whole image.</para></listitem> | |
798d3a52 ZJS |
295 | </itemizedlist> |
296 | ||
0f3be6ca LP |
297 | <para>On GPT images, if an EFI System Partition (ESP) is discovered, it is automatically mounted to |
298 | <filename>/efi</filename> (or <filename>/boot</filename> as fallback) in case a directory by this name exists | |
299 | and is empty.</para> | |
300 | ||
58abb66f LP |
301 | <para>Partitions encrypted with LUKS are automatically decrypted. Also, on GPT images dm-verity data integrity |
302 | hash partitions are set up if the root hash for them is specified using the <option>--root-hash=</option> | |
303 | option.</para> | |
304 | ||
e7cbe5cb LB |
305 | <para>Single file system images (i.e. file systems without a surrounding partition table) can be opened using |
306 | dm-verity if the integrity data is passed using the <option>--root-hash=</option> and | |
c2923fdc | 307 | <option>--verity-data=</option> (and optionally <option>--root-hash-sig=</option>) options.</para> |
e7cbe5cb | 308 | |
0f3be6ca LP |
309 | <para>Any other partitions, such as foreign partitions or swap partitions, are not mounted. This option may not be specified | |
310 | together with <option>--directory=</option> or <option>--template=</option>.</para></listitem> | |
798d3a52 | 311 | </varlistentry> |
58abb66f | 312 | |
9ea81191 LP |
313 | <varlistentry> |
314 | <term><option>--image-policy=<replaceable>policy</replaceable></option></term> | |
315 | ||
316 | <listitem><para>Takes an image policy string as argument, as per | |
317 | <citerefentry><refentrytitle>systemd.image-policy</refentrytitle><manvolnum>7</manvolnum></citerefentry>. The | |
318 | policy is enforced when operating on the disk image specified via <option>--image=</option>, see | |
319 | above. If not specified defaults to | |
320 | <literal>root=verity+signed+encrypted+unprotected+absent:usr=verity+signed+encrypted+unprotected+absent:home=encrypted+unprotected+absent:srv=encrypted+unprotected+absent:esp=unprotected+absent:xbootldr=unprotected+absent:tmp=encrypted+unprotected+absent:var=encrypted+unprotected+absent</literal>, | |
321 | i.e. all recognized file systems in the image are used, but not the swap partition.</para></listitem> | |
322 | </varlistentry> | |
323 | ||
3d6c3675 LP |
324 | <varlistentry> |
325 | <term><option>--oci-bundle=</option></term> | |
326 | ||
327 | <listitem><para>Takes the path to an OCI runtime bundle to invoke, as specified in the <ulink | |
328 | url="https://github.com/opencontainers/runtime-spec/blob/master/spec.md">OCI Runtime Specification</ulink>. In | |
329 | this case no <filename>.nspawn</filename> file is loaded, and the root directory and various settings are read | |
330 | from the OCI runtime JSON data (but data passed on the command line takes precedence).</para></listitem> | |
331 | </varlistentry> | |
332 | ||
d99058c9 LP |
333 | <varlistentry> |
334 | <term><option>--read-only</option></term> | |
335 | ||
336 | <listitem><para>Mount the container's root file system (and any other file systems contained in the container | |
337 | image) read-only. This has no effect on additional mounts made with <option>--bind=</option>, | |
338 | <option>--tmpfs=</option> and similar options. This mode is implied if the container image file or directory is | |
339 | marked read-only itself. It is also implied if <option>--volatile=</option> is used. In this case the container | |
340 | image on disk is strictly read-only, while changes are permitted but kept non-persistently in memory only. For | |
341 | further details, see below.</para></listitem> | |
342 | </varlistentry> | |
343 | ||
344 | <varlistentry> | |
345 | <term><option>--volatile</option></term> | |
346 | <term><option>--volatile=</option><replaceable>MODE</replaceable></term> | |
347 | ||
348 | <listitem><para>Boots the container in volatile mode. When no mode parameter is passed or when mode is | |
349 | specified as <option>yes</option>, full volatile mode is enabled. This means the root directory is mounted as a | |
350 | mostly unpopulated <literal>tmpfs</literal> instance, and <filename>/usr/</filename> from the OS tree is | |
351 | mounted into it in read-only mode (the system thus starts up with a read-only OS image, but pristine state and | |
352 | configuration; any changes are lost on shutdown). When the mode parameter is specified as | |
353 | <option>state</option>, the OS tree is mounted read-only, but <filename>/var/</filename> is mounted as a | |
354 | writable <literal>tmpfs</literal> instance into it (the system thus starts up with read-only OS resources and | |
355 | configuration, but pristine state, and any changes to the latter are lost on shutdown). When the mode parameter | |
356 | is specified as <option>overlay</option> the read-only root file system is combined with a writable | |
357 | <filename>tmpfs</filename> instance through <literal>overlayfs</literal>, so that it appears as it normally | |
358 | would, but any changes are applied to the temporary file system only and lost when the container is | |
359 | terminated. When the mode parameter is specified as <option>no</option> (the default), the whole OS tree is | |
360 | made available writable (unless <option>--read-only</option> is specified, see above).</para> | |
361 | ||
211c99c7 ZJS |
362 | <para>Note that if one of the volatile modes is chosen, its effect is limited to the root file system |
363 | (or <filename>/var/</filename> in case of <option>state</option>), and any other mounts placed in the | |
364 | hierarchy are unaffected — regardless if they are established automatically (e.g. the EFI system | |
365 | partition that might be mounted to <filename>/efi/</filename> or <filename>/boot/</filename>) or | |
366 | explicitly (e.g. through an additional command line option such as <option>--bind=</option>, see | |
367 | below). This means that even if <option>--volatile=overlay</option> is used, changes to | |
368 | <filename>/efi/</filename> or <filename>/boot/</filename> are prohibited in case such a partition | |
369 | exists in the container image operated on, and even if <option>--volatile=state</option> is used, the | |
370 | hypothetical file <filename index="false">/etc/foobar</filename> is potentially writable if | |
371 | <option>--bind=/etc/foobar</option> is used to mount it from outside the read-only container | |
3b121157 | 372 | <filename>/etc/</filename> directory.</para> |
d99058c9 LP |
373 | |
374 | <para>The <option>--ephemeral</option> option is closely related to this setting, and provides similar | |
375 | behaviour by making a temporary, ephemeral copy of the whole OS image and executing that. For further details, | |
376 | see above.</para> | |
377 | ||
378 | <para>The <option>--tmpfs=</option> and <option>--overlay=</option> options provide similar functionality, but | |
379 | for specific sub-directories of the OS image only. For details, see below.</para> | |
380 | ||
381 | <para>This option provides similar functionality for containers as the <literal>systemd.volatile=</literal> | |
382 | kernel command line switch provides for host systems. See | |
383 | <citerefentry><refentrytitle>kernel-command-line</refentrytitle><manvolnum>7</manvolnum></citerefentry> for | |
384 | details.</para> | |
385 | ||
2e542f4e LP |
386 | <para>Note that setting this option to <option>yes</option> or <option>state</option> will only work |
387 | correctly with operating systems in the container that can boot up with only | |
388 | <filename>/usr/</filename> mounted, and are able to automatically populate <filename>/var/</filename> | |
389 | (and <filename>/etc/</filename> in case of <literal>--volatile=yes</literal>). Specifically, this | |
390 | means that operating systems that follow the historic split of <filename>/bin/</filename> and | |
391 | <filename>/lib/</filename> (and related directories) from <filename>/usr/</filename> (i.e. where the | |
392 | former are not symlinks into the latter) are not supported by <literal>--volatile=yes</literal> as | |
393 | container payload. The <option>overlay</option> option does not require any particular preparations | |
394 | in the OS, but do note that <literal>overlayfs</literal> behaviour differs from regular file systems | |
395 | in a number of ways, and hence compatibility is limited.</para></listitem> | |
d99058c9 LP |
396 | </varlistentry> |
397 | ||
58abb66f LP |
398 | <varlistentry> |
399 | <term><option>--root-hash=</option></term> | |
400 | ||
401 | <listitem><para>Takes a data integrity (dm-verity) root hash specified in hexadecimal. This option enables data | |
402 | integrity checks using dm-verity, if the used image contains the appropriate integrity data (see above). The | |
ef3116b5 | 403 | specified hash must match the root hash of integrity data, and is usually at least 256 bits (and hence 64 |
41488e1f LP |
404 | formatted hexadecimal characters) long (in case of SHA256 for example). If this option is not specified, but |
405 | the image file carries the <literal>user.verity.roothash</literal> extended file attribute (see <citerefentry | |
406 | project='man-pages'><refentrytitle>xattr</refentrytitle><manvolnum>7</manvolnum></citerefentry>), then the root | |
407 | hash is read from it, also as formatted hexadecimal characters. If the extended file attribute is not found (or | |
ef3116b5 | 408 | is not supported by the underlying file system), but a file with the <filename>.roothash</filename> suffix is |
e7cbe5cb LB |
409 | found next to the image file, bearing otherwise the same name (except if the image has the |
410 | <filename>.raw</filename> suffix, in which case the root hash file must not have it in its name), the root hash | |
329cde79 LP |
411 | is read from it and automatically used, also as formatted hexadecimal characters.</para> |
412 | ||
413 | <para>Note that this configures the root hash for the root file system. Disk images may also contain | |
414 | separate file systems for the <filename>/usr/</filename> hierarchy, which may be Verity protected as | |
415 | well. The root hash for this protection may be configured via the | |
416 | <literal>user.verity.usrhash</literal> extended file attribute or via a <filename>.usrhash</filename> | |
417 | file adjacent to the disk image, following the same format and logic as for the root hash for the | |
418 | root file system described here. Note that there's currently no switch to configure the root hash for | |
9e7600cf ZJS |
419 | the <filename>/usr/</filename> hierarchy from the command line.</para> |
420 | ||
421 | <para>Also see the <varname>RootHash=</varname> option in | |
422 | <citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>.</para> | |
423 | </listitem> | |
e7cbe5cb LB |
424 | </varlistentry> |
425 | ||
c2923fdc LB |
426 | <varlistentry> |
427 | <term><option>--root-hash-sig=</option></term> | |
428 | ||
9e7600cf ZJS |
429 | <listitem><para>Takes a PKCS7 signature of the root hash specified with <option>--root-hash=</option>. |
430 | The semantics are the same as for the <varname>RootHashSignature=</varname> option, see | |
431 | <citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry>. | |
432 | </para></listitem> | |
c2923fdc LB |
433 | </varlistentry> |
434 | ||
e7cbe5cb LB |
435 | <varlistentry> |
436 | <term><option>--verity-data=</option></term> | |
437 | ||
438 | <listitem><para>Takes the path to a data integrity (dm-verity) file. This option enables data integrity checks | |
fe0bdcac | 439 | using dm-verity, if a root hash is passed and if the used image itself does not contain the integrity data. |
e7cbe5cb LB |
440 | The integrity data must be matched by the root hash. If this option is not specified, but a file with the |
441 | <filename>.verity</filename> suffix is found next to the image file, bearing otherwise the same name (except if | |
442 | the image has the <filename>.raw</filename> suffix, in which case the verity data file must not have it in its name), | |
443 | the verity data is read from it and automatically used.</para></listitem> | |
58abb66f | 444 | </varlistentry> |
798d3a52 | 445 | |
d99058c9 LP |
446 | <varlistentry> |
447 | <term><option>--pivot-root=</option></term> | |
448 | ||
449 | <listitem><para>Pivot the specified directory to <filename>/</filename> inside the container, and either unmount the | |
450 | container's old root, or pivot it to another specified directory. Takes one of: a path argument — in which case the | |
451 | specified path will be pivoted to <filename>/</filename> and the old root will be unmounted; or a colon-separated pair | |
452 | of new root path and pivot destination for the old root. The new root path will be pivoted to <filename>/</filename>, | |
453 | and the old <filename>/</filename> will be pivoted to the other directory. Both paths must be absolute, and are resolved | |
454 | in the container's file system namespace.</para> | |
455 | ||
456 | <para>This is for containers which have several bootable directories in them; for example, several | |
b66a6e1a ZJS |
457 | <ulink url="https://ostree.readthedocs.io/en/latest/">OSTree</ulink> deployments. It emulates the |
458 | behavior of the boot loader and the initrd which normally select which directory to mount as the root | |
459 | and start the container's PID 1 in.</para></listitem> | |
d99058c9 LP |
460 | </varlistentry> |
461 | </variablelist> | |
462 | ||
463 | </refsect2><refsect2> | |
464 | <title>Execution Options</title> | |
465 | ||
466 | <variablelist> | |
7732f92b LP |
467 | <varlistentry> |
468 | <term><option>-a</option></term> | |
469 | <term><option>--as-pid2</option></term> | |
470 | ||
471 | <listitem><para>Invoke the shell or specified program as process ID (PID) 2 instead of PID 1 (init). By | |
3f2d1365 AJ |
472 | default, if neither this option nor <option>--boot</option> is used, the selected program is run as the process |
473 | with PID 1, a mode only suitable for programs that are aware of the special semantics that the process with | |
474 | PID 1 has on UNIX. For example, it needs to reap all processes reparented to it, and should implement | |
7732f92b LP |
475 | <command>sysvinit</command> compatible signal handling (specifically: it needs to reboot on SIGINT, reexecute |
476 | on SIGTERM, reload configuration on SIGHUP, and so on). With <option>--as-pid2</option> a minimal stub init | |
3f2d1365 | 477 | process is run as PID 1 and the selected program is executed as PID 2 (and hence does not need to implement any |
7732f92b LP |
478 | special semantics). The stub init process will reap processes as necessary and react appropriately to |
479 | signals. It is recommended to use this mode to invoke arbitrary commands in containers, unless they have been | |
480 | modified to run correctly as PID 1. In other words: this switch should be used for almost all commands, | |
481 | except when the command refers to an init or shell implementation, as these are generally capable of running | |
a6b5216c | 482 | correctly as PID 1. This option may not be combined with <option>--boot</option>.</para> |
7732f92b LP |
483 | </listitem> |
484 | </varlistentry> | |
485 | ||
798d3a52 ZJS |
486 | <varlistentry> |
487 | <term><option>-b</option></term> | |
488 | <term><option>--boot</option></term> | |
489 | ||
3f2d1365 | 490 | <listitem><para>Automatically search for an init program and invoke it as PID 1, instead of a shell or a user |
7732f92b | 491 | supplied program. If this option is used, arguments specified on the command line are used as arguments for the |
3f2d1365 | 492 | init program. This option may not be combined with <option>--as-pid2</option>.</para> |
7732f92b LP |
493 | |
494 | <para>The following table explains the different modes of invocation and relationship to | |
495 | <option>--as-pid2</option> (see above):</para> | |
496 | ||
497 | <table> | |
498 | <title>Invocation Mode</title> | |
499 | <tgroup cols='2' align='left' colsep='1' rowsep='1'> | |
500 | <colspec colname="switch" /> | |
501 | <colspec colname="explanation" /> | |
502 | <thead> | |
503 | <row> | |
504 | <entry>Switch</entry> | |
505 | <entry>Explanation</entry> | |
506 | </row> | |
507 | </thead> | |
508 | <tbody> | |
509 | <row> | |
510 | <entry>Neither <option>--as-pid2</option> nor <option>--boot</option> specified</entry> | |
4447e799 | 511 | <entry>The passed parameters are interpreted as the command line, which is executed as PID 1 in the container.</entry> |
7732f92b LP |
512 | </row> |
513 | ||
514 | <row> | |
515 | <entry><option>--as-pid2</option> specified</entry> | |
4447e799 | 516 | <entry>The passed parameters are interpreted as the command line, which is executed as PID 2 in the container. A stub init process is run as PID 1.</entry> |
7732f92b LP |
517 | </row> |
518 | ||
519 | <row> | |
520 | <entry><option>--boot</option> specified</entry> | |
3f2d1365 | 521 | <entry>An init program is automatically searched for and run as PID 1 in the container. The passed parameters are used as invocation parameters for this process.</entry> |
7732f92b LP |
522 | </row> |
523 | ||
524 | </tbody> | |
525 | </tgroup> | |
526 | </table> | |
b09c0bba LP |
527 | |
528 | <para>Note that <option>--boot</option> is the default mode of operation if the | |
529 | <filename>systemd-nspawn@.service</filename> template unit file is used.</para> | |
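| <para>The three modes in the table above could be invoked roughly as follows. The image path | |
| <filename>/var/lib/machines/example</filename> and the commands run inside it are assumptions made for | |
| illustration:</para> | |
| <programlisting># Neither switch: the shell itself runs as PID 1 | |
| systemd-nspawn --directory=/var/lib/machines/example /bin/sh | |
| # --as-pid2: a stub init runs as PID 1, the command as PID 2 | |
| systemd-nspawn --directory=/var/lib/machines/example --as-pid2 /usr/bin/make | |
| # --boot: an init program is searched for and run as PID 1 | |
| systemd-nspawn --directory=/var/lib/machines/example --boot</programlisting> | |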
7732f92b | 530 | </listitem> |
798d3a52 ZJS |
531 | </varlistentry> |
532 | ||
5f932eb9 LP |
533 | <varlistentry> |
534 | <term><option>--chdir=</option></term> | |
535 | ||
536 | <listitem><para>Change to the specified working directory before invoking the process in the container. Expects | |
537 | an absolute path in the container's file system namespace.</para></listitem> | |
538 | </varlistentry> | |
539 | ||
b53ede69 | 540 | <varlistentry> |
0d2a0179 ZJS |
541 | <term><option>-E <replaceable>NAME</replaceable>[=<replaceable>VALUE</replaceable>]</option></term> |
542 | <term><option>--setenv=<replaceable>NAME</replaceable>[=<replaceable>VALUE</replaceable>]</option></term> | |
543 | ||
544 | <listitem><para>Specifies an environment variable to pass to the init process in the container. This | |
545 | may be used to override the default variables or to set additional variables. It may be used more | |
546 | than once to set multiple variables. When <literal>=</literal> and <replaceable>VALUE</replaceable> | |
547 | are omitted, the value of the variable with the same name in the program environment will be used. | |
548 | </para></listitem> | |
b53ede69 PW |
549 | </varlistentry> |
550 | ||
798d3a52 ZJS |
551 | <varlistentry> |
552 | <term><option>-u</option></term> | |
553 | <term><option>--user=</option></term> | |
554 | ||
e9dd6984 ZJS |
555 | <listitem><para>After transitioning into the container, change to the specified user defined in the |
556 | container's user database. Like all other systemd-nspawn features, this is not a security feature and | |
557 | provides protection against accidental destructive operations only.</para></listitem> | |
798d3a52 ZJS |
558 | </varlistentry> |
559 | ||
d99058c9 LP |
560 | <varlistentry> |
561 | <term><option>--kill-signal=</option></term> | |
562 | ||
563 | <listitem><para>Specify the process signal to send to the container's PID 1 when nspawn itself receives | |
564 | <constant>SIGTERM</constant>, in order to trigger an orderly shutdown of the container. Defaults to | |
565 | <constant>SIGRTMIN+3</constant> if <option>--boot</option> is used (on systemd-compatible init systems | |
566 | <constant>SIGRTMIN+3</constant> triggers an orderly shutdown). If <option>--boot</option> is not used and this | |
567 | option is not specified, the container's processes are terminated abruptly via <constant>SIGKILL</constant>. For | |
568 | a list of valid signals, see <citerefentry | |
569 | project='man-pages'><refentrytitle>signal</refentrytitle><manvolnum>7</manvolnum></citerefentry>.</para></listitem> | |
570 | </varlistentry> | |
571 | ||
572 | <varlistentry> | |
573 | <term><option>--notify-ready=</option></term> | |
574 | ||
575 | <listitem><para>Configures support for notifications from the container's init process. | |
576 | <option>--notify-ready=</option> takes a boolean (<option>no</option> and <option>yes</option>). | |
577 | With option <option>no</option> systemd-nspawn notifies systemd | |
578 | with a <literal>READY=1</literal> message when the init process is created. | |
579 | With option <option>yes</option> systemd-nspawn waits for the | |
580 | <literal>READY=1</literal> message from the init process in the container | |
581 | before sending its own to systemd. For more details about notifications | |
f4e1a425 | 582 | see <citerefentry><refentrytitle>sd_notify</refentrytitle><manvolnum>3</manvolnum></citerefentry>.</para></listitem> |
d99058c9 | 583 | </varlistentry> |
4a4654e0 LP |
584 | |
585 | <varlistentry> | |
586 | <term><option>--suppress-sync=</option></term> | |
587 | ||
588 | <listitem><para>Expects a boolean argument. If true, turns off any form of on-disk file system | |
589 | synchronization for the container payload. This means all system calls such as <citerefentry | |
590 | project='man-pages'><refentrytitle>sync</refentrytitle><manvolnum>2</manvolnum></citerefentry>, | |
591 | <function>fsync()</function>, <function>syncfs()</function>, … will execute no operation, and the | |
592 | <constant>O_SYNC</constant>/<constant>O_DSYNC</constant> flags to <citerefentry | |
593 | project='man-pages'><refentrytitle>open</refentrytitle><manvolnum>2</manvolnum></citerefentry> and | |
594 | related calls will be made unavailable. This is potentially dangerous, as assumed data integrity | |
595 | guarantees to the container payload are not actually enforced (i.e. data assumed to have been written | |
596 | to disk might be lost if the system is shut down abnormally). However, this can dramatically improve | |
597 | container runtime performance – as long as these guarantees are not required or desirable, for | |
598 | example because any data written by the container is of temporary, redundant nature, or just an | |
599 | intermediary artifact that will be further processed and finalized by a later step in a | |
600 | pipeline. Defaults to false.</para></listitem> | |
601 | </varlistentry> | |
d99058c9 LP |
602 | </variablelist> |
603 | ||
604 | </refsect2><refsect2> | |
605 | <title>System Identity Options</title> | |
606 | ||
607 | <variablelist> | |
798d3a52 ZJS |
608 | <varlistentry> |
609 | <term><option>-M</option></term> | |
610 | <term><option>--machine=</option></term> | |
611 | ||
612 | <listitem><para>Sets the machine name for this container. This | |
613 | name may be used to identify this container during its runtime | |
614 | (for example in tools like | |
615 | <citerefentry><refentrytitle>machinectl</refentrytitle><manvolnum>1</manvolnum></citerefentry> | |
616 | and similar), and is used to initialize the container's | |
617 | hostname (which the container can choose to override, | |
618 | however). If not specified, the last component of the root | |
619 | directory path of the container is used, possibly suffixed | |
620 | with a random identifier in case <option>--ephemeral</option> | |
621 | mode is selected. If the root directory selected is the host's | |
622 | root directory, the host's hostname is used as default | |
623 | instead.</para></listitem> | |
624 | </varlistentry> | |
625 | ||
3a9530e5 LP |
626 | <varlistentry> |
627 | <term><option>--hostname=</option></term> | |
628 | ||
629 | <listitem><para>Controls the hostname to set within the container, if different from the machine name. Expects | |
630 | a valid hostname as argument. If this option is used, the kernel hostname of the container will be set to this | |
631 | value, otherwise it will be initialized to the machine name as controlled by the <option>--machine=</option> | |
632 | option described above. The machine name is used for various aspects of identification of the container from the | |
633 | outside, whereas the kernel hostname configurable with this option is useful for the container to identify itself from | |
634 | the inside. It is usually a good idea to keep both forms of identification synchronized, in order to avoid | |
635 | confusion. It is hence recommended to avoid usage of this option, and use <option>--machine=</option> | |
636 | exclusively. Note that regardless of whether the container's hostname is initialized from the name set with | |
637 | <option>--hostname=</option> or the one set with <option>--machine=</option>, the container can later override | |
638 | its kernel hostname freely on its own as well.</para> | |
639 | </listitem> | |
640 | </varlistentry> | |
641 | ||
798d3a52 ZJS |
642 | <varlistentry> |
643 | <term><option>--uuid=</option></term> | |
644 | ||
645 | <listitem><para>Set the specified UUID for the container. The | |
646 | init system will initialize | |
647 | <filename>/etc/machine-id</filename> from this if this file is | |
e01ff70a MS |
648 | not set yet. Note that this option takes effect only if |
649 | <filename>/etc/machine-id</filename> in the container is | |
650 | unpopulated.</para></listitem> | |
798d3a52 | 651 | </varlistentry> |
d99058c9 | 652 | </variablelist> |
798d3a52 | 653 | |
d99058c9 LP |
654 | </refsect2><refsect2> |
655 | <title>Property Options</title> | |
656 | ||
657 | <variablelist> | |
798d3a52 | 658 | <varlistentry> |
4deb5503 | 659 | <term><option>-S</option></term> |
798d3a52 ZJS |
660 | <term><option>--slice=</option></term> |
661 | ||
cd2dfc6f LP |
662 | <listitem><para>Make the container part of the specified slice, instead of the default |
663 | <filename>machine.slice</filename>. This applies only if the machine is run in its own scope unit, i.e. if | |
664 | <option>--keep-unit</option> isn't used.</para> | |
f36933fe LP |
665 | </listitem> |
666 | </varlistentry> | |
667 | ||
668 | <varlistentry> | |
669 | <term><option>--property=</option></term> | |
670 | ||
cd2dfc6f LP |
671 | <listitem><para>Set a unit property on the scope unit to register for the machine. This applies only if the |
672 | machine is run in its own scope unit, i.e. if <option>--keep-unit</option> isn't used. Takes unit property | |
673 | assignments in the same format as <command>systemctl set-property</command>. This is useful to set memory | |
15102ced | 674 | limits and similar for the container.</para> |
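| <para>For instance, a memory limit might be applied as sketched below. <varname>MemoryMax=</varname> is used | |
| here only as one example of a resource-control property, and the image path and the value | |
| <literal>2G</literal> are illustrative assumptions:</para> | |
| <programlisting>systemd-nspawn --directory=/var/lib/machines/example --property=MemoryMax=2G --boot</programlisting> | |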
798d3a52 ZJS |
675 | </listitem> |
676 | </varlistentry> | |
677 | ||
d99058c9 LP |
678 | <varlistentry> |
679 | <term><option>--register=</option></term> | |
680 | ||
681 | <listitem><para>Controls whether the container is registered with | |
682 | <citerefentry><refentrytitle>systemd-machined</refentrytitle><manvolnum>8</manvolnum></citerefentry>. Takes a | |
683 | boolean argument, which defaults to <literal>yes</literal>. This option should be enabled when the container | |
684 | runs a full Operating System (more specifically: a system and service manager as PID 1), and is useful to | |
685 | ensure that the container is accessible via | |
686 | <citerefentry><refentrytitle>machinectl</refentrytitle><manvolnum>1</manvolnum></citerefentry> and shown by | |
687 | tools such as <citerefentry | |
688 | project='man-pages'><refentrytitle>ps</refentrytitle><manvolnum>1</manvolnum></citerefentry>. If the container | |
689 | does not run a service manager, it is recommended to set this option to | |
690 | <literal>no</literal>.</para></listitem> | |
691 | </varlistentry> | |
692 | ||
693 | <varlistentry> | |
694 | <term><option>--keep-unit</option></term> | |
695 | ||
696 | <listitem><para>Instead of creating a transient scope unit to run the container in, simply use the service or | |
697 | scope unit <command>systemd-nspawn</command> has been invoked in. If <option>--register=yes</option> is set | |
698 | this unit is registered with | |
699 | <citerefentry><refentrytitle>systemd-machined</refentrytitle><manvolnum>8</manvolnum></citerefentry>. This | |
700 | switch should be used if <command>systemd-nspawn</command> is invoked from within a service unit, and the | |
701 | service unit's sole purpose is to run a single <command>systemd-nspawn</command> container. This option is not | |
702 | available if run from a user session.</para> | |
703 | <para>Note that passing <option>--keep-unit</option> disables the effect of <option>--slice=</option> and | |
704 | <option>--property=</option>. Use <option>--keep-unit</option> and <option>--register=no</option> in | |
705 | combination to disable any kind of unit allocation or registration with | |
706 | <command>systemd-machined</command>.</para></listitem> | |
707 | </varlistentry> | |
708 | </variablelist> | |
709 | ||
710 | </refsect2><refsect2> | |
711 | <title>User Namespacing Options</title> | |
712 | ||
713 | <variablelist> | |
03cfe0d5 LP |
714 | <varlistentry> |
715 | <term><option>--private-users=</option></term> | |
716 | ||
d2e5535f LP |
717 | <listitem><para>Controls user namespacing. If enabled, the container will run with its own private set of UNIX |
718 | user and group ids (UIDs and GIDs). This involves mapping the private UIDs/GIDs used in the container (starting | |
719 | with the container's root user 0 and up) to a range of UIDs/GIDs on the host that are not used for other | |
720 | purposes (usually in the range beyond the host's UID/GID 65536). The parameter may be specified as follows:</para> | |
721 | ||
722 | <orderedlist> | |
2dd67817 | 723 | <listitem><para>If one or two colon-separated numbers are specified, user namespacing is turned on. The first |
ae209204 ZJS |
724 | parameter specifies the first host UID/GID to assign to the container, the second parameter specifies the |
725 | number of host UIDs/GIDs to assign to the container. If the second parameter is omitted, 65536 UIDs/GIDs are | |
726 | assigned.</para></listitem> | |
727 | ||
22326f15 LP |
728 | <listitem><para>If the parameter is <literal>yes</literal>, user namespacing is turned on. The |
729 | UID/GID range to use is determined automatically from the file ownership of the root directory of | |
730 | the container's directory tree. To use this option, make sure to prepare the directory tree in | |
731 | advance, and ensure that all files and directories in it are owned by UIDs/GIDs in the range you'd | |
732 | like to use. Also, make sure that used file ACLs exclusively reference UIDs/GIDs in the appropriate | |
733 | range. In this mode, the number of UIDs/GIDs assigned to the container is 65536, and the owner | |
734 | UID/GID of the root directory must be a multiple of 65536.</para></listitem> | |
735 | ||
736 | <listitem><para>If the parameter is <literal>no</literal>, user namespacing is turned off. This is | |
737 | the default.</para> | |
ae209204 ZJS |
738 | </listitem> |
739 | ||
22326f15 LP |
740 | <listitem><para>If the parameter is <literal>identity</literal>, user namespacing is employed with |
741 | an identity mapping for the first 65536 UIDs/GIDs. This is mostly equivalent to | |
742 | <option>--private-users=0:65536</option>. While it does not provide UID/GID isolation, since all | |
743 | host and container UIDs/GIDs are chosen identically it does provide process capability isolation, | |
744 | and hence is often a good choice if proper user namespacing with distinct UID maps is not | |
745 | appropriate.</para></listitem> | |
746 | ||
747 | <listitem><para>The special value <literal>pick</literal> turns on user namespacing. In this case | |
748 | the UID/GID range is automatically chosen. As first step, the file owner UID/GID of the root | |
749 | directory of the container's directory tree is read, and it is checked that no other container is | |
750 | currently using it. If this check is successful, the UID/GID range determined this way is used, | |
15102ced ZJS |
751 | similarly to the behavior if <literal>yes</literal> is specified. If the check is not successful |
752 | (and thus the UID/GID range indicated in the root directory's file owner is already used elsewhere) | |
753 | a new – currently unused – UID/GID range of 65536 UIDs/GIDs is randomly chosen between the host | |
22326f15 LP |
754 | UID/GIDs of 524288 and 1878982656, always starting at a multiple of 65536, and, if possible, |
755 | consistently hashed from the machine name. This setting implies | |
756 | <option>--private-users-ownership=auto</option> (see below), which possibly has the effect that the | |
757 | files and directories in the container's directory tree will be owned by the appropriate users of | |
758 | the range picked. Using this option makes user namespace behavior fully automatic. Note that the | |
759 | first invocation of a previously unused container image might result in picking a new UID/GID range | |
760 | for it, and thus in the (possibly expensive) file ownership adjustment operation. However, | |
761 | subsequent invocations of the container will be cheap (unless of course the picked UID/GID range is | |
762 | assigned to a different use by then).</para></listitem> | |
d2e5535f LP |
763 | </orderedlist> |
764 | ||
765 | <para>It is recommended to assign at least 65536 UIDs/GIDs to each container, so that the usable UID/GID range in the | |
766 | container covers 16 bits. For best security, do not assign overlapping UID/GID ranges to multiple containers. It is | |
767 | hence a good idea to use the upper 16 bits of the host 32-bit UIDs/GIDs as container identifier, while the lower 16 | |
2dd67817 | 768 | bits encode the container UID/GID used. This is in fact the behavior enforced by the |
d2e5535f LP |
769 | <option>--private-users=pick</option> option.</para> |
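| <para>As a worked example of this layout, assume a container identifier of 8 and a container UID of 1000 | |
| (both values are chosen purely for illustration). The shell arithmetic below shows the resulting host UID | |
| and how it splits back into its two halves:</para> | |
| <programlisting># host UID = container identifier * 65536 + container UID | |
| echo $(( 8 * 65536 + 1000 ))   # prints 525288 | |
| # upper 16 bits: container identifier | |
| echo $(( 525288 / 65536 ))     # prints 8 | |
| # lower 16 bits: UID as seen inside the container | |
| echo $(( 525288 % 65536 ))     # prints 1000</programlisting> | |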
770 | ||
771 | <para>When user namespaces are used, the GID range assigned to each container is always chosen identical to the | |
772 | UID range.</para> | |
773 | ||
774 | <para>In most cases, <option>--private-users=pick</option> is the recommended choice, as it greatly enhances | |
775 | container security and operates fully automatically.</para> | |
776 | ||
777 | <para>Note that the picked UID/GID range is not written to <filename>/etc/passwd</filename> or | |
778 | <filename>/etc/group</filename>. In fact, the allocation of the range is not stored persistently anywhere, | |
aa10469e LP |
779 | except in the file ownership of the files and directories of the container.</para> |
780 | ||
781 | <para>Note that when user namespacing is used, file ownership on disk reflects this, and all of the container's | |
782 | files and directories are owned by the container's effective user and group IDs. This means that copying files | |
783 | from and to the container image requires correction of the numeric UID/GID values, according to the UID/GID | |
784 | shift applied.</para></listitem> | |
03cfe0d5 LP |
785 | </varlistentry> |
786 | ||
d2e5535f | 787 | <varlistentry> |
22326f15 LP |
788 | <term><option>--private-users-ownership=</option></term> |
789 | ||
790 | <listitem><para>Controls how to adjust the container image's UIDs and GIDs to match the UID/GID range | |
791 | chosen with <option>--private-users=</option>, see above. Takes one of <literal>off</literal> (to | |
792 | leave the image as is), <literal>chown</literal> (to recursively <function>chown()</function> the | |
793 | container's directory tree as needed), <literal>map</literal> (in order to use transparent ID mapping | |
794 | mounts) or <literal>auto</literal> for automatically using <literal>map</literal> where available and | |
795 | <literal>chown</literal> where not.</para> | |
796 | ||
797 | <para>If <literal>chown</literal> is selected, all files and directories in the container's directory | |
798 | tree will be adjusted so that they are owned by the appropriate UIDs/GIDs selected for the container | |
799 | (see above). This operation is potentially expensive, as it involves iterating through the full | |
800 | directory tree of the container. Besides actual file ownership, file ACLs are adjusted as | |
801 | well.</para> | |
802 | ||
803 | <para>Typically <literal>map</literal> is the best choice, since it transparently maps UIDs/GIDs in | |
804 | memory as needed without modifying the image, and without requiring an expensive recursive adjustment | |
805 | operation. However, it is not available for all file systems, currently.</para> | |
806 | ||
807 | <para>The <option>--private-users-ownership=auto</option> option is implied if | |
808 | <option>--private-users=pick</option> is used. This option has no effect if user namespacing is not | |
809 | used.</para></listitem> | |
d2e5535f | 810 | </varlistentry> |
03cfe0d5 | 811 | |
6265bde2 ZJS |
812 | <varlistentry> |
813 | <term><option>-U</option></term> | |
814 | ||
815 | <listitem><para>If the kernel supports the user namespaces feature, equivalent to | |
22326f15 | 816 | <option>--private-users=pick --private-users-ownership=auto</option>, otherwise equivalent to |
6265bde2 ZJS |
817 | <option>--private-users=no</option>.</para> |
818 | ||
819 | <para>Note that <option>-U</option> is the default if the | |
820 | <filename>systemd-nspawn@.service</filename> template unit file is used.</para> | |
821 | ||
22326f15 | 822 | <para>Note: it is possible to undo the effect of <option>--private-users-ownership=chown</option> (or |
6265bde2 ZJS |
823 | <option>-U</option>) on the file system by redoing the operation with the first UID of 0:</para> |
824 | ||
22326f15 | 825 | <programlisting>systemd-nspawn … --private-users=0 --private-users-ownership=chown</programlisting> |
6265bde2 ZJS |
826 | </listitem> |
827 | </varlistentry> | |
828 | ||
d99058c9 LP |
829 | </variablelist> |
830 | ||
831 | </refsect2><refsect2> | |
832 | <title>Networking Options</title> | |
833 | ||
834 | <variablelist> | |
835 | ||
798d3a52 ZJS |
836 | <varlistentry> |
837 | <term><option>--private-network</option></term> | |
838 | ||
839 | <listitem><para>Disconnect networking of the container from | |
840 | the host. This makes all network interfaces unavailable in the | |
841 | container, with the exception of the loopback device and those | |
842 | specified with <option>--network-interface=</option> and | |
843 | configured with <option>--network-veth</option>. If this | |
ec562515 | 844 | option is specified, the <constant>CAP_NET_ADMIN</constant> capability will be |
798d3a52 | 845 | added to the set of capabilities the container retains. The |
bc96c63c ZJS |
846 | latter may be disabled by using <option>--drop-capability=</option>. |
847 | If this option is not specified (or implied by one of the options | |
848 | listed below), the container will have full access to the host network. | |
849 | </para></listitem> | |
798d3a52 ZJS |
850 | </varlistentry> |
851 | ||
852 | <varlistentry> | |
853 | <term><option>--network-interface=</option></term> | |
854 | ||
2f091b1b TM |
855 | <listitem><para>Assign the specified network interface to the container. Either takes a single |
856 | interface name, referencing the name on the host, or a colon-separated pair of interfaces, in which | |
857 | case the first one references the name on the host, and the second one the name in the container. | |
858 | When the container terminates, the interface is moved back to the calling namespace and renamed to | |
859 | its original name. Note that <option>--network-interface=</option> implies | |
860 | <option>--private-network</option>. This option may be used more than once to add multiple network | |
861 | interfaces to the container.</para> | |
44a8ad7a LP |
862 | |
863 | <para>Note that any network interface specified this way must already exist at the time the container | |
864 | is started. If the container shall be started automatically at boot via a | |
865 | <filename>systemd-nspawn@.service</filename> unit file instance, it might hence make sense to add a | |
866 | unit file drop-in to the service instance | |
867 | (e.g. <filename>/etc/systemd/system/systemd-nspawn@foobar.service.d/50-network.conf</filename>) with | |
868 | contents like the following:</para> | |
869 | ||
870 | <programlisting>[Unit] | |
871 | Wants=sys-subsystem-net-devices-ens1.device | |
872 | After=sys-subsystem-net-devices-ens1.device</programlisting> | |
873 | ||
874 | <para>This will make sure that activation of the container service will be delayed until the | |
875 | <literal>ens1</literal> network interface has shown up. This is required since hardware probing is | |
876 | fully asynchronous, and network interfaces might be discovered only later during the boot process, | |
877 | after the container would normally be started without these explicit dependencies.</para> | |
878 | </listitem> | |
798d3a52 ZJS |
879 | </varlistentry> |
880 | ||
881 | <varlistentry> | |
882 | <term><option>--network-macvlan=</option></term> | |
883 | ||
44a8ad7a | 884 | <listitem><para>Create a <literal>macvlan</literal> interface of the specified Ethernet network |
2f091b1b TM |
885 | interface and add it to the container. Either takes a single interface name, referencing the name |
886 | on the host, or a colon-separated pair of interfaces, in which case the first one references the name | |
887 | on the host, and the second one the name in the container. A <literal>macvlan</literal> interface is | |
888 | a virtual interface that adds a second MAC address to an existing physical Ethernet link. If the | |
889 | container interface name is not defined, the interface in the container will be named after the | |
890 | interface on the host, prefixed with <literal>mv-</literal>. Note that | |
44a8ad7a LP |
891 | <option>--network-macvlan=</option> implies <option>--private-network</option>. This option may be |
892 | used more than once to add multiple network interfaces to the container.</para> | |
893 | ||
894 | <para>As with <option>--network-interface=</option>, the underlying Ethernet network interface must | |
895 | already exist at the time the container is started, and thus similar unit file drop-ins as described | |
896 | above might be useful.</para></listitem> | |
798d3a52 ZJS |
897 | </varlistentry> |
898 | ||
899 | <varlistentry> | |
900 | <term><option>--network-ipvlan=</option></term> | |
901 | ||
44a8ad7a | 902 | <listitem><para>Create an <literal>ipvlan</literal> interface of the specified Ethernet network |
2f091b1b TM |
903 | interface and add it to the container. Either takes a single interface name, referencing the name on |
904 | the host, or a colon-separated pair of interfaces, in which case the first one references the name | |
905 | on the host, and the second one the name in the container. An <literal>ipvlan</literal> interface is | |
906 | a virtual interface, | |
44a8ad7a | 907 | similar to a <literal>macvlan</literal> interface, which uses the same MAC address as the underlying |
2f091b1b TM |
908 | interface. If the container interface name is not defined, the interface in the container will be |
909 | named after the interface on the host, prefixed | |
44a8ad7a LP |
910 | with <literal>iv-</literal>. Note that <option>--network-ipvlan=</option> implies |
911 | <option>--private-network</option>. This option may be used more than once to add multiple network | |
912 | interfaces to the container.</para> | |
913 | ||
914 | <para>As with <option>--network-interface=</option>, the underlying Ethernet network interface must | |
915 | already exist at the time the container is started, and thus similar unit file drop-ins as described | |
916 | above might be useful.</para></listitem> | |
798d3a52 ZJS |
917 | </varlistentry> |
918 | ||
919 | <varlistentry> | |
920 | <term><option>-n</option></term> | |
921 | <term><option>--network-veth</option></term> | |
922 | ||
5e7423ff LP |
923 | <listitem><para>Create a virtual Ethernet link (<literal>veth</literal>) between host and container. The host |
924 | side of the Ethernet link will be available as a network interface named after the container's name (as | |
925 | specified with <option>--machine=</option>), prefixed with <literal>ve-</literal>. The container side of the | |
926 | Ethernet link will be named <literal>host0</literal>. The <option>--network-veth</option> option implies | |
927 | <option>--private-network</option>.</para> | |
928 | ||
929 | <para>Note that | |
930 | <citerefentry><refentrytitle>systemd-networkd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry> | |
931 | includes by default a network file <filename>/usr/lib/systemd/network/80-container-ve.network</filename> | |
932 | matching the host-side interfaces created this way, which contains settings to enable automatic address | |
933 | provisioning on the created virtual link via DHCP, as well as automatic IP routing onto the host's external | |
934 | network interfaces. It also contains <filename>/usr/lib/systemd/network/80-container-host0.network</filename> | |
935 | matching the container-side interface created this way, containing settings to enable client side address | |
936 | assignment via DHCP. In case <filename>systemd-networkd</filename> is running on both the host and inside the | |
937 | container, automatic IP communication from the container to the host is thus available, with further | |
938 | connectivity to the external network.</para> | |
b09c0bba LP |
939 | |
940 | <para>Note that <option>--network-veth</option> is the default if the | |
941 | <filename>systemd-nspawn@.service</filename> template unit file is used.</para> | |
6cc68362 LP |
942 | |
943 | <para>Note that on Linux network interface names may have a length of 15 characters at maximum, while | |
944 | container names may have a length up to 64 characters. As this option derives the host-side interface | |
945 | name from the container name, the name is possibly truncated. Thus, care needs to be taken to ensure | |
946 | that interface names remain unique in this case or, even better, that container names are generally not | |
bc5ea049 KK |
947 | chosen longer than 12 characters, to avoid the truncation. If the name is truncated, |
948 | <command>systemd-nspawn</command> will automatically append a 4-digit hash value to the name to | |
949 | reduce the chance of collisions. However, the hash algorithm is not collision-free. (See | |
950 | <citerefentry><refentrytitle>systemd.net-naming-scheme</refentrytitle><manvolnum>7</manvolnum></citerefentry> | |
951 | for details on older naming algorithms for this interface). Alternatively, the | |
6cc68362 LP |
952 | <option>--network-veth-extra=</option> option may be used, which allows free configuration of the |
953 | host-side interface name independently of the container name — but might require a bit more | |
954 | additional configuration in case bridging in a fashion similar to <option>--network-bridge=</option> | |
955 | is desired.</para> | |
5e7423ff | 956 | </listitem> |
798d3a52 ZJS |
957 | </varlistentry> |
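The length arithmetic described above (15 characters for the interface name, 3 of them used by the prefix) can be checked ahead of time. The following is a minimal sketch only: the helper name `host_side_name_fits` and the constant `IFNAME_MAX` are hypothetical, and the sketch does not attempt to reproduce the hash that is appended when a name is truncated.

```python
IFNAME_MAX = 15  # maximum interface name length on Linux, as stated above


def host_side_name_fits(machine, prefix="ve-"):
    """Return True if prefix + machine name fits without truncation."""
    return IFNAME_MAX >= len(prefix + machine)


# A 12-character machine name is the longest that fits behind "ve-".
print(host_side_name_fits("a" * 12))
print(host_side_name_fits("a" * 13))
```

The same check applies to the `vb-` prefix used with `--network-bridge=`, since both prefixes are three characters long.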
958 | ||
f6d6bad1 LP |
959 | <varlistentry> |
960 | <term><option>--network-veth-extra=</option></term> | |
961 | ||
962 | <listitem><para>Adds an additional virtual Ethernet link | |
963 | between host and container. Takes a colon-separated pair of | |
964 | host interface name and container interface name. The latter | |
965 | may be omitted in which case the container and host sides will | |
966 | be assigned the same name. This switch is independent of | |
ccddd104 | 967 | <option>--network-veth</option>, and — in contrast — may be |
f6d6bad1 LP |
968 | used multiple times, and allows configuration of the network |
969 | interface names. Note that <option>--network-bridge=</option> | |
970 | has no effect on interfaces created with | |
971 | <option>--network-veth-extra=</option>.</para></listitem> | |
972 | </varlistentry> | |
973 | ||
798d3a52 ZJS |
974 | <varlistentry> |
975 | <term><option>--network-bridge=</option></term> | |
976 | ||
6cc68362 LP |
977 | <listitem><para>Adds the host side of the Ethernet link created with <option>--network-veth</option> |
978 | to the specified Ethernet bridge interface. Expects a valid network interface name of a bridge device | |
979 | as argument. Note that <option>--network-bridge=</option> implies <option>--network-veth</option>. If | |
980 | this option is used, the host side of the Ethernet link will use the <literal>vb-</literal> prefix | |
981 | instead of <literal>ve-</literal>. Regardless of the used naming prefix the same network interface | |
982 | name length limits imposed by Linux apply, along with the complications this creates (for details see | |
44a8ad7a LP |
983 | above).</para> |
984 | ||
985 | <para>As with <option>--network-interface=</option>, the underlying bridge network interface must | |
986 | already exist at the time the container is started, and thus similar unit file drop-ins as described | |
987 | above might be useful.</para></listitem> | |
798d3a52 ZJS |
988 | </varlistentry> |
989 | ||
938d2579 LP |
990 | <varlistentry> |
991 | <term><option>--network-zone=</option></term> | |
992 | ||
993 | <listitem><para>Creates a virtual Ethernet link (<literal>veth</literal>) to the container and adds it to an | |
994 | automatically managed Ethernet bridge interface. The bridge interface is named after the passed argument, | |
995 | prefixed with <literal>vz-</literal>. The bridge interface is automatically created when the first container | |
996 | configured for its name is started, and is automatically removed when the last container configured for its | |
997 | name exits. Hence, each bridge interface configured this way exists only as long as there's at least one | |
998 | container referencing it running. This option is very similar to <option>--network-bridge=</option>, besides | |
999 | this automatic creation/removal of the bridge device.</para> | |
1000 | ||
1001 | <para>This setting makes it easy to place multiple related containers on a common, virtual Ethernet-based | |
1002 | broadcast domain, here called a "zone". Each container may only be part of one zone, but each zone may contain | |
1003 | any number of containers. Each zone is referenced by its name. Names may be chosen freely (as long as they form | |
1004 | valid network interface names when prefixed with <literal>vz-</literal>), and it is sufficient to pass the same | |
cf917c27 | 1005 | name to the <option>--network-zone=</option> switch of the various concurrently running containers to join |
938d2579 LP |
1006 | them in one zone.</para> |
1007 | ||
1008 | <para>Note that | |
1009 | <citerefentry><refentrytitle>systemd-networkd.service</refentrytitle><manvolnum>8</manvolnum></citerefentry> | |
1010 | includes by default a network file <filename>/usr/lib/systemd/network/80-container-vz.network</filename> | |
1011 | matching the bridge interfaces created this way, which contains settings to enable automatic address | |
1012 | provisioning on the created virtual network via DHCP, as well as automatic IP routing onto the host's external | |
1013 | network interfaces. Using <option>--network-zone=</option> is hence in most cases fully automatic and | |
1014 | sufficient to connect multiple local containers in a joined broadcast domain to the host, with further | |
1015 | connectivity to the external network.</para> | |
1016 | </listitem> | |
1017 | </varlistentry> | |
1018 | ||
798d3a52 | 1019 | <varlistentry> |
d99058c9 | 1020 | <term><option>--network-namespace-path=</option></term> |
798d3a52 | 1021 | |
d99058c9 LP |
1022 | <listitem><para>Takes the path to a file representing a kernel |
1023 | network namespace that the container shall run in. The specified path | |
1024 | should refer to a (possibly bind-mounted) network namespace file, as | |
1025 | exposed by the kernel below <filename>/proc/$PID/ns/net</filename>. | |
1026 | This makes the container enter the given network namespace. One of the | |
1027 | typical use cases is to pass a network namespace under | |
1028 | <filename>/run/netns</filename> created by <citerefentry | |
1029 | project='man-pages'><refentrytitle>ip-netns</refentrytitle><manvolnum>8</manvolnum></citerefentry>, | |
1030 | for example, <option>--network-namespace-path=/run/netns/foo</option>. | |
1031 | Note that this option cannot be used together with other | |
1032 | network-related options, such as <option>--private-network</option> | |
1033 | or <option>--network-interface=</option>.</para></listitem> | |
1034 | </varlistentry> | |
1035 | ||
1036 | <varlistentry> | |
1037 | <term><option>-p</option></term> | |
1038 | <term><option>--port=</option></term> | |
1039 | ||
1040 | <listitem><para>If private networking is enabled, maps an IP | |
1041 | port on the host onto an IP port on the container. Takes a | |
1042 | protocol specifier (either <literal>tcp</literal> or | |
798d3a52 ZJS |
1043 | <literal>udp</literal>), separated by a colon from a host port |
1044 | number in the range 1 to 65535, separated by a colon from a | |
1045 | container port number in the range 1 to 65535. The | |
1046 | protocol specifier and its separating colon may be omitted, in | |
1047 | which case <literal>tcp</literal> is assumed. The container | |
7c918141 | 1048 | port number and its colon may be omitted, in which case the |
798d3a52 | 1049 | same port as the host port is implied. This option is only |
a8eaaee7 | 1050 | supported if private networking is used, such as with |
938d2579 | 1051 | <option>--network-veth</option>, <option>--network-zone=</option> or |
798d3a52 ZJS |
1052 | <option>--network-bridge=</option>.</para></listitem> |
1053 | </varlistentry> | |
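The defaulting rules for `--port=` described above can be illustrated with a small parser. This is a sketch of the documented syntax only, not the parser used by systemd-nspawn; the function name `parse_port_spec` is hypothetical.

```python
def parse_port_spec(spec):
    """Parse '[PROTO:]HOST_PORT[:CONTAINER_PORT]' into (proto, host, container)."""
    parts = spec.split(":")
    proto = "tcp"  # the protocol defaults to tcp when omitted
    if parts[0] in ("tcp", "udp"):
        proto = parts[0]
        parts = parts[1:]
    if len(parts) not in (1, 2):
        raise ValueError(f"malformed port specification: {spec!r}")
    ports = [int(p) for p in parts]
    for port in ports:
        if port not in range(1, 65536):
            raise ValueError(f"port out of range: {port}")
    host_port = ports[0]
    # the container port defaults to the host port when omitted
    container_port = ports[1] if len(ports) == 2 else host_port
    return proto, host_port, container_port


print(parse_port_spec("tcp:8080:80"))
print(parse_port_spec("8080"))
```

For example, `-p 8080` and `--port=tcp:8080:8080` describe the same mapping under these rules.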
d99058c9 | 1054 | </variablelist> |
798d3a52 | 1055 | |
d99058c9 LP |
1056 | </refsect2><refsect2> |
1057 | <title>Security Options</title> | |
798d3a52 | 1058 | |
d99058c9 | 1059 | <variablelist> |
798d3a52 ZJS |
1060 | <varlistentry> |
1061 | <term><option>--capability=</option></term> | |
1062 | ||
ec562515 ZJS |
1063 | <listitem><para>List one or more additional capabilities to grant the container. Takes a |
1064 | comma-separated list of capability names, see <citerefentry | |
1065 | project='man-pages'><refentrytitle>capabilities</refentrytitle><manvolnum>7</manvolnum></citerefentry> | |
a30504ed | 1066 | for more information. Note that the following capabilities are granted in any case: |
ec562515 ZJS |
1067 | <constant>CAP_AUDIT_CONTROL</constant>, <constant>CAP_AUDIT_WRITE</constant>, |
1068 | <constant>CAP_CHOWN</constant>, <constant>CAP_DAC_OVERRIDE</constant>, | |
1069 | <constant>CAP_DAC_READ_SEARCH</constant>, <constant>CAP_FOWNER</constant>, | |
1070 | <constant>CAP_FSETID</constant>, <constant>CAP_IPC_OWNER</constant>, <constant>CAP_KILL</constant>, | |
1071 | <constant>CAP_LEASE</constant>, <constant>CAP_LINUX_IMMUTABLE</constant>, | |
1072 | <constant>CAP_MKNOD</constant>, <constant>CAP_NET_BIND_SERVICE</constant>, | |
1073 | <constant>CAP_NET_BROADCAST</constant>, <constant>CAP_NET_RAW</constant>, | |
1074 | <constant>CAP_SETFCAP</constant>, <constant>CAP_SETGID</constant>, <constant>CAP_SETPCAP</constant>, | |
1075 | <constant>CAP_SETUID</constant>, <constant>CAP_SYS_ADMIN</constant>, | |
1076 | <constant>CAP_SYS_BOOT</constant>, <constant>CAP_SYS_CHROOT</constant>, | |
1077 | <constant>CAP_SYS_NICE</constant>, <constant>CAP_SYS_PTRACE</constant>, | |
1078 | <constant>CAP_SYS_RESOURCE</constant>, <constant>CAP_SYS_TTY_CONFIG</constant>. Also | |
1079 | <constant>CAP_NET_ADMIN</constant> is retained if <option>--private-network</option> is specified. | |
1080 | If the special value <literal>all</literal> is passed, all capabilities are retained.</para> | |
8a99bd0c ZJS |
1081 | |
1082 | <para>If the special value of <literal>help</literal> is passed, the program will print known | |
88fc9c9b TH |
1083 | capability names and exit.</para> |
1084 | ||
1085 | <para>This option sets the bounding set of capabilities which | |
1086 | also limits the ambient capabilities as given with the | |
1087 | <option>--ambient-capability=</option> option.</para></listitem> | |
798d3a52 ZJS |
1088 | </varlistentry> |
1089 | ||
1090 | <varlistentry> | |
1091 | <term><option>--drop-capability=</option></term> | |
1092 | ||
1093 | <listitem><para>Specify one or more additional capabilities to | |
1094 | drop for the container. This allows running the container with | |
1095 | fewer capabilities than the default (see | |
8a99bd0c ZJS |
1096 | above).</para> |
1097 | ||
1098 | <para>If the special value of <literal>help</literal> is passed, the program will print known | |
88fc9c9b TH |
1099 | capability names and exit.</para> |
1100 | ||
1101 | <para>This option sets the bounding set of capabilities which | |
1102 | also limits the ambient capabilities as given with the | |
1103 | <option>--ambient-capability=</option> option.</para></listitem> | |
1104 | </varlistentry> | |
1105 | ||
1106 | <varlistentry> | |
1107 | <term><option>--ambient-capability=</option></term> | |
1108 | ||
1109 | <listitem><para>Specify one or more additional capabilities to | |
1110 | pass in the inheritable and ambient set to the program started | |
1111 | within the container. The value <literal>all</literal> is not | |
1112 | supported for this setting.</para> | |
1113 | ||
1114 | <para>All capabilities specified here must be in the set | |
1115 | allowed with the <option>--capability=</option> and | |
1116 | <option>--drop-capability=</option> options. Otherwise, an | |
1117 | error message will be shown.</para> | |
1118 | ||
1119 | <para>This option cannot be combined with the boot mode of the | |
1120 | container (as requested via <option>--boot</option>).</para> | |
1121 | ||
1122 | <para>If the special value of <literal>help</literal> is | |
1123 | passed, the program will print known capability names and | |
1124 | exit.</para></listitem> | |
798d3a52 ZJS |
1125 | </varlistentry> |
1126 | ||
66edd963 LP |
1127 | <varlistentry> |
1128 | <term><option>--no-new-privileges=</option></term> | |
1129 | ||
6b000af4 LP |
1130 | <listitem><para>Takes a boolean argument. Specifies the value of the |
1131 | <constant>PR_SET_NO_NEW_PRIVS</constant> flag for the container payload. Defaults to off. When turned | |
1132 | on the payload code of the container cannot acquire new privileges, i.e. the "setuid" file bit as | |
1133 | well as file system capabilities will not have an effect anymore. See <citerefentry | |
1134 | project='man-pages'><refentrytitle>prctl</refentrytitle><manvolnum>2</manvolnum></citerefentry> for | |
1135 | details about this flag.</para></listitem> | |
66edd963 LP |
1136 | </varlistentry> |
1137 | ||
960e4569 | 1138 | <varlistentry> |
6b000af4 LP |
1139 | <term><option>--system-call-filter=</option></term> <listitem><para>Alter the system call filter |
1140 | applied to containers. Takes a space-separated list of system call names or group names (the latter | |
1141 | prefixed with <literal>@</literal>, as listed by the <command>syscall-filter</command> command of | |
c7fc3c4c | 1142 | <citerefentry><refentrytitle>systemd-analyze</refentrytitle><manvolnum>1</manvolnum></citerefentry>). Passed |
6b000af4 LP |
1143 | system calls will be permitted. The list may optionally be prefixed by <literal>~</literal>, in which |
1144 | case all listed system calls are prohibited. If this command line option is used multiple times the | |
1145 | configured lists are combined. If both a positive and a negative list (that is one system call list | |
1146 | without and one with the <literal>~</literal> prefix) are configured, the negative list takes | |
1147 | precedence over the positive list. Note that <command>systemd-nspawn</command> always implements a | |
1148 | system call allow list (as opposed to a deny list!), and this command line option hence adds or | |
1149 | removes entries from the default allow list, depending on the <literal>~</literal> prefix. Note that | |
1150 | the applied system call filter is also altered implicitly if additional capabilities are passed using | |
1151 | the <option>--capability=</option> option.</para></listitem> | |
960e4569 LP |
1152 | </varlistentry> |
1153 | ||
d99058c9 LP |
1154 | <varlistentry> |
1155 | <term><option>-Z</option></term> | |
1156 | <term><option>--selinux-context=</option></term> | |
1157 | ||
1158 | <listitem><para>Sets the SELinux security context to be used | |
1159 | to label processes in the container.</para> | |
1160 | </listitem> | |
1161 | </varlistentry> | |
1162 | ||
1163 | <varlistentry> | |
1164 | <term><option>-L</option></term> | |
1165 | <term><option>--selinux-apifs-context=</option></term> | |
1166 | ||
1167 | <listitem><para>Sets the SELinux security context to be used | |
1168 | to label files in the virtual API file systems in the | |
1169 | container.</para> | |
1170 | </listitem> | |
1171 | </varlistentry> | |
1172 | </variablelist> | |
1173 | ||
1174 | </refsect2><refsect2> | |
1175 | <title>Resource Options</title> | |
1176 | ||
1177 | <variablelist> | |
1178 | ||
bf428efb LP |
1179 | <varlistentry> |
1180 | <term><option>--rlimit=</option></term> | |
1181 | ||
1182 | <listitem><para>Sets the specified POSIX resource limit for the container payload. Expects an assignment of the | |
1183 | form | |
1184 | <literal><replaceable>LIMIT</replaceable>=<replaceable>SOFT</replaceable>:<replaceable>HARD</replaceable></literal> | |
1185 | or <literal><replaceable>LIMIT</replaceable>=<replaceable>VALUE</replaceable></literal>, where | |
1186 | <replaceable>LIMIT</replaceable> should refer to a resource limit type, such as | |
1187 | <constant>RLIMIT_NOFILE</constant> or <constant>RLIMIT_NICE</constant>. The <replaceable>SOFT</replaceable> and | |
1188 | <replaceable>HARD</replaceable> fields should refer to the numeric soft and hard resource limit values. If the | |
1b2ad5d9 | 1189 | second form is used, <replaceable>VALUE</replaceable> may specify a value that is used both as soft and hard |
bf428efb LP |
1190 | limit. In place of a numeric value the special string <literal>infinity</literal> may be used to turn off |
1191 | resource limiting for the specific type of resource. This command line option may be used multiple times to | |
1b2ad5d9 | 1192 | control limits on multiple limit types. If used multiple times for the same limit type, the last use |
bf428efb LP |
1193 | wins. For details about resource limits see <citerefentry |
1194 | project='man-pages'><refentrytitle>setrlimit</refentrytitle><manvolnum>2</manvolnum></citerefentry>. By default | |
1195 | resource limits for the container's init process (PID 1) are set to the same values the Linux kernel originally | |
1196 | passed to the host init system. Note that some resource limits are enforced on resources counted per user, in | |
1197 | particular <constant>RLIMIT_NPROC</constant>. This means that unless user namespacing is deployed | |
1198 | (i.e. <option>--private-users=</option> is used, see above), any limits set will be applied to the resource | |
1199 | usage of the same user on all local containers as well as the host. This means particular care needs to be | |
1200 | taken with these limits as they might be triggered by possibly less trusted code. Example: | |
1201 | <literal>--rlimit=RLIMIT_NOFILE=8192:16384</literal>.</para></listitem> | |
1202 | </varlistentry> | |
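The two assignment forms accepted by `--rlimit=` and the "last use wins" rule can be sketched as follows. The names `parse_rlimit` and `collect_rlimits` are hypothetical, `math.inf` is used here only as a stand-in for `infinity`, and values are assumed to be plain integers; this is an illustration of the documented syntax, not the actual implementation.

```python
import math


def parse_rlimit(assignment):
    """Parse 'LIMIT=SOFT:HARD' or 'LIMIT=VALUE' into (name, soft, hard)."""
    name, sep, value = assignment.partition("=")
    if not sep or not name or not value:
        raise ValueError(f"malformed resource limit: {assignment!r}")

    def to_number(text):
        # "infinity" turns off limiting for this resource type
        return math.inf if text == "infinity" else int(text)

    soft_text, colon, hard_text = value.partition(":")
    soft = to_number(soft_text)
    # in the single-value form, the value is used as both soft and hard limit
    hard = to_number(hard_text) if colon else soft
    return name, soft, hard


def collect_rlimits(assignments):
    """Combine repeated options; a later use of the same limit type wins."""
    limits = {}
    for assignment in assignments:
        name, soft, hard = parse_rlimit(assignment)
        limits[name] = (soft, hard)
    return limits


print(collect_rlimits(["RLIMIT_NOFILE=1024", "RLIMIT_NOFILE=8192:16384"]))
```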
1203 | ||
81f345df LP |
1204 | <varlistentry> |
1205 | <term><option>--oom-score-adjust=</option></term> | |
1206 | ||
1207 | <listitem><para>Changes the OOM ("Out Of Memory") score adjustment value for the container payload. This controls | |
1208 | <filename>/proc/self/oom_score_adj</filename> which influences the preference with which this container is | |
1209 | terminated when memory becomes scarce. For details see <citerefentry | |
1210 | project='man-pages'><refentrytitle>proc</refentrytitle><manvolnum>5</manvolnum></citerefentry>. Takes an | |
1211 | integer in the range -1000…1000.</para></listitem> | |
1212 | </varlistentry> | |
1213 | ||
d107bb7d LP |
1214 | <varlistentry> |
1215 | <term><option>--cpu-affinity=</option></term> | |
1216 | ||
1217 | <listitem><para>Controls the CPU affinity of the container payload. Takes a comma-separated list of CPU numbers | |
1218 | or number ranges (the latter's start and end value separated by dashes). See <citerefentry | |
1219 | project='man-pages'><refentrytitle>sched_setaffinity</refentrytitle><manvolnum>2</manvolnum></citerefentry> for | |
1220 | details.</para></listitem> | |
1221 | </varlistentry> | |
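The list syntax taken by `--cpu-affinity=` (CPU numbers and dash-separated ranges, joined by commas) can be expanded into a set of CPU indices as sketched below. The function name `parse_cpu_list` is hypothetical, and the sketch covers only the syntax described above.

```python
def parse_cpu_list(text):
    """Expand a list such as '0,2-4' into a set of CPU numbers."""
    cpus = set()
    for item in text.split(","):
        start, dash, end = item.strip().partition("-")
        first = int(start)
        last = int(end) if dash else first
        if first > last:
            raise ValueError(f"descending CPU range: {item!r}")
        cpus.update(range(first, last + 1))
    return cpus


print(sorted(parse_cpu_list("0,2-4")))
```

On Linux, a set of this shape is what Python's `os.sched_setaffinity()` accepts as its mask argument, which wraps the system call referenced above.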
1222 | ||
c6c8f6e2 | 1223 | <varlistentry> |
d99058c9 | 1224 | <term><option>--personality=</option></term> |
b09c0bba | 1225 | |
d99058c9 LP |
1226 | <listitem><para>Control the architecture ("personality") |
1227 | reported by | |
1228 | <citerefentry project='man-pages'><refentrytitle>uname</refentrytitle><manvolnum>2</manvolnum></citerefentry> | |
1229 | in the container. Currently, only <literal>x86</literal> and | |
1230 | <literal>x86-64</literal> are supported. This is useful when | |
1231 | running a 32-bit container on a 64-bit host. If this setting | |
1232 | is not used, the personality reported in the container is the | |
1233 | same as the one reported on the host.</para></listitem> | |
798d3a52 | 1234 | </varlistentry> |
d99058c9 | 1235 | </variablelist> |
798d3a52 | 1236 | |
d99058c9 LP |
1237 | </refsect2><refsect2> |
1238 | <title>Integration Options</title> | |
798d3a52 | 1239 | |
d99058c9 | 1240 | <variablelist> |
09d423e9 LP |
1241 | <varlistentry> |
1242 | <term><option>--resolv-conf=</option></term> | |
1243 | ||
e309b929 LP |
1244 | <listitem><para>Configures how <filename>/etc/resolv.conf</filename> inside of the container shall be |
1245 | handled (i.e. DNS configuration synchronization from host to container). Takes one of | |
1246 | <literal>off</literal>, <literal>copy-host</literal>, <literal>copy-static</literal>, | |
1247 | <literal>copy-uplink</literal>, <literal>copy-stub</literal>, <literal>replace-host</literal>, | |
1248 | <literal>replace-static</literal>, <literal>replace-uplink</literal>, | |
1249 | <literal>replace-stub</literal>, <literal>bind-host</literal>, <literal>bind-static</literal>, | |
1250 | <literal>bind-uplink</literal>, <literal>bind-stub</literal>, <literal>delete</literal> or | |
1251 | <literal>auto</literal>.</para> | |
1252 | ||
1253 | <para>If set to <literal>off</literal> the <filename>/etc/resolv.conf</filename> file in the | |
1254 | container is left as it is included in the image, and neither modified nor bind mounted over.</para> | |
1255 | ||
1256 | <para>If set to <literal>copy-host</literal>, the <filename>/etc/resolv.conf</filename> file from the | |
1257 | host is copied into the container, unless the file exists already and is not a regular file (e.g. a | |
15102ced ZJS |
1258 | symlink). Similarly, if <literal>replace-host</literal> is used the file is copied, replacing any |
1259 | existing inode, including symlinks. Similarly, if <literal>bind-host</literal> is used, the file is | |
e309b929 LP |
1260 | bind mounted from the host into the container.</para> |
1261 | ||
1262 | <para>If set to <literal>copy-static</literal>, <literal>replace-static</literal> or | |
1263 | <literal>bind-static</literal> the static <filename>resolv.conf</filename> file supplied with | |
1264 | <citerefentry><refentrytitle>systemd-resolved.service</refentrytitle><manvolnum>8</manvolnum></citerefentry> | |
1265 | (specifically: <filename>/usr/lib/systemd/resolv.conf</filename>) is copied or bind mounted into the | |
1266 | container.</para> | |
1267 | ||
1268 | <para>If set to <literal>copy-uplink</literal>, <literal>replace-uplink</literal> or | |
1269 | <literal>bind-uplink</literal> the uplink <filename>resolv.conf</filename> file managed by | |
1270 | <filename>systemd-resolved.service</filename> (specifically: | |
1271 | <filename>/run/systemd/resolve/resolv.conf</filename>) is copied or bind mounted into the | |
1272 | container.</para> | |
1273 | ||
1274 | <para>If set to <literal>copy-stub</literal>, <literal>replace-stub</literal> or | |
1275 | <literal>bind-stub</literal> the stub <filename>resolv.conf</filename> file managed by | |
1276 | <filename>systemd-resolved.service</filename> (specifically: | |
1277 | <filename>/run/systemd/resolve/stub-resolv.conf</filename>) is copied or bind mounted into the | |
1278 | container.</para> | |
1279 | ||
1280 | <para>If set to <literal>delete</literal> the <filename>/etc/resolv.conf</filename> file in the | |
1281 | container is deleted if it exists.</para> | |
1282 | ||
1283 | <para>Finally, if set to <literal>auto</literal> the file is left as it is if private networking is | |
1284 | turned on (see <option>--private-network</option>). Otherwise, if | |
e9dd6984 ZJS |
1285 | <filename>systemd-resolved.service</filename> is running its stub <filename>resolv.conf</filename> |
1286 | file is used, and if not the host's <filename>/etc/resolv.conf</filename> file. In the latter cases | |
1287 | the file is copied if the image is writable, and bind mounted otherwise.</para> | |
e309b929 LP |
1288 | |
1289 | <para>It's recommended to use <literal>copy-…</literal> or <literal>replace-…</literal> if the | |
1290 | container shall be able to make changes to the DNS configuration on its own, deviating from the | |
1291 | host's settings. Otherwise <literal>bind-…</literal> is preferable, as it means direct changes to | |
1292 | <filename>/etc/resolv.conf</filename> in the container are not allowed, as it is a read-only bind | |
1293 | mount (but note that if the container has enough privileges, it might simply go ahead and unmount the | |
1294 | bind mount anyway). Note that regardless of whether the file is bind mounted or copied, no further | |
1295 | propagation of configuration is generally done after the one-time early initialization (this is | |
1296 | because the file is usually updated through copying and renaming). Defaults to | |
09d423e9 LP |
1297 | <literal>auto</literal>.</para></listitem> |
1298 | </varlistentry> | |
1299 | ||
1688841f LP |
1300 | <varlistentry> |
1301 | <term><option>--timezone=</option></term> | |
1302 | ||
e9dd6984 ZJS |
1303 | <listitem><para>Configures how <filename>/etc/localtime</filename> inside of the container |
1304 | (i.e. local timezone synchronization from host to container) shall be handled. Takes one of | |
1305 | <literal>off</literal>, <literal>copy</literal>, <literal>bind</literal>, <literal>symlink</literal>, | |
1306 | <literal>delete</literal> or <literal>auto</literal>. If set to <literal>off</literal> the | |
1307 | <filename>/etc/localtime</filename> file in the container is left as it is included in the image, and | |
1308 | neither modified nor bind mounted over. If set to <literal>copy</literal> the | |
1309 | <filename>/etc/localtime</filename> file of the host is copied into the container. Similarly, if | |
1310 | <literal>bind</literal> is used, the file is bind mounted from the host into the container. If set to | |
1311 | <literal>symlink</literal>, a symlink is created pointing from <filename>/etc/localtime</filename> in | |
1312 | the container to the timezone file in the container that matches the timezone setting on the host. If | |
1313 | set to <literal>delete</literal>, the file in the container is deleted, should it exist. If set to | |
1314 | <literal>auto</literal> and the <filename>/etc/localtime</filename> file of the host is a symlink, | |
1315 | then <literal>symlink</literal> mode is used, and <literal>copy</literal> otherwise, except if the | |
1316 | image is read-only in which case <literal>bind</literal> is used instead. Defaults to | |
1688841f LP |
1317 | <literal>auto</literal>.</para></listitem> |
1318 | </varlistentry> | |
1319 | ||
798d3a52 | 1320 | <varlistentry> |
d99058c9 | 1321 | <term><option>--link-journal=</option></term> |
798d3a52 | 1322 | |
d99058c9 LP |
1323 | <listitem><para>Control whether the container's journal shall |
1324 | be made visible to the host system. If enabled, allows viewing | |
1325 | the container's journal files from the host (but not vice | |
1326 | versa). Takes one of <literal>no</literal>, | |
1327 | <literal>host</literal>, <literal>try-host</literal>, | |
1328 | <literal>guest</literal>, <literal>try-guest</literal>, | |
1329 | <literal>auto</literal>. If <literal>no</literal>, the journal | |
1330 | is not linked. If <literal>host</literal>, the journal files | |
1331 | are stored on the host file system (beneath | |
1332 | <filename>/var/log/journal/<replaceable>machine-id</replaceable></filename>) | |
1333 | and the subdirectory is bind-mounted into the container at the | |
1334 | same location. If <literal>guest</literal>, the journal files | |
1335 | are stored on the guest file system (beneath | |
1336 | <filename>/var/log/journal/<replaceable>machine-id</replaceable></filename>) | |
1337 | and the subdirectory is symlinked into the host at the same | |
1338 | location. <literal>try-host</literal> and | |
1339 | <literal>try-guest</literal> do the same but do not fail if | |
1340 | the host does not have persistent journaling enabled. If | |
1341 | <literal>auto</literal> (the default), and the right | |
1342 | subdirectory of <filename>/var/log/journal</filename> exists, | |
1343 | it will be bind mounted into the container. If the | |
1344 | subdirectory does not exist, no linking is performed. | |
1345 | Effectively, booting a container once with | |
1346 | <literal>guest</literal> or <literal>host</literal> will link | |
1347 | the journal persistently if further on the default of | |
1348 | <literal>auto</literal> is used.</para> | |
1349 | ||
1350 | <para>Note that <option>--link-journal=try-guest</option> is the default if the | |
1351 | <filename>systemd-nspawn@.service</filename> template unit file is used.</para></listitem> | |
798d3a52 ZJS |
1352 | </varlistentry> |
1353 | ||
d99058c9 LP |
1354 | <varlistentry> |
1355 | <term><option>-j</option></term> | |
1356 | ||
1357 | <listitem><para>Equivalent to | |
1358 | <option>--link-journal=try-guest</option>.</para></listitem> | |
1359 | </varlistentry> | |
1360 | ||
1361 | </variablelist> | |
1362 | ||
1363 | </refsect2><refsect2> | |
1364 | <title>Mount Options</title> | |
1365 | ||
1366 | <variablelist> | |
1367 | ||
798d3a52 ZJS |
1368 | <varlistentry> |
1369 | <term><option>--bind=</option></term> | |
1370 | <term><option>--bind-ro=</option></term> | |
1371 | ||
86c0dd4a | 1372 | <listitem><para>Bind mount a file or directory from the host into the container. Takes one of: a path |
c7a4890c LP |
1373 | argument — in which case the specified path will be mounted from the host to the same path in the container, or |
1374 | a colon-separated pair of paths — in which case the first specified path is the source in the host, and the | |
1375 | second path is the destination in the container, or a colon-separated triple of source path, destination path | |
86c0dd4a | 1376 | and mount options. The source path may optionally be prefixed with a <literal>+</literal> character. If so, the |
c7a4890c LP |
1377 | source path is taken relative to the image's root directory. This permits setting up bind mounts within the |
1378 | container image. The source path may be specified as empty string, in which case a temporary directory below | |
3b121157 | 1379 | the host's <filename>/var/tmp/</filename> directory is used. It is automatically removed when the container is |
448f7377 DDM |
1380 | shut down. If the source path is not absolute, it is resolved relative to the current working directory. |
1381 | The <option>--bind-ro=</option> option creates read-only bind mounts. Backslash escapes are interpreted, | |
c0c8f718 AV |
1382 | so <literal>\:</literal> may be used to embed colons in either path. This option may be specified |
1383 | multiple times for creating multiple independent bind mount points.</para> | |
1384 | ||
1385 | <para>Mount options are comma-separated. <option>rbind</option> and <option>norbind</option> control whether | |
2b2777ed QD |
1386 | to create a recursive or a regular bind mount. Defaults to "rbind". <option>noidmap</option>, |
1387 | <option>idmap</option>, and <option>rootidmap</option> control ID mapping.</para> | |
1388 | ||
1389 | <para>Using <option>idmap</option> or <option>rootidmap</option> requires support by the source filesystem | |
1390 | for user/group ID mapped mounts. Defaults to "noidmap". With <option>x</option> being the container's UID range | |
1391 | offset, <option>y</option> being the length of the container's UID range, and <option>p</option> being the | |
1392 | owner UID of the bind mount source inode on the host: | |
1393 | ||
1394 | <itemizedlist> | |
1395 | <listitem><para>If <option>noidmap</option> is used, any user <option>z</option> in the range | |
1396 | <option>0 … y</option> seen from inside of the container is mapped to <option>x + z</option> in the | |
8b9f0921 | 1397 | <option>x … x + y</option> range on the host. Other host users are mapped to |
2b2777ed | 1398 | <option>nobody</option> inside the container.</para></listitem> |
8fb35004 | 1399 | |
2b2777ed QD |
1400 | <listitem><para>If <option>idmap</option> is used, any user <option>z</option> in the UID range |
1401 | <option>0 … y</option> as seen from inside the container is mapped to the same <option>z</option> | |
8fb35004 ZJS |
1402 | in the same <option>0 … y</option> range on the host. Other host users are mapped to |
1403 | <option>nobody</option> inside the container.</para></listitem> | |
1404 | ||
2b2777ed | 1405 | <listitem><para>If <option>rootidmap</option> is used, the user <option>0</option> seen from inside |
8fb35004 ZJS |
1406 | of the container is mapped to <option>p</option> on the host. Other host users are mapped to |
1407 | <option>nobody</option> inside the container.</para></listitem> | |
2b2777ed QD |
1408 | </itemizedlist></para> |
1409 | ||
1410 | <para>Whichever ID mapping option is used, the same mapping will be used for user and group IDs. If | |
a9ba6f8a | 1411 | <option>rootidmap</option> is used, the group owning the bind mounted directory will have no effect.</para> |
994a6364 LP |
1412 | |
1413 | <para>Note that when this option is used in combination with <option>--private-users</option>, the resulting | |
1414 | mount points will be owned by the <constant>nobody</constant> user. That's because the mount and its files and | |
1415 | directories continue to be owned by the relevant host users and groups, which do not exist in the container, | |
1416 | and thus show up under the wildcard UID 65534 (nobody). If such bind mounts are created, it is recommended to | |
make them read-only, using <option>--bind-ro=</option>. Alternatively, use the <option>idmap</option> mount
option to map the file system IDs.</para></listitem>
798d3a52 ZJS |
1419 | </varlistentry> |
1420 | ||
a06c9ac2 LP |
1421 | <varlistentry> |
1422 | <term><option>--bind-user=</option></term> | |
1423 | ||
1424 | <listitem><para>Binds the home directory of the specified user on the host into the container. Takes | |
1425 | the name of an existing user on the host as argument. May be used multiple times to bind multiple | |
1426 | users into the container. This does three things:</para> | |
1427 | ||
1428 | <orderedlist> | |
1429 | <listitem><para>The user's home directory is bind mounted from the host into | |
f39d7d00 | 1430 | <filename>/run/host/home/</filename>.</para></listitem> |
a06c9ac2 LP |
1431 | |
1432 | <listitem><para>An additional UID/GID mapping is added that maps the host user's UID/GID to a | |
1433 | container UID/GID, allocated from the 60514…60577 range.</para></listitem> | |
1434 | ||
1435 | <listitem><para>A JSON user and group record is generated in <filename>/run/userdb/</filename> that | |
1436 | describes the mapped user. It contains a minimized representation of the host's user record, | |
1437 | adjusted to the UID/GID and home directory path assigned to the user in the container. The | |
1438 | <citerefentry><refentrytitle>nss-systemd</refentrytitle><manvolnum>8</manvolnum></citerefentry> | |
1439 | glibc NSS module will pick up these records from there and make them available in the container's | |
1440 | user/group databases.</para></listitem> | |
1441 | </orderedlist> | |
1442 | ||
1443 | <para>The combination of the three operations above ensures that it is possible to log into the | |
be0d27ee ZJS |
1444 | container using the same account information as on the host. The user is only mapped transiently, |
1445 | while the container is running, and the mapping itself does not result in persistent changes to the | |
1446 | container (except maybe for log messages generated at login time, and similar). Note that in | |
1447 | particular the UID/GID assignment in the container is not made persistently. If the user is mapped | |
1448 | transiently, it is best to not allow the user to make persistent changes to the container. If the | |
1449 | user leaves files or directories owned by the user, and those UIDs/GIDs are reused during later | |
a06c9ac2 LP |
1450 | container invocations (possibly with a different <option>--bind-user=</option> mapping), those files |
1451 | and directories will be accessible to the "new" user.</para> | |
1452 | ||
1453 | <para>The user/group record mapping only works if the container contains systemd 249 or newer, with | |
1454 | <command>nss-systemd</command> properly configured in <filename>nsswitch.conf</filename>. See | |
1455 | <citerefentry><refentrytitle>nss-systemd</refentrytitle><manvolnum>8</manvolnum></citerefentry> for | |
1456 | details.</para> | |
1457 | ||
1458 | <para>Note that the user record propagated from the host into the container will contain the UNIX | |
1459 | password hash of the user, so that seamless logins in the container are possible. If the container is | |
less trusted than the host, it is hence important to use a strong UNIX password hash function
1461 | (e.g. yescrypt or similar, with the <literal>$y$</literal> hash prefix).</para> | |
1462 | ||
<para>When binding a user from the host into the container, checks are executed to ensure that the
1464 | username is not yet known in the container. Moreover, it is checked that the UID/GID allocated for it | |
1465 | is not currently defined in the user/group databases of the container. Both checks directly access | |
1466 | the container's <filename>/etc/passwd</filename> and <filename>/etc/group</filename>, and thus might | |
1467 | not detect existing accounts in other databases.</para> | |
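<para>A comparable check can be run by hand before starting the container. The following shell sketch
is a hypothetical pre-flight helper, not part of <command>systemd-nspawn</command>. It looks only at
<filename>/etc/passwd</filename> of a container tree, and it reports every UID already defined in the
60514…60577 range, which is broader than the check described above (that one concerns only the UID
actually allocated). The function name and its output format are assumptions.</para>

<programlisting>
```shell
#!/bin/sh
# bind_user_conflicts TREE USER
# Reports reasons why binding USER into the container tree TREE might fail.
# Only TREE/etc/passwd is consulted, so accounts in other databases are missed.
bind_user_conflicts() {
    passwd=$1/etc/passwd
    if [ ! -r "$passwd" ]; then
        echo "cannot read $passwd"
        return 1
    fi

    # The first field of each line is the user name; look for an exact match.
    if cut -d: -f1 "$passwd" | grep -Fxq -- "$2"; then
        echo "name $2 already exists"
    fi

    # The third field is the UID; report those inside the allocation range.
    cut -d: -f3 "$passwd" | while read -r uid; do
        case $uid in
            ''|*[!0-9]*) continue ;;   # skip malformed entries
        esac
        if [ "$uid" -ge 60514 ]; then
            if [ "$uid" -le 60577 ]; then echo "UID $uid already in use"; fi
        fi
    done
}
```
</programlisting>

<para>The helper takes the container's root directory and a user name as arguments and prints nothing
if it finds no conflict.</para>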
1468 | ||
1469 | <para>This operation is only supported in combination with | |
1470 | <option>--private-users=</option>/<option>-U</option>.</para></listitem> | |
1471 | </varlistentry> | |
1472 | ||
3d6c3675 LP |
1473 | <varlistentry> |
1474 | <term><option>--inaccessible=</option></term> | |
1475 | ||
1476 | <listitem><para>Make the specified path inaccessible in the container. This over-mounts the specified path | |
1477 | (which must exist in the container) with a file node of the same type that is empty and has the most | |
1478 | restrictive access mode supported. This is an effective way to mask files, directories and other file system | |
objects from the container payload. This option may be used more than once, in which case all specified
paths are masked.</para></listitem>
1481 | </varlistentry> | |
1482 | ||
798d3a52 ZJS |
1483 | <varlistentry> |
1484 | <term><option>--tmpfs=</option></term> | |
1485 | ||
b23f1628 LP |
1486 | <listitem><para>Mount a tmpfs file system into the container. Takes a single absolute path argument that |
specifies where to mount the tmpfs instance (in which case the directory access mode is set to 0755,
1488 | owned by root/root), or optionally a colon-separated pair of path and mount option string that is used for | |
1489 | mounting (in which case the kernel default for access mode and owner will be chosen, unless otherwise | |
1490 | specified). Backslash escapes are interpreted in the path, so <literal>\:</literal> may be used to embed colons | |
1491 | in the path.</para> | |
1492 | ||
1493 | <para>Note that this option cannot be used to replace the root file system of the container with a temporary | |
file system. However, the <option>--volatile=</option> option described above provides similar
1495 | functionality, with a focus on implementing stateless operating system images.</para></listitem> | |
798d3a52 ZJS |
1496 | </varlistentry> |
1497 | ||
5a8af538 LP |
1498 | <varlistentry> |
1499 | <term><option>--overlay=</option></term> | |
1500 | <term><option>--overlay-ro=</option></term> | |
1501 | ||
f075e32c DDM |
1502 | <listitem><para>Combine multiple directory trees into one overlay file system and mount it into the |
1503 | container. Takes a list of colon-separated paths to the directory trees to combine and the | |
1504 | destination mount point.</para> | |
1505 | ||
1506 | <para>Backslash escapes are interpreted in the paths, so <literal>\:</literal> may be used to embed | |
1507 | colons in the paths.</para> | |
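<para>As a rough illustration of the escaping rule, the following shell sketch prepares paths for use
in a colon-separated argument. The helper name <function>escape_path</function> and the directories are
hypothetical. The sketch also doubles literal backslashes, on the assumption that a backslash always
escapes the character that follows it.</para>

<programlisting>
```shell
#!/bin/sh
# escape_path PATH
# Puts a backslash in front of every backslash and every colon in PATH.
escape_path() {
    printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e 's/:/\\:/g'
}

# Hypothetical directories, used only for illustration.
lower='/srv/layers/base:v1'     # note the colon inside the path
upper='/srv/layers/changes'
dest='/opt/app'

# Print the resulting option instead of running systemd-nspawn.
printf '%s\n' "--overlay=$(escape_path "$lower"):$(escape_path "$upper"):$dest"
```
</programlisting>

<para>The printed argument is
<literal>--overlay=/srv/layers/base\:v1:/srv/layers/changes:/opt/app</literal>, in which only the colon
that belongs to the first path is escaped.</para>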
1508 | ||
1509 | <para>If three or more paths are specified, then the last specified path is the destination mount | |
1510 | point in the container, all paths specified before refer to directory trees on the host and are | |
1511 | combined in the specified order into one overlay file system. The left-most path is hence the lowest | |
1512 | directory tree, the second-to-last path the highest directory tree in the stacking order. If | |
1513 | <option>--overlay-ro=</option> is used instead of <option>--overlay=</option>, a read-only overlay | |
1514 | file system is created. If a writable overlay file system is created, all changes made to it are | |
1515 | written to the highest directory tree in the stacking order, i.e. the second-to-last specified. | |
2eadf91c RM |
1516 | </para> |
1517 | ||
f075e32c DDM |
1518 | <para>If only two paths are specified, then the second specified path is used both as the top-level |
1519 | directory tree in the stacking order as seen from the host, as well as the mount point for the | |
1520 | overlay file system in the container. At least two paths have to be specified.</para> | |
5a8af538 | 1521 | |
3b121157 ZJS |
<para>The source paths may optionally be prefixed with a <literal>+</literal> character. If so, they are
1523 | taken relative to the image's root directory. The uppermost source path may also be specified as an | |
1524 | empty string, in which case a temporary directory below the host's <filename>/var/tmp/</filename> is | |
1525 | used. The directory is removed automatically when the container is shut down. This behaviour is | |
1526 | useful in order to make read-only container directories writable while the container is running. For | |
1527 | example, use <literal>--overlay=+/var::/var</literal> in order to automatically overlay a writable | |
448f7377 DDM |
1528 | temporary directory on a read-only <filename>/var/</filename> directory. If a source path is not |
1529 | absolute, it is resolved relative to the current working directory.</para> | |
86c0dd4a | 1530 | |
5a8af538 | 1531 | <para>For details about overlay file systems, see <ulink |
0e685823 | 1532 | url="https://docs.kernel.org/filesystems/overlayfs.html">Overlay Filesystem</ulink>. |
2f8211c6 ZJS |
1533 | Note that the semantics of overlay file systems are substantially different from normal file systems, |
1534 | in particular regarding reported device and inode information. Device and inode information may | |
1535 | change for a file while it is being written to, and processes might see out-of-date versions of files | |
1536 | at times. Note that this switch automatically derives the <literal>workdir=</literal> mount option | |
1537 | for the overlay file system from the top-level directory tree, making it a sibling of it. It is hence | |
1538 | essential that the top-level directory tree is not a mount point itself (since the working directory | |
1539 | must be on the same file system as the top-most directory tree). Also note that the | |
1540 | <literal>lowerdir=</literal> mount option receives the paths to stack in the opposite order of this | |
1541 | switch.</para> | |
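<para>The reversed order can be made concrete with a short shell sketch. It converts a list of
directory trees, given in the order this switch expects for a writable overlay (lowest first, without
the destination mount point), into the corresponding <literal>lowerdir=</literal> and
<literal>upperdir=</literal> mount options. The function name and paths are hypothetical, the
<literal>workdir=</literal> option is left out because it is derived automatically, and paths that
contain colons or commas are not handled.</para>

<programlisting>
```shell
#!/bin/sh
# overlay_options LOWEST ... HIGHEST
# Prints lowerdir=/upperdir= for trees listed from lowest to highest.
overlay_options() {
    lowerdir=
    while [ "$#" -gt 1 ]; do
        # Prepend each tree, so that the one given first ends up last.
        if [ -z "$lowerdir" ]; then
            lowerdir=$1
        else
            lowerdir=$1:$lowerdir
        fi
        shift
    done
    # The one remaining argument is the highest tree, which receives writes.
    printf 'lowerdir=%s,upperdir=%s\n' "$lowerdir" "$1"
}

overlay_options /srv/a /srv/b /srv/c
```
</programlisting>

<para>For the three trees above the sketch prints
<literal>lowerdir=/srv/b:/srv/a,upperdir=/srv/c</literal>. The tree listed first for this switch is
listed last in <literal>lowerdir=</literal>.</para>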
b23f1628 LP |
1542 | |
1543 | <para>Note that this option cannot be used to replace the root file system of the container with an overlay | |
d99058c9 | 1544 | file system. However, the <option>--volatile=</option> option described above provides similar functionality, |
b23f1628 | 1545 | with a focus on implementing stateless operating system images.</para></listitem> |
5a8af538 | 1546 | </varlistentry> |
d99058c9 | 1547 | </variablelist> |
730bdfed | 1548 | </refsect2> |
5a8af538 | 1549 | |
730bdfed | 1550 | <refsect2> |
d99058c9 | 1551 | <title>Input/Output Options</title> |
798d3a52 | 1552 | |
d99058c9 | 1553 | <variablelist> |
3d6c3675 LP |
1554 | <varlistentry> |
1555 | <term><option>--console=</option><replaceable>MODE</replaceable></term> | |
1556 | ||
7a25ba55 ZJS |
1557 | <listitem><para>Configures how to set up standard input, output and error output for the container |
1558 | payload, as well as the <filename>/dev/console</filename> device for the container. Takes one of | |
10e8a60b LP |
1559 | <option>interactive</option>, <option>read-only</option>, <option>passive</option>, |
1560 | <option>pipe</option> or <option>autopipe</option>. If <option>interactive</option>, a pseudo-TTY is | |
1561 | allocated and made available as <filename>/dev/console</filename> in the container. It is then | |
1562 | bi-directionally connected to the standard input and output passed to | |
1563 | <command>systemd-nspawn</command>. <option>read-only</option> is similar but only the output of the | |
container is propagated and no input from the caller is read. If <option>passive</option>, a
pseudo-TTY is allocated, but it is not connected anywhere. In <option>pipe</option> mode, no pseudo-TTY is
1566 | allocated, but the standard input, output and error output file descriptors passed to | |
1567 | <command>systemd-nspawn</command> are passed on — as they are — to the container payload, see the | |
1568 | following paragraph. Finally, <option>autopipe</option> mode operates like | |
1569 | <option>interactive</option> when <command>systemd-nspawn</command> is invoked on a terminal, and | |
1570 | like <option>pipe</option> otherwise. Defaults to <option>interactive</option> if | |
3d6c3675 | 1571 | <command>systemd-nspawn</command> is invoked from a terminal, and <option>read-only</option> |
7a25ba55 ZJS |
1572 | otherwise.</para> |
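<para>The difference between the default and <option>autopipe</option> can be summarized in a few
lines of shell. This is a sketch of the decision described above, not code taken from
<command>systemd-nspawn</command>. The function name is hypothetical, and "invoked on a terminal" is
approximated by testing whether standard input is a TTY, which is an assumption.</para>

<programlisting>
```shell
#!/bin/sh
# effective_console_mode [MODE]
# Resolves a missing MODE and "autopipe" to the concrete mode that applies.
effective_console_mode() {
    # Assumption: a TTY on standard input stands for "invoked on a terminal".
    if [ -t 0 ]; then on_tty=yes; else on_tty=no; fi

    case ${1:-default}:$on_tty in
        default:yes|autopipe:yes) echo interactive ;;
        default:no)               echo read-only ;;
        autopipe:no)              echo pipe ;;
        *)                        echo "$1" ;;   # explicit modes are kept as given
    esac
}

effective_console_mode autopipe
```
</programlisting>

<para>Both cases pick <option>interactive</option> on a terminal. They differ only when no terminal is
present, where the default falls back to <option>read-only</option> and <option>autopipe</option> falls
back to <option>pipe</option>.</para>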
1573 | ||
1574 | <para>In <option>pipe</option> mode, <filename>/dev/console</filename> will not exist in the | |
1575 | container. This means that the container payload generally cannot be a full init system as init | |
1576 | systems tend to require <filename>/dev/console</filename> to be available. On the other hand, in this | |
1577 | mode container invocations can be used within shell pipelines. This is because intermediary pseudo | |
1578 | TTYs do not permit independent bidirectional propagation of the end-of-file (EOF) condition, which is | |
1579 | necessary for shell pipelines to work correctly. <emphasis>Note that the <option>pipe</option> mode | |
1580 | should be used carefully</emphasis>, as passing arbitrary file descriptors to less trusted container | |
1581 | payloads might open up unwanted interfaces for access by the container payload. For example, if a | |
1582 | passed file descriptor refers to a TTY of some form, APIs such as <constant>TIOCSTI</constant> may be | |
1583 | used to synthesize input that might be used for escaping the container. Hence <option>pipe</option> | |
1584 | mode should only be used if the payload is sufficiently trusted or when the standard | |
1585 | input/output/error output file descriptors are known safe, for example pipes.</para></listitem> | |
3d6c3675 LP |
1586 | </varlistentry> |
1587 | ||
1588 | <varlistentry> | |
1589 | <term><option>--pipe</option></term> | |
1590 | <term><option>-P</option></term> | |
1591 | ||
1592 | <listitem><para>Equivalent to <option>--console=pipe</option>.</para></listitem> | |
1593 | </varlistentry> | |
60cc90b9 LP |
1594 | </variablelist> |
1595 | ||
730bdfed ZJS |
1596 | </refsect2> |
1597 | <refsect2> | |
1598 | <title>Credentials</title> | |
1599 | ||
1600 | <variablelist> | |
1601 | <varlistentry> | |
1602 | <term><option>--load-credential=</option><replaceable>ID</replaceable>:<replaceable>PATH</replaceable></term> | |
1603 | <term><option>--set-credential=</option><replaceable>ID</replaceable>:<replaceable>VALUE</replaceable></term> | |
1604 | ||
1605 | <listitem><para>Pass a credential to the container. These two options correspond to the | |
1606 | <varname>LoadCredential=</varname> and <varname>SetCredential=</varname> settings in unit files. See | |
1607 | <citerefentry><refentrytitle>systemd.exec</refentrytitle><manvolnum>5</manvolnum></citerefentry> for | |
1608 | details about these concepts, as well as the syntax of the option's arguments.</para> | |
1609 | ||
<para>Note: when <command>systemd-nspawn</command> runs as a systemd system service it can propagate
1611 | the credentials it received via <varname>LoadCredential=</varname>/<varname>SetCredential=</varname> | |
1612 | to the container payload. A systemd service manager running as PID 1 in the container can further | |
1613 | propagate them to the services it itself starts. It is thus possible to easily propagate credentials | |
1614 | from a parent service manager to a container manager service and from there into its payload. This | |
1615 | can even be done recursively.</para> | |
1616 | ||
1617 | <para>In order to embed binary data into the credential data for <option>--set-credential=</option>, | |
1618 | use C-style escaping (i.e. <literal>\n</literal> to embed a newline, or <literal>\x00</literal> to | |
1619 | embed a <constant>NUL</constant> byte). Note that the invoking shell might already apply unescaping | |
once, hence this might require double escaping.</para>
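<para>The effect of the invoking shell can be checked without starting a container. The following
POSIX shell sketch prints an argument exactly as a program would receive it. The helper
<function>as_received</function> and the credential ID <literal>example.motd</literal> are
hypothetical.</para>

<programlisting>
```shell
#!/bin/sh
# as_received ARGUMENT
# Prints ARGUMENT byte for byte, without interpreting escapes in it.
as_received() {
    printf '%s\n' "$1"
}

# Single quotes: the shell leaves the backslash alone, so "\n" is passed on.
as_received --set-credential=example.motd:'first\nsecond'

# Unquoted: the shell consumes the backslash itself, so only "n" is left.
as_received --set-credential=example.motd:first\nsecond

# Unquoted with a doubled backslash: one backslash survives the shell.
as_received --set-credential=example.motd:first\\nsecond
```
</programlisting>

<para>The first and third calls print <literal>first\nsecond</literal> after the credential ID, the
second prints <literal>firstnsecond</literal>. Only an argument that still contains the backslash can
be turned into a newline by the C-style unescaping.</para>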
1621 | ||
1622 | <para>The | |
1623 | <citerefentry><refentrytitle>systemd-sysusers.service</refentrytitle><manvolnum>8</manvolnum></citerefentry> | |
1624 | and | |
1625 | <citerefentry><refentrytitle>systemd-firstboot</refentrytitle><manvolnum>1</manvolnum></citerefentry> | |
1626 | services read credentials configured this way for the purpose of configuring the container's root | |
1627 | user's password and shell, as well as system locale, keymap and timezone during the first boot | |
1628 | process of the container. This is particularly useful in combination with | |
1629 | <option>--volatile=yes</option> where every single boot appears as first boot, since configuration | |
1630 | applied to <filename>/etc/</filename> is lost on container reboot cycles. See the respective man | |
1631 | pages for details. Example:</para> | |
1632 | ||
1633 | <programlisting># systemd-nspawn -i image.raw \ | |
1634 | --volatile=yes \ | |
1635 | --set-credential=firstboot.locale:de_DE.UTF-8 \ | |
1636 | --set-credential=passwd.hashed-password.root:'$y$j9T$yAuRJu1o5HioZAGDYPU5d.$F64ni6J2y2nNQve90M/p0ZP0ECP/qqzipNyaY9fjGpC' \ | |
1637 | -b</programlisting> | |
1638 | ||
1639 | <para>The above command line will invoke the specified image file <filename>image.raw</filename> in | |
1640 | volatile mode, i.e. with empty <filename>/etc/</filename> and <filename>/var/</filename>. The | |
1641 | container payload will recognize this as a first boot, and will invoke | |
1642 | <filename>systemd-firstboot.service</filename>, which then reads the two passed credentials to | |
1643 | configure the system's initial locale and root password.</para> | |
1644 | </listitem> | |
60cc90b9 | 1645 | </varlistentry> |
730bdfed | 1646 | </variablelist> |
60cc90b9 LP |
1647 | |
1648 | </refsect2><refsect2> | |
1649 | <title>Other</title> | |
1650 | ||
1651 | <variablelist> | |
bb068de0 | 1652 | <xi:include href="standard-options.xml" xpointer="no-pager" /> |
798d3a52 ZJS |
1653 | <xi:include href="standard-options.xml" xpointer="help" /> |
1654 | <xi:include href="standard-options.xml" xpointer="version" /> | |
1655 | </variablelist> | |
d99058c9 | 1656 | </refsect2> |
798d3a52 ZJS |
1657 | </refsect1> |
1658 | ||
4ef3ca34 | 1659 | <xi:include href="common-variables.xml" /> |
bb068de0 | 1660 | |
798d3a52 ZJS |
1661 | <refsect1> |
1662 | <title>Examples</title> | |
1663 | ||
1664 | <example> | |
12c4ee0a ZJS |
1665 | <title>Download a |
1666 | <ulink url="https://getfedora.org">Fedora</ulink> image and start a shell in it</title> | |
798d3a52 | 1667 | |
3797fd0a | 1668 | <programlisting># machinectl pull-raw --verify=no \ |
b12a67ae AZ |
1669 | https://download.fedoraproject.org/pub/fedora/linux/releases/&fedora_latest_version;/Cloud/x86_64/images/Fedora-Cloud-Base-&fedora_latest_version;-&fedora_cloud_release;.x86_64.raw.xz \ |
1670 | Fedora-Cloud-Base-&fedora_latest_version;-&fedora_cloud_release;.x86-64 | |
1671 | # systemd-nspawn -M Fedora-Cloud-Base-&fedora_latest_version;-&fedora_cloud_release;.x86-64</programlisting> | |
e0ea94c1 | 1672 | |
798d3a52 ZJS |
1673 | <para>This downloads an image using |
1674 | <citerefentry><refentrytitle>machinectl</refentrytitle><manvolnum>1</manvolnum></citerefentry> | |
1675 | and opens a shell in it.</para> | |
1676 | </example> | |
e0ea94c1 | 1677 | |
798d3a52 ZJS |
1678 | <example> |
1679 | <title>Build and boot a minimal Fedora distribution in a container</title> | |
8f7a3c14 | 1680 | |
7a8aa0ec | 1681 | <programlisting># dnf -y --releasever=&fedora_latest_version; --installroot=/var/lib/machines/f&fedora_latest_version; \ |
8c4db562 | 1682 | --repo=fedora --repo=updates --setopt=install_weak_deps=False install \ |
5015b501 | 1683 | passwd dnf fedora-release vim-minimal util-linux systemd systemd-networkd |
7a8aa0ec | 1684 | # systemd-nspawn -bD /var/lib/machines/f&fedora_latest_version;</programlisting> |
8f7a3c14 | 1685 | |
798d3a52 | 1686 | <para>This installs a minimal Fedora distribution into the |
b0343f8c | 1687 | directory <filename index="false">/var/lib/machines/f&fedora_latest_version;</filename> |
e9dd6984 | 1688 | and then boots that OS in a namespace container. Because the installation |
55107232 ZJS |
1689 | is located underneath the standard <filename>/var/lib/machines/</filename> |
1690 | directory, it is also possible to start the machine using | |
7a8aa0ec | 1691 | <command>systemd-nspawn -M f&fedora_latest_version;</command>.</para> |
798d3a52 | 1692 | </example> |
8f7a3c14 | 1693 | |
798d3a52 ZJS |
1694 | <example> |
1695 | <title>Spawn a shell in a container of a minimal Debian unstable distribution</title> | |
8f7a3c14 | 1696 | |
7f8b3d1d | 1697 | <programlisting># debootstrap unstable ~/debian-tree/ |
25f5971b | 1698 | # systemd-nspawn -D ~/debian-tree/</programlisting> |
8f7a3c14 | 1699 | |
798d3a52 ZJS |
1700 | <para>This installs a minimal Debian unstable distribution into |
1701 | the directory <filename>~/debian-tree/</filename> and then | |
e9dd6984 | 1702 | spawns a shell from this image in a namespace container.</para> |
12c4ee0a ZJS |
1703 | |
1704 | <para><command>debootstrap</command> supports | |
1705 | <ulink url="https://www.debian.org">Debian</ulink>, | |
1706 | <ulink url="https://www.ubuntu.com">Ubuntu</ulink>, | |
1707 | and <ulink url="https://www.tanglu.org">Tanglu</ulink> | |
1708 | out of the box, so the same command can be used to install any of those. For other | |
1709 | distributions from the Debian family, a mirror has to be specified, see | |
1710 | <citerefentry project='die-net'><refentrytitle>debootstrap</refentrytitle><manvolnum>8</manvolnum></citerefentry>. | |
1711 | </para> | |
798d3a52 | 1712 | </example> |
8f7a3c14 | 1713 | |
798d3a52 | 1714 | <example> |
12c4ee0a ZJS |
1715 | <title>Boot a minimal |
1716 | <ulink url="https://www.archlinux.org">Arch Linux</ulink> distribution in a container</title> | |
68562936 | 1717 | |
9a027075 | 1718 | <programlisting># pacstrap -c ~/arch-tree/ base |
68562936 WG |
1719 | # systemd-nspawn -bD ~/arch-tree/</programlisting> |
1720 | ||
ff9b60f3 | 1721 | <para>This installs a minimal Arch Linux distribution into the |
798d3a52 ZJS |
directory <filename>~/arch-tree/</filename> and then boots that OS
in a namespace container.</para>
1724 | </example> | |
68562936 | 1725 | |
f518ee04 | 1726 | <example> |
12c4ee0a ZJS |
1727 | <title>Install the |
<ulink url="https://software.opensuse.org/distributions/tumbleweed">openSUSE Tumbleweed</ulink>
1729 | rolling distribution</title> | |
f518ee04 ZJS |
1730 | |
1731 | <programlisting># zypper --root=/var/lib/machines/tumbleweed ar -c \ | |
1732 | https://download.opensuse.org/tumbleweed/repo/oss tumbleweed | |
1733 | # zypper --root=/var/lib/machines/tumbleweed refresh | |
1734 | # zypper --root=/var/lib/machines/tumbleweed install --no-recommends \ | |
1735 | systemd shadow zypper openSUSE-release vim | |
1736 | # systemd-nspawn -M tumbleweed passwd root | |
1737 | # systemd-nspawn -M tumbleweed -b</programlisting> | |
1738 | </example> | |
1739 | ||
798d3a52 | 1740 | <example> |
17cbb288 | 1741 | <title>Boot into an ephemeral snapshot of the host system</title> |
f9f4dd51 | 1742 | |
798d3a52 | 1743 | <programlisting># systemd-nspawn -D / -xb</programlisting> |
f9f4dd51 | 1744 | |
17cbb288 LP |
<para>This runs a copy of the host system in a snapshot which is removed immediately when the container
exits. All file system changes made during runtime are hence lost on shutdown.</para>
798d3a52 | 1747 | </example> |
f9f4dd51 | 1748 | |
798d3a52 ZJS |
1749 | <example> |
1750 | <title>Run a container with SELinux sandbox security contexts</title> | |
a8828ed9 | 1751 | |
798d3a52 | 1752 | <programlisting># chcon system_u:object_r:svirt_sandbox_file_t:s0:c0,c1 -R /srv/container |
3797fd0a ZJS |
1753 | # systemd-nspawn -L system_u:object_r:svirt_sandbox_file_t:s0:c0,c1 \ |
1754 | -Z system_u:system_r:svirt_lxc_net_t:s0:c0,c1 -D /srv/container /bin/sh</programlisting> | |
798d3a52 | 1755 | </example> |
b53ede69 PW |
1756 | |
1757 | <example> | |
1758 | <title>Run a container with an OSTree deployment</title> | |
1759 | ||
3797fd0a ZJS |
1760 | <programlisting># systemd-nspawn -b -i ~/image.raw \ |
1761 | --pivot-root=/ostree/deploy/$OS/deploy/$CHECKSUM:/sysroot \ | |
1762 | --bind=+/sysroot/ostree/deploy/$OS/var:/var</programlisting> | |
b53ede69 | 1763 | </example> |
798d3a52 ZJS |
1764 | </refsect1> |
1765 | ||
1766 | <refsect1> | |
1767 | <title>Exit status</title> | |
1768 | ||
1769 | <para>The exit code of the program executed in the container is | |
1770 | returned.</para> | |
1771 | </refsect1> | |
1772 | ||
1773 | <refsect1> | |
1774 | <title>See Also</title> | |
1775 | <para> | |
1776 | <citerefentry><refentrytitle>systemd</refentrytitle><manvolnum>1</manvolnum></citerefentry>, | |
f757855e | 1777 | <citerefentry><refentrytitle>systemd.nspawn</refentrytitle><manvolnum>5</manvolnum></citerefentry>, |
798d3a52 ZJS |
1778 | <citerefentry project='man-pages'><refentrytitle>chroot</refentrytitle><manvolnum>1</manvolnum></citerefentry>, |
1779 | <citerefentry project='mankier'><refentrytitle>dnf</refentrytitle><manvolnum>8</manvolnum></citerefentry>, | |
798d3a52 ZJS |
1780 | <citerefentry project='die-net'><refentrytitle>debootstrap</refentrytitle><manvolnum>8</manvolnum></citerefentry>, |
1781 | <citerefentry project='archlinux'><refentrytitle>pacman</refentrytitle><manvolnum>8</manvolnum></citerefentry>, | |
f518ee04 | 1782 | <citerefentry project='mankier'><refentrytitle>zypper</refentrytitle><manvolnum>8</manvolnum></citerefentry>, |
798d3a52 ZJS |
1783 | <citerefentry><refentrytitle>systemd.slice</refentrytitle><manvolnum>5</manvolnum></citerefentry>, |
1784 | <citerefentry><refentrytitle>machinectl</refentrytitle><manvolnum>1</manvolnum></citerefentry>, | |
3ba3a79d | 1785 | <citerefentry project='man-pages'><refentrytitle>btrfs</refentrytitle><manvolnum>8</manvolnum></citerefentry> |
798d3a52 ZJS |
1786 | </para> |
1787 | </refsect1> | |
8f7a3c14 LP |
1788 | |
1789 | </refentry> |