.\" Copyright (C) 2000 by Werner Almesberger
.\" and Copyright (C) 2019 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(GPL_NOVERSION_ONELINE)
.\" May be distributed under GPL
.\" %%%LICENSE_END
.\"
.\" Written 2000-02-23 by Werner Almesberger
.\" Modified 2004-06-17 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.TH PIVOT_ROOT 2 2019-08-02 "Linux" "Linux Programmer's Manual"
.SH NAME
pivot_root \- change the root mount
.SH SYNOPSIS
.BI "int pivot_root(const char *" new_root ", const char *" put_old );
.PP
.IR Note :
There is no glibc wrapper for this system call; see NOTES.
.SH DESCRIPTION
.BR pivot_root ()
changes the root mount in the mount namespace of the calling process.
More precisely, it moves the root mount to the
directory \fIput_old\fP and makes \fInew_root\fP the new root mount.
The calling process must have the
.B CAP_SYS_ADMIN
capability in the user namespace that owns the caller's mount namespace.
.PP
.BR pivot_root ()
changes the root directory and the current working directory
of each process or thread in the same mount namespace to
.I new_root
if they point to the old root directory.
(See also NOTES.)
On the other hand,
.BR pivot_root ()
does not change the caller's current working directory
(unless it is on the old root directory),
and thus it should be followed by a
\fBchdir("/")\fP call.
.PP
The following restrictions apply:
.IP \- 3
.I new_root
and
.I put_old
must be directories.
.IP \-
.I new_root
and
.I put_old
must not be on the same mount as the current root.
.IP \-
\fIput_old\fP must be at or underneath \fInew_root\fP;
that is, adding a nonnegative
number of \fI/..\fP to the string pointed to by \fIput_old\fP must yield
the same directory as \fInew_root\fP.
.IP \-
.I new_root
must be a path to a mount point, but can't be
.IR """/""" .
A path that is not already a mount point can be converted into one by
bind mounting the path onto itself.
.IP \-
The propagation type of the parent mount of
.I new_root
and the parent mount of the current root directory must not be
.BR MS_SHARED ;
similarly, if
.I put_old
is an existing mount point, its propagation type must not be
.BR MS_SHARED .
These restrictions ensure that
.BR pivot_root ()
never propagates any changes to another mount namespace.
.IP \-
The current root directory must be a mount point.
.SH RETURN VALUE
On success, zero is returned.
On error, \-1 is returned, and
\fIerrno\fP is set appropriately.
.SH ERRORS
.BR pivot_root ()
may fail with any of the same errors as
.BR stat (2).
Additionally, it may fail with the following errors:
.TP
.B EBUSY
.\" Reconfirmed that the following error occurs on Linux 5.0 by
.\" specifying 'new_root' as "/rootfs" and 'put_old' as
.\" "/rootfs/oldrootfs", and *not* bind mounting "/rootfs" on top of
.\" itself. Of course, this is an odd situation, since a later check
.\" in the kernel code will in any case yield EINVAL if 'new_root' is
.\" not a mount point. However, when the system call was first added,
.\" 'new_root' was not required to be a mount point. So, this
.\" error is nowadays probably just the result of crufty accumulation.
.\" This error can also occur if we bind mount "/" on top of itself
.\" and try to specify "/" as the 'new' (again, an odd situation). So,
.\" the EBUSY check in the kernel does still seem necessary to prevent
.\" that case. Furthermore, the "or put_old" piece is probably
.\" redundant text (although the check is in the kernel), since,
.\" in another check, 'put_old' is required to be under 'new_root'.
.I new_root
or
.I put_old
is on the current root filesystem.
(This error covers the pathological case where
.I new_root
is
.IR """/""" .)
.TP
.B EINVAL
.I new_root
is not a mount point.
.TP
.B EINVAL
\fIput_old\fP is not at or underneath \fInew_root\fP.
.TP
.B EINVAL
The current root directory is not a mount point
(because of an earlier
.BR chroot (2)).
.TP
.B EINVAL
The current root is on the rootfs (initial ramfs) filesystem; see NOTES.
.TP
.B EINVAL
Either the mount point at
.IR new_root ,
or the parent mount of that mount point,
has propagation type
.BR MS_SHARED .
.TP
.B EINVAL
.I put_old
is a mount point and has the propagation type
.BR MS_SHARED .
.TP
.B ENOTDIR
\fInew_root\fP or \fIput_old\fP is not a directory.
.TP
.B EPERM
The calling process does not have the
.B CAP_SYS_ADMIN
capability.
.SH VERSIONS
.BR pivot_root ()
was introduced in Linux 2.3.41.
.SH CONFORMING TO
.BR pivot_root ()
is Linux-specific and hence is not portable.
.SH NOTES
Glibc does not provide a wrapper for this system call; call it using
.BR syscall (2).
.PP
A command-line interface for this system call is provided by
.BR pivot_root (8).
.PP
.BR pivot_root ()
allows the caller to switch to a new root filesystem while at the same time
placing the old root mount at a location under
.I new_root
from where it can subsequently be unmounted.
(The fact that it moves all processes that have a root directory
or current working directory on the old root directory to the
new root frees the old root directory of users,
allowing the old root mount to be unmounted more easily.)
.PP
A typical use of
.BR pivot_root ()
is during system startup, when the
system mounts a temporary root filesystem (e.g., an \fBinitrd\fP), then
mounts the real root filesystem, and eventually turns the latter into
the current root of all relevant processes or threads.
A modern use is to set up a root filesystem during
the creation of a container.
.PP
The fact that
.BR pivot_root ()
modifies process root and current working directories in the
manner noted in DESCRIPTION
is necessary in order to prevent kernel threads from keeping the old
root directory busy with their root and current working directory,
even if they never access
the filesystem in any way.
.PP
The rootfs (initial ramfs) cannot be
.BR pivot_root ()ed.
The recommended method of changing the root filesystem in this case is
to delete everything in rootfs, overmount rootfs with the new root, attach
.IR stdin / stdout / stderr
to the new
.IR /dev/console ,
and exec the new
.BR init (1).
Helper programs for this process exist; see
.BR switch_root (8).
.\"
.SS pivot_root(\(dq.\(dq, \(dq.\(dq)
.PP
.I new_root
and
.I put_old
may be the same directory.
In particular, the following sequence allows a pivot-root operation
without needing to create and remove a temporary directory:
.PP
.in +4n
.EX
chdir(new_root);
pivot_root(".", ".");
umount2(".", MNT_DETACH);
.EE
.in
.PP
This sequence succeeds because the
.BR pivot_root ()
call stacks the old root mount point
on top of the new root mount point at
.IR / .
At that point, the calling process's root directory and current
working directory refer to the new root mount point
.RI ( new_root ).
During the subsequent
.BR umount2 ()
call, resolution of
.IR """."""
starts with
.I new_root
and then moves up the list of mounts stacked at
.IR / ,
with the result that the old root mount point is unmounted.
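.PP
The three calls in the sequence above can be wrapped with error checking.
The following is a minimal sketch under stated assumptions,
not a definitive implementation:
the function name
.IR pivot_in_place ()
is hypothetical, and the caller is assumed to be running in its own
mount namespace, with
.I new_root
already a mount point and with no shared propagation on the mounts
involved (see DESCRIPTION).
.PP
.in +4n
.EX
```c
#define _GNU_SOURCE
#include <unistd.h>
#include <sys/mount.h>
#include <sys/syscall.h>

/* Hypothetical helper: pivot to new_root without a temporary
   put_old directory.  Returns 0 on success; on failure returns -1
   with errno set by the call that failed. */
static int
pivot_in_place(const char *new_root)
{
    /* Make new_root the current working directory, so that "."
       names it in both pivot_root() arguments */
    if (chdir(new_root) == -1)
        return -1;

    /* No glibc wrapper exists, so invoke the system call directly */
    if (syscall(SYS_pivot_root, ".", ".") == -1)
        return -1;

    /* Lazily detach the old root, now stacked at "/" */
    if (umount2(".", MNT_DETACH) == -1)
        return -1;

    return 0;
}
```
.EE
.in
.PP
Because the function stops at the first failing step,
a nonexistent
.I new_root
is rejected by
.BR chdir (2)
before any mount is touched.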
.\"
.SS Historical notes
For many years, this manual page carried the following text:
.RS
.PP
.BR pivot_root ()
may or may not change the current root and the current
working directory of any processes or threads which use the old
root directory.
The caller of
.BR pivot_root ()
must ensure that processes with root or current working directory
at the old root operate correctly in either case.
An easy way to ensure this is to change their
root and current working directory to \fInew_root\fP before invoking
.BR pivot_root ().
.RE
.PP
This text, written before the system call implementation was
even finalized in the kernel, was probably intended to warn users
at that time that the implementation might change before final release.
However, the behavior stated in DESCRIPTION
has remained consistent since this system call
was first implemented and will not change now.
.SH EXAMPLE
.\" FIXME
.\" Would it be better, because simpler, to use unshare(2)
.\" rather than clone(2) in the example below?
.PP
The program below demonstrates the use of
.BR pivot_root ()
inside a mount namespace that is created using
.BR clone (2).
After pivoting to the root directory named in the program's
first command-line argument, the child created by
.BR clone (2)
then executes the program named in the remaining command-line arguments.
.PP
We demonstrate the program by creating a directory that will serve as
the new root filesystem and placing a copy of the (statically linked)
.BR busybox (1)
executable in that directory.
.PP
.in +4n
.EX
$ \fBmkdir /tmp/rootfs\fP
$ \fBls \-id /tmp/rootfs\fP    # Show inode number of new root directory
319459 /tmp/rootfs
$ \fBcp $(which busybox) /tmp/rootfs\fP
$ \fBPS1=\(aqbbsh$ \(aq sudo ./pivot_root_demo /tmp/rootfs /busybox sh\fP
bbsh$ \fBPATH=/\fP
bbsh$ \fBbusybox ln busybox ln\fP
bbsh$ \fBln busybox echo\fP
bbsh$ \fBln busybox ls\fP
bbsh$ \fBls\fP
busybox  echo     ln       ls
bbsh$ \fBls \-id /\fP          # Compare with inode number above
319459 /
bbsh$ \fBecho \(aqhello world\(aq\fP
hello world
.EE
.in
.SS Program source
\&
.PP
.EX
/* pivot_root_demo.c */

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <sys/mount.h>
#include <sys/stat.h>
#include <limits.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \e
                        } while (0)

/* There is no glibc wrapper, so invoke the system call directly */

static int
pivot_root(const char *new_root, const char *put_old)
{
    return syscall(SYS_pivot_root, new_root, put_old);
}

#define STACK_SIZE (1024 * 1024)

static int              /* Startup function for cloned child */
child(void *arg)
{
    char **args = arg;
    char *new_root = args[0];
    const char *put_old = "/oldrootfs";
    char path[PATH_MAX];

    /* Ensure that \(aqnew_root\(aq and its parent mount don\(aqt have
       shared propagation (which would cause pivot_root() to
       return an error), and prevent propagation of mount
       events to the initial mount namespace */

    if (mount(NULL, "/", NULL, MS_REC | MS_PRIVATE, NULL) == \-1)
        errExit("mount\-MS_PRIVATE");

    /* Ensure that \(aqnew_root\(aq is a mount point */

    if (mount(new_root, new_root, NULL, MS_BIND, NULL) == \-1)
        errExit("mount\-MS_BIND");

    /* Create directory to which old root will be pivoted */

    snprintf(path, sizeof(path), "%s/%s", new_root, put_old);
    if (mkdir(path, 0777) == \-1)
        errExit("mkdir");

    /* And pivot the root filesystem */

    if (pivot_root(new_root, path) == \-1)
        errExit("pivot_root");

    /* Switch the current working directory to "/" */

    if (chdir("/") == \-1)
        errExit("chdir");

    /* Unmount old root and remove mount point */

    if (umount2(put_old, MNT_DETACH) == \-1)
        perror("umount2");
    if (rmdir(put_old) == \-1)
        perror("rmdir");

    /* Execute the command specified in argv[1]... */

    execv(args[1], &args[1]);
    errExit("execv");
}

int
main(int argc, char *argv[])
{
    /* Require a new root directory and a command to execute */

    if (argc < 3) {
        fprintf(stderr, "Usage: %s new\-root command [arg...]\en", argv[0]);
        exit(EXIT_FAILURE);
    }

    /* Create a child process in a new mount namespace */

    char *stack = malloc(STACK_SIZE);
    if (stack == NULL)
        errExit("malloc");

    if (clone(child, stack + STACK_SIZE,
                CLONE_NEWNS | SIGCHLD, &argv[1]) == \-1)
        errExit("clone");

    /* Parent falls through to here; wait for child */

    if (wait(NULL) == \-1)
        errExit("wait");

    exit(EXIT_SUCCESS);
}
.EE
.SH SEE ALSO
.BR chdir (2),
.BR chroot (2),
.BR mount (2),
.BR stat (2),
.BR initrd (4),
.BR mount_namespaces (7),
.BR pivot_root (8),
.BR switch_root (8)