.\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (C) 2014 David Herrmann <dh.herrmann@gmail.com>
.\"
.\" %%%LICENSE_START(GPLv2+)
.\"
.\" This program is free software; you can redistribute it and/or modify
.\" it under the terms of the GNU General Public License as published by
.\" the Free Software Foundation; either version 2 of the License, or
.\" (at your option) any later version.
.\"
.\" This program is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, see
.\" <http://www.gnu.org/licenses/>.
.\" %%%LICENSE_END
.\"
.TH MEMFD_CREATE 2 2017-09-15 Linux "Linux Programmer's Manual"
.SH NAME
memfd_create \- create an anonymous file
.SH SYNOPSIS
.B #include <sys/memfd.h>
.PP
.BI "int memfd_create(const char *" name ", unsigned int " flags ");"
.PP
.IR Note :
There is no glibc wrapper for this system call; see NOTES.
.SH DESCRIPTION
.BR memfd_create ()
creates an anonymous file and returns a file descriptor that refers to it.
The file behaves like a regular file, and so can be modified,
truncated, memory-mapped, and so on.
However, unlike a regular file,
it lives in RAM and has volatile backing storage.
Once all references to the file are dropped, it is automatically released.
Anonymous memory is used for all backing pages of the file.
Therefore, files created by
.BR memfd_create ()
have the same semantics as other anonymous
.\" memfd uses VM_NORESERVE so each page is accounted on first access.
.\" This means, the overcommit-limits (see __vm_enough_memory()) and the
.\" memory-cgroup limits (mem_cgroup_try_charge()) are applied. Note that
.\" those are accounted on "current" and "current->mm", that is, the
.\" process doing the first page access.
memory allocations such as those allocated using
.BR mmap (2)
with the
.B MAP_ANONYMOUS
flag.
.PP
The initial size of the file is set to 0.
Following the call, the file size should be set using
.BR ftruncate (2).
(Alternatively, the file may be populated by calls to
.BR write (2)
or similar.)
.PP
The name supplied in
.I name
is used as a filename and will be displayed
as the target of the corresponding symbolic link in the directory
.IR /proc/self/fd/ .
The displayed name is always prefixed with
.I memfd:
and serves only for debugging purposes.
Names do not affect the behavior of the file descriptor,
and as such multiple files can have the same name without any side effects.
.PP
The following values may be bitwise ORed in
.I flags
to change the behavior of
.BR memfd_create ():
.TP
.B MFD_CLOEXEC
Set the close-on-exec
.RB ( FD_CLOEXEC )
flag on the new file descriptor.
See the description of the
.B O_CLOEXEC
flag in
.BR open (2)
for reasons why this may be useful.
.TP
.B MFD_ALLOW_SEALING
Allow sealing operations on this file.
See the discussion of the
.B F_ADD_SEALS
and
.B F_GET_SEALS
operations in
.BR fcntl (2),
and also NOTES, below.
The initial set of seals is empty.
If this flag is not set, the initial set of seals will be
.BR F_SEAL_SEAL ,
meaning that no other seals can be set on the file.
.\" FIXME Why is the MFD_ALLOW_SEALING behavior not simply the default?
.\" Is it worth adding some text explaining this?
.TP
.BR MFD_HUGETLB " (since Linux 4.14)"
The anonymous file will be created in the hugetlbfs filesystem using
huge pages.
See the Linux kernel source file
.I Documentation/vm/hugetlbpage.txt
for more information about hugetlbfs.
The hugetlbfs filesystem does not support file sealing operations.
Therefore, specifying both
.B MFD_HUGETLB
and
.B MFD_ALLOW_SEALING
will result in an error
.RB ( EINVAL )
being returned.
.TP
.BR MFD_HUGE_2MB ", " MFD_HUGE_1GB ", " "..."
Used in conjunction with
.B MFD_HUGETLB
to select alternative hugetlb page sizes (respectively, 2 MB, 1 GB, ...)
on systems that support multiple hugetlb page sizes.
Definitions for known huge page sizes are included in the header file
.IR <linux/memfd.h> .
.IP
For details on encoding huge page sizes not included in the header file,
see the discussion of the similarly named constants in
.BR mmap (2).
.PP
Unused bits in
.I flags
must be 0.
.PP
As its return value,
.BR memfd_create ()
returns a new file descriptor that can be used to refer to the file.
This file descriptor is opened for both reading and writing
.RB ( O_RDWR )
and
.B O_LARGEFILE
is set for the file descriptor.
.PP
With respect to
.BR fork (2)
and
.BR execve (2),
the usual semantics apply for the file descriptor created by
.BR memfd_create ().
A copy of the file descriptor is inherited by the child produced by
.BR fork (2)
and refers to the same file.
The file descriptor is preserved across
.BR execve (2),
unless the close-on-exec flag has been set.
.SH RETURN VALUE
On success,
.BR memfd_create ()
returns a new file descriptor.
On error, \-1 is returned and
.I errno
is set to indicate the error.
.SH ERRORS
.TP
.B EFAULT
The address in
.I name
points to invalid memory.
.TP
.B EINVAL
An unsupported value was specified in one of the arguments:
.I flags
included unknown bits, or
.I name
was too long.
.TP
.B EMFILE
The per-process limit on the number of open file descriptors has been reached.
.TP
.B ENFILE
The system-wide limit on the total number of open files has been reached.
.TP
.B ENOMEM
There was insufficient memory to create a new anonymous file.
.SH VERSIONS
The
.BR memfd_create ()
system call first appeared in Linux 3.17.
.SH CONFORMING TO
The
.BR memfd_create ()
system call is Linux-specific.
.SH NOTES
Glibc does not provide a wrapper for this system call; call it using
.BR syscall (2).
.PP
.\" See also http://lwn.net/Articles/593918/
.\" and http://lwn.net/Articles/594919/ and http://lwn.net/Articles/591108/
The
.BR memfd_create ()
system call provides a simple alternative to manually mounting a
.I tmpfs
filesystem and creating and opening a file in that filesystem.
The primary purpose of
.BR memfd_create ()
is to create files and associated file descriptors that are
used with the file-sealing APIs provided by
.BR fcntl (2).
.PP
The
.BR memfd_create ()
system call also has uses without file sealing
(which is why file sealing is disabled, unless explicitly requested with the
.B MFD_ALLOW_SEALING
flag).
In particular, it can be used as an alternative to creating files in
.I /tmp
or as an alternative to using the
.BR open (2)
.B O_TMPFILE
flag in cases where there is no intention to actually link the
resulting file into the filesystem.
.SS File sealing
In the absence of file sealing,
processes that communicate via shared memory must either trust each other,
or take measures to deal with the possibility that an untrusted peer
may manipulate the shared memory region in problematic ways.
For example, an untrusted peer might modify the contents of the
shared memory at any time, or shrink the shared memory region.
The former possibility leaves the local process vulnerable to
time-of-check-to-time-of-use race conditions
(typically dealt with by copying data from
the shared memory region before checking and using it).
The latter possibility leaves the local process vulnerable to
.B SIGBUS
signals when an attempt is made to access a now-nonexistent
location in the shared memory region.
(Dealing with this possibility necessitates the use of a handler for the
.B SIGBUS
signal.)
.PP
Dealing with untrusted peers imposes extra complexity on
code that employs shared memory.
Memory sealing enables that extra complexity to be eliminated,
by allowing a process to operate secure in the knowledge that
its peer can't modify the shared memory in an undesired fashion.
.PP
An example of the usage of the sealing mechanism is as follows:
.IP 1. 3
The first process creates a
.I tmpfs
file using
.BR memfd_create ().
The call yields a file descriptor used in subsequent steps.
.IP 2.
The first process
sizes the file created in the previous step using
.BR ftruncate (2),
maps it using
.BR mmap (2),
and populates the shared memory with the desired data.
.IP 3.
The first process uses the
.BR fcntl (2)
.B F_ADD_SEALS
operation to place one or more seals on the file,
in order to restrict further modifications on the file.
(If placing the seal
.BR F_SEAL_WRITE ,
then it will be necessary to first unmap the shared writable mapping
created in the previous step.)
.IP 4.
A second process obtains a file descriptor for the
.I tmpfs
file and maps it.
Among the possible ways in which this could happen are the following:
.RS
.IP * 3
The process that called
.BR memfd_create ()
could transfer the resulting file descriptor to the second process
via a UNIX domain socket (see
.BR unix (7)
and
.BR cmsg (3)).
The second process then maps the file using
.BR mmap (2).
.IP *
The second process is created via
.BR fork (2)
and thus automatically inherits the file descriptor and mapping.
(Note that in this case and the next,
there is a natural trust relationship between the two processes,
since they are running under the same user ID.
Therefore, file sealing would not normally be necessary.)
.IP *
The second process opens the file
.IR /proc/<pid>/fd/<fd> ,
where
.I <pid>
is the PID of the first process (the one that called
.BR memfd_create ()),
and
.I <fd>
is the number of the file descriptor returned by the call to
.BR memfd_create ()
in that process.
The second process then maps the file using
.BR mmap (2).
.RE
.IP 5.
The second process uses the
.BR fcntl (2)
.B F_GET_SEALS
operation to retrieve the bit mask of seals
that has been applied to the file.
This bit mask can be inspected in order to determine
what kinds of restrictions have been placed on file modifications.
If desired, the second process can apply further seals
to impose additional restrictions (so long as the
.B F_SEAL_SEAL
seal has not yet been applied).
.SH EXAMPLE
Below are shown two example programs that demonstrate the use of
.BR memfd_create ()
and the file sealing API.
.PP
The first program,
.IR t_memfd_create.c ,
creates a
.I tmpfs
file using
.BR memfd_create (),
sets a size for the file, maps it into memory,
and optionally places some seals on the file.
The program accepts up to three command-line arguments,
of which the first two are required.
The first argument is the name to associate with the file,
the second argument is the size to be set for the file,
and the optional third argument is a string of characters that specify
seals to be set on the file.
.PP
The second program,
.IR t_get_seals.c ,
can be used to open an existing file that was created via
.BR memfd_create ()
and inspect the set of seals that have been applied to that file.
.PP
The following shell session demonstrates the use of these programs.
First we create a
.I tmpfs
file and set some seals on it:
.PP
.in +4n
.EX
$ \fB./t_memfd_create my_memfd_file 4096 sw &\fP
[1] 11775
PID: 11775; fd: 3; /proc/11775/fd/3
.EE
.in
.PP
At this point, the
.I t_memfd_create
program continues to run in the background.
From another program, we can obtain a file descriptor for the
file created by
.BR memfd_create ()
by opening the
.I /proc/[pid]/fd
file that corresponds to the file descriptor opened by
.BR memfd_create ().
Using that pathname, we inspect the content of the
.I /proc/[pid]/fd
symbolic link, and use our
.I t_get_seals
program to view the seals that have been placed on the file:
.PP
.in +4n
.EX
$ \fBreadlink /proc/11775/fd/3\fP
/memfd:my_memfd_file (deleted)
$ \fB./t_get_seals /proc/11775/fd/3\fP
Existing seals: WRITE SHRINK
.EE
.in
.SS Program source: t_memfd_create.c
\&
.EX
#include <sys/memfd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \\
                        } while (0)

int
main(int argc, char *argv[])
{
    int fd;
    unsigned int seals;
    char *name, *seals_arg;
    ssize_t len;

    if (argc < 3) {
        fprintf(stderr, "%s name size [seals]\\n", argv[0]);
        fprintf(stderr, "\\t\(aqseals\(aq can contain any of the "
                "following characters:\\n");
        fprintf(stderr, "\\t\\tg \- F_SEAL_GROW\\n");
        fprintf(stderr, "\\t\\ts \- F_SEAL_SHRINK\\n");
        fprintf(stderr, "\\t\\tw \- F_SEAL_WRITE\\n");
        fprintf(stderr, "\\t\\tS \- F_SEAL_SEAL\\n");
        exit(EXIT_FAILURE);
    }

    name = argv[1];
    len = atoi(argv[2]);
    seals_arg = argv[3];        /* NULL if no seals were specified */

    /* Create an anonymous file in tmpfs; allow seals to be
       placed on the file */
    fd = memfd_create(name, MFD_ALLOW_SEALING);
    if (fd == \-1)
        errExit("memfd_create");

    /* Size the file as specified on the command line */
    if (ftruncate(fd, len) == \-1)
        errExit("ftruncate");

    printf("PID: %ld; fd: %d; /proc/%ld/fd/%d\\n",
            (long) getpid(), fd, (long) getpid(), fd);

    /* Code to map the file and populate the mapping with data
       omitted */

    /* If a \(aqseals\(aq command\-line argument was supplied, set some
       seals on the file */
    if (seals_arg != NULL) {
        seals = 0;
        if (strchr(seals_arg, \(aqg\(aq) != NULL)
            seals |= F_SEAL_GROW;
        if (strchr(seals_arg, \(aqs\(aq) != NULL)
            seals |= F_SEAL_SHRINK;
        if (strchr(seals_arg, \(aqw\(aq) != NULL)
            seals |= F_SEAL_WRITE;
        if (strchr(seals_arg, \(aqS\(aq) != NULL)
            seals |= F_SEAL_SEAL;

        if (fcntl(fd, F_ADD_SEALS, seals) == \-1)
            errExit("fcntl");
    }

    /* Keep running, so that the file created by memfd_create()
       continues to exist */
    pause();
    exit(EXIT_SUCCESS);
}
.EE
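.SS Program source: t_get_seals.c
\&
.EX
#include <sys/memfd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <stdio.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \\
                        } while (0)

int
main(int argc, char *argv[])
{
    int fd;
    int seals;          /* Signed, so that \-1 from fcntl() is detected */

    if (argc != 2) {
        fprintf(stderr, "%s /proc/PID/fd/FD\\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    fd = open(argv[1], O_RDWR);
    if (fd == \-1)
        errExit("open");

    seals = fcntl(fd, F_GET_SEALS);
    if (seals == \-1)
        errExit("fcntl");

    printf("Existing seals:");
    if (seals & F_SEAL_SEAL)
        printf(" SEAL");
    if (seals & F_SEAL_GROW)
        printf(" GROW");
    if (seals & F_SEAL_WRITE)
        printf(" WRITE");
    if (seals & F_SEAL_SHRINK)
        printf(" SHRINK");
    printf("\\n");

    /* Code to map the file and access the contents of the
       resulting mapping omitted */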