.\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (C) 2014 David Herrmann <dh.herrmann@gmail.com>
.\"
.\" %%%LICENSE_START(GPLv2+)
.\"
.\" This program is free software; you can redistribute it and/or modify
.\" it under the terms of the GNU General Public License as published by
.\" the Free Software Foundation; either version 2 of the License, or
.\" (at your option) any later version.
.\"
.\" This program is distributed in the hope that it will be useful,
.\" but WITHOUT ANY WARRANTY; without even the implied warranty of
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
.\" You should have received a copy of the GNU General Public
.\" License along with this manual; if not, see
.\" <http://www.gnu.org/licenses/>.
.\" %%%LICENSE_END
.\"
.TH MEMFD_CREATE 2 2015-12-28 Linux "Linux Programmer's Manual"
.SH NAME
memfd_create \- create an anonymous file
.SH SYNOPSIS
.B #include <sys/memfd.h>
.sp
.BI "int memfd_create(const char *" name ", unsigned int " flags ");"
.SH DESCRIPTION
.BR memfd_create ()
creates an anonymous file and returns a file descriptor that refers to it.
The file behaves like a regular file, and so can be modified,
truncated, memory-mapped, and so on.
However, unlike a regular file,
it lives in RAM and has volatile backing storage.
Once all references to the file are dropped, it is automatically released.
Anonymous memory is used for all backing pages of the file.
Therefore, files created by
.BR memfd_create ()
have the same semantics as other anonymous
.\" memfd uses VM_NORESERVE so each page is accounted on first access.
.\" This means, the overcommit-limits (see __vm_enough_memory()) and the
.\" memory-cgroup limits (mem_cgroup_try_charge()) are applied. Note that
.\" those are accounted on "current" and "current->mm", that is, the
.\" process doing the first page access.
memory allocations such as those allocated using
.BR mmap (2)
with the
.B MAP_ANONYMOUS
flag.
.PP
The initial size of the file is set to 0.
Following the call, the file size should be set using
.BR ftruncate (2).
(Alternatively, the file may be populated by calls to
.BR write (2)
or similar.)
.PP
The name supplied in
.I name
is used as a filename and will be displayed
as the target of the corresponding symbolic link in the directory
.IR /proc/self/fd/ .
The displayed name is always prefixed with
.I memfd:
and serves only for debugging purposes.
Names do not affect the behavior of the file descriptor,
and as such multiple files can have the same name without any side effects.
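.PP
The creation, sizing, and naming behavior described above can be sketched as follows.
This is a minimal illustration, not part of the original page:
the helper names make_memfd() and memfd_link_name() are hypothetical,
and the code assumes a C library that declares memfd_create() in <sys/mman.h>
when _GNU_SOURCE is defined (as glibc 2.27 and later do),
rather than the <sys/memfd.h> header shown in SYNOPSIS.

```c
#define _GNU_SOURCE             /* Assumption: exposes memfd_create() */
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create an anonymous file called 'name' and
   set its size to 'len' bytes. Returns the new file descriptor,
   or -1 with errno set. */
static int
make_memfd(const char *name, off_t len)
{
    int fd;

    assert(name != NULL);

    fd = memfd_create(name, 0);
    if (fd == -1)
        return -1;

    /* The file starts out with size 0; give it the requested size */
    if (ftruncate(fd, len) == -1) {
        close(fd);
        return -1;
    }

    return fd;
}

/* Hypothetical helper: copy the target of the /proc/self/fd/<fd>
   symbolic link into 'buf' as a null-terminated string.
   Returns 0 on success, or -1 on error. */
static int
memfd_link_name(int fd, char *buf, size_t bufsize)
{
    char path[64];
    ssize_t n;

    assert(buf != NULL && bufsize > 1);

    snprintf(path, sizeof(path), "/proc/self/fd/%d", fd);
    n = readlink(path, buf, bufsize - 1);
    if (n == -1)
        return -1;

    buf[n] = '\0';
    return 0;
}
```

Calling make_memfd() twice with the same name should yield two distinct
descriptors that refer to two independent files,
since the name is used only for display purposes.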
.PP
The following values may be bitwise ORed in
.I flags
to change the behavior of
.BR memfd_create ():
.TP
.B MFD_CLOEXEC
Set the close-on-exec
.RB ( FD_CLOEXEC )
flag on the new file descriptor.
See the description of the
.B O_CLOEXEC
flag in
.BR open (2)
for reasons why this may be useful.
.TP
.B MFD_ALLOW_SEALING
Allow sealing operations on this file.
See the discussion of the
.B F_ADD_SEALS
and
.B F_GET_SEALS
operations in
.BR fcntl (2),
and also NOTES, below.
The initial set of seals is empty.
If this flag is not set, the initial set of seals will be
.BR F_SEAL_SEAL ,
meaning that no other seals can be set on the file.
.\" FIXME Why is the MFD_ALLOW_SEALING behavior not simply the default?
.\" Is it worth adding some text explaining this?
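.PP
The effect of the two flags can be observed with a small probe such as the one below.
This is a sketch only: probe_memfd_flags() is a hypothetical helper,
and the code assumes that memfd_create() and the F_GET_SEALS operation
are made available by <sys/mman.h> and <fcntl.h> when _GNU_SOURCE is defined
(true of glibc 2.27 and later).

```c
#define _GNU_SOURCE             /* Assumption: exposes memfd_create() */
#include <assert.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: create a memfd with the given 'flags', then
   report the initial set of seals in '*seals' and whether the
   close-on-exec flag is set in '*cloexec' (1 or 0).
   Returns 0 on success, or -1 on error. */
static int
probe_memfd_flags(unsigned int flags, int *seals, int *cloexec)
{
    int fd, fdflags;

    assert(seals != NULL && cloexec != NULL);

    fd = memfd_create("probe", flags);
    if (fd == -1)
        return -1;

    *seals = fcntl(fd, F_GET_SEALS);    /* Bit mask of F_SEAL_* values */
    fdflags = fcntl(fd, F_GETFD);       /* File descriptor flags */
    close(fd);

    if (*seals == -1 || fdflags == -1)
        return -1;

    *cloexec = (fdflags & FD_CLOEXEC) != 0;
    return 0;
}
```

With flags specified as 0, the reported seal set is expected to include F_SEAL_SEAL;
with MFD_ALLOW_SEALING, that seal should be absent, so that further seals can be added.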
.PP
As its return value,
.BR memfd_create ()
returns a new file descriptor that can be used to refer to the file.
This file descriptor is opened for both reading and writing
.RB ( O_RDWR )
and
.B O_LARGEFILE
is set for the file descriptor.
.PP
With respect to
.BR fork (2)
and
.BR execve (2),
the usual semantics apply for the file descriptor created by
.BR memfd_create ().
A copy of the file descriptor is inherited by the child produced by
.BR fork (2)
and refers to the same file.
The file descriptor is preserved across
.BR execve (2),
unless the close-on-exec flag has been set.
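.PP
The fork(2) semantics just described can be illustrated as follows.
In this sketch, a child writes into the inherited descriptor and the parent reads the data back.
The helper name roundtrip_via_fork() is hypothetical,
and the code assumes that memfd_create() is declared in <sys/mman.h>
when _GNU_SOURCE is defined (glibc 2.27 and later).

```c
#define _GNU_SOURCE             /* Assumption: exposes memfd_create() */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: create a memfd, let a forked child write 'msg'
   into it, and then read the data back in the parent.
   Returns the number of bytes read into 'buf', or -1 on error. */
static ssize_t
roundtrip_via_fork(const char *msg, char *buf, size_t bufsize)
{
    int fd, status;
    pid_t pid;
    ssize_t n;

    assert(msg != NULL && buf != NULL);

    fd = memfd_create("fork_demo", MFD_CLOEXEC);
    if (fd == -1)
        return -1;

    pid = fork();
    if (pid == -1) {
        close(fd);
        return -1;
    }

    if (pid == 0) {
        /* Child: the inherited descriptor refers to the same file */
        size_t len = strlen(msg);

        _exit(write(fd, msg, len) == (ssize_t) len ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) == -1 ||
            !WIFEXITED(status) || WEXITSTATUS(status) != 0) {
        close(fd);
        return -1;
    }

    /* The file offset is shared with the child, so read from an
       explicit offset rather than from the current position */
    n = pread(fd, buf, bufsize, 0);
    close(fd);
    return n;
}
```

Because MFD_CLOEXEC is specified here, the descriptor would not survive an execve(2) in the child;
omit that flag if the descriptor is meant to be passed to a new program.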
.SH RETURN VALUE
On success,
.BR memfd_create ()
returns a new file descriptor.
On error, \-1 is returned and
.I errno
is set to indicate the error.
.SH ERRORS
.TP
.B EFAULT
The address in
.I name
points to invalid memory.
.TP
.B EINVAL
An unsupported value was specified in one of the arguments:
.I flags
included unknown bits, or
.I name
was too long.
.TP
.B EMFILE
The per-process limit on the number of open file descriptors has been reached.
.TP
.B ENFILE
The system-wide limit on the total number of open files has been reached.
.TP
.B ENOMEM
There was insufficient memory to create a new anonymous file.
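.PP
The error convention above (a return value of \-1, with the cause in errno) might be checked as in the following sketch.
The helper memfd_create_errno() is hypothetical,
and the code assumes that memfd_create() is declared in <sys/mman.h>
when _GNU_SOURCE is defined (glibc 2.27 and later).

```c
#define _GNU_SOURCE             /* Assumption: exposes memfd_create() */
#include <assert.h>
#include <errno.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: call memfd_create() and return 0 on success
   (after closing the new descriptor), or the errno value on failure. */
static int
memfd_create_errno(const char *name, unsigned int flags)
{
    int fd;

    assert(name != NULL);

    fd = memfd_create(name, flags);
    if (fd == -1)
        return errno;           /* For example, EINVAL or EMFILE */

    close(fd);
    return 0;
}
```

Passing a flags value with every bit set, for instance memfd_create_errno("bad", ~0u),
is expected to report EINVAL, since that value contains unknown bits.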
.SH VERSIONS
The
.BR memfd_create ()
system call first appeared in Linux 3.17.
.\" FIXME . When glibc support appears, update the following sentence:
Support in the GNU C library is pending.
.SH CONFORMING TO
The
.BR memfd_create ()
system call is Linux-specific.
.SH NOTES
.\" See also http://lwn.net/Articles/593918/
.\" and http://lwn.net/Articles/594919/ and http://lwn.net/Articles/591108/
The
.BR memfd_create ()
system call provides a simple alternative to manually mounting a
.I tmpfs
filesystem and creating and opening a file in that filesystem.
The primary purpose of
.BR memfd_create ()
is to create files and associated file descriptors that are
used with the file-sealing APIs provided by
.BR fcntl (2).
.PP
The
.BR memfd_create ()
system call also has uses without file sealing
(which is why file sealing is disabled unless explicitly requested with the
.B MFD_ALLOW_SEALING
flag).
In particular, it can be used as an alternative to creating files in
.I /tmp
or as an alternative to using the
.BR open (2)
.B O_TMPFILE
flag in cases where there is no intention to actually link the
resulting file into the filesystem.
.SS File sealing
In the absence of file sealing,
processes that communicate via shared memory must either trust each other,
or take measures to deal with the possibility that an untrusted peer
may manipulate the shared memory region in problematic ways.
For example, an untrusted peer might modify the contents of the
shared memory at any time, or shrink the shared memory region.
The former possibility leaves the local process vulnerable to
time-of-check-to-time-of-use race conditions
(typically dealt with by copying data from
the shared memory region before checking and using it).
The latter possibility leaves the local process vulnerable to
.B SIGBUS
signals when an attempt is made to access a now-nonexistent
location in the shared memory region.
(Dealing with this possibility necessitates the use of a handler for the
.B SIGBUS
signal.)
.PP
Dealing with untrusted peers imposes extra complexity on
code that employs shared memory.
File sealing enables that extra complexity to be eliminated,
by allowing a process to operate secure in the knowledge that
its peer can't modify the shared memory in an undesired fashion.
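.PP
The guarantees provided by seals can be observed directly, as in the sketch below,
which applies a set of seals and then attempts the two manipulations discussed above
(modifying the contents and shrinking the file).
The helper try_after_seal() is hypothetical,
and the code assumes that memfd_create() and the F_SEAL_* constants are available
from <sys/mman.h> and <fcntl.h> when _GNU_SOURCE is defined (glibc 2.27 and later).

```c
#define _GNU_SOURCE             /* Assumption: exposes memfd_create() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a memfd of 'len' bytes, apply 'seals',
   and then try to overwrite the first byte and to shrink the file
   to zero bytes. The errno value of each attempt (0 if the attempt
   succeeded) is stored in '*write_errno' and '*shrink_errno'.
   Returns 0 if the probe itself could be carried out, else -1. */
static int
try_after_seal(off_t len, int seals, int *shrink_errno, int *write_errno)
{
    int fd;

    assert(shrink_errno != NULL && write_errno != NULL);

    fd = memfd_create("seal_demo", MFD_ALLOW_SEALING);
    if (fd == -1)
        return -1;

    if (ftruncate(fd, len) == -1 ||
            fcntl(fd, F_ADD_SEALS, seals) == -1) {
        close(fd);
        return -1;
    }

    /* Attempt the write first, while the file still has its
       original size */
    *write_errno = (pwrite(fd, "x", 1, 0) == 1) ? 0 : errno;
    *shrink_errno = (ftruncate(fd, 0) == 0) ? 0 : errno;

    close(fd);
    return 0;
}
```

With F_SEAL_SHRINK alone, the write is expected to succeed while the truncation fails with EPERM;
adding F_SEAL_WRITE should cause the write to fail with EPERM as well.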
.PP
An example of the usage of the sealing mechanism is as follows:
.IP 1. 3
The first process creates a
.I tmpfs
file using
.BR memfd_create ().
The call yields a file descriptor used in subsequent steps.
.IP 2.
The first process
sizes the file created in the previous step using
.BR ftruncate (2),
maps it using
.BR mmap (2),
and populates the shared memory with the desired data.
.IP 3.
The first process uses the
.BR fcntl (2)
.B F_ADD_SEALS
operation to place one or more seals on the file,
in order to restrict further modifications on the file.
(If placing the seal
.BR F_SEAL_WRITE ,
then it will be necessary to first unmap the shared writable mapping
created in the previous step.)
.IP 4.
A second process obtains a file descriptor for the
.I tmpfs
file and maps it.
Among the possible ways in which this could happen are the following:
.RS
.IP * 3
The process that called
.BR memfd_create ()
could transfer the resulting file descriptor to the second process
via a UNIX domain socket (see
.BR unix (7)
and
.BR cmsg (3)).
The second process then maps the file using
.BR mmap (2).
.IP *
The second process is created via
.BR fork (2)
and thus automatically inherits the file descriptor and mapping.
(Note that in this case and the next,
there is a natural trust relationship between the two processes,
since they are running under the same user ID.
Therefore, file sealing would not normally be necessary.)
.IP *
The second process opens the file
.IR /proc/<pid>/fd/<fd> ,
where
.I <pid>
is the PID of the first process (the one that called
.BR memfd_create ()),
and
.I <fd>
is the number of the file descriptor returned by the call to
.BR memfd_create ()
in that process.
The second process then maps the file using
.BR mmap (2).
.RE
.IP 5.
The second process uses the
.BR fcntl (2)
.B F_GET_SEALS
operation to retrieve the bit mask of seals
that have been applied to the file.
This bit mask can be inspected in order to determine
what kinds of restrictions have been placed on file modifications.
If desired, the second process can apply further seals
to impose additional restrictions (so long as the
.B F_SEAL_SEAL
seal has not yet been applied).
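.PP
The five steps above might be sketched as a pair of functions, one for each process.
This is an illustration under stated assumptions, not a definitive implementation:
publish_sealed(), open_sealed(), and REQUIRED_SEALS are hypothetical names,
the second process is assumed to learn the PID, descriptor number, and length by some other means,
and memfd_create() is assumed to be declared in <sys/mman.h>
when _GNU_SOURCE is defined (glibc 2.27 and later).
The receiver uses a private read-only mapping, which is one conservative choice
for mapping a write-sealed file.

```c
#define _GNU_SOURCE             /* Assumption: exposes memfd_create() */
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Seals that the receiver insists on before trusting the file */
#define REQUIRED_SEALS (F_SEAL_SHRINK | F_SEAL_GROW | F_SEAL_WRITE)

/* Steps 1 to 3 (first process): create, size, map, populate, unmap,
   and seal. Returns the sealed file descriptor, or -1 on error. */
static int
publish_sealed(const void *data, size_t len)
{
    int fd;
    void *addr;

    assert(data != NULL && len > 0);

    fd = memfd_create("sealed_msg", MFD_CLOEXEC | MFD_ALLOW_SEALING);
    if (fd == -1)
        return -1;

    if (ftruncate(fd, len) == -1)
        goto fail;

    addr = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (addr == MAP_FAILED)
        goto fail;
    memcpy(addr, data, len);

    /* The shared writable mapping must be removed before
       F_SEAL_WRITE can be placed on the file */
    if (munmap(addr, len) == -1)
        goto fail;

    if (fcntl(fd, F_ADD_SEALS, REQUIRED_SEALS | F_SEAL_SEAL) == -1)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}

/* Steps 4 and 5 (second process): open /proc/<pid>/fd/<fd>, verify
   the seals and the size, and map the file read-only.
   Returns the address of the mapping, or NULL on error. */
static void *
open_sealed(pid_t pid, int peer_fd, size_t len)
{
    char path[64];
    int fd, seals;
    struct stat sb;
    void *addr;

    snprintf(path, sizeof(path), "/proc/%ld/fd/%d", (long) pid, peer_fd);
    fd = open(path, O_RDONLY);
    if (fd == -1)
        return NULL;

    seals = fcntl(fd, F_GET_SEALS);
    if (seals == -1 || (seals & REQUIRED_SEALS) != REQUIRED_SEALS ||
            fstat(fd, &sb) == -1 || (size_t) sb.st_size < len) {
        close(fd);
        return NULL;
    }

    addr = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                  /* The mapping remains valid */
    return (addr == MAP_FAILED) ? NULL : addr;
}
```

Once open_sealed() has returned successfully, the receiver can read the mapped data
without copying it first, because the verified seals rule out later modification or shrinking of the file.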
.SH EXAMPLE
Below are shown two example programs that demonstrate the use of
.BR memfd_create ()
and the file sealing API.
.PP
The first program,
.IR t_memfd_create.c ,
creates a
.I tmpfs
file using
.BR memfd_create (),
sets a size for the file,
and optionally places some seals on the file
(the code that maps the file and populates the mapping is omitted).
The program accepts up to three command-line arguments,
of which the first two are required.
The first argument is the name to associate with the file,
the second argument is the size to be set for the file,
and the optional third is a string of characters that specify
seals to be set on the file.
.PP
The second program,
.IR t_get_seals.c ,
can be used to open an existing file that was created via
.BR memfd_create ()
and inspect the set of seals that have been applied to that file.
.PP
The following shell session demonstrates the use of these programs.
First we create a
.I tmpfs
file and set some seals on it:
.PP
.in +4n
.nf
$ \fB./t_memfd_create my_memfd_file 4096 sw &\fP
PID: 11775; fd: 3; /proc/11775/fd/3
.fi
.in
.PP
At this point, the
.I t_memfd_create
program continues to run in the background.
From another program, we can obtain a file descriptor for the
file created by
.BR memfd_create ()
by opening the
.I /proc/PID/fd
file that corresponds to the file descriptor opened by
.BR memfd_create ().
Using that pathname, we inspect the content of the
.I /proc/PID/fd
symbolic link, and use our
.I t_get_seals
program to view the seals that have been placed on the file:
.PP
.in +4n
.nf
$ \fBreadlink /proc/11775/fd/3\fP
/memfd:my_memfd_file (deleted)
$ \fB./t_get_seals /proc/11775/fd/3\fP
Existing seals: WRITE SHRINK
.fi
.in
.SS Program source: t_memfd_create.c
.nf
#include <sys/memfd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \\
                        } while (0)

int
main(int argc, char *argv[])
{
    int fd;
    unsigned int seals;
    char *name, *seals_arg;
    ssize_t len;

    if (argc < 3) {
        fprintf(stderr, "%s name size [seals]\\n", argv[0]);
        fprintf(stderr, "\\t\(aqseals\(aq can contain any of the "
                "following characters:\\n");
        fprintf(stderr, "\\t\\tg \- F_SEAL_GROW\\n");
        fprintf(stderr, "\\t\\ts \- F_SEAL_SHRINK\\n");
        fprintf(stderr, "\\t\\tw \- F_SEAL_WRITE\\n");
        fprintf(stderr, "\\t\\tS \- F_SEAL_SEAL\\n");
        exit(EXIT_FAILURE);
    }

    name = argv[1];
    len = atoi(argv[2]);
    seals_arg = argv[3];        /* NULL if no seals were specified */

    /* Create an anonymous file in tmpfs; allow seals to be
       placed on the file */
    fd = memfd_create(name, MFD_ALLOW_SEALING);
    if (fd == \-1)
        errExit("memfd_create");

    /* Size the file as specified on the command line */
    if (ftruncate(fd, len) == \-1)
        errExit("ftruncate");

    printf("PID: %ld; fd: %d; /proc/%ld/fd/%d\\n",
            (long) getpid(), fd, (long) getpid(), fd);

    /* Code to map the file and populate the mapping with data
       omitted */

    /* If a \(aqseals\(aq command\-line argument was supplied, set some
       seals on the file */
    if (seals_arg != NULL) {
        seals = 0;
        if (strchr(seals_arg, \(aqg\(aq) != NULL)
            seals |= F_SEAL_GROW;
        if (strchr(seals_arg, \(aqs\(aq) != NULL)
            seals |= F_SEAL_SHRINK;
        if (strchr(seals_arg, \(aqw\(aq) != NULL)
            seals |= F_SEAL_WRITE;
        if (strchr(seals_arg, \(aqS\(aq) != NULL)
            seals |= F_SEAL_SEAL;
        if (fcntl(fd, F_ADD_SEALS, seals) == \-1)
            errExit("fcntl");
    }

    /* Keep running, so that the file created by memfd_create()
       continues to exist */
    pause();
    exit(EXIT_SUCCESS);
}
.fi
.SS Program source: t_get_seals.c
.nf
#include <sys/memfd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

#define errExit(msg)    do { perror(msg); exit(EXIT_FAILURE); \\
                        } while (0)

int
main(int argc, char *argv[])
{
    int fd, seals;

    if (argc != 2) {
        fprintf(stderr, "%s /proc/PID/fd/FD\\n", argv[0]);
        exit(EXIT_FAILURE);
    }

    fd = open(argv[1], O_RDWR);
    if (fd == \-1)
        errExit("open");

    seals = fcntl(fd, F_GET_SEALS);
    if (seals == \-1)
        errExit("fcntl");

    printf("Existing seals:");
    if (seals & F_SEAL_SEAL)
        printf(" SEAL");
    if (seals & F_SEAL_GROW)
        printf(" GROW");
    if (seals & F_SEAL_WRITE)
        printf(" WRITE");
    if (seals & F_SEAL_SHRINK)
        printf(" SHRINK");
    printf("\\n");

    /* Code to map the file and access the contents of the
       resulting mapping omitted */
    exit(EXIT_SUCCESS);
}
.fi