.\" Copyright (c) 2021, IBM Corporation.
.\" Written by Mike Rapoport <rppt@linux.ibm.com>
.\"
.\" Based on memfd_create(2) man page
.\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (C) 2014 David Herrmann <dh.herrmann@gmail.com>
.\"
.\" SPDX-License-Identifier: GPL-2.0-or-later
.\"
.TH MEMFD_SECRET 2 2020-08-02 Linux "Linux Programmer's Manual"
.SH NAME
memfd_secret \- create an anonymous RAM-based file
to access secret memory regions
.SH LIBRARY
Standard C library
.RI ( libc ", " \-lc )
.SH SYNOPSIS
.nf
.PP
.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
.B #include <unistd.h>
.PP
.BI "int syscall(SYS_memfd_secret, unsigned int " flags );
.fi
.PP
.IR Note :
glibc provides no wrapper for
.BR memfd_secret (),
necessitating the use of
.BR syscall (2).
.SH DESCRIPTION
.BR memfd_secret ()
creates an anonymous RAM-based file and returns a file descriptor
that refers to it.
The file provides a way to create and access memory regions
with stronger protection than usual RAM-based files and
anonymous memory mappings.
Once all open references to the file are closed,
it is automatically released.
The initial size of the file is set to 0.
Following the call, the file size should be set using
.BR ftruncate (2).
.PP
The memory areas backing the file created with
.BR memfd_secret ()
are visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables
and only the page tables of the processes holding the file descriptor
map the corresponding physical memory.
(Thus, the pages in the region can't be accessed by the kernel itself,
so that, for example, pointers to the region can't be passed to
system calls.)
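.PP
As a minimal sketch of the sequence described above
(create the file, set its size with
.BR ftruncate (2),
then map it), the following C fragment may help.
The helper names
.IR memfd_secret_call ()
and
.IR secret_alloc ()
are hypothetical,
the fallback system call number 447 is an assumption
for systems whose headers predate the call,
and the use of
.B MAP_SHARED
reflects an assumption that private mappings of the file are rejected.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumption: headers older than Linux 5.14 lack the constant;
   447 is the number used by the unified system call table. */
#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447
#endif

/* Hypothetical wrapper, needed because glibc provides none. */
static int memfd_secret_call(unsigned int flags)
{
    return (int) syscall(SYS_memfd_secret, flags);
}

/* Hypothetical helper: returns a mapping of len bytes of secret
   memory, or NULL with errno set by the failing call. */
static void *secret_alloc(size_t len)
{
    int fd = memfd_secret_call(0);
    if (fd == -1)
        return NULL;

    void *p = MAP_FAILED;
    /* The file starts with size 0, so size it before mapping. */
    if (ftruncate(fd, (off_t) len) == 0)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping, if any, keeps the file alive after close(2). */
    int saved = errno;
    close(fd);
    errno = saved;
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would check for a NULL return and inspect
.IR errno ;
the region is released with
.BR munmap (2)
once it is no longer needed.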
.PP
The following values may be bitwise ORed in
.I flags
to control the behavior of
.BR memfd_secret ():
.TP
.B FD_CLOEXEC
Set the close-on-exec flag on the new file descriptor,
which causes the region to be removed from the process on
.BR execve (2).
See the description of the
.B O_CLOEXEC
flag in
.BR open (2).
.PP
As its return value,
.BR memfd_secret ()
returns a new file descriptor that refers to an anonymous file.
This file descriptor is opened for both reading and writing
.RB ( O_RDWR )
and
.B O_LARGEFILE
is set for the file descriptor.
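.PP
The descriptor properties described above can be inspected with
.BR fcntl (2),
as in the following hedged sketch.
The helper names are hypothetical,
the fallback system call number 447 is an assumption,
and the sketch passes
.BR O_CLOEXEC ,
the constant referred to above via
.BR open (2),
on the assumption that the kernel accepts that value;
the
.B F_GETFD
check reveals whether that assumption holds on a given system.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fallback number for headers older than Linux 5.14. */
#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447
#endif

/* Hypothetical helper: request close-on-exec at creation time.
   Returns a file descriptor, or -1 with errno set. */
static int secret_fd_cloexec(void)
{
    return (int) syscall(SYS_memfd_secret, (unsigned int) O_CLOEXEC);
}

/* Returns 1 if close-on-exec is set on fd, 0 if not, -1 on error. */
static int fd_is_cloexec(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);
    if (fdflags == -1)
        return -1;
    return (fdflags & FD_CLOEXEC) != 0;
}

/* Returns O_RDONLY, O_WRONLY or O_RDWR for fd, or -1 on error. */
static int fd_access_mode(int fd)
{
    int status = fcntl(fd, F_GETFL);
    if (status == -1)
        return -1;
    return status & O_ACCMODE;
}
```

Requesting the flag at creation time avoids the window that a separate
.B F_SETFD
call would leave open in a multithreaded program.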
.PP
With respect to
.BR fork (2)
and
.BR execve (2),
the usual semantics apply for the file descriptor created by
.BR memfd_secret ().
A copy of the file descriptor is inherited by the child produced by
.BR fork (2)
and refers to the same file.
The file descriptor is preserved across
.BR execve (2),
unless the close-on-exec flag has been set.
.PP
The memory region is locked into memory in the same way as with
.BR mlock (2),
so that it will never be written into swap,
and hibernation is inhibited for as long as any open file descriptions
created by
.BR memfd_secret ()
exist.
However, the implementation of
.BR memfd_secret ()
will not try to populate the whole range during the
.BR mmap (2)
call that attaches the region into the process's address space;
instead, the pages are only actually allocated
as they are faulted in.
The amount of memory allowed for memory mappings
of the file descriptor obeys the same rules as
.BR mlock (2)
and cannot exceed
.BR RLIMIT_MEMLOCK .
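.PP
Because the mapping size is bounded by
.BR RLIMIT_MEMLOCK ,
a program may want to compare a planned size against the limit
before mapping.
The following is a rough sketch only:
the helper name
.IR fits_memlock_limit ()
is hypothetical,
it looks at the soft limit alone,
and it ignores memory the process has already locked
as well as any privilege that might exempt the process from the limit.

```c
#include <stddef.h>
#include <sys/resource.h>

/* Hypothetical helper: returns 1 if len bytes fit within the soft
   RLIMIT_MEMLOCK limit, 0 if they do not, -1 if the limit cannot
   be read.  Memory that is already locked is not accounted for. */
static int fits_memlock_limit(size_t len)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;
    return (rlim_t) len <= rl.rlim_cur;
}
```

A result of 0 suggests raising the limit with
.BR setrlimit (2)
or choosing a smaller region before calling
.BR mmap (2).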
.SH RETURN VALUE
On success,
.BR memfd_secret ()
returns a new file descriptor.
On error, \-1 is returned and
.I errno
is set to indicate the error.
.SH ERRORS
.TP
.B EINVAL
.I flags
included unknown bits.
.TP
.B EMFILE
The per-process limit on the number of open file descriptors has been reached.
.TP
.B ENFILE
The system-wide limit on the total number of open files has been reached.
.TP
.B ENOMEM
There was insufficient memory to create a new anonymous file.
.TP
.B ENOSYS
.BR memfd_secret ()
is not implemented on this architecture,
or has not been enabled on the kernel command line with
.BR secretmem.enable =1.
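.PP
The error conditions listed above could be handled along the following lines.
This is an illustrative sketch:
the helper names
.IR secretmem_available ()
and
.IR memfd_secret_errmsg ()
are hypothetical,
the message strings are paraphrases,
and the fallback system call number 447 is an assumption.
.B ENFILE
is used for the system-wide limit, as is conventional.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: fallback number for headers older than Linux 5.14. */
#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447
#endif

/* Hypothetical probe: returns 1 if a secret file can be created,
   0 if the call reports ENOSYS, -1 for any other failure. */
static int secretmem_available(void)
{
    int fd = (int) syscall(SYS_memfd_secret, 0u);

    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == ENOSYS ? 0 : -1;
}

/* Hypothetical helper: maps an errno value to a short description. */
static const char *memfd_secret_errmsg(int err)
{
    switch (err) {
    case EINVAL: return "flags included unknown bits";
    case EMFILE: return "per-process file descriptor limit reached";
    case ENFILE: return "system-wide open file limit reached";
    case ENOMEM: return "insufficient memory for a new anonymous file";
    case ENOSYS: return "not implemented or not enabled at boot";
    default:     return "unexpected error";
    }
}
```

Probing once at startup lets a program fall back to ordinary locked memory
when the call is unavailable.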
.SH VERSIONS
The
.BR memfd_secret ()
system call first appeared in Linux 5.14.
.SH CONFORMING TO
The
.BR memfd_secret ()
system call is Linux-specific.
.SH NOTES
The
.BR memfd_secret ()
system call is designed to allow a user-space process
to create a range of memory that is inaccessible to anybody else,
kernel included.
There is no absolute guarantee that the kernel cannot access
memory ranges backed by
.BR memfd_secret ()
under any circumstances; nevertheless,
it is much harder to exfiltrate data from these regions.
.PP
.BR memfd_secret ()
provides the following protections:
.IP \(bu 3
Enhanced protection
(in conjunction with all the other in-kernel attack prevention systems)
against ROP attacks.
The absence of any in-kernel primitive for accessing memory backed by
.BR memfd_secret ()
means that a one-gadget ROP attack
can't work to perform data exfiltration.
The attacker would need to find enough ROP gadgets
to reconstruct the missing page table entries,
which significantly increases the difficulty of the attack,
especially when other protections like the kernel stack size limit
and address space layout randomization are in place.
.IP \(bu
Prevent cross-process user-space memory exposures.
Once a region for a
.BR memfd_secret ()
memory mapping is allocated,
the user can't accidentally pass it into the kernel
to be transmitted somewhere.
The memory pages in this region cannot be accessed via the direct map
and they are disallowed in
.BR get_user_pages ().
.IP \(bu
Harden against exploited kernel flaws.
In order to access memory areas backed by
.BR memfd_secret (),
a kernel-side attack would need to
either walk the page tables and create new ones,
or spawn a new privileged user-space process to perform
secrets exfiltration using
.BR ptrace (2).
.PP
The way
.BR memfd_secret ()
allocates and locks the memory may impact overall system performance;
therefore, the system call is disabled by default and only available
if the system administrator turned it on using the
"secretmem.enable=y" kernel parameter.
.PP
To prevent potential data leaks of memory regions backed by
.BR memfd_secret ()
from a hibernation image,
hibernation is prevented when there are active
.BR memfd_secret ()
users.
.SH SEE ALSO
.BR fcntl (2),
.BR ftruncate (2),
.BR memfd_create (2),
.BR mlock (2),
.BR mmap (2),
.BR setrlimit (2)