Commit | Line | Data |
---|---|---|
ac5edfeb MR |
1 | .\" Copyright (c) 2021, IBM Corporation. |
2 | .\" Written by Mike Rapoport <rppt@linux.ibm.com> | |
3 | .\" | |
4 | .\" Based on memfd_create(2) man page | |
5 | .\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com> | |
6 | .\" and Copyright (C) 2014 David Herrmann <dh.herrmann@gmail.com> | |
7 | .\" | |
8 | .\" %%%LICENSE_START(GPLv2+) | |
9 | .\" | |
10 | .\" This program is free software; you can redistribute it and/or modify | |
11 | .\" it under the terms of the GNU General Public License as published by | |
12 | .\" the Free Software Foundation; either version 2 of the License, or | |
13 | .\" (at your option) any later version. | |
14 | .\" | |
15 | .\" This program is distributed in the hope that it will be useful, | |
16 | .\" but WITHOUT ANY WARRANTY; without even the implied warranty of | |
17 | .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
18 | .\" GNU General Public License for more details. | |
19 | .\" | |
20 | .\" You should have received a copy of the GNU General Public | |
21 | .\" License along with this manual; if not, see | |
22 | .\" <http://www.gnu.org/licenses/>. | |
23 | .\" %%%LICENSE_END | |
24 | .\" | |
25 | .TH MEMFD_SECRET 2 2020-08-02 Linux "Linux Programmer's Manual" | |
26 | .SH NAME | |
27 | memfd_secret \- create an anonymous RAM-based file | |
28 | to access secret memory regions | |
29 | .SH SYNOPSIS | |
30 | .nf | |
31 | .PP | |
32 | .BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */" | |
33 | .B #include <unistd.h> | |
34 | .PP | |
35 | .BI "int syscall(SYS_memfd_secret, unsigned int " flags ); | |
36 | .fi | |
37 | .PP | |
38 | .IR Note : | |
39 | glibc provides no wrapper for | |
40 | .BR memfd_secret (), | |
41 | necessitating the use of | |
42 | .BR syscall (2). | |
43 | .SH DESCRIPTION | |
44 | .BR memfd_secret () | |
eabb03a4 MK |
45 | creates an anonymous RAM-based file and returns a file descriptor |
46 | that refers to it. | |
ac5edfeb MR |
47 | The file provides a way to create and access memory regions |
48 | with stronger protection than usual RAM-based files and | |
49 | anonymous memory mappings. | |
50 | Once all open references to the file are closed, | |
51 | it is automatically released. | |
52 | The initial size of the file is set to 0. | |
53 | Following the call, the file size should be set using | |
54 | .BR ftruncate (2). | |
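The sequence described above (create the file, then set its size with ftruncate(2)) can be sketched in C as follows. This is a minimal illustration, not a reference implementation: the helper name secret_map() is hypothetical, attaching the region with mmap(2) and MAP_SHARED is an assumption of this sketch, and reporting ENOSYS when the headers do not define SYS_memfd_secret is a convenience added here.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a secret-memory file of `len` bytes and
 * map it read/write. Returns the mapping and stores the descriptor in
 * *fd_out, or returns NULL with errno set by the failing call.
 */
void *secret_map(size_t len, int *fd_out)
{
    int fd;
    int saved;
    void *p;

#ifdef SYS_memfd_secret
    /* glibc provides no wrapper, so go through syscall(2). */
    fd = (int)syscall(SYS_memfd_secret, 0U);
#else
    /* Assumption of this sketch: old headers are treated like ENOSYS. */
    fd = -1;
    errno = ENOSYS;
#endif
    if (fd == -1)
        return NULL;

    /* The file starts with size 0; give it its real size. */
    if (ftruncate(fd, (off_t)len) == -1) {
        saved = errno;
        close(fd);
        errno = saved;
        return NULL;
    }

    /* Assumption: the region is attached as a shared mapping. */
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED) {
        saved = errno;
        close(fd);
        errno = saved;
        return NULL;
    }

    *fd_out = fd;
    return p;
}
```

A caller would use the returned pointer like ordinary memory and later release it with munmap(2) and close(2). A NULL return should be expected on systems where the system call is unavailable or disabled.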
55 | .PP | |
56 | The memory areas backing the file created with | |
84a2ce0f | 57 | .BR memfd_secret (2) |
ac5edfeb MR |
58 | are visible only to the processes that have access to the file descriptor. |
59 | The memory region is removed from the kernel page tables | |
60 | and only the page tables of the processes holding the file descriptor | |
61 | map the corresponding physical memory. | |
62 | (Thus, the pages in the region can't be accessed by the kernel itself, | |
63 | so that, for example, pointers to the region can't be passed to | |
64 | system calls.) | |
65 | .PP | |
66 | The following values may be bitwise ORed in | |
67 | .I flags | |
68 | to control the behavior of | |
84a2ce0f | 69 | .BR memfd_secret (): |
ac5edfeb MR |
70 | .TP |
71 | .B FD_CLOEXEC | |
72 | Set the close-on-exec flag on the new file descriptor, | |
73 | which causes the region to be removed from the process on | |
74 | .BR execve (2). | |
75 | See the description of the | |
76 | .B O_CLOEXEC | |
77 | flag in | |
78 | .BR open (2). | |
79 | .PP | |
80 | As its return value, | |
81 | .BR memfd_secret () | |
82 | returns a new file descriptor that refers to an anonymous file. | |
83 | This file descriptor is opened for both reading and writing | |
84 | .RB ( O_RDWR ) | |
85 | and | |
86 | .B O_LARGEFILE | |
87 | is set for the file descriptor. | |
88 | .PP | |
89 | With respect to | |
90 | .BR fork (2) | |
91 | and | |
92 | .BR execve (2), | |
93 | the usual semantics apply for the file descriptor created by | |
94 | .BR memfd_secret (). | |
95 | A copy of the file descriptor is inherited by the child produced by | |
96 | .BR fork (2) | |
97 | and refers to the same file. | |
98 | The file descriptor is preserved across | |
99 | .BR execve (2), | |
100 | unless the close-on-exec flag has been set. | |
101 | .PP | |
102 | The memory region is locked into memory in the same way as with | |
103 | .BR mlock (2), | |
104 | so that it will never be written into swap. | |
105 | However, the implementation of | |
84a2ce0f | 106 | .BR memfd_secret () |
ac5edfeb MR |
107 | will not try to populate the whole range during the |
108 | .BR mmap (2) | |
109 | call that attaches the region into the process's address space; | |
110 | instead, the pages are only actually allocated | |
111 | as they are faulted in. | |
112 | The amount of memory allowed for memory mappings | |
113 | of the file descriptor obeys the same rules as | |
114 | .BR mlock (2) | |
115 | and cannot exceed | |
116 | .BR RLIMIT_MEMLOCK . | |
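Because mappings of the file count against RLIMIT_MEMLOCK, a program may want to compare the size it intends to map with the current limit before calling mmap(2). The following C sketch shows one way to do that. The function name fits_memlock_limit() is hypothetical, and the check is deliberately simplified: it looks only at the soft limit and ignores memory the process has already locked as well as any privileges that lift the limit.

```c
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: report whether `len` bytes fit within the soft
 * RLIMIT_MEMLOCK limit of the calling process.
 * Returns 1 if it fits, 0 if it does not, and -1 if getrlimit() fails.
 * Simplification: memory that is already locked is not accounted for.
 */
int fits_memlock_limit(size_t len)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return -1;

    /* No limit configured: any size passes this check. */
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;

    return ((rlim_t)len <= rl.rlim_cur) ? 1 : 0;
}
```

A result of 1 is only a hint. The mmap(2) call or a later page fault can still fail if other locked memory has already consumed part of the limit.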
117 | .SH RETURN VALUE | |
118 | On success, | |
84a2ce0f | 119 | .BR memfd_secret () |
ac5edfeb MR |
120 | returns a new file descriptor. |
121 | On error, \-1 is returned and | |
122 | .I errno | |
123 | is set to indicate the error. | |
124 | .SH ERRORS | |
125 | .TP | |
126 | .B EINVAL | |
127 | .I flags | |
128 | included unknown bits. | |
129 | .TP | |
130 | .B EMFILE | |
131 | The per-process limit on the number of open file descriptors has been reached. | |
132 | .TP | |
133 | .B ENFILE | |
134 | The system-wide limit on the total number of open files has been reached. | |
135 | .TP | |
136 | .B ENOMEM | |
137 | There was insufficient memory to create a new anonymous file. | |
138 | .TP | |
139 | .B ENOSYS | |
140 | .BR memfd_secret () | |
141 | is not implemented on this architecture. | |
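The error conditions listed above can be turned into diagnostics with a small lookup such as the C sketch below. The function name memfd_secret_errmsg() and the message strings are illustrative choices, not part of any library. The sketch uses ENFILE for the system-wide limit on open files, which is the standard errno value for that condition, and falls back to strerror(3) for any other value.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map an errno value left by a failed
 * memfd_secret() call to a short description.
 */
const char *memfd_secret_errmsg(int err)
{
    switch (err) {
    case EINVAL:
        return "flags included unknown bits";
    case EMFILE:
        return "per-process limit on open file descriptors reached";
    case ENFILE:
        return "system-wide limit on open files reached";
    case ENOMEM:
        return "insufficient memory to create a new anonymous file";
    case ENOSYS:
        return "memfd_secret() is not implemented on this architecture";
    default:
        /* Any other value: defer to the C library's description. */
        return strerror(err);
    }
}
```

A caller would typically save errno immediately after the failed call and pass the saved value to the helper, since later library calls may overwrite errno.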
142 | .SH VERSIONS | |
143 | The | |
84a2ce0f | 144 | .BR memfd_secret () |
ac5edfeb MR |
145 | system call first appeared in Linux 5.14. |
146 | .SH CONFORMING TO | |
147 | The | |
84a2ce0f | 148 | .BR memfd_secret () |
ac5edfeb | 149 | system call is Linux-specific. |
afcea05d MR |
150 | .SH NOTES |
151 | .PP | |
152 | The | |
153 | .BR memfd_secret () | |
154 | system call is designed to allow a user-space process | |
155 | to create a range of memory that is inaccessible to anybody else, | |
156 | the kernel included. | |
157 | There is no 100% guarantee that the kernel won't be able to access | |
158 | memory ranges backed by | |
159 | .BR memfd_secret () | |
160 | in any circumstances, but nevertheless, | |
161 | it is much harder to exfiltrate data from these regions. | |
162 | .PP | |
163 | The | |
164 | .BR memfd_secret () | |
165 | system call provides the following protections: | |
166 | .IP \(bu 3 | |
167 | Enhanced protection | |
168 | (in conjunction with all the other in-kernel attack prevention systems) | |
169 | against ROP attacks. | |
170 | Absence of any in-kernel primitive for accessing memory backed by | |
171 | .BR memfd_secret () | |
172 | means that a one-gadget ROP attack | |
173 | can't work to perform data exfiltration. | |
174 | The attacker would need to find enough ROP gadgets | |
175 | to reconstruct the missing page table entries, | |
176 | which significantly increases the difficulty of the attack, | |
177 | especially when other protections like the kernel stack size limit | |
178 | and address space layout randomization are in place. | |
179 | .IP \(bu | |
180 | Prevent cross-process userspace memory exposures. | |
181 | Once a region for a | |
182 | .BR memfd_secret () | |
183 | memory mapping is allocated, | |
184 | the user can't accidentally pass it into the kernel | |
185 | to be transmitted somewhere. | |
186 | The memory pages in this region cannot be accessed via the direct map | |
187 | and they are disallowed in get_user_pages. | |
188 | .IP \(bu | |
189 | Harden against exploited kernel flaws. | |
190 | In order to access memory areas backed by | |
191 | .BR memfd_secret (), | |
192 | a kernel-side attack would need to | |
193 | either walk the page tables and create new ones, | |
194 | or spawn a new privileged userspace process to perform | |
195 | secrets exfiltration using | |
196 | .BR ptrace (2). | |
197 | .PP | |
198 | The way | |
199 | .BR memfd_secret () | |
200 | allocates and locks the memory may impact overall system performance, | |
201 | therefore the system call is disabled by default and only available | |
202 | if the system administrator has turned it on using the | |
203 | "secretmem.enable=y" kernel parameter. | |
204 | .PP | |
205 | To prevent potential data leaks of memory regions backed by | |
206 | .BR memfd_secret () | |
207 | from a hibernation image, | |
208 | hibernation is prevented when there are active | |
209 | .BR memfd_secret () | |
210 | users. | |
ac5edfeb MR |
211 | .SH SEE ALSO |
212 | .BR fcntl (2), | |
213 | .BR ftruncate (2), | |
214 | .BR mlock (2), | |
d5ee9f93 | 215 | .BR memfd_create (2), |
ac5edfeb MR |
216 | .BR mmap (2), |
217 | .BR setrlimit (2) |