Commit | Line | Data |
---|---|---|
ac5edfeb MR |
1 | .\" Copyright (c) 2021, IBM Corporation. |
2 | .\" Written by Mike Rapoport <rppt@linux.ibm.com> | |
3 | .\" | |
4 | .\" Based on memfd_create(2) man page | |
5 | .\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com> | |
6 | .\" and Copyright (C) 2014 David Herrmann <dh.herrmann@gmail.com> | |
7 | .\" | |
e4a74ca8 | 8 | .\" SPDX-License-Identifier: GPL-2.0-or-later |
ac5edfeb | 9 | .\" |
4c1c5274 | 10 | .TH memfd_secret 2 (date) "Linux man-pages (unreleased)" |
ac5edfeb MR |
11 | .SH NAME |
12 | memfd_secret \- create an anonymous RAM-based file | |
13 | to access secret memory regions | |
87ba034d AC |
14 | .SH LIBRARY |
15 | Standard C library | |
8fc3b2cf | 16 | .RI ( libc ", " \-lc ) |
ac5edfeb MR |
17 | .SH SYNOPSIS |
18 | .nf | |
19 | .PP | |
20 | .BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */" | |
21 | .B #include <unistd.h> | |
22 | .PP | |
23 | .BI "int syscall(SYS_memfd_secret, unsigned int " flags ); | |
24 | .fi | |
25 | .PP | |
26 | .IR Note : | |
27 | glibc provides no wrapper for | |
28 | .BR memfd_secret (), | |
29 | necessitating the use of | |
30 | .BR syscall (2). | |
31 | .SH DESCRIPTION | |
32 | .BR memfd_secret () | |
eabb03a4 MK |
33 | creates an anonymous RAM-based file and returns a file descriptor |
34 | that refers to it. | |
ac5edfeb MR |
35 | The file provides a way to create and access memory regions |
36 | with stronger protection than usual RAM-based files and | |
37 | anonymous memory mappings. | |
38 | Once all open references to the file are closed, | |
39 | it is automatically released. | |
40 | The initial size of the file is set to 0. | |
41 | Following the call, the file size should be set using | |
42 | .BR ftruncate (2). | |
43 | .PP | |
44 | The memory areas backing the file created with | |
84a2ce0f | 45 | .BR memfd_secret (2) |
ac5edfeb MR |
46 | are visible only to the processes that have access to the file descriptor. |
47 | The memory region is removed from the kernel page tables | |
48 | and only the page tables of the processes holding the file descriptor | |
49 | map the corresponding physical memory. | |
50 | (Thus, the pages in the region can't be accessed by the kernel itself, | |
51 | so that, for example, pointers to the region can't be passed to | |
52 | system calls.) | |
53 | .PP | |
54 | The following values may be bitwise ORed in | |
55 | .I flags | |
56 | to control the behavior of | |
84a2ce0f | 57 | .BR memfd_secret (): |
ac5edfeb MR |
58 | .TP |
59 | .B FD_CLOEXEC | |
60 | Set the close-on-exec flag on the new file descriptor, | |
61 | which causes the region to be removed from the process on | |
62 | .BR execve (2). | |
63 | See the description of the | |
64 | .B O_CLOEXEC | |
65 | flag in | |
66 | .BR open (2). | |
67 | .PP | |
68 | As its return value, | |
69 | .BR memfd_secret () | |
70 | returns a new file descriptor that refers to an anonymous file. | |
71 | This file descriptor is opened for both reading and writing | |
72 | .RB ( O_RDWR ) | |
73 | and | |
74 | .B O_LARGEFILE | |
75 | is set for the file descriptor. | |
76 | .PP | |
77 | With respect to | |
78 | .BR fork (2) | |
79 | and | |
80 | .BR execve (2), | |
81 | the usual semantics apply for the file descriptor created by | |
82 | .BR memfd_secret (). | |
83 | A copy of the file descriptor is inherited by the child produced by | |
84 | .BR fork (2) | |
85 | and refers to the same file. | |
86 | The file descriptor is preserved across | |
87 | .BR execve (2), | |
88 | unless the close-on-exec flag has been set. | |
89 | .PP | |
90 | The memory region is locked into memory in the same way as with | |
91 | .BR mlock (2), | |
881998d5 | 92 | so that it will never be written into swap, |
93 | and hibernation is inhibited for as long as any | |
94 | .BR memfd_secret () | |
95 | descriptions exist. | |
ac5edfeb | 96 | However, the implementation of |
84a2ce0f | 97 | .BR memfd_secret () |
ac5edfeb MR |
98 | will not try to populate the whole range during the |
99 | .BR mmap (2) | |
100 | call that attaches the region into the process's address space; | |
101 | instead, the pages are only actually allocated | |
102 | as they are faulted in. | |
103 | The amount of memory allowed for memory mappings | |
104 | of the file descriptor obeys the same rules as | |
105 | .BR mlock (2) | |
106 | and cannot exceed | |
107 | .BR RLIMIT_MEMLOCK . | |
108 | .SH RETURN VALUE | |
109 | On success, | |
84a2ce0f | 110 | .BR memfd_secret () |
ac5edfeb MR |
111 | returns a new file descriptor. |
112 | On error, \-1 is returned and | |
113 | .I errno | |
114 | is set to indicate the error. | |
115 | .SH ERRORS | |
116 | .TP | |
117 | .B EINVAL | |
118 | .I flags | |
119 | included unknown bits. | |
120 | .TP | |
121 | .B EMFILE | |
122 | The per-process limit on the number of open file descriptors has been reached. | |
123 | .TP | |
124 | .B ENFILE | |
125 | The system-wide limit on the total number of open files has been reached. | |
126 | .TP | |
127 | .B ENOMEM | |
128 | There was insufficient memory to create a new anonymous file. | |
129 | .TP | |
130 | .B ENOSYS | |
131 | .BR memfd_secret () | |
2386c2f6 | 132 | is not implemented on this architecture, |
133 | or has not been enabled on the kernel command-line with | |
134 | .BR secretmem.enable =1. | |
ac5edfeb MR |
135 | .SH VERSIONS |
136 | The | |
84a2ce0f | 137 | .BR memfd_secret () |
ac5edfeb | 138 | system call first appeared in Linux 5.14. |
3113c7f3 | 139 | .SH STANDARDS |
ac5edfeb | 140 | The |
84a2ce0f | 141 | .BR memfd_secret () |
ac5edfeb | 142 | system call is Linux-specific. |
afcea05d | 143 | .SH NOTES |
afcea05d MR |
144 | The |
145 | .BR memfd_secret () | |
146 | system call is designed to allow a user-space process | |
147 | to create a range of memory that is inaccessible to anybody else, | |
148 | the kernel included. | |
149 | There is no 100% guarantee that the kernel won't be able to access | |
150 | memory ranges backed by | |
151 | .BR memfd_secret () | |
152 | in any circumstances, but nevertheless, | |
153 | it is much harder to exfiltrate data from these regions. | |
154 | .PP | |
afcea05d MR |
155 | .BR memfd_secret () |
156 | provides the following protections: | |
157 | .IP \(bu 3 | |
158 | Enhanced protection | |
159 | (in conjunction with all the other in-kernel attack prevention systems) | |
160 | against ROP attacks. | |
161 | Absence of any in-kernel primitive for accessing memory backed by | |
162 | .BR memfd_secret () | |
163 | means that a one-gadget ROP attack | |
164 | can't work to perform data exfiltration. | |
165 | The attacker would need to find enough ROP gadgets | |
166 | to reconstruct the missing page table entries, | |
167 | which significantly increases the difficulty of the attack, | |
168 | especially when other protections like the kernel stack size limit | |
169 | and address space layout randomization are in place. | |
170 | .IP \(bu | |
624faf01 | 171 | Prevent cross-process user-space memory exposures. |
afcea05d MR |
172 | Once a region for a |
173 | .BR memfd_secret () | |
174 | memory mapping is allocated, | |
175 | the user can't accidentally pass it into the kernel | |
176 | to be transmitted somewhere. | |
177 | The memory pages in this region cannot be accessed via the direct map | |
178 | and they are disallowed in get_user_pages. | |
179 | .IP \(bu | |
180 | Harden against exploited kernel flaws. | |
181 | In order to access memory areas backed by | |
1ae6b2c7 | 182 | .BR memfd_secret (), |
afcea05d MR |
183 | a kernel-side attack would need to |
184 | either walk the page tables and create new ones, | |
624faf01 | 185 | or spawn a new privileged user-space process to perform |
afcea05d MR |
186 | secrets exfiltration using |
187 | .BR ptrace (2). | |
188 | .PP | |
189 | The way | |
190 | .BR memfd_secret () | |
191 | allocates and locks the memory may impact overall system performance; | |
192 | therefore, the system call is disabled by default and is available only | |
193 | if the system administrator has enabled it using the | |
194 | "secretmem.enable=y" kernel parameter. | |
195 | .PP | |
7843f3ad | 196 | To prevent potential data leaks of memory regions backed by |
1ae6b2c7 | 197 | .BR memfd_secret () |
afcea05d MR |
198 | from a hibernation image, |
199 | hibernation is prevented when there are active | |
200 | .BR memfd_secret () | |
201 | users. | |
ac5edfeb MR |
202 | .SH SEE ALSO |
203 | .BR fcntl (2), | |
204 | .BR ftruncate (2), | |
205 | .BR mlock (2), | |
d5ee9f93 | 206 | .BR memfd_create (2), |
ac5edfeb MR |
207 | .BR mmap (2), |
208 | .BR setrlimit (2) |
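
The sequence described in the page (create the file with syscall(2), size it with ftruncate(2), then attach it with mmap(2)) might be sketched in C as follows. This is a minimal illustration, not a reference implementation. The helper names secret_alloc() and secret_free() are hypothetical and belong to no library. The sketch passes 0 for flags. It assumes that the mapping must be created with MAP_SHARED, which should be verified against the running kernel. It also assumes that the mapping keeps the file alive after the descriptor is closed. If the system call is missing or disabled, the helper fails with errno set (typically ENOSYS).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: map `len` bytes of memory backed by memfd_secret().
 * Returns a pointer to the mapping, or NULL with errno set on failure.
 */
static void *secret_alloc(size_t len)
{
#ifdef SYS_memfd_secret
    /* glibc provides no wrapper, so go through syscall(2); flags is 0 here. */
    int fd = (int) syscall(SYS_memfd_secret, 0U);
    if (fd == -1)
        return NULL;    /* e.g. ENOSYS when unsupported or not enabled */

    /* The file starts with size 0; set its size before mapping it. */
    if (ftruncate(fd, (off_t) len) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return NULL;
    }

    /* Assumption: secret memory has to be mapped with MAP_SHARED. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int saved = errno;

    /* Assumption: the mapping holds its own reference to the file. */
    close(fd);

    if (p == MAP_FAILED) {
        errno = saved;  /* e.g. a failure related to RLIMIT_MEMLOCK */
        return NULL;
    }
    return p;
#else
    /* Headers too old to define the system call number. */
    (void) len;
    errno = ENOSYS;
    return NULL;
#endif
}

/* Hypothetical helper: unmap a region obtained from secret_alloc(). */
static int secret_free(void *p, size_t len)
{
    return munmap(p, len);
}
```

A caller would typically request a page-sized region, for example secret_alloc(4096), copy key material into it with memcpy(3), and call secret_free() when done. Because the kernel cannot access these pages, the buffer should not be handed directly to system calls such as write(2).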