.\" Copyright (c) 2021, IBM Corporation.
.\" Written by Mike Rapoport <rppt@linux.ibm.com>
.\"
.\" Based on memfd_create(2) man page
.\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (C) 2014 David Herrmann <dh.herrmann@gmail.com>
.\"
.\" SPDX-License-Identifier: GPL-2.0-or-later
.\"
.TH memfd_secret 2 (date) "Linux man-pages (unreleased)"
.SH NAME
memfd_secret \- create an anonymous RAM-based file
to access secret memory regions
.SH LIBRARY
Standard C library
.RI ( libc ", " \-lc )
.SH SYNOPSIS
.nf
.P
.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
.B #include <unistd.h>
.P
.BI "int syscall(SYS_memfd_secret, unsigned int " flags );
.fi
.P
.IR Note :
glibc provides no wrapper for
.BR memfd_secret (),
necessitating the use of
.BR syscall (2).
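.P
A minimal sketch of such a call follows.
The helper name
.I memfd_secret_syscall
is hypothetical and not part of glibc or any other library.
The sketch assumes that the installed kernel headers define
.BR SYS_memfd_secret ;
when they do not, the helper fails with
.B ENOSYS
at run time instead of breaking the build.
```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>  /* SYS_* constants, if the headers define them */
#include <unistd.h>       /* syscall(2), close(2) */

/*
 * Hypothetical helper: glibc has no memfd_secret() wrapper, so
 * invoke the raw system call.  If the installed headers predate
 * memfd_secret(), fail with ENOSYS at run time rather than
 * failing to compile.
 */
static int
memfd_secret_syscall(unsigned int flags)
{
#ifdef SYS_memfd_secret
	return (int) syscall(SYS_memfd_secret, flags);
#else
	(void) flags;
	errno = ENOSYS;
	return -1;
#endif
}
```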
.SH DESCRIPTION
.BR memfd_secret ()
creates an anonymous RAM-based file and returns a file descriptor
that refers to it.
The file provides a way to create and access memory regions
with stronger protection than usual RAM-based files and
anonymous memory mappings.
Once all open references to the file are closed,
it is automatically released.
The initial size of the file is set to 0.
Following the call, the file size should be set using
.BR ftruncate (2).
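.P
A sketch of creating the file, sizing it with
.BR ftruncate (2),
and mapping it with
.BR mmap (2)
follows.
The helper names are hypothetical.
The sketch assumes that the installed headers define
.BR SYS_memfd_secret ,
and it uses
.B MAP_SHARED
because, in the kernel's secretmem implementation,
private mappings of the file are rejected with
.BR EINVAL .
```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* O_CLOEXEC */
#include <stddef.h>       /* size_t, NULL */
#include <sys/mman.h>     /* mmap(2), munmap(2) */
#include <sys/syscall.h>
#include <unistd.h>       /* syscall(2), ftruncate(2), close(2) */

/* Hypothetical helper: raw memfd_secret() call, ENOSYS if unknown. */
static int
memfd_secret_syscall(unsigned int flags)
{
#ifdef SYS_memfd_secret
	return (int) syscall(SYS_memfd_secret, flags);
#else
	(void) flags;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Hypothetical helper: create a secret memory file of `len` bytes
 * and map it read/write.  The file starts with size 0, so it is
 * sized with ftruncate(2) before mmap(2).  Returns MAP_FAILED with
 * errno set on failure.
 */
static void *
map_secret_region(size_t len)
{
	int fd = memfd_secret_syscall(O_CLOEXEC);
	if (fd == -1)
		return MAP_FAILED;

	if (ftruncate(fd, (off_t) len) == -1) {
		int saved = errno;
		close(fd);
		errno = saved;
		return MAP_FAILED;
	}

	/* Secret memory mappings must be shared, not private. */
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	int saved = errno;
	close(fd);            /* the mapping keeps its own file reference */
	errno = saved;
	return p;
}
```
Closing the descriptor right after
.BR mmap (2)
is a design choice of this sketch:
the mapping holds its own reference to the file,
so the memory stays available until
.BR munmap (2).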
.P
The memory areas backing the file created with
.BR memfd_secret ()
are visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables
and only the page tables of the processes holding the file descriptor
map the corresponding physical memory.
(Thus, the pages in the region can't be accessed by the kernel itself,
so that, for example, pointers to the region can't be passed to
system calls.)
.P
The following values may be bitwise ORed in
.I flags
to control the behavior of
.BR memfd_secret ():
.TP
.B O_CLOEXEC
Set the close-on-exec flag on the new file descriptor,
so that the descriptor, and with it access to the region,
is not passed on to a program started with
.BR execve (2).
See the description of the
.B O_CLOEXEC
flag in
.BR open (2).
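.P
The value that the kernel checks for in
.I flags
is
.BR O_CLOEXEC ,
as with
.BR open (2);
.B FD_CLOEXEC
is the descriptor flag that
.BR fcntl (2)
.B F_GETFD
reports afterward.
The two constants have different values,
so passing
.B FD_CLOEXEC
in
.I flags
is expected to fail with
.BR EINVAL .
The illustrative sketch below checks this; its helper names are hypothetical.
```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* O_CLOEXEC, F_GETFD, FD_CLOEXEC */
#include <sys/syscall.h>
#include <unistd.h>       /* syscall(2), close(2) */

/* Hypothetical helper: raw memfd_secret() call, ENOSYS if unknown. */
static int
memfd_secret_syscall(unsigned int flags)
{
#ifdef SYS_memfd_secret
	return (int) syscall(SYS_memfd_secret, flags);
#else
	(void) flags;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Hypothetical helper: returns 1 if the close-on-exec descriptor
 * flag (FD_CLOEXEC) is set on fd, 0 if it is not, -1 on error.
 */
static int
has_cloexec(int fd)
{
	int fdflags = fcntl(fd, F_GETFD);

	if (fdflags == -1)
		return -1;
	return (fdflags & FD_CLOEXEC) != 0;
}
```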
.P
As its return value,
.BR memfd_secret ()
returns a new file descriptor that refers to an anonymous file.
This file descriptor is opened for both reading and writing
.RB ( O_RDWR )
and
.B O_LARGEFILE
is set for the file descriptor.
.P
With respect to
.BR fork (2)
and
.BR execve (2),
the usual semantics apply for the file descriptor created by
.BR memfd_secret ().
A copy of the file descriptor is inherited by the child produced by
.BR fork (2)
and refers to the same file.
The file descriptor is preserved across
.BR execve (2),
unless the close-on-exec flag has been set.
.P
The memory region is locked into memory in the same way as with
.BR mlock (2),
so that it will never be written into swap,
and hibernation is inhibited for as long as any open file descriptions created by
.BR memfd_secret ()
exist.
However, the implementation of
.BR memfd_secret ()
will not try to populate the whole range during the
.BR mmap (2)
call that attaches the region to the process's address space;
instead, the pages are only actually allocated
as they are faulted in.
The amount of memory allowed for memory mappings
of the file descriptor obeys the same rules as
.BR mlock (2)
and cannot exceed
.BR RLIMIT_MEMLOCK .
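.P
Because the mappings count against
.BR RLIMIT_MEMLOCK ,
a caller may want to compare a planned size with the limit reported by
.BR getrlimit (2)
before calling
.BR mmap (2).
The hypothetical helper below does this against the soft limit;
it assumes that the caller tracks how much memory it has already locked,
and it ignores the exemption for processes with
.BR CAP_IPC_LOCK .
```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/resource.h>  /* getrlimit(2), RLIMIT_MEMLOCK, RLIM_INFINITY */

/*
 * Hypothetical helper: returns 1 if locking `len` more bytes keeps
 * the caller within the RLIMIT_MEMLOCK soft limit, 0 if it would
 * exceed it, and -1 if the limit cannot be read.  `already_locked`
 * is what the caller has locked so far (mlock(2) regions and earlier
 * memfd_secret() mappings); tracking it is left to the caller.
 * The kernel counts whole pages, so page-rounded sizes give the most
 * faithful answer.  CAP_IPC_LOCK, which lifts the limit, is ignored.
 */
static int
fits_memlock_limit(size_t already_locked, size_t len)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
		return -1;
	if (rl.rlim_cur == RLIM_INFINITY)
		return 1;
	if (already_locked > rl.rlim_cur)
		return 0;
	return len <= rl.rlim_cur - already_locked;
}
```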
.SH RETURN VALUE
On success,
.BR memfd_secret ()
returns a new file descriptor.
On error, \-1 is returned and
.I errno
is set to indicate the error.
.SH ERRORS
.TP
.B EINVAL
.I flags
included unknown bits.
.TP
.B EMFILE
The per-process limit on the number of open file descriptors has been reached.
.TP
.B ENFILE
The system-wide limit on the total number of open files has been reached.
.TP
.B ENOMEM
There was insufficient memory to create a new anonymous file.
.TP
.B ENOSYS
.BR memfd_secret ()
is not implemented on this architecture,
or has not been enabled on the kernel command line with
.BR secretmem.enable=y .
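.P
Since
.B ENOSYS
covers both an architecture without support
and a kernel on which the call has not been enabled,
a program can probe for support once, at startup.
A minimal sketch follows; the helper names are hypothetical,
and the sketch assumes the installed headers define
.BR SYS_memfd_secret .
```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>        /* O_CLOEXEC */
#include <sys/syscall.h>
#include <unistd.h>       /* syscall(2), close(2) */

/* Hypothetical helper: raw memfd_secret() call, ENOSYS if unknown. */
static int
memfd_secret_syscall(unsigned int flags)
{
#ifdef SYS_memfd_secret
	return (int) syscall(SYS_memfd_secret, flags);
#else
	(void) flags;
	errno = ENOSYS;
	return -1;
#endif
}

/*
 * Hypothetical helper: returns 1 if memfd_secret() works here,
 * 0 on ENOSYS (not implemented on this architecture, not enabled,
 * or unknown to the installed headers), and -1 with errno set for
 * any other error (for example, EMFILE, or whatever errno a seccomp
 * filter chooses to return for a blocked call).
 */
static int
secretmem_probe(void)
{
	int fd = memfd_secret_syscall(O_CLOEXEC);

	if (fd >= 0) {
		close(fd);
		return 1;
	}
	return errno == ENOSYS ? 0 : -1;
}
```
A program that gets 0 or \-1 might fall back to
.BR mlock (2)-ed
anonymous memory, accepting the weaker protection that implies.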
.SH STANDARDS
Linux.
.SH HISTORY
Linux 5.14.
.SH NOTES
The
.BR memfd_secret ()
system call is designed to allow a user-space process
to create a range of memory that is inaccessible to anybody else,
kernel included.
There is no absolute guarantee that the kernel will never be able to access
memory ranges backed by
.BR memfd_secret (),
but it is much harder to exfiltrate data from these regions.
.P
.BR memfd_secret ()
provides the following protections:
.IP \[bu] 3
Enhanced protection
(in conjunction with all the other in-kernel attack prevention systems)
against ROP attacks.
The absence of any in-kernel primitive for accessing memory backed by
.BR memfd_secret ()
means that a one-gadget ROP attack
can't work to perform data exfiltration.
The attacker would need to find enough ROP gadgets
to reconstruct the missing page table entries,
which significantly increases the difficulty of the attack,
especially when other protections like the kernel stack size limit
and address space layout randomization are in place.
.IP \[bu]
Prevent cross-process user-space memory exposures.
Once a region for a
.BR memfd_secret ()
memory mapping is allocated,
the user can't accidentally pass it into the kernel
to be transmitted somewhere.
The memory pages in this region cannot be accessed via the direct map,
and the kernel's internal get_user_pages() interface refuses to access them.
.IP \[bu]
Harden against exploited kernel flaws.
In order to access memory areas backed by
.BR memfd_secret (),
a kernel-side attack would need to
either walk the page tables and create new ones,
or spawn a new privileged user-space process
to exfiltrate the secrets using
.BR ptrace (2).
.P
The way
.BR memfd_secret ()
allocates and locks the memory may impact overall system performance;
therefore, the system call is disabled by default and is only available
if the system administrator has turned it on using the
.B secretmem.enable=y
kernel parameter.
.P
To prevent potential data leaks of memory regions backed by
.BR memfd_secret ()
from a hibernation image,
hibernation is prevented when there are active
.BR memfd_secret ()
users.
.SH SEE ALSO
.BR fcntl (2),
.BR ftruncate (2),
.BR memfd_create (2),
.BR mlock (2),
.BR mmap (2),
.BR setrlimit (2)