.\" Copyright (c) 2021, IBM Corporation.
.\" Written by Mike Rapoport <rppt@linux.ibm.com>
.\"
.\" Based on memfd_create(2) man page
.\" Copyright (C) 2014 Michael Kerrisk <mtk.manpages@gmail.com>
.\" and Copyright (C) 2014 David Herrmann <dh.herrmann@gmail.com>
.\"
.\" SPDX-License-Identifier: GPL-2.0-or-later
.\"
.TH memfd_secret 2 (date) "Linux man-pages (unreleased)"
.SH NAME
memfd_secret \- create an anonymous RAM-based file
to access secret memory regions
.SH LIBRARY
Standard C library
.RI ( libc ", " \-lc )
.SH SYNOPSIS
.nf
.P
.BR "#include <sys/syscall.h>" " /* Definition of " SYS_* " constants */"
.B #include <unistd.h>
.P
.BI "int syscall(SYS_memfd_secret, unsigned int " flags );
.fi
.P
.IR Note :
glibc provides no wrapper for
.BR memfd_secret (),
necessitating the use of
.BR syscall (2).
.SH DESCRIPTION
.BR memfd_secret ()
creates an anonymous RAM-based file and returns a file descriptor
that refers to it.
The file provides a way to create and access memory regions
with stronger protection than usual RAM-based files and
anonymous memory mappings.
Once all open references to the file are closed,
it is automatically released.
The initial size of the file is set to 0.
Following the call, the file size should be set using
.BR ftruncate (2).
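.P
The create, size, and map sequence described above can be sketched as follows.
This is a minimal illustration, not a reference implementation:
secret_alloc() is a hypothetical helper name,
the fallback definition of SYS_memfd_secret (447) is an assumption
for systems whose headers predate the system call,
and the mapping is created with MAP_SHARED.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447    /* Assumption: fallback for older headers */
#endif

/*
 * secret_alloc() is a hypothetical helper, not a libc function.
 * It returns a mapping of len bytes backed by memfd_secret(),
 * or NULL with errno set if any step fails.
 */
static void *
secret_alloc(size_t len)
{
    int   fd, saved;
    void  *p;

    /* Step 1: create the anonymous file; its initial size is 0. */
    fd = (int) syscall(SYS_memfd_secret, 0U);
    if (fd == -1)
        return NULL;

    /* Step 2: set the file size, then map the file. */
    p = MAP_FAILED;
    if (ftruncate(fd, (off_t) len) == 0)
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

    /* The mapping holds its own reference to the file,
       so the descriptor can be closed here. */
    saved = errno;
    close(fd);
    errno = saved;

    return (p == MAP_FAILED) ? NULL : p;
}
```

A caller would release the region with munmap(p, len) when it is no longer needed.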
.P
The memory areas backing the file created with
.BR memfd_secret ()
are visible only to the processes that have access to the file descriptor.
The memory region is removed from the kernel page tables
and only the page tables of the processes holding the file descriptor
map the corresponding physical memory.
(Thus, the pages in the region can't be accessed by the kernel itself,
so that, for example, pointers to the region can't be passed to
system calls.)
.P
The following values may be bitwise ORed in
.I flags
to control the behavior of
.BR memfd_secret ():
.TP
.B FD_CLOEXEC
Set the close-on-exec flag on the new file descriptor,
which causes the region to be removed from the process on
.BR execve (2).
See the description of the
.B O_CLOEXEC
flag in
.BR open (2).
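.P
As an illustration of the close-on-exec flag itself,
the sketch below creates the descriptor with no flags and then sets
FD_CLOEXEC on it with fcntl(2), reading it back with F_GETFD.
Unlike passing a flag at creation time, this two-step approach is not atomic.
The helper names set_cloexec(), has_cloexec(), and secret_fd_cloexec()
are hypothetical, and the fallback value 447 for SYS_memfd_secret
is an assumption for older headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447    /* Assumption: fallback for older headers */
#endif

/* Hypothetical helper: set close-on-exec on fd.
   Returns 0 on success, -1 on error. */
static int
set_cloexec(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);

    if (fdflags == -1)
        return -1;
    return fcntl(fd, F_SETFD, fdflags | FD_CLOEXEC);
}

/* Hypothetical helper: returns 1 if close-on-exec is set on fd,
   0 if it is clear, -1 on error. */
static int
has_cloexec(int fd)
{
    int fdflags = fcntl(fd, F_GETFD);

    return (fdflags == -1) ? -1 : (fdflags & FD_CLOEXEC) != 0;
}

/* Create a secret-memory descriptor with flags 0,
   then mark it close-on-exec.  Returns the fd, or -1 with errno set. */
static int
secret_fd_cloexec(void)
{
    int fd = (int) syscall(SYS_memfd_secret, 0U);

    if (fd == -1)
        return -1;
    if (set_cloexec(fd) == -1) {
        int saved = errno;

        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```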
.P
As its return value,
.BR memfd_secret ()
returns a new file descriptor that refers to an anonymous file.
This file descriptor is opened for both reading and writing
.RB ( O_RDWR )
and
.B O_LARGEFILE
is set for the file descriptor.
.P
With respect to
.BR fork (2)
and
.BR execve (2),
the usual semantics apply for the file descriptor created by
.BR memfd_secret ().
A copy of the file descriptor is inherited by the child produced by
.BR fork (2)
and refers to the same file.
The file descriptor is preserved across
.BR execve (2),
unless the close-on-exec flag has been set.
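.P
The inheritance across fork(2) can be illustrated with a small sketch:
the child maps the file through its inherited copy of the descriptor
and stores one byte; the parent then maps the same file and reads it back.
The function name share_byte_with_child() and the 4096-byte size
are hypothetical choices for the example,
and the fallback value 447 for SYS_memfd_secret is an assumption
for older headers.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447    /* Assumption: fallback for older headers */
#endif

/*
 * Hypothetical demonstration helper.
 * Returns the byte read back by the parent (0..255),
 * or -1 if any step fails.
 */
static int
share_byte_with_child(unsigned char value)
{
    const size_t   len = 4096;  /* Arbitrary size for the sketch */
    int            fd, status, result = -1;
    pid_t          pid;
    unsigned char  *p;

    fd = (int) syscall(SYS_memfd_secret, 0U);
    if (fd == -1)
        return -1;
    if (ftruncate(fd, (off_t) len) == -1)
        goto out;

    pid = fork();
    if (pid == -1)
        goto out;

    if (pid == 0) {
        /* Child: the inherited fd refers to the same file. */
        p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED)
            _exit(1);
        p[0] = value;
        _exit(0);
    }

    /* Parent: wait for the child, then read through its own mapping. */
    if (waitpid(pid, &status, 0) == -1
        || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        goto out;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        goto out;
    result = p[0];
    munmap(p, len);

out:
    close(fd);
    return result;
}
```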
.P
The memory region is locked into memory in the same way as with
.BR mlock (2),
so that it will never be written into swap,
and hibernation is inhibited for as long as any
.BR memfd_secret ()
file descriptions exist.
However, the implementation of
.BR memfd_secret ()
will not try to populate the whole range during the
.BR mmap (2)
call that attaches the region into the process's address space;
instead, the pages are only actually allocated
as they are faulted in.
The amount of memory allowed for memory mappings
of the file descriptor obeys the same rules as
.BR mlock (2)
and cannot exceed
.BR RLIMIT_MEMLOCK .
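.P
Because the mapping is bounded by RLIMIT_MEMLOCK,
a program may want to compare a planned size against that limit
before calling mmap(2).
The sketch below does so with getrlimit(2).
It is a rough pre-check only: fits_memlock_limit() is a hypothetical helper,
it looks at the soft limit alone,
and it does not account for memory the process has already locked.

```c
#include <stddef.h>
#include <sys/resource.h>

/*
 * Hypothetical helper: returns 1 if a single request of len bytes
 * is within the soft RLIMIT_MEMLOCK limit, 0 if it is not,
 * and -1 if the limit cannot be read.
 */
static int
fits_memlock_limit(size_t len)
{
    struct rlimit  rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) == -1)
        return -1;

    /* No limit configured: any size passes this check. */
    if (rl.rlim_cur == RLIM_INFINITY)
        return 1;

    return (rlim_t) len <= rl.rlim_cur;
}
```

If the check fails, the caller can request a smaller region
or ask the administrator to raise the limit (see setrlimit(2)).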
.SH RETURN VALUE
On success,
.BR memfd_secret ()
returns a new file descriptor.
On error, \-1 is returned and
.I errno
is set to indicate the error.
.SH ERRORS
.TP
.B EINVAL
.I flags
included unknown bits.
.TP
.B EMFILE
The per-process limit on the number of open file descriptors has been reached.
.TP
.B ENFILE
The system-wide limit on the total number of open files has been reached.
.TP
.B ENOMEM
There was insufficient memory to create a new anonymous file.
.TP
.B ENOSYS
.BR memfd_secret ()
is not implemented on this architecture,
or has not been enabled on the kernel command-line with
.BR secretmem.enable =1.
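.P
One way a caller might turn these error codes into diagnostics
is sketched below.
The function memfd_secret_hint() and its message strings are hypothetical;
error codes not listed above fall back to strerror(3).

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: map an errno value from a failed
 * memfd_secret() call to a short explanation.
 */
static const char *
memfd_secret_hint(int err)
{
    switch (err) {
    case EINVAL:
        return "flags included unknown bits";
    case EMFILE:
        return "per-process limit on open file descriptors reached";
    case ENFILE:
        return "system-wide limit on open files reached";
    case ENOMEM:
        return "insufficient memory to create the anonymous file";
    case ENOSYS:
        return "not implemented on this architecture, "
               "or not enabled (see secretmem.enable)";
    default:
        return strerror(err);
    }
}
```

A typical call site would test the return value of syscall(2) for \-1
and then print memfd_secret_hint(errno).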
.SH STANDARDS
Linux.
.SH HISTORY
Linux 5.14.
.SH NOTES
The
.BR memfd_secret ()
system call is designed to allow a user-space process
to create a range of memory that is inaccessible to anybody else,
the kernel included.
There is no 100% guarantee that the kernel won't be able to access
memory ranges backed by
.BR memfd_secret ()
under any circumstances, but nevertheless,
it is much harder to exfiltrate data from these regions.
.P
.BR memfd_secret ()
provides the following protections:
.IP \[bu] 3
Enhanced protection
(in conjunction with all the other in-kernel attack prevention systems)
against ROP attacks.
The absence of any in-kernel primitive for accessing memory backed by
.BR memfd_secret ()
means that a one-gadget ROP attack
can't work to perform data exfiltration.
The attacker would need to find enough ROP gadgets
to reconstruct the missing page table entries,
which significantly increases the difficulty of the attack,
especially when other protections like the kernel stack size limit
and address space layout randomization are in place.
.IP \[bu]
Prevent cross-process user-space memory exposures.
Once a region for a
.BR memfd_secret ()
memory mapping is allocated,
the user can't accidentally pass it into the kernel
to be transmitted somewhere.
The memory pages in this region cannot be accessed via the direct map
and they are disallowed in get_user_pages.
.IP \[bu]
Harden against exploited kernel flaws.
In order to access memory areas backed by
.BR memfd_secret (),
a kernel-side attack would need to
either walk the page tables and create new ones,
or spawn a new privileged user-space process to perform
secrets exfiltration using
.BR ptrace (2).
.P
The way
.BR memfd_secret ()
allocates and locks the memory may impact overall system performance;
therefore, the system call is disabled by default and is available only
if the system administrator turned it on using the
"secretmem.enable=y" kernel parameter.
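.P
Since the system call may be disabled, a program can probe for it at startup
and fall back to ordinary memory if it is unavailable.
The sketch below is one possible probe;
memfd_secret_available() is a hypothetical helper,
and the fallback value 447 for SYS_memfd_secret is an assumption
for older headers.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef SYS_memfd_secret
#define SYS_memfd_secret 447    /* Assumption: fallback for older headers */
#endif

/*
 * Hypothetical helper: returns 1 if memfd_secret() succeeds,
 * 0 if the kernel reports ENOSYS (unsupported or not enabled),
 * and -1 on any other error (errno is left set).
 */
static int
memfd_secret_available(void)
{
    int fd = (int) syscall(SYS_memfd_secret, 0U);

    if (fd != -1) {
        close(fd);      /* Only probing; release the file again. */
        return 1;
    }
    return (errno == ENOSYS) ? 0 : -1;
}
```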
.P
To prevent potential data leaks of memory regions backed by
.BR memfd_secret ()
from a hibernation image,
hibernation is prevented when there are active
.BR memfd_secret ()
users.
.SH SEE ALSO
.BR fcntl (2),
.BR ftruncate (2),
.BR memfd_create (2),
.BR mlock (2),
.BR mmap (2),
.BR setrlimit (2)