.\" Copyright (c) 2016, IBM Corporation.
.\" Written by Mike Rapoport <rppt@linux.vnet.ibm.com>
.\" and Copyright (C) 2016 Michael Kerrisk <mtk.manpages@gmail.com>
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.TH IOCTL_USERFAULTFD 2 2016-12-12 "Linux" "Linux Programmer's Manual"
.SH NAME
ioctl_userfaultfd \- create a file descriptor for handling page faults in user
space
.SH SYNOPSIS
.nf
.B #include <sys/ioctl.h>
.sp
.BI "int ioctl(int " fd ", int " cmd ", ...);"
.fi
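.SH DESCRIPTION
Various
.BR ioctl (2)
operations can be performed on a userfaultfd object (created by a call to
.BR userfaultfd (2))
using calls of the form:
.PP
.in +4n
.nf
ioctl(fd, cmd, argp);
.fi
.in
.PP
In the above,
.I fd
is a file descriptor referring to a userfaultfd object,
.I cmd
is one of the commands listed below, and
.I argp
is a pointer to a data structure that is specific to
.IR cmd .
.PP
The various
.BR ioctl (2)
operations are described below.
The
.BR UFFDIO_API ,
.BR UFFDIO_REGISTER ,
and
.B UFFDIO_UNREGISTER
operations are used to
.I configure
userfaultfd behavior.
These operations allow the caller to choose what features will be enabled and
what kinds of events will be delivered to the application.
The remaining operations are
.I range
operations.
These operations enable the calling application to resolve page-fault
events in a consistent way.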
.\"
.SS UFFDIO_API
Enable operation of the userfaultfd and perform API handshake.
The
.I argp
argument is a pointer to a
.I uffdio_api
structure, defined as:
.PP
.in +4n
.nf
struct uffdio_api {
    __u64 api;        /* Requested API version (input) */
    __u64 features;   /* Requested features (input/output) */
    __u64 ioctls;     /* Available ioctl() operations (output) */
};
.fi
.in
.PP
The
.I api
field denotes the API version requested by the application.
.PP
The kernel verifies that it can support the requested API version,
and sets the
.I features
and
.I ioctls
fields to bit masks representing all the available features and the generic
.BR ioctl (2)
operations available.
.PP
For Linux kernel versions before 4.11, the
.I features
field must be initialized to zero before the call to
.BR UFFDIO_API ,
and zero (i.e., no feature bits) is placed in the
.I features
field by the kernel upon return from
.BR ioctl (2).
.PP
Starting from Linux 4.11, the
.I features
field can be used to ask whether particular features are supported
and explicitly enable userfaultfd features that are disabled by default.
The kernel always reports all the available features in the
.I features
field.
.\" FIXME add more details about feature negotiation and enablement
.PP
Since Linux 4.11, the following feature bits may be set:
.TP
.B UFFD_FEATURE_EVENT_FORK
.TP
.B UFFD_FEATURE_EVENT_REMAP
.TP
.B UFFD_FEATURE_EVENT_REMOVE
.TP
.B UFFD_FEATURE_EVENT_UNMAP
.TP
.B UFFD_FEATURE_MISSING_HUGETLBFS
.TP
.B UFFD_FEATURE_MISSING_SHMEM
.\" FIXME add feature description
.PP
The returned
.I ioctls
field can contain the following bits:
.\" FIXME This user-space API seems not fully polished. Why are there
.\" not constants defined for each of the bit-mask values listed below?
.TP
.B 1 << _UFFDIO_API
The
.B UFFDIO_API
operation is supported.
.TP
.B 1 << _UFFDIO_REGISTER
The
.B UFFDIO_REGISTER
operation is supported.
.TP
.B 1 << _UFFDIO_UNREGISTER
The
.B UFFDIO_UNREGISTER
operation is supported.
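.PP
Because the bit-mask values above are built from bit numbers rather than named masks, an application has to construct each mask itself. The following is a minimal sketch of such a check. The helper name uffd_has_ioctl is hypothetical and is not part of any kernel or C library API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: report whether the bit for ioctl number nr
 * (for example _UFFDIO_REGISTER) is set in a returned ioctls mask.
 */
static bool uffd_has_ioctl(uint64_t ioctls, unsigned int nr)
{
    assert(nr < 64);    /* the mask is a 64-bit field */
    return (ioctls & (UINT64_C(1) << nr)) != 0;
}
```

.PP
A caller would typically pass the ioctls value returned by the kernel, for instance uffd_has_ioctl(api.ioctls, _UFFDIO_REGISTER), where api is assumed to be the structure filled in by the handshake.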
.PP
This
.BR ioctl (2)
operation returns 0 on success.
On error, \-1 is returned and
.I errno
is set to indicate the cause of the error.
Possible errors include:
.TP
.B EFAULT
.I argp
refers to an address that is outside the calling process's
accessible address space.
.TP
.B EINVAL
The userfaultfd has already been enabled by a previous
.B UFFDIO_API
operation.
.TP
.B EINVAL
The API version requested in the
.I api
field is not supported by this kernel, or the
.I features
field passed an unsupported value (before Linux 4.11, any nonzero value).
.\" FIXME In the above error case, the returned 'uffdio_api' structure is
.\" zeroed out. Why is this done? This should be explained in the manual page.
.\"
.\" In my understanding the uffdio_api
.\" structure is zeroed to allow the caller
.\" to distinguish the reasons for -EINVAL.
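.PP
The handshake described above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the function name uffd_open_and_enable is hypothetical, the userfaultfd object is created with a raw system call on the assumption that no C library wrapper is available, and no optional features are requested.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: create a userfaultfd object and perform the
 * UFFDIO_API handshake. Returns the file descriptor on success, or
 * -1 with errno set. On success the masks reported by the kernel
 * are stored in *features_out and *ioctls_out.
 */
static int uffd_open_and_enable(uint64_t *features_out, uint64_t *ioctls_out)
{
    struct uffdio_api api;
    int fd, saved_errno;

    assert(features_out != NULL && ioctls_out != NULL);

    /* Assumption: no libc wrapper, so invoke the system call directly. */
    fd = (int) syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
    if (fd == -1)
        return -1;

    memset(&api, 0, sizeof(api));
    api.api = UFFD_API;     /* requested API version */
    api.features = 0;       /* request no optional features */

    if (ioctl(fd, UFFDIO_API, &api) == -1) {
        saved_errno = errno;    /* keep the ioctl() error across close() */
        close(fd);
        errno = saved_errno;
        return -1;
    }

    *features_out = api.features;
    *ioctls_out = api.ioctls;
    return fd;
}
```

.PP
Zeroing the structure before the call keeps the features field at zero, which is what kernels before Linux 4.11 require.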
.\"
.SS UFFDIO_REGISTER
Register a memory address range with the userfaultfd object.
The pages in the range must be "compatible".
.PP
Before Linux 4.11,
only private anonymous ranges are compatible for registering with
.BR UFFDIO_REGISTER .
.PP
Since Linux 4.11,
hugetlbfs and shared memory ranges are also compatible with
.BR UFFDIO_REGISTER .
.PP
The
.I argp
argument is a pointer to a
.I uffdio_register
structure, defined as:
.PP
.in +4n
.nf
struct uffdio_range {
    __u64 start;    /* Start of range */
    __u64 len;      /* Length of range (bytes) */
};

struct uffdio_register {
    struct uffdio_range range;
    __u64 mode;     /* Desired mode of operation (input) */
    __u64 ioctls;   /* Available ioctl() operations (output) */
};
.fi
.in
.PP
The
.I range
field defines a memory range starting at
.I start
and continuing for
.I len
bytes that should be handled by the userfaultfd.
.PP
The
.I mode
field defines the mode of operation desired for this memory region.
The following values may be bitwise ORed to set the userfaultfd mode for
the specified range:
.TP
.B UFFDIO_REGISTER_MODE_MISSING
Track page faults on missing pages.
.TP
.B UFFDIO_REGISTER_MODE_WP
Track page faults on write-protected pages.
.PP
Currently, the only supported mode is
.BR UFFDIO_REGISTER_MODE_MISSING .
.PP
If the operation is successful, the kernel modifies the
.I ioctls
bit-mask field to indicate which
.BR ioctl (2)
operations are available for the specified range.
This returned bit mask is as for
.BR UFFDIO_API .
.PP
This
.BR ioctl (2)
operation returns 0 on success.
On error, \-1 is returned and
.I errno
is set to indicate the cause of the error.
Possible errors include:
.\" FIXME Is the following error list correct?
.TP
.B EBUSY
A mapping in the specified range is registered with another
userfaultfd object.
.TP
.B EFAULT
.I argp
refers to an address that is outside the calling process's
accessible address space.
.TP
.B EINVAL
An invalid or unsupported bit was specified in the
.I mode
field; or the
.I mode
field was zero.
.TP
.B EINVAL
There is no mapping in the specified address range.
.TP
.B EINVAL
.I range.start
or
.I range.len
is not a multiple of the system page size; or
.I range.len
is zero; or these fields are otherwise invalid.
.TP
.B EINVAL
There is an incompatible mapping in the specified address range.
.\" ENOMEM if the process is exiting and the
.\" mm_struct has gone by the time userfault grabs it.
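.PP
The registration step and the alignment rules from the error list above can be sketched as follows. The helper names range_is_valid and uffd_register_missing are hypothetical, and the page size is passed in by the caller (for example from sysconf(_SC_PAGESIZE)) rather than queried here.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* True if len is nonzero and both values are multiples of page_size. */
static bool range_is_valid(uint64_t start, uint64_t len, uint64_t page_size)
{
    assert(page_size > 0);
    return len != 0 && start % page_size == 0 && len % page_size == 0;
}

/*
 * Hypothetical helper: register [start, start + len) for missing-page
 * tracking. Returns 0 on success, or -1 with errno set. If ioctls_out
 * is not NULL, it receives the per-range ioctls mask from the kernel.
 */
static int uffd_register_missing(int uffd, uint64_t start, uint64_t len,
                                 uint64_t page_size, uint64_t *ioctls_out)
{
    struct uffdio_register reg;

    /* Reject ranges the kernel would refuse with EINVAL anyway. */
    if (!range_is_valid(start, len, page_size)) {
        errno = EINVAL;
        return -1;
    }

    memset(&reg, 0, sizeof(reg));
    reg.range.start = start;
    reg.range.len = len;
    reg.mode = UFFDIO_REGISTER_MODE_MISSING;

    if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1)
        return -1;

    if (ioctls_out != NULL)
        *ioctls_out = reg.ioctls;
    return 0;
}
```

.PP
A caller holding a pointer would pass it as (uint64_t) (uintptr_t) addr. Checking the range in user space first is a design choice of this sketch; it only saves a system call for input that is already known to be invalid.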
.\"
.SS UFFDIO_UNREGISTER
Unregister a memory address range from userfaultfd.
The pages in the range must be "compatible" (see the description of
.BR UFFDIO_REGISTER ).
.PP
The address range to unregister is specified in the
.I uffdio_range
structure pointed to by
.IR argp .
.PP
This
.BR ioctl (2)
operation returns 0 on success.
On error, \-1 is returned and
.I errno
is set to indicate the cause of the error.
Possible errors include:
.TP
.B EINVAL
Either the
.I start
or the
.I len
field of the
.I uffdio_range
structure was not a multiple of the system page size; or the
.I len
field was zero; or these fields were otherwise invalid.
.TP
.B EINVAL
There was an incompatible mapping in the specified address range.
.TP
.B EINVAL
There was no mapping in the specified address range.
.\"
.SS UFFDIO_COPY
Atomically copy a contiguous memory chunk into the userfault-registered
range and optionally wake up the blocked thread.
The source and destination addresses and the number of bytes to copy are
specified by the
.IR src ", " dst ", and " len
fields of the
.I uffdio_copy
structure pointed to by
.IR argp :
.PP
.in +4n
.nf
struct uffdio_copy {
    __u64 dst;    /* Destination of copy */
    __u64 src;    /* Source of copy */
    __u64 len;    /* Number of bytes to copy */
    __u64 mode;   /* Flags controlling behavior of copy */
    __s64 copy;   /* Number of bytes copied, or negated error */
};
.fi
.in
.PP
The following value may be bitwise ORed in
.I mode
to change the behavior of the
.B UFFDIO_COPY
operation:
.TP
.B UFFDIO_COPY_MODE_DONTWAKE
Do not wake up the thread that waits for page-fault resolution.
.PP
The
.I copy
field is used by the kernel to return the number of bytes
that was actually copied, or an error (a negated
.IR errno -style
value).
.\" FIXME Above: Why is the 'copy' field used to return error values?
.\" This should be explained in the manual page.
If the value returned in
.I copy
doesn't match the value that was specified in
.IR len ,
the operation fails with the error
.BR EAGAIN .
The
.I copy
field is output-only;
it is not read by the
.B UFFDIO_COPY
operation.
.PP
This
.BR ioctl (2)
operation returns 0 on success.
In this case, the entire area was copied.
On error, \-1 is returned and
.I errno
is set to indicate the cause of the error.
Possible errors include:
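.TP
.B EAGAIN
The number of bytes copied (i.e., the value returned in the
.I copy
field)
does not equal the value that was specified in the
.I len
field.
.TP
.B EINVAL
Either
.I dst
or
.I len
was not a multiple of the system page size, or the range specified by
.I src
and
.I len
or
.I dst
and
.I len
was invalid.
.TP
.B EINVAL
An invalid bit was specified in the
.I mode
field.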
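.PP
The way the copy field and errno interact can be sketched as follows. The helper name uffd_copy and its return convention (bytes copied, or a negated errno value) are assumptions made for this illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: copy len bytes from src to dst inside a
 * registered range. Returns the number of bytes copied (len on full
 * success, less after a partial copy), or a negated errno value.
 */
static int64_t uffd_copy(int uffd, uint64_t dst, uint64_t src,
                         uint64_t len, int dontwake)
{
    struct uffdio_copy c;

    assert(len <= INT64_MAX);   /* the result must fit in the return type */

    memset(&c, 0, sizeof(c));
    c.dst = dst;
    c.src = src;
    c.len = len;
    c.mode = dontwake ? UFFDIO_COPY_MODE_DONTWAKE : 0;

    if (ioctl(uffd, UFFDIO_COPY, &c) == 0)
        return (int64_t) len;   /* the entire area was copied */

    /* EAGAIN with a positive count means only part was copied. */
    if (errno == EAGAIN && c.copy > 0)
        return (int64_t) c.copy;

    return -(int64_t) errno;
}
```

.PP
After a partial copy, a caller would normally retry with dst and src advanced by the returned count and len reduced by the same amount.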
.\"
.SS UFFDIO_ZEROPAGE
Zero out a memory range registered with userfaultfd.
.PP
The requested range is specified by the
.I range
field of the
.I uffdio_zeropage
structure pointed to by
.IR argp :
.PP
.in +4n
.nf
struct uffdio_zeropage {
    struct uffdio_range range;
    __u64 mode;     /* Flags controlling behavior of zeroing */
    __s64 zeropage; /* Number of bytes zeroed, or negated error */
};
.fi
.in
.PP
The following value may be bitwise ORed in
.I mode
to change the behavior of the
.B UFFDIO_ZEROPAGE
operation:
.TP
.B UFFDIO_ZEROPAGE_MODE_DONTWAKE
Do not wake up the thread that waits for page-fault resolution.
.PP
The
.I zeropage
field is used by the kernel to return the number of bytes
that was actually zeroed,
or an error in the same manner as
.BR UFFDIO_COPY .
.\" FIXME Why is the 'zeropage' field used to return error values?
.\" This should be explained in the manual page.
If the value returned in the
.I zeropage
field doesn't match the value that was specified in
.IR range.len ,
the operation fails with the error
.BR EAGAIN .
The
.I zeropage
field is output-only;
it is not read by the
.B UFFDIO_ZEROPAGE
operation.
.PP
This
.BR ioctl (2)
operation returns 0 on success.
In this case, the entire area was zeroed.
On error, \-1 is returned and
.I errno
is set to indicate the cause of the error.
Possible errors include:
.TP
.B EAGAIN
The number of bytes zeroed (i.e., the value returned in the
.I zeropage
field)
does not equal the value that was specified in the
.I range.len
field.
.TP
.B EINVAL
Either
.I range.start
or
.I range.len
was not a multiple of the system page size; or
.I range.len
was zero; or the range specified was invalid.
.TP
.B EINVAL
An invalid bit was specified in the
.I mode
field.
.\"
.SS UFFDIO_WAKE
Wake up the thread waiting for page-fault resolution on
a specified memory address range.
.PP
The
.B UFFDIO_WAKE
operation is used in conjunction with
.B UFFDIO_COPY
and
.B UFFDIO_ZEROPAGE
operations that have the
.B UFFDIO_COPY_MODE_DONTWAKE
or
.B UFFDIO_ZEROPAGE_MODE_DONTWAKE
bit set in the
.I mode
field.
The userfault monitor can perform several
.B UFFDIO_COPY
and
.B UFFDIO_ZEROPAGE
operations in a batch and then explicitly wake up the faulting thread using
.BR UFFDIO_WAKE .
.PP
The
.I argp
argument is a pointer to a
.I uffdio_range
structure (shown above) that specifies the address range.
.PP
This
.BR ioctl (2)
operation returns 0 on success.
On error, \-1 is returned and
.I errno
is set to indicate the cause of the error.
Possible errors include:
.TP
.B EINVAL
The
.I start
or the
.I len
field of the
.I uffdio_range
structure was not a multiple of the system page size; or
.I len
was zero; or the specified range was otherwise invalid.
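.PP
The batching pattern described above, several operations issued without a wakeup followed by one explicit wakeup, can be sketched as follows. The helper name uffd_zero_batch_then_wake is hypothetical, and zero-filling one page per call is only one possible way to form a batch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: zero-fill [start, start + len) one page at a
 * time without waking the faulting thread, then wake it once for the
 * whole range. Returns 0 on success, or -1 with errno set.
 */
static int uffd_zero_batch_then_wake(int uffd, uint64_t start,
                                     uint64_t len, uint64_t page_size)
{
    struct uffdio_range wake;
    uint64_t off;

    assert(page_size > 0);

    for (off = 0; off < len; off += page_size) {
        struct uffdio_zeropage z;

        memset(&z, 0, sizeof(z));
        z.range.start = start + off;
        z.range.len = page_size;
        z.mode = UFFDIO_ZEROPAGE_MODE_DONTWAKE;  /* defer the wakeup */

        if (ioctl(uffd, UFFDIO_ZEROPAGE, &z) == -1)
            return -1;
    }

    /* One explicit wakeup covering everything resolved above. */
    memset(&wake, 0, sizeof(wake));
    wake.start = start;
    wake.len = len;
    return ioctl(uffd, UFFDIO_WAKE, &wake);
}
```

.PP
This sketch stops at the first failing operation and does not issue the wakeup in that case; a real monitor would have to decide how to handle pages that were already resolved.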
.SH RETURN VALUE
See descriptions of the individual operations, above.
.SH ERRORS
See descriptions of the individual operations, above.
In addition, the following general errors can occur for all of the
operations described above:
.TP
.B EFAULT
.I argp
does not point to a valid memory address.
.TP
.B EINVAL
(For all operations except
.BR UFFDIO_API .)
The userfaultfd object has not yet been enabled (via the
.B UFFDIO_API
operation).
.SH CONFORMING TO
These
.BR ioctl (2)
operations are Linux-specific.
.SH SEE ALSO
.BR ioctl (2),
.BR userfaultfd (2)
.PP
.I Documentation/vm/userfaultfd.txt
in the Linux kernel source tree