git.ipfire.org Git - thirdparty/man-pages.git/blob - man2/ioctl_userfaultfd.2
ioctl_userfaultfd.2: Update UFFDIO_API description
1 .\" Copyright (c) 2016, IBM Corporation.
2 .\" Written by Mike Rapoport <rppt@linux.vnet.ibm.com>
3 .\" and Copyright (C) 2016 Michael Kerrisk <mtk.manpages@gmail.com>
4 .\"
5 .\" %%%LICENSE_START(VERBATIM)
6 .\" Permission is granted to make and distribute verbatim copies of this
7 .\" manual provided the copyright notice and this permission notice are
8 .\" preserved on all copies.
9 .\"
10 .\" Permission is granted to copy and distribute modified versions of this
11 .\" manual under the conditions for verbatim copying, provided that the
12 .\" entire resulting derived work is distributed under the terms of a
13 .\" permission notice identical to this one.
14 .\"
15 .\" Since the Linux kernel and libraries are constantly changing, this
16 .\" manual page may be incorrect or out-of-date. The author(s) assume no
17 .\" responsibility for errors or omissions, or for damages resulting from
18 .\" the use of the information contained herein. The author(s) may not
19 .\" have taken the same level of care in the production of this manual,
20 .\" which is licensed free of charge, as they might when working
21 .\" professionally.
22 .\"
23 .\" Formatted or processed versions of this manual, if unaccompanied by
24 .\" the source, must acknowledge the copyright and authors of this work.
25 .\" %%%LICENSE_END
26 .\"
27 .\"
28 .TH IOCTL_USERFAULTFD 2 2016-12-12 "Linux" "Linux Programmer's Manual"
29 .SH NAME
ioctl_userfaultfd \- control a userfaultfd object for handling page faults
in user space
32 .SH SYNOPSIS
33 .nf
34 .B #include <sys/ioctl.h>
35
36 .BI "int ioctl(int " fd ", int " cmd ", ...);"
37 .fi
38 .SH DESCRIPTION
39 Various
40 .BR ioctl (2)
41 operations can be performed on a userfaultfd object (created by a call to
42 .BR userfaultfd (2))
43 using calls of the form:
44
45 ioctl(fd, cmd, argp);
46
47 In the above,
48 .I fd
49 is a file descriptor referring to a userfaultfd object,
50 .I cmd
51 is one of the commands listed below, and
52 .I argp
53 is a pointer to a data structure that is specific to
54 .IR cmd .
55
56 The various
57 .BR ioctl (2)
58 operations are described below.
59 The
.BR UFFDIO_API ,
61 .BR UFFDIO_REGISTER ,
62 and
63 .BR UFFDIO_UNREGISTER
64 operations are used to
65 .I configure
66 userfaultfd behavior.
67 These operations allow the caller to choose what features will be enabled and
68 what kinds of events will be delivered to the application.
69 The remaining operations are
70 .IR range
71 operations.
72 These operations enable the calling application to resolve page-fault
73 events.
74 .\"
75 .SS UFFDIO_API
76 (Since Linux 4.3.)
77 Enable operation of the userfaultfd and perform API handshake.
78
79 The
80 .I argp
81 argument is a pointer to a
82 .IR uffdio_api
83 structure, defined as:
.in +4n
.nf

struct uffdio_api {
    __u64 api;        /* Requested API version (input) */
    __u64 features;   /* Requested features (input/output) */
    __u64 ioctls;     /* Available ioctl() operations (output) */
};

.fi
.in
95 The
96 .I api
97 field denotes the API version requested by the application.
98
99 The kernel verifies that it can support the requested API version,
100 and sets the
101 .I features
102 and
103 .I ioctls
104 fields to bit masks representing all the available features and the generic
105 .BR ioctl (2)
106 operations available.
107
For Linux kernel versions before 4.11, the
.I features
field must be initialized to zero before the call to
.BR UFFDIO_API ,
and zero (i.e., no feature bits) is placed in the
.I features
field by the kernel upon return from
.BR ioctl (2).
116
117 Starting from Linux 4.11, the
118 .I features
field can be used to ask whether particular features are supported
and to explicitly enable userfaultfd features that are disabled by default.
121 The kernel always reports all the available features in the
122 .I features
123 field.
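.PP
The handshake described above might be sketched as follows.
This is only an illustration:
the helper names uffd_handshake() and uffd_has_features() are hypothetical
and are not part of any library, and the caller is assumed to pass 0 for
the requested features on kernels before 4.11.
.PP
.in +4n
.nf
```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: perform the UFFDIO_API handshake on fd, asking
   for the feature bits in 'wanted' (pass 0 on kernels before 4.11).
   Returns 0 on success, or -1 with errno set by ioctl(2). */
static int
uffd_handshake(int fd, __u64 wanted, struct uffdio_api *api)
{
    memset(api, 0, sizeof(*api));
    api->api = UFFD_API;        /* Requested API version */
    api->features = wanted;     /* Requested feature bits */

    if (ioctl(fd, UFFDIO_API, api) == -1)
        return -1;

    /* On success, 'features' and 'ioctls' hold the kernel's bit masks. */
    return 0;
}

/* Nonzero if every bit in 'wanted' is set in the returned feature mask. */
static int
uffd_has_features(const struct uffdio_api *api, __u64 wanted)
{
    return (api->features & wanted) == wanted;
}
```
.fi
.in
.PP
A caller would typically invoke uffd_handshake() once, immediately after
.BR userfaultfd (2),
and then test the returned mask before relying on an optional feature.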
124 .\" FIXME add more details about feature negotiation and enablement
125
126 Since Linux 4.11, the following feature bits may be set:
127 .TP
128 .B UFFD_FEATURE_EVENT_FORK
129 .TP
130 .B UFFD_FEATURE_EVENT_REMAP
131 .TP
132 .B UFFD_FEATURE_EVENT_REMOVE
133 .TP
134 .B UFFD_FEATURE_EVENT_UNMAP
135 .TP
136 .B UFFD_FEATURE_MISSING_HUGETLBFS
137 .TP
138 .B UFFD_FEATURE_MISSING_SHMEM
139 .\" FIXME add feature description
140
141 The returned
142 .I ioctls
143 field can contain the following bits:
144 .\" FIXME This user-space API seems not fully polished. Why are there
145 .\" not constants defined for each of the bit-mask values listed below?
146 .TP
147 .B 1 << _UFFDIO_API
148 The
149 .B UFFDIO_API
150 operation is supported.
151 .TP
152 .B 1 << _UFFDIO_REGISTER
153 The
154 .B UFFDIO_REGISTER
155 operation is supported.
156 .TP
157 .B 1 << _UFFDIO_UNREGISTER
158 The
159 .B UFFDIO_UNREGISTER
160 operation is supported.
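.PP
Because no ready-made constants exist for these bit-mask values,
an application has to build them from the operation numbers itself.
A minimal sketch, assuming a hypothetical helper named
uffd_ioctl_supported(), is shown below.
The shift is done in 64-bit arithmetic because _UFFDIO_API is
defined as 0x3F in <linux/userfaultfd.h>, which is too large a
shift count for a plain int.
.PP
.in +4n
.nf
```c
#include <linux/userfaultfd.h>

/* Hypothetical helper: nonzero if the operation with number 'nr'
   (for example, _UFFDIO_REGISTER) is set in the 'ioctls' bit mask. */
static int
uffd_ioctl_supported(__u64 ioctls, unsigned int nr)
{
    return (ioctls & ((__u64)1 << nr)) != 0;
}
```
.fi
.in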
161 .PP
162 This
163 .BR ioctl (2)
164 operation returns 0 on success.
165 On error, \-1 is returned and
166 .I errno
167 is set to indicate the cause of the error.
168 Possible errors include:
169 .TP
170 .B EFAULT
171 .I argp
172 refers to an address that is outside the calling process's
173 accessible address space.
174 .TP
175 .B EINVAL
176 The userfaultfd has already been enabled by a previous
177 .BR UFFDIO_API
178 operation.
179 .TP
180 .B EINVAL
The API version requested in the
.I api
field is not supported by this kernel; or the
.I features
field was not zero (before Linux 4.11) or requested feature bits
that this kernel does not support (since Linux 4.11).
186 .\" FIXME In the above error case, the returned 'uffdio_api' structure is
187 .\" zeroed out. Why is this done? This should be explained in the manual page.
188 .\"
189 .\" Mike Rapoport:
190 .\" In my understanding the uffdio_api
191 .\" structure is zeroed to allow the caller
192 .\" to distinguish the reasons for -EINVAL.
193 .\"
194 .SS UFFDIO_REGISTER
195 (Since Linux 4.3.)
196 Register a memory address range with the userfaultfd object.
197 The pages in the range must be "compatible".
198
Before Linux 4.11,
200 only private anonymous ranges are compatible for registering with
201 .BR UFFDIO_REGISTER .
202
203 Since Linux 4.11,
204 hugetlbfs and shared memory ranges are also compatible with
205 .BR UFFDIO_REGISTER .
206
207 The
208 .I argp
209 argument is a pointer to a
210 .I uffdio_register
211 structure, defined as:
.in +4n
.nf

struct uffdio_range {
    __u64 start;    /* Start of range */
    __u64 len;      /* Length of range (bytes) */
};

struct uffdio_register {
    struct uffdio_range range;
    __u64 mode;     /* Desired mode of operation (input) */
    __u64 ioctls;   /* Available ioctl() operations (output) */
};

.fi
.in
228
229 The
230 .I range
231 field defines a memory range starting at
232 .I start
233 and continuing for
234 .I len
235 bytes that should be handled by the userfaultfd.
236
237 The
238 .I mode
239 field defines the mode of operation desired for this memory region.
240 The following values may be bitwise ORed to set the userfaultfd mode for
241 the specified range:
242 .TP
243 .B UFFDIO_REGISTER_MODE_MISSING
244 Track page faults on missing pages.
245 .TP
246 .B UFFDIO_REGISTER_MODE_WP
247 Track page faults on write-protected pages.
248 .PP
249 Currently, the only supported mode is
250 .BR UFFDIO_REGISTER_MODE_MISSING .
251 .PP
252 If the operation is successful, the kernel modifies the
253 .I ioctls
254 bit-mask field to indicate which
255 .BR ioctl (2)
256 operations are available for the specified range.
257 This returned bit mask is as for
258 .BR UFFDIO_API .
259
260 This
261 .BR ioctl (2)
262 operation returns 0 on success.
263 On error, \-1 is returned and
264 .I errno
265 is set to indicate the cause of the error.
266 Possible errors include:
267 .\" FIXME Is the following error list correct?
268 .\"
269 .TP
270 .B EBUSY
271 A mapping in the specified range is registered with another
272 userfaultfd object.
273 .TP
274 .B EFAULT
275 .I argp
276 refers to an address that is outside the calling process's
277 accessible address space.
278 .TP
279 .B EINVAL
280 An invalid or unsupported bit was specified in the
281 .I mode
282 field; or the
283 .I mode
284 field was zero.
285 .TP
286 .B EINVAL
287 There is no mapping in the specified address range.
288 .TP
289 .B EINVAL
290 .I range.start
291 or
292 .I range.len
293 is not a multiple of the system page size; or,
294 .I range.len
295 is zero; or these fields are otherwise invalid.
296 .TP
297 .B EINVAL
There was an incompatible mapping in the specified address range.
299 .\" Mike Rapoport:
300 .\" ENOMEM if the process is exiting and the
301 .\" mm_struct has gone by the time userfault grabs it.
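.PP
Several of the EINVAL cases above can be caught before making the call.
The following sketch, using a hypothetical helper named
uffd_range_is_page_aligned(), checks only the alignment and zero-length
conditions; it cannot tell whether a compatible mapping exists.
.PP
.in +4n
.nf
```c
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: nonzero if 'start' and 'len' are both multiples
   of the system page size and 'len' is not zero. */
static int
uffd_range_is_page_aligned(uint64_t start, uint64_t len)
{
    uint64_t page = (uint64_t)sysconf(_SC_PAGESIZE);

    return len != 0 && start % page == 0 && len % page == 0;
}
```
.fi
.in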
302 .SS UFFDIO_UNREGISTER
303 (Since Linux 4.3.)
304 Unregister a memory address range from userfaultfd.
The pages in the range must be "compatible" (see the description of
.BR UFFDIO_REGISTER ).
307
308 The address range to unregister is specified in the
309 .IR uffdio_range
310 structure pointed to by
311 .IR argp .
312
313 This
314 .BR ioctl (2)
315 operation returns 0 on success.
316 On error, \-1 is returned and
317 .I errno
318 is set to indicate the cause of the error.
319 Possible errors include:
320 .TP
321 .B EINVAL
322 Either the
323 .I start
324 or the
325 .I len
326 field of the
.I uffdio_range
328 structure was not a multiple of the system page size; or the
329 .I len
330 field was zero; or these fields were otherwise invalid.
331 .TP
332 .B EINVAL
There was an incompatible mapping in the specified address range.
334 .TP
335 .B EINVAL
336 There was no mapping in the specified address range.
337 .\"
338 .SS UFFDIO_COPY
339 (Since Linux 4.3.)
Atomically copy a contiguous memory chunk into the userfault-registered
range and optionally wake up the blocked thread.
342 The source and destination addresses and the number of bytes to copy are
343 specified by the
344 .IR src ", " dst ", and " len
345 fields of the
346 .I uffdio_copy
347 structure pointed to by
348 .IR argp :
349
.in +4n
.nf
struct uffdio_copy {
    __u64 dst;    /* Destination of copy */
    __u64 src;    /* Source of copy */
    __u64 len;    /* Number of bytes to copy */
    __u64 mode;   /* Flags controlling behavior of copy */
    __s64 copy;   /* Number of bytes copied, or negated error */
};
.fi
.in
361 .PP
362 The following value may be bitwise ORed in
363 .IR mode
364 to change the behavior of the
365 .B UFFDIO_COPY
366 operation:
367 .TP
368 .B UFFDIO_COPY_MODE_DONTWAKE
Do not wake up the thread that waits for page-fault resolution.
370 .PP
371 The
372 .I copy
373 field is used by the kernel to return the number of bytes
374 that was actually copied, or an error (a negated
375 .IR errno -style
376 value).
377 .\" FIXME Above: Why is the 'copy' field used to return error values?
378 .\" This should be explained in the manual page.
379 If the value returned in
380 .I copy
381 doesn't match the value that was specified in
382 .IR len ,
383 the operation fails with the error
384 .BR EAGAIN .
385 The
386 .I copy
387 field is output-only;
388 it is not read by the
389 .B UFFDIO_COPY
390 operation.
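.PP
The interplay between the return value, EAGAIN, and the
.I copy
field might be handled as in the following sketch.
The helper name uffd_copy() and its return convention are assumptions
made for illustration only.
.PP
.in +4n
.nf
```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/* Hypothetical helper: copy 'len' bytes from 'src' into the registered
   range at 'dst'.  If 'dontwake' is nonzero, the faulting thread is not
   woken up.  Returns the number of bytes copied (which is less than
   'len' if the operation failed with EAGAIN), or -1 on any other error,
   with errno set by ioctl(2). */
static long long
uffd_copy(int fd, void *dst, const void *src, size_t len, int dontwake)
{
    struct uffdio_copy cp;

    memset(&cp, 0, sizeof(cp));
    cp.dst = (__u64)(uintptr_t)dst;
    cp.src = (__u64)(uintptr_t)src;
    cp.len = len;
    cp.mode = dontwake ? UFFDIO_COPY_MODE_DONTWAKE : 0;

    if (ioctl(fd, UFFDIO_COPY, &cp) == -1) {
        if (errno == EAGAIN)
            return (long long)cp.copy;  /* Partial copy */
        return -1;
    }

    return (long long)cp.copy;          /* Equals 'len' on success */
}
```
.fi
.in
.PP
After a partial copy, a caller could retry the remainder by advancing
both addresses by the returned byte count.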
391
392 This
393 .BR ioctl (2)
394 operation returns 0 on success.
395 In this case, the entire area was copied.
396 On error, \-1 is returned and
397 .I errno
398 is set to indicate the cause of the error.
399 Possible errors include:
400 .TP
401 .B EAGAIN
402 The number of bytes copied (i.e., the value returned in the
403 .I copy
404 field)
405 does not equal the value that was specified in the
406 .I len
407 field.
408 .TP
409 .B EINVAL
410 Either
411 .I dst
412 or
413 .I len
414 was not a multiple of the system page size, or the range specified by
415 .IR src
416 and
417 .IR len
418 or
419 .IR dst
420 and
421 .IR len
422 was invalid.
423 .TP
424 .B EINVAL
425 An invalid bit was specified in the
426 .IR mode
427 field.
428 .\"
429 .SS UFFDIO_ZEROPAGE
430 (Since Linux 4.3.)
431 Zero out a memory range registered with userfaultfd.
432
433 The requested range is specified by the
434 .I range
435 field of the
436 .I uffdio_zeropage
437 structure pointed to by
438 .IR argp :
439
.in +4n
.nf
struct uffdio_zeropage {
    struct uffdio_range range;
    __u64 mode;       /* Flags controlling behavior of zeroing */
    __s64 zeropage;   /* Number of bytes zeroed, or negated error */
};
.fi
.in
449 .PP
450 The following value may be bitwise ORed in
451 .IR mode
452 to change the behavior of the
.B UFFDIO_ZEROPAGE
454 operation:
455 .TP
456 .B UFFDIO_ZEROPAGE_MODE_DONTWAKE
457 Do not wake up the thread that waits for page-fault resolution.
458 .PP
459 The
460 .I zeropage
461 field is used by the kernel to return the number of bytes
462 that was actually zeroed,
463 or an error in the same manner as
464 .BR UFFDIO_COPY .
465 .\" FIXME Why is the 'zeropage' field used to return error values?
466 .\" This should be explained in the manual page.
467 If the value returned in the
468 .I zeropage
469 field doesn't match the value that was specified in
470 .IR range.len ,
471 the operation fails with the error
472 .BR EAGAIN .
473 The
474 .I zeropage
475 field is output-only;
476 it is not read by the
.B UFFDIO_ZEROPAGE
478 operation.
479
480 This
481 .BR ioctl (2)
482 operation returns 0 on success.
483 In this case, the entire area was zeroed.
484 On error, \-1 is returned and
485 .I errno
486 is set to indicate the cause of the error.
487 Possible errors include:
488 .TP
489 .B EAGAIN
490 The number of bytes zeroed (i.e., the value returned in the
491 .I zeropage
492 field)
493 does not equal the value that was specified in the
494 .I range.len
495 field.
496 .TP
497 .B EINVAL
498 Either
499 .I range.start
500 or
501 .I range.len
502 was not a multiple of the system page size; or
503 .I range.len
504 was zero; or the range specified was invalid.
505 .TP
506 .B EINVAL
507 An invalid bit was specified in the
508 .IR mode
509 field.
510 .\"
511 .SS UFFDIO_WAKE
512 (Since Linux 4.3.)
513 Wake up the thread waiting for page-fault resolution on
514 a specified memory address range.
515
516 The
517 .B UFFDIO_WAKE
518 operation is used in conjunction with
519 .BR UFFDIO_COPY
520 and
521 .BR UFFDIO_ZEROPAGE
522 operations that have the
523 .BR UFFDIO_COPY_MODE_DONTWAKE
524 or
525 .BR UFFDIO_ZEROPAGE_MODE_DONTWAKE
526 bit set in the
527 .I mode
528 field.
529 The userfault monitor can perform several
530 .BR UFFDIO_COPY
531 and
532 .BR UFFDIO_ZEROPAGE
533 operations in a batch and then explicitly wake up the faulting thread using
534 .BR UFFDIO_WAKE .
535
536 The
537 .I argp
538 argument is a pointer to a
539 .I uffdio_range
540 structure (shown above) that specifies the address range.
541
542 This
543 .BR ioctl (2)
544 operation returns 0 on success.
545 On error, \-1 is returned and
546 .I errno
547 is set to indicate the cause of the error.
548 Possible errors include:
549 .TP
550 .B EINVAL
551 The
552 .I start
553 or the
554 .I len
555 field of the
.I uffdio_range
557 structure was not a multiple of the system page size; or
558 .I len
559 was zero; or the specified range was otherwise invalid.
560 .SH RETURN VALUE
561 See descriptions of the individual operations, above.
562 .SH ERRORS
563 See descriptions of the individual operations, above.
564 In addition, the following general errors can occur for all of the
565 operations described above:
566 .TP
567 .B EFAULT
568 .I argp
569 does not point to a valid memory address.
570 .TP
571 .B EINVAL
572 (For all operations except
573 .BR UFFDIO_API .)
574 The userfaultfd object has not yet been enabled (via the
575 .BR UFFDIO_API
576 operation).
577 .SH CONFORMING TO
578 These
579 .BR ioctl (2)
580 operations are Linux-specific.
581 .SH EXAMPLE
582 See
583 .BR userfaultfd (2).
584 .SH SEE ALSO
585 .BR ioctl (2),
586 .BR mmap (2),
587 .BR userfaultfd (2)
588
589 .IR Documentation/vm/userfaultfd.txt
590 in the Linux kernel source tree
591