.\" SPDX-License-Identifier: Linux-man-pages-copyleft
.\"
.\"
-.TH IOCTL_USERFAULTFD 2 2021-03-22 "Linux" "Linux Programmer's Manual"
+.TH ioctl_userfaultfd 2 (date) "Linux man-pages (unreleased)"
.SH NAME
ioctl_userfaultfd \- create a file descriptor for handling page faults in user
space
+.SH LIBRARY
+Standard C library
+.RI ( libc ", " \-lc )
.SH SYNOPSIS
.nf
.BR "#include <linux/userfaultfd.h>" " /* Definition of " UFFD* " constants */"
.BR UFFDIO_API ,
.BR UFFDIO_REGISTER ,
and
-.BR UFFDIO_UNREGISTER
+.B UFFDIO_UNREGISTER
operations are used to
.I configure
userfaultfd behavior.
These operations allow the caller to choose what features will be enabled and
what kinds of events will be delivered to the application.
The remaining operations are
-.IR range
+.I range
operations.
These operations enable the calling application to resolve page-fault
events.
The
.I argp
argument is a pointer to a
-.IR uffdio_api
+.I uffdio_api
structure, defined as:
.PP
.in +4n
If this feature bit is set,
.I uffd_msg.pagefault.feat.ptid
will be set to the faulted thread ID for each page-fault message.
+.TP
+.BR UFFD_FEATURE_MINOR_HUGETLBFS " (since Linux 5.13)"
+If this feature bit is set,
+the kernel supports registering userfaultfd ranges
+in minor mode on hugetlbfs-backed memory areas.
+.TP
+.BR UFFD_FEATURE_MINOR_SHMEM " (since Linux 5.14)"
+If this feature bit is set,
+the kernel supports registering userfaultfd ranges
+in minor mode on shmem-backed memory areas.
.PP
The returned
.I ioctls
The
.B UFFDIO_UNREGISTER
operation is supported.
-.TP
-.B 1 << _UFFDIO_WRITEPROTECT
-The
-.B UFFDIO_WRITEPROTECT
-operation is supported.
.PP
This
.BR ioctl (2)
.TP
.B EINVAL
The userfaultfd has already been enabled by a previous
-.BR UFFDIO_API
+.B UFFDIO_API
operation.
.TP
.B EINVAL
(Since Linux 4.3.)
Register a memory address range with the userfaultfd object.
The pages in the range must be "compatible".
-.PP
-Up to Linux kernel 4.11,
-only private anonymous ranges are compatible for registering with
-.BR UFFDIO_REGISTER .
-.PP
-Since Linux 4.11,
-hugetlbfs and shared memory ranges are also compatible with
-.BR UFFDIO_REGISTER .
+Please refer to the list of register modes below
+for the compatible memory backends for each mode.
.PP
The
.I argp
.TP
.B UFFDIO_REGISTER_MODE_MISSING
Track page faults on missing pages.
+Since Linux 4.3,
+only private anonymous ranges are compatible.
+Since Linux 4.11,
+hugetlbfs and shared memory ranges are also compatible.
.TP
.B UFFDIO_REGISTER_MODE_WP
Track page faults on write-protected pages.
+Since Linux 5.7,
+only private anonymous ranges are compatible.
+.TP
+.B UFFDIO_REGISTER_MODE_MINOR
+Track minor page faults.
+Since Linux 5.13,
+only hugetlbfs ranges are compatible.
+Since Linux 5.14,
+compatibility with shmem ranges was added.
.PP
If the operation is successful, the kernel modifies the
.I ioctls
bit-mask field to indicate which
.BR ioctl (2)
operations are available for the specified range.
-This returned bit mask is as for
-.BR UFFDIO_API .
+This returned bit mask can contain the following bits:
+.TP
+.B 1 << _UFFDIO_COPY
+The
+.B UFFDIO_COPY
+operation is supported.
+.TP
+.B 1 << _UFFDIO_WAKE
+The
+.B UFFDIO_WAKE
+operation is supported.
+.TP
+.B 1 << _UFFDIO_WRITEPROTECT
+The
+.B UFFDIO_WRITEPROTECT
+operation is supported.
+.TP
+.B 1 << _UFFDIO_ZEROPAGE
+The
+.B UFFDIO_ZEROPAGE
+operation is supported.
+.TP
+.B 1 << _UFFDIO_CONTINUE
+The
+.B UFFDIO_CONTINUE
+operation is supported.
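As a rough illustration of how a caller might inspect the returned ioctls bit mask, the following minimal sketch tests individual operation bits. The EX_ constants and the range_supports() helper are hypothetical names introduced only for this example. The bit numbers are assumed to mirror the _UFFDIO_* values in <linux/userfaultfd.h>, and real code should include that header rather than redefine them.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit numbers assumed to mirror the _UFFDIO_* values in
 * <linux/userfaultfd.h>. They are redefined under hypothetical EX_
 * names so that this sketch compiles without kernel headers. */
enum {
    EX_UFFDIO_WAKE         = 0x02,
    EX_UFFDIO_COPY         = 0x03,
    EX_UFFDIO_ZEROPAGE     = 0x04,
    EX_UFFDIO_WRITEPROTECT = 0x06,
    EX_UFFDIO_CONTINUE     = 0x07
};

/* Return true if the operation identified by 'bit' is set in the
 * 'ioctls' mask that UFFDIO_REGISTER filled in. */
static bool range_supports(uint64_t ioctls, unsigned int bit)
{
    return bit < 64 && ((ioctls >> bit) & 1u) != 0;
}
```

A monitor registering a range in minor mode could, for instance, call range_supports(reg.ioctls, EX_UFFDIO_CONTINUE) after a successful UFFDIO_REGISTER and fall back to another strategy if the bit is clear. Here reg stands for the caller's uffdio_register structure.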
.PP
This
.BR ioctl (2)
.BR UFFDIO_REGISTER .)
.PP
The address range to unregister is specified in the
-.IR uffdio_range
+.I uffdio_range
structure pointed to by
.IR argp .
.PP
.in
.PP
The following value may be bitwise ORed in
-.IR mode
+.I mode
to change the behavior of the
.B UFFDIO_COPY
operation:
or
.I len
was not a multiple of the system page size, or the range specified by
-.IR src
+.I src
and
-.IR len
+.I len
or
-.IR dst
+.I dst
and
-.IR len
+.I len
was invalid.
.TP
.B EINVAL
An invalid bit was specified in the
-.IR mode
+.I mode
field.
.TP
.BR ENOENT " (since Linux 4.11)"
.in
.PP
The following value may be bitwise ORed in
-.IR mode
+.I mode
to change the behavior of the
.B UFFDIO_ZEROPAGE
operation:
.TP
.B EINVAL
An invalid bit was specified in the
-.IR mode
+.I mode
field.
.TP
.BR ESRCH " (since Linux 4.13)"
The
.B UFFDIO_WAKE
operation is used in conjunction with
-.BR UFFDIO_COPY
+.B UFFDIO_COPY
and
-.BR UFFDIO_ZEROPAGE
+.B UFFDIO_ZEROPAGE
operations that have the
-.BR UFFDIO_COPY_MODE_DONTWAKE
+.B UFFDIO_COPY_MODE_DONTWAKE
or
-.BR UFFDIO_ZEROPAGE_MODE_DONTWAKE
+.B UFFDIO_ZEROPAGE_MODE_DONTWAKE
bit set in the
.I mode
field.
The userfault monitor can perform several
-.BR UFFDIO_COPY
+.B UFFDIO_COPY
and
-.BR UFFDIO_ZEROPAGE
+.B UFFDIO_ZEROPAGE
operations in a batch and then explicitly wake up the faulting thread using
.BR UFFDIO_WAKE .
.PP
.TP
.B EFAULT
Encountered a generic fault during processing.
+.\"
+.SS UFFDIO_CONTINUE
+(Since Linux 5.13.)
+Resolve a minor page fault
+by installing page table entries
+for existing pages in the page cache.
+.PP
+The
+.I argp
+argument is a pointer to a
+.I uffdio_continue
+structure as shown below:
+.PP
+.in +4n
+.EX
+struct uffdio_continue {
+ struct uffdio_range range;
+ /* Range to install PTEs for and continue */
+ __u64 mode; /* Flags controlling the behavior of continue */
+ __s64 mapped; /* Number of bytes mapped, or negated error */
+};
+.EE
+.in
+.PP
+The following value may be bitwise ORed in
+.I mode
+to change the behavior of the
+.B UFFDIO_CONTINUE
+operation:
+.TP
+.B UFFDIO_CONTINUE_MODE_DONTWAKE
+Do not wake up the thread that waits for page-fault resolution.
+.PP
+The
+.I mapped
+field is used by the kernel
+to return the number of bytes that were actually mapped,
+or an error in the same manner as
+.BR UFFDIO_COPY .
+If the value returned in the
+.I mapped
+field doesn't match the value that was specified in
+.IR range.len ,
+the operation fails with the error
+.BR EAGAIN .
+The
+.I mapped
+field is output-only;
+it is not read by the
+.B UFFDIO_CONTINUE
+operation.
+.PP
+This
+.BR ioctl (2)
+operation returns 0 on success.
+In this case,
+the entire area was mapped.
+On error, \-1 is returned and
+.I errno
+is set to indicate the error.
+Possible errors include:
+.TP
+.B EAGAIN
+The number of bytes mapped
+(i.e., the value returned in the
+.I mapped
+field)
+does not equal the value that was specified in the
+.I range.len
+field.
+.TP
+.B EINVAL
+Either
+.I range.start
+or
+.I range.len
+was not a multiple of the system page size; or
+.I range.len
+was zero; or the range specified was invalid.
+.TP
+.B EINVAL
+An invalid bit was specified in the
+.I mode
+field.
+.TP
+.B EEXIST
+One or more pages were already mapped in the given range.
+.TP
+.B ENOENT
+The faulting process has changed its virtual memory layout simultaneously with
+an outstanding
+.B UFFDIO_CONTINUE
+operation.
+.TP
+.B ENOMEM
+Allocating memory needed to set up the page table mappings failed.
+.TP
+.B EFAULT
+No existing page could be found in the page cache for the given range.
+.TP
+.B ESRCH
+The faulting process has exited at the time of a
+.B UFFDIO_CONTINUE
+operation.
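The alignment conditions listed under EINVAL above can be checked in user space before issuing UFFDIO_CONTINUE. The sketch below is one possible pre-check, not the kernel's own validation. The helper name continue_range_is_valid() is hypothetical, and the check covers only the page-size alignment, the nonzero length, and address wrap-around. The kernel may reject a range for other reasons.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical pre-check for the range passed to UFFDIO_CONTINUE.
 * 'page_size' must be a power of two; 'start' and 'len' must both be
 * multiples of it, 'len' must be nonzero, and the range must not wrap
 * past the end of the address space. */
static bool continue_range_is_valid(uint64_t start, uint64_t len,
                                    uint64_t page_size)
{
    if (page_size == 0 || (page_size & (page_size - 1)) != 0)
        return false;
    if (len == 0)
        return false;
    if (((start | len) & (page_size - 1)) != 0)
        return false;
    return start + len > start;    /* reject wrap-around */
}
```

The page size would typically come from sysconf(_SC_PAGESIZE). A false result corresponds to a case in which the operation would be expected to fail with EINVAL.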
+.\"
.SH RETURN VALUE
See descriptions of the individual operations, above.
.SH ERRORS
(For all operations except
.BR UFFDIO_API .)
The userfaultfd object has not yet been enabled (via the
-.BR UFFDIO_API
+.B UFFDIO_API
operation).
-.SH CONFORMING TO
+.SH STANDARDS
These
.BR ioctl (2)
operations are Linux-specific.
In order to detect available userfault features and
enable some subset of those features
the userfaultfd file descriptor must be closed after the first
-.BR UFFDIO_API
+.B UFFDIO_API
operation that queries features availability and reopened before
the second
-.BR UFFDIO_API
+.B UFFDIO_API
operation that actually enables the desired features.
.SH EXAMPLES
See
.BR mmap (2),
.BR userfaultfd (2)
.PP
-.IR Documentation/admin\-guide/mm/userfaultfd.rst
+.I Documentation/admin\-guide/mm/userfaultfd.rst
in the Linux kernel source tree