-.\"
-.\" epoll by Davide Libenzi ( efficient event notification retrieval )
.\" Copyright (C) 2003 Davide Libenzi
+.\" Davide Libenzi <davidel@xmailserver.org>
+.\" and Copyright 2009, 2014, 2016, 2018, 2019 Michael Kerrisk <tk.manpages@gmail.com>
.\"
+.\" %%%LICENSE_START(GPLv2+_SW_3_PARA)
.\" This program is free software; you can redistribute it and/or modify
.\" it under the terms of the GNU General Public License as published by
.\" the Free Software Foundation; either version 2 of the License, or
.\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
.\" GNU General Public License for more details.
.\"
-.\" You should have received a copy of the GNU General Public License
-.\" along with this program; if not, write to the Free Software
-.\" Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
-.\"
-.\" Davide Libenzi <davidel@xmailserver.org>
+.\" You should have received a copy of the GNU General Public
+.\" License along with this manual; if not, see
+.\" <http://www.gnu.org/licenses/>.
+.\" %%%LICENSE_END
.\"
-.TH EPOLL_CTL 2 2009-01-17 "Linux" "Linux Programmer's Manual"
+.TH EPOLL_CTL 2 2020-04-11 "Linux" "Linux Programmer's Manual"
.SH NAME
-epoll_ctl \- control interface for an epoll descriptor
+epoll_ctl \- control interface for an epoll file descriptor
.SH SYNOPSIS
.B #include <sys/epoll.h>
-.sp
+.PP
.BI "int epoll_ctl(int " epfd ", int " op ", int " fd \
", struct epoll_event *" event );
.SH DESCRIPTION
-This system call performs control operations on the epoll instance
+This system call is used to add, modify, or remove
+entries in the interest list of the
+.BR epoll (7)
+instance
referred to by the file descriptor
.IR epfd .
It requests that the operation
.I op
be performed for the target file descriptor,
.IR fd .
-
+.PP
Valid values for the
.I op
-argument are :
+argument are:
.TP
.B EPOLL_CTL_ADD
-Register the target file descriptor
-.I fd
-on the
-.B epoll
-instance referred to by the file descriptor
-.I epfd
-and associate the event
-.I event
-with the internal file linked to
-.IR fd .
+Add an entry to the interest list of the epoll file descriptor,
+.IR epfd .
+The entry includes the file descriptor,
+.IR fd ,
+a reference to the corresponding open file description (see
+.BR epoll (7)
+and
+.BR open (2)),
+and the settings specified in
+.IR event .
.TP
.B EPOLL_CTL_MOD
-Change the event
-.I event
-associated with the target file descriptor
-.IR fd .
+Change the settings associated with
+.IR fd
+in the interest list to the new settings specified in
+.IR event .
.TP
.B EPOLL_CTL_DEL
Remove (deregister) the target file descriptor
.I fd
-from the
-.B epoll
-instance referred to by
-.IR epfd .
+from the interest list.
The
.I event
-is ignored and can be NULL (but see BUGS below).
+argument is ignored and can be NULL (but see BUGS below).
.PP
The
.I event
.IR fd .
The
.I struct epoll_event
-is defined as :
-.sp
+is defined as:
+.PP
.in +4n
-.nf
+.EX
typedef union epoll_data {
void *ptr;
int fd;
- __uint32_t u32;
- __uint64_t u64;
+ uint32_t u32;
+ uint64_t u64;
} epoll_data_t;
struct epoll_event {
- __uint32_t events; /* Epoll events */
+ uint32_t events; /* Epoll events */
epoll_data_t data; /* User data variable */
};
-.fi
+.EE
.in
-
+.PP
+The
+.I data
+member of the
+.I epoll_event
+structure specifies data that the kernel should save and then return (via
+.BR epoll_wait (2))
+when this file descriptor becomes ready.
+.PP
The
.I events
-member is a bit set composed using the following available event
-types:
+member of the
+.I epoll_event
+structure is a bit mask composed by ORing together zero or more of
+the following available event types:
.TP
.B EPOLLIN
The associated file is available for
Stream socket peer closed connection,
or shut down writing half of connection.
(This flag is especially useful for writing simple code to detect
-peer shutdown when using Edge Triggered monitoring.)
+peer shutdown when using edge-triggered monitoring.)
.TP
.B EPOLLPRI
-There is urgent data available for
-.BR read (2)
-operations.
+There is an exceptional condition on the file descriptor.
+See the discussion of
+.B POLLPRI
+in
+.BR poll (2).
.TP
.B EPOLLERR
Error condition happened on the associated file descriptor.
+This event is also reported for the write end of a pipe when the read end
+has been closed.
+.IP
.BR epoll_wait (2)
-will always wait for this event; it is not necessary to set it in
-.IR events .
+will always report for this event; it is not necessary to set it in
+.IR events
+when calling
+.BR epoll_ctl ().
.TP
.B EPOLLHUP
Hang up happened on the associated file descriptor.
+.IP
.BR epoll_wait (2)
will always wait for this event; it is not necessary to set it in
-.IR events .
+.IR events
+when calling
+.BR epoll_ctl ().
+.IP
+Note that when reading from a channel such as a pipe or a stream socket,
+this event merely indicates that the peer closed its end of the channel.
+Subsequent reads from the channel will return 0 (end of file)
+only after all outstanding data in the channel has been consumed.
.TP
.B EPOLLET
-Sets the Edge Triggered behavior for the associated file descriptor.
+Requests edge-triggered notification for the associated file descriptor.
The default behavior for
.B epoll
-is Level Triggered.
+is level-triggered.
See
.BR epoll (7)
-for more detailed information about Edge and Level Triggered event
-distribution architectures.
+for more detailed information about edge-triggered and
+level-triggered notification.
+.IP
+This flag is an input flag for the
+.I event.events
+field when calling
+.BR epoll_ctl ();
+it is never returned by
+.BR epoll_wait (2).
.TP
.BR EPOLLONESHOT " (since Linux 2.6.2)"
-Sets the one-shot behavior for the associated file descriptor.
-This means that after an event is pulled out with
-.BR epoll_wait (2)
-the associated file descriptor is internally disabled and no other events
+Requests one-shot notification for the associated file descriptor.
+This means that after an event notified for the file descriptor by
+.BR epoll_wait (2),
+the file descriptor is disabled in the interest list and no other events
will be reported by the
.B epoll
interface.
.BR epoll_ctl ()
with
.B EPOLL_CTL_MOD
-to re-arm the file descriptor with a new event mask.
-.SH "RETURN VALUE"
+to rearm the file descriptor with a new event mask.
+.IP
+This flag is an input flag for the
+.I event.events
+field when calling
+.BR epoll_ctl ();
+it is never returned by
+.BR epoll_wait (2).
+.TP
+.BR EPOLLWAKEUP " (since Linux 3.5)"
+.\" commit 4d7e30d98939a0340022ccd49325a3d70f7e0238
+If
+.B EPOLLONESHOT
+and
+.B EPOLLET
+are clear and the process has the
+.B CAP_BLOCK_SUSPEND
+capability,
+ensure that the system does not enter "suspend" or
+"hibernate" while this event is pending or being processed.
+The event is considered as being "processed" from the time
+when it is returned by a call to
+.BR epoll_wait (2)
+until the next call to
+.BR epoll_wait (2)
+on the same
+.BR epoll (7)
+file descriptor,
+the closure of that file descriptor,
+the removal of the event file descriptor with
+.BR EPOLL_CTL_DEL ,
+or the clearing of
+.B EPOLLWAKEUP
+for the event file descriptor with
+.BR EPOLL_CTL_MOD .
+See also BUGS.
+.IP
+This flag is an input flag for the
+.I event.events
+field when calling
+.BR epoll_ctl ();
+it is never returned by
+.BR epoll_wait (2).
+.TP
+.BR EPOLLEXCLUSIVE " (since Linux 4.5)"
+Sets an exclusive wakeup mode for the epoll file descriptor that is being
+attached to the target file descriptor,
+.IR fd .
+When a wakeup event occurs and multiple epoll file descriptors
+are attached to the same target file using
+.BR EPOLLEXCLUSIVE ,
+one or more of the epoll file descriptors will receive an event with
+.BR epoll_wait (2).
+The default in this scenario (when
+.BR EPOLLEXCLUSIVE
+is not set) is for all epoll file descriptors to receive an event.
+.BR EPOLLEXCLUSIVE
+is thus useful for avoiding thundering herd problems in certain scenarios.
+.IP
+If the same file descriptor is in multiple epoll instances,
+some with the
+.BR EPOLLEXCLUSIVE
+flag, and others without, then events will be provided to all epoll
+instances that did not specify
+.BR EPOLLEXCLUSIVE ,
+and at least one of the epoll instances that did specify
+.BR EPOLLEXCLUSIVE .
+.IP
+The following values may be specified in conjunction with
+.BR EPOLLEXCLUSIVE :
+.BR EPOLLIN ,
+.BR EPOLLOUT ,
+.BR EPOLLWAKEUP ,
+and
+.BR EPOLLET .
+.BR EPOLLHUP
+and
+.BR EPOLLERR
+can also be specified, but this is not required:
+as usual, these events are always reported if they occur,
+regardless of whether they are specified in
+.IR events .
+Attempts to specify other values in
+.I events
+yield the error
+.BR EINVAL .
+.IP
+.B EPOLLEXCLUSIVE
+may be used only in an
+.B EPOLL_CTL_ADD
+operation; attempts to employ it with
+.B EPOLL_CTL_MOD
+yield an error.
+If
+.B EPOLLEXCLUSIVE
+has been set using
+.BR epoll_ctl (),
+then a subsequent
+.B EPOLL_CTL_MOD
+on the same
+.IR epfd ",\ " fd
+pair yields an error.
+A call to
+.BR epoll_ctl ()
+that specifies
+.B EPOLLEXCLUSIVE
+in
+.I events
+and specifies the target file descriptor
+.I fd
+as an epoll instance will likewise fail.
+The error in all of these cases is
+.BR EINVAL .
+.IP
+The
+.BR EPOLLEXCLUSIVE
+flag is an input flag for the
+.I event.events
+field when calling
+.BR epoll_ctl ();
+it is never returned by
+.BR epoll_wait (2).
+.SH RETURN VALUE
When successful,
.BR epoll_ctl ()
returns zero.
.I op
is not supported by this interface.
.TP
+.B EINVAL
+An invalid event type was specified along with
+.B EPOLLEXCLUSIVE
+in
+.IR events .
+.TP
+.B EINVAL
+.I op
+was
+.B EPOLL_CTL_MOD
+and
+.IR events
+included
+.BR EPOLLEXCLUSIVE .
+.TP
+.B EINVAL
+.I op
+was
+.B EPOLL_CTL_MOD
+and the
+.BR EPOLLEXCLUSIVE
+flag has previously been applied to this
+.IR epfd ",\ " fd
+pair.
+.TP
+.B EINVAL
+.BR EPOLLEXCLUSIVE
+was specified in
+.IR event
+and
+.I fd
+refers to an epoll instance.
+.TP
+.B ELOOP
+.I fd
+refers to an epoll instance and this
+.B EPOLL_CTL_ADD
+operation would result in a circular loop of epoll instances
+monitoring one another.
+.TP
.B ENOENT
.I op
was
.BR EPOLL_CTL_DEL ,
and
.I fd
-is not registered on with this epoll instance.
+is not registered with this epoll instance.
.TP
.B ENOMEM
There was insufficient memory to handle the requested
.I op
control operation.
-.TP ENOSPC
+.TP
+.B ENOSPC
The limit imposed by
.I /proc/sys/fs/epoll/max_user_watches
was encountered while trying to register
.I fd
does not support
.BR epoll .
-.SH CONFORMING TO
+This error can occur if
+.I fd
+refers to, for example, a regular file or a directory.
+.SH VERSIONS
.BR epoll_ctl ()
-is Linux-specific, and was introduced in kernel 2.5.44.
+was added to the kernel in version 2.6.
+.\" To be precise: kernel 2.5.44.
.\" The interface should be finalized by Linux kernel 2.5.66.
+.SH CONFORMING TO
+.BR epoll_ctl ()
+is Linux-specific.
+Library support is provided in glibc starting with version 2.3.2.
.SH NOTES
The
.B epoll
.SH BUGS
In kernel versions before 2.6.9, the
.B EPOLL_CTL_DEL
-operation required a non-NULL pointer in
+operation required a non-null pointer in
.IR event ,
even though this argument is ignored.
Since Linux 2.6.9,
when using
.BR EPOLL_CTL_DEL .
Applications that need to be portable to kernels before 2.6.9
-should specify a non-NULL pointer in
+should specify a non-null pointer in
.IR event .
-.SH "SEE ALSO"
+.PP
+If
+.B EPOLLWAKEUP
+is specified in
+.IR flags ,
+but the caller does not have the
+.BR CAP_BLOCK_SUSPEND
+capability, then the
+.B EPOLLWAKEUP
+flag is
+.IR "silently ignored" .
+This unfortunate behavior is necessary because no validity
+checks were performed on the
+.IR flags
+argument in the original implementation, and the addition of the
+.B EPOLLWAKEUP
+with a check that caused the call to fail if the caller did not have the
+.B CAP_BLOCK_SUSPEND
+capability caused a breakage in at least one existing user-space
+application that happened to randomly (and uselessly) specify this bit.
+.\" commit a8159414d7e3af7233e7a5a82d1c5d85379bd75c (behavior change)
+.\" https://lwn.net/Articles/520198/
+A robust application should therefore double check that it has the
+.B CAP_BLOCK_SUSPEND
+capability if attempting to use the
+.B EPOLLWAKEUP
+flag.
+.SH SEE ALSO
.BR epoll_create (2),
.BR epoll_wait (2),
.BR poll (2),