Commit | Line | Data |
---|---|---|
fea681da MK |
1 | .\" Copyright (C) 2003 Davide Libenzi |
2 | .\" | |
e4a74ca8 | 3 | .\" SPDX-License-Identifier: GPL-2.0-or-later |
fea681da MK |
4 | .\" |
5 | .\" Davide Libenzi <davidel@xmailserver.org> | |
6 | .\" | |
4c1c5274 | 7 | .TH epoll 7 (date) "Linux man-pages (unreleased)" |
fea681da MK |
8 | .SH NAME |
9 | epoll \- I/O event notification facility | |
10 | .SH SYNOPSIS | |
c7db92b9 | 11 | .nf |
fea681da | 12 | .B #include <sys/epoll.h> |
c7db92b9 | 13 | .fi |
fea681da | 14 | .SH DESCRIPTION |
2b348e56 MK |
15 | The |
16 | .B epoll | |
17 | API performs a similar task to | |
18 | .BR poll (2): | |
19 | monitoring multiple file descriptors to see if I/O is possible on any of them. | |
20 | The | |
fea681da | 21 | .B epoll |
2b348e56 | 22 | API can be used either as an edge-triggered or a level-triggered |
fc15f317 | 23 | interface and scales well to large numbers of watched file descriptors. |
c6d039a3 | 24 | .P |
04091160 MK |
25 | The central concept of the |
26 | .B epoll | |
27 | API is the | |
28 | .B epoll | |
29 | .IR instance , | |
30 | an in-kernel data structure which, from a user-space perspective, | |
31 | can be considered as a container for two lists: | |
cdede5cd | 32 | .IP \[bu] 3 |
04091160 MK |
33 | The |
34 | .I interest | |
35 | list (sometimes also called the | |
36 | .B epoll | |
37 | set): the set of file descriptors that the process has registered | |
38 | an interest in monitoring. | |
cdede5cd | 39 | .IP \[bu] |
04091160 MK |
40 | The |
41 | .I ready | |
42 | list: the set of file descriptors that are "ready" for I/O. | |
43 | The ready list is a subset of | |
44 | (or, more precisely, a set of references to) | |
0a26e2d3 MK |
45 | the file descriptors in the interest list. |
46 | The ready list is dynamically populated | |
04091160 | 47 | by the kernel as a result of I/O activity on those file descriptors. |
c6d039a3 | 48 | .P |
9d0f3fcb | 49 | The following system calls are provided to |
7547121f | 50 | create and manage an |
fea681da | 51 | .B epoll |
7547121f | 52 | instance: |
cdede5cd | 53 | .IP \[bu] 3 |
2b348e56 | 54 | .BR epoll_create (2) |
302b4b87 | 55 | creates a new |
fea681da | 56 | .B epoll |
2b348e56 | 57 | instance and returns a file descriptor referring to that instance. |
9d0f3fcb MK |
58 | (The more recent |
59 | .BR epoll_create1 (2) | |
60 | extends the functionality of | |
61 | .BR epoll_create (2).) | |
cdede5cd | 62 | .IP \[bu] |
7547121f | 63 | Interest in particular file descriptors is then registered via |
04091160 MK |
64 | .BR epoll_ctl (2), |
65 | which adds items to the interest list of the | |
4524285a | 66 | .B epoll |
04091160 | 67 | instance. |
cdede5cd | 68 | .IP \[bu] |
2b348e56 MK |
69 | .BR epoll_wait (2) |
70 | waits for I/O events, | |
71 | blocking the calling thread if no events are currently available. | |
04091160 MK |
72 | (This system call can be thought of as fetching items from |
73 | the ready list of the | |
74 | .B epoll | |
75 | instance.) | |
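The three calls above can be sketched end to end as follows. This is a minimal illustration, not part of the manual page: the helper name epoll_basic_demo() is hypothetical, and a pipe stands in for whatever descriptor an application would monitor.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance, register the read end
   of a pipe, make it readable, and fetch it from the ready list.
   Returns the number of ready descriptors, or -1 on error. */
int epoll_basic_demo(void)
{
    int pfd[2], epfd, nready = -1;
    struct epoll_event ev, out;

    if (pipe(pfd) == -1)
        return -1;

    /* epoll_create1(): new epoll instance, referred to by epfd. */
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    /* epoll_ctl(): add the read end to the interest list. */
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto close_all;

    /* I/O activity: one byte becomes available for reading. */
    if (write(pfd[1], "x", 1) != 1)
        goto close_all;

    /* epoll_wait(): fetch at most one item from the ready list,
       waiting up to one second. */
    nready = epoll_wait(epfd, &out, 1, 1000);
    if (nready == 1 && out.data.fd != pfd[0])
        nready = -1;

close_all:
    close(epfd);
close_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return nready;
}
```

In this sketch the call is expected to report exactly one ready descriptor, the read end of the pipe.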
76 | .\" | |
c634028a | 77 | .SS Level-triggered and edge-triggered |
fea681da MK |
78 | The |
79 | .B epoll | |
fc15f317 | 80 | event distribution interface is able to behave both as edge-triggered |
7547121f | 81 | (ET) and as level-triggered (LT). |
69eb01fd MK |
82 | The difference between the two mechanisms |
83 | can be described as follows. | |
c13182ef | 84 | Suppose that |
7025a2fe | 85 | this scenario happens: |
22356d97 | 86 | .IP (1) 5 |
fc15f317 MK |
87 | The file descriptor that represents the read side of a pipe |
88 | .RI ( rfd ) | |
7547121f | 89 | is registered on the |
fea681da | 90 | .B epoll |
7547121f | 91 | instance. |
22356d97 | 92 | .IP (2) |
c4b7e5ac | 93 | A pipe writer writes 2\ kB of data on the write side of the pipe. |
22356d97 | 94 | .IP (3) |
fea681da MK |
95 | A call to |
96 | .BR epoll_wait (2) | |
97 | is done that will return | |
fc15f317 MK |
98 | .I rfd |
99 | as a ready file descriptor. | |
22356d97 | 100 | .IP (4) |
c4b7e5ac | 101 | The pipe reader reads 1\ kB of data from |
fc15f317 | 102 | .IR rfd . |
22356d97 | 103 | .IP (5) |
fea681da MK |
104 | A call to |
105 | .BR epoll_wait (2) | |
106 | is done. | |
c6d039a3 | 107 | .P |
fea681da | 108 | If the |
fc15f317 | 109 | .I rfd |
fea681da MK |
110 | file descriptor has been added to the |
111 | .B epoll | |
112 | interface using the | |
113 | .B EPOLLET | |
f2e101d0 | 114 | (edge-triggered) |
fea681da MK |
115 | flag, the call to |
116 | .BR epoll_wait (2) | |
988db661 | 117 | done in step |
fea681da | 118 | .B 5 |
fc15f317 MK |
119 | will probably hang despite the available data still present in the file |
120 | input buffer; | |
121 | meanwhile the remote peer might be expecting a response based on the | |
c13182ef | 122 | data it already sent. |
33a0ccb2 MK |
123 | The reason for this is that edge-triggered mode |
124 | delivers events only when changes occur on the monitored file descriptor. | |
fea681da MK |
125 | So, in step |
126 | .B 5 | |
127 | the caller might end up waiting for some data that is already present inside | |
c13182ef MK |
128 | the input buffer. |
129 | In the above example, an event on | |
fc15f317 | 130 | .I rfd |
fea681da | 131 | will be generated because of the write done in |
0daa9e92 | 132 | .B 2 |
66eca51e | 133 | and the event is consumed in |
fea681da MK |
134 | .BR 3 . |
135 | Since the read operation done in | |
136 | .B 4 | |
137 | does not consume the whole buffer data, the call to | |
138 | .BR epoll_wait (2) | |
139 | done in step | |
140 | .B 5 | |
fc15f317 | 141 | might block indefinitely. |
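The scenario in steps (1) to (5) can be replayed with a short program. The sketch below is illustrative only: second_wait_result() is a hypothetical name, and step 5 uses a zero timeout so that the demonstration polls rather than blocks.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Replays steps (1)-(5) on a pipe and returns what the second
   epoll_wait() (step 5, polled with a zero timeout) reports:
   the number of ready descriptors, or -1 on setup failure. */
int second_wait_result(int edge_triggered)
{
    int pfd[2], epfd, result = -1;
    char buf[2048];
    struct epoll_event ev, out;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    /* Step 1: register the read side, optionally edge-triggered. */
    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    if (edge_triggered)
        ev.events |= EPOLLET;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto close_all;

    /* Step 2: the writer writes 2 kB. */
    memset(buf, 'x', sizeof(buf));
    if (write(pfd[1], buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        goto close_all;

    /* Step 3: the first wait reports rfd as ready. */
    if (epoll_wait(epfd, &out, 1, 1000) != 1)
        goto close_all;

    /* Step 4: the reader consumes only 1 kB. */
    if (read(pfd[0], buf, 1024) != 1024)
        goto close_all;

    /* Step 5: poll instead of blocking, so the demo cannot hang. */
    result = epoll_wait(epfd, &out, 1, 0);

close_all:
    close(epfd);
close_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return result;
}
```

With EPOLLET the second wait is expected to report nothing although 1 kB is still buffered. Without it (level-triggered), the descriptor is reported again.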
c6d039a3 | 142 | .P |
fc15f317 | 143 | An application that employs the |
fea681da | 144 | .B EPOLLET |
ff40dbb3 | 145 | flag should use nonblocking file descriptors to avoid having a blocking |
fc15f317 | 146 | read or write starve a task that is handling multiple file descriptors. |
fea681da MK |
147 | The suggested way to use |
148 | .B epoll | |
fc15f317 | 149 | as an edge-triggered |
66eca51e | 150 | .RB ( EPOLLET ) |
fc15f317 | 151 | interface is as follows: |
22356d97 | 152 | .IP (1) 5 |
ff40dbb3 | 153 | with nonblocking file descriptors; and |
22356d97 | 154 | .IP (2) |
69eb01fd | 155 | by waiting for an event only after |
fea681da | 156 | .BR read (2) |
c13182ef | 157 | or |
fea681da | 158 | .BR write (2) |
097585ed MK |
159 | return |
160 | .BR EAGAIN . | |
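The two rules above amount to a drain loop on a nonblocking descriptor. The following is a sketch under stated assumptions: drain_fd() and drain_demo() are hypothetical names, the data is simply discarded, and a real application would process each chunk and keep per-descriptor state.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Read from a nonblocking descriptor until read(2) reports EAGAIN
   (or end of file).  Returns the number of bytes consumed, or -1 on
   any other error. */
long drain_fd(int fd)
{
    char buf[512];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n > 0) {
            total += n;          /* more data may be pending */
            continue;
        }
        if (n == 0)
            return total;        /* end of file */
        if (errno == EINTR)
            continue;            /* interrupted, retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;        /* drained: safe to wait again */
        return -1;
    }
}

/* Demonstration: put 3000 bytes in a pipe, then drain them. */
long drain_demo(void)
{
    int pfd[2], flags;
    char data[3000];
    long got = -1;

    if (pipe(pfd) == -1)
        return -1;

    memset(data, 'x', sizeof(data));
    flags = fcntl(pfd[0], F_GETFL);
    if (flags != -1
        && fcntl(pfd[0], F_SETFL, flags | O_NONBLOCK) != -1
        && write(pfd[1], data, sizeof(data)) == (ssize_t) sizeof(data))
        got = drain_fd(pfd[0]);

    close(pfd[0]);
    close(pfd[1]);
    return got;
}
```

Only once drain_fd() has returned would the caller go back to epoll_wait(2) for that descriptor.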
c6d039a3 | 161 | .P |
f2e101d0 MK |
162 | By contrast, when used as a level-triggered interface |
163 | (the default, when | |
164 | .B EPOLLET | |
165 | is not specified), | |
fea681da | 166 | .B epoll |
512a1783 | 167 | is simply a faster |
fea681da MK |
168 | .BR poll (2), |
169 | and can be used wherever the latter is used since it shares the | |
c13182ef | 170 | same semantics. |
c6d039a3 | 171 | .P |
7547121f MK |
172 | Since even with edge-triggered |
173 | .BR epoll , | |
fc15f317 | 174 | multiple events can be generated upon receipt of multiple chunks of data, |
fea681da MK |
175 | the caller has the option to specify the |
176 | .B EPOLLONESHOT | |
177 | flag, to tell | |
178 | .B epoll | |
3f1c1b0a | 179 | to disable the associated file descriptor after the receipt of an event with |
fea681da MK |
180 | .BR epoll_wait (2). |
181 | When the | |
182 | .B EPOLLONESHOT | |
c13182ef | 183 | flag is specified, |
fc15f317 | 184 | it is the caller's responsibility to rearm the file descriptor using |
fea681da MK |
185 | .BR epoll_ctl (2) |
186 | with | |
187 | .BR EPOLL_CTL_MOD . | |
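The disable-then-rearm cycle can be sketched as follows. The helper name oneshot_wait_after() is hypothetical, and a pipe with one unread byte stands in for a descriptor that stays ready.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Registers a readable pipe end with EPOLLONESHOT, consumes the first
   event, optionally rearms with EPOLL_CTL_MOD, and returns what a
   zero-timeout epoll_wait() then reports (or -1 on setup failure). */
int oneshot_wait_after(int rearm)
{
    int pfd[2], epfd, result = -1;
    struct epoll_event ev, out;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto close_all;

    /* Make the descriptor readable and leave the byte unread. */
    if (write(pfd[1], "x", 1) != 1)
        goto close_all;

    /* The first event is delivered; the descriptor is then disabled. */
    if (epoll_wait(epfd, &out, 1, 1000) != 1)
        goto close_all;

    /* Rearming is the caller's job. */
    if (rearm && epoll_ctl(epfd, EPOLL_CTL_MOD, pfd[0], &ev) == -1)
        goto close_all;

    result = epoll_wait(epfd, &out, 1, 0);

close_all:
    close(epfd);
close_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return result;
}
```

Without rearming, the unread byte produces no further event. After EPOLL_CTL_MOD the descriptor is reported again.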
c6d039a3 | 188 | .P |
a3961b2f MK |
189 | If multiple threads |
190 | (or processes, if child processes have inherited the | |
191 | .B epoll | |
192 | file descriptor across | |
193 | .BR fork (2)) | |
194 | are blocked in | |
195 | .BR epoll_wait (2) | |
9d7fb784 | 196 | waiting on the same epoll file descriptor and a file descriptor |
a3961b2f MK |
197 | in the interest list that is marked for edge-triggered |
198 | .RB ( EPOLLET ) | |
199 | notification becomes ready, | |
200 | just one of the threads (or processes) is awoken from | |
201 | .BR epoll_wait (2). | |
202 | This provides a useful optimization for avoiding "thundering herd" wake-ups | |
203 | in some scenarios. | |
204 | .\" | |
6db5acce N |
205 | .SS Interaction with autosleep |
206 | If the system is in | |
207 | .B autosleep | |
208 | mode via | |
209 | .I /sys/power/autosleep | |
210 | and an event happens which wakes the device from sleep, the device | |
8e798cce | 211 | driver will keep the device awake only until that event is queued. |
d3695ae2 MK |
212 | To keep the device awake until the event has been processed, |
213 | it is necessary to use the | |
bf7bc8b8 | 214 | .BR epoll_ctl (2) |
6db5acce N |
215 | .B EPOLLWAKEUP |
216 | flag. | |
c6d039a3 | 217 | .P |
d3695ae2 MK |
218 | When the |
219 | .B EPOLLWAKEUP | |
220 | flag is set in the | |
6db5acce N |
221 | .B events |
222 | field for a | |
d3695ae2 MK |
223 | .IR "struct epoll_event" , |
224 | the system will be kept awake from the moment the event is queued, | |
6db5acce | 225 | through the |
d3695ae2 | 226 | .BR epoll_wait (2) |
6db5acce | 227 | call which returns the event until the subsequent |
d3695ae2 MK |
228 | .BR epoll_wait (2) |
229 | call. | |
230 | If the event should keep the system awake beyond that time, | |
231 | then a separate | |
6db5acce N |
232 | .I wake_lock |
233 | should be taken before the second | |
d3695ae2 | 234 | .BR epoll_wait (2) |
6db5acce | 235 | call. |
5ee0575d MK |
236 | .SS /proc interfaces |
237 | The following interfaces can be used to limit the amount of | |
238 | kernel memory consumed by epoll: | |
b324e17d | 239 | .\" Following was added in Linux 2.6.28, but then removed in Linux 2.6.29 |
f09cbcf3 | 240 | .\" .TP |
597fa43c MK |
241 | .\" .IR /proc/sys/fs/epoll/max_user_instances " (since Linux 2.6.28)" |
242 | .\" This specifies an upper limit on the number of epoll instances | |
243 | .\" that can be created per real user ID. | |
5ee0575d MK |
244 | .TP |
245 | .IR /proc/sys/fs/epoll/max_user_watches " (since Linux 2.6.28)" | |
246 | This specifies a limit on the total number of | |
247 | file descriptors that a user can register across | |
248 | all epoll instances on the system. | |
249 | The limit is per real user ID. | |
250 | Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel, | |
251 | and roughly 160 bytes on a 64-bit kernel. | |
252 | Currently, | |
b324e17d | 253 | .\" Linux 2.6.29 (in Linux 2.6.28, the default was 1/32 of lowmem) |
5ee0575d MK |
254 | the default value for |
255 | .I max_user_watches | |
597fa43c | 256 | is 1/25 (4%) of the available low memory, |
5ee0575d | 257 | divided by the registration cost in bytes. |
c634028a | 258 | .SS Example for suggested usage |
fea681da MK |
259 | While the usage of |
260 | .B epoll | |
fc15f317 MK |
261 | when employed as a level-triggered interface does have the same |
262 | semantics as | |
fea681da | 263 | .BR poll (2), |
fc15f317 | 264 | the edge-triggered usage requires more clarification to avoid stalls |
c13182ef MK |
265 | in the application event loop. |
266 | In this example, the listener (listen_sock) is a | |
ff40dbb3 | 267 | nonblocking socket on which |
fea681da | 268 | .BR listen (2) |
c13182ef | 269 | has been called. |
54d02f32 MK |
270 | The function |
271 | .I do_use_fd() | |
272 | uses the new ready file descriptor until | |
097585ed MK |
273 | .B EAGAIN |
274 | is returned by either | |
fea681da MK |
275 | .BR read (2) |
276 | or | |
277 | .BR write (2). | |
fc15f317 | 278 | An event-driven state machine application should, after having received |
097585ed | 279 | .BR EAGAIN , |
54d02f32 MK |
280 | record its current state so that at the next call to |
281 | .I do_use_fd() | |
fea681da MK |
282 | it will continue to |
283 | .BR read (2) | |
284 | or | |
285 | .BR write (2) | |
c13182ef | 286 | from where it stopped before. |
c6d039a3 | 287 | .P |
3bc917f6 | 288 | .in +4n |
bdd915e2 | 289 | .EX |
66132b5e MK |
290 | #define MAX_EVENTS 10 |
291 | struct epoll_event ev, events[MAX_EVENTS]; | |
292 | int listen_sock, conn_sock, nfds, epollfd, n; | |
fe5dba13 | 293 | \& |
b957f81f | 294 | /* Code to set up listening socket, \[aq]listen_sock\[aq], |
46b20ca1 | 295 | (socket(), bind(), listen()) omitted. */ |
fe5dba13 | 296 | \& |
a3e65c93 | 297 | epollfd = epoll_create1(0); |
66132b5e | 298 | if (epollfd == \-1) { |
a3e65c93 | 299 | perror("epoll_create1"); |
66132b5e MK |
300 | exit(EXIT_FAILURE); |
301 | } | |
fe5dba13 | 302 | \& |
a8d9df27 | 303 | ev.events = EPOLLIN; |
66132b5e MK |
304 | ev.data.fd = listen_sock; |
305 | if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == \-1) { | |
306 | perror("epoll_ctl: listen_sock"); | |
307 | exit(EXIT_FAILURE); | |
308 | } | |
fe5dba13 | 309 | \& |
d4949190 | 310 | for (;;) { |
66132b5e | 311 | nfds = epoll_wait(epollfd, events, MAX_EVENTS, \-1); |
40c75945 | 312 | if (nfds == \-1) { |
be6b243a | 313 | perror("epoll_wait"); |
40c75945 MK |
314 | exit(EXIT_FAILURE); |
315 | } | |
fe5dba13 | 316 | \& |
cf0a9ace | 317 | for (n = 0; n < nfds; ++n) { |
66132b5e MK |
318 | if (events[n].data.fd == listen_sock) { |
319 | conn_sock = accept(listen_sock, | |
24a31d63 | 320 | (struct sockaddr *) &addr, &addrlen); |
66132b5e | 321 | if (conn_sock == \-1) { |
fea681da | 322 | perror("accept"); |
15277745 | 323 | exit(EXIT_FAILURE); |
fea681da | 324 | } |
66132b5e | 325 | setnonblocking(conn_sock); |
fea681da | 326 | ev.events = EPOLLIN | EPOLLET; |
66132b5e MK |
327 | ev.data.fd = conn_sock; |
328 | if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock, | |
329 | &ev) == \-1) { | |
df5c8d49 | 330 | perror("epoll_ctl: conn_sock"); |
66132b5e | 331 | exit(EXIT_FAILURE); |
fea681da | 332 | } |
cf0a9ace | 333 | } else { |
fea681da | 334 | do_use_fd(events[n].data.fd); |
cf0a9ace | 335 | } |
fea681da MK |
336 | } |
337 | } | |
bdd915e2 | 338 | .EE |
3bc917f6 | 339 | .in |
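The example calls setnonblocking() without showing it. One plausible definition, given here as an assumption rather than as the page's own helper, sets O_NONBLOCK with fcntl(2). The setnonblocking_demo() function is likewise hypothetical and exists only to exercise the helper.

```c
#include <fcntl.h>
#include <unistd.h>

/* Assumed definition of the helper used in the example: add
   O_NONBLOCK to the file status flags.  Returns 0 on success,
   -1 on error. */
int setnonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Demonstration: returns 1 if the flag is set on a fresh pipe end
   after calling setnonblocking(), 0 if not, -1 on error. */
int setnonblocking_demo(void)
{
    int pfd[2], flags, result = -1;

    if (pipe(pfd) == -1)
        return -1;

    if (setnonblocking(pfd[0]) == 0) {
        flags = fcntl(pfd[0], F_GETFL, 0);
        if (flags != -1)
            result = (flags & O_NONBLOCK) != 0;
    }

    close(pfd[0]);
    close(pfd[1]);
    return result;
}
```

The example above ignores the return value of setnonblocking(); production code would probably check it before registering the descriptor with EPOLLET.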
c6d039a3 | 340 | .P |
fc15f317 | 341 | When used as an edge-triggered interface, for performance reasons, it is |
3bc917f6 MK |
342 | possible to add the file descriptor inside the |
343 | .B epoll | |
344 | interface | |
fc15f317 | 345 | .RB ( EPOLL_CTL_ADD ) |
69eb01fd | 346 | once by specifying |
fc15f317 | 347 | .RB ( EPOLLIN | EPOLLOUT ). |
c13182ef | 348 | This allows you to avoid |
fea681da MK |
349 | continuously switching between |
350 | .B EPOLLIN | |
351 | and | |
352 | .B EPOLLOUT | |
353 | by calling | |
354 | .BR epoll_ctl (2) | |
355 | with | |
356 | .BR EPOLL_CTL_MOD . | |
c634028a | 357 | .SS Questions and answers |
cdede5cd | 358 | .IP \[bu] 3 |
7547121f | 359 | What is the key used to distinguish the file descriptors registered in an |
a607673b | 360 | interest list? |
6832efaf | 361 | .IP |
7fb5cf0f MK |
362 | The key is the combination of the file descriptor number and |
363 | the open file description | |
d377b54d | 364 | (also known as an "open file handle", |
7fb5cf0f | 365 | the kernel's internal representation of an open file). |
cdede5cd | 366 | .IP \[bu] |
7547121f | 367 | What happens if you register the same file descriptor on an |
3bc917f6 | 368 | .B epoll |
7547121f | 369 | instance twice? |
6832efaf | 370 | .IP |
097585ed MK |
371 | You will probably get |
372 | .BR EEXIST . | |
2b229334 MK |
373 | However, it is possible to add a duplicate |
374 | .RB ( dup (2), | |
375 | .BR dup2 (2), | |
376 | .BR fcntl (2) | |
7fb5cf0f | 377 | .BR F_DUPFD ) |
d9cb0d7d | 378 | file descriptor to the same |
2b229334 | 379 | .B epoll |
7547121f | 380 | instance. |
d9cb0d7d | 381 | .\" But a file descriptor duplicated by fork(2) can't be added to the |
d377b54d MK |
382 | .\" set, because the [file *, fd] pair is already in the epoll set. |
383 | .\" That is a somewhat ugly inconsistency. On the one hand, a child process | |
7fb5cf0f | 384 | .\" cannot add the duplicate file descriptor to the epoll set. (In every |
d9cb0d7d MK |
385 | .\" other case that I can think of, file descriptors duplicated by fork have |
386 | .\" similar semantics to file descriptors duplicated by dup() and friends.) On | |
7fb5cf0f | 387 | .\" the other hand, the very fact that the child has a duplicate of the |
d9cb0d7d MK |
388 | .\" file descriptor means that even if the parent closes its file descriptor, |
389 | .\" then epoll_wait() in the parent will continue to receive notifications for | |
390 | .\" that file descriptor because of the duplicated file descriptor in the child. | |
7fb5cf0f | 391 | .\" |
d377b54d MK |
392 | .\" See http://thread.gmane.org/gmane.linux.kernel/596462/ |
393 | .\" "epoll design problems with common fork/exec patterns" | |
31981fa1 | 394 | .\" |
7fb5cf0f | 395 | .\" mtk, Feb 2008 |
2b229334 MK |
396 | This can be a useful technique for filtering events, |
397 | if the duplicate file descriptors are registered with different | |
398 | .I events | |
399 | masks. | |
cdede5cd | 400 | .IP \[bu] |
fea681da MK |
401 | Can two |
402 | .B epoll | |
7547121f | 403 | instances wait for the same file descriptor? |
1c44bd5b | 404 | If so, are events reported to both |
fea681da | 405 | .B epoll |
fc15f317 | 406 | file descriptors? |
6832efaf | 407 | .IP |
fc15f317 | 408 | Yes, and events would be reported to both. |
882bbb69 | 409 | However, careful programming may be needed to do this correctly. |
cdede5cd | 410 | .IP \[bu] |
fea681da MK |
411 | Is the |
412 | .B epoll | |
fc15f317 | 413 | file descriptor itself poll/epoll/selectable? |
6832efaf | 414 | .IP |
fea681da | 415 | Yes. |
cc65f7d8 MK |
416 | If an |
417 | .B epoll | |
1c4070c7 | 418 | file descriptor has events waiting, then it will be |
cc65f7d8 | 419 | reported as readable. |
cdede5cd | 420 | .IP \[bu] |
7547121f | 421 | What happens if one attempts to put an |
fea681da | 422 | .B epoll |
7547121f | 423 | file descriptor into its own file descriptor set? |
6832efaf | 424 | .IP |
4fecd703 MK |
425 | The |
426 | .BR epoll_ctl (2) | |
a23d8efa | 427 | call fails |
4fecd703 | 428 | .RB ( EINVAL ). |
c13182ef | 429 | However, you can add an |
fea681da | 430 | .B epoll |
3bc917f6 MK |
431 | file descriptor inside another |
432 | .B epoll | |
433 | file descriptor set. | |
cdede5cd | 434 | .IP \[bu] |
54d02f32 | 435 | Can I send an |
fea681da | 436 | .B epoll |
008f1ecc | 437 | file descriptor over a UNIX domain socket to another process? |
6832efaf | 438 | .IP |
54d02f32 | 439 | Yes, but it does not make sense to do this, since the receiving process |
a607673b | 440 | would not have copies of the file descriptors in the interest list. |
cdede5cd | 441 | .IP \[bu] |
fc15f317 | 442 | Will closing a file descriptor cause it to be removed from all |
fea681da | 443 | .B epoll |
a607673b | 444 | interest lists? |
6832efaf | 445 | .IP |
a4a120c7 MK |
446 | Yes, but be aware of the following point. |
447 | A file descriptor is a reference to an open file description (see | |
448 | .BR open (2)). | |
d9cb0d7d | 449 | Whenever a file descriptor is duplicated via |
a4a120c7 MK |
450 | .BR dup (2), |
451 | .BR dup2 (2), | |
452 | .BR fcntl (2) | |
453 | .BR F_DUPFD , | |
454 | or | |
455 | .BR fork (2), | |
456 | a new file descriptor referring to the same open file description is | |
457 | created. | |
458 | An open file description continues to exist until all | |
459 | file descriptors referring to it have been closed. | |
d1d90ea5 | 460 | .IP |
d377b54d | 461 | A file descriptor is removed from an |
d1d90ea5 MK |
462 | interest list only after all the file descriptors referring to the underlying |
463 | open file description have been closed. | |
a4a120c7 | 464 | This means that even after a file descriptor that is part of an |
d1d90ea5 | 465 | interest list has been closed, |
a4a120c7 MK |
466 | events may be reported for that file descriptor if other file |
467 | descriptors referring to the same underlying file description remain open. | |
d1d90ea5 MK |
468 | To prevent this happening, |
469 | the file descriptor must be explicitly removed from the interest list (using | |
470 | .BR epoll_ctl (2) | |
471 | .BR EPOLL_CTL_DEL ) | |
472 | before it is duplicated. | |
473 | Alternatively, | |
474 | the application must ensure that all file descriptors are closed | |
475 | (which may be difficult if file descriptors were duplicated | |
476 | behind the scenes by library functions that used | |
477 | .BR dup (2) | |
478 | or | |
479 | .BR fork (2)). | |
cdede5cd | 480 | .IP \[bu] |
fc15f317 | 481 | If more than one event occurs between |
fea681da MK |
482 | .BR epoll_wait (2) |
483 | calls, are they combined or reported separately? | |
6832efaf | 484 | .IP |
fea681da | 485 | They will be combined. |
cdede5cd | 486 | .IP \[bu] |
988db661 | 487 | Does an operation on a file descriptor affect the |
fc15f317 | 488 | already collected but not yet reported events? |
6832efaf | 489 | .IP |
fc15f317 | 490 | Two operations can be performed on an already registered file descriptor: remove (EPOLL_CTL_DEL) and modify (EPOLL_CTL_MOD). |
c13182ef MK |
491 | Remove is meaningless in |
492 | this case. | |
3b777aff | 493 | Modify will reread available I/O. |
cdede5cd | 494 | .IP \[bu] |
fc15f317 | 495 | Do I need to continuously read/write a file descriptor |
097585ed MK |
496 | until |
497 | .B EAGAIN | |
498 | when using the | |
fea681da | 499 | .B EPOLLET |
b4ebb4ee | 500 | flag (edge-triggered behavior)? |
6832efaf | 501 | .IP |
c13182ef | 502 | Receiving an event from |
fea681da | 503 | .BR epoll_wait (2) |
f11af7da | 504 | should be taken as an indication that the |
160c5be1 | 505 | file descriptor is ready for the requested I/O operation. |
ff40dbb3 | 506 | You must consider it ready until the next (nonblocking) |
cb1de8d7 | 507 | read/write yields |
097585ed | 508 | .BR EAGAIN . |
f11af7da | 509 | When and how you will use the file descriptor is entirely up to you. |
bdd915e2 | 510 | .IP |
cb1de8d7 MK |
511 | For packet/token-oriented files (e.g., datagram socket, |
512 | terminal in canonical mode), | |
146c1764 | 513 | the only way to detect the end of the read/write I/O space |
cb1de8d7 MK |
514 | is to continue to read/write until |
515 | .BR EAGAIN . | |
bdd915e2 | 516 | .IP |
cb1de8d7 | 517 | For stream-oriented files (e.g., pipe, FIFO, stream socket), the |
f11af7da MK |
518 | condition that the read/write I/O space is exhausted can also be detected by |
519 | checking the amount of data read from / written to the target file | |
520 | descriptor. | |
c13182ef | 521 | For example, if you call |
fea681da | 522 | .BR read (2) |
160c5be1 | 523 | by asking to read a certain amount of data and |
fea681da | 524 | .BR read (2) |
f11af7da MK |
525 | returns a lower number of bytes, you |
526 | can be sure of having exhausted the read I/O space for the file | |
527 | descriptor. | |
160c5be1 | 528 | The same is true when writing using |
fc15f317 | 529 | .BR write (2). |
cb1de8d7 MK |
530 | (Avoid this latter technique if you cannot guarantee that |
531 | the monitored file descriptor always refers to a stream-oriented file.) | |
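The answer above about registering the same file descriptor twice can be checked with a small experiment. This is a sketch only: second_add_result() is a hypothetical name and a pipe is used as the monitored file.

```c
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Registers the read end of a pipe, then tries to register either the
   same descriptor again (use_dup == 0) or a dup(2) of it (use_dup != 0).
   Returns 0 if the second registration succeeds, the errno value if it
   fails, or -1 on setup failure. */
int second_add_result(int use_dup)
{
    int pfd[2], epfd, fd2, result = -1;
    struct epoll_event ev;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto close_pipe;

    memset(&ev, 0, sizeof(ev));
    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto close_all;

    /* Second registration: same number, or a duplicate that refers
       to the same open file description under a new number. */
    fd2 = use_dup ? dup(pfd[0]) : pfd[0];
    if (fd2 == -1)
        goto close_all;

    ev.data.fd = fd2;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd2, &ev) == -1)
        result = errno;
    else
        result = 0;

    if (use_dup)
        close(fd2);

close_all:
    close(epfd);
close_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return result;
}
```

Adding the same descriptor number again is expected to fail with EEXIST, whereas the duplicate is accepted because the key includes the descriptor number.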
c634028a | 532 | .SS Possible pitfalls and ways to avoid them |
cdede5cd | 533 | .IP \[bu] 3 |
22356d97 AC |
534 | .B Starvation (edge-triggered) |
535 | .IP | |
c13182ef MK |
536 | If there is a large amount of I/O space, |
537 | it is possible that, by trying to drain | |
538 | it, the other files will not get processed, causing starvation. | |
fc15f317 MK |
539 | (This problem is not specific to |
540 | .BR epoll .) | |
22356d97 | 541 | .IP |
c13182ef MK |
542 | The solution is to maintain a ready list |
543 | and mark the file descriptor as ready | |
fea681da MK |
544 | in its associated data structure, thereby allowing the application to |
545 | remember which files need to be processed but still round robin amongst | |
c13182ef MK |
546 | all the ready files. |
547 | This also supports ignoring subsequent events you | |
fc15f317 | 548 | receive for file descriptors that are already ready. |
cdede5cd | 549 | .IP \[bu] |
22356d97 AC |
550 | .B If using an event cache... |
551 | .IP | |
fc15f317 | 552 | If you use an event cache or store all the file descriptors returned from |
fea681da | 553 | .BR epoll_wait (2), |
c13182ef | 554 | then make sure to provide a way to mark |
fc15f317 | 555 | its closure dynamically (i.e., caused by |
c13182ef MK |
556 | a previous event's processing). |
557 | Suppose you receive 100 events from | |
fea681da | 558 | .BR epoll_wait (2), |
c13182ef MK |
559 | and in event #47 a condition causes event #13 to be closed. |
560 | If you remove the structure and | |
63f6a20a | 561 | .BR close (2) |
fc15f317 MK |
562 | the file descriptor for event #13, then your |
563 | event cache might still say there are events waiting for that | |
564 | file descriptor, causing confusion. | |
22356d97 | 565 | .IP |
fea681da MK |
566 | One solution for this is to call, during the processing of event 47, |
567 | .BR epoll_ctl ( EPOLL_CTL_DEL ) | |
fc15f317 | 568 | to delete file descriptor 13 and |
63f6a20a | 569 | .BR close (2), |
f87925c6 | 570 | then mark its associated |
c13182ef MK |
571 | data structure as removed and link it to a cleanup list. |
572 | If you find another | |
fc15f317 MK |
573 | event for file descriptor 13 in your batch processing, |
574 | you will discover the file descriptor had been | |
fea681da | 575 | previously removed and there will be no confusion. |
2b2581ee | 576 | .SH VERSIONS |
4131356c AC |
577 | Some other systems provide similar mechanisms; |
578 | for example, | |
579 | FreeBSD has | |
c13182ef MK |
580 | .IR kqueue , |
581 | and Solaris has | |
c803c3e3 | 582 | .IR /dev/poll . |
4131356c AC |
583 | .SH STANDARDS |
584 | Linux. | |
585 | .SH HISTORY | |
586 | Linux 2.5.44. | |
587 | .\" Its interface should be finalized in Linux 2.5.66. | |
588 | glibc 2.3.2. | |
58a80cd4 MK |
589 | .SH NOTES |
590 | The set of file descriptors that is being monitored via | |
591 | an epoll file descriptor can be viewed via the entry for | |
592 | the epoll file descriptor in the process's | |
1ae6b2c7 | 593 | .IR /proc/ pid /fdinfo |
58a80cd4 MK |
594 | directory. |
595 | See | |
596 | .BR proc (5) | |
597 | for further details. | |
c6d039a3 | 598 | .P |
b8dd62ac MK |
599 | The |
600 | .BR kcmp (2) | |
601 | .B KCMP_EPOLL_TFD | |
602 | operation can be used to test whether a file descriptor | |
603 | is present in an epoll instance. | |
47297adb | 604 | .SH SEE ALSO |
fea681da | 605 | .BR epoll_create (2), |
9d0f3fcb | 606 | .BR epoll_create1 (2), |
fea681da | 607 | .BR epoll_ctl (2), |
634c92fb MK |
608 | .BR epoll_wait (2), |
609 | .BR poll (2), | |
610 | .BR select (2) |