Commit | Line | Data |
---|---|---|
fea681da MK |
1 | .\" Copyright (C) 2003 Davide Libenzi |
2 | .\" | |
f0008367 | 3 | .\" %%%LICENSE_START(GPLv2+_SW_3_PARA) |
fea681da MK |
4 | .\" This program is free software; you can redistribute it and/or modify |
5 | .\" it under the terms of the GNU General Public License as published by | |
6 | .\" the Free Software Foundation; either version 2 of the License, or | |
7 | .\" (at your option) any later version. | |
8 | .\" | |
9 | .\" This program is distributed in the hope that it will be useful, | |
10 | .\" but WITHOUT ANY WARRANTY; without even the implied warranty of | |
11 | .\" MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the | |
12 | .\" GNU General Public License for more details. | |
13 | .\" | |
68fa4398 MK |
14 | .\" You should have received a copy of the GNU General Public |
15 | .\" License along with this manual; if not, see | |
16 | .\" <http://www.gnu.org/licenses/>. | |
8ff7380d | 17 | .\" %%%LICENSE_END |
fea681da MK |
18 | .\" |
19 | .\" Davide Libenzi <davidel@xmailserver.org> | |
20 | .\" | |
b8efb414 | 21 | .TH EPOLL 7 2016-10-08 "Linux" "Linux Programmer's Manual" |
fea681da MK |
22 | .SH NAME |
23 | epoll \- I/O event notification facility | |
24 | .SH SYNOPSIS | |
25 | .B #include <sys/epoll.h> | |
26 | .SH DESCRIPTION | |
2b348e56 MK |
27 | The |
28 | .B epoll | |
29 | API performs a similar task to | |
30 | .BR poll (2): | |
31 | monitoring multiple file descriptors to see if I/O is possible on any of them. | |
32 | The | |
fea681da | 33 | .B epoll |
2b348e56 | 34 | API can be used either as an edge-triggered or a level-triggered |
fc15f317 | 35 | interface and scales well to large numbers of watched file descriptors. |
9d0f3fcb | 36 | The following system calls are provided to |
7547121f | 37 | create and manage an |
fea681da | 38 | .B epoll |
7547121f MK |
39 | instance: |
40 | .IP * 3 | |
2b348e56 MK |
41 | .BR epoll_create (2) |
42 | creates an | |
fea681da | 43 | .B epoll |
2b348e56 | 44 | instance and returns a file descriptor referring to that instance. |
9d0f3fcb MK |
45 | (The more recent |
46 | .BR epoll_create1 (2) | |
47 | extends the functionality of | |
48 | .BR epoll_create (2).) | |
7547121f MK |
49 | .IP * |
50 | Interest in particular file descriptors is then registered via | |
fea681da | 51 | .BR epoll_ctl (2). |
540cc87f | 52 | The set of file descriptors currently registered on an |
7547121f MK |
53 | .B epoll |
54 | instance is sometimes called an | |
55 | .I epoll | |
56 | set. | |
57 | .IP * | |
2b348e56 MK |
58 | .BR epoll_wait (2) |
59 | waits for I/O events, | |
60 | blocking the calling thread if no events are currently available. | |
c634028a | 61 | .SS Level-triggered and edge-triggered |
fea681da MK |
62 | The |
63 | .B epoll | |
fc15f317 | 64 | event distribution interface is able to behave both as edge-triggered |
7547121f | 65 | (ET) and as level-triggered (LT). |
69eb01fd MK |
66 | The difference between the two mechanisms |
67 | can be described as follows. | |
c13182ef | 68 | Suppose that |
7025a2fe | 69 | this scenario happens: |
69eb01fd | 70 | .IP 1. 3 |
fc15f317 MK |
71 | The file descriptor that represents the read side of a pipe |
72 | .RI ( rfd ) | |
7547121f | 73 | is registered on the |
fea681da | 74 | .B epoll |
7547121f | 75 | instance. |
69eb01fd MK |
76 | .IP 2. |
77 | A pipe writer writes 2 kB of data on the write side of the pipe. | |
78 | .IP 3. | |
fea681da MK |
79 | A call to |
80 | .BR epoll_wait (2) | |
81 | is done that will return | |
fc15f317 MK |
82 | .I rfd |
83 | as a ready file descriptor. | |
69eb01fd MK |
84 | .IP 4. |
85 | The pipe reader reads 1 kB of data from | |
fc15f317 | 86 | .IR rfd . |
69eb01fd | 87 | .IP 5. |
fea681da MK |
88 | A call to |
89 | .BR epoll_wait (2) | |
90 | is done. | |
91 | .PP | |
fea681da | 92 | If the |
fc15f317 | 93 | .I rfd |
fea681da MK |
94 | file descriptor has been added to the |
95 | .B epoll | |
96 | interface using the | |
97 | .B EPOLLET | |
f2e101d0 | 98 | (edge-triggered) |
fea681da MK |
99 | flag, the call to |
100 | .BR epoll_wait (2) | |
988db661 | 101 | done in step |
fea681da | 102 | .B 5 |
fc15f317 MK |
103 | will probably hang despite the available data still present in the file |
104 | input buffer; | |
105 | meanwhile the remote peer might be expecting a response based on the | |
c13182ef | 106 | data it already sent. |
33a0ccb2 MK |
107 | The reason for this is that edge-triggered mode |
108 | delivers events only when changes occur on the monitored file descriptor. | |
fea681da MK |
109 | So, in step |
110 | .B 5 | |
111 | the caller might end up waiting for some data that is already present inside | |
c13182ef MK |
112 | the input buffer. |
113 | In the above example, an event on | |
fc15f317 | 114 | .I rfd |
fea681da | 115 | will be generated because of the write done in |
0daa9e92 | 116 | .B 2 |
66eca51e | 117 | and the event is consumed in |
fea681da MK |
118 | .BR 3 . |
119 | Since the read operation done in | |
120 | .B 4 | |
121 | does not consume the whole buffer data, the call to | |
122 | .BR epoll_wait (2) | |
123 | done in step | |
124 | .B 5 | |
fc15f317 MK |
125 | might block indefinitely. |
126 | ||
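The five steps above can be replayed on a real pipe. The following is a minimal sketch, not part of the original page: the function name second_wait_result() is hypothetical, and a timeout of 0 is passed to epoll_wait(2) so that the edge-triggered case reports "nothing ready" instead of hanging.

```c
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Replays steps 1-5 on a fresh pipe and returns what the second
   epoll_wait() (step 5) reports: the number of ready descriptors,
   or -1 on error. Pass 0 for level-triggered registration or
   EPOLLET for edge-triggered registration. */
int
second_wait_result(unsigned int extra_flags)
{
    char buf[2048];
    int pfd[2], epfd, nfds = -1;
    struct epoll_event ev, out;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto done_pipe;

    /* Step 1: register the read side of the pipe (rfd) */
    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto done;

    /* Step 2: the writer writes 2 kB */
    memset(buf, 'x', sizeof(buf));
    if (write(pfd[1], buf, sizeof(buf)) != (ssize_t) sizeof(buf))
        goto done;

    /* Step 3: the first wait reports rfd as ready */
    if (epoll_wait(epfd, &out, 1, 0) != 1)
        goto done;

    /* Step 4: the reader consumes only 1 kB */
    if (read(pfd[0], buf, 1024) != 1024)
        goto done;

    /* Step 5: wait again; timeout 0 so that the demo never blocks */
    nfds = epoll_wait(epfd, &out, 1, 0);

done:
    close(epfd);
done_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return nfds;
}
```

Calling second_wait_result(0) is expected to report one ready descriptor, because 1 kB is still buffered, whereas second_wait_result(EPOLLET) is expected to report none, because no new change occurred on the pipe after step 3.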
127 | An application that employs the | |
fea681da | 128 | .B EPOLLET |
ff40dbb3 | 129 | flag should use nonblocking file descriptors to avoid having a blocking |
fc15f317 | 130 | read or write starve a task that is handling multiple file descriptors. |
fea681da MK |
131 | The suggested way to use |
132 | .B epoll | |
fc15f317 | 133 | as an edge-triggered |
66eca51e | 134 | .RB ( EPOLLET ) |
fc15f317 | 135 | interface is as follows: |
fea681da | 136 | .RS |
3bc917f6 | 137 | .TP 4 |
fea681da | 138 | .B i |
ff40dbb3 | 139 | with nonblocking file descriptors; and |
c13182ef | 140 | .TP |
fea681da | 141 | .B ii |
69eb01fd | 142 | by waiting for an event only after |
fea681da | 143 | .BR read (2) |
c13182ef | 144 | or |
fea681da | 145 | .BR write (2) |
097585ed MK |
146 | return |
147 | .BR EAGAIN . | |
fea681da MK |
148 | .RE |
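Points i and ii can be sketched as two small helpers. The names set_nonblocking() and drain_fd() are hypothetical and chosen for illustration only; a real program would process the data in the place marked by the comment rather than discard it.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Point i: put 'fd' into nonblocking mode.
   Returns 0 on success, -1 on error. */
int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Point ii: read from a nonblocking 'fd' until read(2) fails with
   EAGAIN. Returns the number of bytes consumed (also on end-of-file),
   or -1 on any other error. */
long
drain_fd(int fd)
{
    char buf[4096];
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof(buf));

        if (n > 0) {
            total += n;     /* A real program would process buf here */
        } else if (n == 0) {
            return total;   /* End-of-file */
        } else if (errno == EINTR) {
            continue;       /* Interrupted by a signal; retry */
        } else if (errno == EAGAIN || errno == EWOULDBLOCK) {
            return total;   /* Drained; now safe to call epoll_wait(2) */
        } else {
            return -1;
        }
    }
}
```

Only after drain_fd() has returned should the caller go back to epoll_wait(2) for that file descriptor.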
149 | .PP | |
f2e101d0 MK |
150 | By contrast, when used as a level-triggered interface |
151 | (the default, when | |
152 | .B EPOLLET | |
153 | is not specified), | |
fea681da | 154 | .B epoll |
512a1783 | 155 | is simply a faster |
fea681da MK |
156 | .BR poll (2), |
157 | and can be used wherever the latter is used since it shares the | |
c13182ef | 158 | same semantics. |
fc15f317 | 159 | |
7547121f MK |
160 | Since even with edge-triggered |
161 | .BR epoll , | |
fc15f317 | 162 | multiple events can be generated upon receipt of multiple chunks of data, |
fea681da MK |
163 | the caller has the option to specify the |
164 | .B EPOLLONESHOT | |
165 | flag, to tell | |
166 | .B epoll | |
3f1c1b0a | 167 | to disable the associated file descriptor after the receipt of an event with |
fea681da MK |
168 | .BR epoll_wait (2). |
169 | When the | |
170 | .B EPOLLONESHOT | |
c13182ef | 171 | flag is specified, |
fc15f317 | 172 | it is the caller's responsibility to rearm the file descriptor using |
fea681da MK |
173 | .BR epoll_ctl (2) |
174 | with | |
175 | .BR EPOLL_CTL_MOD . | |
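The one-shot cycle described above might look as follows. This is a sketch under the assumption that a pipe stands in for the monitored file; arm_oneshot() and oneshot_demo() are hypothetical names, and a timeout of 0 keeps every epoll_wait(2) call from blocking.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Register or rearm 'fd' for a single EPOLLIN notification.
   'op' is EPOLL_CTL_ADD the first time and EPOLL_CTL_MOD to rearm. */
static int
arm_oneshot(int epfd, int op, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN | EPOLLONESHOT;
    ev.data.fd = fd;
    return epoll_ctl(epfd, op, fd, &ev);
}

/* Returns 0 if the one-shot registration behaved as described,
   -1 otherwise. */
int
oneshot_demo(void)
{
    int pfd[2], epfd, rc = -1;
    struct epoll_event out;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto done_pipe;

    if (arm_oneshot(epfd, EPOLL_CTL_ADD, pfd[0]) == -1)
        goto done;
    if (write(pfd[1], "x", 1) != 1)
        goto done;

    /* The first wait delivers the event and disables the descriptor */
    if (epoll_wait(epfd, &out, 1, 0) != 1)
        goto done;

    /* The byte is still unread, but nothing is reported while disabled */
    if (epoll_wait(epfd, &out, 1, 0) != 0)
        goto done;

    /* Rearming with EPOLL_CTL_MOD makes the pending data visible again */
    if (arm_oneshot(epfd, EPOLL_CTL_MOD, pfd[0]) == -1)
        goto done;
    if (epoll_wait(epfd, &out, 1, 0) == 1)
        rc = 0;

done:
    close(epfd);
done_pipe:
    close(pfd[0]);
    close(pfd[1]);
    return rc;
}
```

In a multithreaded server, the thread that received the event would typically finish its work on the file descriptor and only then rearm it, so that no other thread is handed the same descriptor in the meantime.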
6db5acce N |
176 | .SS Interaction with autosleep |
177 | If the system is in | |
178 | .B autosleep | |
179 | mode via | |
180 | .I /sys/power/autosleep | |
181 | and an event happens which wakes the device from sleep, the device | |
8e798cce | 182 | driver will keep the device awake only until that event is queued. |
d3695ae2 MK |
183 | To keep the device awake until the event has been processed, |
184 | it is necessary to use the | |
bf7bc8b8 | 185 | .BR epoll_ctl (2) |
6db5acce N |
186 | .B EPOLLWAKEUP |
187 | flag. | |
188 | ||
d3695ae2 MK |
189 | When the |
190 | .B EPOLLWAKEUP | |
191 | flag is set in the | |
6db5acce N |
192 | .B events |
193 | field for a | |
d3695ae2 MK |
194 | .IR "struct epoll_event" , |
195 | the system will be kept awake from the moment the event is queued, | |
6db5acce | 196 | through the |
d3695ae2 | 197 | .BR epoll_wait (2) |
6db5acce | 198 | call which returns the event until the subsequent |
d3695ae2 MK |
199 | .BR epoll_wait (2) |
200 | call. | |
201 | If the event should keep the system awake beyond that time, | |
202 | then a separate | |
6db5acce N |
203 | .I wake_lock |
204 | should be taken before the second | |
d3695ae2 | 205 | .BR epoll_wait (2) |
6db5acce | 206 | call. |
5ee0575d MK |
207 | .SS /proc interfaces |
208 | The following interfaces can be used to limit the amount of | |
209 | kernel memory consumed by epoll: | |
597fa43c | 210 | .\" Following was added in 2.6.28, but then removed in 2.6.29 |
f09cbcf3 | 211 | .\" .TP |
597fa43c MK |
212 | .\" .IR /proc/sys/fs/epoll/max_user_instances " (since Linux 2.6.28)" |
213 | .\" This specifies an upper limit on the number of epoll instances | |
214 | .\" that can be created per real user ID. | |
5ee0575d MK |
215 | .TP |
216 | .IR /proc/sys/fs/epoll/max_user_watches " (since Linux 2.6.28)" | |
217 | This specifies a limit on the total number of | |
218 | file descriptors that a user can register across | |
219 | all epoll instances on the system. | |
220 | The limit is per real user ID. | |
221 | Each registered file descriptor costs roughly 90 bytes on a 32-bit kernel, | |
222 | and roughly 160 bytes on a 64-bit kernel. | |
223 | Currently, | |
597fa43c | 224 | .\" 2.6.29 (in 2.6.28, the default was 1/32 of lowmem) |
5ee0575d MK |
225 | the default value for |
226 | .I max_user_watches | |
597fa43c | 227 | is 1/25 (4%) of the available low memory, |
5ee0575d | 228 | divided by the registration cost in bytes. |
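A program can inspect the current limit by reading the file directly. The helper below is an illustrative sketch; the name read_max_user_watches() is hypothetical, and the function returns \-1 when the file is absent (for example, on kernels older than 2.6.28) or unreadable.

```c
#include <stdio.h>

/* Returns the per-user watch limit, or -1 if it cannot be read. */
long
read_max_user_watches(void)
{
    long value = -1;
    FILE *fp = fopen("/proc/sys/fs/epoll/max_user_watches", "r");

    if (fp == NULL)
        return -1;
    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;
    fclose(fp);
    return value;
}
```

Multiplying the returned value by the per-registration cost given above (roughly 160 bytes on a 64-bit kernel) gives an upper estimate of the kernel memory one user can consume through epoll registrations.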
c634028a | 229 | .SS Example for suggested usage |
fea681da MK |
230 | While the usage of |
231 | .B epoll | |
fc15f317 MK |
232 | when employed as a level-triggered interface does have the same |
233 | semantics as | |
fea681da | 234 | .BR poll (2), |
fc15f317 | 235 | the edge-triggered usage requires more clarification to avoid stalls |
c13182ef MK |
236 | in the application event loop. |
237 | In this example, listener is a | |
ff40dbb3 | 238 | nonblocking socket on which |
fea681da | 239 | .BR listen (2) |
c13182ef | 240 | has been called. |
54d02f32 MK |
241 | The function |
242 | .I do_use_fd() | |
243 | uses the new ready file descriptor until | |
097585ed MK |
244 | .B EAGAIN |
245 | is returned by either | |
fea681da MK |
246 | .BR read (2) |
247 | or | |
248 | .BR write (2). | |
fc15f317 | 249 | An event-driven state machine application should, after having received |
097585ed | 250 | .BR EAGAIN , |
54d02f32 MK |
251 | record its current state so that at the next call to |
252 | .I do_use_fd() | |
fea681da MK |
253 | it will continue to |
254 | .BR read (2) | |
255 | or | |
256 | .BR write (2) | |
c13182ef | 257 | from where it stopped before. |
fea681da | 258 | |
3bc917f6 | 259 | .in +4n |
fea681da | 260 | .nf |
66132b5e MK |
261 | #define MAX_EVENTS 10 |
262 | struct epoll_event ev, events[MAX_EVENTS]; | |
263 | int listen_sock, conn_sock, nfds, epollfd, n; | |
264 | ||
7d26f7d4 MK |
265 | /* Code to set up listening socket, \(aqlisten_sock\(aq, |
266 | (socket(), bind(), listen()) omitted */ | |
66132b5e | 267 | |
a3e65c93 | 268 | epollfd = epoll_create1(0); |
66132b5e | 269 | if (epollfd == \-1) { |
a3e65c93 | 270 | perror("epoll_create1"); |
66132b5e MK |
271 | exit(EXIT_FAILURE); |
272 | } | |
273 | ||
a8d9df27 | 274 | ev.events = EPOLLIN; |
66132b5e MK |
275 | ev.data.fd = listen_sock; |
276 | if (epoll_ctl(epollfd, EPOLL_CTL_ADD, listen_sock, &ev) == \-1) { | |
277 | perror("epoll_ctl: listen_sock"); | |
278 | exit(EXIT_FAILURE); | |
279 | } | |
fea681da | 280 | |
d4949190 | 281 | for (;;) { |
66132b5e | 282 | nfds = epoll_wait(epollfd, events, MAX_EVENTS, \-1); |
40c75945 | 283 | if (nfds == \-1) { |
be6b243a | 284 | perror("epoll_wait"); |
40c75945 MK |
285 | exit(EXIT_FAILURE); |
286 | } | |
fea681da | 287 | |
cf0a9ace | 288 | for (n = 0; n < nfds; ++n) { |
66132b5e MK |
289 | if (events[n].data.fd == listen_sock) { |
290 | conn_sock = accept(listen_sock, | |
24a31d63 | 291 | (struct sockaddr *) &addr, &addrlen); |
66132b5e | 292 | if (conn_sock == \-1) { |
fea681da | 293 | perror("accept"); |
15277745 | 294 | exit(EXIT_FAILURE); |
fea681da | 295 | } |
66132b5e | 296 | setnonblocking(conn_sock); |
fea681da | 297 | ev.events = EPOLLIN | EPOLLET; |
66132b5e MK |
298 | ev.data.fd = conn_sock; |
299 | if (epoll_ctl(epollfd, EPOLL_CTL_ADD, conn_sock, | |
300 | &ev) == \-1) { | |
df5c8d49 | 301 | perror("epoll_ctl: conn_sock"); |
66132b5e | 302 | exit(EXIT_FAILURE); |
fea681da | 303 | } |
cf0a9ace | 304 | } else { |
fea681da | 305 | do_use_fd(events[n].data.fd); |
cf0a9ace | 306 | } |
fea681da MK |
307 | } |
308 | } | |
309 | .fi | |
3bc917f6 | 310 | .in |
fea681da | 311 | |
fc15f317 | 312 | When used as an edge-triggered interface, for performance reasons, it is |
3bc917f6 MK |
313 | possible to add the file descriptor inside the |
314 | .B epoll | |
315 | interface | |
fc15f317 | 316 | .RB ( EPOLL_CTL_ADD ) |
69eb01fd | 317 | once by specifying |
fc15f317 | 318 | .RB ( EPOLLIN | EPOLLOUT ). |
c13182ef | 319 | This allows you to avoid |
fea681da MK |
320 | continuously switching between |
321 | .B EPOLLIN | |
322 | and | |
323 | .B EPOLLOUT | |
324 | calling | |
325 | .BR epoll_ctl (2) | |
326 | with | |
327 | .BR EPOLL_CTL_MOD . | |
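A single registration for both directions can be sketched as follows. The names add_rw_edge() and rw_edge_demo() are hypothetical, and a UNIX domain socket pair is assumed as the monitored file purely for demonstration.

```c
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register 'fd' once for input and output, edge-triggered. */
int
add_rw_edge(int epfd, int fd)
{
    struct epoll_event ev;

    ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Returns the event mask reported for one end of a socket pair after
   the peer has written one byte to it, or 0 on error. */
unsigned int
rw_edge_demo(void)
{
    int sv[2], epfd;
    struct epoll_event out;
    unsigned int mask = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;
    epfd = epoll_create1(0);
    if (epfd != -1 &&
            add_rw_edge(epfd, sv[0]) == 0 &&
            write(sv[1], "x", 1) == 1 &&
            epoll_wait(epfd, &out, 1, 0) == 1)
        mask = out.events;

    if (epfd != -1)
        close(epfd);
    close(sv[0]);
    close(sv[1]);
    return mask;
}
```

The returned mask is expected to contain both EPOLLIN and EPOLLOUT, so the application learns about readability and writability from one registration and never needs EPOLL_CTL_MOD to switch direction.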
c634028a | 328 | .SS Questions and answers |
28afd4f4 | 329 | .TP 4 |
7fb5cf0f | 330 | .B Q0 |
7547121f | 331 | What is the key used to distinguish the file descriptors registered in an |
3bc917f6 MK |
332 | .B epoll |
333 | set? | |
7fb5cf0f MK |
334 | .TP |
335 | .B A0 | |
336 | The key is the combination of the file descriptor number and | |
337 | the open file description | |
d377b54d | 338 | (also known as an "open file handle", |
7fb5cf0f MK |
339 | the kernel's internal representation of an open file). |
340 | .TP | |
c13182ef | 341 | .B Q1 |
7547121f | 342 | What happens if you register the same file descriptor on an |
3bc917f6 | 343 | .B epoll |
7547121f | 344 | instance twice? |
fea681da | 345 | .TP |
c13182ef | 346 | .B A1 |
097585ed MK |
347 | You will probably get |
348 | .BR EEXIST . | |
2b229334 MK |
349 | However, it is possible to add a duplicate |
350 | .RB ( dup (2), | |
351 | .BR dup2 (2), | |
352 | .BR fcntl (2) | |
7fb5cf0f | 353 | .BR F_DUPFD ) |
d9cb0d7d | 354 | file descriptor to the same |
2b229334 | 355 | .B epoll |
7547121f | 356 | instance. |
d9cb0d7d | 357 | .\" But a file descriptor duplicated by fork(2) can't be added to the |
d377b54d MK |
358 | .\" set, because the [file *, fd] pair is already in the epoll set. |
359 | .\" That is a somewhat ugly inconsistency. On the one hand, a child process | |
7fb5cf0f | 360 | .\" cannot add the duplicate file descriptor to the epoll set. (In every |
d9cb0d7d MK |
361 | .\" other case that I can think of, file descriptors duplicated by fork have |
362 | .\" similar semantics to file descriptors duplicated by dup() and friends.) On | |
7fb5cf0f | 363 | .\" the other hand, the very fact that the child has a duplicate of the |
d9cb0d7d MK |
364 | .\" file descriptor means that even if the parent closes its file descriptor, |
365 | .\" then epoll_wait() in the parent will continue to receive notifications for | |
366 | .\" that file descriptor because of the duplicated file descriptor in the child. | |
7fb5cf0f | 367 | .\" |
d377b54d MK |
368 | .\" See http://thread.gmane.org/gmane.linux.kernel/596462/ |
369 | .\" "epoll design problems with common fork/exec patterns" | |
31981fa1 | 370 | .\" |
7fb5cf0f | 371 | .\" mtk, Feb 2008 |
2b229334 MK |
372 | This can be a useful technique for filtering events, |
373 | if the duplicate file descriptors are registered with different | |
374 | .I events | |
375 | masks. | |
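The behavior described in this answer can be checked with a short sketch. The function name dup_filter_demo() is hypothetical, and a UNIX domain socket pair is assumed as the monitored file so that separate input and output masks are meaningful.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if a second add of the same descriptor fails with EEXIST
   while a dup(2) duplicate can be added with its own mask;
   returns -1 otherwise. */
int
dup_filter_demo(void)
{
    int sv[2], epfd, dupfd = -1, rc = -1;
    struct epoll_event ev, out[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1)
        goto out_sock;

    ev.events = EPOLLIN;
    ev.data.fd = sv[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) == -1)
        goto out_ep;

    /* The same descriptor again: expected to fail with EEXIST */
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, sv[0], &ev) != -1 ||
            errno != EEXIST)
        goto out_ep;

    /* A duplicate has a different descriptor number, so it can be
       registered separately, here with an output-only mask */
    dupfd = dup(sv[0]);
    if (dupfd == -1)
        goto out_ep;
    ev.events = EPOLLOUT;
    ev.data.fd = dupfd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, dupfd, &ev) == -1)
        goto out_ep;

    /* Nothing has been sent yet, so only the EPOLLOUT entry fires */
    if (epoll_wait(epfd, out, 2, 0) == 1 &&
            out[0].data.fd == dupfd && (out[0].events & EPOLLOUT))
        rc = 0;

out_ep:
    if (dupfd != -1)
        close(dupfd);
    close(epfd);
out_sock:
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```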
fea681da | 376 | .TP |
c13182ef | 377 | .B Q2 |
fea681da MK |
378 | Can two |
379 | .B epoll | |
7547121f | 380 | instances wait for the same file descriptor? |
1c44bd5b | 381 | If so, are events reported to both |
fea681da | 382 | .B epoll |
fc15f317 | 383 | file descriptors? |
fea681da MK |
384 | .TP |
385 | .B A2 | |
fc15f317 | 386 | Yes, and events would be reported to both. |
882bbb69 | 387 | However, careful programming may be needed to do this correctly. |
fea681da MK |
388 | .TP |
389 | .B Q3 | |
390 | Is the | |
391 | .B epoll | |
fc15f317 | 392 | file descriptor itself poll/epoll/selectable? |
fea681da MK |
393 | .TP |
394 | .B A3 | |
395 | Yes. | |
cc65f7d8 MK |
396 | If an |
397 | .B epoll | |
1c4070c7 | 398 | file descriptor has events waiting, then it will |
cc65f7d8 | 399 | be reported as readable. |
fea681da | 400 | .TP |
c13182ef | 401 | .B Q4 |
7547121f | 402 | What happens if one attempts to put an |
fea681da | 403 | .B epoll |
7547121f | 404 | file descriptor into its own file descriptor set? |
fea681da MK |
405 | .TP |
406 | .B A4 | |
4fecd703 MK |
407 | The |
408 | .BR epoll_ctl (2) | |
409 | call will fail | |
410 | .RB ( EINVAL ). | |
c13182ef | 411 | However, you can add an |
fea681da | 412 | .B epoll |
3bc917f6 MK |
413 | file descriptor inside another |
414 | .B epoll | |
415 | file descriptor set. | |
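Both halves of this answer can be demonstrated in a few lines. The following sketch uses the hypothetical name nesting_demo(); it creates two otherwise empty epoll instances for the purpose.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if adding an epoll descriptor to itself fails with EINVAL
   and adding it to a different epoll instance succeeds; -1 otherwise. */
int
nesting_demo(void)
{
    struct epoll_event ev;
    int inner = epoll_create1(0);
    int outer = epoll_create1(0);
    int rc = -1;

    if (inner == -1 || outer == -1)
        goto done;

    ev.events = EPOLLIN;
    ev.data.fd = inner;

    /* An epoll instance cannot monitor itself */
    if (epoll_ctl(inner, EPOLL_CTL_ADD, inner, &ev) != -1 ||
            errno != EINVAL)
        goto done;

    /* It can, however, be monitored by another instance */
    if (epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev) == -1)
        goto done;

    rc = 0;

done:
    if (inner != -1)
        close(inner);
    if (outer != -1)
        close(outer);
    return rc;
}
```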
fea681da MK |
416 | .TP |
417 | .B Q5 | |
54d02f32 | 418 | Can I send an |
fea681da | 419 | .B epoll |
008f1ecc | 420 | file descriptor over a UNIX domain socket to another process? |
fea681da MK |
421 | .TP |
422 | .B A5 | |
54d02f32 MK |
423 | Yes, but it does not make sense to do this, since the receiving process |
424 | would not have copies of the file descriptors in the | |
425 | .B epoll | |
426 | set. | |
fea681da MK |
427 | .TP |
428 | .B Q6 | |
fc15f317 | 429 | Will closing a file descriptor cause it to be removed from all |
fea681da MK |
430 | .B epoll |
431 | sets automatically? | |
432 | .TP | |
433 | .B A6 | |
a4a120c7 MK |
434 | Yes, but be aware of the following point. |
435 | A file descriptor is a reference to an open file description (see | |
436 | .BR open (2)). | |
d9cb0d7d | 437 | Whenever a file descriptor is duplicated via |
a4a120c7 MK |
438 | .BR dup (2), |
439 | .BR dup2 (2), | |
440 | .BR fcntl (2) | |
441 | .BR F_DUPFD , | |
442 | or | |
443 | .BR fork (2), | |
444 | a new file descriptor referring to the same open file description is | |
445 | created. | |
446 | An open file description continues to exist until all | |
447 | file descriptors referring to it have been closed. | |
d377b54d | 448 | A file descriptor is removed from an |
a4a120c7 MK |
449 | .B epoll |
450 | set only after all the file descriptors referring to the underlying | |
31981fa1 | 451 | open file description have been closed |
d9cb0d7d | 452 | (or before if the file descriptor is explicitly removed using |
0b80cf56 | 453 | .BR epoll_ctl (2) |
d377b54d | 454 | .BR EPOLL_CTL_DEL ). |
a4a120c7 MK |
455 | This means that even after a file descriptor that is part of an |
456 | .B epoll | |
457 | set has been closed, | |
458 | events may be reported for that file descriptor if other file | |
459 | descriptors referring to the same underlying file description remain open. | |
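The consequence described in this answer can be observed with a pipe and one duplicate. This is a sketch only; close_with_dup_demo() is a hypothetical name, and no other descriptors are opened between the close(2) and the checks, so the closed number is not reused.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Returns 0 if events are still reported after the registered
   descriptor is closed while a duplicate remains open, and stop once
   the duplicate is closed as well; returns -1 otherwise. */
int
close_with_dup_demo(void)
{
    int pfd[2], epfd, dupfd = -1, rc = -1;
    struct epoll_event ev, out;

    if (pipe(pfd) == -1)
        return -1;
    epfd = epoll_create1(0);
    if (epfd == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }

    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pfd[0], &ev) == -1) {
        close(pfd[0]);
        goto done;
    }

    /* Close the registered descriptor, but keep a duplicate so that
       the open file description stays alive */
    dupfd = dup(pfd[0]);
    close(pfd[0]);
    if (dupfd == -1)
        goto done;

    if (write(pfd[1], "x", 1) != 1)
        goto done;

    /* An event is still reported, tagged with the closed number */
    if (epoll_wait(epfd, &out, 1, 0) != 1 || out.data.fd != pfd[0])
        goto done;

    /* Closing the last descriptor removes the registration */
    close(dupfd);
    dupfd = -1;
    if (epoll_wait(epfd, &out, 1, 0) == 0)
        rc = 0;

done:
    if (dupfd != -1)
        close(dupfd);
    close(pfd[1]);
    close(epfd);
    return rc;
}
```

To avoid such stale notifications, a program can call epoll_ctl(2) with EPOLL_CTL_DEL before closing a descriptor that may have duplicates.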
fea681da | 460 | .TP |
c13182ef | 461 | .B Q7 |
fc15f317 | 462 | If more than one event occurs between |
fea681da MK |
463 | .BR epoll_wait (2) |
464 | calls, are they combined or reported separately? | |
465 | .TP | |
466 | .B A7 | |
467 | They will be combined. | |
468 | .TP | |
469 | .B Q8 | |
988db661 | 470 | Does an operation on a file descriptor affect the |
fc15f317 | 471 | already collected but not yet reported events? |
fea681da MK |
472 | .TP |
473 | .B A8 | |
fc15f317 | 474 | Two operations are possible on an already registered file descriptor: |
c13182ef MK |
475 | removal and modification. Removal is meaningless for |
476 | this case. | |
3b777aff | 477 | Modification causes the available I/O to be reread. |
fea681da MK |
478 | .TP |
479 | .B Q9 | |
fc15f317 | 480 | Do I need to continuously read/write a file descriptor |
097585ed MK |
481 | until |
482 | .B EAGAIN | |
483 | when using the | |
fea681da | 484 | .B EPOLLET |
fc15f317 | 485 | flag (edge-triggered behavior)? |
fea681da MK |
486 | .TP |
487 | .B A9 | |
c13182ef | 488 | Receiving an event from |
fea681da | 489 | .BR epoll_wait (2) |
f11af7da | 490 | should suggest to you that the |
160c5be1 | 491 | file descriptor is ready for the requested I/O operation. |
ff40dbb3 | 492 | You must consider it ready until the next (nonblocking) |
cb1de8d7 | 493 | read/write yields |
097585ed | 494 | .BR EAGAIN . |
f11af7da MK |
495 | When and how you will use the file descriptor is entirely up to you. |
496 | .sp | |
cb1de8d7 MK |
497 | For packet/token-oriented files (e.g., datagram socket, |
498 | terminal in canonical mode), | |
146c1764 | 499 | the only way to detect the end of the read/write I/O space |
cb1de8d7 MK |
500 | is to continue to read/write until |
501 | .BR EAGAIN . | |
502 | .sp | |
503 | For stream-oriented files (e.g., pipe, FIFO, stream socket), the | |
f11af7da MK |
504 | condition that the read/write I/O space is exhausted can also be detected by |
505 | checking the amount of data read from / written to the target file | |
506 | descriptor. | |
c13182ef | 507 | For example, if you call |
fea681da | 508 | .BR read (2) |
160c5be1 | 509 | by asking to read a certain amount of data and |
fea681da | 510 | .BR read (2) |
f11af7da MK |
511 | returns a lower number of bytes, you |
512 | can be sure of having exhausted the read I/O space for the file | |
513 | descriptor. | |
160c5be1 | 514 | The same is true when writing using |
fc15f317 | 515 | .BR write (2). |
cb1de8d7 MK |
516 | (Avoid this latter technique if you cannot guarantee that |
517 | the monitored file descriptor always refers to a stream-oriented file.) | |
c634028a | 518 | .SS Possible pitfalls and ways to avoid them |
fea681da | 519 | .TP |
fc15f317 | 520 | .B o Starvation (edge-triggered) |
fea681da | 521 | .PP |
c13182ef MK |
522 | If there is a large amount of I/O space, |
523 | it is possible that by trying to drain | |
524 | it the other files will not get processed causing starvation. | |
fc15f317 MK |
525 | (This problem is not specific to |
526 | .BR epoll .) | |
fea681da | 527 | .PP |
c13182ef MK |
528 | The solution is to maintain a ready list |
529 | and mark the file descriptor as ready | |
fea681da MK |
530 | in its associated data structure, thereby allowing the application to |
531 | remember which files need to be processed but still round robin amongst | |
c13182ef MK |
532 | all the ready files. |
533 | This also supports ignoring subsequent events you | |
fc15f317 | 534 | receive for file descriptors that are already ready. |
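One possible shape for such a ready list is sketched below. Everything here is an assumption made for illustration: the names mark_ready() and service_ready_once(), the fixed table indexed by descriptor number, and the per-turn budget CHUNK. The descriptors are assumed to be nonblocking, as recommended for edge-triggered use.

```c
#include <errno.h>
#include <unistd.h>

#define MAX_FDS 64      /* Hypothetical bound on descriptor numbers */
#define CHUNK   512     /* Bytes consumed per descriptor per turn */

/* ready[fd] is nonzero while 'fd' may still have unread input */
static unsigned char ready[MAX_FDS];

/* Call for each descriptor returned by epoll_wait(2). Marking a
   descriptor that is already marked is harmless. */
void
mark_ready(int fd)
{
    if (fd >= 0 && fd < MAX_FDS)
        ready[fd] = 1;
}

/* Give every ready descriptor one bounded read, in round-robin order.
   Returns the number of descriptors that remain on the ready list. */
int
service_ready_once(void)
{
    char buf[CHUNK];
    int fd, still_ready = 0;

    for (fd = 0; fd < MAX_FDS; fd++) {
        ssize_t n;

        if (!ready[fd])
            continue;

        n = read(fd, buf, sizeof(buf));
        if (n > 0) {
            still_ready++;      /* Possibly more data; keep it listed */
        } else if (n == -1 && errno == EINTR) {
            still_ready++;      /* Interrupted; retry on the next turn */
        } else {
            ready[fd] = 0;      /* EAGAIN, end-of-file, or error */
        }
    }
    return still_ready;
}
```

An event loop built on this would call epoll_wait(2) with a timeout of 0 while service_ready_once() reports work remaining, and block in epoll_wait(2) only when the ready list is empty.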
fea681da | 535 | .TP |
c13182ef | 536 | .B o If using an event cache... |
fea681da | 537 | .PP |
fc15f317 | 538 | If you use an event cache or store all the file descriptors returned from |
fea681da | 539 | .BR epoll_wait (2), |
c13182ef | 540 | then make sure to provide a way to mark |
fc15f317 | 541 | its closure dynamically (i.e., caused by |
c13182ef MK |
542 | a previous event's processing). |
543 | Suppose you receive 100 events from | |
fea681da | 544 | .BR epoll_wait (2), |
c13182ef MK |
545 | and in event #47 a condition causes event #13 to be closed. |
546 | If you remove the structure and | |
63f6a20a | 547 | .BR close (2) |
fc15f317 MK |
548 | the file descriptor for event #13, then your |
549 | event cache might still say there are events waiting for that | |
550 | file descriptor causing confusion. | |
c13182ef | 551 | .PP |
fea681da MK |
552 | One solution for this is to call, during the processing of event 47, |
553 | .BR epoll_ctl ( EPOLL_CTL_DEL ) | |
fc15f317 | 554 | to delete file descriptor 13 and |
63f6a20a | 555 | .BR close (2), |
f87925c6 | 556 | then mark its associated |
c13182ef MK |
557 | data structure as removed and link it to a cleanup list. |
558 | If you find another | |
fc15f317 MK |
559 | event for file descriptor 13 in your batch processing, |
560 | you will discover the file descriptor had been | |
fea681da | 561 | previously removed and there will be no confusion. |
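The cleanup-list approach can be sketched as follows. The structure struct conn and the functions retire_conn() and flush_cleanup() are hypothetical; the sketch assumes that each registration stores a pointer to its heap-allocated record in the data.ptr field of struct epoll_event.

```c
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical per-descriptor record, stored in epoll_event.data.ptr */
struct conn {
    int fd;
    int removed;                /* Nonzero once the descriptor is closed */
    struct conn *next_cleanup;  /* Link in the cleanup list */
};

static struct conn *cleanup_list;

/* Deregister and close the descriptor, but keep the record alive so
   that later events in the same batch can still inspect it. */
void
retire_conn(int epfd, struct conn *c)
{
    if (c->removed)
        return;                 /* Already retired in this batch */

    epoll_ctl(epfd, EPOLL_CTL_DEL, c->fd, NULL);
    close(c->fd);
    c->removed = 1;
    c->next_cleanup = cleanup_list;
    cleanup_list = c;
}

/* Call after the whole batch has been processed.
   Returns the number of records freed. */
int
flush_cleanup(void)
{
    int freed = 0;

    while (cleanup_list != NULL) {
        struct conn *c = cleanup_list;

        cleanup_list = c->next_cleanup;
        free(c);
        freed++;
    }
    return freed;
}
```

While walking the batch, the loop would fetch the record from events[n].data.ptr and skip it if its removed field is set; flush_cleanup() is called only after the last event of the batch has been handled.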
2b2581ee | 562 | .SH VERSIONS |
04be4241 MK |
563 | The |
564 | .B epoll | |
565 | API was introduced in Linux kernel 2.5.44. | |
7547121f | 566 | .\" Its interface should be finalized in Linux kernel 2.5.66. |
d1d87801 | 567 | Support was added to glibc in version 2.3.2. |
fea681da | 568 | .SH CONFORMING TO |
3bc917f6 MK |
569 | The |
570 | .B epoll | |
571 | API is Linux-specific. | |
c803c3e3 | 572 | Some other systems provide similar |
75b94dc3 | 573 | mechanisms; for example, FreeBSD has |
c13182ef MK |
574 | .IR kqueue , |
575 | and Solaris has | |
c803c3e3 | 576 | .IR /dev/poll . |
58a80cd4 MK |
577 | .SH NOTES |
578 | The set of file descriptors that is being monitored via | |
579 | an epoll file descriptor can be viewed via the entry for | |
580 | the epoll file descriptor in the process's | |
581 | .IR /proc/[pid]/fdinfo | |
582 | directory. | |
583 | See | |
584 | .BR proc (5) | |
585 | for further details. | |
47297adb | 586 | .SH SEE ALSO |
fea681da | 587 | .BR epoll_create (2), |
9d0f3fcb | 588 | .BR epoll_create1 (2), |
fea681da | 589 | .BR epoll_ctl (2), |
634c92fb MK |
590 | .BR epoll_wait (2), |
591 | .BR poll (2), | |
592 | .BR select (2) |