.\" This manpage is copyright (C) 2001 Paul Sheer.
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" very minor changes, aeb
.\"
.\" Modified 5 June 2002, Michael Kerrisk <mtk.manpages@gmail.com>
.\" 2006-05-13, mtk, removed much material that is redundant with select.2
.\" various other changes
.\" 2008-01-26, mtk, substantial changes and rewrites
.\"
.TH SELECT_TUT 2 2020-04-11 "Linux" "Linux Programmer's Manual"
.SH NAME
select, pselect \- synchronous I/O multiplexing
.SH SYNOPSIS
.PP
See
.BR select (2)
.SH DESCRIPTION
The
.BR select ()
and
.BR pselect ()
system calls are used to efficiently monitor multiple file descriptors,
to see if any of them is, or becomes, "ready";
that is, to see whether I/O becomes possible,
or an "exceptional condition" has occurred on any of the file descriptors.
.PP
This page provides background and tutorial information
on the use of these system calls.
For details of the arguments and semantics of
.BR select ()
and
.BR pselect (),
see
.BR select (2).
.PP
.\"
.SS Combining signal and data events
.BR pselect ()
is useful if you are waiting for a signal as well as
for file descriptor(s) to become ready for I/O.
Programs that receive signals
normally use the signal handler only to raise a global flag.
The global flag will indicate that the event must be processed
in the main loop of the program.
A signal will cause the
.BR select ()
(or
.BR pselect ())
call to return with \fIerrno\fP set to \fBEINTR\fP.
This behavior is essential so that signals can be processed
in the main loop of the program; otherwise,
.BR select ()
would block indefinitely.
.PP
Now, somewhere
in the main loop will be a conditional to check the global flag.
So we must ask:
what if a signal arrives after the conditional, but before the
.BR select ()
call?
The answer is that
.BR select ()
would block indefinitely, even though an event is actually pending.
This race condition is solved by the
.BR pselect ()
call.
This call atomically replaces the signal mask for the duration of the call,
so that signals blocked elsewhere in the program
can be delivered only within the
.BR pselect ()
call.
For instance, let us say that the event in question
was the exit of a child process.
Before the start of the main loop, we
would block \fBSIGCHLD\fP using
.BR sigprocmask (2).
Our
.BR pselect ()
call would enable
.B SIGCHLD
by using an empty signal mask.
Our program would look like:
.PP
.EX
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>

static volatile sig_atomic_t got_SIGCHLD = 0;

/* The handler only raises a flag; the main loop does the real work */
static void
child_sig_handler(int sig)
{
    got_SIGCHLD = 1;
}

int
main(int argc, char *argv[])
{
    sigset_t sigmask, empty_mask;
    struct sigaction sa;
    fd_set readfds, writefds, exceptfds;
    int nfds = 0;       /* Highest descriptor in any set, plus 1 */
    int r;

    /* Block SIGCHLD everywhere except inside pselect() */
    sigemptyset(&sigmask);
    sigaddset(&sigmask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &sigmask, NULL) == \-1) {
        perror("sigprocmask");
        exit(EXIT_FAILURE);
    }

    sa.sa_flags = 0;
    sa.sa_handler = child_sig_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == \-1) {
        perror("sigaction");
        exit(EXIT_FAILURE);
    }

    sigemptyset(&empty_mask);

    for (;;) {          /* main loop */
        /* Initialize readfds, writefds, exceptfds, and nfds
           before the pselect() call. (Code omitted.) */

        r = pselect(nfds, &readfds, &writefds, &exceptfds,
                    NULL, &empty_mask);
        if (r == \-1 && errno != EINTR) {
            /* Handle error */
        }

        if (got_SIGCHLD) {
            got_SIGCHLD = 0;

            /* Handle signalled event here; e.g., wait() for all
               terminated children. (Code omitted.) */
        }

        /* main body of program */
    }
}
.EE
.SS Practical
So what is the point of
.BR select ()?
Can't I just read and write to my file descriptors whenever I want?
The point of
.BR select ()
is that it watches
multiple descriptors at the same time and properly puts the process to
sleep if there is no activity.
UNIX programmers often find
themselves in a position where they have to handle I/O from more than one
file descriptor where the data flow may be intermittent.
If you were to merely create a sequence of
.BR read (2)
and
.BR write (2)
calls, you would
find that one of your calls may block waiting for data from/to a file
descriptor, while another file descriptor is unused though ready for I/O.
.BR select ()
efficiently copes with this situation.
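.PP
As a rough illustration of the situation just described,
the following sketch sleeps until either of two descriptors is readable
and then reads from whichever one is ready.
The helper name read_first_ready() and its interface are assumptions
made for this example; they are not part of any library.

```c
#include <errno.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper: sleep until fd_a or fd_b is readable, then
   read up to len bytes from the ready one (fd_a wins a tie).
   Stores the descriptor used in *from.  Returns the read() result,
   or -1 if select() fails.  Both descriptors must be nonnegative
   and less than FD_SETSIZE. */
ssize_t
read_first_ready(int fd_a, int fd_b, void *buf, size_t len, int *from)
{
    fd_set readfds;
    int nfds = (fd_a > fd_b ? fd_a : fd_b) + 1;
    int ready;

    do {
        /* select() overwrites the set, so rebuild it on every try */
        FD_ZERO(&readfds);
        FD_SET(fd_a, &readfds);
        FD_SET(fd_b, &readfds);
        ready = select(nfds, &readfds, NULL, NULL, NULL);
    } while (ready == -1 && errno == EINTR);

    if (ready == -1)
        return -1;

    *from = FD_ISSET(fd_a, &readfds) ? fd_a : fd_b;
    return read(*from, buf, len);
}
```

.PP
A plain read() on fd_a would have blocked here
for as long as only fd_b had data.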
.SS Select law
Many people who try to use
.BR select ()
come across behavior that is
difficult to understand and produces nonportable or borderline results.
For instance, the example program below is carefully written not to
block at any point, even though it does not set its file descriptors to
nonblocking mode.
It is easy to introduce
subtle errors that will remove the advantage of using
.BR select (),
so here is a list of essentials to watch for when using
.BR select ().
.TP 4
1.
You should always try to use
.BR select ()
without a timeout.
Your program
should have nothing to do if there is no data available.
Code that
depends on timeouts is not usually portable and is difficult to debug.
.TP
2.
The value \fInfds\fP must be properly calculated for efficiency:
it should be the highest-numbered file descriptor in any of the sets,
plus 1.
.TP
3.
Do not add a file descriptor to any set unless you intend
to check its result after the
.BR select ()
call, and respond appropriately.
See the next rule.
.TP
4.
After
.BR select ()
returns, all file descriptors in all sets
should be checked to see if they are ready.
.TP
5.
The functions
.BR read (2),
.BR recv (2),
.BR write (2),
and
.BR send (2)
do \fInot\fP necessarily read/write the full amount of data
that you have requested.
If they do read/write the full amount, it's
because you have a low traffic load and a fast stream.
This is not always going to be the case.
You should cope with the case of your
functions managing to send or receive only a single byte.
.TP
6.
Never read/write only in single bytes at a time unless you are really
sure that you have a small amount of data to process.
It is extremely
inefficient not to read/write as much data as you can buffer each time.
The buffers in the example below are 1024 bytes although they could
easily be made larger.
.TP
7.
Calls to
.BR read (2),
.BR recv (2),
.BR write (2),
.BR send (2),
and
.BR select ()
can fail with the error
\fBEINTR\fP,
and calls to
.BR read (2),
.BR recv (2),
.BR write (2),
and
.BR send (2)
can fail with
.I errno
set to \fBEAGAIN\fP (\fBEWOULDBLOCK\fP).
These results must be properly managed
(this is not done fully in the example below).
If your program is not going to receive any signals, then
it is unlikely you will get \fBEINTR\fP.
If your program does not set nonblocking I/O,
you will not get \fBEAGAIN\fP.
.\" Nonetheless, you should still cope with these errors for completeness.
.TP
8.
Never call
.BR read (2),
.BR recv (2),
.BR write (2),
or
.BR send (2)
with a buffer length of zero.
.TP
9.
If the functions
.BR read (2),
.BR recv (2),
.BR write (2),
and
.BR send (2)
fail with errors other than those listed in \fB7.\fP,
or one of the input functions returns 0, indicating end of file,
then you should \fInot\fP pass that file descriptor to
.BR select ()
again.
In the example below,
I close the file descriptor immediately, and then set it to \-1
to prevent it being included in a set.
.TP
10.
The timeout value must be initialized with each new call to
.BR select (),
since some operating systems modify the structure.
.BR pselect (),
however, does not modify its timeout structure.
.TP
11.
Since
.BR select ()
modifies its file descriptor sets,
if the call is being used in a loop,
then the sets must be reinitialized before each call.
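.PP
Rules 5, 7, 8, and 9 can be combined in one small write helper.
The following is only a sketch under stated assumptions:
the name write_some() and its return convention are invented for this
illustration, and the caller is assumed to track how much of the buffer
has been sent, as the example program below does with its
buf1_written and buf2_written counters.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: make one attempt to write the unsent part of
   buf (bytes *written up to avail) to fd.  Returns 0 if the caller
   should simply go back to select(), or -1 if fd should be closed
   and no longer passed to select(). */
int
write_some(int fd, const char *buf, size_t avail, size_t *written)
{
    ssize_t n;

    if (*written >= avail)
        return 0;               /* Rule 8: never write zero bytes */

    n = write(fd, buf + *written, avail - *written);
    if (n == -1) {
        /* Rule 7: interrupted or would block; retry after select() */
        if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
            return 0;
        return -1;              /* Rule 9: any other error is fatal */
    }

    *written += (size_t) n;     /* Rule 5: n may be less than requested */
    return 0;
}
```

.PP
The helper deliberately does not loop until everything is written;
any remaining bytes are sent after a later select() call reports the
descriptor writable again.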
.\" "I have heard" does not fill me with confidence, and doesn't
.\" belong in a man page, so I've commented this point out.
.\" .TP
.\" 11.
.\" I have heard that the Windows socket layer does not cope with OOB data
.\" properly.
.\" It also does not cope with
.\" .BR select ()
.\" calls when no file descriptors are set at all.
.\" Having no file descriptors set is a useful
.\" way to sleep the process with subsecond precision by using the timeout.
.\" (See further on.)
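.PP
Rules 10 and 11 are easy to forget when a timeout is unavoidable.
The sketch below, whose function name timed_read() and
"rounds" parameter are assumptions made purely for illustration,
rebuilds both the descriptor set and the timeout
at the top of every loop iteration.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: wait for fd to become readable, giving up
   after "rounds" waits of usec microseconds each.  Returns the
   read() result once fd is readable; returns -1 with errno set to
   ETIMEDOUT if every round expires, or -1 if select() fails. */
ssize_t
timed_read(int fd, void *buf, size_t len, int rounds, long usec)
{
    for (int i = 0; i < rounds; i++) {
        fd_set readfds;
        struct timeval tv;
        int ready;

        /* Rule 11: select() overwrites the set, so rebuild it */
        FD_ZERO(&readfds);
        FD_SET(fd, &readfds);

        /* Rule 10: select() may overwrite the timeout, so reset it */
        tv.tv_sec = usec / 1000000;
        tv.tv_usec = usec % 1000000;

        ready = select(fd + 1, &readfds, NULL, NULL, &tv);
        if (ready == -1 && errno != EINTR)
            return -1;
        if (ready > 0 && FD_ISSET(fd, &readfds))
            return read(fd, buf, len);
        /* Timed out or interrupted: use up one round and try again */
    }

    errno = ETIMEDOUT;
    return -1;
}
```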
.SH RETURN VALUE
See
.BR select (2).
.SH NOTES
Generally speaking,
all operating systems that support sockets also support
.BR select ().
.BR select ()
can be used to solve
many problems in a portable and efficient way that naive programmers try
to solve in a more complicated manner using
threads, forking, IPCs, signals, memory sharing, and so on.
.PP
The
.BR poll (2)
system call has the same functionality as
.BR select (),
and is somewhat more efficient when monitoring sparse
file descriptor sets.
It is nowadays widely available, but historically was less portable than
.BR select ().
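.PP
For comparison, here is a minimal sketch of waiting on two descriptors
with poll(2) rather than select().
The helper name read_first_ready_poll() is an assumption
made for this illustration.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Hypothetical helper: sleep until fd_a or fd_b reports an event,
   then read up to len bytes from that descriptor (fd_a wins a tie).
   Stores the descriptor used in *from.  Returns the read() result,
   or -1 if poll() fails. */
ssize_t
read_first_ready_poll(int fd_a, int fd_b, void *buf, size_t len,
                      int *from)
{
    struct pollfd fds[2];
    int ready;

    fds[0].fd = fd_a;
    fds[0].events = POLLIN;
    fds[1].fd = fd_b;
    fds[1].events = POLLIN;

    do {
        /* poll() reports results in revents and leaves fd and
           events alone, so the array need not be rebuilt */
        ready = poll(fds, 2, -1);       /* -1 means no timeout */
    } while (ready == -1 && errno == EINTR);

    if (ready == -1)
        return -1;

    /* Any reported event (data, hang-up, error) is left to read() */
    *from = fds[0].revents != 0 ? fd_a : fd_b;
    return read(*from, buf, len);
}
```

.PP
Note that there is no nfds calculation and no FD_SETSIZE limit here:
the array holds only the descriptors of interest.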
.PP
The Linux-specific
.BR epoll (7)
API provides an interface that is more efficient than
.BR select (2)
and
.BR poll (2)
when monitoring large numbers of file descriptors.
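.PP
A minimal sketch of the epoll approach follows.
The helper names watch_readable() and next_readable() are assumptions
made for this illustration.
The point it tries to show is that descriptors are registered once,
after which each wait is a single call that does not pass the
whole set again.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: create an epoll instance and register each
   of the count descriptors for readability, once.  Returns the
   epoll descriptor (to be closed by the caller), or -1 on error. */
int
watch_readable(const int *fds, int count)
{
    int epfd = epoll_create1(0);

    if (epfd == -1)
        return -1;

    for (int i = 0; i < count; i++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fds[i] };

        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fds[i], &ev) == -1) {
            close(epfd);
            return -1;
        }
    }
    return epfd;
}

/* Hypothetical helper: return one registered descriptor that has an
   event pending, or -1 on timeout or error.  A timeout_ms of -1
   waits indefinitely. */
int
next_readable(int epfd, int timeout_ms)
{
    struct epoll_event ev;
    int n;

    do {
        n = epoll_wait(epfd, &ev, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);

    return n == 1 ? ev.data.fd : -1;
}
```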
.SH EXAMPLE
Here is an example that better demonstrates the true utility of
.BR select ().
The listing below is a TCP forwarding program that forwards
from one TCP port to another.
.PP
.EX
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>
#include <string.h>
#include <signal.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>

static int forward_port;

#undef max
#define max(x,y) ((x) > (y) ? (x) : (y))

/* Create a TCP socket listening on listen_port; return \-1 on error */
static int
listen_socket(int listen_port)
{
    struct sockaddr_in addr;
    int lfd;
    int yes;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == \-1) {
        perror("socket");
        return \-1;
    }

    yes = 1;
    if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR,
            &yes, sizeof(yes)) == \-1) {
        perror("setsockopt");
        close(lfd);
        return \-1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_port = htons(listen_port);
    addr.sin_family = AF_INET;
    if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) == \-1) {
        perror("bind");
        close(lfd);
        return \-1;
    }

    if (listen(lfd, 10) == \-1) {
        perror("listen");
        close(lfd);
        return \-1;
    }

    printf("accepting connections on port %d\en", listen_port);
    return lfd;
}

/* Connect to address:connect_port; return \-1 on error */
static int
connect_socket(int connect_port, char *address)
{
    struct sockaddr_in addr;
    int cfd;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd == \-1) {
        perror("socket");
        return \-1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_port = htons(connect_port);
    addr.sin_family = AF_INET;

    if (!inet_aton(address, &addr.sin_addr)) {
        fprintf(stderr, "inet_aton(): bad IP address format\en");
        close(cfd);
        return \-1;
    }

    if (connect(cfd, (struct sockaddr *) &addr, sizeof(addr)) == \-1) {
        perror("connect()");
        shutdown(cfd, SHUT_RDWR);
        close(cfd);
        return \-1;
    }
    return cfd;
}

#define SHUT_FD1 do {                                \e
                     if (fd1 >= 0) {                 \e
                         shutdown(fd1, SHUT_RDWR);   \e
                         close(fd1);                 \e
                         fd1 = \-1;                   \e
                     }                               \e
                 } while (0)

#define SHUT_FD2 do {                                \e
                     if (fd2 >= 0) {                 \e
                         shutdown(fd2, SHUT_RDWR);   \e
                         close(fd2);                 \e
                         fd2 = \-1;                   \e
                     }                               \e
                 } while (0)

#define BUF_SIZE 1024

int
main(int argc, char *argv[])
{
    int h;
    int fd1 = \-1, fd2 = \-1;
    char buf1[BUF_SIZE], buf2[BUF_SIZE];
    int buf1_avail = 0, buf1_written = 0;
    int buf2_avail = 0, buf2_written = 0;

    if (argc != 4) {
        fprintf(stderr, "Usage\en\etfwd <listen\-port> "
                 "<forward\-to\-port> <forward\-to\-ip\-address>\en");
        exit(EXIT_FAILURE);
    }

    signal(SIGPIPE, SIG_IGN);

    forward_port = atoi(argv[2]);

    h = listen_socket(atoi(argv[1]));
    if (h == \-1)
        exit(EXIT_FAILURE);

    for (;;) {
        int ready, nfds = 0;
        ssize_t nbytes;
        fd_set readfds, writefds, exceptfds;

        /* Rebuild all three sets on every iteration */

        FD_ZERO(&readfds);
        FD_ZERO(&writefds);
        FD_ZERO(&exceptfds);
        FD_SET(h, &readfds);
        nfds = max(nfds, h);

        /* Read only while there is room left in the buffer */

        if (fd1 > 0 && buf1_avail < BUF_SIZE)
            FD_SET(fd1, &readfds);
            /* Note: nfds is updated below, when fd1 is added to
               exceptfds. */
        if (fd2 > 0 && buf2_avail < BUF_SIZE)
            FD_SET(fd2, &readfds);

        /* Write only while there is unsent data for that side */

        if (fd1 > 0 && buf2_avail \- buf2_written > 0)
            FD_SET(fd1, &writefds);
        if (fd2 > 0 && buf1_avail \- buf1_written > 0)
            FD_SET(fd2, &writefds);

        if (fd1 > 0) {
            FD_SET(fd1, &exceptfds);
            nfds = max(nfds, fd1);
        }
        if (fd2 > 0) {
            FD_SET(fd2, &exceptfds);
            nfds = max(nfds, fd2);
        }

        ready = select(nfds + 1, &readfds, &writefds, &exceptfds, NULL);

        if (ready == \-1 && errno == EINTR)
            continue;

        if (ready == \-1) {
            perror("select()");
            exit(EXIT_FAILURE);
        }

        if (FD_ISSET(h, &readfds)) {
            socklen_t addrlen;
            struct sockaddr_in client_addr;
            int fd;

            addrlen = sizeof(client_addr);
            memset(&client_addr, 0, addrlen);
            fd = accept(h, (struct sockaddr *) &client_addr, &addrlen);
            if (fd == \-1) {
                perror("accept()");
            } else {
                SHUT_FD1;
                SHUT_FD2;
                buf1_avail = buf1_written = 0;
                buf2_avail = buf2_written = 0;
                fd1 = fd;
                fd2 = connect_socket(forward_port, argv[3]);
                if (fd2 == \-1)
                    SHUT_FD1;
                else
                    printf("connect from %s\en",
                            inet_ntoa(client_addr.sin_addr));

                /* Skip any events on the old, closed file
                   descriptors. */

                continue;
            }
        }

        /* NB: read OOB data before normal reads */

        if (fd1 > 0 && FD_ISSET(fd1, &exceptfds)) {
            char c;

            nbytes = recv(fd1, &c, 1, MSG_OOB);
            if (nbytes < 1)
                SHUT_FD1;
            else
                send(fd2, &c, 1, MSG_OOB);
        }
        if (fd2 > 0 && FD_ISSET(fd2, &exceptfds)) {
            char c;

            nbytes = recv(fd2, &c, 1, MSG_OOB);
            if (nbytes < 1)
                SHUT_FD2;
            else
                send(fd1, &c, 1, MSG_OOB);
        }
        if (fd1 > 0 && FD_ISSET(fd1, &readfds)) {
            nbytes = read(fd1, buf1 + buf1_avail,
                      BUF_SIZE \- buf1_avail);
            if (nbytes < 1)
                SHUT_FD1;
            else
                buf1_avail += nbytes;
        }
        if (fd2 > 0 && FD_ISSET(fd2, &readfds)) {
            nbytes = read(fd2, buf2 + buf2_avail,
                      BUF_SIZE \- buf2_avail);
            if (nbytes < 1)
                SHUT_FD2;
            else
                buf2_avail += nbytes;
        }
        if (fd1 > 0 && FD_ISSET(fd1, &writefds) && buf2_avail > 0) {
            nbytes = write(fd1, buf2 + buf2_written,
                       buf2_avail \- buf2_written);
            if (nbytes < 1)
                SHUT_FD1;
            else
                buf2_written += nbytes;
        }
        if (fd2 > 0 && FD_ISSET(fd2, &writefds) && buf1_avail > 0) {
            nbytes = write(fd2, buf1 + buf1_written,
                       buf1_avail \- buf1_written);
            if (nbytes < 1)
                SHUT_FD2;
            else
                buf1_written += nbytes;
        }

        /* Check if write data has caught read data */

        if (buf1_written == buf1_avail)
            buf1_written = buf1_avail = 0;
        if (buf2_written == buf2_avail)
            buf2_written = buf2_avail = 0;

        /* One side has closed the connection, keep
           writing to the other side until empty */

        if (fd1 < 0 && buf1_avail \- buf1_written == 0)
            SHUT_FD2;
        if (fd2 < 0 && buf2_avail \- buf2_written == 0)
            SHUT_FD1;
    }
    exit(EXIT_SUCCESS);
}
.EE
.PP
The above program properly forwards most kinds of TCP connections
including OOB signal data transmitted by \fBtelnet\fP servers.
It handles the tricky problem of having data flow in both directions
simultaneously.
You might think it more efficient to use a
.BR fork (2)
call and devote a process to each stream.
This becomes more tricky than you might suspect.
Another idea is to set nonblocking I/O using
.BR fcntl (2).
This also has its problems because you end up using
inefficient timeouts.
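.PP
For reference, switching a descriptor to nonblocking mode with fcntl(2)
usually looks like the following sketch.
The helper name set_nonblocking() is an assumption made for this
illustration.

```c
#include <fcntl.h>

/* Hypothetical helper: add O_NONBLOCK to the file status flags of
   fd, keeping the other flags unchanged.  Returns 0 on success or
   -1 on error, with errno set by fcntl(). */
int
set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}
```

.PP
After this call, read(2) and write(2) on the descriptor fail with
EAGAIN instead of blocking, so the program still needs some way to
wait until the descriptor is ready.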
.PP
The program does not handle more than one connection at a
time, although it could easily be extended to do this with a linked list
of buffers, one for each connection.
At the moment, new
connections cause the current connection to be dropped.
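.PP
One possible shape for such a linked list is sketched below.
The structure and function names (struct conn, conn_add(),
conn_remove(), conn_count()) are assumptions made for this illustration;
the fields simply mirror the per-connection variables
of the program above.

```c
#include <stdlib.h>

#define BUF_SIZE 1024

/* Hypothetical per-connection state, one node per forwarded stream */
struct conn {
    int fd1, fd2;                       /* client side, server side */
    char buf1[BUF_SIZE], buf2[BUF_SIZE];
    int buf1_avail, buf1_written;
    int buf2_avail, buf2_written;
    struct conn *next;
};

/* Allocate a zeroed node for the pair (fd1, fd2) and push it on the
   front of the list.  Returns the new node, or NULL on failure. */
struct conn *
conn_add(struct conn **head, int fd1, int fd2)
{
    struct conn *c = calloc(1, sizeof(*c));

    if (c == NULL)
        return NULL;
    c->fd1 = fd1;
    c->fd2 = fd2;
    c->next = *head;
    *head = c;
    return c;
}

/* Unlink and free one node.  The caller is assumed to have closed
   the node's descriptors already. */
void
conn_remove(struct conn **head, struct conn *victim)
{
    for (struct conn **pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == victim) {
            *pp = victim->next;
            free(victim);
            return;
        }
    }
}

/* Count the nodes in the list */
int
conn_count(const struct conn *head)
{
    int n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

.PP
The main loop would then walk the list twice per iteration:
once to add each node's descriptors to the sets before select(),
and once to service the ready descriptors afterward.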
.SH SEE ALSO
.BR accept (2),
.BR connect (2),
.BR poll (2),
.BR read (2),
.BR recv (2),
.BR select (2),
.BR send (2),
.BR sigprocmask (2),
.BR write (2),
.BR epoll (7)
.\" .SH AUTHORS
.\" This man page was written by Paul Sheer.