.\" This manpage is copyright (C) 2001 Paul Sheer.
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date.  The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein.  The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" very minor changes, aeb
.\"
.\" Modified 5 June 2002, Michael Kerrisk <mtk.manpages@gmail.com>
.\" 2006-05-13, mtk, removed much material that is redundant with select.2
.\"             various other changes
.\" 2008-01-26, mtk, substantial changes and rewrites
.\"
.TH SELECT_TUT 2 2020-04-11 "Linux" "Linux Programmer's Manual"
.SH NAME
select, pselect \- synchronous I/O multiplexing
.SH SYNOPSIS
.PP
See
.BR select (2)
.SH DESCRIPTION
The
.BR select ()
and
.BR pselect ()
system calls are used to efficiently monitor multiple file descriptors,
to see if any of them is, or becomes, "ready";
that is, to see whether I/O becomes possible,
or an "exceptional condition" has occurred on any of the file descriptors.
.PP
This page provides background and tutorial information
on the use of these system calls.
For details of the arguments and semantics of
.BR select ()
and
.BR pselect (),
see
.BR select (2).
.PP
.\"
.SS Combining signal and data events
.BR pselect ()
is useful if you are waiting for a signal as well as
for file descriptor(s) to become ready for I/O.
Programs that receive signals
normally use the signal handler only to raise a global flag.
The global flag will indicate that the event must be processed
in the main loop of the program.
A signal will cause the
.BR select ()
(or
.BR pselect ())
call to return with \fIerrno\fP set to \fBEINTR\fP.
This behavior is essential so that signals can be processed
in the main loop of the program; otherwise,
.BR select ()
would block indefinitely.
.PP
Now, somewhere
in the main loop will be a conditional to check the global flag.
So we must ask:
what if a signal arrives after the conditional, but before the
.BR select ()
call?
The answer is that
.BR select ()
would block indefinitely, even though an event is actually pending.
This race condition is solved by the
.BR pselect ()
call.
This call atomically replaces the signal mask for the duration of the call,
so that chosen signals can be delivered only while
.BR pselect ()
is waiting.
For instance, let us say that the event in question
was the exit of a child process.
Before the start of the main loop, we
would block \fBSIGCHLD\fP using
.BR sigprocmask (2).
Our
.BR pselect ()
call would unblock
.B SIGCHLD
by using an empty signal mask.
Our program would look like:
.PP
.EX
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>

static volatile sig_atomic_t got_SIGCHLD = 0;

static void
child_sig_handler(int sig)
{
    got_SIGCHLD = 1;
}

int
main(int argc, char *argv[])
{
    sigset_t sigmask, empty_mask;
    struct sigaction sa;
    fd_set readfds, writefds, exceptfds;
    int r;
    int nfds = 0;       /* Set when the sets are initialized */

    sigemptyset(&sigmask);
    sigaddset(&sigmask, SIGCHLD);
    if (sigprocmask(SIG_BLOCK, &sigmask, NULL) == \-1) {
        perror("sigprocmask");
        exit(EXIT_FAILURE);
    }

    sa.sa_flags = 0;
    sa.sa_handler = child_sig_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGCHLD, &sa, NULL) == \-1) {
        perror("sigaction");
        exit(EXIT_FAILURE);
    }

    sigemptyset(&empty_mask);

    for (;;) {          /* main loop */
        /* Initialize readfds, writefds, and exceptfds
           before the pselect() call. (Code omitted.) */

        r = pselect(nfds, &readfds, &writefds, &exceptfds,
                    NULL, &empty_mask);
        if (r == \-1 && errno != EINTR) {
            /* Handle error */
        }

        if (got_SIGCHLD) {
            got_SIGCHLD = 0;

            /* Handle signalled event here; e.g., wait() for all
               terminated children. (Code omitted.) */
        }

        /* main body of program */
    }
}
.EE
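.PP
The step marked "wait() for all terminated children" might be filled in
along the following lines.
This is only a sketch, and the function name
.I reap_children
is an assumption made for illustration.
Because several child terminations can be reported by a single
.B SIGCHLD
delivery, the sketch loops with
.BR waitpid (2)
and
.B WNOHANG
until no terminated child remains.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: reap every child that has already terminated,
   without blocking. Returns the number of children reaped. */
int
reap_children(void)
{
    int count = 0;
    int status;
    pid_t pid;

    for (;;) {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            count++;            /* one terminated child collected */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;           /* interrupted: try again */
        break;                  /* 0: children still running;
                                   -1: no children left, or an error */
    }
    return count;
}
```

In the main loop above, such a helper would be called inside the
.I got_SIGCHLD
branch, after the flag has been reset.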
.SS Practical
So what is the point of
.BR select ()?
Can't I just read and write to my file descriptors whenever I want?
The point of
.BR select ()
is that it watches
multiple descriptors at the same time and properly puts the process to
sleep if there is no activity.
UNIX programmers often find
themselves in a position where they have to handle I/O from more than one
file descriptor where the data flow may be intermittent.
If you were to merely create a sequence of
.BR read (2)
and
.BR write (2)
calls, you would
find that one of your calls may block waiting for data from/to a file
descriptor, while another file descriptor is unused though ready for I/O.
.BR select ()
efficiently copes with this situation.
.SS Select law
Many people who try to use
.BR select ()
come across behavior that is
difficult to understand and produces nonportable or borderline results.
For instance, the example program below (see EXAMPLE)
is carefully written not to
block at any point, even though it does not set its file descriptors to
nonblocking mode.
It is easy to introduce
subtle errors that will remove the advantage of using
.BR select (),
so here is a list of essentials to watch for when using
.BR select ().
.TP 4
1.
You should always try to use
.BR select ()
without a timeout.
Your program
should have nothing to do if there is no data available.
Code that
depends on timeouts is not usually portable and is difficult to debug.
.TP
2.
For efficiency, the value \fInfds\fP must be properly calculated:
it should be the highest-numbered file descriptor in any of the sets,
plus 1 (see
.BR select (2)).
.TP
3.
Do not add a file descriptor to any set unless you intend
to check its result after the
.BR select ()
call, and respond appropriately.
See next rule.
.TP
4.
After
.BR select ()
returns, all file descriptors in all sets
should be checked to see if they are ready.
.TP
5.
The functions
.BR read (2),
.BR recv (2),
.BR write (2),
and
.BR send (2)
do \fInot\fP necessarily read/write the full amount of data
that you have requested.
If they do read/write the full amount, it's
because you have a low traffic load and a fast stream.
This is not always going to be the case.
You should cope with the case of your
functions managing to send or receive only a single byte.
.TP
6.
Never read or write a single byte at a time unless you are really
sure that you have a small amount of data to process.
It is extremely
inefficient not to read/write as much data as you can buffer each time.
The buffers in the example below are 1024 bytes although they could
easily be made larger.
.TP
7.
Calls to
.BR read (2),
.BR recv (2),
.BR write (2),
.BR send (2),
and
.BR select ()
can fail with the error
\fBEINTR\fP,
and calls to
.BR read (2),
.BR recv (2),
.BR write (2),
and
.BR send (2)
can fail with
.I errno
set to \fBEAGAIN\fP (\fBEWOULDBLOCK\fP).
These results must be properly managed
(this is not done properly in the example below).
If your program is not going to receive any signals, then
it is unlikely you will get \fBEINTR\fP.
If your program does not set nonblocking I/O,
you will not get \fBEAGAIN\fP.
.\" Nonetheless, you should still cope with these errors for completeness.
.TP
8.
Never call
.BR read (2),
.BR recv (2),
.BR write (2),
or
.BR send (2)
with a buffer length of zero.
.TP
9.
If the functions
.BR read (2),
.BR recv (2),
.BR write (2),
and
.BR send (2)
fail with errors other than those listed in \fB7.\fP,
or one of the input functions returns 0, indicating end of file,
then you should \fInot\fP pass that file descriptor to
.BR select ()
again.
In the example below,
I close the file descriptor immediately, and then set it to \-1
to prevent it being included in a set.
.TP
10.
The timeout value must be initialized with each new call to
.BR select (),
since some operating systems modify the structure.
.BR pselect (),
however, does not modify its timeout structure.
.TP
11.
Since
.BR select ()
modifies its file descriptor sets,
if the call is being used in a loop,
then the sets must be reinitialized before each call.
.\" "I have heard" does not fill me with confidence, and doesn't
.\" belong in a man page, so I've commented this point out.
.\" .TP
.\" 11.
.\" I have heard that the Windows socket layer does not cope with OOB data
.\" properly.
.\" It also does not cope with
.\" .BR select ()
.\" calls when no file descriptors are set at all.
.\" Having no file descriptors set is a useful
.\" way to sleep the process with subsecond precision by using the timeout.
.\" (See further on.)
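.PP
Rules 5, 7, and 8 can be combined in a small helper for a blocking
descriptor, sketched below.
The function name
.I write_all
is an assumption made for illustration, and a nonblocking descriptor
would additionally need to handle
.B EAGAIN
by returning to
.BR select ().

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all len bytes of buf to the blocking
   descriptor fd. Returns 0 on success, or -1 with errno set. */
int
write_all(int fd, const char *buf, size_t len)
{
    size_t done = 0;

    /* The loop condition ensures write() is never called
       with a length of zero (rule 8). */
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);

        if (n == -1) {
            if (errno == EINTR)
                continue;       /* rule 7: interrupted, so retry */
            return -1;
        }
        done += (size_t) n;     /* rule 5: possibly a partial write */
    }
    return 0;
}
```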
.SH RETURN VALUE
See
.BR select (2).
.SH NOTES
Generally speaking,
all operating systems that support sockets also support
.BR select ().
.BR select ()
can be used to solve
many problems in a portable and efficient way that naive programmers try
to solve in a more complicated manner using
threads, forking, IPCs, signals, memory sharing, and so on.
.PP
The
.BR poll (2)
system call has the same functionality as
.BR select (),
and is somewhat more efficient when monitoring sparse
file descriptor sets.
It is nowadays widely available, but historically was less portable than
.BR select ().
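.PP
For comparison, waiting for either of two descriptors to become
readable might look as follows with
.BR poll (2).
This is a sketch only; the function name
.I poll_either_readable
and its return convention are assumptions made for illustration.

```c
#include <errno.h>
#include <poll.h>

/* Hypothetical helper: block until fd_a or fd_b is readable
   (or has been hung up). Returns the ready descriptor
   (fd_a if both are ready), or -1 on error. */
int
poll_either_readable(int fd_a, int fd_b)
{
    struct pollfd fds[2];
    int ready;

    /* Unlike fd_set, the requested events are not overwritten
       by the call; results are reported in revents. */
    fds[0].fd = fd_a;
    fds[0].events = POLLIN;
    fds[1].fd = fd_b;
    fds[1].events = POLLIN;

    for (;;) {
        fds[0].revents = 0;
        fds[1].revents = 0;

        ready = poll(fds, 2, -1);       /* -1: no timeout */
        if (ready == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        if (fds[0].revents & (POLLIN | POLLHUP))
            return fd_a;
        if (fds[1].revents & (POLLIN | POLLHUP))
            return fd_b;
        if ((fds[0].revents | fds[1].revents) & (POLLERR | POLLNVAL))
            return -1;
    }
}
```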
.PP
The Linux-specific
.BR epoll (7)
API provides an interface that is more efficient than
.BR select (2)
and
.BR poll (2)
when monitoring large numbers of file descriptors.
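.PP
The same two-descriptor wait might be sketched with
.BR epoll (7)
as follows.
The function name
.I epoll_either_readable
is an assumption made for illustration.
To stay short, the sketch creates and destroys the epoll instance on
every call; a real program would create it once and reuse it, which is
where the efficiency gain comes from.

```c
#include <errno.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper: block until fd_a or fd_b is readable.
   Returns one ready descriptor, or -1 on error. */
int
epoll_either_readable(int fd_a, int fd_b)
{
    struct epoll_event ev, out;
    int epfd, ready;
    int result = -1;

    epfd = epoll_create1(0);
    if (epfd == -1)
        return -1;

    /* Register interest once; the kernel keeps the list. */
    ev.events = EPOLLIN;
    ev.data.fd = fd_a;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd_a, &ev) == -1)
        goto done;
    ev.data.fd = fd_b;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd_b, &ev) == -1)
        goto done;

    do {
        ready = epoll_wait(epfd, &out, 1, -1);  /* -1: no timeout */
    } while (ready == -1 && errno == EINTR);

    if (ready == 1)
        result = out.data.fd;

done:
    close(epfd);
    return result;
}
```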
.SH EXAMPLE
Here is an example that better demonstrates the true utility of
.BR select ().
The listing below is a TCP forwarding program that forwards
from one TCP port to another.
.PP
.EX
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/select.h>
#include <string.h>
#include <signal.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <errno.h>

static int forward_port;

#undef max
#define max(x,y) ((x) > (y) ? (x) : (y))

static int
listen_socket(int listen_port)
{
    struct sockaddr_in addr;
    int lfd;
    int yes;

    lfd = socket(AF_INET, SOCK_STREAM, 0);
    if (lfd == \-1) {
        perror("socket");
        return \-1;
    }

    yes = 1;
    if (setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR,
            &yes, sizeof(yes)) == \-1) {
        perror("setsockopt");
        close(lfd);
        return \-1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_port = htons(listen_port);
    addr.sin_family = AF_INET;
    if (bind(lfd, (struct sockaddr *) &addr, sizeof(addr)) == \-1) {
        perror("bind");
        close(lfd);
        return \-1;
    }

    printf("accepting connections on port %d\en", listen_port);
    listen(lfd, 10);
    return lfd;
}

static int
connect_socket(int connect_port, char *address)
{
    struct sockaddr_in addr;
    int cfd;

    cfd = socket(AF_INET, SOCK_STREAM, 0);
    if (cfd == \-1) {
        perror("socket");
        return \-1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_port = htons(connect_port);
    addr.sin_family = AF_INET;

    if (!inet_aton(address, (struct in_addr *) &addr.sin_addr.s_addr)) {
        fprintf(stderr, "inet_aton(): bad IP address format\en");
        close(cfd);
        return \-1;
    }

    if (connect(cfd, (struct sockaddr *) &addr, sizeof(addr)) == \-1) {
        perror("connect()");
        shutdown(cfd, SHUT_RDWR);
        close(cfd);
        return \-1;
    }
    return cfd;
}

#define SHUT_FD1 do {                                \e
                     if (fd1 >= 0) {                 \e
                         shutdown(fd1, SHUT_RDWR);   \e
                         close(fd1);                 \e
                         fd1 = \-1;                   \e
                     }                               \e
                 } while (0)

#define SHUT_FD2 do {                                \e
                     if (fd2 >= 0) {                 \e
                         shutdown(fd2, SHUT_RDWR);   \e
                         close(fd2);                 \e
                         fd2 = \-1;                   \e
                     }                               \e
                 } while (0)

#define BUF_SIZE 1024

int
main(int argc, char *argv[])
{
    int h;
    int fd1 = \-1, fd2 = \-1;
    char buf1[BUF_SIZE], buf2[BUF_SIZE];
    int buf1_avail = 0, buf1_written = 0;
    int buf2_avail = 0, buf2_written = 0;

    if (argc != 4) {
        fprintf(stderr, "Usage\en\etfwd <listen\-port> "
                 "<forward\-to\-port> <forward\-to\-ip\-address>\en");
        exit(EXIT_FAILURE);
    }

    signal(SIGPIPE, SIG_IGN);

    forward_port = atoi(argv[2]);

    h = listen_socket(atoi(argv[1]));
    if (h == \-1)
        exit(EXIT_FAILURE);

    for (;;) {
        int ready, nfds = 0;
        ssize_t nbytes;
        fd_set readfds, writefds, exceptfds;

        FD_ZERO(&readfds);
        FD_ZERO(&writefds);
        FD_ZERO(&exceptfds);
        FD_SET(h, &readfds);
        nfds = max(nfds, h);

        if (fd1 > 0 && buf1_avail < BUF_SIZE)
            FD_SET(fd1, &readfds);
            /* Note: nfds is updated below, when fd1 is added to
               exceptfds. */
        if (fd2 > 0 && buf2_avail < BUF_SIZE)
            FD_SET(fd2, &readfds);

        if (fd1 > 0 && buf2_avail \- buf2_written > 0)
            FD_SET(fd1, &writefds);
        if (fd2 > 0 && buf1_avail \- buf1_written > 0)
            FD_SET(fd2, &writefds);

        if (fd1 > 0) {
            FD_SET(fd1, &exceptfds);
            nfds = max(nfds, fd1);
        }
        if (fd2 > 0) {
            FD_SET(fd2, &exceptfds);
            nfds = max(nfds, fd2);
        }

        ready = select(nfds + 1, &readfds, &writefds, &exceptfds, NULL);

        if (ready == \-1 && errno == EINTR)
            continue;

        if (ready == \-1) {
            perror("select()");
            exit(EXIT_FAILURE);
        }

        if (FD_ISSET(h, &readfds)) {
            socklen_t addrlen;
            struct sockaddr_in client_addr;
            int fd;

            addrlen = sizeof(client_addr);
            memset(&client_addr, 0, addrlen);
            fd = accept(h, (struct sockaddr *) &client_addr, &addrlen);
            if (fd == \-1) {
                perror("accept()");
            } else {
                SHUT_FD1;
                SHUT_FD2;
                buf1_avail = buf1_written = 0;
                buf2_avail = buf2_written = 0;
                fd1 = fd;
                fd2 = connect_socket(forward_port, argv[3]);
                if (fd2 == \-1)
                    SHUT_FD1;
                else
                    printf("connect from %s\en",
                            inet_ntoa(client_addr.sin_addr));

                /* Skip any events on the old, closed file
                   descriptors. */

                continue;
            }
        }

        /* NB: read OOB data before normal reads */

        if (fd1 > 0 && FD_ISSET(fd1, &exceptfds)) {
            char c;

            nbytes = recv(fd1, &c, 1, MSG_OOB);
            if (nbytes < 1)
                SHUT_FD1;
            else
                send(fd2, &c, 1, MSG_OOB);
        }
        if (fd2 > 0 && FD_ISSET(fd2, &exceptfds)) {
            char c;

            nbytes = recv(fd2, &c, 1, MSG_OOB);
            if (nbytes < 1)
                SHUT_FD2;
            else
                send(fd1, &c, 1, MSG_OOB);
        }
        if (fd1 > 0 && FD_ISSET(fd1, &readfds)) {
            nbytes = read(fd1, buf1 + buf1_avail,
                      BUF_SIZE \- buf1_avail);
            if (nbytes < 1)
                SHUT_FD1;
            else
                buf1_avail += nbytes;
        }
        if (fd2 > 0 && FD_ISSET(fd2, &readfds)) {
            nbytes = read(fd2, buf2 + buf2_avail,
                      BUF_SIZE \- buf2_avail);
            if (nbytes < 1)
                SHUT_FD2;
            else
                buf2_avail += nbytes;
        }
        if (fd1 > 0 && FD_ISSET(fd1, &writefds) && buf2_avail > 0) {
            nbytes = write(fd1, buf2 + buf2_written,
                       buf2_avail \- buf2_written);
            if (nbytes < 1)
                SHUT_FD1;
            else
                buf2_written += nbytes;
        }
        if (fd2 > 0 && FD_ISSET(fd2, &writefds) && buf1_avail > 0) {
            nbytes = write(fd2, buf1 + buf1_written,
                       buf1_avail \- buf1_written);
            if (nbytes < 1)
                SHUT_FD2;
            else
                buf1_written += nbytes;
        }

        /* Check if write data has caught read data */

        if (buf1_written == buf1_avail)
            buf1_written = buf1_avail = 0;
        if (buf2_written == buf2_avail)
            buf2_written = buf2_avail = 0;

        /* One side has closed the connection, keep
           writing to the other side until empty */

        if (fd1 < 0 && buf1_avail \- buf1_written == 0)
            SHUT_FD2;
        if (fd2 < 0 && buf2_avail \- buf2_written == 0)
            SHUT_FD1;
    }
    exit(EXIT_SUCCESS);
}
.EE
.PP
The above program properly forwards most kinds of TCP connections
including OOB signal data transmitted by \fBtelnet\fP servers.
It handles the tricky problem of having data flow in both directions
simultaneously.
You might think it more efficient to use a
.BR fork (2)
call and devote a separate process to each stream.
This becomes more tricky than you might suspect.
Another idea is to set nonblocking I/O using
.BR fcntl (2).
This also has its problems because you end up using
inefficient timeouts.
.PP
The program handles only one connection at a time,
although it could easily be extended to handle more with a linked list
of buffers, one for each connection.
At the moment, new
connections cause the current connection to be dropped.
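.PP
One possible shape for such an extension is sketched below.
The structure
.I struct conn
and the function
.I conn_list_fill_readfds
are hypothetical names introduced for illustration; they are not part
of the program above, and the sketch covers only the read sets.

```c
#include <stddef.h>
#include <sys/select.h>

#define BUF_SIZE 1024

/* Hypothetical per-connection state: the variables that main()
   keeps for its single connection, plus a list link. */
struct conn {
    int fd1, fd2;
    char buf1[BUF_SIZE], buf2[BUF_SIZE];
    int buf1_avail, buf1_written;
    int buf2_avail, buf2_written;
    struct conn *next;
};

/* Add each connection's open descriptors to readfds when the
   matching buffer still has room. Returns the highest descriptor
   added, or -1 if none was added. */
int
conn_list_fill_readfds(const struct conn *head, fd_set *readfds)
{
    int maxfd = -1;

    for (const struct conn *c = head; c != NULL; c = c->next) {
        if (c->fd1 >= 0 && c->buf1_avail < BUF_SIZE) {
            FD_SET(c->fd1, readfds);
            if (c->fd1 > maxfd)
                maxfd = c->fd1;
        }
        if (c->fd2 >= 0 && c->buf2_avail < BUF_SIZE) {
            FD_SET(c->fd2, readfds);
            if (c->fd2 > maxfd)
                maxfd = c->fd2;
        }
    }
    return maxfd;
}
```

The main loop would call a function like this after
.BR FD_ZERO (),
do the same for the write and exception sets,
and then walk the list again after
.BR select ()
returns to service each connection.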
.SH SEE ALSO
.BR accept (2),
.BR connect (2),
.BR poll (2),
.BR read (2),
.BR recv (2),
.BR select (2),
.BR send (2),
.BR sigprocmask (2),
.BR write (2),
.BR epoll (7)
.\" .SH AUTHORS
.\" This man page was written by Paul Sheer.