1 '\" t
2 .\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
3 .\" and copyright (c) 1999 Matthew Wilcox.
4 .\"
5 .\" %%%LICENSE_START(VERBATIM_ONE_PARA)
6 .\" Permission is granted to distribute possibly modified copies
7 .\" of this page provided the header is included verbatim,
8 .\" and in case of nontrivial modification author and date
9 .\" of the modification is added to the header.
10 .\" %%%LICENSE_END
11 .\"
12 .\" 2002-10-30, Michael Kerrisk, <mtk.manpages@gmail.com>
13 .\" Added description of SO_ACCEPTCONN
14 .\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
15 .\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
16 .\" Added notes on capability requirements
17 .\" A few small grammar fixes
18 .\" 2010-06-13 Jan Engelhardt <jengelh@medozas.de>
19 .\" Documented SO_DOMAIN and SO_PROTOCOL.
20 .\"
21 .\" FIXME
22 .\" The following are not yet documented:
23 .\"
24 .\" SO_PEERNAME (2.4?)
25 .\" get only
26 .\" Seems to do something similar to getpeername(), but then
27 .\" why is it necessary / how does it differ?
28 .\"
29 .\" SO_TIMESTAMPNS (2.6.22)
30 .\" Documentation/networking/timestamping.txt
31 .\" commit 92f37fd2ee805aa77925c1e64fd56088b46094fc
32 .\" Author: Eric Dumazet <dada1@cosmosbay.com>
33 .\"
34 .\" SO_TIMESTAMPING (2.6.30)
35 .\" Documentation/networking/timestamping.txt
36 .\" commit cb9eff097831007afb30d64373f29d99825d0068
37 .\" Author: Patrick Ohly <patrick.ohly@intel.com>
38 .\"
39 .\" SO_WIFI_STATUS (3.3)
40 .\" commit 6e3e939f3b1bf8534b32ad09ff199d88800835a0
41 .\" Author: Johannes Berg <johannes.berg@intel.com>
42 .\" Also: SCM_WIFI_STATUS
43 .\"
44 .\" SO_NOFCS (3.4)
45 .\" commit 3bdc0eba0b8b47797f4a76e377dd8360f317450f
46 .\" Author: Ben Greear <greearb@candelatech.com>
47 .\"
48 .\" SO_GET_FILTER (3.8)
49 .\" commit a8fc92778080c845eaadc369a0ecf5699a03bef0
50 .\" Author: Pavel Emelyanov <xemul@parallels.com>
51 .\"
52 .\" SO_SELECT_ERR_QUEUE (3.10)
53 .\" commit 7d4c04fc170087119727119074e72445f2bb192b
54 .\" Author: Keller, Jacob E <jacob.e.keller@intel.com>
55 .\"
56 .\" SO_MAX_PACING_RATE (3.13)
57 .\" commit 62748f32d501f5d3712a7c372bbb92abc7c62bc7
58 .\" Author: Eric Dumazet <edumazet@google.com>
59 .\"
60 .\" SO_BPF_EXTENSIONS (3.14)
61 .\" commit ea02f9411d9faa3553ed09ce0ec9f00ceae9885e
62 .\" Author: Michal Sekletar <msekleta@redhat.com>
63 .\"
64 .TH SOCKET 7 2016-10-08 Linux "Linux Programmer's Manual"
65 .SH NAME
66 socket \- Linux socket interface
67 .SH SYNOPSIS
68 .B #include <sys/socket.h>
69 .sp
70 .IB sockfd " = socket(int " socket_family ", int " socket_type ", int " protocol );
71 .SH DESCRIPTION
72 This manual page describes the Linux networking socket layer user
73 interface.
74 The BSD compatible sockets
75 are the uniform interface
76 between the user process and the network protocol stacks in the kernel.
77 The protocol modules are grouped into
78 .I protocol families
79 such as
80 .BR AF_INET ", " AF_IPX ", and " AF_PACKET ,
81 and
82 .I socket types
83 such as
84 .B SOCK_STREAM
85 or
86 .BR SOCK_DGRAM .
87 See
88 .BR socket (2)
89 for more information on families and types.
90 .SS Socket-layer functions
91 These functions are used by the user process to send or receive packets
92 and to do other socket operations.
93 For more information see their respective manual pages.
94
95 .BR socket (2)
96 creates a socket,
97 .BR connect (2)
98 connects a socket to a remote socket address,
99 the
100 .BR bind (2)
101 function binds a socket to a local socket address,
102 .BR listen (2)
103 tells the socket that new connections shall be accepted, and
104 .BR accept (2)
105 is used to get a new socket with a new incoming connection.
106 .BR socketpair (2)
107 returns two connected anonymous sockets (implemented only for a few
108 local families like
109 .BR AF_UNIX ).
110 .PP
111 .BR send (2),
112 .BR sendto (2),
113 and
114 .BR sendmsg (2)
115 send data over a socket, and
116 .BR recv (2),
117 .BR recvfrom (2),
118 .RB "and " recvmsg (2)
119 receive data from a socket.
120 .BR poll (2)
121 and
122 .BR select (2)
123 wait for arriving data or a readiness to send data.
124 In addition, the standard I/O operations like
125 .BR write (2),
126 .BR writev (2),
127 .BR sendfile (2),
128 .BR read (2),
129 and
130 .BR readv (2)
131 can be used to read and write data.
132 .PP
133 .BR getsockname (2)
134 returns the local socket address and
135 .BR getpeername (2)
136 returns the remote socket address.
137 .BR getsockopt (2)
138 and
139 .BR setsockopt (2)
140 are used to set or get socket layer or protocol options.
141 .BR ioctl (2)
142 can be used to set or read some other options.
143 .PP
144 .BR close (2)
145 is used to close a socket.
146 .BR shutdown (2)
147 closes parts of a full-duplex socket connection.
148 .PP
149 Seeking, or calling
150 .BR pread (2)
151 or
152 .BR pwrite (2)
153 with a nonzero position is not supported on sockets.
154 .PP
155 It is possible to do nonblocking I/O on sockets by setting the
156 .B O_NONBLOCK
157 flag on a socket file descriptor using
158 .BR fcntl (2).
159 Then all operations that would block will (usually)
160 return with
161 .B EAGAIN
162 (operation should be retried later);
163 .BR connect (2)
164 will fail with the
165 .B EINPROGRESS
166 error.
167 The user can then wait for various events via
168 .BR poll (2)
169 or
170 .BR select (2).
171 .TS
172 tab(:) allbox;
173 c s s
174 l l l.
175 I/O events
176 Event:Poll flag:Occurrence
177 Read:POLLIN:T{
178 New data arrived.
179 T}
180 Read:POLLIN:T{
181 A connection setup has been completed
182 (for connection-oriented sockets)
183 T}
184 Read:POLLHUP:T{
185 A disconnection request has been initiated by the other end.
186 T}
187 Read:POLLHUP:T{
188 A connection is broken (only for connection-oriented protocols).
189 When the socket is written to,
190 .B SIGPIPE
191 is also sent.
192 T}
193 Write:POLLOUT:T{
194 Socket has enough send buffer space for writing new data.
195 T}
196 Read/Write:T{
197 POLLIN |
198 .br
199 POLLOUT
200 T}:T{
201 An outgoing
202 .BR connect (2)
203 finished.
204 T}
205 Read/Write:POLLERR:An asynchronous error occurred.
206 Read/Write:POLLHUP:The other end has shut down one direction.
207 Exception:POLLPRI:T{
208 Urgent data arrived.
209 .B SIGURG
210 is sent then.
211 T}
212 .\" FIXME . The following is not true currently:
213 .\" It is no I/O event when the connection
214 .\" is broken from the local end using
215 .\" .BR shutdown (2)
216 .\" or
217 .\" .BR close (2).
218 .TE
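.PP
A minimal sketch of the nonblocking pattern described above,
assuming a connected socket.
The helper names set_nonblocking() and recv_with_poll() are illustrative
and are not part of the sockets API.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: add O_NONBLOCK to the file status flags of fd.
   Returns 0 on success, -1 on error (errno is set by fcntl). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: try to receive; if the call would block, wait up
   to timeout_ms for POLLIN and retry once.  Returns the byte count,
   0 on end-of-file, or -1 on error or timeout. */
ssize_t recv_with_poll(int fd, void *buf, size_t len, int timeout_ms)
{
    ssize_t n = recv(fd, buf, len, 0);

    if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
        return n;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    if (poll(&pfd, 1, timeout_ms) <= 0)
        return -1;              /* timeout or poll error */
    return recv(fd, buf, len, 0);
}
```

A real event loop would normally poll many descriptors at once and also
check POLLERR and POLLHUP in the returned events.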
219 .PP
220 An alternative to
221 .BR poll (2)
222 and
223 .BR select (2)
224 is to let the kernel inform the application about events
225 via a
226 .B SIGIO
227 signal.
228 For that the
229 .B O_ASYNC
230 flag must be set on a socket file descriptor via
231 .BR fcntl (2)
232 and a valid signal handler for
233 .B SIGIO
234 must be installed via
235 .BR sigaction (2).
236 See the
237 .I Signals
238 discussion below.
239 .SS Socket address structures
240 Each socket domain has its own format for socket addresses,
241 with a domain-specific address structure.
242 Each of these structures begins with an
243 integer "family" field (typed as
244 .IR sa_family_t )
245 that indicates the type of the address structure.
246 This allows
247 the various system calls (e.g.,
248 .BR connect (2),
249 .BR bind (2),
250 .BR accept (2),
251 .BR getsockname (2),
252 .BR getpeername (2)),
253 which are generic to all socket domains,
254 to determine the domain of a particular socket address.
255
256 To allow any type of socket address to be passed to
257 interfaces in the sockets API,
258 the type
259 .IR "struct sockaddr"
260 is defined.
261 The purpose of this type is purely to allow casting of
262 domain-specific socket address types to a "generic" type,
263 so as to avoid compiler warnings about type mismatches in
264 calls to the sockets API.
265
266 In addition, the sockets API provides the data type
267 .IR "struct sockaddr_storage" .
268 This type
269 is suitable to accommodate all supported domain-specific socket
270 address structures; it is large enough and is aligned properly.
271 (In particular, it is large enough to hold
272 IPv6 socket addresses.)
273 The structure includes the following field, which can be used to identify
274 the type of socket address actually stored in the structure:
275
276 .in +4n
277 .nf
278 sa_family_t ss_family;
279 .fi
280 .in
281
282 The
283 .I sockaddr_storage
284 structure is useful in programs that must handle socket addresses
285 in a generic way
286 (e.g., programs that must deal with both IPv4 and IPv6 socket addresses).
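.PP
As an illustration of handling addresses generically, the sketch below
stores a socket's local address in a
.I sockaddr_storage
and then dispatches on
.IR ss_family .
The helper names local_family() and port_of() are assumptions made for
this example only.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fetch the local address of fd into generic
   storage.  Returns the address family, or -1 on error. */
int local_family(int fd, struct sockaddr_storage *ss)
{
    socklen_t len = sizeof(*ss);

    memset(ss, 0, sizeof(*ss));
    if (getsockname(fd, (struct sockaddr *) ss, &len) == -1)
        return -1;
    return ss->ss_family;
}

/* Hypothetical helper: return the port in host byte order for IPv4 and
   IPv6 addresses, or -1 for any other address family. */
int port_of(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *) ss)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *) ss)->sin6_port);
    default:
        return -1;
    }
}
```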
287 .SS Socket options
288 The socket options listed below can be set by using
289 .BR setsockopt (2)
290 and read with
291 .BR getsockopt (2)
292 with the socket level set to
293 .B SOL_SOCKET
294 for all sockets.
295 Unless otherwise noted,
296 .I optval
297 is a pointer to an
298 .IR int .
299 .\" FIXME .
300 .\" In the list below, the text used to describe argument types
301 .\" for each socket option should be more consistent
302 .\"
303 .\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
304 .\" W R Stevens, UNPv1
305 .TP
306 .B SO_ACCEPTCONN
307 Returns a value indicating whether or not this socket has been marked
308 to accept connections with
309 .BR listen (2).
310 The value 0 indicates that this is not a listening socket,
311 the value 1 indicates that this is a listening socket.
312 This socket option is read-only.
313 .TP
314 .BR SO_ATTACH_FILTER " (since Linux 2.2), " SO_ATTACH_BPF " (since Linux 3.19)"
315 Attach a classic BPF
316 .RB ( SO_ATTACH_FILTER )
317 or an extended BPF
318 .RB ( SO_ATTACH_BPF )
319 program to the socket for use as a filter of incoming packets.
320 A packet will be dropped if the filter program returns zero.
321 If the filter program returns a
322 non-zero value which is less than the packet's data length,
323 the packet will be truncated to the length returned.
324 If the value returned by the filter is greater than or equal to the
325 packet's data length, the packet is allowed to proceed unmodified.
326
327 The argument for
328 .BR SO_ATTACH_FILTER
329 is a
330 .I sock_fprog
331 structure, defined in
332 .IR <linux/filter.h> :
333 .sp
334 .in +4n
335 .nf
336 struct sock_fprog {
337 unsigned short len;
338 struct sock_filter *filter;
339 };
340 .fi
341 .in
342 .IP
343 The argument for
344 .BR SO_ATTACH_BPF
345 is a file descriptor returned by the
346 .BR bpf (2)
347 system call and must refer to a program of type
348 .BR BPF_PROG_TYPE_SOCKET_FILTER .
349
350 These options may be set multiple times for a given socket,
351 each time replacing the previous filter program.
352 The classic and extended versions may be called on the same socket,
353 but the previous filter will always be replaced such that a socket
354 never has more than one filter defined.
355
356 Both classic and extended BPF are explained in the kernel source file
357 .IR Documentation/networking/filter.txt .
358 .TP
359 .BR SO_ATTACH_REUSEPORT_CBPF ", " SO_ATTACH_REUSEPORT_EBPF
360 For use with the
361 .BR SO_REUSEPORT
362 option, these options allow the user to set a classic BPF
363 .RB ( SO_ATTACH_REUSEPORT_CBPF )
364 or an extended BPF
365 .RB ( SO_ATTACH_REUSEPORT_EBPF )
366 program which defines how packets are assigned to
367 the sockets in the reuseport group (that is, all sockets which have
368 .BR SO_REUSEPORT
369 set and are using the same local address to receive packets).
370
371 The BPF program must return an index between 0 and N\-1 representing
372 the socket which should receive the packet
373 (where N is the number of sockets in the group).
374 If the BPF program returns an invalid index,
375 socket selection will fall back to the plain
376 .BR SO_REUSEPORT
377 mechanism.
378
379 Sockets are numbered in the order in which they are added to the group
380 (that is, the order of
381 .BR bind (2)
382 calls for UDP sockets or the order of
383 .BR listen (2)
384 calls for TCP sockets).
385 New sockets added to a reuseport group will inherit the BPF program.
386 When a socket is removed from a reuseport group (via
387 .BR close (2)),
388 the last socket in the group will be moved into the closed socket's
389 position.
390
391 These options may be set repeatedly at any time on any socket in the group
392 to replace the current BPF program used by all sockets in the group.
393
394 .BR SO_ATTACH_REUSEPORT_CBPF
395 takes the same argument type as
396 .BR SO_ATTACH_FILTER
397 and
398 .BR SO_ATTACH_REUSEPORT_EBPF
399 takes the same argument type as
400 .BR SO_ATTACH_BPF .
401
402 UDP support for this feature is available since Linux 4.5;
403 TCP support is available since Linux 4.6.
404 .TP
405 .B SO_BINDTODEVICE
406 Bind this socket to a particular device like \(lqeth0\(rq,
407 as specified in the passed interface name.
408 If the
409 name is an empty string or the option length is zero, the socket device
410 binding is removed.
411 The passed option is a variable-length null-terminated
412 interface name string with the maximum size of
413 .BR IFNAMSIZ .
414 If a socket is bound to an interface,
415 only packets received from that particular interface are processed by the
416 socket.
417 Note that this works only for some socket types, particularly
418 .B AF_INET
419 sockets.
420 It is not supported for packet sockets (use normal
421 .BR bind (2)
422 there).
423
424 Before Linux 3.8,
425 this socket option could be set, but could not be retrieved with
426 .BR getsockopt (2).
427 Since Linux 3.8, it is readable.
428 The
429 .I optlen
430 argument should contain the buffer size available
431 to receive the device name and is recommended to be
432 .BR IFNAMSIZ
433 bytes.
434 The real device name length is reported back in the
435 .I optlen
436 argument.
437 .TP
438 .B SO_BROADCAST
439 Set or get the broadcast flag.
440 When enabled, datagram sockets are allowed to send
441 packets to a broadcast address.
442 This option has no effect on stream-oriented sockets.
443 .TP
444 .B SO_BSDCOMPAT
445 Enable BSD bug-to-bug compatibility.
446 This is used by the UDP protocol module in Linux 2.0 and 2.2.
447 If enabled, ICMP errors received for a UDP socket will not be passed
448 to the user program.
449 In later kernel versions, support for this option has been phased out:
450 Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
451 (printk()) if a program uses this option.
452 Linux 2.0 also enabled BSD bug-to-bug compatibility
453 options (random header changing, skipping of the broadcast flag) for raw
454 sockets with this option, but that was removed in Linux 2.2.
455 .TP
456 .B SO_DEBUG
457 Enable socket debugging.
458 Allowed only for processes with the
459 .B CAP_NET_ADMIN
460 capability or an effective user ID of 0.
461 .TP
462 .BR SO_DETACH_FILTER " (since Linux 2.2), " SO_DETACH_BPF " (since Linux 3.19)"
463 These two options, which are synonyms,
464 may be used to remove the classic or extended BPF
465 program attached to a socket with either
466 .BR SO_ATTACH_FILTER
467 or
468 .BR SO_ATTACH_BPF .
469 The option value is ignored.
470 .TP
471 .BR SO_DOMAIN " (since Linux 2.6.32)"
472 Retrieves the socket domain as an integer, returning a value such as
473 .BR AF_INET6 .
474 See
475 .BR socket (2)
476 for details.
477 This socket option is read-only.
478 .TP
479 .B SO_ERROR
480 Get and clear the pending socket error.
481 This socket option is read-only.
482 Expects an integer.
483 .TP
484 .B SO_DONTROUTE
485 Don't send via a gateway, send only to directly connected hosts.
486 The same effect can be achieved by setting the
487 .B MSG_DONTROUTE
488 flag on a socket
489 .BR send (2)
490 operation.
491 Expects an integer boolean flag.
492 .TP
493 .B SO_KEEPALIVE
494 Enable sending of keep-alive messages on connection-oriented sockets.
495 Expects an integer boolean flag.
496 .TP
497 .B SO_LINGER
498 Sets or gets the
499 .B SO_LINGER
500 option.
501 The argument is a
502 .I linger
503 structure.
504 .sp
505 .in +4n
506 .nf
507 struct linger {
508 int l_onoff; /* linger active */
509 int l_linger; /* how many seconds to linger for */
510 };
511 .fi
512 .in
513 .IP
514 When enabled, a
515 .BR close (2)
516 or
517 .BR shutdown (2)
518 will not return until all queued messages for the socket have been
519 successfully sent or the linger timeout has been reached.
520 Otherwise,
521 the call returns immediately and the closing is done in the background.
522 When the socket is closed as part of
523 .BR exit (2),
524 it always lingers in the background.
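.IP
A minimal sketch of setting and reading back the
.I linger
structure shown above.
The helper names set_linger() and get_linger() are illustrative,
and the convention that a negative argument disables lingering
is a choice made for this example.

```c
#include <sys/socket.h>

/* Hypothetical helper: linger on close for up to `seconds` seconds;
   pass a negative value to disable lingering.
   Returns 0 on success, -1 on error. */
int set_linger(int fd, int seconds)
{
    struct linger lg;

    lg.l_onoff = (seconds >= 0);
    lg.l_linger = (seconds >= 0) ? seconds : 0;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}

/* Hypothetical helper: read the current linger setting into *lg.
   Returns 0 on success, -1 on error. */
int get_linger(int fd, struct linger *lg)
{
    socklen_t len = sizeof(*lg);

    return getsockopt(fd, SOL_SOCKET, SO_LINGER, lg, &len);
}
```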
525 .TP
526 .B SO_LOCK_FILTER
527 .\" commit d59577b6ffd313d0ab3be39cb1ab47e29bdc9182
528 When set, this option will prevent
529 changing the filters associated with the socket.
530 These filters include any set using the socket options
531 .BR SO_ATTACH_FILTER ,
532 .BR SO_ATTACH_BPF ,
533 .BR SO_ATTACH_REUSEPORT_CBPF ,
534 and
535 .BR SO_ATTACH_REUSEPORT_EBPF .
536
537 The typical use case is for a privileged process to set up a raw socket
538 (an operation that requires the
539 .BR CAP_NET_RAW
540 capability), apply a restrictive filter, set the
541 .BR SO_LOCK_FILTER
542 option,
543 and then either drop its privileges or pass the socket file descriptor
544 to an unprivileged process via a UNIX domain socket.
545
546 Once the
547 .BR SO_LOCK_FILTER
548 option has been enabled, attempts to change or remove the filter
549 attached to a socket, or to disable the
550 .BR SO_LOCK_FILTER
551 option will fail with the error
552 .BR EPERM .
553 .TP
554 .BR SO_MARK " (since Linux 2.6.25)"
555 .\" commit 4a19ec5800fc3bb64e2d87c4d9fdd9e636086fe0
556 .\" and 914a9ab386a288d0f22252fc268ecbc048cdcbd5
557 Set the mark for each packet sent through this socket
558 (similar to the netfilter MARK target but socket-based).
559 Changing the mark can be used for mark-based
560 routing without netfilter or for packet filtering.
561 Setting this option requires the
562 .B CAP_NET_ADMIN
563 capability.
564 .TP
565 .B SO_OOBINLINE
566 If this option is enabled,
567 out-of-band data is directly placed into the receive data stream.
568 Otherwise, out-of-band data is passed only when the
569 .B MSG_OOB
570 flag is set during receiving.
571 .\" don't document it because it can do too much harm.
572 .\".B SO_NO_CHECK
573 .\" The kernel has support for the SO_NO_CHECK socket
574 .\" option (boolean: 0 == default, calculate checksum on xmit,
575 .\" 1 == do not calculate checksum on xmit).
576 .\" Additional note from Andi Kleen on SO_NO_CHECK (2010-08-30)
577 .\" On Linux UDP checksums are essentially free and there's no reason
578 .\" to turn them off and it would disable another safety line.
579 .\" That is why I didn't document the option.
580 .TP
581 .B SO_PASSCRED
582 Enable or disable the receiving of the
583 .B SCM_CREDENTIALS
584 control message.
585 For more information see
586 .BR unix (7).
587 .\" FIXME Document SO_PASSSEC, added in 2.6.18; there is some info
588 .\" in the 2.6.18 ChangeLog
589 .TP
590 .BR SO_PEEK_OFF " (since Linux 3.4)"
591 .\" commit ef64a54f6e558155b4f149bb10666b9e914b6c54
592 This option, which is currently supported only for
593 .BR unix (7)
594 sockets, sets the value of the "peek offset" for the
595 .BR recv (2)
596 system call when used with the
597 .BR MSG_PEEK
598 flag.
599
600 When this option is set to a negative value
601 (it is set to \-1 for all new sockets),
602 traditional behavior is provided:
603 .BR recv (2)
604 with the
605 .BR MSG_PEEK
606 flag will peek data from the front of the queue.
607
608 When the option is set to a value greater than or equal to zero,
609 then the next peek at data queued in the socket will occur at
610 the byte offset specified by the option value.
611 At the same time, the "peek offset" will be
612 incremented by the number of bytes that were peeked from the queue,
613 so that a subsequent peek will return the next data in the queue.
614
615 If data is removed from the front of the queue via a call to
616 .BR recv (2)
617 (or similar) without the
618 .BR MSG_PEEK
619 flag, the "peek offset" will be decreased by the number of bytes removed.
620 In other words, receiving data without the
621 .B MSG_PEEK
622 flag will cause the "peek offset" to be adjusted to maintain
623 the correct relative position in the queued data,
624 so that a subsequent peek will retrieve the data that would have been
625 retrieved had the data not been removed.
626
627 For datagram sockets, if the "peek offset" points to the middle of a packet,
628 the data returned will be marked with the
629 .BR MSG_TRUNC
630 flag.
631
632 The following example serves to illustrate the use of
633 .BR SO_PEEK_OFF .
634 Suppose a stream socket has the following queued input data:
635
636 aabbccddeeff
637 .IP
638 The following sequence of
639 .BR recv (2)
640 calls would have the effect noted in the comments:
641
642 .in +4n
643 .nf
644 int ov = 4; // Set peek offset to 4
645 setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &ov, sizeof(ov));
646
647 recv(fd, buf, 2, MSG_PEEK); // Peeks "cc"; offset set to 6
648 recv(fd, buf, 2, MSG_PEEK); // Peeks "dd"; offset set to 8
649 recv(fd, buf, 2, 0); // Reads "aa"; offset set to 6
650 recv(fd, buf, 2, MSG_PEEK); // Peeks "ee"; offset set to 8
651 .fi
652 .in
653 .TP
654 .B SO_PEERCRED
655 Return the credentials of the foreign process connected to this socket.
656 This is possible only for connected
657 .B AF_UNIX
658 stream sockets and
659 .B AF_UNIX
660 stream and datagram socket pairs created using
661 .BR socketpair (2);
662 see
663 .BR unix (7).
664 The returned credentials are those that were in effect at the time
665 of the call to
666 .BR connect (2)
667 or
668 .BR socketpair (2).
669 The argument is a
670 .I ucred
671 structure; define the
672 .B _GNU_SOURCE
673 feature test macro to obtain the definition of that structure from
674 .IR <sys/socket.h> .
675 This socket option is read-only.
676 .TP
677 .B SO_PRIORITY
678 Set the protocol-defined priority for all packets to be sent on
679 this socket.
680 Linux uses this value to order the networking queues:
681 packets with a higher priority may be processed first depending
682 on the selected device queueing discipline.
683 .\" For
684 .\" .BR ip (7),
685 .\" this also sets the IP type-of-service (TOS) field for outgoing packets.
686 Setting a priority outside the range 0 to 6 requires the
687 .B CAP_NET_ADMIN
688 capability.
689 .TP
690 .BR SO_PROTOCOL " (since Linux 2.6.32)"
691 Retrieves the socket protocol as an integer, returning a value such as
692 .BR IPPROTO_SCTP .
693 See
694 .BR socket (2)
695 for details.
696 This socket option is read-only.
697 .TP
698 .B SO_RCVBUF
699 Sets or gets the maximum socket receive buffer in bytes.
700 The kernel doubles this value (to allow space for bookkeeping overhead)
701 when it is set using
702 .\" Most (all?) other implementations do not do this -- MTK, Dec 05
703 .BR setsockopt (2),
704 and this doubled value is returned by
705 .BR getsockopt (2).
706 .\" The following thread on LKML is quite informative:
707 .\" getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behavior
708 .\" 17 July 2012
709 .\" http://thread.gmane.org/gmane.linux.kernel/1328935
710 The default value is set by the
711 .I /proc/sys/net/core/rmem_default
712 file, and the maximum allowed value is set by the
713 .I /proc/sys/net/core/rmem_max
714 file.
715 The minimum (doubled) value for this option is 256.
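.IP
Because the value read back differs from the value requested,
it can be useful to query the option after setting it.
The following is a sketch only; set_rcvbuf() is a hypothetical helper
name, and the value it returns depends on the kernel and on
.IR rmem_max .

```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of `bytes` bytes and
   return the size that getsockopt(2) then reports, or -1 on error.
   On Linux the reported size is expected to be the doubled value. */
int set_rcvbuf(int fd, int bytes)
{
    int effective = 0;
    socklen_t len = sizeof(effective);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) == -1)
        return -1;
    return effective;
}
```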
716 .TP
717 .BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
718 Using this socket option, a privileged
719 .RB ( CAP_NET_ADMIN )
720 process can perform the same task as
721 .BR SO_RCVBUF ,
722 but the
723 .I rmem_max
724 limit can be overridden.
725 .TP
726 .BR SO_RCVLOWAT " and " SO_SNDLOWAT
727 Specify the minimum number of bytes in the buffer until the socket layer
728 will pass the data to the protocol
729 .RB ( SO_SNDLOWAT )
730 or the user on receiving
731 .RB ( SO_RCVLOWAT ).
732 These two values are initialized to 1.
733 .B SO_SNDLOWAT
734 is not changeable on Linux
735 .RB ( setsockopt (2)
736 fails with the error
737 .BR ENOPROTOOPT ).
738 .B SO_RCVLOWAT
739 is changeable
740 only since Linux 2.4.
741 The
742 .BR select (2)
743 and
744 .BR poll (2)
745 system calls currently do not respect the
746 .B SO_RCVLOWAT
747 setting on Linux,
748 and mark a socket readable when even a single byte of data is available.
749 A subsequent read from the socket will block until
750 .B SO_RCVLOWAT
751 bytes are available.
752 .\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=111049368106984&w=2
753 .\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
754 .TP
755 .BR SO_RCVTIMEO " and " SO_SNDTIMEO
756 .\" Not implemented in 2.0.
757 .\" Implemented in 2.1.11 for getsockopt: always return a zero struct.
758 .\" Implemented in 2.3.41 for setsockopt, and actually used.
759 Specify the receiving or sending timeouts until reporting an error.
760 The argument is a
761 .IR "struct timeval" .
762 If an input or output function blocks for this period of time, and
763 data has been sent or received, the return value of that function
764 will be the amount of data transferred; if no data has been transferred
765 and the timeout has been reached, then \-1 is returned with
766 .I errno
767 set to
768 .BR EAGAIN
769 or
770 .BR EWOULDBLOCK ,
771 .\" in fact to EAGAIN
772 or
773 .B EINPROGRESS
774 (for
775 .BR connect (2))
776 just as if the socket was specified to be nonblocking.
777 If the timeout is set to zero (the default),
778 then the operation will never time out.
779 Timeouts only have effect for system calls that perform socket I/O (e.g.,
780 .BR read (2),
781 .BR recvmsg (2),
782 .BR send (2),
783 .BR sendmsg (2));
784 timeouts have no effect for
785 .BR select (2),
786 .BR poll (2),
787 .BR epoll_wait (2),
788 and so on.
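.IP
A minimal sketch of a receive timeout, assuming a blocking socket.
The helper names set_recv_timeout_ms() and recv_or_timeout() are
illustrative and not part of any library.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Hypothetical helper: set the receive timeout of fd to ms milliseconds
   (0 means never time out).  Returns 0 on success, -1 on error. */
int set_recv_timeout_ms(int fd, long ms)
{
    struct timeval tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Hypothetical helper: receive once; *timed_out is set to 1 if recv(2)
   failed because the timeout expired without any data arriving. */
ssize_t recv_or_timeout(int fd, void *buf, size_t len, int *timed_out)
{
    ssize_t n = recv(fd, buf, len, 0);

    *timed_out = (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK));
    return n;
}
```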
789 .TP
790 .B SO_REUSEADDR
791 .\" commit c617f398edd4db2b8567a28e899a88f8f574798d
792 .\" https://lwn.net/Articles/542629/
793 Indicates that the rules used in validating addresses supplied in a
794 .BR bind (2)
795 call should allow reuse of local addresses.
796 For
797 .B AF_INET
798 sockets this
799 means that a socket may bind, except when there
800 is an active listening socket bound to the address.
801 When the listening socket is bound to
802 .B INADDR_ANY
803 with a specific port then it is not possible
804 to bind to this port for any local address.
805 Argument is an integer boolean flag.
806 .TP
807 .BR SO_REUSEPORT " (since Linux 3.9)"
808 Permits multiple
809 .B AF_INET
810 or
811 .B AF_INET6
812 sockets to be bound to an identical socket address.
813 This option must be set on each socket (including the first socket)
814 prior to calling
815 .BR bind (2)
816 on the socket.
817 To prevent port hijacking,
818 all of the processes binding to the same address must have the same
819 effective UID.
820 This option can be employed with both TCP and UDP sockets.
821
822 For TCP sockets, this option allows
823 .BR accept (2)
824 load distribution in a multi-threaded server to be improved by
825 using a distinct listener socket for each thread.
826 This provides improved load distribution as compared
827 to traditional techniques such as using a single
828 .BR accept (2)ing
829 thread that distributes connections,
830 or having multiple threads that compete to
831 .BR accept (2)
832 from the same socket.
833
834 For UDP sockets,
835 the use of this option can provide better distribution
836 of incoming datagrams to multiple processes (or threads) as compared
837 to the traditional technique of having multiple processes
838 compete to receive datagrams on the same socket.
839 .TP
840 .BR SO_RXQ_OVFL " (since Linux 2.6.33)"
841 .\" commit 3b885787ea4112eaa80945999ea0901bf742707f
842 Indicates that an unsigned 32-bit value ancillary message (cmsg)
843 should be attached to received skbs indicating
844 the number of packets dropped by the socket between
845 the last received packet and this received packet.
846 .TP
847 .B SO_SNDBUF
848 Sets or gets the maximum socket send buffer in bytes.
849 The kernel doubles this value (to allow space for bookkeeping overhead)
850 when it is set using
851 .\" Most (all?) other implementations do not do this -- MTK, Dec 05
852 .\" See also the comment to SO_RCVBUF (17 Jul 2012 LKML mail)
853 .BR setsockopt (2),
854 and this doubled value is returned by
855 .BR getsockopt (2).
856 The default value is set by the
857 .I /proc/sys/net/core/wmem_default
858 file and the maximum allowed value is set by the
859 .I /proc/sys/net/core/wmem_max
860 file.
861 The minimum (doubled) value for this option is 2048.
862 .TP
863 .BR SO_SNDBUFFORCE " (since Linux 2.6.14)"
864 Using this socket option, a privileged
865 .RB ( CAP_NET_ADMIN )
866 process can perform the same task as
867 .BR SO_SNDBUF ,
868 but the
869 .I wmem_max
870 limit can be overridden.
871 .TP
872 .B SO_TIMESTAMP
873 Enable or disable the receiving of the
874 .B SO_TIMESTAMP
875 control message.
876 The timestamp control message is sent with level
877 .B SOL_SOCKET
878 and the
879 .I cmsg_data
880 field is a
881 .I "struct timeval"
882 indicating the
883 reception time of the last packet passed to the user in this call.
884 See
885 .BR cmsg (3)
886 for details on control messages.
887 .TP
888 .B SO_TYPE
889 Gets the socket type as an integer (e.g.,
890 .BR SOCK_STREAM ).
891 This socket option is read-only.
892 .TP
893 .BR SO_BUSY_POLL " (since Linux 3.11)"
894 Sets the approximate time in microseconds to busy poll on a blocking receive
895 when there is no data.
896 Increasing this value requires
897 .BR CAP_NET_ADMIN .
898 The default for this option is controlled by the
899 .I /proc/sys/net/core/busy_read
900 file.
901
902 The value in the
903 .I /proc/sys/net/core/busy_poll
904 file determines how long
905 .BR select (2)
906 and
907 .BR poll (2)
908 will busy poll when they operate on sockets with
909 .BR SO_BUSY_POLL
910 set and no events to report are found.
911
912 In both cases,
913 busy polling will only be done when the socket last received data
914 from a network device that supports this option.
915
916 While busy polling may improve latency of some applications,
917 care must be taken when using it since this will increase
918 both CPU utilization and power usage.
919 .SS Signals
920 When writing onto a connection-oriented socket that has been shut down
921 (by the local or the remote end),
922 .B SIGPIPE
923 is sent to the writing process and
924 .B EPIPE
925 is returned.
926 The signal is not sent when the write call
927 specified the
928 .B MSG_NOSIGNAL
929 flag.
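.PP
A minimal sketch of writing with
.B MSG_NOSIGNAL
so that a closed connection is reported through
.B EPIPE
alone.
The helper name send_all_nosigpipe() is illustrative,
and retrying on
.B EINTR
is a design choice of this example.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send the whole buffer without raising SIGPIPE.
   Returns 0 once all bytes are sent, or -1 on error; errno is EPIPE
   if the connection has been shut down. */
int send_all_nosigpipe(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = send(fd, buf, len, MSG_NOSIGNAL);

        if (n == -1) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t) n;
    }
    return 0;
}
```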
930 .PP
931 When requested with the
932 .B FIOSETOWN
933 .BR fcntl (2)
934 or
935 .B SIOCSPGRP
936 .BR ioctl (2),
937 .B SIGIO
938 is sent when an I/O event occurs.
939 It is possible to use
940 .BR poll (2)
941 or
942 .BR select (2)
943 in the signal handler to find out which socket the event occurred on.
944 An alternative (in Linux 2.2) is to set a real-time signal using the
945 .B F_SETSIG
946 .BR fcntl (2);
947 the handler of the real-time signal will be called with
948 the file descriptor in the
949 .I si_fd
950 field of its
951 .IR siginfo_t .
952 See
953 .BR fcntl (2)
954 for more information.
955 .PP
956 Under some circumstances (e.g., multiple processes accessing a
957 single socket), the condition that caused the
958 .B SIGIO
959 may have already disappeared when the process reacts to the signal.
960 If this happens, the process should wait again because Linux
961 will resend the signal later.
962 .\" .SS Ancillary messages
963 .SS /proc interfaces
964 The core socket networking parameters can be accessed
965 via files in the directory
966 .IR /proc/sys/net/core/ .
967 .TP
968 .I rmem_default
969 contains the default setting in bytes of the socket receive buffer.
970 .TP
971 .I rmem_max
972 contains the maximum socket receive buffer size in bytes which a user may
973 set by using the
974 .B SO_RCVBUF
975 socket option.
976 .TP
977 .I wmem_default
978 contains the default setting in bytes of the socket send buffer.
979 .TP
980 .I wmem_max
981 contains the maximum socket send buffer size in bytes which a user may
982 set by using the
983 .B SO_SNDBUF
984 socket option.
985 .TP
986 .IR message_cost " and " message_burst
987 configure the token bucket filter used to rate-limit warning messages
988 caused by external network events.
989 .TP
990 .I netdev_max_backlog
991 Maximum number of packets in the global input queue.
992 .TP
993 .I optmem_max
994 Maximum length of ancillary data and user control data like the iovecs
995 per socket.
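.PP
Each of these files holds a single decimal value, so it can be read with
ordinary file I/O.
The sketch below assumes a hypothetical helper named
.IR read_sysctl_long ;
it is one possible way to read such a file, not the only one.

```c
#include <stdio.h>

/*
 * Read a single integer from a file such as
 * /proc/sys/net/core/rmem_max.
 * Returns 0 and stores the value in *out on success, -1 on failure.
 */
int read_sysctl_long(const char *path, long *out)
{
    FILE *f = fopen(path, "r");

    if (f == NULL)
        return -1;

    int matched = fscanf(f, "%ld", out);

    fclose(f);
    return matched == 1 ? 0 : -1;
}
```

A caller would pass a path such as
.IR /proc/sys/net/core/rmem_max
and should be prepared for the call to fail if the file cannot be opened.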
996 .\" netdev_fastroute is not documented because it is experimental
997 .SS Ioctls
998 These operations can be accessed using
999 .BR ioctl (2):
1000
1001 .in +4n
1002 .nf
1003 .IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
1004 .fi
1005 .in
1006 .TP
1007 .B SIOCGSTAMP
1008 Return a
1009 .I struct timeval
1010 with the receive timestamp of the last packet passed to the user.
1011 This is useful for accurate round trip time measurements.
1012 See
1013 .BR setitimer (2)
1014 for a description of
1015 .IR "struct timeval" .
1016 .\"
1017 This ioctl should be used only if the socket option
1018 .B SO_TIMESTAMP
1019 is not set on the socket.
1020 Otherwise, it returns the timestamp of the
1021 last packet that was received while
1022 .B SO_TIMESTAMP
1023 was not set, or it fails if no such packet has been received
1024 (i.e.,
1025 .BR ioctl (2)
1026 returns \-1 with
1027 .I errno
1028 set to
1029 .BR ENOENT ).
1030 .TP
1031 .B SIOCSPGRP
1032 Set the process or process group that is to receive
1033 .B SIGIO
1034 or
1035 .B SIGURG
1036 signals when I/O becomes possible or urgent data is available.
1037 The argument is a pointer to a
1038 .IR pid_t .
1039 For further details, see the description of
1040 .B F_SETOWN
1041 in
1042 .BR fcntl (2).
1043 .TP
1044 .B FIOASYNC
1045 Change the
1046 .B O_ASYNC
1047 flag to enable or disable asynchronous I/O mode of the socket.
1048 Asynchronous I/O mode means that the
1049 .B SIGIO
1050 signal or the signal set with
1051 .B F_SETSIG
1052 is raised when a new I/O event occurs.
1053 .IP
1054 The argument is an integer boolean flag.
1055 (This operation is synonymous with the use of
1056 .BR fcntl (2)
1057 to set the
1058 .B O_ASYNC
1059 flag.)
1060 .\"
1061 .TP
1062 .B SIOCGPGRP
1063 Get the current process or process group that receives
1064 .B SIGIO
1065 or
1066 .B SIGURG
1067 signals,
1068 or 0
1069 when none is set.
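.PP
As an illustration of
.BR SIOCGSTAMP ,
the sketch below sends one UDP datagram to itself and then asks for the
receive timestamp.
It rests on several assumptions that are not part of this page:
the helper name
.I last_packet_timestamp
is hypothetical, the loopback interface is assumed to be up,
.B SO_TIMESTAMP
is left unset, and a one-second
.B SO_RCVTIMEO
is set only so that the example cannot block indefinitely.

```c
#include <linux/sockios.h>      /* SIOCGSTAMP */
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Send one UDP datagram to ourselves over the loopback interface,
 * receive it, and fetch its receive timestamp with SIOCGSTAMP.
 * Returns 0 and fills *tv on success, -1 on any failure.
 */
int last_packet_timestamp(struct timeval *tv)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    struct timeval rcv_timeout = { .tv_sec = 1, .tv_usec = 0 };
    char buf[1];
    int rc = -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = 0;                  /* let the kernel pick a port */

    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO,
                   &rcv_timeout, sizeof(rcv_timeout)) == 0 &&
        bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *) &addr, &len) == 0 &&
        sendto(fd, "x", 1, 0,
               (struct sockaddr *) &addr, sizeof(addr)) == 1 &&
        recv(fd, buf, sizeof(buf), 0) == 1 &&
        ioctl(fd, SIOCGSTAMP, tv) == 0)
        rc = 0;

    close(fd);
    return rc;
}
```

The ioctl is issued only after
.BR recv (2)
has returned the datagram, since the timestamp refers to the last packet
passed to the user.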
1070 .PP
1071 Valid
1072 .BR fcntl (2)
1073 operations:
1074 .TP
1075 .B FIOGETOWN
1076 The same as the
1077 .B SIOCGPGRP
1078 .BR ioctl (2).
1079 .TP
1080 .B FIOSETOWN
1081 The same as the
1082 .B SIOCSPGRP
1083 .BR ioctl (2).
1084 .SH VERSIONS
1085 .B SO_BINDTODEVICE
1086 was introduced in Linux 2.0.30.
1087 .B SO_PASSCRED
1088 was introduced in Linux 2.2.
1089 The
1090 .I /proc
1091 interfaces were introduced in Linux 2.2.
1092 .B SO_RCVTIMEO
1093 and
1094 .B SO_SNDTIMEO
1095 are supported since Linux 2.3.41.
1096 Earlier, timeouts were fixed to
1097 a protocol-specific setting, and could not be read or written.
1098 .SH NOTES
1099 Linux assumes that half of the send/receive buffer is used for internal
1100 kernel structures; thus the values in the corresponding
1101 .I /proc
1102 files are twice what can be observed on the wire.
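.PP
The same bookkeeping is visible from user space, where the kernel reports
twice the buffer size that was requested.
The sketch below uses a hypothetical helper,
.IR effective_sndbuf ,
and assumes that the requested size lies between the kernel's minimum and
the limit in
.IR wmem_max .

```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Request a send buffer of `requested` bytes on a fresh socket and
 * return the size that getsockopt(2) reports afterwards,
 * or -1 on error.
 */
int effective_sndbuf(int requested)
{
    int actual = -1;
    socklen_t len = sizeof(actual);

    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                   &requested, sizeof(requested)) == -1 ||
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) == -1)
        actual = -1;

    close(fd);
    return actual;
}
```

Under those assumptions, a request for 16384 bytes is reported back as
32768 bytes.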
1103
1104 Linux allows a port to be reused with the
1105 .B SO_REUSEADDR
1106 option only
1107 when this option was set both in the previous program that performed a
1108 .BR bind (2)
1109 to the port and in the program that wants to reuse the port.
1110 This differs from some implementations (e.g., FreeBSD)
1111 where only the later program needs to set the
1112 .B SO_REUSEADDR
1113 option.
1114 Typically this difference is invisible, since, for example, a server
1115 program is designed to always set this option.
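.PP
A server that always sets the option, as described above, might create its
listening socket along the following lines.
This is a minimal sketch: the helper name
.I reusable_listener
is hypothetical, and TCP over IPv4 bound to the wildcard address is an
assumption made for brevity.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a TCP listening socket on the given port (host byte order)
 * with SO_REUSEADDR set.
 * Returns the listening descriptor, or -1 on error.
 */
int reusable_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* Set the option before bind(2) so that it applies to this bind. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1 ||
        bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        listen(fd, SOMAXCONN) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

Since every instance of such a server sets the option, both the earlier and
the later program satisfy the Linux requirement.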
1116 .\" .SH AUTHORS
1117 .\" This man page was written by Andi Kleen.
1118 .SH SEE ALSO
1119 .BR wireshark (1),
1120 .BR bpf (2),
1121 .BR connect (2),
1122 .BR getsockopt (2),
1123 .BR setsockopt (2),
1124 .BR socket (2),
1125 .BR pcap (3),
1126 .BR capabilities (7),
1127 .BR ddp (7),
1128 .BR ip (7),
1129 .BR packet (7),
1130 .BR tcp (7),
1131 .BR udp (7),
1132 .BR unix (7),
1133 .BR tcpdump (8)