1 '\" t
2 .\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
3 .\" and copyright (c) 1999 Matthew Wilcox.
4 .\"
5 .\" %%%LICENSE_START(VERBATIM_ONE_PARA)
6 .\" Permission is granted to distribute possibly modified copies
7 .\" of this page provided the header is included verbatim,
8 .\" and in case of nontrivial modification author and date
9 .\" of the modification is added to the header.
10 .\" %%%LICENSE_END
11 .\"
12 .\" 2002-10-30, Michael Kerrisk, <mtk.manpages@gmail.com>
13 .\" Added description of SO_ACCEPTCONN
14 .\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
15 .\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
16 .\" Added notes on capability requirements
17 .\" A few small grammar fixes
18 .\" 2010-06-13 Jan Engelhardt <jengelh@medozas.de>
19 .\" Documented SO_DOMAIN and SO_PROTOCOL.
20 .\" FIXME
21 .\" The following are not yet documented:
22 .\" SO_PEERNAME (2.4?)
23 .\" get only
24 .\" Seems to do something similar to getpeername(), but then
25 .\" why is it necessary / how does it differ?
26 .\" SO_TIMESTAMPNS (2.6.22)
27 .\" Documentation/networking/timestamping.txt
28 .\" commit 92f37fd2ee805aa77925c1e64fd56088b46094fc
29 .\" Author: Eric Dumazet <dada1@cosmosbay.com>
30 .\" SO_TIMESTAMPING (2.6.30)
31 .\" Documentation/networking/timestamping.txt
32 .\" commit cb9eff097831007afb30d64373f29d99825d0068
33 .\" Author: Patrick Ohly <patrick.ohly@intel.com>
34 .\" SO_WIFI_STATUS (3.3)
35 .\" commit 6e3e939f3b1bf8534b32ad09ff199d88800835a0
36 .\" Author: Johannes Berg <johannes.berg@intel.com>
37 .\" Also: SCM_WIFI_STATUS
38 .\" SO_NOFCS (3.4)
39 .\" commit 3bdc0eba0b8b47797f4a76e377dd8360f317450f
40 .\" Author: Ben Greear <greearb@candelatech.com>
41 .\" SO_GET_FILTER (3.8)
42 .\" commit a8fc92778080c845eaadc369a0ecf5699a03bef0
43 .\" Author: Pavel Emelyanov <xemul@parallels.com>
44 .\" SO_SELECT_ERR_QUEUE (3.10)
45 .\" commit 7d4c04fc170087119727119074e72445f2bb192b
46 .\" Author: Keller, Jacob E <jacob.e.keller@intel.com>
47 .\" SO_MAX_PACING_RATE (3.13)
48 .\" commit 62748f32d501f5d3712a7c372bbb92abc7c62bc7
49 .\" Author: Eric Dumazet <edumazet@google.com>
50 .\" SO_BPF_EXTENSIONS (3.14)
51 .\" commit ea02f9411d9faa3553ed09ce0ec9f00ceae9885e
52 .\" Author: Michal Sekletar <msekleta@redhat.com>
53 .\"
54 .TH SOCKET 7 2016-03-15 Linux "Linux Programmer's Manual"
55 .SH NAME
56 socket \- Linux socket interface
57 .SH SYNOPSIS
58 .B #include <sys/socket.h>
59 .sp
60 .IB sockfd " = socket(int " socket_family ", int " socket_type ", int " protocol );
61 .SH DESCRIPTION
62 This manual page describes the Linux networking socket layer user
63 interface.
64 The BSD compatible sockets
65 are the uniform interface
66 between the user process and the network protocol stacks in the kernel.
67 The protocol modules are grouped into
68 .I protocol families
69 such as
70 .BR AF_INET ", " AF_IPX ", and " AF_PACKET ,
71 and
72 .I socket types
73 such as
74 .B SOCK_STREAM
75 or
76 .BR SOCK_DGRAM .
77 See
78 .BR socket (2)
79 for more information on families and types.
80 .SS Socket-layer functions
81 These functions are used by the user process to send or receive packets
82 and to do other socket operations.
83 For more information see their respective manual pages.
84
85 .BR socket (2)
86 creates a socket,
87 .BR connect (2)
88 connects a socket to a remote socket address,
89 the
90 .BR bind (2)
91 function binds a socket to a local socket address,
92 .BR listen (2)
93 tells the socket that new connections shall be accepted, and
94 .BR accept (2)
95 is used to get a new socket with a new incoming connection.
96 .BR socketpair (2)
97 returns two connected anonymous sockets (implemented only for a few
98 local families like
99 .BR AF_UNIX ).
100 .PP
101 .BR send (2),
102 .BR sendto (2),
103 and
104 .BR sendmsg (2)
105 send data over a socket, and
106 .BR recv (2),
107 .BR recvfrom (2),
108 .RB "and " recvmsg (2)
109 receive data from a socket.
110 .BR poll (2)
111 and
112 .BR select (2)
113 wait for arriving data or a readiness to send data.
114 In addition, the standard I/O operations like
115 .BR write (2),
116 .BR writev (2),
117 .BR sendfile (2),
118 .BR read (2),
119 and
120 .BR readv (2)
121 can be used to read and write data.
122 .PP
123 .BR getsockname (2)
124 returns the local socket address and
125 .BR getpeername (2)
126 returns the remote socket address.
127 .BR getsockopt (2)
128 and
129 .BR setsockopt (2)
130 are used to set or get socket layer or protocol options.
131 .BR ioctl (2)
132 can be used to set or read some other options.
133 .PP
134 .BR close (2)
135 is used to close a socket.
136 .BR shutdown (2)
137 closes parts of a full-duplex socket connection.
138 .PP
139 Seeking, or calling
140 .BR pread (2)
141 or
142 .BR pwrite (2)
143 with a nonzero position is not supported on sockets.
144 .PP
145 It is possible to do nonblocking I/O on sockets by setting the
146 .B O_NONBLOCK
147 flag on a socket file descriptor using
148 .BR fcntl (2).
149 Then all operations that would block will (usually)
150 return with
151 .B EAGAIN
152 (operation should be retried later);
153 .BR connect (2)
154 will return an
155 .B EINPROGRESS
156 error.
157 The user can then wait for various events via
158 .BR poll (2)
159 or
160 .BR select (2).
161 .TS
162 tab(:) allbox;
163 c s s
164 l l l.
165 I/O events
166 Event:Poll flag:Occurrence
167 Read:POLLIN:T{
168 New data arrived.
169 T}
170 Read:POLLIN:T{
171 A connection setup has been completed
172 (for connection-oriented sockets)
173 T}
174 Read:POLLHUP:T{
175 A disconnection request has been initiated by the other end.
176 T}
177 Read:POLLHUP:T{
178 A connection is broken (only for connection-oriented protocols).
179 When the socket is written to,
180 .B SIGPIPE
181 is also sent.
182 T}
183 Write:POLLOUT:T{
184 Socket has enough send buffer space for writing new data.
185 T}
186 Read/Write:T{
187 POLLIN |
188 .br
189 POLLOUT
190 T}:T{
191 An outgoing
192 .BR connect (2)
193 finished.
194 T}
195 Read/Write:POLLERR:An asynchronous error occurred.
196 Read/Write:POLLHUP:The other end has shut down one direction.
197 Exception:POLLPRI:T{
198 Urgent data arrived.
199 .B SIGURG
200 is sent then.
201 T}
202 .\" FIXME . The following is not true currently:
203 .\" It is no I/O event when the connection
204 .\" is broken from the local end using
205 .\" .BR shutdown (2)
206 .\" or
207 .\" .BR close (2).
208 .TE
209 .PP
210 An alternative to
211 .BR poll (2)
212 and
213 .BR select (2)
214 is to let the kernel inform the application about events
215 via a
216 .B SIGIO
217 signal.
218 For that the
219 .B O_ASYNC
220 flag must be set on a socket file descriptor via
221 .BR fcntl (2)
222 and a valid signal handler for
223 .B SIGIO
224 must be installed via
225 .BR sigaction (2).
226 See the
227 .I Signals
228 discussion below.
229 .SS Socket address structures
230 Each socket domain has its own format for socket addresses,
231 with a domain-specific address structure.
232 Each of these structures begins with an
233 integer "family" field (typed as
234 .IR sa_family_t )
235 that indicates the type of the address structure.
236 This allows
237 the various system calls (e.g.,
238 .BR connect (2),
239 .BR bind (2),
240 .BR accept (2),
241 .BR getsockname (2),
242 .BR getpeername (2)),
243 which are generic to all socket domains,
244 to determine the domain of a particular socket address.
245
246 To allow any type of socket address to be passed to
247 interfaces in the sockets API,
248 the type
249 .IR "struct sockaddr"
250 is defined.
251 The purpose of this type is purely to allow casting of
252 domain-specific socket address types to a "generic" type,
253 so as to avoid compiler warnings about type mismatches in
254 calls to the sockets API.
255
256 In addition, the sockets API provides the data type
257 .IR "struct sockaddr_storage" .
258 This type
259 is suitable to accommodate all supported domain-specific socket
260 address structures; it is large enough and is aligned properly.
261 (In particular, it is large enough to hold
262 IPv6 socket addresses.)
263 The structure includes the following field, which can be used to identify
264 the type of socket address actually stored in the structure:
265
266 .in +4n
267 .nf
268 sa_family_t ss_family;
269 .fi
270 .in
271
272 The
273 .I sockaddr_storage
274 structure is useful in programs that must handle socket addresses
275 in a generic way
276 (e.g., programs that must deal with both IPv4 and IPv6 socket addresses).
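.PP
As a small sketch of the generic handling described above,
the following hypothetical helper (local_address_family() is an
illustrative name, not an API function) retrieves a socket's local
address into a sockaddr_storage buffer and inspects ss_family:

```c
#include <sys/socket.h>

/* Fetch the local address of fd into a buffer large enough for any
 * supported address type and return its family (for example AF_INET,
 * AF_INET6, or AF_UNIX), or -1 if getsockname(2) fails. */
int local_address_family(int fd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof(ss);

    /* The cast to the generic type is what struct sockaddr exists for. */
    if (getsockname(fd, (struct sockaddr *)&ss, &len) == -1)
        return -1;
    return ss.ss_family;
}
```

After checking the family, a program can cast the same buffer to the
matching domain-specific structure.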
277 .SS Socket options
278 The socket options listed below can be set by using
279 .BR setsockopt (2)
280 and read with
281 .BR getsockopt (2)
282 with the socket level set to
283 .B SOL_SOCKET
284 for all sockets.
285 Unless otherwise noted,
286 .I optval
287 is a pointer to an
288 .IR int .
289 .\" FIXME .
290 .\" In the list below, the text used to describe argument types
291 .\" for each socket option should be more consistent
292 .\"
293 .\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
294 .\" W R Stevens, UNPv1
295 .TP
296 .B SO_ACCEPTCONN
297 Returns a value indicating whether or not this socket has been marked
298 to accept connections with
299 .BR listen (2).
300 The value 0 indicates that this is not a listening socket;
301 the value 1 indicates that this is a listening socket.
302 This socket option is read-only.
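.IP
A minimal sketch of querying this option is shown below;
is_listening() is a hypothetical helper name.

```c
#include <sys/socket.h>

/* Returns 1 if fd has been marked with listen(2), 0 if it has not,
 * and -1 if getsockopt(2) fails. */
int is_listening(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) == -1)
        return -1;
    return val != 0;
}
```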
303 .TP
304 .BR SO_ATTACH_FILTER " (since Linux 2.2), " SO_ATTACH_BPF " (since Linux 3.19)"
305 Attach a classic BPF
306 .RB ( SO_ATTACH_FILTER )
307 or an extended BPF
308 .RB ( SO_ATTACH_BPF )
309 program to the socket for use as a filter of incoming packets.
310 A packet will be dropped if the filter program returns zero.
311 If the filter program returns a
312 non-zero value which is less than the packet's data length,
313 the packet will be truncated to the length returned.
314 If the value returned by the filter is greater than or equal to the
315 packet's data length, the packet is allowed to proceed unmodified.
316
317 The argument for
318 .BR SO_ATTACH_FILTER
319 is a
320 .I sock_fprog
321 structure, defined in
322 .IR <linux/filter.h> :
323 .sp
324 .in +4n
325 .nf
326 struct sock_fprog {
327 unsigned short len;
328 struct sock_filter *filter;
329 };
330 .fi
331 .in
332 .IP
333 The argument for
334 .BR SO_ATTACH_BPF
335 is a file descriptor returned by the
336 .BR bpf (2)
337 system call and must refer to a program of type
338 .BR BPF_PROG_TYPE_SOCKET_FILTER .
339
340 These options may be set multiple times for a given socket,
341 each time replacing the previous filter program.
342 The classic and extended versions may be called on the same socket,
343 but the previous filter will always be replaced such that a socket
344 never has more than one filter defined.
345
346 Both classic and extended BPF are explained in the kernel source file
347 .IR Documentation/networking/filter.txt .
348 .TP
349 .BR SO_ATTACH_REUSEPORT_CBPF ", " SO_ATTACH_REUSEPORT_EBPF
350 For use with the
351 .BR SO_REUSEPORT
352 option, these options allow the user to set a classic BPF
353 .RB ( SO_ATTACH_REUSEPORT_CBPF )
354 or an extended BPF
355 .RB ( SO_ATTACH_REUSEPORT_EBPF )
356 program which defines how packets are assigned to
357 the sockets in the reuseport group (that is, all sockets which have
358 .BR SO_REUSEPORT
359 set and are using the same local address to receive packets).
360
361 The BPF program must return an index between 0 and N\-1 representing
362 the socket which should receive the packet
363 (where N is the number of sockets in the group).
364 If the BPF program returns an invalid index,
365 socket selection will fall back to the plain
366 .BR SO_REUSEPORT
367 mechanism.
368
369 Sockets are numbered in the order in which they are added to the group
370 (that is, the order of
371 .BR bind (2)
372 calls for UDP sockets or the order of
373 .BR listen (2)
374 calls for TCP sockets).
375 New sockets added to a reuseport group will inherit the BPF program.
376 When a socket is removed from a reuseport group (via
377 .BR close (2)),
378 the last socket in the group will be moved into the closed socket's
379 position.
380
381 These options may be set repeatedly at any time on any socket in the group
382 to replace the current BPF program used by all sockets in the group.
383
384 .BR SO_ATTACH_REUSEPORT_CBPF
385 takes the same argument type as
386 .BR SO_ATTACH_FILTER
387 and
388 .BR SO_ATTACH_REUSEPORT_EBPF
389 takes the same argument type as
390 .BR SO_ATTACH_BPF .
391
392 UDP support for this feature is available since Linux 4.5;
393 TCP support is available since Linux 4.6.
394 .TP
395 .B SO_BINDTODEVICE
396 Bind this socket to a particular device like \(lqeth0\(rq,
397 as specified in the passed interface name.
398 If the
399 name is an empty string or the option length is zero, the socket device
400 binding is removed.
401 The passed option is a variable-length null-terminated
402 interface name string with the maximum size of
403 .BR IFNAMSIZ .
404 If a socket is bound to an interface,
405 only packets received from that particular interface are processed by the
406 socket.
407 Note that this works only for some socket types, particularly
408 .B AF_INET
409 sockets.
410 It is not supported for packet sockets (use normal
411 .BR bind (2)
412 there).
413
414 Before Linux 3.8,
415 this socket option could be set, but could not be retrieved with
416 .BR getsockopt (2).
417 Since Linux 3.8, it is readable.
418 The
419 .I optlen
420 argument should contain the buffer size available
421 to receive the device name and is recommended to be
422 .B IFNAMSIZ
423 bytes.
424 The real device name length is reported back in the
425 .I optlen
426 argument.
427 .TP
428 .B SO_BROADCAST
429 Set or get the broadcast flag.
430 When enabled, datagram sockets are allowed to send
431 packets to a broadcast address.
432 This option has no effect on stream-oriented sockets.
433 .TP
434 .B SO_BSDCOMPAT
435 Enable BSD bug-to-bug compatibility.
436 This is used by the UDP protocol module in Linux 2.0 and 2.2.
437 If enabled, ICMP errors received for a UDP socket will not be passed
438 to the user program.
439 In later kernel versions, support for this option has been phased out:
440 Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
441 (printk()) if a program uses this option.
442 Linux 2.0 also enabled BSD bug-to-bug compatibility
443 options (random header changing, skipping of the broadcast flag) for raw
444 sockets with this option, but that was removed in Linux 2.2.
445 .TP
446 .B SO_DEBUG
447 Enable socket debugging.
448 Only allowed for processes with the
449 .B CAP_NET_ADMIN
450 capability or an effective user ID of 0.
451 .TP
452 .BR SO_DETACH_FILTER " (since Linux 2.2), " SO_DETACH_BPF " (since Linux 3.19)"
453 These two options, which are synonyms,
454 may be used to remove the classic or extended BPF
455 program attached to a socket with either
456 .BR SO_ATTACH_FILTER
457 or
458 .BR SO_ATTACH_BPF .
459 The option value is ignored.
460 .TP
461 .BR SO_DOMAIN " (since Linux 2.6.32)"
462 Retrieves the socket domain as an integer, returning a value such as
463 .BR AF_INET6 .
464 See
465 .BR socket (2)
466 for details.
467 This socket option is read-only.
468 .TP
469 .B SO_ERROR
470 Get and clear the pending socket error.
471 This socket option is read-only.
472 Expects an integer.
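.IP
A common use, described in connect(2), is to read this option once a
nonblocking connect reports the socket as writable.
The helper below is a minimal sketch; take_socket_error() is a
hypothetical name.

```c
#include <sys/socket.h>

/* Fetch and clear the pending error on fd.
 * Returns the errno-style error code (0 means no pending error),
 * or -1 if getsockopt(2) itself fails. */
int take_socket_error(int fd)
{
    int err = 0;
    socklen_t len = sizeof(err);

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    return err;
}
```

Because reading the option clears it, the value should be saved if it
is needed more than once.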
473 .TP
474 .B SO_DONTROUTE
475 Don't send via a gateway, send only to directly connected hosts.
476 The same effect can be achieved by setting the
477 .B MSG_DONTROUTE
478 flag on a socket
479 .BR send (2)
480 operation.
481 Expects an integer boolean flag.
482 .TP
483 .B SO_KEEPALIVE
484 Enable sending of keep-alive messages on connection-oriented sockets.
485 Expects an integer boolean flag.
486 .TP
487 .B SO_LINGER
488 Sets or gets the
489 .B SO_LINGER
490 option.
491 The argument is a
492 .I linger
493 structure.
494 .sp
495 .in +4n
496 .nf
497 struct linger {
498 int l_onoff; /* linger active */
499 int l_linger; /* how many seconds to linger for */
500 };
501 .fi
502 .in
503 .IP
504 When enabled, a
505 .BR close (2)
506 or
507 .BR shutdown (2)
508 will not return until all queued messages for the socket have been
509 successfully sent or the linger timeout has been reached.
510 Otherwise,
511 the call returns immediately and the closing is done in the background.
512 When the socket is closed as part of
513 .BR exit (2),
514 it always lingers in the background.
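.IP
A minimal sketch of enabling the option follows;
enable_linger() is a hypothetical helper name.

```c
#include <sys/socket.h>

/* Ask close(2) and shutdown(2) on fd to wait up to `seconds` for
 * queued data to be sent. Returns 0 on success, -1 on error. */
int enable_linger(int fd, int seconds)
{
    struct linger lg = { .l_onoff = 1, .l_linger = seconds };

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```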
515 .TP
516 .B SO_LOCK_FILTER
517 .\" commit d59577b6ffd313d0ab3be39cb1ab47e29bdc9182
518 When set, this option will prevent
519 changing the filters associated with the socket.
520 These filters include any set using the socket options
521 .BR SO_ATTACH_FILTER ,
522 .BR SO_ATTACH_BPF ,
523 .BR SO_ATTACH_REUSEPORT_CBPF ,
524 and
525 .BR SO_ATTACH_REUSEPORT_EBPF .
526
527 The typical use case is for a privileged process to set up a raw socket
528 (an operation that requires the
529 .BR CAP_NET_RAW
530 capability), apply a restrictive filter, set the
531 .BR SO_LOCK_FILTER
532 option,
533 and then either drop its privileges or pass the socket file descriptor
534 to an unprivileged process via a UNIX domain socket.
535
536 Once the
537 .BR SO_LOCK_FILTER
538 option has been enabled, attempts to change or remove the filter
539 attached to a socket, or to disable the
540 .BR SO_LOCK_FILTER
541 option will fail with the error
542 .BR EPERM .
543 .TP
544 .BR SO_MARK " (since Linux 2.6.25)"
545 .\" commit 4a19ec5800fc3bb64e2d87c4d9fdd9e636086fe0
546 .\" and 914a9ab386a288d0f22252fc268ecbc048cdcbd5
547 Set the mark for each packet sent through this socket
548 (similar to the netfilter MARK target but socket-based).
549 Changing the mark can be used for mark-based
550 routing without netfilter or for packet filtering.
551 Setting this option requires the
552 .B CAP_NET_ADMIN
553 capability.
554 .TP
555 .B SO_OOBINLINE
556 If this option is enabled,
557 out-of-band data is directly placed into the receive data stream.
558 Otherwise, out-of-band data is passed only when the
559 .B MSG_OOB
560 flag is set during receiving.
561 .\" don't document it because it can do too much harm.
562 .\".B SO_NO_CHECK
563 .\" The kernel has support for the SO_NO_CHECK socket
564 .\" option (boolean: 0 == default, calculate checksum on xmit,
565 .\" 1 == do not calculate checksum on xmit).
566 .\" Additional note from Andi Kleen on SO_NO_CHECK (2010-08-30)
567 .\" On Linux UDP checksums are essentially free and there's no reason
568 .\" to turn them off and it would disable another safety line.
569 .\" That is why I didn't document the option.
570 .TP
571 .B SO_PASSCRED
572 Enable or disable the receiving of the
573 .B SCM_CREDENTIALS
574 control message.
575 For more information see
576 .BR unix (7).
577 .\" FIXME Document SO_PASSSEC, added in 2.6.18; there is some info
578 .\" in the 2.6.18 ChangeLog
579 .TP
580 .BR SO_PEEK_OFF " (since Linux 3.4)"
581 .\" commit ef64a54f6e558155b4f149bb10666b9e914b6c54
582 This option, which is currently supported only for
583 .BR unix (7)
584 sockets, sets the value of the "peek offset" for the
585 .BR recv (2)
586 system call when used with the
587 .BR MSG_PEEK
588 flag.
589
590 When this option is set to a negative value
591 (it is set to \-1 for all new sockets),
592 traditional behavior is provided:
593 .BR recv (2)
594 with the
595 .BR MSG_PEEK
596 flag will peek data from the front of the queue.
597
598 When the option is set to a value greater than or equal to zero,
599 then the next peek at data queued in the socket will occur at
600 the byte offset specified by the option value.
601 At the same time, the "peek offset" will be
602 incremented by the number of bytes that were peeked from the queue,
603 so that a subsequent peek will return the next data in the queue.
604
605 If data is removed from the front of the queue via a call to
606 .BR recv (2)
607 (or similar) without the
608 .BR MSG_PEEK
609 flag, the "peek offset" will be decreased by the number of bytes removed.
610 In other words, receiving data without the
611 .B MSG_PEEK
612 flag will cause the "peek offset" to be adjusted to maintain
613 the correct relative position in the queued data,
614 so that a subsequent peek will retrieve the data that would have been
615 retrieved had the data not been removed.
616
617 For datagram sockets, if the "peek offset" points to the middle of a packet,
618 the data returned will be marked with the
619 .BR MSG_TRUNC
620 flag.
621
622 The following example serves to illustrate the use of
623 .BR SO_PEEK_OFF .
624 Suppose a stream socket has the following queued input data:
625
626 aabbccddeeff
627 .IP
628 The following sequence of
629 .BR recv (2)
630 calls would have the effect noted in the comments:
631
632 .in +4n
633 .nf
634 int ov = 4; // Set peek offset to 4
635 setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &ov, sizeof(ov));
636
637 recv(fd, buf, 2, MSG_PEEK); // Peeks "cc"; offset set to 6
638 recv(fd, buf, 2, MSG_PEEK); // Peeks "dd"; offset set to 8
639 recv(fd, buf, 2, 0); // Reads "aa"; offset set to 6
640 recv(fd, buf, 2, MSG_PEEK); // Peeks "ee"; offset set to 8
641 .fi
642 .in
643 .TP
644 .B SO_PEERCRED
645 Return the credentials of the foreign process connected to this socket.
646 This is possible only for connected
647 .B AF_UNIX
648 stream sockets and
649 .B AF_UNIX
650 stream and datagram socket pairs created using
651 .BR socketpair (2);
652 see
653 .BR unix (7).
654 The returned credentials are those that were in effect at the time
655 of the call to
656 .BR connect (2)
657 or
658 .BR socketpair (2).
659 The argument is a
660 .I ucred
661 structure; define the
662 .B _GNU_SOURCE
663 feature test macro to obtain the definition of that structure from
664 .IR <sys/socket.h> .
665 This socket option is read-only.
666 .TP
667 .B SO_PRIORITY
668 Set the protocol-defined priority for all packets to be sent on
669 this socket.
670 Linux uses this value to order the networking queues:
671 packets with a higher priority may be processed first depending
672 on the selected device queueing discipline.
673 .\" For
674 .\" .BR ip (7),
675 .\" this also sets the IP type-of-service (TOS) field for outgoing packets.
676 Setting a priority outside the range 0 to 6 requires the
677 .B CAP_NET_ADMIN
678 capability.
679 .TP
680 .BR SO_PROTOCOL " (since Linux 2.6.32)"
681 Retrieves the socket protocol as an integer, returning a value such as
682 .BR IPPROTO_SCTP .
683 See
684 .BR socket (2)
685 for details.
686 This socket option is read-only.
687 .TP
688 .B SO_RCVBUF
689 Sets or gets the maximum socket receive buffer in bytes.
690 The kernel doubles this value (to allow space for bookkeeping overhead)
691 when it is set using
692 .\" Most (all?) other implementations do not do this -- MTK, Dec 05
693 .BR setsockopt (2),
694 and this doubled value is returned by
695 .BR getsockopt (2).
696 .\" The following thread on LKML is quite informative:
697 .\" getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behavior
698 .\" 17 July 2012
699 .\" http://thread.gmane.org/gmane.linux.kernel/1328935
700 The default value is set by the
701 .I /proc/sys/net/core/rmem_default
702 file, and the maximum allowed value is set by the
703 .I /proc/sys/net/core/rmem_max
704 file.
705 The minimum (doubled) value for this option is 256.
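.IP
Because the value read back differs from the value requested,
it can help to query the option after setting it.
The following sketch does both; request_rcvbuf() is a hypothetical
helper name.

```c
#include <sys/socket.h>

/* Request a receive buffer of `bytes` and return the size that
 * getsockopt(2) reports afterwards, or -1 on error.
 * On Linux the reported value is normally twice the request,
 * subject to the rmem_max limit and the minimum noted above. */
int request_rcvbuf(int fd, int bytes)
{
    int got = 0;
    socklen_t len = sizeof(got);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) == -1)
        return -1;
    return got;
}
```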
706 .TP
707 .BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
708 Using this socket option, a privileged
709 .RB ( CAP_NET_ADMIN )
710 process can perform the same task as
711 .BR SO_RCVBUF ,
712 but the
713 .I rmem_max
714 limit can be overridden.
715 .TP
716 .BR SO_RCVLOWAT " and " SO_SNDLOWAT
717 Specify the minimum number of bytes in the buffer until the socket layer
718 will pass the data to the protocol
719 .RB ( SO_SNDLOWAT )
720 or the user on receiving
721 .RB ( SO_RCVLOWAT ).
722 These two values are initialized to 1.
723 .B SO_SNDLOWAT
724 is not changeable on Linux
725 .RB ( setsockopt (2)
726 fails with the error
727 .BR ENOPROTOOPT ).
728 .B SO_RCVLOWAT
729 is changeable
730 only since Linux 2.4.
731 The
732 .BR select (2)
733 and
734 .BR poll (2)
735 system calls currently do not respect the
736 .B SO_RCVLOWAT
737 setting on Linux,
738 and mark a socket readable when even a single byte of data is available.
739 A subsequent read from the socket will block until
740 .B SO_RCVLOWAT
741 bytes are available.
742 .\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=111049368106984&w=2
743 .\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
744 .TP
745 .BR SO_RCVTIMEO " and " SO_SNDTIMEO
746 .\" Not implemented in 2.0.
747 .\" Implemented in 2.1.11 for getsockopt: always return a zero struct.
748 .\" Implemented in 2.3.41 for setsockopt, and actually used.
749 Specify the receiving or sending timeouts until reporting an error.
750 The argument is a
751 .IR "struct timeval" .
752 If an input or output function blocks for this period of time, and
753 data has been sent or received, the return value of that function
754 will be the amount of data transferred; if no data has been transferred
755 and the timeout has been reached, then \-1 is returned with
756 .I errno
757 set to
758 .BR EAGAIN
759 or
760 .BR EWOULDBLOCK ,
761 .\" in fact to EAGAIN
762 or
763 .B EINPROGRESS
764 (for
765 .BR connect (2))
766 just as if the socket was specified to be nonblocking.
767 If the timeout is set to zero (the default),
768 then the operation will never time out.
769 Timeouts only have effect for system calls that perform socket I/O (e.g.,
770 .BR read (2),
771 .BR recvmsg (2),
772 .BR send (2),
773 .BR sendmsg (2));
774 timeouts have no effect for
775 .BR select (2),
776 .BR poll (2),
777 .BR epoll_wait (2),
778 and so on.
779 .TP
780 .B SO_REUSEADDR
781 .\" commit c617f398edd4db2b8567a28e899a88f8f574798d
782 .\" https://lwn.net/Articles/542629/
783 Indicates that the rules used in validating addresses supplied in a
784 .BR bind (2)
785 call should allow reuse of local addresses.
786 For
787 .B AF_INET
788 sockets this
789 means that a socket may bind, except when there
790 is an active listening socket bound to the address.
791 When the listening socket is bound to
792 .B INADDR_ANY
793 with a specific port then it is not possible
794 to bind to this port for any local address.
795 Argument is an integer boolean flag.
796 .TP
797 .BR SO_REUSEPORT " (since Linux 3.9)"
798 Permits multiple
799 .B AF_INET
800 or
801 .B AF_INET6
802 sockets to be bound to an identical socket address.
803 This option must be set on each socket (including the first socket)
804 prior to calling
805 .BR bind (2)
806 on the socket.
807 To prevent port hijacking,
808 all of the processes binding to the same address must have the same
809 effective UID.
810 This option can be employed with both TCP and UDP sockets.
811
812 For TCP sockets, this option allows
813 .BR accept (2)
814 load distribution in a multi-threaded server to be improved by
815 using a distinct listener socket for each thread.
816 This provides improved load distribution as compared
817 to traditional techniques such as using a single
818 .BR accept (2)ing
819 thread that distributes connections,
820 or having multiple threads that compete to
821 .BR accept (2)
822 from the same socket.
823
824 For UDP sockets,
825 the use of this option can provide better distribution
826 of incoming datagrams to multiple processes (or threads) as compared
827 to the traditional technique of having multiple processes
828 compete to receive datagrams on the same socket.
829 .TP
830 .BR SO_RXQ_OVFL " (since Linux 2.6.33)"
831 .\" commit 3b885787ea4112eaa80945999ea0901bf742707f
832 Indicates that an unsigned 32-bit value ancillary message (cmsg)
833 should be attached to received skbs indicating
834 the number of packets dropped by the socket between
835 the last received packet and this received packet.
836 .TP
837 .B SO_SNDBUF
838 Sets or gets the maximum socket send buffer in bytes.
839 The kernel doubles this value (to allow space for bookkeeping overhead)
840 when it is set using
841 .\" Most (all?) other implementations do not do this -- MTK, Dec 05
842 .\" See also the comment to SO_RCVBUF (17 Jul 2012 LKML mail)
843 .BR setsockopt (2),
844 and this doubled value is returned by
845 .BR getsockopt (2).
846 The default value is set by the
847 .I /proc/sys/net/core/wmem_default
848 file and the maximum allowed value is set by the
849 .I /proc/sys/net/core/wmem_max
850 file.
851 The minimum (doubled) value for this option is 2048.
852 .TP
853 .BR SO_SNDBUFFORCE " (since Linux 2.6.14)"
854 Using this socket option, a privileged
855 .RB ( CAP_NET_ADMIN )
856 process can perform the same task as
857 .BR SO_SNDBUF ,
858 but the
859 .I wmem_max
860 limit can be overridden.
861 .TP
862 .B SO_TIMESTAMP
863 Enable or disable the receiving of the
864 .B SO_TIMESTAMP
865 control message.
866 The timestamp control message is sent with level
867 .B SOL_SOCKET
868 and the
869 .I cmsg_data
870 field is a
871 .I "struct timeval"
872 indicating the
873 reception time of the last packet passed to the user in this call.
874 See
875 .BR cmsg (3)
876 for details on control messages.
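.IP
Reading the timestamp requires recvmsg(2) with a control buffer.
The sketch below assumes the option has already been enabled on the
socket; recv_with_timestamp() is a hypothetical helper name.

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/uio.h>

/* Receive one message into buf and copy out its timestamp control
 * message, if any. Returns the byte count from recvmsg(2) or -1;
 * *have_ts is set to 1 when *ts was filled in, else 0. */
ssize_t recv_with_timestamp(int fd, void *buf, size_t len,
                            struct timeval *ts, int *have_ts)
{
    /* The union keeps the control buffer aligned for struct cmsghdr. */
    union {
        char buf[CMSG_SPACE(sizeof(struct timeval))];
        struct cmsghdr align;
    } control;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    *have_ts = 0;
    n = recvmsg(fd, &msg, 0);
    if (n == -1)
        return -1;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SO_TIMESTAMP) {
            memcpy(ts, CMSG_DATA(cmsg), sizeof(*ts));
            *have_ts = 1;
        }
    }
    return n;
}
```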
877 .TP
878 .B SO_TYPE
879 Gets the socket type as an integer (e.g.,
880 .BR SOCK_STREAM ).
881 This socket option is read-only.
882 .TP
883 .BR SO_BUSY_POLL " (since Linux 3.11)"
884 Sets the approximate time in microseconds to busy poll on a blocking receive
885 when there is no data.
886 Increasing this value requires
887 .BR CAP_NET_ADMIN .
888 The default for this option is controlled by the
889 .I /proc/sys/net/core/busy_read
890 file.
891
892 The value in the
893 .I /proc/sys/net/core/busy_poll
894 file determines how long
895 .BR select (2)
896 and
897 .BR poll (2)
898 will busy poll when they operate on sockets with
899 .BR SO_BUSY_POLL
900 set and no events to report are found.
901
902 In both cases,
903 busy polling will only be done when the socket last received data
904 from a network device that supports this option.
905
906 While busy polling may improve latency of some applications,
907 care must be taken when using it since this will increase
908 both CPU utilization and power usage.
909 .SS Signals
910 When writing onto a connection-oriented socket that has been shut down
911 (by the local or the remote end)
912 .B SIGPIPE
913 is sent to the writing process and
914 .B EPIPE
915 is returned.
916 The signal is not sent when the write call
917 specified the
918 .B MSG_NOSIGNAL
919 flag.
920 .PP
921 When requested with the
922 .B F_SETOWN
923 .BR fcntl (2)
924 or
925 .B SIOCSPGRP
926 .BR ioctl (2),
927 .B SIGIO
928 is sent when an I/O event occurs.
929 It is possible to use
930 .BR poll (2)
931 or
932 .BR select (2)
933 in the signal handler to find out which socket the event occurred on.
934 An alternative (since Linux 2.2) is to set a real-time signal using the
935 .B F_SETSIG
936 .BR fcntl (2);
937 the handler of the real-time signal will be called with
938 the file descriptor in the
939 .I si_fd
940 field of its
941 .IR siginfo_t .
942 See
943 .BR fcntl (2)
944 for more information.
945 .PP
946 Under some circumstances (e.g., multiple processes accessing a
947 single socket), the condition that caused the
948 .B SIGIO
949 may have already disappeared when the process reacts to the signal.
950 If this happens, the process should wait again because Linux
951 will resend the signal later.
952 .\" .SS Ancillary messages
.SS /proc interfaces
The core socket networking parameters can be accessed
via files in the directory
.IR /proc/sys/net/core/ .
.TP
.I rmem_default
contains the default setting in bytes of the socket receive buffer.
.TP
.I rmem_max
contains the maximum socket receive buffer size in bytes which a user may
set by using the
.B SO_RCVBUF
socket option.
.TP
.I wmem_default
contains the default setting in bytes of the socket send buffer.
.TP
.I wmem_max
contains the maximum socket send buffer size in bytes which a user may
set by using the
.B SO_SNDBUF
socket option.
.TP
.IR message_cost " and " message_burst
configure the token bucket filter used to rate-limit warning messages
caused by external network events.
.TP
.I netdev_max_backlog
contains the maximum number of packets in the global input queue.
.TP
.I optmem_max
contains the maximum length, per socket, of ancillary data and
user control data such as iovecs.
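.PP
Each of these files holds a decimal integer, so a program can read
a limit before choosing a buffer size.
The following sketch assumes that; the helper name
.I read_core_param
is hypothetical, and the files may be absent in restricted
environments, in which case \-1 is returned:

.in +4n
.nf
```c
#include <stdio.h>

/* Hypothetical helper: read one integer parameter from
   /proc/sys/net/core/.  Returns -1 if the file cannot be
   opened or does not start with an integer. */
long
read_core_param(const char *name)
{
    char path[128];
    long value = -1;
    FILE *fp;

    snprintf(path, sizeof(path), "/proc/sys/net/core/%s", name);

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;

    fclose(fp);
    return value;
}
```
.fi
.in
.PP
For example,
.I read_core_param("rmem_max")
would give the largest value that an unprivileged
.B SO_RCVBUF
request can obtain.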
.\" netdev_fastroute is not documented because it is experimental
.SS Ioctls
These operations can be accessed using
.BR ioctl (2):

.in +4n
.nf
.IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
.fi
.in
.TP
.B SIOCGSTAMP
Return a
.I struct timeval
with the receive timestamp of the last packet passed to the user.
This is useful for accurate round-trip time measurements.
See
.BR setitimer (2)
for a description of
.IR "struct timeval" .
.\"
This ioctl should be used only if the socket option
.B SO_TIMESTAMP
is not set on the socket.
Otherwise, it returns the timestamp of the
last packet that was received while
.B SO_TIMESTAMP
was not set, or it fails if no such packet has been received
(i.e.,
.BR ioctl (2)
returns \-1 with
.I errno
set to
.BR ENOENT ).
.TP
.B SIOCSPGRP
Set the process or process group that is to receive
.B SIGIO
or
.B SIGURG
signals when I/O becomes possible or urgent data is available.
The argument is a pointer to a
.IR pid_t .
For further details, see the description of
.B F_SETOWN
in
.BR fcntl (2).
.TP
.B FIOASYNC
Change the
.B O_ASYNC
flag to enable or disable asynchronous I/O mode of the socket.
Asynchronous I/O mode means that the
.B SIGIO
signal or the signal set with
.B F_SETSIG
is raised when a new I/O event occurs.
.IP
The argument is an integer boolean flag.
(This operation is synonymous with the use of
.BR fcntl (2)
to set the
.B O_ASYNC
flag.)
.\"
.TP
.B SIOCGPGRP
Get the current process or process group that receives
.B SIGIO
or
.B SIGURG
signals,
or 0
when none is set.
.PP
The following
.BR ioctl (2)
operations are synonyms for the ones above:
.TP
.B FIOGETOWN
The same as the
.B SIOCGPGRP
.BR ioctl (2).
.TP
.B FIOSETOWN
The same as the
.B SIOCSPGRP
.BR ioctl (2).
.SH VERSIONS
.B SO_BINDTODEVICE
was introduced in Linux 2.0.30.
.B SO_PASSCRED
is new in Linux 2.2.
The
.I /proc
interfaces were introduced in Linux 2.2.
.B SO_RCVTIMEO
and
.B SO_SNDTIMEO
are supported since Linux 2.3.41.
Earlier, timeouts were fixed to
a protocol-specific setting, and could not be read or written.
.SH NOTES
Linux assumes that half of the send/receive buffer is used for internal
kernel structures; thus the values in the corresponding
.I /proc
files are twice what can be observed on the wire.
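.PP
The same accounting can be observed per socket: the kernel doubles
the value passed to
.BR setsockopt (2)
for
.BR SO_RCVBUF ,
and
.BR getsockopt (2)
reports the doubled value.
A minimal sketch, in which the helper name
.I effective_rcvbuf
is hypothetical:

.in +4n
.nf
```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of 'requested'
   bytes and return the size that getsockopt(2) then reports,
   or -1 on error. */
int
effective_rcvbuf(int fd, int requested)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;
}
```
.fi
.in
.PP
The reported size is expected to be about twice the requested one,
provided the request lies between the kernel minimum and
.IR rmem_max .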

Linux will allow port reuse with the
.B SO_REUSEADDR
option only
when this option was set both in the previous program that performed a
.BR bind (2)
to the port and in the program that wants to reuse the port.
This differs from some implementations (e.g., FreeBSD)
where only the later program needs to set the
.B SO_REUSEADDR
option.
Typically this difference is invisible, since, for example, a server
program is designed to always set this option.
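.PP
A server that follows this convention sets the option on every
socket before calling
.BR bind (2).
The following is a sketch under that assumption; the helper name
.I bind_reusable
is hypothetical, and a TCP socket bound to the wildcard address
is chosen only for illustration:

.in +4n
.nf
```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to 'port' on the
   wildcard address, with SO_REUSEADDR set before bind(2).
   Returns the descriptor, or -1 on error. */
int
bind_reusable(unsigned short port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    /* On Linux, both the earlier and the later program must do this */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                   &on, sizeof(on)) == -1) {
        close(fd);
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```
.fi
.in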
.\" .SH AUTHORS
.\" This man page was written by Andi Kleen.
.SH SEE ALSO
.BR wireshark (1),
.BR bpf (2),
.BR connect (2),
.BR getsockopt (2),
.BR setsockopt (2),
.BR socket (2),
.BR pcap (3),
.BR capabilities (7),
.BR ddp (7),
.BR ip (7),
.BR packet (7),
.BR tcp (7),
.BR udp (7),
.BR unix (7),
.BR tcpdump (8)