1 '\" t
2 .\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
3 .\" and copyright (c) 1999 Matthew Wilcox.
4 .\"
5 .\" %%%LICENSE_START(VERBATIM_ONE_PARA)
6 .\" Permission is granted to distribute possibly modified copies
7 .\" of this page provided the header is included verbatim,
8 .\" and in case of nontrivial modification author and date
9 .\" of the modification is added to the header.
10 .\" %%%LICENSE_END
11 .\"
12 .\" 2002-10-30, Michael Kerrisk, <mtk.manpages@gmail.com>
13 .\" Added description of SO_ACCEPTCONN
14 .\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
15 .\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
16 .\" Added notes on capability requirements
17 .\" A few small grammar fixes
18 .\" 2010-06-13 Jan Engelhardt <jengelh@medozas.de>
19 .\" Documented SO_DOMAIN and SO_PROTOCOL.
20 .\" FIXME
21 .\" The following are not yet documented:
22 .\" SO_PEERNAME (2.4?)
23 .\" get only
24 .\" Seems to do something similar to getpeername(), but then
25 .\" why is it necessary / how does it differ?
26 .\" SO_TIMESTAMPNS (2.6.22)
27 .\" Documentation/networking/timestamping.txt
28 .\" commit 92f37fd2ee805aa77925c1e64fd56088b46094fc
29 .\" Author: Eric Dumazet <dada1@cosmosbay.com>
30 .\" SO_TIMESTAMPING (2.6.30)
31 .\" Documentation/networking/timestamping.txt
32 .\" commit cb9eff097831007afb30d64373f29d99825d0068
33 .\" Author: Patrick Ohly <patrick.ohly@intel.com>
34 .\" SO_WIFI_STATUS (3.3)
35 .\" commit 6e3e939f3b1bf8534b32ad09ff199d88800835a0
36 .\" Author: Johannes Berg <johannes.berg@intel.com>
37 .\" Also: SCM_WIFI_STATUS
38 .\" SO_NOFCS (3.4)
39 .\" commit 3bdc0eba0b8b47797f4a76e377dd8360f317450f
40 .\" Author: Ben Greear <greearb@candelatech.com>
41 .\" SO_GET_FILTER (3.8)
42 .\" commit a8fc92778080c845eaadc369a0ecf5699a03bef0
43 .\" Author: Pavel Emelyanov <xemul@parallels.com>
44 .\" SO_SELECT_ERR_QUEUE (3.10)
45 .\" commit 7d4c04fc170087119727119074e72445f2bb192b
46 .\" Author: Keller, Jacob E <jacob.e.keller@intel.com>
47 .\" SO_MAX_PACING_RATE (3.13)
48 .\" commit 62748f32d501f5d3712a7c372bbb92abc7c62bc7
49 .\" Author: Eric Dumazet <edumazet@google.com>
50 .\" SO_BPF_EXTENSIONS (3.14)
51 .\" commit ea02f9411d9faa3553ed09ce0ec9f00ceae9885e
52 .\" Author: Michal Sekletar <msekleta@redhat.com>
53 .\"
.TH SOCKET 7 2015-05-07 Linux "Linux Programmer's Manual"
.SH NAME
socket \- Linux socket interface
.SH SYNOPSIS
.B #include <sys/socket.h>
.sp
.IB sockfd " = socket(int " socket_family ", int " socket_type ", int " protocol );
.SH DESCRIPTION
This manual page describes the Linux networking socket layer user
interface.
The BSD-compatible sockets
are the uniform interface
between the user process and the network protocol stacks in the kernel.
The protocol modules are grouped into
.I protocol families
such as
.BR AF_INET ", " AF_IPX ", and " AF_PACKET ,
and
.I socket types
such as
.B SOCK_STREAM
or
.BR SOCK_DGRAM .
See
.BR socket (2)
for more information on families and types.
.SS Socket-layer functions
These functions are used by the user process to send or receive packets
and to do other socket operations.
For more information, see their respective manual pages.

.BR socket (2)
creates a socket,
.BR connect (2)
connects a socket to a remote socket address,
the
.BR bind (2)
function binds a socket to a local socket address,
.BR listen (2)
tells the socket that new connections shall be accepted, and
.BR accept (2)
is used to get a new socket with a new incoming connection.
.BR socketpair (2)
returns two connected anonymous sockets (implemented only for a few
local families like
.BR AF_UNIX ).
.PP
.BR send (2),
.BR sendto (2),
and
.BR sendmsg (2)
send data over a socket, and
.BR recv (2),
.BR recvfrom (2),
and
.BR recvmsg (2)
receive data from a socket.
.BR poll (2)
and
.BR select (2)
wait for arriving data or a readiness to send data.
In addition, the standard I/O operations like
.BR write (2),
.BR writev (2),
.BR sendfile (2),
.BR read (2),
and
.BR readv (2)
can be used to read and write data.
.PP
.BR getsockname (2)
returns the local socket address and
.BR getpeername (2)
returns the remote socket address.
.BR getsockopt (2)
and
.BR setsockopt (2)
are used to get or set socket layer or protocol options.
.BR ioctl (2)
can be used to set or read some other options.
.PP
.BR close (2)
is used to close a socket.
.BR shutdown (2)
closes parts of a full-duplex socket connection.
.PP
Seeking, or calling
.BR pread (2)
or
.BR pwrite (2)
with a nonzero position is not supported on sockets.
.PP
It is possible to do nonblocking I/O on sockets by setting the
.B O_NONBLOCK
flag on a socket file descriptor using
.BR fcntl (2).
Then all operations that would block will (usually)
return with
.B EAGAIN
(operation should be retried later);
.BR connect (2)
will return an
.B EINPROGRESS
error.
The user can then wait for various events via
.BR poll (2)
or
.BR select (2).
.TS
tab(:) allbox;
c s s
l l l.
I/O events
Event:Poll flag:Occurrence
Read:POLLIN:T{
New data arrived.
T}
Read:POLLIN:T{
A connection setup has been completed
(for connection-oriented sockets).
T}
Read:POLLHUP:T{
A disconnection request has been initiated by the other end.
T}
Read:POLLHUP:T{
A connection is broken (only for connection-oriented protocols).
When the socket is written to,
.B SIGPIPE
is also sent.
T}
Write:POLLOUT:T{
Socket has enough send buffer space for writing new data.
T}
Read/Write:T{
POLLIN|
.br
POLLOUT
T}:T{
An outgoing
.BR connect (2)
finished.
T}
Read/Write:POLLERR:An asynchronous error occurred.
Read/Write:POLLHUP:The other end has shut down one direction.
Exception:POLLPRI:T{
Urgent data arrived.
.B SIGURG
is sent then.
T}
.\" FIXME . The following is not true currently:
.\" It is no I/O event when the connection
.\" is broken from the local end using
.\" .BR shutdown (2)
.\" or
.\" .BR close (2).
.TE
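.PP
The nonblocking setup described above can be sketched as follows.
This is a minimal illustration rather than a definitive implementation;
the helper names
.I set_nonblocking
and
.I wait_readable
are assumptions made for this example and are not part of the sockets API.
.sp
.in +4n
.nf
```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the descriptor's status flags.
   Returns 0 on success, -1 on error (errno is set by fcntl()). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: wait up to timeout_ms milliseconds for input.
   Returns 1 if POLLIN is reported, 0 if it is not (timeout, or only
   other events were reported), and -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n = poll(&pfd, 1, timeout_ms);

    if (n <= 0)
        return n;
    return (pfd.revents & POLLIN) ? 1 : 0;
}
```
.fi
.in
.PP
A caller would typically retry a
.BR recv (2)
that failed with
.B EAGAIN
once
.I wait_readable
reports that data has arrived.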
.PP
An alternative to
.BR poll (2)
and
.BR select (2)
is to let the kernel inform the application about events
via a
.B SIGIO
signal.
For that, the
.B O_ASYNC
flag must be set on a socket file descriptor via
.BR fcntl (2)
and a valid signal handler for
.B SIGIO
must be installed via
.BR sigaction (2).
See the
.I Signals
discussion below.
.SS Socket address structures
Each socket domain has its own format for socket addresses,
with a domain-specific address structure.
Each of these structures begins with an
integer "family" field (typed as
.IR sa_family_t )
that indicates the type of the address structure.
This allows
the various system calls (e.g.,
.BR connect (2),
.BR bind (2),
.BR accept (2),
.BR getsockname (2),
.BR getpeername (2)),
which are generic to all socket domains,
to determine the domain of a particular socket address.

To allow any type of socket address to be passed to
interfaces in the sockets API,
the type
.I "struct sockaddr"
is defined.
The purpose of this type is purely to allow casting of
domain-specific socket address types to a "generic" type,
so as to avoid compiler warnings about type mismatches in
calls to the sockets API.

In addition, the sockets API provides the data type
.IR "struct sockaddr_storage" .
This type
is suitable to accommodate all supported domain-specific socket
address structures; it is large enough and is aligned properly.
(In particular, it is large enough to hold
IPv6 socket addresses.)
The structure includes the following field, which can be used to identify
the type of socket address actually stored in the structure:

.in +4n
.nf
    sa_family_t ss_family;
.fi
.in

The
.I sockaddr_storage
structure is useful in programs that must handle socket addresses
in a generic way
(e.g., programs that must deal with both IPv4 and IPv6 socket addresses).
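.PP
As a minimal sketch of this generic handling, the following function
retrieves a socket's local address into a
.I sockaddr_storage
buffer and reports the family found in
.IR ss_family .
The function name
.I local_family
is an assumption made for this example.
.sp
.in +4n
.nf
```c
#include <sys/socket.h>

/* Hypothetical helper: return the address family (e.g., AF_INET,
   AF_INET6, AF_UNIX) of the local address of fd, or -1 on error. */
int local_family(int fd)
{
    struct sockaddr_storage ss;   /* large enough for any family */
    socklen_t len = sizeof(ss);

    /* The cast to the generic type only satisfies the prototype. */
    if (getsockname(fd, (struct sockaddr *) &ss, &len) == -1)
        return -1;
    return ss.ss_family;
}
```
.fi
.in
.PP
Having identified the family, a program can cast the buffer to the
matching domain-specific structure.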
.SS Socket options
The socket options listed below can be set by using
.BR setsockopt (2)
and read with
.BR getsockopt (2)
with the socket level set to
.B SOL_SOCKET
for all sockets.
Unless otherwise noted,
.I optval
is a pointer to an
.IR int .
.\" FIXME .
.\" In the list below, the text used to describe argument types
.\" for each socket option should be more consistent
.\"
.\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
.\" W R Stevens, UNPv1
.TP
.B SO_ACCEPTCONN
Returns a value indicating whether or not this socket has been marked
to accept connections with
.BR listen (2).
The value 0 indicates that this is not a listening socket;
the value 1 indicates that this is a listening socket.
This socket option is read-only.
.TP
.BR SO_ATTACH_FILTER " (since Linux 2.2), " SO_ATTACH_BPF " (since Linux 3.19)"
Attach a classic BPF
.RB ( SO_ATTACH_FILTER )
or an extended BPF
.RB ( SO_ATTACH_BPF )
program to the socket for use as a filter of incoming packets.
A packet will be dropped if the filter program returns zero.
If the filter program returns a
nonzero value which is less than the packet's data length,
the packet will be truncated to the length returned.
If the value returned by the filter is greater than or equal to the
packet's data length, the packet is allowed to proceed unmodified.

The argument for
.B SO_ATTACH_FILTER
is a
.I sock_fprog
structure, defined in
.IR <linux/filter.h> :
.sp
.in +4n
.nf
struct sock_fprog {
    unsigned short      len;
    struct sock_filter *filter;
};
.fi
.in
.IP
The argument for
.B SO_ATTACH_BPF
is a file descriptor returned by the
.BR bpf (2)
system call and must refer to a program of type
.BR BPF_PROG_TYPE_SOCKET_FILTER .

These options may be set multiple times for a given socket,
each time replacing the previous filter program.
The classic and extended versions may be called on the same socket,
but the previous filter will always be replaced such that a socket
never has more than one filter defined.

Both classic and extended BPF are explained in the kernel source file
.IR Documentation/networking/filter.txt .
.TP
.BR SO_ATTACH_REUSEPORT_CBPF ", " SO_ATTACH_REUSEPORT_EBPF
For use with the
.B SO_REUSEPORT
option, these options allow the user to set a classic BPF
.RB ( SO_ATTACH_REUSEPORT_CBPF )
or an extended BPF
.RB ( SO_ATTACH_REUSEPORT_EBPF )
program which defines how packets are assigned to
the sockets in the reuseport group (that is, all sockets which have
.B SO_REUSEPORT
set and are using the same local address to receive packets).

The BPF program must return an index between 0 and N\-1 representing
the socket which should receive the packet
(where N is the number of sockets in the group).
If the BPF program returns an invalid index,
socket selection will fall back to the plain
.B SO_REUSEPORT
mechanism.

Sockets are numbered in the order in which they are added to the group
(that is, the order of
.BR bind (2)
calls for UDP sockets or the order of
.BR listen (2)
calls for TCP sockets).
New sockets added to a reuseport group will inherit the BPF program.
When a socket is removed from a reuseport group (via
.BR close (2)),
the last socket in the group will be moved into the closed socket's
position.

These options may be set repeatedly at any time on any socket in the group
to replace the current BPF program used by all sockets in the group.

.B SO_ATTACH_REUSEPORT_CBPF
takes the same argument type as
.B SO_ATTACH_FILTER
and
.B SO_ATTACH_REUSEPORT_EBPF
takes the same argument type as
.BR SO_ATTACH_BPF .

UDP support for this feature is available since Linux 4.5;
TCP support is available since Linux 4.6.
.TP
.B SO_BINDTODEVICE
Bind this socket to a particular device like \(lqeth0\(rq,
as specified in the passed interface name.
If the
name is an empty string or the option length is zero, the socket device
binding is removed.
The passed option is a variable-length null-terminated
interface name string with the maximum size of
.BR IFNAMSIZ .
If a socket is bound to an interface,
only packets received from that particular interface are processed by the
socket.
Note that this works only for some socket types, particularly
.B AF_INET
sockets.
It is not supported for packet sockets (use normal
.BR bind (2)
there).

Before Linux 3.8,
this socket option could be set, but could not be retrieved with
.BR getsockopt (2).
Since Linux 3.8, it is readable.
The
.I optlen
argument should contain the buffer size available
to receive the device name and is recommended to be
.B IFNAMSIZ
bytes.
The real device name length is reported back in the
.I optlen
argument.
.TP
.B SO_BROADCAST
Set or get the broadcast flag.
When enabled, datagram sockets are allowed to send
packets to a broadcast address.
This option has no effect on stream-oriented sockets.
.TP
.B SO_BSDCOMPAT
Enable BSD bug-to-bug compatibility.
This is used by the UDP protocol module in Linux 2.0 and 2.2.
If enabled, ICMP errors received for a UDP socket will not be passed
to the user program.
In later kernel versions, support for this option has been phased out:
Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
(printk()) if a program uses this option.
Linux 2.0 also enabled BSD bug-to-bug compatibility
options (random header changing, skipping of the broadcast flag) for raw
sockets with this option, but that was removed in Linux 2.2.
.TP
.B SO_DEBUG
Enable socket debugging.
Only allowed for processes with the
.B CAP_NET_ADMIN
capability or an effective user ID of 0.
.TP
.BR SO_DETACH_FILTER " (since Linux 2.2), " SO_DETACH_BPF " (since Linux 3.19)"
These two options, which are synonyms,
may be used to remove the classic or extended BPF
program attached to a socket with either
.B SO_ATTACH_FILTER
or
.BR SO_ATTACH_BPF .
The option value is ignored.
.TP
.BR SO_DOMAIN " (since Linux 2.6.32)"
Retrieves the socket domain as an integer, returning a value such as
.BR AF_INET6 .
See
.BR socket (2)
for details.
This socket option is read-only.
.TP
.B SO_ERROR
Get and clear the pending socket error.
This socket option is read-only.
Expects an integer.
.TP
.B SO_DONTROUTE
Don't send via a gateway, send only to directly connected hosts.
The same effect can be achieved by setting the
.B MSG_DONTROUTE
flag on a socket
.BR send (2)
operation.
Expects an integer boolean flag.
.TP
.B SO_KEEPALIVE
Enable sending of keep-alive messages on connection-oriented sockets.
Expects an integer boolean flag.
.TP
.B SO_LINGER
Sets or gets the
.B SO_LINGER
option.
The argument is a
.I linger
structure.
.sp
.in +4n
.nf
struct linger {
    int l_onoff;    /* linger active */
    int l_linger;   /* how many seconds to linger for */
};
.fi
.in
.IP
When enabled, a
.BR close (2)
or
.BR shutdown (2)
will not return until all queued messages for the socket have been
successfully sent or the linger timeout has been reached.
Otherwise,
the call returns immediately and the closing is done in the background.
When the socket is closed as part of
.BR exit (2),
it always lingers in the background.
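.IP
A minimal sketch of enabling the option follows; the function name
.I set_linger
is an assumption made for this example.
.sp
.in +4n
.nf
```c
#include <sys/socket.h>

/* Hypothetical helper: make close(2) on fd wait up to `seconds`
   seconds for queued data to be sent.
   Returns 0 on success, -1 on error. */
int set_linger(int fd, int seconds)
{
    struct linger lg = {
        .l_onoff = 1,          /* linger active */
        .l_linger = seconds,   /* linger timeout in seconds */
    };

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```
.fi
.in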
.TP
.B SO_LOCK_FILTER
.\" commit d59577b6ffd313d0ab3be39cb1ab47e29bdc9182
When set, this option will prevent
changing the filters associated with the socket.
These filters include any set using the socket options
.BR SO_ATTACH_FILTER ,
.BR SO_ATTACH_BPF ,
.BR SO_ATTACH_REUSEPORT_CBPF ,
and
.BR SO_ATTACH_REUSEPORT_EBPF .

The typical use case is for a privileged process to set up a socket with
restrictive filters, set
.BR SO_LOCK_FILTER ,
and then either drop its privileges or pass the socket file descriptor
to an unprivileged process.

Once the
.B SO_LOCK_FILTER
option has been enabled, attempts to change or remove the filter
attached to a socket, or to disable the
.B SO_LOCK_FILTER
option will fail with the error
.BR EPERM .
.TP
.BR SO_MARK " (since Linux 2.6.25)"
.\" commit 4a19ec5800fc3bb64e2d87c4d9fdd9e636086fe0
.\" and 914a9ab386a288d0f22252fc268ecbc048cdcbd5
Set the mark for each packet sent through this socket
(similar to the netfilter MARK target but socket-based).
Changing the mark can be used for mark-based
routing without netfilter or for packet filtering.
Setting this option requires the
.B CAP_NET_ADMIN
capability.
.TP
.B SO_OOBINLINE
If this option is enabled,
out-of-band data is directly placed into the receive data stream.
Otherwise, out-of-band data is passed only when the
.B MSG_OOB
flag is set during receiving.
.\" don't document it because it can do too much harm.
.\".B SO_NO_CHECK
.\" The kernel has support for the SO_NO_CHECK socket
.\" option (boolean: 0 == default, calculate checksum on xmit,
.\" 1 == do not calculate checksum on xmit).
.\" Additional note from Andi Kleen on SO_NO_CHECK (2010-08-30)
.\" On Linux UDP checksums are essentially free and there's no reason
.\" to turn them off and it would disable another safety line.
.\" That is why I didn't document the option.
.TP
.B SO_PASSCRED
Enable or disable the receiving of the
.B SCM_CREDENTIALS
control message.
For more information, see
.BR unix (7).
.\" FIXME Document SO_PASSSEC, added in 2.6.18; there is some info
.\" in the 2.6.18 ChangeLog
.TP
.BR SO_PEEK_OFF " (since Linux 3.4)"
.\" commit ef64a54f6e558155b4f149bb10666b9e914b6c54
This option, which is currently supported only for
.BR unix (7)
sockets, sets the value of the "peek offset" for the
.BR recv (2)
system call when used with the
.B MSG_PEEK
flag.

When this option is set to a negative value
(it is set to \-1 for all new sockets),
traditional behavior is provided:
.BR recv (2)
with the
.B MSG_PEEK
flag will peek data from the front of the queue.

When the option is set to a value greater than or equal to zero,
then the next peek at data queued in the socket will occur at
the byte offset specified by the option value.
At the same time, the "peek offset" will be
incremented by the number of bytes that were peeked from the queue,
so that a subsequent peek will return the next data in the queue.

If data is removed from the front of the queue via a call to
.BR recv (2)
(or similar) without the
.B MSG_PEEK
flag, the "peek offset" will be decreased by the number of bytes removed.
In other words, receiving data without the
.B MSG_PEEK
flag will cause the "peek offset" to be adjusted to maintain
the correct relative position in the queued data,
so that a subsequent peek will retrieve the data that would have been
retrieved had the data not been removed.

For datagram sockets, if the "peek offset" points to the middle of a packet,
the data returned will be marked with the
.B MSG_TRUNC
flag.

The following example serves to illustrate the use of
.BR SO_PEEK_OFF .
Suppose a stream socket has the following queued input data:

    aabbccddeeff
.IP
The following sequence of
.BR recv (2)
calls would have the effect noted in the comments:

.in +4n
.nf
int ov = 4;                  // Set peek offset to 4
setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &ov, sizeof(ov));

recv(fd, buf, 2, MSG_PEEK);  // Peeks "cc"; offset set to 6
recv(fd, buf, 2, MSG_PEEK);  // Peeks "dd"; offset set to 8
recv(fd, buf, 2, 0);         // Reads "aa"; offset set to 6
recv(fd, buf, 2, MSG_PEEK);  // Peeks "ee"; offset set to 8
.fi
.in
.TP
.B SO_PEERCRED
Return the credentials of the foreign process connected to this socket.
This is possible only for connected
.B AF_UNIX
stream sockets and
.B AF_UNIX
stream and datagram socket pairs created using
.BR socketpair (2);
see
.BR unix (7).
The returned credentials are those that were in effect at the time
of the call to
.BR connect (2)
or
.BR socketpair (2).
The argument is a
.I ucred
structure; define the
.B _GNU_SOURCE
feature test macro to obtain the definition of that structure from
.IR <sys/socket.h> .
This socket option is read-only.
.TP
.B SO_PRIORITY
Set the protocol-defined priority for all packets to be sent on
this socket.
Linux uses this value to order the networking queues:
packets with a higher priority may be processed first depending
on the selected device queueing discipline.
.\" For
.\" .BR ip (7),
.\" this also sets the IP type-of-service (TOS) field for outgoing packets.
Setting a priority outside the range 0 to 6 requires the
.B CAP_NET_ADMIN
capability.
.TP
.BR SO_PROTOCOL " (since Linux 2.6.32)"
Retrieves the socket protocol as an integer, returning a value such as
.BR IPPROTO_SCTP .
See
.BR socket (2)
for details.
This socket option is read-only.
.TP
.B SO_RCVBUF
Sets or gets the maximum socket receive buffer in bytes.
The kernel doubles this value (to allow space for bookkeeping overhead)
when it is set using
.\" Most (all?) other implementations do not do this -- MTK, Dec 05
.BR setsockopt (2),
and this doubled value is returned by
.BR getsockopt (2).
.\" The following thread on LKML is quite informative:
.\" getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behavior
.\" 17 July 2012
.\" http://thread.gmane.org/gmane.linux.kernel/1328935
The default value is set by the
.I /proc/sys/net/core/rmem_default
file, and the maximum allowed value is set by the
.I /proc/sys/net/core/rmem_max
file.
The minimum (doubled) value for this option is 256.
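.IP
The doubling can be observed with a sketch such as the following,
in which the function name
.I set_rcvbuf
is an assumption made for this example.
.sp
.in +4n
.nf
```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of `bytes` bytes and
   return the size that the kernel reports afterward, or -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
    int actual;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;   /* the doubled value, subject to the limits above */
}
```
.fi
.in
.IP
On a system where
.I rmem_max
is at least 8192, a request for 8192 bytes would be expected to
report 16384.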
.TP
.BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
Using this socket option, a privileged
.RB ( CAP_NET_ADMIN )
process can perform the same task as
.BR SO_RCVBUF ,
but the
.I rmem_max
limit can be overridden.
.TP
.BR SO_RCVLOWAT " and " SO_SNDLOWAT
Specify the minimum number of bytes in the buffer until the socket layer
will pass the data to the protocol
.RB ( SO_SNDLOWAT )
or the user on receiving
.RB ( SO_RCVLOWAT ).
These two values are initialized to 1.
.B SO_SNDLOWAT
is not changeable on Linux
.RB ( setsockopt (2)
fails with the error
.BR ENOPROTOOPT ).
.B SO_RCVLOWAT
is changeable
only since Linux 2.4.
The
.BR select (2)
and
.BR poll (2)
system calls currently do not respect the
.B SO_RCVLOWAT
setting on Linux,
and mark a socket readable when even a single byte of data is available.
A subsequent read from the socket will block until
.B SO_RCVLOWAT
bytes are available.
.\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=111049368106984&w=2
.\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
.TP
.BR SO_RCVTIMEO " and " SO_SNDTIMEO
.\" Not implemented in 2.0.
.\" Implemented in 2.1.11 for getsockopt: always return a zero struct.
.\" Implemented in 2.3.41 for setsockopt, and actually used.
Specify the receiving or sending timeouts until reporting an error.
The argument is a
.IR "struct timeval" .
If an input or output function blocks for this period of time, and
data has been sent or received, the return value of that function
will be the amount of data transferred; if no data has been transferred
and the timeout has been reached, then \-1 is returned with
.I errno
set to
.B EAGAIN
or
.BR EWOULDBLOCK ,
.\" in fact to EAGAIN
or
.B EINPROGRESS
(for
.BR connect (2))
just as if the socket was specified to be nonblocking.
If the timeout is set to zero (the default),
then the operation will never time out.
Timeouts only have effect for system calls that perform socket I/O (e.g.,
.BR read (2),
.BR recvmsg (2),
.BR send (2),
.BR sendmsg (2));
timeouts have no effect for
.BR select (2),
.BR poll (2),
.BR epoll_wait (2),
and so on.
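.IP
A minimal sketch of setting a receive timeout follows; the function name
.I set_recv_timeout
and its millisecond argument are assumptions made for this example.
.sp
.in +4n
.nf
```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: make blocking receives on fd give up after `ms`
   milliseconds (0 means never time out).
   Returns 0 on success, -1 on error. */
int set_recv_timeout(int fd, long ms)
{
    struct timeval tv = {
        .tv_sec = ms / 1000,
        .tv_usec = (ms % 1000) * 1000,
    };

    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```
.fi
.in
.IP
After such a call, a
.BR recv (2)
on a socket with no queued data fails with
.B EAGAIN
or
.B EWOULDBLOCK
once the timeout expires.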
.TP
.B SO_REUSEADDR
.\" commit c617f398edd4db2b8567a28e899a88f8f574798d
.\" https://lwn.net/Articles/542629/
Indicates that the rules used in validating addresses supplied in a
.BR bind (2)
call should allow reuse of local addresses.
For
.B AF_INET
sockets this
means that a socket may bind, except when there
is an active listening socket bound to the address.
When the listening socket is bound to
.B INADDR_ANY
with a specific port then it is not possible
to bind to this port for any local address.
Argument is an integer boolean flag.
.TP
.BR SO_REUSEPORT " (since Linux 3.9)"
Permits multiple
.B AF_INET
or
.B AF_INET6
sockets to be bound to an identical socket address.
This option must be set on each socket (including the first socket)
prior to calling
.BR bind (2)
on the socket.
To prevent port hijacking,
all of the processes binding to the same address must have the same
effective UID.
This option can be employed with both TCP and UDP sockets.

For TCP sockets, this option allows
.BR accept (2)
load distribution in a multi-threaded server to be improved by
using a distinct listener socket for each thread.
This provides improved load distribution as compared
to traditional techniques such as using a single
.BR accept (2)ing
thread that distributes connections,
or having multiple threads that compete to
.BR accept (2)
from the same socket.

For UDP sockets,
the use of this option can provide better distribution
of incoming datagrams to multiple processes (or threads) as compared
to the traditional technique of having multiple processes
compete to receive datagrams on the same socket.
.TP
.BR SO_RXQ_OVFL " (since Linux 2.6.33)"
.\" commit 3b885787ea4112eaa80945999ea0901bf742707f
Indicates that an unsigned 32-bit value ancillary message (cmsg)
should be attached to received skbs indicating
the number of packets dropped by the socket between
the last received packet and this received packet.
.TP
.B SO_SNDBUF
Sets or gets the maximum socket send buffer in bytes.
The kernel doubles this value (to allow space for bookkeeping overhead)
when it is set using
.\" Most (all?) other implementations do not do this -- MTK, Dec 05
.\" See also the comment to SO_RCVBUF (17 Jul 2012 LKML mail)
.BR setsockopt (2),
and this doubled value is returned by
.BR getsockopt (2).
The default value is set by the
.I /proc/sys/net/core/wmem_default
file and the maximum allowed value is set by the
.I /proc/sys/net/core/wmem_max
file.
The minimum (doubled) value for this option is 2048.
.TP
.BR SO_SNDBUFFORCE " (since Linux 2.6.14)"
Using this socket option, a privileged
.RB ( CAP_NET_ADMIN )
process can perform the same task as
.BR SO_SNDBUF ,
but the
.I wmem_max
limit can be overridden.
.TP
.B SO_TIMESTAMP
Enable or disable the receiving of the
.B SO_TIMESTAMP
control message.
The timestamp control message is sent with level
.B SOL_SOCKET
and the
.I cmsg_data
field is a
.I "struct timeval"
indicating the
reception time of the last packet passed to the user in this call.
See
.BR cmsg (3)
for details on control messages.
.TP
.B SO_TYPE
Gets the socket type as an integer (e.g.,
.BR SOCK_STREAM ).
This socket option is read-only.
.TP
.BR SO_BUSY_POLL " (since Linux 3.11)"
Sets the approximate time in microseconds to busy poll on a blocking receive
when there is no data.
Increasing this value requires
.BR CAP_NET_ADMIN .
The default for this option is controlled by the
.I /proc/sys/net/core/busy_read
file.

The value in the
.I /proc/sys/net/core/busy_poll
file determines how long
.BR select (2)
and
.BR poll (2)
will busy poll when they operate on sockets with
.B SO_BUSY_POLL
set and no events to report are found.

In both cases,
busy polling will only be done when the socket last received data
from a network device that supports this option.

While busy polling may improve latency of some applications,
care must be taken when using it since this will increase
both CPU utilization and power usage.
.SS Signals
When writing onto a connection-oriented socket that has been shut down
(by the local or the remote end),
.B SIGPIPE
is sent to the writing process and
.B EPIPE
is returned.
The signal is not sent when the write call
specified the
.B MSG_NOSIGNAL
flag.
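.PP
A minimal sketch of suppressing the signal on a per-call basis follows;
the function name
.I send_nosigpipe
is an assumption made for this example.
.sp
.in +4n
.nf
```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: send without raising SIGPIPE. If the connection
   has been shut down, the call fails with EPIPE instead. */
ssize_t send_nosigpipe(int fd, const void *buf, size_t len)
{
    return send(fd, buf, len, MSG_NOSIGNAL);
}
```
.fi
.in
.PP
The caller is then expected to handle the
.B EPIPE
error itself rather than relying on a
.B SIGPIPE
handler.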
.PP
When requested with the
.B FIOSETOWN
.BR fcntl (2)
or
.B SIOCSPGRP
.BR ioctl (2),
.B SIGIO
is sent when an I/O event occurs.
It is possible to use
.BR poll (2)
or
.BR select (2)
in the signal handler to find out which socket the event occurred on.
An alternative (in Linux 2.2) is to set a real-time signal using the
.B F_SETSIG
.BR fcntl (2);
the handler of the real-time signal will be called with
the file descriptor in the
.I si_fd
field of its
.IR siginfo_t .
See
.BR fcntl (2)
for more information.
.PP
Under some circumstances (e.g., multiple processes accessing a
single socket), the condition that caused the
.B SIGIO
may have already disappeared when the process reacts to the signal.
If this happens, the process should wait again because Linux
will resend the signal later.
.\" .SS Ancillary messages
950 .SS /proc interfaces
951 The core socket networking parameters can be accessed
952 via files in the directory
953 .IR /proc/sys/net/core/ .
954 .TP
955 .I rmem_default
956 contains the default setting in bytes of the socket receive buffer.
957 .TP
958 .I rmem_max
959 contains the maximum socket receive buffer size in bytes which a user may
960 set by using the
961 .B SO_RCVBUF
962 socket option.
963 .TP
964 .I wmem_default
965 contains the default setting in bytes of the socket send buffer.
966 .TP
967 .I wmem_max
968 contains the maximum socket send buffer size in bytes which a user may
969 set by using the
970 .B SO_SNDBUF
971 socket option.
972 .TP
973 .IR message_cost " and " message_burst
974 configure the token bucket filter used to rate-limit warning messages
975 caused by external network events.
976 .TP
977 .I netdev_max_backlog
978 Maximum number of packets in the global input queue.
979 .TP
980 .I optmem_max
981 Maximum length per socket of ancillary data and user control data,
982 such as iovecs.
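.PP
These files contain plain decimal text, so a program can read them
with ordinary file I/O.
The following is a minimal sketch; the helper name
.IR read_net_core_param ()
is hypothetical, and the function assumes that the named file holds
a single integer.
.in +4n
.nf
```c
#include <stdio.h>

/*
 * Read one numeric parameter from /proc/sys/net/core/.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
long read_net_core_param(const char *name)
{
    char path[128];
    long value;
    FILE *fp;
    int n;

    n = snprintf(path, sizeof(path), "/proc/sys/net/core/%s", name);
    if (n < 0 || n >= (int) sizeof(path))
        return -1;                  /* name too long for the buffer */

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;                  /* no such parameter, or not visible */

    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;                 /* not a single integer */

    fclose(fp);
    return value;
}
```
.fi
.in
.PP
For example,
.I read_net_core_param("rmem_max")
would return the upper limit that applies to
.BR SO_RCVBUF .
Some files may not be visible in every environment
(for example, inside a container), in which case the sketch returns \-1.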
983 .\" netdev_fastroute is not documented because it is experimental
984 .SS Ioctls
985 These operations can be accessed using
986 .BR ioctl (2):
987
988 .in +4n
989 .nf
990 .IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
991 .fi
992 .in
993 .TP
994 .B SIOCGSTAMP
995 Return a
996 .I struct timeval
997 with the receive timestamp of the last packet passed to the user.
998 This is useful for accurate round trip time measurements.
999 See
1000 .BR setitimer (2)
1001 for a description of
1002 .IR "struct timeval" .
1003 .\"
1004 This ioctl should be used only if the socket option
1005 .B SO_TIMESTAMP
1006 is not set on the socket.
1007 Otherwise, it returns the timestamp of the
1008 last packet that was received while
1009 .B SO_TIMESTAMP
1010 was not set, or it fails if no such packet has been received
1011 (i.e.,
1012 .BR ioctl (2)
1013 returns \-1 with
1014 .I errno
1015 set to
1016 .BR ENOENT ).
1017 .TP
1018 .B SIOCSPGRP
1019 Set the process or process group that is to receive
1020 .B SIGIO
1021 or
1022 .B SIGURG
1023 signals
1024 when an
1025 asynchronous I/O operation has finished or urgent data is available.
1026 The argument is a pointer to a
1027 .IR pid_t .
1028 If the argument is positive, send the signals to that process.
1029 If the
1030 argument is negative, send the signals to the process group with the ID
1031 of the absolute value of the argument.
1032 The process may only choose itself or its own process group to receive
1033 signals unless it has the
1034 .B CAP_KILL
1035 capability or an effective UID of 0.
1036 .TP
1037 .B FIOASYNC
1038 Change the
1039 .B O_ASYNC
1040 flag to enable or disable asynchronous I/O mode of the socket.
1041 Asynchronous I/O mode means that the
1042 .B SIGIO
1043 signal or the signal set with
1044 .B F_SETSIG
1045 is raised when a new I/O event occurs.
1046 .IP
1047 The argument is an integer boolean flag.
1048 (This operation is synonymous with the use of
1049 .BR fcntl (2)
1050 to set the
1051 .B O_ASYNC
1052 flag.)
1053 .\"
1054 .TP
1055 .B SIOCGPGRP
1056 Get the current process or process group that receives
1057 .B SIGIO
1058 or
1059 .B SIGURG
1060 signals,
1061 or 0
1062 when none is set.
1063 .PP
1064 Valid
1065 .BR fcntl (2)
1066 operations:
1067 .TP
1068 .B FIOGETOWN
1069 The same as the
1070 .B SIOCGPGRP
1071 .BR ioctl (2).
1072 .TP
1073 .B FIOSETOWN
1074 The same as the
1075 .B SIOCSPGRP
1076 .BR ioctl (2).
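.PP
The ioctls above can be wrapped as in the following sketch.
All function names are hypothetical.
Including
.I <linux/sockios.h>
for
.B SIOCGSTAMP
is an assumption about the installed kernel headers.
.in +4n
.nf
```c
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>
#include <linux/sockios.h>      /* SIOCGSTAMP with recent kernel headers */

/* Fetch the receive timestamp of the last packet passed to the user. */
int last_packet_time(int fd, struct timeval *tv)
{
    return ioctl(fd, SIOCGSTAMP, tv);
}

/* Send SIGIO and SIGURG for fd to process pid, or to group -pid. */
int set_signal_owner(int fd, pid_t pid)
{
    return ioctl(fd, SIOCSPGRP, &pid);
}

/* Store the current owner in *pid (0 when none is set). */
int get_signal_owner(int fd, pid_t *pid)
{
    return ioctl(fd, SIOCGPGRP, pid);
}

/* Switch asynchronous I/O mode on (nonzero) or off (zero). */
int set_async(int fd, int on)
{
    return ioctl(fd, FIOASYNC, &on);
}

/* Make the calling process the owner of fd and enable SIGIO delivery. */
int own_socket_signals(int fd)
{
    if (set_signal_owner(fd, getpid()) == -1)
        return -1;
    return set_async(fd, 1);
}
```
.fi
.in
.PP
Each wrapper returns the result of
.BR ioctl (2)
unchanged, so a return value of \-1 means that
.I errno
describes the failure.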
1077 .SH VERSIONS
1078 .B SO_BINDTODEVICE
1079 was introduced in Linux 2.0.30.
1080 .B SO_PASSCRED
1081 is new in Linux 2.2.
1082 The
1083 .I /proc
1084 interfaces were introduced in Linux 2.2.
1085 .B SO_RCVTIMEO
1086 and
1087 .B SO_SNDTIMEO
1088 are supported since Linux 2.3.41.
1089 Earlier, timeouts were fixed to
1090 a protocol-specific setting, and could not be read or written.
1091 .SH NOTES
1092 Linux assumes that half of the send/receive buffer is used for internal
1093 kernel structures; thus the values in the corresponding
1094 .I /proc
1095 files are twice what can be observed on the wire.
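.PP
The same bookkeeping is visible from user space: the kernel doubles a value set with
.BR SO_RCVBUF ,
and
.BR getsockopt (2)
reports the doubled value.
The following sketch illustrates this; the helper name
.IR effective_rcvbuf ()
is hypothetical, and a UDP socket is assumed for illustration.
.in +4n
.nf
```c
#include <sys/socket.h>
#include <unistd.h>

/*
 * Request a receive buffer of the given size on a new UDP socket and
 * return the size that getsockopt(2) reports afterward, or -1 on error.
 */
int effective_rcvbuf(int requested)
{
    int fd;
    int actual = -1;
    socklen_t len = sizeof(actual);

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) == -1
        || getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        actual = -1;

    close(fd);
    return actual;
}
```
.fi
.in
.PP
On a system where the request does not exceed
.IR rmem_max ,
the reported size is expected to be twice the requested one.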
1096 .PP
1097 Linux will allow port reuse with the
1098 .B SO_REUSEADDR
1099 option
1100 only when this option was set both in the previous program that performed a
1101 .BR bind (2)
1102 to the port and in the program that wants to reuse the port.
1103 This differs from some implementations (e.g., FreeBSD)
1104 where only the later program needs to set the
1105 .B SO_REUSEADDR
1106 option.
1107 Typically this difference is invisible, since, for example, a server
1108 program is designed to always set this option.
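.PP
The rule can be illustrated with two sockets that both set the option
before calling
.BR bind (2).
This is a minimal sketch; the helper names
.IR bind_with_reuseaddr ()
and
.IR local_port ()
are hypothetical, and TCP over IPv4 on the wildcard address is assumed.
.in +4n
.nf
```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a TCP socket, set SO_REUSEADDR, and bind it to the wildcard
 * address on the given port (host byte order; 0 lets the kernel choose).
 * Returns the bound socket, or -1 on error.
 */
int bind_with_reuseaddr(unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* The option must be set before bind(2) to have any effect. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == -1
        || bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Return the local port of a bound socket in host byte order, or 0. */
unsigned short local_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *) &addr, &len) == -1)
        return 0;
    return ntohs(addr.sin_port);
}
```
.fi
.in
.PP
If the first socket is bound with this helper and is not listening,
a second call with the port returned by
.IR local_port ()
is expected to succeed, because both sockets have set the option.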
1109 .\" .SH AUTHORS
1110 .\" This man page was written by Andi Kleen.
1111 .SH SEE ALSO
1112 .BR wireshark (1),
1113 .BR bpf (2),
1114 .BR connect (2),
1115 .BR getsockopt (2),
1116 .BR setsockopt (2),
1117 .BR socket (2),
1118 .BR pcap (3),
1119 .BR capabilities (7),
1120 .BR ddp (7),
1121 .BR ip (7),
1122 .BR packet (7),
1123 .BR tcp (7),
1124 .BR udp (7),
1125 .BR unix (7),
1126 .BR tcpdump (8)