git.ipfire.org Git - thirdparty/man-pages.git/blame - man7/socket.7
socket.7: Be explicit that accept(2) respects SO_*TIMEO
CommitLineData
77117f4f
MK
1.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
2.\" and copyright (c) 1999 Matthew Wilcox.
2297bf0e 3.\"
00acdba1 4.\" %%%LICENSE_START(VERBATIM_ONE_PARA)
77117f4f
MK
5.\" Permission is granted to distribute possibly modified copies
6.\" of this page provided the header is included verbatim,
7.\" and in case of nontrivial modification author and date
8.\" of the modification is added to the header.
8ff7380d 9.\" %%%LICENSE_END
77117f4f
MK
10.\"
11.\" 2002-10-30, Michael Kerrisk, <mtk.manpages@gmail.com>
12.\" Added description of SO_ACCEPTCONN
13.\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
14.\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
15.\" Added notes on capability requirements
16.\" A few small grammar fixes
dd2127e0
JE
17.\" 2010-06-13 Jan Engelhardt <jengelh@medozas.de>
18.\" Documented SO_DOMAIN and SO_PROTOCOL.
e57fe8ad 19.\"
8cf1b72a
MK
20.\" FIXME
21.\" The following are not yet documented:
e57fe8ad
MK
22.\"
23.\" SO_PEERNAME (2.4?)
24.\" get only
25.\" Seems to do something similar to getpeername(), but then
26.\" why is it necessary / how does it differ?
27.\"
e57fe8ad
MK
28.\" SO_TIMESTAMPING (2.6.30)
29.\" Documentation/networking/timestamping.txt
30.\" commit cb9eff097831007afb30d64373f29d99825d0068
31.\" Author: Patrick Ohly <patrick.ohly@intel.com>
32.\"
33.\" SO_WIFI_STATUS (3.3)
34.\" commit 6e3e939f3b1bf8534b32ad09ff199d88800835a0
35.\" Author: Johannes Berg <johannes.berg@intel.com>
36.\" Also: SCM_WIFI_STATUS
37.\"
38.\" SO_NOFCS (3.4)
39.\" commit 3bdc0eba0b8b47797f4a76e377dd8360f317450f
40.\" Author: Ben Greear <greearb@candelatech.com>
41.\"
42.\" SO_GET_FILTER (3.8)
43.\" commit a8fc92778080c845eaadc369a0ecf5699a03bef0
44.\" Author: Pavel Emelyanov <xemul@parallels.com>
45.\"
e57fe8ad
MK
46.\" SO_MAX_PACING_RATE (3.13)
47.\" commit 62748f32d501f5d3712a7c372bbb92abc7c62bc7
48.\" Author: Eric Dumazet <edumazet@google.com>
49.\"
50.\" SO_BPF_EXTENSIONS (3.14)
51.\" commit ea02f9411d9faa3553ed09ce0ec9f00ceae9885e
52.\" Author: Michal Sekletar <msekleta@redhat.com>
77117f4f 53.\"
4c1c5274 54.TH socket 7 (date) "Linux man-pages (unreleased)"
77117f4f
MK
55.SH NAME
56socket \- Linux socket interface
57.SH SYNOPSIS
c7db92b9 58.nf
77117f4f 59.B #include <sys/socket.h>
68e4db0a 60.PP
c4e7b714 61.IB sockfd " = socket(int " socket_family ", int " socket_type ", int " protocol );
c7db92b9 62.fi
77117f4f
MK
63.SH DESCRIPTION
64This manual page describes the Linux networking socket layer user
65interface.
66The BSD compatible sockets
67are the uniform interface
68between the user process and the network protocol stacks in the kernel.
69The protocol modules are grouped into
70.I protocol families
2c212ccd 71such as
5019071b 72.BR AF_INET ", " AF_IPX ", and " AF_PACKET ,
77117f4f
MK
73and
74.I socket types
2c212ccd 75such as
77117f4f
MK
76.B SOCK_STREAM
77or
78.BR SOCK_DGRAM .
79See
80.BR socket (2)
81for more information on families and types.
c634028a 82.SS Socket-layer functions
77117f4f
MK
83These functions are used by the user process to send or receive packets
84and to do other socket operations.
85For more information see their respective manual pages.
5711c04f 86.PP
77117f4f
MK
87.BR socket (2)
88creates a socket,
89.BR connect (2)
90connects a socket to a remote socket address,
91the
92.BR bind (2)
93function binds a socket to a local socket address,
94.BR listen (2)
95tells the socket that new connections shall be accepted, and
96.BR accept (2)
97is used to get a new socket with a new incoming connection.
98.BR socketpair (2)
33a0ccb2 99returns two connected anonymous sockets (implemented only for a few
77117f4f 100local families like
d4c8c97c 101.BR AF_UNIX ).
77117f4f
MK
102.PP
103.BR send (2),
104.BR sendto (2),
105and
106.BR sendmsg (2)
107send data over a socket, and
108.BR recv (2),
109.BR recvfrom (2),
110.BR recvmsg (2)
111receive data from a socket.
112.BR poll (2)
113and
114.BR select (2)
115wait for arriving data or a readiness to send data.
116In addition, the standard I/O operations like
117.BR write (2),
118.BR writev (2),
119.BR sendfile (2),
120.BR read (2),
121and
122.BR readv (2)
123can be used to read and write data.
124.PP
125.BR getsockname (2)
126returns the local socket address and
127.BR getpeername (2)
128returns the remote socket address.
129.BR getsockopt (2)
130and
131.BR setsockopt (2)
132are used to set or get socket layer or protocol options.
133.BR ioctl (2)
134can be used to set or read some other options.
135.PP
136.BR close (2)
137is used to close a socket.
138.BR shutdown (2)
139closes parts of a full-duplex socket connection.
140.PP
141Seeking, or calling
142.BR pread (2)
143or
144.BR pwrite (2)
c7094399 145with a nonzero position is not supported on sockets.
77117f4f 146.PP
ff40dbb3 147It is possible to do nonblocking I/O on sockets by setting the
77117f4f
MK
148.B O_NONBLOCK
149flag on a socket file descriptor using
150.BR fcntl (2).
151Then all operations that would block will (usually)
152return with
153.B EAGAIN
154(operation should be retried later);
155.BR connect (2)
156will return
157.B EINPROGRESS
158error.
159The user can then wait for various events via
160.BR poll (2)
161or
162.BR select (2).
163.TS
164tab(:) allbox;
165c s s
0b174fe0 166l l lx.
77117f4f
MK
167I/O events
168Event:Poll flag:Occurrence
169Read:POLLIN:T{
170New data arrived.
171T}
172Read:POLLIN:T{
173A connection setup has been completed
174(for connection-oriented sockets)
175T}
176Read:POLLHUP:T{
177A disconnection request has been initiated by the other end.
178T}
179Read:POLLHUP:T{
180A connection is broken (only for connection-oriented protocols).
181When the socket is written to,
182.B SIGPIPE
183is also sent.
184T}
185Write:POLLOUT:T{
186Socket has enough send buffer space for writing new data.
187T}
188Read/Write:T{
bd8a7ca2 189POLLIN |
77117f4f
MK
190.br
191POLLOUT
192T}:T{
193An outgoing
194.BR connect (2)
195finished.
196T}
0b174fe0
MK
197Read/Write:POLLERR:T{
198An asynchronous error occurred.
199T}
200Read/Write:POLLHUP:T{
201The other end has shut down one direction.
202T}
77117f4f
MK
203Exception:POLLPRI:T{
204Urgent data arrived.
205.B SIGURG
206is sent then.
207T}
208.\" FIXME . The following is not true currently:
209.\" It is no I/O event when the connection
210.\" is broken from the local end using
211.\" .BR shutdown (2)
212.\" or
213.\" .BR close (2).
214.TE
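.PP
The nonblocking connect sequence described above can be sketched as follows.
This is a minimal illustration, not a definitive implementation:
the helper name
.I connect_nonblocking
and its parameters are hypothetical, and it assumes an IPv4 TCP peer.
It relies on the behavior documented in
.BR connect (2),
where the outcome of a completed connection attempt is read with
.BR SO_ERROR .

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect a nonblocking TCP socket to an IPv4
   address, waiting at most timeout_ms milliseconds.
   Returns the connected descriptor, or -1 with errno set. */
int
connect_nonblocking(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in addr = {
        .sin_family = AF_INET,
        .sin_port = htons(port),
    };
    struct pollfd pfd;
    socklen_t errlen;
    int fd, flags, err, n, saved;

    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        errno = EINVAL;
        return -1;
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    /* Set O_NONBLOCK so that connect(2) does not block */
    flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        goto fail;

    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0)
        return fd;                  /* Completed immediately */
    if (errno != EINPROGRESS)
        goto fail;

    /* POLLOUT is reported once the outgoing connect(2) has finished */
    pfd.fd = fd;
    pfd.events = POLLOUT;
    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);
    if (n == 0)
        errno = ETIMEDOUT;
    if (n <= 0)
        goto fail;

    /* SO_ERROR tells whether the connection succeeded or failed */
    errlen = sizeof(err);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) == -1)
        goto fail;
    if (err != 0) {
        errno = err;
        goto fail;
    }
    return fd;

fail:
    saved = errno;
    close(fd);
    errno = saved;
    return -1;
}
```

The returned descriptor remains in nonblocking mode,
so later reads and writes may fail with
.BR EAGAIN .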
77117f4f
MK
215.PP
216An alternative to
217.BR poll (2)
218and
219.BR select (2)
220is to let the kernel inform the application about events
221via a
222.B SIGIO
223signal.
224For that the
225.B O_ASYNC
226flag must be set on a socket file descriptor via
227.BR fcntl (2)
228and a valid signal handler for
229.B SIGIO
230must be installed via
231.BR sigaction (2).
232See the
233.I Signals
234discussion below.
b1b84b7a
MK
235.SS Socket address structures
236Each socket domain has its own format for socket addresses,
237with a domain-specific address structure.
238Each of these structures begins with an
239integer "family" field (typed as
240.IR sa_family_t )
241that indicates the type of the address structure.
242This allows
243the various system calls (e.g.,
244.BR connect (2),
245.BR bind (2),
246.BR accept (2),
247.BR getsockname (2),
248.BR getpeername (2)),
249which are generic to all socket domains,
250to determine the domain of a particular socket address.
5711c04f 251.PP
b1b84b7a
MK
252To allow any type of socket address to be passed to
253interfaces in the sockets API,
254the type
1ae6b2c7 255.I struct sockaddr
b1b84b7a 256is defined.
e6d86b41 257The purpose of this type is purely to allow casting of
b1b84b7a 258domain-specific socket address types to a "generic" type,
e6d86b41 259so as to avoid compiler warnings about type mismatches in
b1b84b7a 260calls to the sockets API.
5711c04f 261.PP
b1b84b7a
MK
262In addition, the sockets API provides the data type
263.IR "struct sockaddr_storage" .
264This type
265is suitable to accommodate all supported domain-specific socket
266address structures; it is large enough and is aligned properly.
e6d86b41 267(In particular, it is large enough to hold
b1b84b7a
MK
268IPv6 socket addresses.)
269The structure includes the following field, which can be used to identify
270the type of socket address actually stored in the structure:
5711c04f 271.PP
b1b84b7a 272.in +4n
b8302363 273.EX
b1b84b7a 274 sa_family_t ss_family;
b8302363 275.EE
b1b84b7a 276.in
5711c04f 277.PP
e6d86b41 278The
b1b84b7a
MK
279.I sockaddr_storage
280structure is useful in programs that must handle socket addresses
281in a generic way
282(e.g., programs that must deal with both IPv4 and IPv6 socket addresses).
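.PP
As a minimal sketch of such generic handling,
the following hypothetical helper
.RI ( local_port
is not part of any API)
retrieves a socket's local address into a
.I sockaddr_storage
and dispatches on
.IR ss_family .
It assumes that only IPv4 and IPv6 sockets are of interest.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: return the local port (host byte order) of an
   IPv4 or IPv6 socket, or -1 if the address cannot be obtained or
   belongs to another family. */
int
local_port(int fd)
{
    struct sockaddr_storage ss;
    socklen_t len = sizeof(ss);

    /* The generic struct sockaddr type is used only for the cast */
    if (getsockname(fd, (struct sockaddr *) &ss, &len) == -1)
        return -1;

    switch (ss.ss_family) {
    case AF_INET:
        return ntohs(((struct sockaddr_in *) &ss)->sin_port);
    case AF_INET6:
        return ntohs(((struct sockaddr_in6 *) &ss)->sin6_port);
    default:
        return -1;
    }
}
```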
c634028a 283.SS Socket options
7d247ee8 284The socket options listed below can be set by using
77117f4f
MK
285.BR setsockopt (2)
286and read with
287.BR getsockopt (2)
288with the socket level set to
289.B SOL_SOCKET
7d247ee8
MK
290for all sockets.
291Unless otherwise noted,
292.I optval
293is a pointer to an
294.IR int .
bea08fec 295.\" FIXME .
e2ec4f17
MK
296.\" In the list below, the text used to describe argument types
297.\" for each socket option should be more consistent
298.\"
77117f4f
MK
299.\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
300.\" W R Stevens, UNPv1
301.TP
302.B SO_ACCEPTCONN
303Returns a value indicating whether or not this socket has been marked
304to accept connections with
305.BR listen (2).
306The value 0 indicates that this is not a listening socket,
307the value 1 indicates that this is a listening socket.
fa574567 308This socket option is read-only.
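.IP
A minimal sketch of querying this option follows;
the function name
.I is_listening
is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: return 1 if fd is a listening socket,
   0 if it is not, or -1 on error (errno set by getsockopt(2)). */
int
is_listening(int fd)
{
    int val;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) == -1)
        return -1;
    return val != 0;
}
```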
77117f4f 309.TP
096da110
MK
310.BR SO_ATTACH_FILTER " (since Linux 2.2), " SO_ATTACH_BPF " (since Linux 3.19)"
311Attach a classic BPF
312.RB ( SO_ATTACH_FILTER )
313or an extended BPF
314.RB ( SO_ATTACH_BPF )
315program to the socket for use as a filter of incoming packets.
316A packet will be dropped if the filter program returns zero.
317If the filter program returns a
777411ae 318nonzero value which is less than the packet's data length,
096da110
MK
319the packet will be truncated to the length returned.
320If the value returned by the filter is greater than or equal to the
321packet's data length, the packet is allowed to proceed unmodified.
5711c04f 322.IP
1fa871f5 323The argument for
1ae6b2c7 324.B SO_ATTACH_FILTER
1fa871f5
CG
325is a
326.I sock_fprog
096da110
MK
327structure, defined in
328.IR <linux/filter.h> :
6545cc56 329.IP
1fa871f5 330.in +4n
6545cc56 331.EX
1fa871f5
CG
332struct sock_fprog {
333    unsigned short      len;
334    struct sock_filter *filter;
335};
6545cc56 336.EE
1fa871f5
CG
337.in
338.IP
339The argument for
1ae6b2c7 340.B SO_ATTACH_BPF
1fa871f5
CG
341is a file descriptor returned by the
342.BR bpf (2)
343system call and must refer to a program of type
d8012462 344.BR BPF_PROG_TYPE_SOCKET_FILTER .
5711c04f 345.IP
096da110
MK
346These options may be set multiple times for a given socket,
347each time replacing the previous filter program.
348The classic and extended versions may be called on the same socket,
349but the previous filter will always be replaced such that a socket
350never has more than one filter defined.
5711c04f 351.IP
096da110 352Both classic and extended BPF are explained in the kernel source file
1fa871f5
CG
353.IR Documentation/networking/filter.txt .
354.TP
096da110 355.BR SO_ATTACH_REUSEPORT_CBPF ", " SO_ATTACH_REUSEPORT_EBPF
1fa871f5 356For use with the
1ae6b2c7 357.B SO_REUSEPORT
096da110
MK
358option, these options allow the user to set a classic BPF
359.RB ( SO_ATTACH_REUSEPORT_CBPF )
360or an extended BPF
361.RB ( SO_ATTACH_REUSEPORT_EBPF )
362program which defines how packets are assigned to
1fa871f5 363the sockets in the reuseport group (that is, all sockets which have
1ae6b2c7 364.B SO_REUSEPORT
096da110 365set and are using the same local address to receive packets).
5711c04f 366.IP
096da110
MK
367The BPF program must return an index between 0 and N\-1 representing
368the socket which should receive the packet
369(where N is the number of sockets in the group).
370If the BPF program returns an invalid index,
371socket selection will fall back to the plain
1ae6b2c7 372.B SO_REUSEPORT
1fa871f5 373mechanism.
5711c04f 374.IP
1fa871f5
CG
375Sockets are numbered in the order in which they are added to the group
376(that is, the order of
377.BR bind (2)
378calls for UDP sockets or the order of
379.BR listen (2)
096da110
MK
380calls for TCP sockets).
381New sockets added to a reuseport group will inherit the BPF program.
382When a socket is removed from a reuseport group (via
383.BR close (2)),
1fa871f5
CG
384the last socket in the group will be moved into the closed socket's
385position.
5711c04f 386.IP
096da110
MK
387These options may be set repeatedly at any time on any socket in the group
388to replace the current BPF program used by all sockets in the group.
5711c04f 389.IP
1ae6b2c7 390.B SO_ATTACH_REUSEPORT_CBPF
096da110 391takes the same argument type as
1ae6b2c7 392.B SO_ATTACH_FILTER
1fa871f5 393and
1ae6b2c7 394.B SO_ATTACH_REUSEPORT_EBPF
096da110 395takes the same argument type as
d8012462 396.BR SO_ATTACH_BPF .
5711c04f 397.IP
096da110
MK
398UDP support for this feature is available since Linux 4.5;
399TCP support is available since Linux 4.6.
1fa871f5 400.TP
77117f4f
MK
401.B SO_BINDTODEVICE
402Bind this socket to a particular device like \(lqeth0\(rq,
403as specified in the passed interface name.
404If the
405name is an empty string or the option length is zero, the socket device
406binding is removed.
d0cb7cc6 407The passed option is a variable-length null-terminated
77117f4f
MK
408interface name string with the maximum size of
409.BR IFNAMSIZ .
410If a socket is bound to an interface,
411only packets received from that particular interface are processed by the
412socket.
33a0ccb2 413Note that this works only for some socket types, particularly
77117f4f
MK
414.B AF_INET
415sockets.
416It is not supported for packet sockets (use normal
56bf2613 417.BR bind (2)
77117f4f 418there).
5711c04f 419.IP
757716c7
MK
420Before Linux 3.8,
421this socket option could be set, but could not be retrieved with
422.BR getsockopt (2).
423Since Linux 3.8, it is readable.
424The
425.I optlen
b072a788 426argument should contain the buffer size available
757716c7 427to receive the device name and is recommended to be
1ae6b2c7 428.B IFNAMSIZ
757716c7
MK
429bytes.
430The real device name length is reported back in the
431.I optlen
432argument.
77117f4f
MK
433.TP
434.B SO_BROADCAST
435Set or get the broadcast flag.
42bd5b3d 436When enabled, datagram sockets are allowed to send
77117f4f
MK
437packets to a broadcast address.
438This option has no effect on stream-oriented sockets.
439.TP
440.B SO_BSDCOMPAT
441Enable BSD bug-to-bug compatibility.
442This is used by the UDP protocol module in Linux 2.0 and 2.2.
eebf8c09 443If enabled, ICMP errors received for a UDP socket will not be passed
77117f4f
MK
444to the user program.
445In later kernel versions, support for this option has been phased out:
446Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
447(printk()) if a program uses this option.
448Linux 2.0 also enabled BSD bug-to-bug compatibility
449options (random header changing, skipping of the broadcast flag) for raw
450sockets with this option, but that was removed in Linux 2.2.
451.TP
452.B SO_DEBUG
453Enable socket debugging.
d7087783 454Allowed only for processes with the
77117f4f
MK
455.B CAP_NET_ADMIN
456capability or an effective user ID of 0.
457.TP
096da110
MK
458.BR SO_DETACH_FILTER " (since Linux 2.2), " SO_DETACH_BPF " (since Linux 3.19)"
459These two options, which are synonyms,
460may be used to remove the classic or extended BPF
461program attached to a socket with either
1ae6b2c7 462.B SO_ATTACH_FILTER
1fa871f5 463or
096da110 464.BR SO_ATTACH_BPF .
1fa871f5 465The option value is ignored.
1fa871f5 466.TP
dd2127e0
JE
467.BR SO_DOMAIN " (since Linux 2.6.32)"
468Retrieves the socket domain as an integer, returning a value such as
469.BR AF_INET6 .
470See
471.BR socket (2)
472for details.
fa574567 473This socket option is read-only.
dd2127e0 474.TP
77117f4f
MK
475.B SO_ERROR
476Get and clear the pending socket error.
fa574567 477This socket option is read-only.
77117f4f
MK
478Expects an integer.
479.TP
480.B SO_DONTROUTE
33a0ccb2 481Don't send via a gateway, send only to directly connected hosts.
77117f4f
MK
482The same effect can be achieved by setting the
483.B MSG_DONTROUTE
484flag on a socket
485.BR send (2)
486operation.
487Expects an integer boolean flag.
488.TP
b7f97e8e
MK
489.BR SO_INCOMING_CPU " (gettable since Linux 3.19, settable since Linux 4.4)"
490.\" getsockopt 2c8c56e15df3d4c2af3d656e44feb18789f75837
491.\" setsockopt 70da268b569d32a9fddeea85dc18043de9d89f89
492Sets or gets the CPU affinity of a socket.
493Expects an integer CPU number.
5711c04f 494.IP
ca1969e9 495.in +4n
b8302363 496.EX
ca1969e9 497int cpu = 1;
0b3f52d0
MK
498setsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu,
499 sizeof(cpu));
b8302363 500.EE
ca1969e9 501.in
5711c04f 502.IP
a99fa5fb
MK
503Because all of the packets for a single stream
504(i.e., all packets for the same 4-tuple)
505arrive on the single RX queue that is associated with a particular CPU,
506the typical use case is to employ one listening process per RX queue,
507with the incoming flow being handled by a listener
508on the same CPU that is handling the RX queue.
b7f97e8e 509This provides optimal NUMA behavior and keeps CPU caches hot.
bb676145
MK
510.\"
511.\" From an email conversation with Eric Dumazet:
512.\" >> Note that setting the option is not supported if SO_REUSEPORT is used.
513.\" >
514.\" > Please define "not supported". Does this yield an API diagnostic?
515.\" > If so, what is it?
516.\" >
517.\" >> Socket will be selected from an array, either by a hash or BPF program
518.\" >> that has no access to this information.
519.\" >
520.\" > Sorry -- I'm lost here. How does this comment relate to the proposed
521.\" > man page text above?
99cf1681 522.\"
bb676145 523.\" Simply that :
99cf1681 524.\"
bb676145
MK
525.\" If an application uses both SO_INCOMING_CPU and SO_REUSEPORT, then
526.\" SO_REUSEPORT logic, selecting the socket to receive the packet, ignores
527.\" SO_INCOMING_CPU setting.
ca1969e9 528.TP
e8500ecc
SS
529.BR SO_INCOMING_NAPI_ID " (gettable since Linux 4.12)"
530.\" getsockopt 6d4339028b350efbf87c61e6d9e113e5373545c9
b5638e2e 531Returns a system-level unique ID called NAPI ID that is associated
820e13fb
MK
532with an RX queue on which the last packet associated with that
533socket is received.
e8500ecc
SS
534.IP
535This can be used by an application to split the incoming flows among worker
820e13fb
MK
536threads based on the RX queue on which the packets associated with the
537flows are received.
538It allows each worker thread to be associated with
539a NIC HW receive queue and service all the connection
540requests received on that RX queue.
541This mapping between an app thread and
542a HW NIC queue streamlines the
e8500ecc
SS
543flow of data from the NIC to the application.
544.TP
77117f4f
MK
545.B SO_KEEPALIVE
546Enable sending of keep-alive messages on connection-oriented sockets.
547Expects an integer boolean flag.
548.TP
549.B SO_LINGER
550Sets or gets the
551.B SO_LINGER
552option.
553The argument is a
554.I linger
555structure.
6545cc56 556.IP
77117f4f 557.in +4n
6545cc56 558.EX
77117f4f
MK
559struct linger {
560    int l_onoff;    /* linger active */
561    int l_linger;   /* how many seconds to linger for */
562};
6545cc56 563.EE
77117f4f
MK
564.in
565.IP
566When enabled, a
567.BR close (2)
568or
569.BR shutdown (2)
570will not return until all queued messages for the socket have been
571successfully sent or the linger timeout has been reached.
572Otherwise,
573the call returns immediately and the closing is done in the background.
574When the socket is closed as part of
575.BR exit (2),
576it always lingers in the background.
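.IP
A minimal sketch of enabling the option follows.
The helper name
.I enable_linger
is hypothetical;
only the
.I linger
structure and the option itself come from the interface described above.

```c
#include <sys/socket.h>

/* Hypothetical helper: make close(2) or shutdown(2) on fd wait up to
   `seconds` for queued messages to be sent.
   Returns 0 on success, -1 on error. */
int
enable_linger(int fd, int seconds)
{
    struct linger lg = {
        .l_onoff = 1,           /* linger active */
        .l_linger = seconds,    /* timeout in seconds */
    };

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```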
577.TP
1fa871f5 578.B SO_LOCK_FILTER
096da110 579.\" commit d59577b6ffd313d0ab3be39cb1ab47e29bdc9182
96d9edea 580When set, this option will prevent
096da110
MK
581changing the filters associated with the socket.
582These filters include any set using the socket options
f7111396
MK
583.BR SO_ATTACH_FILTER ,
584.BR SO_ATTACH_BPF ,
585.BR SO_ATTACH_REUSEPORT_CBPF ,
096da110 586and
335c2365 587.BR SO_ATTACH_REUSEPORT_EBPF .
5711c04f 588.IP
59ac6f2f
MK
589The typical use case is for a privileged process to set up a raw socket
590(an operation that requires the
1ae6b2c7 591.B CAP_NET_RAW
59ac6f2f 592capability), apply a restrictive filter, set the
1ae6b2c7 593.B SO_LOCK_FILTER
59ac6f2f 594option,
1fa871f5 595and then either drop its privileges or pass the socket file descriptor
59ac6f2f 596to an unprivileged process via a UNIX domain socket.
5711c04f 597.IP
096da110 598Once the
1ae6b2c7 599.B SO_LOCK_FILTER
96d9edea 600option has been enabled, attempts to change or remove the filter
096da110 601attached to a socket, or to disable the
1ae6b2c7 602.B SO_LOCK_FILTER
096da110
MK
603option will fail with the error
604.BR EPERM .
1fa871f5 605.TP
cf0a1f7c
M
606.BR SO_MARK " (since Linux 2.6.25)"
607.\" commit 4a19ec5800fc3bb64e2d87c4d9fdd9e636086fe0
608.\" and 914a9ab386a288d0f22252fc268ecbc048cdcbd5
609Set the mark for each packet sent through this socket
610(similar to the netfilter MARK target but socket-based).
611Changing the mark can be used for mark-based
612routing without netfilter or for packet filtering.
613Setting this option requires the
614.B CAP_NET_ADMIN
615capability.
616.TP
77117f4f
MK
617.B SO_OOBINLINE
618If this option is enabled,
619out-of-band data is directly placed into the receive data stream.
2b9b829d 620Otherwise, out-of-band data is passed only when the
77117f4f
MK
621.B MSG_OOB
622flag is set during receiving.
623.\" don't document it because it can do too much harm.
624.\".B SO_NO_CHECK
5d75650a
MK
625.\" The kernel has support for the SO_NO_CHECK socket
626.\" option (boolean: 0 == default, calculate checksum on xmit,
627.\" 1 == do not calculate checksum on xmit).
628.\" Additional note from Andi Kleen on SO_NO_CHECK (2010-08-30)
629.\" On Linux UDP checksums are essentially free and there's no reason
630.\" to turn them off and it would disable another safety line.
631.\" That is why I didn't document the option.
77117f4f
MK
632.TP
633.B SO_PASSCRED
634Enable or disable the receiving of the
635.B SCM_CREDENTIALS
636control message.
637For more information see
638.BR unix (7).
2fc7c74c
MK
639.TP
640.B SO_PASSSEC
641Enable or disable the receiving of the
642.B SCM_SECURITY
643control message.
644For more information see
645.BR unix (7).
77117f4f 646.TP
3f1e877d
MK
647.BR SO_PEEK_OFF " (since Linux 3.4)"
648.\" commit ef64a54f6e558155b4f149bb10666b9e914b6c54
649This option, which is currently supported only for
650.BR unix (7)
651sockets, sets the value of the "peek offset" for the
7f4cd55d 652.BR recv (2)
3f1e877d 653system call when used with
1ae6b2c7 654.B MSG_PEEK
3f1e877d 655flag.
5711c04f 656.IP
3f1e877d
MK
657When this option is set to a negative value
658(it is set to \-1 for all new sockets),
659traditional behavior is provided:
7f4cd55d 660.BR recv (2)
3f1e877d 661with the
1ae6b2c7 662.B MSG_PEEK
3f1e877d 663flag will peek data from the front of the queue.
5711c04f 664.IP
3f1e877d
MK
665When the option is set to a value greater than or equal to zero,
666then the next peek at data queued in the socket will occur at
667the byte offset specified by the option value.
668At the same time, the "peek offset" will be
669incremented by the number of bytes that were peeked from the queue,
cac3a0c5 670so that a subsequent peek will return the next data in the queue.
5711c04f 671.IP
3f1e877d
MK
672If data is removed from the front of the queue via a call to
673.BR recv (2)
674(or similar) without the
1ae6b2c7 675.B MSG_PEEK
3f1e877d
MK
676flag, the "peek offset" will be decreased by the number of bytes removed.
677In other words, receiving data without the
678.B MSG_PEEK
679flag will cause the "peek offset" to be adjusted to maintain
680the correct relative position in the queued data,
681so that a subsequent peek will retrieve the data that would have been
682retrieved had the data not been removed.
5711c04f 683.IP
3f1e877d
MK
684For datagram sockets, if the "peek offset" points to the middle of a packet,
685the data returned will be marked with the
1ae6b2c7 686.B MSG_TRUNC
3f1e877d 687flag.
5711c04f 688.IP
3f1e877d
MK
689The following example serves to illustrate the use of
690.BR SO_PEEK_OFF .
691Suppose a stream socket has the following queued input data:
5711c04f 692.IP
1ae6b2c7
AC
693.in +4n
694.EX
695aabbccddeeff
696.EE
697.in
3f1e877d
MK
698.IP
699The following sequence of
700.BR recv (2)
701calls would have the effect noted in the comments:
5711c04f 702.IP
3f1e877d 703.in +4n
b8302363 704.EX
3f1e877d
MK
705int ov = 4; // Set peek offset to 4
706setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &ov, sizeof(ov));
707
708recv(fd, buf, 2, MSG_PEEK); // Peeks "cc"; offset set to 6
709recv(fd, buf, 2, MSG_PEEK); // Peeks "dd"; offset set to 8
710recv(fd, buf, 2, 0); // Reads "aa"; offset set to 6
711recv(fd, buf, 2, MSG_PEEK); // Peeks "ee"; offset set to 8
b8302363 712.EE
3f1e877d
MK
713.in
714.TP
77117f4f 715.B SO_PEERCRED
94950b9a
MK
716Return the credentials of the peer process connected to this socket.
717For further details, see
77117f4f 718.BR unix (7).
77117f4f 719.TP
e6f90c3f
SS
720.BR SO_PEERSEC " (since Linux 2.6.2)"
721Return the security context of the peer socket connected to this socket.
722For further details, see
71a38281
MK
723.BR unix (7)
724and
725.BR ip (7).
e6f90c3f 726.TP
77117f4f
MK
727.B SO_PRIORITY
728Set the protocol-defined priority for all packets to be sent on
729this socket.
730Linux uses this value to order the networking queues:
731packets with a higher priority may be processed first depending
732on the selected device queueing discipline.
3a7ee744
MK
733.\" For
734.\" .BR ip (7),
735.\" this also sets the IP type-of-service (TOS) field for outgoing packets.
77117f4f
MK
736Setting a priority outside the range 0 to 6 requires the
737.B CAP_NET_ADMIN
738capability.
739.TP
dd2127e0
JE
740.BR SO_PROTOCOL " (since Linux 2.6.32)"
741Retrieves the socket protocol as an integer, returning a value such as
742.BR IPPROTO_SCTP .
743See
744.BR socket (2)
745for details.
fa574567 746This socket option is read-only.
dd2127e0 747.TP
77117f4f
MK
748.B SO_RCVBUF
749Sets or gets the maximum socket receive buffer in bytes.
750The kernel doubles this value (to allow space for bookkeeping overhead)
751when it is set using
752.\" Most (all?) other implementations do not do this -- MTK, Dec 05
753.BR setsockopt (2),
754and this doubled value is returned by
755.BR getsockopt (2).
3de2d3be 756.\" The following thread on LKML is quite informative:
a1fa36af 757.\" getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behavior
3de2d3be
MK
758.\" 17 July 2012
759.\" http://thread.gmane.org/gmane.linux.kernel/1328935
77117f4f 760The default value is set by the
5a2ff571
MK
761.I /proc/sys/net/core/rmem_default
762file, and the maximum allowed value is set by the
763.I /proc/sys/net/core/rmem_max
764file.
77117f4f
MK
765The minimum (doubled) value for this option is 256.
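.IP
The doubling can be observed by setting the option and reading it back,
as in the following sketch.
The helper name
.I set_rcvbuf
is hypothetical,
and the doubled result is seen only when the request does not exceed
.IR rmem_max .

```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of `bytes` and return
   the size that the kernel reports afterwards, or -1 on error. */
int
set_rcvbuf(int fd, int bytes)
{
    int effective;
    socklen_t len = sizeof(effective);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;

    /* getsockopt(2) reports the doubled value, not the requested one */
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) == -1)
        return -1;
    return effective;
}
```

Applications that compare the reported size with the requested size
should allow for this factor of two.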
766.TP
767.BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
768Using this socket option, a privileged
769.RB ( CAP_NET_ADMIN )
770process can perform the same task as
771.BR SO_RCVBUF ,
772but the
773.I rmem_max
774limit can be overridden.
775.TP
776.BR SO_RCVLOWAT " and " SO_SNDLOWAT
777Specify the minimum number of bytes in the buffer until the socket layer
778will pass the data to the protocol
779.RB ( SO_SNDLOWAT )
780or the user on receiving
781.RB ( SO_RCVLOWAT ).
782These two values are initialized to 1.
783.B SO_SNDLOWAT
784is not changeable on Linux
785.RB ( setsockopt (2)
786fails with the error
787.BR ENOPROTOOPT ).
788.B SO_RCVLOWAT
789is changeable
790only since Linux 2.4.
44a00819
MK
791.IP
792Before Linux 2.6.28
858c8575 793.\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
44a00819
MK
794.BR select (2),
795.BR poll (2),
77117f4f 796and
44a00819
MK
797.BR epoll (7)
798did not respect the
77117f4f
MK
799.B SO_RCVLOWAT
800setting on Linux,
44a00819
MK
801and indicated a socket as readable when even a single byte of data
802was available.
803A subsequent read from the socket would then block until
77117f4f
MK
804.B SO_RCVLOWAT
805bytes are available.
858c8575
MK
806Since Linux 2.6.28,
807.\" commit c7004482e8dcb7c3c72666395cfa98a216a4fb70
808.BR select (2),
809.BR poll (2),
810and
811.BR epoll (7)
812indicate a socket as readable only if at least
813.B SO_RCVLOWAT
814bytes are available.
77117f4f
MK
815.TP
816.BR SO_RCVTIMEO " and " SO_SNDTIMEO
b324e17d
AC
817.\" Not implemented in Linux 2.0.
818.\" Implemented in Linux 2.1.11 for getsockopt: always return a zero struct.
819.\" Implemented in Linux 2.3.41 for setsockopt, and actually used.
77117f4f
MK
820Specify the receiving or sending timeouts until reporting an error.
821The argument is a
822.IR "struct timeval" .
823If an input or output function blocks for this period of time, and
824data has been sent or received, the return value of that function
825will be the amount of data transferred; if no data has been transferred
56db9d31 826and the timeout has been reached, then \-1 is returned with
77117f4f
MK
827.I errno
828set to
1ae6b2c7 829.B EAGAIN
77117f4f 830or
f3277220 831.BR EWOULDBLOCK ,
77117f4f 832.\" in fact to EAGAIN
f3277220
AK
833or
834.B EINPROGRESS
835(for
836.BR connect (2))
ff40dbb3 837just as if the socket was specified to be nonblocking.
eebf8c09 838If the timeout is set to zero (the default),
77117f4f
MK
839then the operation will never time out.
840Timeouts only have effect for system calls that perform socket I/O (e.g.,
b53bfa6d 841.BR accept (2),
3d4a2839 842.BR connect (2),
77117f4f
MK
843.BR read (2),
844.BR recvmsg (2),
845.BR send (2),
846.BR sendmsg (2));
847timeouts have no effect for
848.BR select (2),
849.BR poll (2),
850.BR epoll_wait (2),
02f95a31 851and so on.
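.IP
A minimal sketch of setting a receive timeout follows.
The helper name
.I set_recv_timeout
and its millisecond parameter are hypothetical conveniences;
the option itself takes a
.IR "struct timeval" .

```c
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: limit how long receive calls on fd may block.
   ms is the timeout in milliseconds; 0 restores the default of
   blocking indefinitely.  Returns 0 on success, -1 on error. */
int
set_recv_timeout(int fd, long ms)
{
    struct timeval tv = {
        .tv_sec = ms / 1000,
        .tv_usec = (ms % 1000) * 1000,
    };

    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}
```

After this call, a
.BR recv (2)
that receives no data within the timeout returns \-1 with
.I errno
set to
.BR EAGAIN ,
so callers should treat that error as a timeout rather than a failure
of the connection.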
77117f4f
MK
852.TP
853.B SO_REUSEADDR
c28f1dd3
MK
854.\" commit c617f398edd4db2b8567a28e899a88f8f574798d
855.\" https://lwn.net/Articles/542629/
77117f4f
MK
856Indicates that the rules used in validating addresses supplied in a
857.BR bind (2)
858call should allow reuse of local addresses.
859For
d4c8c97c 860.B AF_INET
77117f4f
MK
861sockets this
862means that a socket may bind, except when there
863is an active listening socket bound to the address.
864When the listening socket is bound to
865.B INADDR_ANY
866with a specific port then it is not possible
867to bind to this port for any local address.
868Argument is an integer boolean flag.
869.TP
75979920 870.BR SO_REUSEPORT " (since Linux 3.9)"
11af2d4b 871Permits multiple
75979920
DW
872.B AF_INET
873or
874.B AF_INET6
11af2d4b
MK
875sockets to be bound to an identical socket address.
876This option must be set on each socket (including the first socket)
877prior to calling
878.BR bind (2)
879on the socket.
880To prevent port hijacking,
881all of the processes binding to the same address must have the same
882effective UID.
883This option can be employed with both TCP and UDP sockets.
5711c04f 884.IP
11af2d4b 885For TCP sockets, this option allows
75979920
DW
886.BR accept (2)
887load distribution in a multi-threaded server to be improved by
c28f1dd3 888using a distinct listener socket for each thread.
11af2d4b
MK
889This provides improved load distribution as compared
890to traditional techniques such as using a single
891.BR accept (2)ing
892thread that distributes connections,
893or having multiple threads that compete to
894.BR accept (2)
895from the same socket.
5711c04f 896.IP
11af2d4b
MK
897For UDP sockets,
898the use of this option can provide better distribution
899of incoming datagrams to multiple processes (or threads) as compared
900to the traditional technique of having multiple processes
901compete to receive datagrams on the same socket.
.TP
.BR SO_RXQ_OVFL " (since Linux 2.6.33)"
.\" commit 3b885787ea4112eaa80945999ea0901bf742707f
Indicates that an unsigned 32-bit value ancillary message (cmsg)
should be attached to received skbs indicating
the number of packets dropped by the socket since its creation.
.TP
.BR SO_SELECT_ERR_QUEUE " (since Linux 3.10)"
.\" commit 7d4c04fc170087119727119074e72445f2bb192b
.\" Author: Keller, Jacob E <jacob.e.keller@intel.com>
When this option is set on a socket,
an error condition on the socket causes notification not only via the
.I readfds
and
.I writefds
sets of
.BR select (2),
but also via the
.I exceptfds
set.
Similarly,
.BR poll (2)
also returns
.B POLLPRI
whenever a
.B POLLERR
event is returned.
.\" It does not affect wake up.
.IP
Background: this option was added when waking up on an error condition
occurred only via the
.I readfds
and
.I writefds
sets of
.BR select (2).
The option was added to allow monitoring for error conditions via the
.I exceptfds
argument without simultaneously having to receive notifications (via
.IR readfds )
for regular data that can be read from the socket.
After changes in Linux 4.16,
.\" commit 6e5d58fdc9bedd0255a8
.\" ("skbuff: Fix not waking applications when errors are enqueued")
the use of this flag to achieve the desired notifications
is no longer necessary.
This option is nevertheless retained for backwards compatibility.
.TP
.B SO_SNDBUF
Sets or gets the maximum socket send buffer in bytes.
The kernel doubles this value (to allow space for bookkeeping overhead)
when it is set using
.\" Most (all?) other implementations do not do this -- MTK, Dec 05
.\" See also the comment to SO_RCVBUF (17 Jul 2012 LKML mail)
.BR setsockopt (2),
and this doubled value is returned by
.BR getsockopt (2).
The default value is set by the
.I /proc/sys/net/core/wmem_default
file and the maximum allowed value is set by the
.I /proc/sys/net/core/wmem_max
file.
The minimum (doubled) value for this option is 2048.
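.IP
The doubling can be observed with a short sketch such as the following.
The helper name
.I set_sndbuf
is hypothetical; the sketch only sets the option and reads it back.

```c
#include <sys/socket.h>

/* Hypothetical helper: request a send buffer of 'requested' bytes
   and return the size that the kernel reports afterwards,
   or -1 on error. */
int
set_sndbuf(int fd, int requested)
{
    int actual;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                   &requested, sizeof(requested)) == -1)
        return -1;

    /* getsockopt(2) returns the doubled value. */
    if (getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) == -1)
        return -1;

    return actual;
}
```

Assuming that
.I wmem_max
is at least 8192, a request for 8192 bytes would be expected to report 16384.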
.TP
.BR SO_SNDBUFFORCE " (since Linux 2.6.14)"
Using this socket option, a privileged
.RB ( CAP_NET_ADMIN )
process can perform the same task as
.BR SO_SNDBUF ,
but the
.I wmem_max
limit can be overridden.
.TP
.B SO_TIMESTAMP
Enable or disable the receiving of the
.B SO_TIMESTAMP
control message.
The timestamp control message is sent with level
.B SOL_SOCKET
and a
.I cmsg_type
of
.BR SCM_TIMESTAMP .
The
.I cmsg_data
field is a
.I "struct timeval"
indicating the
reception time of the last packet passed to the user in this call.
See
.BR cmsg (3)
for details on control messages.
.TP
.BR SO_TIMESTAMPNS " (since Linux 2.6.22)"
.\" commit 92f37fd2ee805aa77925c1e64fd56088b46094fc
Enable or disable the receiving of the
.B SO_TIMESTAMPNS
control message.
The timestamp control message is sent with level
.B SOL_SOCKET
and a
.I cmsg_type
of
.BR SCM_TIMESTAMPNS .
The
.I cmsg_data
field is a
.I "struct timespec"
indicating the
reception time of the last packet passed to the user in this call.
The clock used for the timestamp is
.BR CLOCK_REALTIME .
See
.BR cmsg (3)
for details on control messages.
.IP
A socket cannot mix
.B SO_TIMESTAMP
and
.BR SO_TIMESTAMPNS :
the two modes are mutually exclusive.
.TP
.B SO_TYPE
Gets the socket type as an integer (e.g.,
.BR SOCK_STREAM ).
This socket option is read-only.
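.IP
A minimal sketch of querying the type of an existing socket follows;
the helper name
.I socket_type
is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: return the type of socket fd
   (for example, SOCK_STREAM), or -1 on error. */
int
socket_type(int fd)
{
    int type;
    socklen_t len = sizeof(type);

    if (getsockopt(fd, SOL_SOCKET, SO_TYPE, &type, &len) == -1)
        return -1;
    return type;
}
```

This can be useful for a program that inherits a socket
and must decide how to read from it.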
.TP
.BR SO_BUSY_POLL " (since Linux 3.11)"
Sets the approximate time in microseconds to busy poll on a blocking receive
when there is no data.
Increasing this value requires
.BR CAP_NET_ADMIN .
The default for this option is controlled by the
.I /proc/sys/net/core/busy_read
file.
.IP
The value in the
.I /proc/sys/net/core/busy_poll
file determines how long
.BR select (2)
and
.BR poll (2)
will busy poll when they operate on sockets with
.B SO_BUSY_POLL
set and no events to report are found.
.IP
In both cases,
busy polling will only be done when the socket last received data
from a network device that supports this option.
.IP
While busy polling may improve latency of some applications,
care must be taken when using it since this will increase
both CPU utilization and power usage.
.SS Signals
When writing onto a connection-oriented socket that has been shut down
(by the local or the remote end),
.B SIGPIPE
is sent to the writing process and
.B EPIPE
is returned.
The signal is not sent when the write call
specified the
.B MSG_NOSIGNAL
flag.
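.PP
As a small sketch of the
.B MSG_NOSIGNAL
behavior described above, the following wrapper (the name
.I send_nosignal
is hypothetical) writes to a socket so that a shut-down connection
is reported only through
.IR errno .

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send len bytes from buf on fd without
   raising SIGPIPE.
   Returns the number of bytes sent, or -1 with errno set
   (EPIPE if the connection has been shut down). */
ssize_t
send_nosignal(int fd, const void *buf, size_t len)
{
    return send(fd, buf, len, MSG_NOSIGNAL);
}
```

A caller would typically test for a return value of \-1 with
.I errno
equal to
.B EPIPE
and then close the socket.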
.PP
When requested with the
.B FIOSETOWN
.BR fcntl (2)
or
.B SIOCSPGRP
.BR ioctl (2),
.B SIGIO
is sent when an I/O event occurs.
It is possible to use
.BR poll (2)
or
.BR select (2)
in the signal handler to find out which socket the event occurred on.
An alternative (since Linux 2.2) is to set a real-time signal using the
.B F_SETSIG
.BR fcntl (2);
the handler of the real-time signal will be called with
the file descriptor in the
.I si_fd
field of its
.IR siginfo_t .
See
.BR fcntl (2)
for more information.
.PP
Under some circumstances (e.g., multiple processes accessing a
single socket), the condition that caused the
.B SIGIO
may have already disappeared when the process reacts to the signal.
If this happens, the process should wait again because Linux
will resend the signal later.
.\" .SS Ancillary messages
.SS /proc interfaces
The core socket networking parameters can be accessed
via files in the directory
.IR /proc/sys/net/core/ .
.TP
.I rmem_default
contains the default setting in bytes of the socket receive buffer.
.TP
.I rmem_max
contains the maximum socket receive buffer size in bytes which a user may
set by using the
.B SO_RCVBUF
socket option.
.TP
.I wmem_default
contains the default setting in bytes of the socket send buffer.
.TP
.I wmem_max
contains the maximum socket send buffer size in bytes which a user may
set by using the
.B SO_SNDBUF
socket option.
.TP
.IR message_cost " and " message_burst
configure the token bucket filter used to load limit warning messages
caused by external network events.
.TP
.I netdev_max_backlog
Maximum number of packets in the global input queue.
.TP
.I optmem_max
Maximum length of ancillary data and user control data like the iovecs
per socket.
.\" netdev_fastroute is not documented because it is experimental
.SS Ioctls
These operations can be accessed using
.BR ioctl (2):
.PP
.in +4n
.EX
.IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
.EE
.in
.TP
.B SIOCGSTAMP
Return a
.I "struct timeval"
with the receive timestamp of the last packet passed to the user.
This is useful for accurate round-trip time measurements.
See
.BR setitimer (2)
for a description of
.IR "struct timeval" .
.\"
This ioctl should be used only if the socket options
.B SO_TIMESTAMP
and
.B SO_TIMESTAMPNS
are not set on the socket.
Otherwise, it returns the timestamp of the
last packet that was received while
.B SO_TIMESTAMP
and
.B SO_TIMESTAMPNS
were not set, or it fails if no such packet has been received
(i.e.,
.BR ioctl (2)
returns \-1 with
.I errno
set to
.BR ENOENT ).
.TP
.B SIOCSPGRP
Set the process or process group that is to receive
.B SIGIO
or
.B SIGURG
signals when I/O becomes possible or urgent data is available.
The argument is a pointer to a
.IR pid_t .
For further details, see the description of
.B F_SETOWN
in
.BR fcntl (2).
.TP
.B FIOASYNC
Change the
.B O_ASYNC
flag to enable or disable asynchronous I/O mode of the socket.
Asynchronous I/O mode means that the
.B SIGIO
signal or the signal set with
.B F_SETSIG
is raised when a new I/O event occurs.
.IP
Argument is an integer boolean flag.
(This operation is synonymous with the use of
.BR fcntl (2)
to set the
.B O_ASYNC
flag.)
.\"
.TP
.B SIOCGPGRP
Get the current process or process group that receives
.B SIGIO
or
.B SIGURG
signals,
or 0
when none is set.
.PP
Valid
.BR fcntl (2)
operations:
.TP
.B FIOGETOWN
The same as the
.B SIOCGPGRP
.BR ioctl (2).
.TP
.B FIOSETOWN
The same as the
.B SIOCSPGRP
.BR ioctl (2).
.SH VERSIONS
.B SO_BINDTODEVICE
was introduced in Linux 2.0.30.
.B SO_PASSCRED
is new in Linux 2.2.
The
.I /proc
interfaces were introduced in Linux 2.2.
.B SO_RCVTIMEO
and
.B SO_SNDTIMEO
are supported since Linux 2.3.41.
Earlier, timeouts were fixed to
a protocol-specific setting, and could not be read or written.
.SH NOTES
Linux assumes that half of the send/receive buffer is used for internal
kernel structures; thus the values in the corresponding
.I /proc
files are twice what can be observed on the wire.
.PP
Linux will allow port reuse only with the
.B SO_REUSEADDR
option
when this option was set both in the previous program that performed a
.BR bind (2)
to the port and in the program that wants to reuse the port.
This differs from some implementations (e.g., FreeBSD)
where only the later program needs to set the
.B SO_REUSEADDR
option.
Typically this difference is invisible, since, for example, a server
program is designed to always set this option.
.\" .SH AUTHORS
.\" This man page was written by Andi Kleen.
.SH SEE ALSO
.BR wireshark (1),
.BR bpf (2),
.BR connect (2),
.BR getsockopt (2),
.BR setsockopt (2),
.BR socket (2),
.BR pcap (3),
.BR address_families (7),
.BR capabilities (7),
.BR ddp (7),
.BR ip (7),
.BR ipv6 (7),
.BR packet (7),
.BR tcp (7),
.BR udp (7),
.BR unix (7),
.BR tcpdump (8)