'\" t
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
.\" and copyright (c) 1999 Matthew Wilcox.
.\"
.\" %%%LICENSE_START(VERBATIM_ONE_PARA)
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\" %%%LICENSE_END
.\"
.\" 2002-10-30, Michael Kerrisk, <mtk.manpages@gmail.com>
.\" Added description of SO_ACCEPTCONN
.\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
.\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
.\" Added notes on capability requirements
.\" A few small grammar fixes
.\" 2010-06-13 Jan Engelhardt <jengelh@medozas.de>
.\" Documented SO_DOMAIN and SO_PROTOCOL.
.\"
.\" FIXME
.\" The following are not yet documented:
.\"
.\" SO_PEERNAME (2.4?)
.\" get only
.\" Seems to do something similar to getpeername(), but then
.\" why is it necessary / how does it differ?
.\"
.\" SO_TIMESTAMPNS (2.6.22)
.\" Documentation/networking/timestamping.txt
.\" commit 92f37fd2ee805aa77925c1e64fd56088b46094fc
.\" Author: Eric Dumazet <dada1@cosmosbay.com>
.\"
.\" SO_TIMESTAMPING (2.6.30)
.\" Documentation/networking/timestamping.txt
.\" commit cb9eff097831007afb30d64373f29d99825d0068
.\" Author: Patrick Ohly <patrick.ohly@intel.com>
.\"
.\" SO_WIFI_STATUS (3.3)
.\" commit 6e3e939f3b1bf8534b32ad09ff199d88800835a0
.\" Author: Johannes Berg <johannes.berg@intel.com>
.\" Also: SCM_WIFI_STATUS
.\"
.\" SO_NOFCS (3.4)
.\" commit 3bdc0eba0b8b47797f4a76e377dd8360f317450f
.\" Author: Ben Greear <greearb@candelatech.com>
.\"
.\" SO_GET_FILTER (3.8)
.\" commit a8fc92778080c845eaadc369a0ecf5699a03bef0
.\" Author: Pavel Emelyanov <xemul@parallels.com>
.\"
.\" SO_SELECT_ERR_QUEUE (3.10)
.\" commit 7d4c04fc170087119727119074e72445f2bb192b
.\" Author: Keller, Jacob E <jacob.e.keller@intel.com>
.\"
.\" SO_MAX_PACING_RATE (3.13)
.\" commit 62748f32d501f5d3712a7c372bbb92abc7c62bc7
.\" Author: Eric Dumazet <edumazet@google.com>
.\"
.\" SO_BPF_EXTENSIONS (3.14)
.\" commit ea02f9411d9faa3553ed09ce0ec9f00ceae9885e
.\" Author: Michal Sekletar <msekleta@redhat.com>
.\"
.TH SOCKET 7 2016-10-08 Linux "Linux Programmer's Manual"
.SH NAME
socket \- Linux socket interface
.SH SYNOPSIS
.B #include <sys/socket.h>
.sp
.IB sockfd " = socket(int " socket_family ", int " socket_type ", int " protocol );
.SH DESCRIPTION
This manual page describes the Linux networking socket layer user
interface.
The BSD-compatible sockets
are the uniform interface
between the user process and the network protocol stacks in the kernel.
The protocol modules are grouped into
.I protocol families
such as
.BR AF_INET ", " AF_IPX ", and " AF_PACKET ,
and
.I socket types
such as
.B SOCK_STREAM
or
.BR SOCK_DGRAM .
See
.BR socket (2)
for more information on families and types.
.SS Socket-layer functions
These functions are used by the user process to send or receive packets
and to do other socket operations.
For more information see their respective manual pages.

.BR socket (2)
creates a socket,
.BR connect (2)
connects a socket to a remote socket address,
the
.BR bind (2)
function binds a socket to a local socket address,
.BR listen (2)
tells the socket that new connections shall be accepted, and
.BR accept (2)
is used to get a new socket with a new incoming connection.
.BR socketpair (2)
returns two connected anonymous sockets (implemented only for a few
local families like
.BR AF_UNIX ).
.PP
.BR send (2),
.BR sendto (2),
and
.BR sendmsg (2)
send data over a socket, and
.BR recv (2),
.BR recvfrom (2),
.BR recvmsg (2)
receive data from a socket.
.BR poll (2)
and
.BR select (2)
wait for arriving data or for readiness to send data.
In addition, the standard I/O operations like
.BR write (2),
.BR writev (2),
.BR sendfile (2),
.BR read (2),
and
.BR readv (2)
can be used to read and write data.
.PP
.BR getsockname (2)
returns the local socket address and
.BR getpeername (2)
returns the remote socket address.
.BR getsockopt (2)
and
.BR setsockopt (2)
are used to set or get socket layer or protocol options.
.BR ioctl (2)
can be used to set or read some other options.
.PP
.BR close (2)
is used to close a socket.
.BR shutdown (2)
closes parts of a full-duplex socket connection.
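.PP
As an illustration only, the following C sketch exercises several of these
calls on a connected pair of
.B AF_UNIX
stream sockets obtained from
.BR socketpair (2).
One end sends a message with
.BR send (2)
and half-closes with
.BR shutdown (2),
and the other end calls
.BR recv (2)
until it returns 0 (end-of-file).
The function name
.I exchange_once
is hypothetical, and the sketch assumes messages small enough to fit in the
socket buffer.
.in +4n
.nf
```c
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send msg from one end of a fresh socketpair,
   shut down that end for writing, and read everything that arrives at
   the other end into buf.  Returns the number of bytes received, or -1. */
ssize_t exchange_once(const char *msg, char *buf, size_t bufsize)
{
    int sv[2];
    ssize_t total = 0, n = 0;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    if (send(sv[0], msg, strlen(msg), 0) == -1 ||
        shutdown(sv[0], SHUT_WR) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* recv() returns 0 once the peer has shut down its sending side
       and all queued data has been consumed. */
    while ((size_t) total < bufsize &&
           (n = recv(sv[1], buf + total, bufsize - total, 0)) > 0)
        total += n;

    close(sv[0]);
    close(sv[1]);
    return n == -1 ? -1 : total;
}
```
.fi
.in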
.PP
Seeking, or calling
.BR pread (2)
or
.BR pwrite (2)
with a nonzero position is not supported on sockets.
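.PP
The following small check assumes Linux, where both
.BR lseek (2)
and
.BR pread (2)
on a socket fail with the error
.BR ESPIPE .
The function name
.I socket_rejects_seeking
is hypothetical.
.in +4n
.nf
```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if both lseek() and pread() at a
   nonzero offset are rejected with ESPIPE on a fresh AF_UNIX socket,
   and 0 otherwise. */
int socket_rejects_seeking(void)
{
    int sv[2];
    char c;
    int ok;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;

    ok = lseek(sv[0], 1, SEEK_SET) == -1 && errno == ESPIPE;
    ok = ok && pread(sv[0], &c, 1, 1) == -1 && errno == ESPIPE;

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```
.fi
.in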
.PP
It is possible to do nonblocking I/O on sockets by setting the
.B O_NONBLOCK
flag on a socket file descriptor using
.BR fcntl (2).
Then all operations that would block will (usually)
return with
.B EAGAIN
(operation should be retried later);
.BR connect (2)
will fail with the error
.BR EINPROGRESS .
The user can then wait for various events via
.BR poll (2)
or
.BR select (2).
.TS
tab(:) allbox;
c s s
l l l.
I/O events
Event:Poll flag:Occurrence
Read:POLLIN:T{
New data arrived.
T}
Read:POLLIN:T{
A connection setup has been completed
(for connection-oriented sockets).
T}
Read:POLLHUP:T{
A disconnection request has been initiated by the other end.
T}
Read:POLLHUP:T{
A connection is broken (only for connection-oriented protocols).
When the socket is written to,
.B SIGPIPE
is also sent.
T}
Write:POLLOUT:T{
Socket has enough send buffer space for writing new data.
T}
Read/Write:T{
POLLIN |
.br
POLLOUT
T}:T{
An outgoing
.BR connect (2)
finished.
T}
Read/Write:POLLERR:An asynchronous error occurred.
Read/Write:POLLHUP:The other end has shut down one direction.
Exception:POLLPRI:T{
Urgent data arrived.
.B SIGURG
is sent then.
T}
.\" FIXME . The following is not true currently:
.\" It is no I/O event when the connection
.\" is broken from the local end using
.\" .BR shutdown (2)
.\" or
.\" .BR close (2).
.TE
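.PP
The nonblocking pattern described above can be sketched as follows.
This is a minimal example, not a complete event loop.
.I set_nonblocking
adds
.B O_NONBLOCK
using the
.B F_GETFL
and
.B F_SETFL
operations of
.BR fcntl (2),
and
.I wait_for_input
waits for
.B POLLIN
with
.BR poll (2).
Both helper names are hypothetical.
.in +4n
.nf
```c
#include <fcntl.h>
#include <poll.h>

/* Hypothetical helper: add O_NONBLOCK to the existing file status flags. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: wait up to timeout_ms milliseconds for fd to
   become readable.  Returns 1 if POLLIN is reported, 0 on timeout,
   and -1 on error. */
int wait_for_input(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return n;
    return (pfd.revents & POLLIN) != 0;
}
```
.fi
.in
A caller would typically retry the operation that failed with
.B EAGAIN
once
.I wait_for_input
reports that the socket is readable.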
.PP
An alternative to
.BR poll (2)
and
.BR select (2)
is to let the kernel inform the application about events
via a
.B SIGIO
signal.
For this, the
.B O_ASYNC
flag must be set on a socket file descriptor via
.BR fcntl (2)
and a valid signal handler for
.B SIGIO
must be installed via
.BR sigaction (2).
See the
.I Signals
discussion below.
.SS Socket address structures
Each socket domain has its own format for socket addresses,
with a domain-specific address structure.
Each of these structures begins with an
integer "family" field (typed as
.IR sa_family_t )
that indicates the type of the address structure.
This allows
the various system calls (e.g.,
.BR connect (2),
.BR bind (2),
.BR accept (2),
.BR getsockname (2),
.BR getpeername (2)),
which are generic to all socket domains,
to determine the domain of a particular socket address.

To allow any type of socket address to be passed to
interfaces in the sockets API,
the type
.IR "struct sockaddr"
is defined.
The purpose of this type is purely to allow casting of
domain-specific socket address types to a "generic" type,
so as to avoid compiler warnings about type mismatches in
calls to the sockets API.

In addition, the sockets API provides the data type
.IR "struct sockaddr_storage" .
This type
is suitable to accommodate all supported domain-specific socket
address structures; it is large enough and is aligned properly.
(In particular, it is large enough to hold
IPv6 socket addresses.)
The structure includes the following field, which can be used to identify
the type of socket address actually stored in the structure:

.in +4n
.nf
    sa_family_t ss_family;
.fi
.in

The
.I sockaddr_storage
structure is useful in programs that must handle socket addresses
in a generic way
(e.g., programs that must deal with both IPv4 and IPv6 socket addresses).
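.PP
For example, a program that receives an address from
.BR accept (2)
or
.BR getsockname (2)
into a
.I sockaddr_storage
buffer can inspect
.I ss_family
before casting to the domain-specific type.
The following sketch extracts the port from an IPv4 or IPv6 address.
The function name
.I sockaddr_port
is hypothetical.
.in +4n
.nf
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: return the port number (host byte order) held in
   a generic socket address, or -1 if the family carries no port. */
int sockaddr_port(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *) ss)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *) ss)->sin6_port);
    default:
        return -1;
    }
}
```
.fi
.in
Such a buffer is typically passed as
.I "(struct sockaddr *) &ss"
with a
.I socklen_t
length initialized to
.IR "sizeof(ss)" .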
.SS Socket options
The socket options listed below can be set by using
.BR setsockopt (2)
and read with
.BR getsockopt (2)
with the socket level set to
.B SOL_SOCKET
for all sockets.
Unless otherwise noted,
.I optval
is a pointer to an
.IR int .
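.PP
Because most options below take an
.I int
argument, programs often wrap the two calls in small helpers such as the
following.
The names
.I set_int_sockopt
and
.I get_int_sockopt
are hypothetical.
Options whose argument is a structure, such as
.B SO_LINGER
or
.BR SO_RCVTIMEO ,
need their own code.
.in +4n
.nf
```c
#include <sys/socket.h>

/* Hypothetical helper: set a SOL_SOCKET option whose optval is an int. */
int set_int_sockopt(int fd, int optname, int value)
{
    return setsockopt(fd, SOL_SOCKET, optname, &value, sizeof(value));
}

/* Hypothetical helper: store an int-valued SOL_SOCKET option in *value.
   Returns 0 on success, -1 on error. */
int get_int_sockopt(int fd, int optname, int *value)
{
    socklen_t len = sizeof(*value);

    return getsockopt(fd, SOL_SOCKET, optname, value, &len);
}
```
.fi
.in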
.\" FIXME .
.\" In the list below, the text used to describe argument types
.\" for each socket option should be more consistent
.\"
.\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
.\" W R Stevens, UNPv1
.TP
.B SO_ACCEPTCONN
Returns a value indicating whether or not this socket has been marked
to accept connections with
.BR listen (2).
The value 0 indicates that this is not a listening socket;
the value 1 indicates that this is a listening socket.
This socket option is read-only.
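.IP
A minimal sketch of querying this option follows.
The function name
.I is_listening
is hypothetical.
.in +4n
.nf
```c
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if fd is a listening socket, 0 if not,
   and -1 if the option could not be read. */
int is_listening(int fd)
{
    int val;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) == -1)
        return -1;
    return val != 0;
}
```
.fi
.in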
.TP
.BR SO_ATTACH_FILTER " (since Linux 2.2), " SO_ATTACH_BPF " (since Linux 3.19)"
Attach a classic BPF
.RB ( SO_ATTACH_FILTER )
or an extended BPF
.RB ( SO_ATTACH_BPF )
program to the socket for use as a filter of incoming packets.
A packet will be dropped if the filter program returns zero.
If the filter program returns a
nonzero value which is less than the packet's data length,
the packet will be truncated to the length returned.
If the value returned by the filter is greater than or equal to the
packet's data length, the packet is allowed to proceed unmodified.

The argument for
.BR SO_ATTACH_FILTER
is a
.I sock_fprog
structure, defined in
.IR <linux/filter.h> :
.sp
.in +4n
.nf
struct sock_fprog {
    unsigned short      len;
    struct sock_filter *filter;
};
.fi
.in
.IP
The argument for
.BR SO_ATTACH_BPF
is a file descriptor returned by the
.BR bpf (2)
system call and must refer to a program of type
.BR BPF_PROG_TYPE_SOCKET_FILTER .

These options may be set multiple times for a given socket,
each time replacing the previous filter program.
The classic and extended versions may be called on the same socket,
but the previous filter will always be replaced such that a socket
never has more than one filter defined.

Both classic and extended BPF are explained in the kernel source file
.IR Documentation/networking/filter.txt .
.TP
.BR SO_ATTACH_REUSEPORT_CBPF ", " SO_ATTACH_REUSEPORT_EBPF
For use with the
.BR SO_REUSEPORT
option, these options allow the user to set a classic BPF
.RB ( SO_ATTACH_REUSEPORT_CBPF )
or an extended BPF
.RB ( SO_ATTACH_REUSEPORT_EBPF )
program which defines how packets are assigned to
the sockets in the reuseport group (that is, all sockets which have
.BR SO_REUSEPORT
set and are using the same local address to receive packets).

The BPF program must return an index between 0 and N\-1 representing
the socket which should receive the packet
(where N is the number of sockets in the group).
If the BPF program returns an invalid index,
socket selection will fall back to the plain
.BR SO_REUSEPORT
mechanism.

Sockets are numbered in the order in which they are added to the group
(that is, the order of
.BR bind (2)
calls for UDP sockets or the order of
.BR listen (2)
calls for TCP sockets).
New sockets added to a reuseport group will inherit the BPF program.
When a socket is removed from a reuseport group (via
.BR close (2)),
the last socket in the group will be moved into the closed socket's
position.

These options may be set repeatedly at any time on any socket in the group
to replace the current BPF program used by all sockets in the group.

.BR SO_ATTACH_REUSEPORT_CBPF
takes the same argument type as
.BR SO_ATTACH_FILTER
and
.BR SO_ATTACH_REUSEPORT_EBPF
takes the same argument type as
.BR SO_ATTACH_BPF .

UDP support for this feature is available since Linux 4.5;
TCP support is available since Linux 4.6.
.TP
.B SO_BINDTODEVICE
Bind this socket to a particular device like \(lqeth0\(rq,
as specified in the passed interface name.
If the
name is an empty string or the option length is zero, the socket device
binding is removed.
The passed option is a variable-length null-terminated
interface name string with the maximum size of
.BR IFNAMSIZ .
If a socket is bound to an interface,
only packets received from that particular interface are processed by the
socket.
Note that this works only for some socket types, particularly
.B AF_INET
sockets.
It is not supported for packet sockets (use normal
.BR bind (2)
there).

Before Linux 3.8,
this socket option could be set, but could not be retrieved with
.BR getsockopt (2).
Since Linux 3.8, it is readable.
The
.I optlen
argument should contain the buffer size available
to receive the device name and is recommended to be
.B IFNAMSIZ
bytes.
The real device name length is reported back in the
.I optlen
argument.
.TP
.B SO_BROADCAST
Set or get the broadcast flag.
When enabled, datagram sockets are allowed to send
packets to a broadcast address.
This option has no effect on stream-oriented sockets.
.TP
.B SO_BSDCOMPAT
Enable BSD bug-to-bug compatibility.
This is used by the UDP protocol module in Linux 2.0 and 2.2.
If enabled, ICMP errors received for a UDP socket will not be passed
to the user program.
In later kernel versions, support for this option has been phased out:
Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
(printk()) if a program uses this option.
Linux 2.0 also enabled BSD bug-to-bug compatibility
options (random header changing, skipping of the broadcast flag) for raw
sockets with this option, but that was removed in Linux 2.2.
.TP
.B SO_DEBUG
Enable socket debugging.
Allowed only for processes with the
.B CAP_NET_ADMIN
capability or an effective user ID of 0.
.TP
.BR SO_DETACH_FILTER " (since Linux 2.2), " SO_DETACH_BPF " (since Linux 3.19)"
These two options, which are synonyms,
may be used to remove the classic or extended BPF
program attached to a socket with either
.BR SO_ATTACH_FILTER
or
.BR SO_ATTACH_BPF .
The option value is ignored.
.TP
.BR SO_DOMAIN " (since Linux 2.6.32)"
Retrieves the socket domain as an integer, returning a value such as
.BR AF_INET6 .
See
.BR socket (2)
for details.
This socket option is read-only.
.TP
.B SO_ERROR
Get and clear the pending socket error.
This socket option is read-only.
Expects an integer.
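.IP
A common use of
.B SO_ERROR
is to learn the outcome of a nonblocking
.BR connect (2)
that failed with
.BR EINPROGRESS .
The program waits for the socket to become writable (see the I/O events
table above), then reads and clears the pending error.
The following is a minimal sketch.
The function name
.I finish_connect
is hypothetical.
.in +4n
.nf
```c
#include <errno.h>
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper: after a nonblocking connect() failed with
   EINPROGRESS, wait for fd to become writable, then fetch (and clear)
   the pending error.  Returns 0 if the connection succeeded, an errno
   value if it failed, ETIMEDOUT if the wait expired, or -1 if poll()
   or getsockopt() themselves failed. */
int finish_connect(int fd, int timeout_ms)
{
    struct pollfd pfd;
    int err = 0;
    socklen_t len = sizeof(err);

    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;

    switch (poll(&pfd, 1, timeout_ms)) {
    case -1:
        return -1;
    case 0:
        return ETIMEDOUT;
    }

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    return err;
}
```
.fi
.in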
.TP
.B SO_DONTROUTE
Don't send via a gateway; send only to directly connected hosts.
The same effect can be achieved by setting the
.B MSG_DONTROUTE
flag on a socket
.BR send (2)
operation.
Expects an integer boolean flag.
.TP
.B SO_KEEPALIVE
Enable sending of keep-alive messages on connection-oriented sockets.
Expects an integer boolean flag.
.TP
.B SO_LINGER
Sets or gets the
.B SO_LINGER
option.
The argument is a
.I linger
structure.
.sp
.in +4n
.nf
struct linger {
    int l_onoff;    /* linger active */
    int l_linger;   /* how many seconds to linger for */
};
.fi
.in
.IP
When enabled, a
.BR close (2)
or
.BR shutdown (2)
will not return until all queued messages for the socket have been
successfully sent or the linger timeout has been reached.
Otherwise,
the call returns immediately and the closing is done in the background.
When the socket is closed as part of
.BR exit (2),
it always lingers in the background.
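.IP
A minimal sketch of enabling lingering follows.
The function name
.I set_linger
is hypothetical.
.in +4n
.nf
```c
#include <sys/socket.h>

/* Hypothetical helper: linger for up to seconds seconds on close
   (on != 0), or restore the default background close (on == 0). */
int set_linger(int fd, int on, int seconds)
{
    struct linger lg;

    lg.l_onoff = on;
    lg.l_linger = seconds;
    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```
.fi
.in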
.TP
.B SO_LOCK_FILTER
.\" commit d59577b6ffd313d0ab3be39cb1ab47e29bdc9182
When set, this option will prevent
changing the filters associated with the socket.
These filters include any set using the socket options
.BR SO_ATTACH_FILTER ,
.BR SO_ATTACH_BPF ,
.BR SO_ATTACH_REUSEPORT_CBPF ,
and
.BR SO_ATTACH_REUSEPORT_EBPF .

The typical use case is for a privileged process to set up a raw socket
(an operation that requires the
.BR CAP_NET_RAW
capability), apply a restrictive filter, set the
.BR SO_LOCK_FILTER
option,
and then either drop its privileges or pass the socket file descriptor
to an unprivileged process via a UNIX domain socket.

Once the
.BR SO_LOCK_FILTER
option has been enabled, attempts to change or remove the filter
attached to a socket, or to disable the
.BR SO_LOCK_FILTER
option, will fail with the error
.BR EPERM .
.TP
.BR SO_MARK " (since Linux 2.6.25)"
.\" commit 4a19ec5800fc3bb64e2d87c4d9fdd9e636086fe0
.\" and 914a9ab386a288d0f22252fc268ecbc048cdcbd5
Set the mark for each packet sent through this socket
(similar to the netfilter MARK target but socket-based).
Changing the mark can be used for mark-based
routing without netfilter or for packet filtering.
Setting this option requires the
.B CAP_NET_ADMIN
capability.
.TP
.B SO_OOBINLINE
If this option is enabled,
out-of-band data is directly placed into the receive data stream.
Otherwise, out-of-band data is passed only when the
.B MSG_OOB
flag is set during receiving.
.\" don't document it because it can do too much harm.
.\".B SO_NO_CHECK
.\" The kernel has support for the SO_NO_CHECK socket
.\" option (boolean: 0 == default, calculate checksum on xmit,
.\" 1 == do not calculate checksum on xmit).
.\" Additional note from Andi Kleen on SO_NO_CHECK (2010-08-30)
.\" On Linux UDP checksums are essentially free and there's no reason
.\" to turn them off and it would disable another safety line.
.\" That is why I didn't document the option.
.TP
.B SO_PASSCRED
Enable or disable the receiving of the
.B SCM_CREDENTIALS
control message.
For more information see
.BR unix (7).
.\" FIXME Document SO_PASSSEC, added in 2.6.18; there is some info
.\" in the 2.6.18 ChangeLog
.TP
.BR SO_PEEK_OFF " (since Linux 3.4)"
.\" commit ef64a54f6e558155b4f149bb10666b9e914b6c54
This option, which is currently supported only for
.BR unix (7)
sockets, sets the value of the "peek offset" for the
.BR recv (2)
system call when used with the
.BR MSG_PEEK
flag.

When this option is set to a negative value
(it is set to \-1 for all new sockets),
traditional behavior is provided:
.BR recv (2)
with the
.BR MSG_PEEK
flag will peek data from the front of the queue.

When the option is set to a value greater than or equal to zero,
then the next peek at data queued in the socket will occur at
the byte offset specified by the option value.
At the same time, the "peek offset" will be
incremented by the number of bytes that were peeked from the queue,
so that a subsequent peek will return the next data in the queue.

If data is removed from the front of the queue via a call to
.BR recv (2)
(or similar) without the
.BR MSG_PEEK
flag, the "peek offset" will be decreased by the number of bytes removed.
In other words, receiving data without the
.B MSG_PEEK
flag will cause the "peek offset" to be adjusted to maintain
the correct relative position in the queued data,
so that a subsequent peek will retrieve the data that would have been
retrieved had the data not been removed.

For datagram sockets, if the "peek offset" points to the middle of a packet,
the data returned will be marked with the
.BR MSG_TRUNC
flag.

The following example serves to illustrate the use of
.BR SO_PEEK_OFF .
Suppose a stream socket has the following queued input data:

.in +4n
.nf
aabbccddeeff
.fi
.in
.IP
The following sequence of
.BR recv (2)
calls would have the effect noted in the comments:

.in +4n
.nf
int ov = 4;                  // Set peek offset to 4
setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &ov, sizeof(ov));

recv(fd, buf, 2, MSG_PEEK);  // Peeks "cc"; offset set to 6
recv(fd, buf, 2, MSG_PEEK);  // Peeks "dd"; offset set to 8
recv(fd, buf, 2, 0);         // Reads "aa"; offset set to 6
recv(fd, buf, 2, MSG_PEEK);  // Peeks "ee"; offset set to 8
.fi
.in
.TP
.B SO_PEERCRED
Return the credentials of the foreign process connected to this socket.
This is possible only for connected
.B AF_UNIX
stream sockets and
.B AF_UNIX
stream and datagram socket pairs created using
.BR socketpair (2);
see
.BR unix (7).
The returned credentials are those that were in effect at the time
of the call to
.BR connect (2)
or
.BR socketpair (2).
The argument is a
.I ucred
structure; define the
.B _GNU_SOURCE
feature test macro to obtain the definition of that structure from
.IR <sys/socket.h> .
This socket option is read-only.
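.IP
A minimal sketch of retrieving the peer credentials follows.
It defines
.B _GNU_SOURCE
as described above.
The function name
.I get_peer_creds
is hypothetical.
.in +4n
.nf
```c
#define _GNU_SOURCE         /* for struct ucred in <sys/socket.h> */
#include <sys/socket.h>

/* Hypothetical helper: fetch the credentials of the peer of a connected
   AF_UNIX socket.  Returns 0 on success, -1 on error. */
int get_peer_creds(int fd, struct ucred *cred)
{
    socklen_t len = sizeof(*cred);

    return getsockopt(fd, SOL_SOCKET, SO_PEERCRED, cred, &len);
}
```
.fi
.in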
.TP
.B SO_PRIORITY
Set the protocol-defined priority for all packets to be sent on
this socket.
Linux uses this value to order the networking queues:
packets with a higher priority may be processed first depending
on the selected device queueing discipline.
.\" For
.\" .BR ip (7),
.\" this also sets the IP type-of-service (TOS) field for outgoing packets.
Setting a priority outside the range 0 to 6 requires the
.B CAP_NET_ADMIN
capability.
.TP
.BR SO_PROTOCOL " (since Linux 2.6.32)"
Retrieves the socket protocol as an integer, returning a value such as
.BR IPPROTO_SCTP .
See
.BR socket (2)
for details.
This socket option is read-only.
.TP
.B SO_RCVBUF
Sets or gets the maximum socket receive buffer in bytes.
The kernel doubles this value (to allow space for bookkeeping overhead)
when it is set using
.\" Most (all?) other implementations do not do this -- MTK, Dec 05
.BR setsockopt (2),
and this doubled value is returned by
.BR getsockopt (2).
.\" The following thread on LKML is quite informative:
.\" getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behavior
.\" 17 July 2012
.\" http://thread.gmane.org/gmane.linux.kernel/1328935
The default value is set by the
.I /proc/sys/net/core/rmem_default
file, and the maximum allowed value is set by the
.I /proc/sys/net/core/rmem_max
file.
The minimum (doubled) value for this option is 256.
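.IP
The doubling can be observed by reading the value back after setting it.
With the following sketch, a request for 4096 bytes would be expected to
report 8192, provided that
.I rmem_max
is at least 4096.
The function name
.I request_rcvbuf
is hypothetical.
.in +4n
.nf
```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of the given size and
   return the size the kernel reports back, or -1 on error. */
int request_rcvbuf(int fd, int requested)
{
    int actual;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;
}
```
.fi
.in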
.TP
.BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
Using this socket option, a privileged
.RB ( CAP_NET_ADMIN )
process can perform the same task as
.BR SO_RCVBUF ,
but the
.I rmem_max
limit can be overridden.
.TP
.BR SO_RCVLOWAT " and " SO_SNDLOWAT
Specify the minimum number of bytes in the buffer until the socket layer
will pass the data to the protocol
.RB ( SO_SNDLOWAT )
or the user on receiving
.RB ( SO_RCVLOWAT ).
These two values are initialized to 1.
.B SO_SNDLOWAT
is not changeable on Linux
.RB ( setsockopt (2)
fails with the error
.BR ENOPROTOOPT ).
.B SO_RCVLOWAT
is changeable
only since Linux 2.4.
The
.BR select (2)
and
.BR poll (2)
system calls currently do not respect the
.B SO_RCVLOWAT
setting on Linux,
and mark a socket readable when even a single byte of data is available.
A subsequent read from the socket will block until
.B SO_RCVLOWAT
bytes are available.
.\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=111049368106984&w=2
.\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
.TP
.BR SO_RCVTIMEO " and " SO_SNDTIMEO
.\" Not implemented in 2.0.
.\" Implemented in 2.1.11 for getsockopt: always return a zero struct.
.\" Implemented in 2.3.41 for setsockopt, and actually used.
Specify the receiving or sending timeouts until reporting an error.
The argument is a
.IR "struct timeval" .
If an input or output function blocks for this period of time, and
data has been sent or received, the return value of that function
will be the amount of data transferred; if no data has been transferred
and the timeout has been reached, then \-1 is returned with
.I errno
set to
.BR EAGAIN
or
.BR EWOULDBLOCK ,
.\" in fact to EAGAIN
or
.B EINPROGRESS
(for
.BR connect (2))
just as if the socket were specified to be nonblocking.
If the timeout is set to zero (the default),
then the operation will never time out.
Timeouts have effect only for system calls that perform socket I/O (e.g.,
.BR read (2),
.BR recvmsg (2),
.BR send (2),
.BR sendmsg (2));
timeouts have no effect for
.BR select (2),
.BR poll (2),
.BR epoll_wait (2),
and so on.
.TP
.B SO_REUSEADDR
.\" commit c617f398edd4db2b8567a28e899a88f8f574798d
.\" https://lwn.net/Articles/542629/
Indicates that the rules used in validating addresses supplied in a
.BR bind (2)
call should allow reuse of local addresses.
For
.B AF_INET
sockets this
means that a socket may bind, except when there
is an active listening socket bound to the address.
When the listening socket is bound to
.B INADDR_ANY
with a specific port, then it is not possible
to bind to this port for any local address.
The argument is an integer boolean flag.
.TP
.BR SO_REUSEPORT " (since Linux 3.9)"
Permits multiple
.B AF_INET
or
.B AF_INET6
sockets to be bound to an identical socket address.
This option must be set on each socket (including the first socket)
prior to calling
.BR bind (2)
on the socket.
To prevent port hijacking,
all of the processes binding to the same address must have the same
effective UID.
This option can be employed with both TCP and UDP sockets.

For TCP sockets, this option allows
.BR accept (2)
load distribution in a multithreaded server to be improved by
using a distinct listener socket for each thread.
This provides improved load distribution as compared
to traditional techniques such as using a single
.BR accept (2)ing
thread that distributes connections,
or having multiple threads that compete to
.BR accept (2)
from the same socket.

For UDP sockets,
the use of this option can provide better distribution
of incoming datagrams to multiple processes (or threads) as compared
to the traditional technique of having multiple processes
compete to receive datagrams on the same socket.
.TP
.BR SO_RXQ_OVFL " (since Linux 2.6.33)"
.\" commit 3b885787ea4112eaa80945999ea0901bf742707f
Indicates that an unsigned 32-bit value ancillary message (cmsg)
should be attached to received skbs indicating
the number of packets dropped by the socket between
the last received packet and this received packet.
.TP
.B SO_SNDBUF
Sets or gets the maximum socket send buffer in bytes.
The kernel doubles this value (to allow space for bookkeeping overhead)
when it is set using
.\" Most (all?) other implementations do not do this -- MTK, Dec 05
.\" See also the comment to SO_RCVBUF (17 Jul 2012 LKML mail)
.BR setsockopt (2),
and this doubled value is returned by
.BR getsockopt (2).
The default value is set by the
.I /proc/sys/net/core/wmem_default
file and the maximum allowed value is set by the
.I /proc/sys/net/core/wmem_max
file.
The minimum (doubled) value for this option is 2048.
.TP
.BR SO_SNDBUFFORCE " (since Linux 2.6.14)"
Using this socket option, a privileged
.RB ( CAP_NET_ADMIN )
process can perform the same task as
.BR SO_SNDBUF ,
but the
.I wmem_max
limit can be overridden.
.TP
.B SO_TIMESTAMP
Enable or disable the receiving of the
.B SO_TIMESTAMP
control message.
The timestamp control message is sent with level
.B SOL_SOCKET
and the
.I cmsg_data
field is a
.I "struct timeval"
indicating the
reception time of the last packet passed to the user in this call.
See
.BR cmsg (3)
for details on control messages.
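.IP
The following sketch retrieves the timestamp with
.BR recvmsg (2)
and the
.BR cmsg (3)
macros.
It assumes the option was previously enabled with
.BR setsockopt (2)
and, following the description above, compares
.I cmsg_type
with
.BR SO_TIMESTAMP .
The function name
.I recv_with_timestamp
is hypothetical.
.in +4n
.nf
```c
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: receive one message into buf and, if a
   SO_TIMESTAMP control message is attached, copy it into *stamp.
   Returns the byte count (or -1); *have_stamp is set to 1 if a
   timestamp was found and to 0 otherwise. */
ssize_t recv_with_timestamp(int fd, void *buf, size_t len,
                            struct timeval *stamp, int *have_stamp)
{
    union {                     /* correctly aligned ancillary buffer */
        char bytes[CMSG_SPACE(sizeof(struct timeval))];
        struct cmsghdr align;
    } control;
    struct iovec iov;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t n;

    iov.iov_base = buf;
    iov.iov_len = len;
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.bytes;
    msg.msg_controllen = sizeof(control.bytes);

    *have_stamp = 0;
    n = recvmsg(fd, &msg, 0);
    if (n == -1)
        return -1;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_SOCKET &&
            cmsg->cmsg_type == SO_TIMESTAMP) {
            memcpy(stamp, CMSG_DATA(cmsg), sizeof(*stamp));
            *have_stamp = 1;
        }
    }
    return n;
}
```
.fi
.in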
.TP
.B SO_TYPE
Gets the socket type as an integer (e.g.,
.BR SOCK_STREAM ).
This socket option is read-only.
.TP
.BR SO_BUSY_POLL " (since Linux 3.11)"
Sets the approximate time in microseconds to busy poll on a blocking receive
when there is no data.
Increasing this value requires
.BR CAP_NET_ADMIN .
The default for this option is controlled by the
.I /proc/sys/net/core/busy_read
file.

The value in the
.I /proc/sys/net/core/busy_poll
file determines how long
.BR select (2)
and
.BR poll (2)
will busy poll when they operate on sockets with
.BR SO_BUSY_POLL
set and no events to report are found.

In both cases,
busy polling will be done only when the socket last received data
from a network device that supports this option.

While busy polling may improve latency of some applications,
care must be taken when using it since this will increase
both CPU utilization and power usage.
.SS Signals
When writing onto a connection-oriented socket that has been shut down
(by the local or the remote end),
.B SIGPIPE
is sent to the writing process and
.B EPIPE
is returned.
The signal is not sent when the write call
specified the
.B MSG_NOSIGNAL
flag.
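.PP
A minimal sketch of the
.B MSG_NOSIGNAL
behavior.
It uses an
.B AF_UNIX
stream socket pair as the connection-oriented socket, a choice made
only for brevity.
Once the peer has been closed,
.BR send (2)
fails with
.B EPIPE
instead of raising
.BR SIGPIPE .
The helper name
.I send_to_closed_peer
is hypothetical.
```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: write to a stream socket whose peer is gone.
 * With MSG_NOSIGNAL the call fails with EPIPE instead of raising
 * SIGPIPE (whose default action would terminate the process).
 * Returns the errno observed, or 0 if the send unexpectedly succeeded. */
static int send_to_closed_peer(void)
{
    int sv[2];
    ssize_t n;
    int err;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return errno;
    close(sv[1]);                       /* the remote end goes away */
    n = send(sv[0], "x", 1, MSG_NOSIGNAL);
    err = (n == -1) ? errno : 0;
    close(sv[0]);
    return err;
}
```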
.PP
When requested with the
.B F_SETOWN
.BR fcntl (2)
or the
.B SIOCSPGRP
(or
.BR FIOSETOWN )
.BR ioctl (2),
and with asynchronous I/O mode enabled (see
.B FIOASYNC
below),
.B SIGIO
is sent when an I/O event occurs.
It is possible to use
.BR poll (2)
or
.BR select (2)
in the signal handler to find out which socket the event occurred on.
An alternative (since Linux 2.2) is to set a real-time signal using the
.B F_SETSIG
.BR fcntl (2);
the handler of the real-time signal will be called with
the file descriptor in the
.I si_fd
field of its
.IR siginfo_t .
See
.BR fcntl (2)
for more information.
.PP
Under some circumstances (e.g., multiple processes accessing a
single socket), the condition that caused the
.B SIGIO
may have already disappeared when the process reacts to the signal.
If this happens, the process should wait again because Linux
will resend the signal later.
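.PP
The following sketch is Linux-specific and requires
.BR _GNU_SOURCE .
It combines
.BR F_SETOWN ,
.BR F_SETSIG ,
and
.B O_ASYNC
so that a real-time signal reports the ready descriptor in
.IR si_fd .
It also drains the socket in nonblocking mode, so a stale notification
is harmless.
The names
.IR enable_rt_sigio ,
.IR on_io ,
and
.I drain
are hypothetical.
```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t io_fd = -1;   /* set by the handler */

/* Real-time signal handler: si_fd names the descriptor that is ready. */
static void on_io(int sig, siginfo_t *si, void *ctx)
{
    (void) sig;
    (void) ctx;
    io_fd = si->si_fd;
}

/* Hypothetical helper: deliver SIGRTMIN (instead of SIGIO) to this
 * process when fd has an I/O event, and put fd in nonblocking mode. */
static int enable_rt_sigio(int fd)
{
    struct sigaction sa;
    int flags;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_io;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGRTMIN, &sa, NULL) == -1 ||
        fcntl(fd, F_SETOWN, getpid()) == -1 ||
        fcntl(fd, F_SETSIG, SIGRTMIN) == -1)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK) == -1)
        return -1;
    return 0;
}

/* The event may be stale by the time the handler runs, so read until
 * EAGAIN; a result of 0 simply means "wait for the next signal". */
static ssize_t drain(int fd, char *buf, size_t len)
{
    ssize_t n, total = 0;

    while ((size_t) total < len &&
           (n = read(fd, buf + total, len - (size_t) total)) > 0)
        total += n;
    return total;
}
```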
.\" .SS Ancillary messages
.SS /proc interfaces
The core socket networking parameters can be accessed
via files in the directory
.IR /proc/sys/net/core/ .
.TP
.I rmem_default
contains the default setting in bytes of the socket receive buffer.
.TP
.I rmem_max
contains the maximum socket receive buffer size in bytes that a user may
set by using the
.B SO_RCVBUF
socket option.
.TP
.I wmem_default
contains the default setting in bytes of the socket send buffer.
.TP
.I wmem_max
contains the maximum socket send buffer size in bytes that a user may
set by using the
.B SO_SNDBUF
socket option.
.TP
.IR message_cost " and " message_burst
configure the token bucket filter used to rate-limit warning messages
caused by external network events.
.TP
.I netdev_max_backlog
Maximum number of packets in the global input queue.
.TP
.I optmem_max
Maximum length of ancillary data and user control data, such as the iovecs,
per socket.
.\" netdev_fastroute is not documented because it is experimental
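.PP
These files can be read like any other text file.
The following minimal C sketch uses a hypothetical helper,
.IR read_core_sysctl ,
that returns \-1 when a file is absent, as may happen in some containers.
```c
#include <stdio.h>

/* Hypothetical helper: read one integer tunable from
 * /proc/sys/net/core/, e.g. "rmem_max".  Returns the value, or -1
 * if the file is missing or does not start with an integer. */
static long read_core_sysctl(const char *name)
{
    char path[128];
    long val = -1;
    FILE *f;

    snprintf(path, sizeof(path), "/proc/sys/net/core/%s", name);
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &val) != 1)
        val = -1;
    fclose(f);
    return val;
}
```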
.SS Ioctls
These operations can be accessed using
.BR ioctl (2):
.PP
.in +4n
.nf
.IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
.fi
.in
.TP
.B SIOCGSTAMP
Return a
.I struct timeval
with the receive timestamp of the last packet passed to the user.
This is useful for accurate round-trip time measurements.
See
.BR setitimer (2)
for a description of
.IR "struct timeval" .
.\"
This ioctl should be used only if the socket option
.B SO_TIMESTAMP
is not set on the socket.
Otherwise, it returns the timestamp of the
last packet that was received while
.B SO_TIMESTAMP
was not set, or it fails if no such packet has been received
(i.e.,
.BR ioctl (2)
returns \-1 with
.I errno
set to
.BR ENOENT ).
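.IP
A minimal sketch, assuming Linux and a working loopback interface.
It receives a datagram with
.B SO_TIMESTAMP
left unset and then retrieves that datagram's timestamp with
.BR SIOCGSTAMP .
The helper name
.I loopback_siocgstamp
is hypothetical.
```c
#include <arpa/inet.h>
#include <linux/sockios.h>    /* SIOCGSTAMP */
#include <netinet/in.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

/* Hypothetical helper: receive one datagram sent to ourselves over
 * loopback UDP (SO_TIMESTAMP not set), then ask for its receive time
 * with SIOCGSTAMP.  Returns 0 and fills *tv on success, -1 on failure. */
static int loopback_siocgstamp(struct timeval *tv)
{
    struct sockaddr_in addr;
    socklen_t alen = sizeof(addr);
    char buf[16];
    int rc = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);  /* port 0: any free port */
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *) &addr, &alen) == 0 &&
        sendto(fd, "x", 1, 0, (struct sockaddr *) &addr, sizeof(addr)) == 1 &&
        recv(fd, buf, sizeof(buf), 0) >= 0)
        rc = ioctl(fd, SIOCGSTAMP, tv);
    close(fd);
    return rc == 0 ? 0 : -1;
}
```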
.TP
.B SIOCSPGRP
Set the process or process group that is to receive
.B SIGIO
or
.B SIGURG
signals when I/O becomes possible or urgent data is available.
The argument is a pointer to a
.IR pid_t .
For further details, see the description of
.B F_SETOWN
in
.BR fcntl (2).
.TP
.B FIOASYNC
Change the
.B O_ASYNC
flag to enable or disable asynchronous I/O mode of the socket.
Asynchronous I/O mode means that the
.B SIGIO
signal or the signal set with
.B F_SETSIG
is raised when a new I/O event occurs.
.IP
The argument is a pointer to an integer boolean flag.
(This operation is synonymous with the use of
.BR fcntl (2)
to set the
.B O_ASYNC
flag.)
.\"
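.IP
The following sketch uses a hypothetical helper,
.IR set_async_ioctl .
It shows that toggling the mode with
.B FIOASYNC
is reflected in the
.B O_ASYNC
flag returned by the
.B F_GETFL
operation of
.BR fcntl (2).
```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/socket.h>

/* Hypothetical helper: switch asynchronous I/O mode on or off with
 * FIOASYNC (the flag is passed by pointer), then report the O_ASYNC
 * state that fcntl(F_GETFL) sees: 1 or 0, or -1 on error. */
static int set_async_ioctl(int fd, int on)
{
    int flags;

    if (ioctl(fd, FIOASYNC, &on) == -1)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return (flags & O_ASYNC) != 0;
}
```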
.TP
.B SIOCGPGRP
Get the current process or process group that receives
.B SIGIO
or
.B SIGURG
signals,
or 0
when none is set.
The argument is a pointer to a
.IR pid_t .
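.IP
A sketch of a round trip through
.B SIOCSPGRP
and
.BR SIOCGPGRP .
The helper name
.I set_and_get_owner
is hypothetical.
```c
#define _GNU_SOURCE
#include <linux/sockios.h>    /* SIOCSPGRP, SIOCGPGRP */
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: make `owner` receive SIGIO/SIGURG for fd
 * (a negative value names a process group, as with F_SETOWN), then
 * read the owner back.  Returns what SIOCGPGRP reports, or -1 on error
 * (ambiguous with process group 1; acceptable for a sketch). */
static pid_t set_and_get_owner(int fd, pid_t owner)
{
    pid_t got = 0;

    if (ioctl(fd, SIOCSPGRP, &owner) == -1 ||
        ioctl(fd, SIOCGPGRP, &got) == -1)
        return -1;
    return got;
}
```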
.PP
The following
.BR ioctl (2)
operations are synonyms:
.TP
.B FIOGETOWN
The same as the
.B SIOCGPGRP
.BR ioctl (2).
.TP
.B FIOSETOWN
The same as the
.B SIOCSPGRP
.BR ioctl (2).
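.PP
In the Linux socket code,
.B FIOSETOWN
and
.B FIOGETOWN
are handled by the same
.BR ioctl (2)
path as
.B SIOCSPGRP
and
.BR SIOCGPGRP .
They update the same owner that the
.B F_GETOWN
operation of
.BR fcntl (2)
reports.
The following sketch checks that these views agree.
The helper name
.I fio_owner_roundtrip
is hypothetical.
```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/sockios.h>    /* FIOSETOWN, FIOGETOWN */
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: set the owner with the FIOSETOWN ioctl and read
 * it back both with FIOGETOWN and with fcntl(F_GETOWN).  Returns the
 * owner if all three agree, or -1 otherwise. */
static pid_t fio_owner_roundtrip(int fd, pid_t owner)
{
    pid_t via_ioctl = 0;

    if (ioctl(fd, FIOSETOWN, &owner) == -1 ||
        ioctl(fd, FIOGETOWN, &via_ioctl) == -1)
        return -1;
    if (via_ioctl != owner || fcntl(fd, F_GETOWN) != owner)
        return -1;
    return owner;
}
```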
.SH VERSIONS
.B SO_BINDTODEVICE
was introduced in Linux 2.0.30.
.B SO_PASSCRED
was introduced in Linux 2.2.
The
.I /proc
interfaces were introduced in Linux 2.2.
.B SO_RCVTIMEO
and
.B SO_SNDTIMEO
are supported since Linux 2.3.41.
Earlier, timeouts were fixed to
a protocol-specific setting and could not be read or written.
.SH NOTES
Linux assumes that half of the send/receive buffer is used for internal
kernel structures; thus the values in the corresponding
.I /proc
files are twice what can be observed on the wire.
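.PP
A related, directly observable effect is that Linux doubles a
.B SO_RCVBUF
(or
.BR SO_SNDBUF )
request to leave room for this overhead.
As a result,
.BR getsockopt (2)
typically reports twice the requested size for moderate requests that do not exceed
.IR rmem_max .
The following minimal sketch uses a hypothetical helper,
.IR rcvbuf_after_set .
```c
#include <sys/socket.h>

/* Hypothetical helper: ask for a receive buffer of `req` bytes and
 * return what getsockopt() reports afterwards, or -1 on error.
 * On Linux the reported value is normally 2 * req, provided req
 * does not exceed rmem_max. */
static int rcvbuf_after_set(int fd, int req)
{
    int got;
    socklen_t len = sizeof(got);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &req, sizeof(req)) == -1 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) == -1)
        return -1;
    return got;
}
```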
.PP
Linux allows port reuse with the
.B SO_REUSEADDR
option only when this option was set both in the previous program that
performed a
.BR bind (2)
to the port and in the program that wants to reuse the port.
This differs from some implementations (e.g., FreeBSD)
where only the later program needs to set the
.B SO_REUSEADDR
option.
Typically this difference is invisible, since, for example, a server
program is designed to always set this option.
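.PP
A typical server setup following this convention sets
.B SO_REUSEADDR
before calling
.BR bind (2).
The following sketch assumes an IPv4 listener on the loopback address.
The helper name
.I listen_reuseaddr
is hypothetical.
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: a typical server setup that sets SO_REUSEADDR
 * before bind(2).  On Linux, a restarted server may rebind a port that
 * is still held by connections of a previous instance (for example,
 * in TIME_WAIT) only if both instances set the option.
 * Returns a listening socket on 127.0.0.1:port, or -1. */
static int listen_reuseaddr(uint16_t port)
{
    int on = 1;
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1 ||
        bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        listen(fd, 16) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```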
.\" .SH AUTHORS
.\" This man page was written by Andi Kleen.
.SH SEE ALSO
.BR wireshark (1),
.BR bpf (2),
.BR connect (2),
.BR getsockopt (2),
.BR setsockopt (2),
.BR socket (2),
.BR pcap (3),
.BR capabilities (7),
.BR ddp (7),
.BR ip (7),
.BR packet (7),
.BR tcp (7),
.BR udp (7),
.BR unix (7),
.BR tcpdump (8)