socket.7: Add description of SO_SELECT_ERR_QUEUE
1 '\" t
2 .\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
3 .\" and copyright (c) 1999 Matthew Wilcox.
4 .\"
5 .\" %%%LICENSE_START(VERBATIM_ONE_PARA)
6 .\" Permission is granted to distribute possibly modified copies
7 .\" of this page provided the header is included verbatim,
8 .\" and in case of nontrivial modification author and date
9 .\" of the modification is added to the header.
10 .\" %%%LICENSE_END
11 .\"
12 .\" 2002-10-30, Michael Kerrisk, <mtk.manpages@gmail.com>
13 .\" Added description of SO_ACCEPTCONN
14 .\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
15 .\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
16 .\" Added notes on capability requirements
17 .\" A few small grammar fixes
18 .\" 2010-06-13 Jan Engelhardt <jengelh@medozas.de>
19 .\" Documented SO_DOMAIN and SO_PROTOCOL.
20 .\"
21 .\" FIXME
22 .\" The following are not yet documented:
23 .\"
24 .\" SO_PEERNAME (2.4?)
25 .\" get only
26 .\" Seems to do something similar to getpeername(), but then
27 .\" why is it necessary / how does it differ?
28 .\"
29 .\" SO_TIMESTAMPNS (2.6.22)
30 .\" Documentation/networking/timestamping.txt
31 .\" commit 92f37fd2ee805aa77925c1e64fd56088b46094fc
32 .\" Author: Eric Dumazet <dada1@cosmosbay.com>
33 .\"
34 .\" SO_TIMESTAMPING (2.6.30)
35 .\" Documentation/networking/timestamping.txt
36 .\" commit cb9eff097831007afb30d64373f29d99825d0068
37 .\" Author: Patrick Ohly <patrick.ohly@intel.com>
38 .\"
39 .\" SO_WIFI_STATUS (3.3)
40 .\" commit 6e3e939f3b1bf8534b32ad09ff199d88800835a0
41 .\" Author: Johannes Berg <johannes.berg@intel.com>
42 .\" Also: SCM_WIFI_STATUS
43 .\"
44 .\" SO_NOFCS (3.4)
45 .\" commit 3bdc0eba0b8b47797f4a76e377dd8360f317450f
46 .\" Author: Ben Greear <greearb@candelatech.com>
47 .\"
48 .\" SO_GET_FILTER (3.8)
49 .\" commit a8fc92778080c845eaadc369a0ecf5699a03bef0
50 .\" Author: Pavel Emelyanov <xemul@parallels.com>
51 .\"
52 .\" SO_MAX_PACING_RATE (3.13)
53 .\" commit 62748f32d501f5d3712a7c372bbb92abc7c62bc7
54 .\" Author: Eric Dumazet <edumazet@google.com>
55 .\"
56 .\" SO_BPF_EXTENSIONS (3.14)
57 .\" commit ea02f9411d9faa3553ed09ce0ec9f00ceae9885e
58 .\" Author: Michal Sekletar <msekleta@redhat.com>
59 .\"
60 .TH SOCKET 7 2019-08-02 Linux "Linux Programmer's Manual"
61 .SH NAME
62 socket \- Linux socket interface
63 .SH SYNOPSIS
64 .B #include <sys/socket.h>
65 .PP
66 .IB sockfd " = socket(int " socket_family ", int " socket_type ", int " protocol );
67 .SH DESCRIPTION
68 This manual page describes the Linux networking socket layer user
69 interface.
70 The BSD compatible sockets
71 are the uniform interface
72 between the user process and the network protocol stacks in the kernel.
73 The protocol modules are grouped into
74 .I protocol families
75 such as
76 .BR AF_INET ", " AF_IPX ", and " AF_PACKET ,
77 and
78 .I socket types
79 such as
80 .B SOCK_STREAM
81 or
82 .BR SOCK_DGRAM .
83 See
84 .BR socket (2)
85 for more information on families and types.
86 .SS Socket-layer functions
87 These functions are used by the user process to send or receive packets
88 and to do other socket operations.
89 For more information see their respective manual pages.
90 .PP
91 .BR socket (2)
92 creates a socket,
93 .BR connect (2)
94 connects a socket to a remote socket address,
95 the
96 .BR bind (2)
97 function binds a socket to a local socket address,
98 .BR listen (2)
99 tells the socket that new connections shall be accepted, and
100 .BR accept (2)
101 is used to get a new socket with a new incoming connection.
102 .BR socketpair (2)
103 returns two connected anonymous sockets (implemented only for a few
104 local families like
105 .BR AF_UNIX ).
106 .PP
107 .BR send (2),
108 .BR sendto (2),
109 and
110 .BR sendmsg (2)
111 send data over a socket, and
112 .BR recv (2),
113 .BR recvfrom (2),
114 .BR recvmsg (2)
115 receive data from a socket.
116 .BR poll (2)
117 and
118 .BR select (2)
119 wait for arriving data or a readiness to send data.
120 In addition, the standard I/O operations like
121 .BR write (2),
122 .BR writev (2),
123 .BR sendfile (2),
124 .BR read (2),
125 and
126 .BR readv (2)
127 can be used to read and write data.
128 .PP
129 .BR getsockname (2)
130 returns the local socket address and
131 .BR getpeername (2)
132 returns the remote socket address.
133 .BR getsockopt (2)
134 and
135 .BR setsockopt (2)
136 are used to set or get socket layer or protocol options.
137 .BR ioctl (2)
138 can be used to set or read some other options.
139 .PP
140 .BR close (2)
141 is used to close a socket.
142 .BR shutdown (2)
143 closes parts of a full-duplex socket connection.
144 .PP
145 Seeking, or calling
146 .BR pread (2)
147 or
148 .BR pwrite (2)
149 with a nonzero position is not supported on sockets.
150 .PP
151 It is possible to do nonblocking I/O on sockets by setting the
152 .B O_NONBLOCK
153 flag on a socket file descriptor using
154 .BR fcntl (2).
155 Then all operations that would block will (usually)
156 return with
157 .B EAGAIN
158 (operation should be retried later);
159 .BR connect (2)
160 will return an
161 .B EINPROGRESS
162 error.
163 The user can then wait for various events via
164 .BR poll (2)
165 or
166 .BR select (2).
167 .TS
168 tab(:) allbox;
169 c s s
170 l l l.
171 I/O events
172 Event:Poll flag:Occurrence
173 Read:POLLIN:T{
174 New data arrived.
175 T}
176 Read:POLLIN:T{
177 A connection setup has been completed
178 (for connection-oriented sockets)
179 T}
180 Read:POLLHUP:T{
181 A disconnection request has been initiated by the other end.
182 T}
183 Read:POLLHUP:T{
184 A connection is broken (only for connection-oriented protocols).
185 When the socket is written to,
186 .B SIGPIPE
187 is also sent.
188 T}
189 Write:POLLOUT:T{
190 Socket has enough send buffer space for writing new data.
191 T}
192 Read/Write:T{
193 POLLIN |
194 .br
195 POLLOUT
196 T}:T{
197 An outgoing
198 .BR connect (2)
199 finished.
200 T}
201 Read/Write:POLLERR:An asynchronous error occurred.
202 Read/Write:POLLHUP:The other end has shut down one direction.
203 Exception:POLLPRI:T{
204 Urgent data arrived.
205 .B SIGURG
206 is sent then.
207 T}
208 .\" FIXME . The following is not true currently:
209 .\" It is no I/O event when the connection
210 .\" is broken from the local end using
211 .\" .BR shutdown (2)
212 .\" or
213 .\" .BR close (2).
214 .TE
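.PP
The nonblocking pattern and the events in the table above
can be sketched as follows.
This is only an illustration under stated assumptions:
the helper names set_nonblocking() and wait_readable() are hypothetical
and are not part of the sockets API.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: add O_NONBLOCK to the file status flags of fd.
   Returns 0 on success, -1 on error (errno is set by fcntl). */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/* Hypothetical helper: wait up to timeout_ms for fd to become readable.
   Returns 1 if POLLIN was reported, 0 on timeout or if only other
   events (such as POLLHUP or POLLERR) were reported, -1 on error. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int n;

    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n == -1 && errno == EINTR);   /* retry if interrupted */

    if (n == -1)
        return -1;
    return (n == 1 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

.PP
A caller would typically attempt the I/O first and call wait_readable()
only after the operation has failed with EAGAIN.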
215 .PP
216 An alternative to
217 .BR poll (2)
218 and
219 .BR select (2)
220 is to let the kernel inform the application about events
221 via a
222 .B SIGIO
223 signal.
224 For that the
225 .B O_ASYNC
226 flag must be set on a socket file descriptor via
227 .BR fcntl (2)
228 and a valid signal handler for
229 .B SIGIO
230 must be installed via
231 .BR sigaction (2).
232 See the
233 .I Signals
234 discussion below.
235 .SS Socket address structures
236 Each socket domain has its own format for socket addresses,
237 with a domain-specific address structure.
238 Each of these structures begins with an
239 integer "family" field (typed as
240 .IR sa_family_t )
241 that indicates the type of the address structure.
242 This allows
243 the various system calls (e.g.,
244 .BR connect (2),
245 .BR bind (2),
246 .BR accept (2),
247 .BR getsockname (2),
248 .BR getpeername (2)),
249 which are generic to all socket domains,
250 to determine the domain of a particular socket address.
251 .PP
252 To allow any type of socket address to be passed to
253 interfaces in the sockets API,
254 the type
255 .IR "struct sockaddr"
256 is defined.
257 The purpose of this type is purely to allow casting of
258 domain-specific socket address types to a "generic" type,
259 so as to avoid compiler warnings about type mismatches in
260 calls to the sockets API.
261 .PP
262 In addition, the sockets API provides the data type
263 .IR "struct sockaddr_storage".
264 This type
265 is suitable to accommodate all supported domain-specific socket
266 address structures; it is large enough and is aligned properly.
267 (In particular, it is large enough to hold
268 IPv6 socket addresses.)
269 The structure includes the following field, which can be used to identify
270 the type of socket address actually stored in the structure:
271 .PP
272 .in +4n
273 .EX
274 sa_family_t ss_family;
275 .EE
276 .in
277 .PP
278 The
279 .I sockaddr_storage
280 structure is useful in programs that must handle socket addresses
281 in a generic way
282 (e.g., programs that must deal with both IPv4 and IPv6 socket addresses).
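.PP
As a small illustration of handling addresses generically,
the following sketch dispatches on the ss_family field.
The helper name sockaddr_port() is hypothetical and chosen only for this example.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: return the port (in host byte order) stored in a
   generic socket address, or -1 if the address family is neither
   AF_INET nor AF_INET6. */
int sockaddr_port(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *) ss)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *) ss)->sin6_port);
    default:
        return -1;
    }
}
```

.PP
The same storage object can be passed to calls such as accept(2) or
recvfrom(2) after casting its address to a pointer to struct sockaddr.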
283 .SS Socket options
284 The socket options listed below can be set by using
285 .BR setsockopt (2)
286 and read with
287 .BR getsockopt (2)
288 with the socket level set to
289 .B SOL_SOCKET
290 for all sockets.
291 Unless otherwise noted,
292 .I optval
293 is a pointer to an
294 .IR int .
295 .\" FIXME .
296 .\" In the list below, the text used to describe argument types
297 .\" for each socket option should be more consistent
298 .\"
299 .\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
300 .\" W R Stevens, UNPv1
301 .TP
302 .B SO_ACCEPTCONN
303 Returns a value indicating whether or not this socket has been marked
304 to accept connections with
305 .BR listen (2).
306 The value 0 indicates that this is not a listening socket;
307 the value 1 indicates that this is a listening socket.
308 This socket option is read-only.
309 .TP
310 .BR SO_ATTACH_FILTER " (since Linux 2.2), " SO_ATTACH_BPF " (since Linux 3.19)"
311 Attach a classic BPF
312 .RB ( SO_ATTACH_FILTER )
313 or an extended BPF
314 .RB ( SO_ATTACH_BPF )
315 program to the socket for use as a filter of incoming packets.
316 A packet will be dropped if the filter program returns zero.
317 If the filter program returns a
318 nonzero value which is less than the packet's data length,
319 the packet will be truncated to the length returned.
320 If the value returned by the filter is greater than or equal to the
321 packet's data length, the packet is allowed to proceed unmodified.
322 .IP
323 The argument for
324 .BR SO_ATTACH_FILTER
325 is a
326 .I sock_fprog
327 structure, defined in
328 .IR <linux/filter.h> :
329 .IP
330 .in +4n
331 .EX
332 struct sock_fprog {
333 unsigned short len;
334 struct sock_filter *filter;
335 };
336 .EE
337 .in
338 .IP
339 The argument for
340 .BR SO_ATTACH_BPF
341 is a file descriptor returned by the
342 .BR bpf (2)
343 system call and must refer to a program of type
344 .BR BPF_PROG_TYPE_SOCKET_FILTER .
345 .IP
346 These options may be set multiple times for a given socket,
347 each time replacing the previous filter program.
348 The classic and extended versions may be called on the same socket,
349 but the previous filter will always be replaced such that a socket
350 never has more than one filter defined.
351 .IP
352 Both classic and extended BPF are explained in the kernel source file
353 .I Documentation/networking/filter.txt
354 .TP
355 .BR SO_ATTACH_REUSEPORT_CBPF ", " SO_ATTACH_REUSEPORT_EBPF
356 For use with the
357 .BR SO_REUSEPORT
358 option, these options allow the user to set a classic BPF
359 .RB ( SO_ATTACH_REUSEPORT_CBPF )
360 or an extended BPF
361 .RB ( SO_ATTACH_REUSEPORT_EBPF )
362 program which defines how packets are assigned to
363 the sockets in the reuseport group (that is, all sockets which have
364 .BR SO_REUSEPORT
365 set and are using the same local address to receive packets).
366 .IP
367 The BPF program must return an index between 0 and N\-1 representing
368 the socket which should receive the packet
369 (where N is the number of sockets in the group).
370 If the BPF program returns an invalid index,
371 socket selection will fall back to the plain
372 .BR SO_REUSEPORT
373 mechanism.
374 .IP
375 Sockets are numbered in the order in which they are added to the group
376 (that is, the order of
377 .BR bind (2)
378 calls for UDP sockets or the order of
379 .BR listen (2)
380 calls for TCP sockets).
381 New sockets added to a reuseport group will inherit the BPF program.
382 When a socket is removed from a reuseport group (via
383 .BR close (2)),
384 the last socket in the group will be moved into the closed socket's
385 position.
386 .IP
387 These options may be set repeatedly at any time on any socket in the group
388 to replace the current BPF program used by all sockets in the group.
389 .IP
390 .BR SO_ATTACH_REUSEPORT_CBPF
391 takes the same argument type as
392 .BR SO_ATTACH_FILTER
393 and
394 .BR SO_ATTACH_REUSEPORT_EBPF
395 takes the same argument type as
396 .BR SO_ATTACH_BPF .
397 .IP
398 UDP support for this feature is available since Linux 4.5;
399 TCP support is available since Linux 4.6.
400 .TP
401 .B SO_BINDTODEVICE
402 Bind this socket to a particular device like \(lqeth0\(rq,
403 as specified in the passed interface name.
404 If the
405 name is an empty string or the option length is zero, the socket device
406 binding is removed.
407 The passed option is a variable-length null-terminated
408 interface name string with the maximum size of
409 .BR IFNAMSIZ .
410 If a socket is bound to an interface,
411 only packets received from that particular interface are processed by the
412 socket.
413 Note that this works only for some socket types, particularly
414 .B AF_INET
415 sockets.
416 It is not supported for packet sockets (use normal
417 .BR bind (2)
418 there).
419 .IP
420 Before Linux 3.8,
421 this socket option could be set, but could not be retrieved with
422 .BR getsockopt (2).
423 Since Linux 3.8, it is readable.
424 The
425 .I optlen
426 argument should contain the buffer size available
427 to receive the device name and is recommended to be
428 .BR IFNAMSIZ
429 bytes.
430 The real device name length is reported back in the
431 .I optlen
432 argument.
433 .TP
434 .B SO_BROADCAST
435 Set or get the broadcast flag.
436 When enabled, datagram sockets are allowed to send
437 packets to a broadcast address.
438 This option has no effect on stream-oriented sockets.
439 .TP
440 .B SO_BSDCOMPAT
441 Enable BSD bug-to-bug compatibility.
442 This is used by the UDP protocol module in Linux 2.0 and 2.2.
443 If enabled, ICMP errors received for a UDP socket will not be passed
444 to the user program.
445 In later kernel versions, support for this option has been phased out:
446 Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
447 (printk()) if a program uses this option.
448 Linux 2.0 also enabled BSD bug-to-bug compatibility
449 options (random header changing, skipping of the broadcast flag) for raw
450 sockets with this option, but that was removed in Linux 2.2.
451 .TP
452 .B SO_DEBUG
453 Enable socket debugging.
454 Allowed only for processes with the
455 .B CAP_NET_ADMIN
456 capability or an effective user ID of 0.
457 .TP
458 .BR SO_DETACH_FILTER " (since Linux 2.2), " SO_DETACH_BPF " (since Linux 3.19)"
459 These two options, which are synonyms,
460 may be used to remove the classic or extended BPF
461 program attached to a socket with either
462 .BR SO_ATTACH_FILTER
463 or
464 .BR SO_ATTACH_BPF .
465 The option value is ignored.
466 .TP
467 .BR SO_DOMAIN " (since Linux 2.6.32)"
468 Retrieves the socket domain as an integer, returning a value such as
469 .BR AF_INET6 .
470 See
471 .BR socket (2)
472 for details.
473 This socket option is read-only.
474 .TP
475 .B SO_ERROR
476 Get and clear the pending socket error.
477 This socket option is read-only.
478 Expects an integer.
479 .TP
480 .B SO_DONTROUTE
481 Don't send via a gateway; send only to directly connected hosts.
482 The same effect can be achieved by setting the
483 .B MSG_DONTROUTE
484 flag on a socket
485 .BR send (2)
486 operation.
487 Expects an integer boolean flag.
488 .TP
489 .BR SO_INCOMING_CPU " (gettable since Linux 3.19, settable since Linux 4.4)"
490 .\" getsockopt 2c8c56e15df3d4c2af3d656e44feb18789f75837
491 .\" setsockopt 70da268b569d32a9fddeea85dc18043de9d89f89
492 Sets or gets the CPU affinity of a socket.
493 Expects an integer holding a CPU number.
494 .IP
495 .in +4n
496 .EX
497 int cpu = 1;
498 setsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, sizeof(cpu));
499 .EE
500 .in
501 .IP
502 Because all of the packets for a single stream
503 (i.e., all packets for the same 4-tuple)
504 arrive on the single RX queue that is associated with a particular CPU,
505 the typical use case is to employ one listening process per RX queue,
506 with the incoming flow being handled by a listener
507 on the same CPU that is handling the RX queue.
508 This provides optimal NUMA behavior and keeps CPU caches hot.
509 .\"
510 .\" From an email conversation with Eric Dumazet:
511 .\" >> Note that setting the option is not supported if SO_REUSEPORT is used.
512 .\" >
513 .\" > Please define "not supported". Does this yield an API diagnostic?
514 .\" > If so, what is it?
515 .\" >
516 .\" >> Socket will be selected from an array, either by a hash or BPF program
517 .\" >> that has no access to this information.
518 .\" >
519 .\" > Sorry -- I'm lost here. How does this comment relate to the proposed
520 .\" > man page text above?
521 .\"
522 .\" Simply that :
523 .\"
524 .\" If an application uses both SO_INCOMING_CPU and SO_REUSEPORT, then
525 .\" SO_REUSEPORT logic, selecting the socket to receive the packet, ignores
526 .\" SO_INCOMING_CPU setting.
527 .TP
528 .B SO_KEEPALIVE
529 Enable sending of keep-alive messages on connection-oriented sockets.
530 Expects an integer boolean flag.
531 .TP
532 .B SO_LINGER
533 Sets or gets the
534 .B SO_LINGER
535 option.
536 The argument is a
537 .I linger
538 structure.
539 .IP
540 .in +4n
541 .EX
542 struct linger {
543 int l_onoff; /* linger active */
544 int l_linger; /* how many seconds to linger for */
545 };
546 .EE
547 .in
548 .IP
549 When enabled, a
550 .BR close (2)
551 or
552 .BR shutdown (2)
553 will not return until all queued messages for the socket have been
554 successfully sent or the linger timeout has been reached.
555 Otherwise,
556 the call returns immediately and the closing is done in the background.
557 When the socket is closed as part of
558 .BR exit (2),
559 it always lingers in the background.
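.IP
A minimal sketch of enabling the option follows.
The helper name enable_linger() is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: make close(2) or shutdown(2) on fd wait for
   queued data to be sent, for at most `seconds` seconds.
   Returns 0 on success, -1 on error (errno is set by setsockopt). */
int enable_linger(int fd, int seconds)
{
    struct linger lg = {
        .l_onoff = 1,          /* linger active */
        .l_linger = seconds,   /* how many seconds to linger for */
    };

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

.IP
Passing a structure with l_onoff set to 0 in the same way
would disable lingering again.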
560 .TP
561 .B SO_LOCK_FILTER
562 .\" commit d59577b6ffd313d0ab3be39cb1ab47e29bdc9182
563 When set, this option will prevent
564 changing the filters associated with the socket.
565 These filters include any set using the socket options
566 .BR SO_ATTACH_FILTER ,
567 .BR SO_ATTACH_BPF ,
568 .BR SO_ATTACH_REUSEPORT_CBPF ,
569 and
570 .BR SO_ATTACH_REUSEPORT_EBPF .
571 .IP
572 The typical use case is for a privileged process to set up a raw socket
573 (an operation that requires the
574 .BR CAP_NET_RAW
575 capability), apply a restrictive filter, set the
576 .BR SO_LOCK_FILTER
577 option,
578 and then either drop its privileges or pass the socket file descriptor
579 to an unprivileged process via a UNIX domain socket.
580 .IP
581 Once the
582 .BR SO_LOCK_FILTER
583 option has been enabled, attempts to change or remove the filter
584 attached to a socket, or to disable the
585 .BR SO_LOCK_FILTER
586 option will fail with the error
587 .BR EPERM .
588 .TP
589 .BR SO_MARK " (since Linux 2.6.25)"
590 .\" commit 4a19ec5800fc3bb64e2d87c4d9fdd9e636086fe0
591 .\" and 914a9ab386a288d0f22252fc268ecbc048cdcbd5
592 Set the mark for each packet sent through this socket
593 (similar to the netfilter MARK target but socket-based).
594 Changing the mark can be used for mark-based
595 routing without netfilter or for packet filtering.
596 Setting this option requires the
597 .B CAP_NET_ADMIN
598 capability.
599 .TP
600 .B SO_OOBINLINE
601 If this option is enabled,
602 out-of-band data is directly placed into the receive data stream.
603 Otherwise, out-of-band data is passed only when the
604 .B MSG_OOB
605 flag is set during receiving.
606 .\" don't document it because it can do too much harm.
607 .\".B SO_NO_CHECK
608 .\" The kernel has support for the SO_NO_CHECK socket
609 .\" option (boolean: 0 == default, calculate checksum on xmit,
610 .\" 1 == do not calculate checksum on xmit).
611 .\" Additional note from Andi Kleen on SO_NO_CHECK (2010-08-30)
612 .\" On Linux UDP checksums are essentially free and there's no reason
613 .\" to turn them off and it would disable another safety line.
614 .\" That is why I didn't document the option.
615 .TP
616 .B SO_PASSCRED
617 Enable or disable the receiving of the
618 .B SCM_CREDENTIALS
619 control message.
620 For more information see
621 .BR unix (7).
622 .TP
623 .B SO_PASSSEC
624 Enable or disable the receiving of the
625 .B SCM_SECURITY
626 control message.
627 For more information see
628 .BR unix (7).
629 .TP
630 .BR SO_PEEK_OFF " (since Linux 3.4)"
631 .\" commit ef64a54f6e558155b4f149bb10666b9e914b6c54
632 This option, which is currently supported only for
633 .BR unix (7)
634 sockets, sets the value of the "peek offset" for the
635 .BR recv (2)
636 system call when used with the
637 .BR MSG_PEEK
638 flag.
639 .IP
640 When this option is set to a negative value
641 (it is set to \-1 for all new sockets),
642 traditional behavior is provided:
643 .BR recv (2)
644 with the
645 .BR MSG_PEEK
646 flag will peek data from the front of the queue.
647 .IP
648 When the option is set to a value greater than or equal to zero,
649 then the next peek at data queued in the socket will occur at
650 the byte offset specified by the option value.
651 At the same time, the "peek offset" will be
652 incremented by the number of bytes that were peeked from the queue,
653 so that a subsequent peek will return the next data in the queue.
654 .IP
655 If data is removed from the front of the queue via a call to
656 .BR recv (2)
657 (or similar) without the
658 .BR MSG_PEEK
659 flag, the "peek offset" will be decreased by the number of bytes removed.
660 In other words, receiving data without the
661 .B MSG_PEEK
662 flag will cause the "peek offset" to be adjusted to maintain
663 the correct relative position in the queued data,
664 so that a subsequent peek will retrieve the data that would have been
665 retrieved had the data not been removed.
666 .IP
667 For datagram sockets, if the "peek offset" points to the middle of a packet,
668 the data returned will be marked with the
669 .BR MSG_TRUNC
670 flag.
671 .IP
672 The following example serves to illustrate the use of
673 .BR SO_PEEK_OFF .
674 Suppose a stream socket has the following queued input data:
675 .IP
676 aabbccddeeff
677 .IP
678 The following sequence of
679 .BR recv (2)
680 calls would have the effect noted in the comments:
681 .IP
682 .in +4n
683 .EX
684 int ov = 4; // Set peek offset to 4
685 setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &ov, sizeof(ov));
686
687 recv(fd, buf, 2, MSG_PEEK); // Peeks "cc"; offset set to 6
688 recv(fd, buf, 2, MSG_PEEK); // Peeks "dd"; offset set to 8
689 recv(fd, buf, 2, 0); // Reads "aa"; offset set to 6
690 recv(fd, buf, 2, MSG_PEEK); // Peeks "ee"; offset set to 8
691 .EE
692 .in
693 .TP
694 .B SO_PEERCRED
695 Return the credentials of the peer process connected to this socket.
696 For further details, see
697 .BR unix (7).
698 .TP
699 .B SO_PRIORITY
700 Set the protocol-defined priority for all packets to be sent on
701 this socket.
702 Linux uses this value to order the networking queues:
703 packets with a higher priority may be processed first depending
704 on the selected device queueing discipline.
705 .\" For
706 .\" .BR ip (7),
707 .\" this also sets the IP type-of-service (TOS) field for outgoing packets.
708 Setting a priority outside the range 0 to 6 requires the
709 .B CAP_NET_ADMIN
710 capability.
711 .TP
712 .BR SO_PROTOCOL " (since Linux 2.6.32)"
713 Retrieves the socket protocol as an integer, returning a value such as
714 .BR IPPROTO_SCTP .
715 See
716 .BR socket (2)
717 for details.
718 This socket option is read-only.
719 .TP
720 .B SO_RCVBUF
721 Sets or gets the maximum socket receive buffer in bytes.
722 The kernel doubles this value (to allow space for bookkeeping overhead)
723 when it is set using
724 .\" Most (all?) other implementations do not do this -- MTK, Dec 05
725 .BR setsockopt (2),
726 and this doubled value is returned by
727 .BR getsockopt (2).
728 .\" The following thread on LKML is quite informative:
729 .\" getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behavior
730 .\" 17 July 2012
731 .\" http://thread.gmane.org/gmane.linux.kernel/1328935
732 The default value is set by the
733 .I /proc/sys/net/core/rmem_default
734 file, and the maximum allowed value is set by the
735 .I /proc/sys/net/core/rmem_max
736 file.
737 The minimum (doubled) value for this option is 256.
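.IP
The doubling can be observed by reading the value back after setting it.
The following is a sketch under the assumption that the requested size
is within the rmem_max limit;
the helper name set_rcvbuf() is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of `bytes` bytes and
   return the size that the kernel reports afterwards, or -1 on error.
   On Linux the reported value is expected to be twice the request,
   provided the request is within the rmem_max limit. */
int set_rcvbuf(int fd, int bytes)
{
    int actual = 0;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;
}
```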
738 .TP
739 .BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
740 Using this socket option, a privileged
741 .RB ( CAP_NET_ADMIN )
742 process can perform the same task as
743 .BR SO_RCVBUF ,
744 but the
745 .I rmem_max
746 limit can be overridden.
747 .TP
748 .BR SO_RCVLOWAT " and " SO_SNDLOWAT
749 Specify the minimum number of bytes in the buffer until the socket layer
750 will pass the data to the protocol
751 .RB ( SO_SNDLOWAT )
752 or the user on receiving
753 .RB ( SO_RCVLOWAT ).
754 These two values are initialized to 1.
755 .B SO_SNDLOWAT
756 is not changeable on Linux
757 .RB ( setsockopt (2)
758 fails with the error
759 .BR ENOPROTOOPT ).
760 .B SO_RCVLOWAT
761 is changeable
762 only since Linux 2.4.
763 .IP
764 Before Linux 2.6.28
765 .\" commit c7004482e8dcb7c3c72666395cfa98a216a4fb70
766 .BR select (2),
767 .BR poll (2),
768 and
769 .BR epoll (7)
770 did not respect the
771 .B SO_RCVLOWAT
772 setting on Linux,
773 and indicated a socket as readable when even a single byte of data
774 was available.
775 A subsequent read from the socket would then block until
776 .B SO_RCVLOWAT
777 bytes are available.
778 .\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=111049368106984&w=2
779 .\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
780 .TP
781 .BR SO_RCVTIMEO " and " SO_SNDTIMEO
782 .\" Not implemented in 2.0.
783 .\" Implemented in 2.1.11 for getsockopt: always return a zero struct.
784 .\" Implemented in 2.3.41 for setsockopt, and actually used.
785 Specify the receiving or sending timeouts until reporting an error.
786 The argument is a
787 .IR "struct timeval" .
788 If an input or output function blocks for this period of time, and
789 data has been sent or received, the return value of that function
790 will be the amount of data transferred; if no data has been transferred
791 and the timeout has been reached, then \-1 is returned with
792 .I errno
793 set to
794 .BR EAGAIN
795 or
796 .BR EWOULDBLOCK ,
797 .\" in fact to EAGAIN
798 or
799 .B EINPROGRESS
800 (for
801 .BR connect (2))
802 just as if the socket was specified to be nonblocking.
803 If the timeout is set to zero (the default),
804 then the operation will never time out.
805 Timeouts only have effect for system calls that perform socket I/O (e.g.,
806 .BR read (2),
807 .BR recvmsg (2),
808 .BR send (2),
809 .BR sendmsg (2));
810 timeouts have no effect for
811 .BR select (2),
812 .BR poll (2),
813 .BR epoll_wait (2),
814 and so on.
815 .TP
816 .B SO_REUSEADDR
817 .\" commit c617f398edd4db2b8567a28e899a88f8f574798d
818 .\" https://lwn.net/Articles/542629/
819 Indicates that the rules used in validating addresses supplied in a
820 .BR bind (2)
821 call should allow reuse of local addresses.
822 For
823 .B AF_INET
824 sockets this
825 means that a socket may bind, except when there
826 is an active listening socket bound to the address.
827 When the listening socket is bound to
828 .B INADDR_ANY
829 with a specific port then it is not possible
830 to bind to this port for any local address.
831 Argument is an integer boolean flag.
832 .TP
833 .BR SO_REUSEPORT " (since Linux 3.9)"
834 Permits multiple
835 .B AF_INET
836 or
837 .B AF_INET6
838 sockets to be bound to an identical socket address.
839 This option must be set on each socket (including the first socket)
840 prior to calling
841 .BR bind (2)
842 on the socket.
843 To prevent port hijacking,
844 all of the processes binding to the same address must have the same
845 effective UID.
846 This option can be employed with both TCP and UDP sockets.
847 .IP
848 For TCP sockets, this option allows
849 .BR accept (2)
850 load distribution in a multi-threaded server to be improved by
851 using a distinct listener socket for each thread.
852 This provides improved load distribution as compared
853 to traditional techniques such as using a single
854 .BR accept (2)ing
855 thread that distributes connections,
856 or having multiple threads that compete to
857 .BR accept (2)
858 from the same socket.
859 .IP
860 For UDP sockets,
861 the use of this option can provide better distribution
862 of incoming datagrams to multiple processes (or threads) as compared
863 to the traditional technique of having multiple processes
864 compete to receive datagrams on the same socket.
865 .TP
866 .BR SO_RXQ_OVFL " (since Linux 2.6.33)"
867 .\" commit 3b885787ea4112eaa80945999ea0901bf742707f
868 Indicates that an unsigned 32-bit value ancillary message (cmsg)
869 should be attached to received skbs indicating
870 the number of packets dropped by the socket since its creation.
871 .TP
872 .BR SO_SELECT_ERR_QUEUE " (since Linux 3.10)"
873 .\" commit 7d4c04fc170087119727119074e72445f2bb192b
874 .\" Author: Keller, Jacob E <jacob.e.keller@intel.com>
875 When this option is set, poll also reports
876 .B POLLPRI
877 whenever a
878 .B POLLERR
879 event is returned. It does not affect wake-up.
880 .IP
881 Background: The flag was added when waking up on
882 .B POLLERR
883 required requesting
884 .B POLLIN
885 or
886 .BR POLLPRI .
887 After the commit 6e5d58fdc9bedd0255a8 ("skbuff: Fix not
888 waking applications when errors are enqueued"), introduced
889 in Linux 4.16, waking up on
890 .B POLLERR
891 does not require requesting other events. The flag is kept
892 only for backwards compatibility.
893 .TP
894 .B SO_SNDBUF
895 Sets or gets the maximum socket send buffer in bytes.
896 The kernel doubles this value (to allow space for bookkeeping overhead)
897 when it is set using
898 .\" Most (all?) other implementations do not do this -- MTK, Dec 05
899 .\" See also the comment to SO_RCVBUF (17 Jul 2012 LKML mail)
900 .BR setsockopt (2),
901 and this doubled value is returned by
902 .BR getsockopt (2).
903 The default value is set by the
904 .I /proc/sys/net/core/wmem_default
905 file and the maximum allowed value is set by the
906 .I /proc/sys/net/core/wmem_max
907 file.
908 The minimum (doubled) value for this option is 2048.
909 .TP
910 .BR SO_SNDBUFFORCE " (since Linux 2.6.14)"
911 Using this socket option, a privileged
912 .RB ( CAP_NET_ADMIN )
913 process can perform the same task as
914 .BR SO_SNDBUF ,
915 but the
916 .I wmem_max
917 limit can be overridden.
918 .TP
919 .B SO_TIMESTAMP
920 Enable or disable the receiving of the
921 .B SO_TIMESTAMP
922 control message.
923 The timestamp control message is sent with level
924 .B SOL_SOCKET
925 and the
926 .I cmsg_data
927 field is a
928 .I "struct timeval"
929 indicating the
930 reception time of the last packet passed to the user in this call.
931 See
932 .BR cmsg (3)
933 for details on control messages.
934 .TP
935 .B SO_TYPE
936 Gets the socket type as an integer (e.g.,
937 .BR SOCK_STREAM ).
938 This socket option is read-only.
939 .TP
940 .BR SO_BUSY_POLL " (since Linux 3.11)"
941 Sets the approximate time in microseconds to busy poll on a blocking receive
942 when there is no data.
943 Increasing this value requires
944 .BR CAP_NET_ADMIN .
945 The default for this option is controlled by the
946 .I /proc/sys/net/core/busy_read
947 file.
948 .IP
949 The value in the
950 .I /proc/sys/net/core/busy_poll
951 file determines how long
952 .BR select (2)
953 and
954 .BR poll (2)
955 will busy poll when they operate on sockets with
956 .BR SO_BUSY_POLL
957 set and no events to report are found.
958 .IP
959 In both cases,
960 busy polling will only be done when the socket last received data
961 from a network device that supports this option.
962 .IP
963 While busy polling may improve latency of some applications,
964 care must be taken when using it since this will increase
965 both CPU utilization and power usage.
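.PP
The doubling of buffer sizes described under
.B SO_SNDBUF
can be observed by setting a value and reading it back.
The following is a minimal sketch, not a definitive implementation;
the helper name
.I sndbuf_roundtrip
is hypothetical, and a UNIX domain datagram socket is used only so that
the example does not depend on a configured network.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: request a send buffer size and return the value
 * the kernel reports afterwards, or -1 on error. On Linux the reported
 * value is normally twice the requested one, subject to the wmem_max
 * limit and the kernel's minimum. */
int sndbuf_roundtrip(int requested)
{
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    int actual = -1;
    socklen_t len = sizeof(actual);

    if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF,
                   &requested, sizeof(requested)) == -1 ||
        getsockopt(fd, SOL_SOCKET, SO_SNDBUF, &actual, &len) == -1)
        actual = -1;

    close(fd);
    return actual;
}
```

.PP
If the requested size exceeds
.IR wmem_max ,
the kernel clamps it before doubling, so the value read back may be
smaller than twice the request.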
966 .SS Signals
967 When writing onto a connection-oriented socket that has been shut down
968 (by the local or the remote end),
969 .B SIGPIPE
970 is sent to the writing process and
971 .B EPIPE
972 is returned.
973 The signal is not sent when the write call
974 specified the
975 .B MSG_NOSIGNAL
976 flag.
977 .PP
978 When requested with the
979 .B FIOSETOWN
980 .BR fcntl (2)
981 or
982 .B SIOCSPGRP
983 .BR ioctl (2),
984 .B SIGIO
985 is sent when an I/O event occurs.
986 It is possible to use
987 .BR poll (2)
988 or
989 .BR select (2)
990 in the signal handler to find out which socket the event occurred on.
991 An alternative (since Linux 2.2) is to set a real-time signal using the
992 .B F_SETSIG
993 .BR fcntl (2);
994 the handler of the real-time signal will be called with
995 the file descriptor in the
996 .I si_fd
997 field of its
998 .IR siginfo_t .
999 See
1000 .BR fcntl (2)
1001 for more information.
1002 .PP
1003 Under some circumstances (e.g., multiple processes accessing a
1004 single socket), the condition that caused the
1005 .B SIGIO
1006 may have already disappeared when the process reacts to the signal.
1007 If this happens, the process should wait again because Linux
1008 will resend the signal later.
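.PP
The interaction between
.B SIGPIPE
and
.B MSG_NOSIGNAL
can be sketched as follows.
This is an illustrative example under stated assumptions:
the helper name
.I send_to_closed_peer
is hypothetical, and a connected UNIX domain stream socket pair stands in
for any connection-oriented socket whose peer has gone away.

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: write to a stream socket whose peer has been
 * closed, suppressing SIGPIPE with MSG_NOSIGNAL. Returns the errno
 * value from send(2), 0 if the send unexpectedly succeeded, or -1 if
 * the socket pair could not be created. */
int send_to_closed_peer(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    close(sv[1]);                       /* the remote end goes away */

    /* Without MSG_NOSIGNAL this call would also raise SIGPIPE. */
    ssize_t n = send(sv[0], "x", 1, MSG_NOSIGNAL);
    int err = (n == -1) ? errno : 0;

    close(sv[0]);
    return err;
}
```

.PP
With the flag set, the caller sees only the
.B EPIPE
error and can handle it inline instead of installing a signal handler.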
1009 .\" .SS Ancillary messages
1010 .SS /proc interfaces
1011 The core socket networking parameters can be accessed
1012 via files in the directory
1013 .IR /proc/sys/net/core/ .
1014 .TP
1015 .I rmem_default
1016 contains the default setting in bytes of the socket receive buffer.
1017 .TP
1018 .I rmem_max
1019 contains the maximum socket receive buffer size in bytes which a user may
1020 set by using the
1021 .B SO_RCVBUF
1022 socket option.
1023 .TP
1024 .I wmem_default
1025 contains the default setting in bytes of the socket send buffer.
1026 .TP
1027 .I wmem_max
1028 contains the maximum socket send buffer size in bytes which a user may
1029 set by using the
1030 .B SO_SNDBUF
1031 socket option.
1032 .TP
1033 .IR message_cost " and " message_burst
1034 configure the token bucket filter used to rate-limit warning messages
1035 caused by external network events.
1036 .TP
1037 .I netdev_max_backlog
1038 Maximum number of packets in the global input queue.
1039 .TP
1040 .I optmem_max
1041 Maximum length of ancillary data and user control data like the iovecs
1042 per socket.
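.PP
Each of these files holds a single decimal number, so a program can read
one with ordinary stdio calls.
The sketch below assumes a hypothetical helper named
.IR read_net_core_param ;
it returns \-1 when the file is absent, which can happen, for example,
inside some containers.

```c
#include <stdio.h>

/* Hypothetical helper: read one integer parameter from
 * /proc/sys/net/core/. Returns the value, or -1 if the file cannot be
 * opened or does not start with a number. */
long read_net_core_param(const char *name)
{
    char path[128];
    long value = -1;

    snprintf(path, sizeof(path), "/proc/sys/net/core/%s", name);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    if (fscanf(fp, "%ld", &value) != 1)
        value = -1;

    fclose(fp);
    return value;
}
```

.PP
For example,
.I read_net_core_param("rmem_max")
would yield the limit that applies to
.BR SO_RCVBUF .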
1043 .\" netdev_fastroute is not documented because it is experimental
1044 .SS Ioctls
1045 These operations can be accessed using
1046 .BR ioctl (2):
1047 .PP
1048 .in +4n
1049 .EX
1050 .IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
1051 .EE
1052 .in
1053 .TP
1054 .B SIOCGSTAMP
1055 Return a
1056 .I struct timeval
1057 with the receive timestamp of the last packet passed to the user.
1058 This is useful for accurate round trip time measurements.
1059 See
1060 .BR setitimer (2)
1061 for a description of
1062 .IR "struct timeval" .
1063 .\"
1064 This ioctl should be used only if the socket option
1065 .B SO_TIMESTAMP
1066 is not set on the socket.
1067 Otherwise, it returns the timestamp of the
1068 last packet that was received while
1069 .B SO_TIMESTAMP
1070 was not set, or it fails if no such packet has been received
1071 (i.e.,
1072 .BR ioctl (2)
1073 returns \-1 with
1074 .I errno
1075 set to
1076 .BR ENOENT ).
1077 .TP
1078 .B SIOCSPGRP
1079 Set the process or process group that is to receive
1080 .B SIGIO
1081 or
1082 .B SIGURG
1083 signals when I/O becomes possible or urgent data is available.
1084 The argument is a pointer to a
1085 .IR pid_t .
1086 For further details, see the description of
1087 .BR F_SETOWN
1088 in
1089 .BR fcntl (2).
1090 .TP
1091 .B FIOASYNC
1092 Change the
1093 .B O_ASYNC
1094 flag to enable or disable asynchronous I/O mode of the socket.
1095 Asynchronous I/O mode means that the
1096 .B SIGIO
1097 signal or the signal set with
1098 .B F_SETSIG
1099 is raised when a new I/O event occurs.
1100 .IP
1101 The argument is an integer boolean flag.
1102 (This operation is synonymous with the use of
1103 .BR fcntl (2)
1104 to set the
1105 .B O_ASYNC
1106 flag.)
1107 .\"
1108 .TP
1109 .B SIOCGPGRP
1110 Get the current process or process group that receives
1111 .B SIGIO
1112 or
1113 .B SIGURG
1114 signals,
1115 or 0
1116 when none is set.
1117 .PP
1118 Valid
1119 .BR fcntl (2)
1120 operations:
1121 .TP
1122 .B FIOGETOWN
1123 The same as the
1124 .B SIOCGPGRP
1125 .BR ioctl (2).
1126 .TP
1127 .B FIOSETOWN
1128 The same as the
1129 .B SIOCSPGRP
1130 .BR ioctl (2).
1131 .SH VERSIONS
1132 .B SO_BINDTODEVICE
1133 was introduced in Linux 2.0.30.
1134 .B SO_PASSCRED
1135 is new in Linux 2.2.
1136 The
1137 .I /proc
1138 interfaces were introduced in Linux 2.2.
1139 .B SO_RCVTIMEO
1140 and
1141 .B SO_SNDTIMEO
1142 are supported since Linux 2.3.41.
1143 Earlier, timeouts were fixed to
1144 a protocol-specific setting, and could not be read or written.
1145 .SH NOTES
1146 Linux assumes that half of the send/receive buffer is used for internal
1147 kernel structures; thus the values in the corresponding
1148 .I /proc
1149 files are twice what can be observed on the wire.
1150 .PP
1151 Linux will allow port reuse with the
1152 .B SO_REUSEADDR
1153 option only
1154 when this option was set both in the previous program that performed a
1155 .BR bind (2)
1156 to the port and in the program that wants to reuse the port.
1157 This differs from some implementations (e.g., FreeBSD)
1158 where only the later program needs to set the
1159 .B SO_REUSEADDR
1160 option.
1161 Typically this difference is invisible, since, for example, a server
1162 program is designed to always set this option.
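.PP
A server that always sets the option before binding might look like the
following sketch.
The helper name
.I open_reusable_listener
and the backlog of 16 are assumptions made for illustration.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, set SO_REUSEADDR before
 * bind(2), and start listening on the given port (0 lets the kernel
 * choose one). Returns the listening descriptor, or -1 on error. */
int open_reusable_listener(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    int one = 1;
    struct sockaddr_in addr;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    /* The option must be set before bind(2) to have any effect. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == -1 ||
        bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        listen(fd, 16) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```

.PP
Because every instance of such a server sets the option, a restarted
instance satisfies the Linux requirement that both the earlier and the
later program have it enabled.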
1163 .\" .SH AUTHORS
1164 .\" This man page was written by Andi Kleen.
1165 .SH SEE ALSO
1166 .BR wireshark (1),
1167 .BR bpf (2),
1168 .BR connect (2),
1169 .BR getsockopt (2),
1170 .BR setsockopt (2),
1171 .BR socket (2),
1172 .BR pcap (3),
1173 .BR address_families (7),
1174 .BR capabilities (7),
1175 .BR ddp (7),
1176 .BR ip (7),
1177 .BR packet (7),
1178 .BR tcp (7),
1179 .BR udp (7),
1180 .BR unix (7),
1181 .BR tcpdump (8)