1 '\" t
2 .\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
3 .\" and copyright (c) 1999 Matthew Wilcox.
4 .\"
5 .\" %%%LICENSE_START(VERBATIM_ONE_PARA)
6 .\" Permission is granted to distribute possibly modified copies
7 .\" of this page provided the header is included verbatim,
8 .\" and in case of nontrivial modification author and date
9 .\" of the modification is added to the header.
10 .\" %%%LICENSE_END
11 .\"
12 .\" 2002-10-30, Michael Kerrisk, <mtk.manpages@gmail.com>
13 .\" Added description of SO_ACCEPTCONN
14 .\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
15 .\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
16 .\" Added notes on capability requirements
17 .\" A few small grammar fixes
18 .\" 2010-06-13 Jan Engelhardt <jengelh@medozas.de>
19 .\" Documented SO_DOMAIN and SO_PROTOCOL.
20 .\"
21 .\" FIXME
22 .\" The following are not yet documented:
23 .\"
24 .\" SO_PEERNAME (2.4?)
25 .\" get only
26 .\" Seems to do something similar to getpeername(), but then
27 .\" why is it necessary / how does it differ?
28 .\"
29 .\" SO_TIMESTAMPNS (2.6.22)
30 .\" Documentation/networking/timestamping.txt
31 .\" commit 92f37fd2ee805aa77925c1e64fd56088b46094fc
32 .\" Author: Eric Dumazet <dada1@cosmosbay.com>
33 .\"
34 .\" SO_TIMESTAMPING (2.6.30)
35 .\" Documentation/networking/timestamping.txt
36 .\" commit cb9eff097831007afb30d64373f29d99825d0068
37 .\" Author: Patrick Ohly <patrick.ohly@intel.com>
38 .\"
39 .\" SO_WIFI_STATUS (3.3)
40 .\" commit 6e3e939f3b1bf8534b32ad09ff199d88800835a0
41 .\" Author: Johannes Berg <johannes.berg@intel.com>
42 .\" Also: SCM_WIFI_STATUS
43 .\"
44 .\" SO_NOFCS (3.4)
45 .\" commit 3bdc0eba0b8b47797f4a76e377dd8360f317450f
46 .\" Author: Ben Greear <greearb@candelatech.com>
47 .\"
48 .\" SO_GET_FILTER (3.8)
49 .\" commit a8fc92778080c845eaadc369a0ecf5699a03bef0
50 .\" Author: Pavel Emelyanov <xemul@parallels.com>
51 .\"
52 .\" SO_SELECT_ERR_QUEUE (3.10)
53 .\" commit 7d4c04fc170087119727119074e72445f2bb192b
54 .\" Author: Keller, Jacob E <jacob.e.keller@intel.com>
55 .\"
56 .\" SO_MAX_PACING_RATE (3.13)
57 .\" commit 62748f32d501f5d3712a7c372bbb92abc7c62bc7
58 .\" Author: Eric Dumazet <edumazet@google.com>
59 .\"
60 .\" SO_BPF_EXTENSIONS (3.14)
61 .\" commit ea02f9411d9faa3553ed09ce0ec9f00ceae9885e
62 .\" Author: Michal Sekletar <msekleta@redhat.com>
63 .\"
64 .TH SOCKET 7 2019-03-06 Linux "Linux Programmer's Manual"
65 .SH NAME
66 socket \- Linux socket interface
67 .SH SYNOPSIS
68 .B #include <sys/socket.h>
69 .PP
70 .IB sockfd " = socket(int " socket_family ", int " socket_type ", int " protocol );
71 .SH DESCRIPTION
72 This manual page describes the Linux networking socket layer user
73 interface.
74 The BSD compatible sockets
75 are the uniform interface
76 between the user process and the network protocol stacks in the kernel.
77 The protocol modules are grouped into
78 .I protocol families
79 such as
80 .BR AF_INET ", " AF_IPX ", and " AF_PACKET ,
81 and
82 .I socket types
83 such as
84 .B SOCK_STREAM
85 or
86 .BR SOCK_DGRAM .
87 See
88 .BR socket (2)
89 for more information on families and types.
90 .SS Socket-layer functions
91 These functions are used by the user process to send or receive packets
92 and to do other socket operations.
93 For more information see their respective manual pages.
94 .PP
95 .BR socket (2)
96 creates a socket,
97 .BR connect (2)
98 connects a socket to a remote socket address,
99 the
100 .BR bind (2)
101 function binds a socket to a local socket address,
102 .BR listen (2)
103 tells the socket that new connections shall be accepted, and
104 .BR accept (2)
105 is used to get a new socket with a new incoming connection.
106 .BR socketpair (2)
107 returns two connected anonymous sockets (implemented only for a few
108 local families like
109 .BR AF_UNIX ).
110 .PP
111 .BR send (2),
112 .BR sendto (2),
113 and
114 .BR sendmsg (2)
115 send data over a socket, and
116 .BR recv (2),
117 .BR recvfrom "(2), and"
118 .BR recvmsg (2)
119 receive data from a socket.
120 .BR poll (2)
121 and
122 .BR select (2)
123 wait for arriving data or a readiness to send data.
124 In addition, the standard I/O operations like
125 .BR write (2),
126 .BR writev (2),
127 .BR sendfile (2),
128 .BR read (2),
129 and
130 .BR readv (2)
131 can be used to read and write data.
132 .PP
133 .BR getsockname (2)
134 returns the local socket address and
135 .BR getpeername (2)
136 returns the remote socket address.
137 .BR getsockopt (2)
138 and
139 .BR setsockopt (2)
140 are used to set or get socket layer or protocol options.
141 .BR ioctl (2)
142 can be used to set or read some other options.
143 .PP
144 .BR close (2)
145 is used to close a socket.
146 .BR shutdown (2)
147 closes parts of a full-duplex socket connection.
148 .PP
149 Seeking, or calling
150 .BR pread (2)
151 or
152 .BR pwrite (2)
153 with a nonzero position is not supported on sockets.
154 .PP
155 It is possible to do nonblocking I/O on sockets by setting the
156 .B O_NONBLOCK
157 flag on a socket file descriptor using
158 .BR fcntl (2).
159 Then all operations that would block will (usually)
160 return with
161 .B EAGAIN
162 (operation should be retried later);
163 .BR connect (2)
164 will return an
165 .B EINPROGRESS
166 error.
167 The user can then wait for various events via
168 .BR poll (2)
169 or
170 .BR select (2).
171 .TS
172 tab(:) allbox;
173 c s s
174 l l l.
175 I/O events
176 Event:Poll flag:Occurrence
177 Read:POLLIN:T{
178 New data arrived.
179 T}
180 Read:POLLIN:T{
181 A connection setup has been completed
182 (for connection-oriented sockets)
183 T}
184 Read:POLLHUP:T{
185 A disconnection request has been initiated by the other end.
186 T}
187 Read:POLLHUP:T{
188 A connection is broken (only for connection-oriented protocols).
189 When the socket is written to,
190 .B SIGPIPE
191 is also sent.
192 T}
193 Write:POLLOUT:T{
194 Socket has enough send buffer space for writing new data.
195 T}
196 Read/Write:T{
197 POLLIN |
198 .br
199 POLLOUT
200 T}:T{
201 An outgoing
202 .BR connect (2)
203 finished.
204 T}
205 Read/Write:POLLERR:An asynchronous error occurred.
206 Read/Write:POLLHUP:The other end has shut down one direction.
207 Exception:POLLPRI:T{
208 Urgent data arrived.
209 .B SIGURG
210 is sent then.
211 T}
212 .\" FIXME . The following is not true currently:
213 .\" It is no I/O event when the connection
214 .\" is broken from the local end using
215 .\" .BR shutdown (2)
216 .\" or
217 .\" .BR close (2).
218 .TE
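.PP
As a rough illustration of the nonblocking pattern and the
POLLIN row of the table above, the following sketch retries a receive
after waiting with poll(2).
The helper names set_nonblocking() and recv_with_poll() are
hypothetical and exist only for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: set O_NONBLOCK on fd; returns 0 or -1. */
static int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical helper: receive from a nonblocking socket, waiting up to
   timeout_ms for POLLIN each time the receive would block.
   Returns the recv(2) result, or -1 with errno set to EAGAIN
   if nothing arrived within the timeout. */
static ssize_t recv_with_poll(int fd, void *buf, size_t len, int timeout_ms)
{
    for (;;) {
        ssize_t n = recv(fd, buf, len, 0);

        if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
            return n;               /* data, end of stream, or a real error */

        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        int ready = poll(&pfd, 1, timeout_ms);

        if (ready == -1)
            return -1;              /* poll(2) itself failed */
        if (ready == 0) {
            errno = EAGAIN;         /* timed out without an event */
            return -1;
        }
    }
}
```

A caller would typically invoke set_nonblocking() once after creating
the socket and then use recv_with_poll() wherever a bounded wait is
acceptable.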
219 .PP
220 An alternative to
221 .BR poll (2)
222 and
223 .BR select (2)
224 is to let the kernel inform the application about events
225 via a
226 .B SIGIO
227 signal.
228 For that the
229 .B O_ASYNC
230 flag must be set on a socket file descriptor via
231 .BR fcntl (2)
232 and a valid signal handler for
233 .B SIGIO
234 must be installed via
235 .BR sigaction (2).
236 See the
237 .I Signals
238 discussion below.
239 .SS Socket address structures
240 Each socket domain has its own format for socket addresses,
241 with a domain-specific address structure.
242 Each of these structures begins with an
243 integer "family" field (typed as
244 .IR sa_family_t )
245 that indicates the type of the address structure.
246 This allows
247 the various system calls (e.g.,
248 .BR connect (2),
249 .BR bind (2),
250 .BR accept (2),
251 .BR getsockname (2),
252 .BR getpeername (2)),
253 which are generic to all socket domains,
254 to determine the domain of a particular socket address.
255 .PP
256 To allow any type of socket address to be passed to
257 interfaces in the sockets API,
258 the type
259 .IR "struct sockaddr"
260 is defined.
261 The purpose of this type is purely to allow casting of
262 domain-specific socket address types to a "generic" type,
263 so as to avoid compiler warnings about type mismatches in
264 calls to the sockets API.
265 .PP
266 In addition, the sockets API provides the data type
267 .IR "struct sockaddr_storage".
268 This type
269 is suitable to accommodate all supported domain-specific socket
270 address structures; it is large enough and is aligned properly.
271 (In particular, it is large enough to hold
272 IPv6 socket addresses.)
273 The structure includes the following field, which can be used to identify
274 the type of socket address actually stored in the structure:
275 .PP
276 .in +4n
277 .EX
278 sa_family_t ss_family;
279 .EE
280 .in
281 .PP
282 The
283 .I sockaddr_storage
284 structure is useful in programs that must handle socket addresses
285 in a generic way
286 (e.g., programs that must deal with both IPv4 and IPv6 socket addresses).
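.PP
A minimal sketch of such generic handling follows.
It assumes only IPv4 and IPv6 addresses are of interest;
the helper names local_address() and storage_port() are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fetch the local address of sockfd into *ss. */
static int local_address(int sockfd, struct sockaddr_storage *ss)
{
    socklen_t len = sizeof(*ss);

    memset(ss, 0, sizeof(*ss));
    /* The cast to the generic type only satisfies the prototype. */
    return getsockname(sockfd, (struct sockaddr *) ss, &len);
}

/* Hypothetical helper: return the port (host byte order) stored in *ss,
   or -1 if the address family is neither AF_INET nor AF_INET6. */
static int storage_port(const struct sockaddr_storage *ss)
{
    switch (ss->ss_family) {
    case AF_INET:
        return ntohs(((const struct sockaddr_in *) ss)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *) ss)->sin6_port);
    default:
        return -1;
    }
}
```

The ss_family field is inspected first, and only then is the storage
cast to the matching domain-specific structure.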
287 .SS Socket options
288 The socket options listed below can be set by using
289 .BR setsockopt (2)
290 and read with
291 .BR getsockopt (2)
292 with the socket level set to
293 .B SOL_SOCKET
294 for all sockets.
295 Unless otherwise noted,
296 .I optval
297 is a pointer to an
298 .IR int .
299 .\" FIXME .
300 .\" In the list below, the text used to describe argument types
301 .\" for each socket option should be more consistent
302 .\"
303 .\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
304 .\" W R Stevens, UNPv1
305 .TP
306 .B SO_ACCEPTCONN
307 Returns a value indicating whether or not this socket has been marked
308 to accept connections with
309 .BR listen (2).
310 The value 0 indicates that this is not a listening socket;
311 the value 1 indicates that this is a listening socket.
312 This socket option is read-only.
313 .TP
314 .BR SO_ATTACH_FILTER " (since Linux 2.2), " SO_ATTACH_BPF " (since Linux 3.19)"
315 Attach a classic BPF
316 .RB ( SO_ATTACH_FILTER )
317 or an extended BPF
318 .RB ( SO_ATTACH_BPF )
319 program to the socket for use as a filter of incoming packets.
320 A packet will be dropped if the filter program returns zero.
321 If the filter program returns a
322 nonzero value which is less than the packet's data length,
323 the packet will be truncated to the length returned.
324 If the value returned by the filter is greater than or equal to the
325 packet's data length, the packet is allowed to proceed unmodified.
326 .IP
327 The argument for
328 .BR SO_ATTACH_FILTER
329 is a
330 .I sock_fprog
331 structure, defined in
332 .IR <linux/filter.h> :
333 .IP
334 .in +4n
335 .EX
336 struct sock_fprog {
337 unsigned short len;
338 struct sock_filter *filter;
339 };
340 .EE
341 .in
342 .IP
343 The argument for
344 .BR SO_ATTACH_BPF
345 is a file descriptor returned by the
346 .BR bpf (2)
347 system call and must refer to a program of type
348 .BR BPF_PROG_TYPE_SOCKET_FILTER .
349 .IP
350 These options may be set multiple times for a given socket,
351 each time replacing the previous filter program.
352 The classic and extended versions may be called on the same socket,
353 but the previous filter will always be replaced such that a socket
354 never has more than one filter defined.
355 .IP
356 Both classic and extended BPF are explained in the kernel source file
357 .IR Documentation/networking/filter.txt .
358 .TP
359 .BR SO_ATTACH_REUSEPORT_CBPF ", " SO_ATTACH_REUSEPORT_EBPF
360 For use with the
361 .BR SO_REUSEPORT
362 option, these options allow the user to set a classic BPF
363 .RB ( SO_ATTACH_REUSEPORT_CBPF )
364 or an extended BPF
365 .RB ( SO_ATTACH_REUSEPORT_EBPF )
366 program which defines how packets are assigned to
367 the sockets in the reuseport group (that is, all sockets which have
368 .BR SO_REUSEPORT
369 set and are using the same local address to receive packets).
370 .IP
371 The BPF program must return an index between 0 and N\-1 representing
372 the socket which should receive the packet
373 (where N is the number of sockets in the group).
374 If the BPF program returns an invalid index,
375 socket selection will fall back to the plain
376 .BR SO_REUSEPORT
377 mechanism.
378 .IP
379 Sockets are numbered in the order in which they are added to the group
380 (that is, the order of
381 .BR bind (2)
382 calls for UDP sockets or the order of
383 .BR listen (2)
384 calls for TCP sockets).
385 New sockets added to a reuseport group will inherit the BPF program.
386 When a socket is removed from a reuseport group (via
387 .BR close (2)),
388 the last socket in the group will be moved into the closed socket's
389 position.
390 .IP
391 These options may be set repeatedly at any time on any socket in the group
392 to replace the current BPF program used by all sockets in the group.
393 .IP
394 .BR SO_ATTACH_REUSEPORT_CBPF
395 takes the same argument type as
396 .BR SO_ATTACH_FILTER
397 and
398 .BR SO_ATTACH_REUSEPORT_EBPF
399 takes the same argument type as
400 .BR SO_ATTACH_BPF .
401 .IP
402 UDP support for this feature is available since Linux 4.5;
403 TCP support is available since Linux 4.6.
404 .TP
405 .B SO_BINDTODEVICE
406 Bind this socket to a particular device like \(lqeth0\(rq,
407 as specified in the passed interface name.
408 If the
409 name is an empty string or the option length is zero, the socket device
410 binding is removed.
411 The passed option is a variable-length null-terminated
412 interface name string with the maximum size of
413 .BR IFNAMSIZ .
414 If a socket is bound to an interface,
415 only packets received from that particular interface are processed by the
416 socket.
417 Note that this works only for some socket types, particularly
418 .B AF_INET
419 sockets.
420 It is not supported for packet sockets (use normal
421 .BR bind (2)
422 there).
423 .IP
424 Before Linux 3.8,
425 this socket option could be set, but could not be retrieved with
426 .BR getsockopt (2).
427 Since Linux 3.8, it is readable.
428 The
429 .I optlen
430 argument should contain the buffer size available
431 to receive the device name and is recommended to be
432 .BR IFNAMSIZ
433 bytes.
434 The real device name length is reported back in the
435 .I optlen
436 argument.
437 .TP
438 .B SO_BROADCAST
439 Set or get the broadcast flag.
440 When enabled, datagram sockets are allowed to send
441 packets to a broadcast address.
442 This option has no effect on stream-oriented sockets.
443 .TP
444 .B SO_BSDCOMPAT
445 Enable BSD bug-to-bug compatibility.
446 This is used by the UDP protocol module in Linux 2.0 and 2.2.
447 If enabled, ICMP errors received for a UDP socket will not be passed
448 to the user program.
449 In later kernel versions, support for this option has been phased out:
450 Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
451 (printk()) if a program uses this option.
452 Linux 2.0 also enabled BSD bug-to-bug compatibility
453 options (random header changing, skipping of the broadcast flag) for raw
454 sockets with this option, but that was removed in Linux 2.2.
455 .TP
456 .B SO_DEBUG
457 Enable socket debugging.
458 Allowed only for processes with the
459 .B CAP_NET_ADMIN
460 capability or an effective user ID of 0.
461 .TP
462 .BR SO_DETACH_FILTER " (since Linux 2.2), " SO_DETACH_BPF " (since Linux 3.19)"
463 These two options, which are synonyms,
464 may be used to remove the classic or extended BPF
465 program attached to a socket with either
466 .BR SO_ATTACH_FILTER
467 or
468 .BR SO_ATTACH_BPF .
469 The option value is ignored.
470 .TP
471 .BR SO_DOMAIN " (since Linux 2.6.32)"
472 Retrieves the socket domain as an integer, returning a value such as
473 .BR AF_INET6 .
474 See
475 .BR socket (2)
476 for details.
477 This socket option is read-only.
478 .TP
479 .B SO_ERROR
480 Get and clear the pending socket error.
481 This socket option is read-only.
482 Expects an integer.
483 .TP
484 .B SO_DONTROUTE
485 Don't send via a gateway, send only to directly connected hosts.
486 The same effect can be achieved by setting the
487 .B MSG_DONTROUTE
488 flag on a socket
489 .BR send (2)
490 operation.
491 Expects an integer boolean flag.
492 .TP
493 .BR SO_INCOMING_CPU " (gettable since Linux 3.19, settable since Linux 4.4)"
494 .\" getsockopt 2c8c56e15df3d4c2af3d656e44feb18789f75837
495 .\" setsockopt 70da268b569d32a9fddeea85dc18043de9d89f89
496 Sets or gets the CPU affinity of a socket.
497 Expects an integer CPU number.
498 .IP
499 .in +4n
500 .EX
501 int cpu = 1;
502 setsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, sizeof(cpu));
503 .EE
504 .in
505 .IP
506 Because all of the packets for a single stream
507 (i.e., all packets for the same 4-tuple)
508 arrive on the single RX queue that is associated with a particular CPU,
509 the typical use case is to employ one listening process per RX queue,
510 with the incoming flow being handled by a listener
511 on the same CPU that is handling the RX queue.
512 This provides optimal NUMA behavior and keeps CPU caches hot.
513 .\"
514 .\" From an email conversation with Eric Dumazet:
515 .\" >> Note that setting the option is not supported if SO_REUSEPORT is used.
516 .\" >
517 .\" > Please define "not supported". Does this yield an API diagnostic?
518 .\" > If so, what is it?
519 .\" >
520 .\" >> Socket will be selected from an array, either by a hash or BPF program
521 .\" >> that has no access to this information.
522 .\" >
523 .\" > Sorry -- I'm lost here. How does this comment relate to the proposed
524 .\" > man page text above?
525 .\"
526 .\" Simply that :
527 .\"
528 .\" If an application uses both SO_INCOMING_CPU and SO_REUSEPORT, then
529 .\" SO_REUSEPORT logic, selecting the socket to receive the packet, ignores
530 .\" SO_INCOMING_CPU setting.
531 .TP
532 .B SO_KEEPALIVE
533 Enable sending of keep-alive messages on connection-oriented sockets.
534 Expects an integer boolean flag.
535 .TP
536 .B SO_LINGER
537 Sets or gets the
538 .B SO_LINGER
539 option.
540 The argument is a
541 .I linger
542 structure.
543 .IP
544 .in +4n
545 .EX
546 struct linger {
547 int l_onoff; /* linger active */
548 int l_linger; /* how many seconds to linger for */
549 };
550 .EE
551 .in
552 .IP
553 When enabled, a
554 .BR close (2)
555 or
556 .BR shutdown (2)
557 will not return until all queued messages for the socket have been
558 successfully sent or the linger timeout has been reached.
559 Otherwise,
560 the call returns immediately and the closing is done in the background.
561 When the socket is closed as part of
562 .BR exit (2),
563 it always lingers in the background.
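.IP
A minimal sketch of enabling the option follows;
the helper name set_linger() is hypothetical.

```c
#include <sys/socket.h>

/* Hypothetical helper: make close(2) or shutdown(2) on sockfd wait
   up to 'seconds' for queued data to be sent.
   Returns 0 on success, -1 on error. */
static int set_linger(int sockfd, int seconds)
{
    struct linger lg = {
        .l_onoff = 1,           /* linger active */
        .l_linger = seconds,    /* timeout in seconds */
    };

    return setsockopt(sockfd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```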
564 .TP
565 .B SO_LOCK_FILTER
566 .\" commit d59577b6ffd313d0ab3be39cb1ab47e29bdc9182
567 When set, this option will prevent
568 changing the filters associated with the socket.
569 These filters include any set using the socket options
570 .BR SO_ATTACH_FILTER ,
571 .BR SO_ATTACH_BPF ,
572 .BR SO_ATTACH_REUSEPORT_CBPF ,
573 and
574 .BR SO_ATTACH_REUSEPORT_EBPF .
575 .IP
576 The typical use case is for a privileged process to set up a raw socket
577 (an operation that requires the
578 .BR CAP_NET_RAW
579 capability), apply a restrictive filter, set the
580 .BR SO_LOCK_FILTER
581 option,
582 and then either drop its privileges or pass the socket file descriptor
583 to an unprivileged process via a UNIX domain socket.
584 .IP
585 Once the
586 .BR SO_LOCK_FILTER
587 option has been enabled, attempts to change or remove the filter
588 attached to a socket, or to disable the
589 .BR SO_LOCK_FILTER
590 option will fail with the error
591 .BR EPERM .
592 .TP
593 .BR SO_MARK " (since Linux 2.6.25)"
594 .\" commit 4a19ec5800fc3bb64e2d87c4d9fdd9e636086fe0
595 .\" and 914a9ab386a288d0f22252fc268ecbc048cdcbd5
596 Set the mark for each packet sent through this socket
597 (similar to the netfilter MARK target but socket-based).
598 Changing the mark can be used for mark-based
599 routing without netfilter or for packet filtering.
600 Setting this option requires the
601 .B CAP_NET_ADMIN
602 capability.
603 .TP
604 .B SO_OOBINLINE
605 If this option is enabled,
606 out-of-band data is directly placed into the receive data stream.
607 Otherwise, out-of-band data is passed only when the
608 .B MSG_OOB
609 flag is set during receiving.
610 .\" don't document it because it can do too much harm.
611 .\".B SO_NO_CHECK
612 .\" The kernel has support for the SO_NO_CHECK socket
613 .\" option (boolean: 0 == default, calculate checksum on xmit,
614 .\" 1 == do not calculate checksum on xmit).
615 .\" Additional note from Andi Kleen on SO_NO_CHECK (2010-08-30)
616 .\" On Linux UDP checksums are essentially free and there's no reason
617 .\" to turn them off and it would disable another safety line.
618 .\" That is why I didn't document the option.
619 .TP
620 .B SO_PASSCRED
621 Enable or disable the receiving of the
622 .B SCM_CREDENTIALS
623 control message.
624 For more information see
625 .BR unix (7).
626 .TP
627 .B SO_PASSSEC
628 Enable or disable the receiving of the
629 .B SCM_SECURITY
630 control message.
631 For more information see
632 .BR unix (7).
633 .TP
634 .BR SO_PEEK_OFF " (since Linux 3.4)"
635 .\" commit ef64a54f6e558155b4f149bb10666b9e914b6c54
636 This option, which is currently supported only for
637 .BR unix (7)
638 sockets, sets the value of the "peek offset" for the
639 .BR recv (2)
640 system call when used with
641 .BR MSG_PEEK
642 flag.
643 .IP
644 When this option is set to a negative value
645 (it is set to \-1 for all new sockets),
646 traditional behavior is provided:
647 .BR recv (2)
648 with the
649 .BR MSG_PEEK
650 flag will peek data from the front of the queue.
651 .IP
652 When the option is set to a value greater than or equal to zero,
653 then the next peek at data queued in the socket will occur at
654 the byte offset specified by the option value.
655 At the same time, the "peek offset" will be
656 incremented by the number of bytes that were peeked from the queue,
657 so that a subsequent peek will return the next data in the queue.
658 .IP
659 If data is removed from the front of the queue via a call to
660 .BR recv (2)
661 (or similar) without the
662 .BR MSG_PEEK
663 flag, the "peek offset" will be decreased by the number of bytes removed.
664 In other words, receiving data without the
665 .B MSG_PEEK
666 flag will cause the "peek offset" to be adjusted to maintain
667 the correct relative position in the queued data,
668 so that a subsequent peek will retrieve the data that would have been
669 retrieved had the data not been removed.
670 .IP
671 For datagram sockets, if the "peek offset" points to the middle of a packet,
672 the data returned will be marked with the
673 .BR MSG_TRUNC
674 flag.
675 .IP
676 The following example serves to illustrate the use of
677 .BR SO_PEEK_OFF .
678 Suppose a stream socket has the following queued input data:
679 .IP
680 aabbccddeeff
681 .IP
682 The following sequence of
683 .BR recv (2)
684 calls would have the effect noted in the comments:
685 .IP
686 .in +4n
687 .EX
688 int ov = 4; // Set peek offset to 4
689 setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &ov, sizeof(ov));
690
691 recv(fd, buf, 2, MSG_PEEK); // Peeks "cc"; offset set to 6
692 recv(fd, buf, 2, MSG_PEEK); // Peeks "dd"; offset set to 8
693 recv(fd, buf, 2, 0); // Reads "aa"; offset set to 6
694 recv(fd, buf, 2, MSG_PEEK); // Peeks "ee"; offset set to 8
695 .EE
696 .in
697 .TP
698 .B SO_PEERCRED
699 Return the credentials of the peer process connected to this socket.
700 For further details, see
701 .BR unix (7).
702 .TP
703 .B SO_PRIORITY
704 Set the protocol-defined priority for all packets to be sent on
705 this socket.
706 Linux uses this value to order the networking queues:
707 packets with a higher priority may be processed first depending
708 on the selected device queueing discipline.
709 .\" For
710 .\" .BR ip (7),
711 .\" this also sets the IP type-of-service (TOS) field for outgoing packets.
712 Setting a priority outside the range 0 to 6 requires the
713 .B CAP_NET_ADMIN
714 capability.
715 .TP
716 .BR SO_PROTOCOL " (since Linux 2.6.32)"
717 Retrieves the socket protocol as an integer, returning a value such as
718 .BR IPPROTO_SCTP .
719 See
720 .BR socket (2)
721 for details.
722 This socket option is read-only.
723 .TP
724 .B SO_RCVBUF
725 Sets or gets the maximum socket receive buffer in bytes.
726 The kernel doubles this value (to allow space for bookkeeping overhead)
727 when it is set using
728 .\" Most (all?) other implementations do not do this -- MTK, Dec 05
729 .BR setsockopt (2),
730 and this doubled value is returned by
731 .BR getsockopt (2).
732 .\" The following thread on LKML is quite informative:
733 .\" getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behavior
734 .\" 17 July 2012
735 .\" http://thread.gmane.org/gmane.linux.kernel/1328935
736 The default value is set by the
737 .I /proc/sys/net/core/rmem_default
738 file, and the maximum allowed value is set by the
739 .I /proc/sys/net/core/rmem_max
740 file.
741 The minimum (doubled) value for this option is 256.
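.IP
The doubling can be observed by reading the option back after setting it.
The following sketch assumes a hypothetical helper named request_rcvbuf();
the value it returns also depends on the
.I rmem_max
limit of the running system.

```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of 'bytes' and return
   the size that getsockopt(2) reports afterwards, or -1 on error. */
static int request_rcvbuf(int sockfd, int bytes)
{
    int actual;
    socklen_t len = sizeof(actual);

    if (setsockopt(sockfd, SOL_SOCKET, SO_RCVBUF,
                   &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(sockfd, SOL_SOCKET, SO_RCVBUF, &actual, &len) == -1)
        return -1;
    return actual;      /* on Linux, normally twice the accepted request */
}
```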
742 .TP
743 .BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
744 Using this socket option, a privileged
745 .RB ( CAP_NET_ADMIN )
746 process can perform the same task as
747 .BR SO_RCVBUF ,
748 but the
749 .I rmem_max
750 limit can be overridden.
751 .TP
752 .BR SO_RCVLOWAT " and " SO_SNDLOWAT
753 Specify the minimum number of bytes in the buffer until the socket layer
754 will pass the data to the protocol
755 .RB ( SO_SNDLOWAT )
756 or the user on receiving
757 .RB ( SO_RCVLOWAT ).
758 These two values are initialized to 1.
759 .B SO_SNDLOWAT
760 is not changeable on Linux
761 .RB ( setsockopt (2)
762 fails with the error
763 .BR ENOPROTOOPT ).
764 .B SO_RCVLOWAT
765 is changeable
766 only since Linux 2.4.
767 The
768 .BR select (2)
769 and
770 .BR poll (2)
771 system calls currently do not respect the
772 .B SO_RCVLOWAT
773 setting on Linux,
774 and mark a socket readable when even a single byte of data is available.
775 A subsequent read from the socket will block until
776 .B SO_RCVLOWAT
777 bytes are available.
778 .\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=111049368106984&w=2
779 .\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
780 .TP
781 .BR SO_RCVTIMEO " and " SO_SNDTIMEO
782 .\" Not implemented in 2.0.
783 .\" Implemented in 2.1.11 for getsockopt: always return a zero struct.
784 .\" Implemented in 2.3.41 for setsockopt, and actually used.
785 Specify the receiving or sending timeouts until reporting an error.
786 The argument is a
787 .IR "struct timeval" .
788 If an input or output function blocks for this period of time, and
789 data has been sent or received, the return value of that function
790 will be the amount of data transferred; if no data has been transferred
791 and the timeout has been reached, then \-1 is returned with
792 .I errno
793 set to
794 .BR EAGAIN
795 or
796 .BR EWOULDBLOCK ,
797 .\" in fact to EAGAIN
798 or
799 .B EINPROGRESS
800 (for
801 .BR connect (2))
802 just as if the socket was specified to be nonblocking.
803 If the timeout is set to zero (the default),
804 then the operation will never time out.
805 Timeouts only have effect for system calls that perform socket I/O (e.g.,
806 .BR read (2),
807 .BR recvmsg (2),
808 .BR send (2),
809 .BR sendmsg (2));
810 timeouts have no effect for
811 .BR select (2),
812 .BR poll (2),
813 .BR epoll_wait (2),
814 and so on.
815 .TP
816 .B SO_REUSEADDR
817 .\" commit c617f398edd4db2b8567a28e899a88f8f574798d
818 .\" https://lwn.net/Articles/542629/
819 Indicates that the rules used in validating addresses supplied in a
820 .BR bind (2)
821 call should allow reuse of local addresses.
822 For
823 .B AF_INET
824 sockets this
825 means that a socket may bind, except when there
826 is an active listening socket bound to the address.
827 When the listening socket is bound to
828 .B INADDR_ANY
829 with a specific port then it is not possible
830 to bind to this port for any local address.
831 Argument is an integer boolean flag.
832 .TP
833 .BR SO_REUSEPORT " (since Linux 3.9)"
834 Permits multiple
835 .B AF_INET
836 or
837 .B AF_INET6
838 sockets to be bound to an identical socket address.
839 This option must be set on each socket (including the first socket)
840 prior to calling
841 .BR bind (2)
842 on the socket.
843 To prevent port hijacking,
844 all of the processes binding to the same address must have the same
845 effective UID.
846 This option can be employed with both TCP and UDP sockets.
847 .IP
848 For TCP sockets, this option allows
849 .BR accept (2)
850 load distribution in a multi-threaded server to be improved by
851 using a distinct listener socket for each thread.
852 This provides improved load distribution as compared
853 to traditional techniques such as using a single
854 .BR accept (2)ing
855 thread that distributes connections,
856 or having multiple threads that compete to
857 .BR accept (2)
858 from the same socket.
859 .IP
860 For UDP sockets,
861 the use of this option can provide better distribution
862 of incoming datagrams to multiple processes (or threads) as compared
863 to the traditional technique of having multiple processes
864 compete to receive datagrams on the same socket.
865 .TP
866 .BR SO_RXQ_OVFL " (since Linux 2.6.33)"
867 .\" commit 3b885787ea4112eaa80945999ea0901bf742707f
868 Indicates that an unsigned 32-bit value ancillary message (cmsg)
869 should be attached to received skbs indicating
870 the number of packets dropped by the socket since its creation.
871 .TP
872 .B SO_SNDBUF
873 Sets or gets the maximum socket send buffer in bytes.
874 The kernel doubles this value (to allow space for bookkeeping overhead)
875 when it is set using
876 .\" Most (all?) other implementations do not do this -- MTK, Dec 05
877 .\" See also the comment to SO_RCVBUF (17 Jul 2012 LKML mail)
878 .BR setsockopt (2),
879 and this doubled value is returned by
880 .BR getsockopt (2).
881 The default value is set by the
882 .I /proc/sys/net/core/wmem_default
883 file and the maximum allowed value is set by the
884 .I /proc/sys/net/core/wmem_max
885 file.
886 The minimum (doubled) value for this option is 2048.
887 .TP
888 .BR SO_SNDBUFFORCE " (since Linux 2.6.14)"
889 Using this socket option, a privileged
890 .RB ( CAP_NET_ADMIN )
891 process can perform the same task as
892 .BR SO_SNDBUF ,
893 but the
894 .I wmem_max
895 limit can be overridden.
896 .TP
897 .B SO_TIMESTAMP
898 Enable or disable the receiving of the
899 .B SO_TIMESTAMP
900 control message.
901 The timestamp control message is sent with level
902 .B SOL_SOCKET
903 and the
904 .I cmsg_data
905 field is a
906 .I "struct timeval"
907 indicating the
908 reception time of the last packet passed to the user in this call.
909 See
910 .BR cmsg (3)
911 for details on control messages.
912 .TP
913 .B SO_TYPE
914 Gets the socket type as an integer (e.g.,
915 .BR SOCK_STREAM ).
916 This socket option is read-only.
917 .TP
918 .BR SO_BUSY_POLL " (since Linux 3.11)"
919 Sets the approximate time in microseconds to busy poll on a blocking receive
920 when there is no data.
921 Increasing this value requires
922 .BR CAP_NET_ADMIN .
923 The default for this option is controlled by the
924 .I /proc/sys/net/core/busy_read
925 file.
926 .IP
927 The value in the
928 .I /proc/sys/net/core/busy_poll
929 file determines how long
930 .BR select (2)
931 and
932 .BR poll (2)
933 will busy poll when they operate on sockets with
934 .BR SO_BUSY_POLL
935 set and no events to report are found.
936 .IP
937 In both cases,
938 busy polling will only be done when the socket last received data
939 from a network device that supports this option.
940 .IP
941 While busy polling may improve latency of some applications,
942 care must be taken when using it since this will increase
943 both CPU utilization and power usage.
944 .SS Signals
945 When writing onto a connection-oriented socket that has been shut down
946 (by the local or the remote end)
947 .B SIGPIPE
948 is sent to the writing process and
949 .B EPIPE
950 is returned.
951 The signal is not sent when the write call
952 specified the
953 .B MSG_NOSIGNAL
954 flag.
955 .PP
956 When requested with the
957 .B FIOSETOWN
958 .BR fcntl (2)
959 or
960 .B SIOCSPGRP
961 .BR ioctl (2),
962 .B SIGIO
963 is sent when an I/O event occurs.
964 It is possible to use
965 .BR poll (2)
966 or
967 .BR select (2)
968 in the signal handler to find out which socket the event occurred on.
969 An alternative (since Linux 2.2) is to set a real-time signal using the
970 .B F_SETSIG
971 .BR fcntl (2);
972 the handler of the real-time signal will be called with
973 the file descriptor in the
974 .I si_fd
975 field of its
976 .IR siginfo_t .
977 See
978 .BR fcntl (2)
979 for more information.
980 .PP
981 Under some circumstances (e.g., multiple processes accessing a
982 single socket), the condition that caused the
983 .B SIGIO
984 may have already disappeared when the process reacts to the signal.
985 If this happens, the process should wait again because Linux
986 will resend the signal later.
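.PP
The real-time signal alternative can be sketched as follows.
The function names (enable_rt_sigio, rt_sigio_demo) are hypothetical, SIGRTMIN is an arbitrary choice of signal,
and for brevity the sketch blocks the signal and collects it with sigtimedwait(2) rather than installing a handler;
the siginfo_t delivered is the same one a handler would see.

```c
#define _GNU_SOURCE             /* for F_SETSIG */
#include <fcntl.h>
#include <signal.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Ask the kernel to queue signal `sig` to this process whenever an
   I/O event occurs on fd.  Returns 0 on success, -1 on error. */
static int enable_rt_sigio(int fd, int sig)
{
    int flags;

    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;
    if (fcntl(fd, F_SETSIG, sig) == -1)
        return -1;
    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC) == -1 ? -1 : 0;
}

/* Hypothetical demo: returns 0 if the queued signal reports the
   descriptor that became readable in si_fd. */
int rt_sigio_demo(void)
{
    int sv[2], rc = -1;
    int sig = SIGRTMIN;                 /* assumption: any free RT signal */
    sigset_t set;
    siginfo_t info;
    struct timespec timeout = { .tv_sec = 2, .tv_nsec = 0 };

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* Block the signal so that it stays queued until we fetch it. */
    sigemptyset(&set);
    sigaddset(&set, sig);
    sigprocmask(SIG_BLOCK, &set, NULL);

    if (enable_rt_sigio(sv[0], sig) == 0 &&
        write(sv[1], "x", 1) == 1 &&
        sigtimedwait(&set, &info, &timeout) == sig &&
        info.si_fd == sv[0])
        rc = 0;

    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Because the descriptor arrives in si_fd, no poll(2) or select(2) call is needed to find out which socket caused the event.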
987 .\" .SS Ancillary messages
988 .SS /proc interfaces
989 The core socket networking parameters can be accessed
990 via files in the directory
991 .IR /proc/sys/net/core/ .
992 .TP
993 .I rmem_default
994 contains the default setting in bytes of the socket receive buffer.
995 .TP
996 .I rmem_max
997 contains the maximum socket receive buffer size in bytes which a user may
998 set by using the
999 .B SO_RCVBUF
1000 socket option.
1001 .TP
1002 .I wmem_default
1003 contains the default setting in bytes of the socket send buffer.
1004 .TP
1005 .I wmem_max
1006 contains the maximum socket send buffer size in bytes which a user may
1007 set by using the
1008 .B SO_SNDBUF
1009 socket option.
1010 .TP
1011 .IR message_cost " and " message_burst
1012 configure the token bucket filter used to rate-limit warning messages
1013 caused by external network events.
1014 .TP
1015 .I netdev_max_backlog
1016 Maximum number of packets in the global input queue.
1017 .TP
1018 .I optmem_max
1019 Maximum length of ancillary data and user control data (such as iovecs)
1020 per socket.
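.PP
A program can read these parameters like any other file.
The following is a minimal sketch; read_net_core is a hypothetical helper, and it assumes
that the parameter holds a single integer, which is true of the buffer-size files listed above
but not necessarily of every file in the directory.

```c
#include <stdio.h>

/* Read one integer parameter from /proc/sys/net/core/.
   Returns the value, or -1 if the file cannot be opened or parsed
   (for example, when /proc is not mounted). */
long read_net_core(const char *name)
{
    char path[128];
    long value = -1;
    FILE *f;
    int n = snprintf(path, sizeof(path), "/proc/sys/net/core/%s", name);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1;                      /* name too long */
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, read_net_core("rmem_max") yields the upper limit that applies to an unprivileged SO_RCVBUF request.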
1021 .\" netdev_fastroute is not documented because it is experimental
1022 .SS Ioctls
1023 These operations can be accessed using
1024 .BR ioctl (2):
1025 .PP
1026 .in +4n
1027 .EX
1028 .IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
1029 .EE
1030 .in
1031 .TP
1032 .B SIOCGSTAMP
1033 Return a
1034 .I struct timeval
1035 with the receive timestamp of the last packet passed to the user.
1036 This is useful for accurate round trip time measurements.
1037 See
1038 .BR setitimer (2)
1039 for a description of
1040 .IR "struct timeval" .
1041 .\"
1042 This ioctl should be used only if the socket option
1043 .B SO_TIMESTAMP
1044 is not set on the socket.
1045 Otherwise, it returns the timestamp of the
1046 last packet that was received while
1047 .B SO_TIMESTAMP
1048 was not set, or it fails if no such packet has been received
1049 (i.e.,
1050 .BR ioctl (2)
1051 returns \-1 with
1052 .I errno
1053 set to
1054 .BR ENOENT ).
1055 .TP
1056 .B SIOCSPGRP
1057 Set the process or process group that is to receive
1058 .B SIGIO
1059 or
1060 .B SIGURG
1061 signals when I/O becomes possible or urgent data is available.
1062 The argument is a pointer to a
1063 .IR pid_t .
1064 For further details, see the description of
1065 .B F_SETOWN
1066 in
1067 .BR fcntl (2).
1068 .TP
1069 .B FIOASYNC
1070 Change the
1071 .B O_ASYNC
1072 flag to enable or disable asynchronous I/O mode of the socket.
1073 Asynchronous I/O mode means that the
1074 .B SIGIO
1075 signal or the signal set with
1076 .B F_SETSIG
1077 is raised when a new I/O event occurs.
1078 .IP
1079 The argument is an integer boolean flag.
1080 (This operation is synonymous with the use of
1081 .BR fcntl (2)
1082 to set the
1083 .B O_ASYNC
1084 flag.)
1085 .\"
1086 .TP
1087 .B SIOCGPGRP
1088 Get the current process or process group that receives
1089 .B SIGIO
1090 or
1091 .B SIGURG
1092 signals,
1093 or 0
1094 when none is set.
1095 .PP
1096 Valid
1097 .BR fcntl (2)
1098 operations:
1099 .TP
1100 .B FIOGETOWN
1101 The same as the
1102 .B SIOCGPGRP
1103 .BR ioctl (2).
1104 .TP
1105 .B FIOSETOWN
1106 The same as the
1107 .B SIOCSPGRP
1108 .BR ioctl (2).
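.PP
The ownership and asynchronous-mode operations above might be combined as in the following sketch.
The function name ioctl_owner_demo is hypothetical, and a UNIX domain datagram socket is assumed
only because it can be created without any network configuration.

```c
#include <fcntl.h>
#include <linux/sockios.h>      /* SIOCSPGRP, SIOCGPGRP */
#include <sys/ioctl.h>          /* ioctl(), FIOASYNC */
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical demo: make this process the signal owner of a socket,
   read the owner back, and switch on asynchronous I/O mode.
   Returns 0 if every step behaved as described, -1 otherwise. */
int ioctl_owner_demo(void)
{
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    pid_t owner = getpid();
    pid_t current = 0;
    int on = 1, rc = -1;

    if (fd == -1)
        return -1;

    if (ioctl(fd, SIOCSPGRP, &owner) == 0 &&      /* set the owner */
        ioctl(fd, SIOCGPGRP, &current) == 0 &&    /* read it back */
        current == owner &&
        ioctl(fd, FIOASYNC, &on) == 0 &&          /* same effect as O_ASYNC */
        (fcntl(fd, F_GETFL) & O_ASYNC) != 0)
        rc = 0;

    close(fd);
    return rc;
}
```

Checking the flag with F_GETFL afterwards shows that FIOASYNC and setting O_ASYNC through fcntl(2) act on the same file status flag.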
1109 .SH VERSIONS
1110 .B SO_BINDTODEVICE
1111 was introduced in Linux 2.0.30.
1112 .B SO_PASSCRED
1113 is new in Linux 2.2.
1114 The
1115 .I /proc
1116 interfaces were introduced in Linux 2.2.
1117 .B SO_RCVTIMEO
1118 and
1119 .B SO_SNDTIMEO
1120 are supported since Linux 2.3.41.
1121 Earlier, timeouts were fixed to
1122 a protocol-specific setting, and could not be read or written.
1123 .SH NOTES
1124 Linux assumes that half of the send/receive buffer is used for internal
1125 kernel structures; thus the values in the corresponding
1126 .I /proc
1127 files are twice what can be observed on the wire.
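.PP
This bookkeeping overhead is also visible from user space.
The sketch below uses a hypothetical helper, effective_rcvbuf, which requests a receive buffer size
and returns what the kernel reports back. It assumes a UDP socket can be created
and that the requested size lies below the rmem_max limit.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Request a receive buffer of `requested` bytes and return the size
   that getsockopt(2) reports afterwards, or -1 on error. */
int effective_rcvbuf(int requested)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    int got = -1;
    socklen_t len = sizeof(got);

    if (fd == -1)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof(requested)) == -1 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) == -1)
        got = -1;
    close(fd);
    return got;
}
```

On Linux the value reported is normally twice the value requested, because the kernel doubles the request to leave room for its own structures.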
1128 .PP
1129 With the
1130 .B SO_REUSEADDR
1131 option, Linux allows port reuse only
1132 when this option was set both in the previous program that performed a
1133 .BR bind (2)
1134 to the port and in the program that wants to reuse the port.
1135 This differs from some implementations (e.g., FreeBSD)
1136 where only the later program needs to set the
1137 .B SO_REUSEADDR
1138 option.
1139 Typically this difference is invisible, since, for example, a server
1140 program is designed to always set this option.
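.PP
The requirement that both sides set the option can be observed with the following sketch.
The helpers bind_tcp and try_rebind are hypothetical, and for brevity both sockets live in one process
and remain open at the same time without listening, which is a simplification of the
"previous program" and "later program" scenario described above.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a TCP socket, set SO_REUSEADDR to `reuse`, and bind it to the
   wildcard address on *port (network byte order; 0 lets the kernel pick
   a port, which is stored back).  Returns the descriptor, or -1 with
   errno set. */
static int bind_tcp(int reuse, in_port_t *port)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = *port;

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR,
                   &reuse, sizeof(reuse)) == -1 ||
        bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        getsockname(fd, (struct sockaddr *) &addr, &len) == -1) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    *port = addr.sin_port;
    return fd;
}

/* Bind one socket, then try to bind a second one to the same port.
   Returns 0 if the second bind succeeded, otherwise the errno value
   of the step that failed. */
int try_rebind(int first_reuse, int second_reuse)
{
    in_port_t port = 0;
    int first, second, err;

    first = bind_tcp(first_reuse, &port);
    if (first == -1)
        return errno;
    second = bind_tcp(second_reuse, &port);
    err = (second == -1) ? errno : 0;
    if (second != -1)
        close(second);
    close(first);
    return err;
}
```

Here try_rebind(1, 1) is expected to succeed on Linux, while try_rebind(0, 1) is expected to fail with EADDRINUSE because the first socket did not set the option.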
1141 .\" .SH AUTHORS
1142 .\" This man page was written by Andi Kleen.
1143 .SH SEE ALSO
1144 .BR wireshark (1),
1145 .BR bpf (2),
1146 .BR connect (2),
1147 .BR getsockopt (2),
1148 .BR setsockopt (2),
1149 .BR socket (2),
1150 .BR pcap (3),
1151 .BR address_families (7),
1152 .BR capabilities (7),
1153 .BR ddp (7),
1154 .BR ip (7),
1155 .BR packet (7),
1156 .BR tcp (7),
1157 .BR udp (7),
1158 .BR unix (7),
1159 .BR tcpdump (8)