'\" t
.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
.\" and copyright (c) 1999 Matthew Wilcox.
.\"
.\" %%%LICENSE_START(VERBATIM_ONE_PARA)
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\" %%%LICENSE_END
.\"
.\" 2002-10-30, Michael Kerrisk, <mtk.manpages@gmail.com>
.\" Added description of SO_ACCEPTCONN
.\" 2004-05-20, aeb, added SO_RCVTIMEO/SO_SNDTIMEO text.
.\" Modified, 27 May 2004, Michael Kerrisk <mtk.manpages@gmail.com>
.\" Added notes on capability requirements
.\" A few small grammar fixes
.\" 2010-06-13 Jan Engelhardt <jengelh@medozas.de>
.\" Documented SO_DOMAIN and SO_PROTOCOL.
.\"
.\" FIXME
.\" The following are not yet documented:
.\"
.\" SO_PEERNAME (2.4?)
.\" get only
.\" Seems to do something similar to getpeername(), but then
.\" why is it necessary / how does it differ?
.\"
.\" SO_TIMESTAMPNS (2.6.22)
.\" Documentation/networking/timestamping.txt
.\" commit 92f37fd2ee805aa77925c1e64fd56088b46094fc
.\" Author: Eric Dumazet <dada1@cosmosbay.com>
.\"
.\" SO_TIMESTAMPING (2.6.30)
.\" Documentation/networking/timestamping.txt
.\" commit cb9eff097831007afb30d64373f29d99825d0068
.\" Author: Patrick Ohly <patrick.ohly@intel.com>
.\"
.\" SO_WIFI_STATUS (3.3)
.\" commit 6e3e939f3b1bf8534b32ad09ff199d88800835a0
.\" Author: Johannes Berg <johannes.berg@intel.com>
.\" Also: SCM_WIFI_STATUS
.\"
.\" SO_NOFCS (3.4)
.\" commit 3bdc0eba0b8b47797f4a76e377dd8360f317450f
.\" Author: Ben Greear <greearb@candelatech.com>
.\"
.\" SO_GET_FILTER (3.8)
.\" commit a8fc92778080c845eaadc369a0ecf5699a03bef0
.\" Author: Pavel Emelyanov <xemul@parallels.com>
.\"
.\" SO_SELECT_ERR_QUEUE (3.10)
.\" commit 7d4c04fc170087119727119074e72445f2bb192b
.\" Author: Keller, Jacob E <jacob.e.keller@intel.com>
.\"
.\" SO_MAX_PACING_RATE (3.13)
.\" commit 62748f32d501f5d3712a7c372bbb92abc7c62bc7
.\" Author: Eric Dumazet <edumazet@google.com>
.\"
.\" SO_BPF_EXTENSIONS (3.14)
.\" commit ea02f9411d9faa3553ed09ce0ec9f00ceae9885e
.\" Author: Michal Sekletar <msekleta@redhat.com>
.\"
.TH SOCKET 7 2019-08-02 Linux "Linux Programmer's Manual"
.SH NAME
socket \- Linux socket interface
.SH SYNOPSIS
.B #include <sys/socket.h>
.PP
.IB sockfd " = socket(int " socket_family ", int " socket_type ", int " protocol );
.SH DESCRIPTION
This manual page describes the Linux networking socket layer user
interface.
The BSD-compatible sockets
are the uniform interface
between the user process and the network protocol stacks in the kernel.
The protocol modules are grouped into
.I protocol families
such as
.BR AF_INET ", " AF_IPX ", and " AF_PACKET ,
and
.I socket types
such as
.B SOCK_STREAM
or
.BR SOCK_DGRAM .
See
.BR socket (2)
for more information on families and types.
.SS Socket-layer functions
These functions are used by the user process to send or receive packets
and to do other socket operations.
For more information see their respective manual pages.
.PP
.BR socket (2)
creates a socket,
.BR connect (2)
connects a socket to a remote socket address,
.BR bind (2)
binds a socket to a local socket address,
.BR listen (2)
tells the socket that new connections shall be accepted, and
.BR accept (2)
is used to get a new socket with a new incoming connection.
.BR socketpair (2)
returns two connected anonymous sockets (implemented only for a few
local families like
.BR AF_UNIX ).
.PP
.BR send (2),
.BR sendto (2),
and
.BR sendmsg (2)
send data over a socket, and
.BR recv (2),
.BR recvfrom (2),
and
.BR recvmsg (2)
receive data from a socket.
.BR poll (2)
and
.BR select (2)
wait for arriving data or for readiness to send data.
In addition, the standard I/O operations like
.BR write (2),
.BR writev (2),
.BR sendfile (2),
.BR read (2),
and
.BR readv (2)
can be used to read and write data.
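.PP
As an illustration, the following minimal C sketch creates a connected pair of
.B AF_UNIX
stream sockets with
.BR socketpair (2)
and moves data across it using only the ordinary
.BR write (2)
and
.BR read (2)
calls.
The helper name
.I roundtrip_socketpair
is hypothetical and not part of any API.

```c
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: write msg into one end of an AF_UNIX stream
   socketpair(2) and read it back from the other end with plain
   write(2) and read(2).  Returns the number of bytes read, or -1. */
ssize_t roundtrip_socketpair(const char *msg, char *out, size_t outlen)
{
    int sv[2];
    ssize_t n = -1;
    size_t len = strlen(msg);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* Ordinary file I/O works on both ends of the pair. */
    if (write(sv[0], msg, len) == (ssize_t) len)
        n = read(sv[1], out, outlen);

    close(sv[0]);
    close(sv[1]);
    return n;
}
```

.PP
With a short message on a local socket pair, the single
.BR read (2)
normally returns everything that was written,
but in general a stream read may return fewer bytes,
so real programs read in a loop.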
.PP
.BR getsockname (2)
returns the local socket address and
.BR getpeername (2)
returns the remote socket address.
.BR getsockopt (2)
and
.BR setsockopt (2)
are used to get or set socket layer or protocol options.
.BR ioctl (2)
can be used to set or read some other options.
.PP
.BR close (2)
is used to close a socket.
.BR shutdown (2)
closes parts of a full-duplex socket connection.
.PP
Seeking, or calling
.BR pread (2)
or
.BR pwrite (2)
with a nonzero position, is not supported on sockets.
.PP
It is possible to do nonblocking I/O on sockets by setting the
.B O_NONBLOCK
flag on a socket file descriptor using
.BR fcntl (2).
Then all operations that would block will (usually)
fail with the error
.B EAGAIN
(operation should be retried later);
.BR connect (2)
will fail with the error
.BR EINPROGRESS .
The user can then wait for various events via
.BR poll (2)
or
.BR select (2).
.TS
tab(:) allbox;
c s s
l l l.
I/O events
Event:Poll flag:Occurrence
Read:POLLIN:T{
New data arrived.
T}
Read:POLLIN:T{
A connection setup has been completed
(for connection-oriented sockets).
T}
Read:POLLHUP:T{
A disconnection request has been initiated by the other end.
T}
Read:POLLHUP:T{
A connection is broken (only for connection-oriented protocols).
When the socket is written to,
.B SIGPIPE
is also sent.
T}
Write:POLLOUT:T{
Socket has enough send buffer space for writing new data.
T}
Read/Write:T{
POLLIN |
.br
POLLOUT
T}:T{
An outgoing
.BR connect (2)
finished.
T}
Read/Write:POLLERR:An asynchronous error occurred.
Read/Write:POLLHUP:The other end has shut down one direction.
Exception:POLLPRI:T{
Urgent data arrived;
.B SIGURG
is then sent.
T}
.\" FIXME . The following is not true currently:
.\" It is no I/O event when the connection
.\" is broken from the local end using
.\" .BR shutdown (2)
.\" or
.\" .BR close (2).
.TE
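.PP
The following C sketch puts these pieces together.
An
.B AF_UNIX
socket pair stands in for a network connection (an assumption of the example),
and the function names are illustrative only.
It sets
.B O_NONBLOCK
with
.BR fcntl (2),
checks that a read on the empty socket fails with
.B EAGAIN
(or
.BR EWOULDBLOCK )
instead of blocking, and then waits for
.B POLLIN
with
.BR poll (2)
after the peer has written a byte:

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Add O_NONBLOCK to fd's file status flags, keeping the others. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Hypothetical demo: returns 1 if reading an empty nonblocking socket
   fails with EAGAIN/EWOULDBLOCK and poll(2) then reports POLLIN once
   the peer has written a byte; returns 0 otherwise. */
int nonblocking_demo(void)
{
    int sv[2];
    int ok = 0;
    char c;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;

    if (set_nonblocking(sv[1]) == 0
        && read(sv[1], &c, 1) == -1
        && (errno == EAGAIN || errno == EWOULDBLOCK)
        && write(sv[0], "x", 1) == 1) {
        struct pollfd pfd = { .fd = sv[1], .events = POLLIN };

        /* Wait at most one second for the "Read: POLLIN" event. */
        ok = poll(&pfd, 1, 1000) == 1 && (pfd.revents & POLLIN);
    }

    close(sv[0]);
    close(sv[1]);
    return ok;
}
```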
.PP
An alternative to
.BR poll (2)
and
.BR select (2)
is to let the kernel inform the application about events
via a
.B SIGIO
signal.
For this, the
.B O_ASYNC
flag must be set on a socket file descriptor via
.BR fcntl (2)
and a valid signal handler for
.B SIGIO
must be installed via
.BR sigaction (2).
See the
.I Signals
discussion below.
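.PP
A minimal C sketch of that setup follows; the names
.IR enable_sigio ,
.IR on_sigio ,
and
.I sigio_demo
are hypothetical.
It installs the handler with
.BR sigaction (2),
makes the calling process the owner of the descriptor with the
.B F_SETOWN
.BR fcntl (2)
operation (see the
.I Signals
discussion), and then sets
.BR O_ASYNC .
An
.B AF_UNIX
socket pair is used only so that the example is self-contained:

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

static volatile sig_atomic_t got_sigio;

static void on_sigio(int sig)
{
    (void) sig;
    got_sigio = 1;
}

/* Hypothetical helper: arrange for SIGIO to be delivered to this
   process when fd has I/O events. */
int enable_sigio(int fd)
{
    struct sigaction sa;
    int flags;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigio;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGIO, &sa, NULL) == -1)
        return -1;

    /* Choose which process receives the signal. */
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return -1;

    flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_ASYNC);
}

/* Hypothetical demo: returns 1 if SIGIO arrives after the peer
   writes to an O_ASYNC socket, 0 otherwise. */
int sigio_demo(void)
{
    int sv[2];

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;

    got_sigio = 0;
    if (enable_sigio(sv[1]) == 0 && write(sv[0], "x", 1) == 1) {
        /* Give the signal up to about one second to arrive. */
        for (int i = 0; i < 100 && !got_sigio; i++)
            usleep(10000);
    }

    close(sv[0]);
    close(sv[1]);
    return got_sigio;
}
```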
.SS Socket address structures
Each socket domain has its own format for socket addresses,
with a domain-specific address structure.
Each of these structures begins with an
integer "family" field (typed as
.IR sa_family_t )
that indicates the type of the address structure.
This allows
the various system calls (e.g.,
.BR connect (2),
.BR bind (2),
.BR accept (2),
.BR getsockname (2),
.BR getpeername (2)),
which are generic to all socket domains,
to determine the domain of a particular socket address.
.PP
To allow any type of socket address to be passed to
interfaces in the sockets API,
the type
.I "struct sockaddr"
is defined.
The purpose of this type is purely to allow casting of
domain-specific socket address types to a "generic" type,
so as to avoid compiler warnings about type mismatches in
calls to the sockets API.
.PP
In addition, the sockets API provides the data type
.IR "struct sockaddr_storage" .
This type
is suitable to accommodate all supported domain-specific socket
address structures; it is large enough and is aligned properly.
(In particular, it is large enough to hold
IPv6 socket addresses.)
The structure includes the following field, which can be used to identify
the type of socket address actually stored in the structure:
.PP
.in +4n
.EX
sa_family_t ss_family;
.EE
.in
.PP
The
.I sockaddr_storage
structure is useful in programs that must handle socket addresses
in a generic way
(e.g., programs that must deal with both IPv4 and IPv6 socket addresses).
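.PP
For example, the following C sketch (the helper
.I local_port
is hypothetical) retrieves a socket's local address with
.BR getsockname (2)
into a
.I sockaddr_storage
buffer, uses
.I ss_family
to decide which domain-specific structure to cast it to,
and passes it to the API through a
.I "struct sockaddr"
pointer:

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: return the local port of an AF_INET or
   AF_INET6 socket, or -1 for other domains or on error. */
int local_port(int fd)
{
    struct sockaddr_storage ss;          /* big enough for any family */
    socklen_t len = sizeof(ss);

    if (getsockname(fd, (struct sockaddr *) &ss, &len) == -1)
        return -1;

    switch (ss.ss_family) {
    case AF_INET:
        return ntohs(((struct sockaddr_in *) &ss)->sin_port);
    case AF_INET6:
        return ntohs(((struct sockaddr_in6 *) &ss)->sin6_port);
    default:
        return -1;                       /* e.g., AF_UNIX */
    }
}
```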
.SS Socket options
The socket options listed below can be set by using
.BR setsockopt (2)
and read with
.BR getsockopt (2)
with the socket level set to
.B SOL_SOCKET
for all sockets.
Unless otherwise noted,
.I optval
is a pointer to an
.IR int .
.\" FIXME .
.\" In the list below, the text used to describe argument types
.\" for each socket option should be more consistent
.\"
.\" SO_ACCEPTCONN is in POSIX.1-2001, and its origin is explained in
.\" W R Stevens, UNPv1
.TP
.B SO_ACCEPTCONN
Returns a value indicating whether or not this socket has been marked
to accept connections with
.BR listen (2).
The value 0 indicates that this is not a listening socket;
the value 1 indicates that this is a listening socket.
This socket option is read-only.
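.IP
A minimal C sketch of querying this option with
.BR getsockopt (2);
the helper name
.I is_listening
is hypothetical:

```c
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if fd is a listening socket,
   0 if it is not, or -1 if getsockopt(2) fails. */
int is_listening(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_ACCEPTCONN, &val, &len) == -1)
        return -1;
    return val != 0;
}
```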
.TP
.BR SO_ATTACH_FILTER " (since Linux 2.2), " SO_ATTACH_BPF " (since Linux 3.19)"
Attach a classic BPF
.RB ( SO_ATTACH_FILTER )
or an extended BPF
.RB ( SO_ATTACH_BPF )
program to the socket for use as a filter of incoming packets.
A packet will be dropped if the filter program returns zero.
If the filter program returns a
nonzero value that is less than the packet's data length,
the packet will be truncated to the length returned.
If the value returned by the filter is greater than or equal to the
packet's data length, the packet is allowed to proceed unmodified.
.IP
The argument for
.BR SO_ATTACH_FILTER
is a
.I sock_fprog
structure, defined in
.IR <linux/filter.h> :
.IP
.in +4n
.EX
struct sock_fprog {
    unsigned short      len;     /* number of instructions */
    struct sock_filter *filter;  /* array of instructions */
};
.EE
.in
.IP
The argument for
.BR SO_ATTACH_BPF
is a file descriptor returned by the
.BR bpf (2)
system call and must refer to a program of type
.BR BPF_PROG_TYPE_SOCKET_FILTER .
.IP
These options may be set multiple times for a given socket,
each time replacing the previous filter program.
The classic and extended versions may be called on the same socket,
but the previous filter will always be replaced such that a socket
never has more than one filter defined.
.IP
Both classic and extended BPF are explained in the kernel source file
.IR Documentation/networking/filter.txt .
.TP
.BR SO_ATTACH_REUSEPORT_CBPF ", " SO_ATTACH_REUSEPORT_EBPF
For use with the
.BR SO_REUSEPORT
option, these options allow the user to set a classic BPF
.RB ( SO_ATTACH_REUSEPORT_CBPF )
or an extended BPF
.RB ( SO_ATTACH_REUSEPORT_EBPF )
program which defines how packets are assigned to
the sockets in the reuseport group (that is, all sockets that have
.BR SO_REUSEPORT
set and are using the same local address to receive packets).
.IP
The BPF program must return an index between 0 and N\-1 representing
the socket which should receive the packet
(where N is the number of sockets in the group).
If the BPF program returns an invalid index,
socket selection will fall back to the plain
.BR SO_REUSEPORT
mechanism.
.IP
Sockets are numbered in the order in which they are added to the group
(that is, the order of
.BR bind (2)
calls for UDP sockets or the order of
.BR listen (2)
calls for TCP sockets).
New sockets added to a reuseport group will inherit the BPF program.
When a socket is removed from a reuseport group (via
.BR close (2)),
the last socket in the group will be moved into the closed socket's
position.
.IP
These options may be set repeatedly at any time on any socket in the group
to replace the current BPF program used by all sockets in the group.
.IP
.BR SO_ATTACH_REUSEPORT_CBPF
takes the same argument type as
.BR SO_ATTACH_FILTER
and
.BR SO_ATTACH_REUSEPORT_EBPF
takes the same argument type as
.BR SO_ATTACH_BPF .
.IP
UDP support for this feature is available since Linux 4.5;
TCP support is available since Linux 4.6.
.TP
.B SO_BINDTODEVICE
Bind this socket to a particular device like \(lqeth0\(rq,
as specified in the passed interface name.
If the
name is an empty string or the option length is zero, the socket device
binding is removed.
The passed option is a variable-length null-terminated
interface name string with the maximum size of
.BR IFNAMSIZ .
If a socket is bound to an interface,
only packets received from that particular interface are processed by the
socket.
Note that this works only for some socket types, particularly
.B AF_INET
sockets.
It is not supported for packet sockets (use normal
.BR bind (2)
there).
.IP
Before Linux 3.8,
this socket option could be set, but could not be retrieved with
.BR getsockopt (2).
Since Linux 3.8, it is readable.
The
.I optlen
argument should contain the buffer size available
to receive the device name and is recommended to be
.BR IFNAMSIZ
bytes.
The real device name length is reported back in the
.I optlen
argument.
.TP
.B SO_BROADCAST
Set or get the broadcast flag.
When enabled, datagram sockets are allowed to send
packets to a broadcast address.
This option has no effect on stream-oriented sockets.
.TP
.B SO_BSDCOMPAT
Enable BSD bug-to-bug compatibility.
This is used by the UDP protocol module in Linux 2.0 and 2.2.
If enabled, ICMP errors received for a UDP socket will not be passed
to the user program.
In later kernel versions, support for this option has been phased out:
Linux 2.4 silently ignores it, and Linux 2.6 generates a kernel warning
(printk()) if a program uses this option.
Linux 2.0 also enabled BSD bug-to-bug compatibility
options (random header changing, skipping of the broadcast flag) for raw
sockets with this option, but that was removed in Linux 2.2.
.TP
.B SO_DEBUG
Enable socket debugging.
Allowed only for processes with the
.B CAP_NET_ADMIN
capability or an effective user ID of 0.
.TP
.BR SO_DETACH_FILTER " (since Linux 2.2), " SO_DETACH_BPF " (since Linux 3.19)"
These two options, which are synonyms,
may be used to remove the classic or extended BPF
program attached to a socket with either
.BR SO_ATTACH_FILTER
or
.BR SO_ATTACH_BPF .
The option value is ignored.
.TP
.BR SO_DOMAIN " (since Linux 2.6.32)"
Retrieves the socket domain as an integer, returning a value such as
.BR AF_INET6 .
See
.BR socket (2)
for details.
This socket option is read-only.
.TP
.B SO_ERROR
Get and clear the pending socket error.
This socket option is read-only.
Expects an integer.
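.IP
A common use of
.B SO_ERROR
is to learn the outcome of a nonblocking
.BR connect (2)
that failed with
.BR EINPROGRESS :
wait until the socket is writable, then read the pending error.
A minimal C sketch follows; the helper name
.I finish_connect
is hypothetical:

```c
#include <poll.h>
#include <sys/socket.h>

/* Hypothetical helper for a socket whose nonblocking connect(2)
   failed with EINPROGRESS: wait up to timeout_ms for it to become
   writable, then fetch and clear the pending error with SO_ERROR.
   Returns 0 if the connection succeeded, the errno value describing
   the failure otherwise, or -1 on timeout or getsockopt(2) failure. */
int finish_connect(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int err = 0;
    socklen_t len = sizeof(err);

    if (poll(&pfd, 1, timeout_ms) != 1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
        return -1;
    return err;
}
```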
.TP
.B SO_DONTROUTE
Don't send via a gateway; send only to directly connected hosts.
The same effect can be achieved by setting the
.B MSG_DONTROUTE
flag on a socket
.BR send (2)
operation.
Expects an integer boolean flag.
.TP
.BR SO_INCOMING_CPU " (gettable since Linux 3.19, settable since Linux 4.4)"
.\" getsockopt 2c8c56e15df3d4c2af3d656e44feb18789f75837
.\" setsockopt 70da268b569d32a9fddeea85dc18043de9d89f89
Sets or gets the CPU affinity of a socket.
Expects an integer (the CPU number).
.IP
.in +4n
.EX
int cpu = 1;
setsockopt(fd, SOL_SOCKET, SO_INCOMING_CPU, &cpu, sizeof(cpu));
.EE
.in
.IP
Because all of the packets for a single stream
(i.e., all packets for the same 4-tuple)
arrive on the single RX queue that is associated with a particular CPU,
the typical use case is to employ one listening process per RX queue,
with the incoming flow being handled by a listener
on the same CPU that is handling the RX queue.
This provides optimal NUMA behavior and keeps CPU caches hot.
.\"
.\" From an email conversation with Eric Dumazet:
.\" >> Note that setting the option is not supported if SO_REUSEPORT is used.
.\" >
.\" > Please define "not supported". Does this yield an API diagnostic?
.\" > If so, what is it?
.\" >
.\" >> Socket will be selected from an array, either by a hash or BPF program
.\" >> that has no access to this information.
.\" >
.\" > Sorry -- I'm lost here. How does this comment relate to the proposed
.\" > man page text above?
.\"
.\" Simply that :
.\"
.\" If an application uses both SO_INCOMING_CPU and SO_REUSEPORT, then
.\" SO_REUSEPORT logic, selecting the socket to receive the packet, ignores
.\" SO_INCOMING_CPU setting.
.TP
.B SO_KEEPALIVE
Enable sending of keep-alive messages on connection-oriented sockets.
Expects an integer boolean flag.
.TP
.B SO_LINGER
Sets or gets the
.B SO_LINGER
option.
The argument is a
.I linger
structure.
.IP
.in +4n
.EX
struct linger {
    int l_onoff;    /* linger active */
    int l_linger;   /* how many seconds to linger for */
};
.EE
.in
.IP
When enabled, a
.BR close (2)
or
.BR shutdown (2)
will not return until all queued messages for the socket have been
successfully sent or the linger timeout has been reached.
Otherwise,
the call returns immediately and the closing is done in the background.
When the socket is closed as part of
.BR exit (2),
it always lingers in the background.
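.IP
A minimal C sketch of filling in a
.I linger
structure and applying it; the helper name
.I set_linger
is hypothetical:

```c
#include <sys/socket.h>

/* Hypothetical helper: enable (onoff != 0) or disable lingering on
   close(2)/shutdown(2), with a timeout of `seconds` seconds. */
int set_linger(int fd, int onoff, int seconds)
{
    struct linger lg = { .l_onoff = onoff, .l_linger = seconds };

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```

.IP
Reading the option back with
.BR getsockopt (2)
returns the same structure.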
.TP
.B SO_LOCK_FILTER
.\" commit d59577b6ffd313d0ab3be39cb1ab47e29bdc9182
When set, this option will prevent
changing the filters associated with the socket.
These filters include any set using the socket options
.BR SO_ATTACH_FILTER ,
.BR SO_ATTACH_BPF ,
.BR SO_ATTACH_REUSEPORT_CBPF ,
and
.BR SO_ATTACH_REUSEPORT_EBPF .
.IP
The typical use case is for a privileged process to set up a raw socket
(an operation that requires the
.BR CAP_NET_RAW
capability), apply a restrictive filter, set the
.BR SO_LOCK_FILTER
option,
and then either drop its privileges or pass the socket file descriptor
to an unprivileged process via a UNIX domain socket.
.IP
Once the
.BR SO_LOCK_FILTER
option has been enabled, attempts to change or remove the filter
attached to a socket, or to disable the
.BR SO_LOCK_FILTER
option, will fail with the error
.BR EPERM .
.TP
.BR SO_MARK " (since Linux 2.6.25)"
.\" commit 4a19ec5800fc3bb64e2d87c4d9fdd9e636086fe0
.\" and 914a9ab386a288d0f22252fc268ecbc048cdcbd5
Set the mark for each packet sent through this socket
(similar to the netfilter MARK target but socket-based).
Changing the mark can be used for mark-based
routing without netfilter or for packet filtering.
Setting this option requires the
.B CAP_NET_ADMIN
capability.
.TP
.B SO_OOBINLINE
If this option is enabled,
out-of-band data is directly placed into the receive data stream.
Otherwise, out-of-band data is passed only when the
.B MSG_OOB
flag is set during receiving.
.\" don't document it because it can do too much harm.
.\".B SO_NO_CHECK
.\" The kernel has support for the SO_NO_CHECK socket
.\" option (boolean: 0 == default, calculate checksum on xmit,
.\" 1 == do not calculate checksum on xmit).
.\" Additional note from Andi Kleen on SO_NO_CHECK (2010-08-30)
.\" On Linux UDP checksums are essentially free and there's no reason
.\" to turn them off and it would disable another safety line.
.\" That is why I didn't document the option.
.TP
.B SO_PASSCRED
Enable or disable the receiving of the
.B SCM_CREDENTIALS
control message.
For more information see
.BR unix (7).
.TP
.B SO_PASSSEC
Enable or disable the receiving of the
.B SCM_SECURITY
control message.
For more information see
.BR unix (7).
.TP
.BR SO_PEEK_OFF " (since Linux 3.4)"
.\" commit ef64a54f6e558155b4f149bb10666b9e914b6c54
This option, which is currently supported only for
.BR unix (7)
sockets, sets the value of the "peek offset" for the
.BR recv (2)
system call when used with the
.BR MSG_PEEK
flag.
.IP
When this option is set to a negative value
(it is set to \-1 for all new sockets),
traditional behavior is provided:
.BR recv (2)
with the
.BR MSG_PEEK
flag will peek data from the front of the queue.
.IP
When the option is set to a value greater than or equal to zero,
then the next peek at data queued in the socket will occur at
the byte offset specified by the option value.
At the same time, the "peek offset" will be
incremented by the number of bytes that were peeked from the queue,
so that a subsequent peek will return the next data in the queue.
.IP
If data is removed from the front of the queue via a call to
.BR recv (2)
(or similar) without the
.BR MSG_PEEK
flag, the "peek offset" will be decreased by the number of bytes removed.
In other words, receiving data without the
.B MSG_PEEK
flag will cause the "peek offset" to be adjusted to maintain
the correct relative position in the queued data,
so that a subsequent peek will retrieve the data that would have been
retrieved had the data not been removed.
.IP
For datagram sockets, if the "peek offset" points to the middle of a packet,
the data returned will be marked with the
.BR MSG_TRUNC
flag.
.IP
The following example serves to illustrate the use of
.BR SO_PEEK_OFF .
Suppose a stream socket has the following queued input data:
.IP
aabbccddeeff
.IP
The following sequence of
.BR recv (2)
calls would have the effect noted in the comments:
.IP
.in +4n
.EX
int ov = 4;                  // Set peek offset to 4
setsockopt(fd, SOL_SOCKET, SO_PEEK_OFF, &ov, sizeof(ov));

recv(fd, buf, 2, MSG_PEEK);  // Peeks "cc"; offset set to 6
recv(fd, buf, 2, MSG_PEEK);  // Peeks "dd"; offset set to 8
recv(fd, buf, 2, 0);         // Reads "aa"; offset set to 6
recv(fd, buf, 2, MSG_PEEK);  // Peeks "ee"; offset set to 8
.EE
.in
.TP
.B SO_PEERCRED
Return the credentials of the peer process connected to this socket.
For further details, see
.BR unix (7).
.TP
.B SO_PRIORITY
Set the protocol-defined priority for all packets to be sent on
this socket.
Linux uses this value to order the networking queues:
packets with a higher priority may be processed first depending
on the selected device queueing discipline.
.\" For
.\" .BR ip (7),
.\" this also sets the IP type-of-service (TOS) field for outgoing packets.
Setting a priority outside the range 0 to 6 requires the
.B CAP_NET_ADMIN
capability.
.TP
.BR SO_PROTOCOL " (since Linux 2.6.32)"
Retrieves the socket protocol as an integer, returning a value such as
.BR IPPROTO_SCTP .
See
.BR socket (2)
for details.
This socket option is read-only.
.TP
.B SO_RCVBUF
Sets or gets the maximum socket receive buffer in bytes.
The kernel doubles this value (to allow space for bookkeeping overhead)
when it is set using
.\" Most (all?) other implementations do not do this -- MTK, Dec 05
.BR setsockopt (2),
and this doubled value is returned by
.BR getsockopt (2).
.\" The following thread on LKML is quite informative:
.\" getsockopt/setsockopt with SO_RCVBUF and SO_SNDBUF "non-standard" behavior
.\" 17 July 2012
.\" http://thread.gmane.org/gmane.linux.kernel/1328935
The default value is set by the
.I /proc/sys/net/core/rmem_default
file, and the maximum allowed value is set by the
.I /proc/sys/net/core/rmem_max
file.
The minimum (doubled) value for this option is 256.
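.IP
The doubling can be observed directly.
In the following C sketch (the helper name
.I set_rcvbuf
is hypothetical), a request for 4096 bytes is normally reported back as 8192,
provided that 4096 does not exceed
.IR rmem_max :

```c
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of `bytes` bytes and
   return the size the kernel reports back (on Linux, normally twice
   the request, capped by rmem_max), or -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
    int val = bytes;
    socklen_t len = sizeof(val);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, sizeof(val)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) == -1)
        return -1;
    return val;
}
```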
.TP
.BR SO_RCVBUFFORCE " (since Linux 2.6.14)"
Using this socket option, a privileged
.RB ( CAP_NET_ADMIN )
process can perform the same task as
.BR SO_RCVBUF ,
but the
.I rmem_max
limit can be overridden.
.TP
.BR SO_RCVLOWAT " and " SO_SNDLOWAT
Specify the minimum number of bytes in the buffer until the socket layer
will pass the data to the protocol
.RB ( SO_SNDLOWAT )
or the user on receiving
.RB ( SO_RCVLOWAT ).
These two values are initialized to 1.
.B SO_SNDLOWAT
is not changeable on Linux
.RB ( setsockopt (2)
fails with the error
.BR ENOPROTOOPT ).
.B SO_RCVLOWAT
is changeable
only since Linux 2.4.
.IP
Before Linux 2.6.28
.\" commit c7004482e8dcb7c3c72666395cfa98a216a4fb70
.BR select (2),
.BR poll (2),
and
.BR epoll (7)
did not respect the
.B SO_RCVLOWAT
setting on Linux,
and indicated a socket as readable when even a single byte of data
was available.
A subsequent read from the socket would then block until
.B SO_RCVLOWAT
bytes are available.
.\" See http://marc.theaimsgroup.com/?l=linux-kernel&m=111049368106984&w=2
.\" Tested on kernel 2.6.14 -- mtk, 30 Nov 05
.TP
.BR SO_RCVTIMEO " and " SO_SNDTIMEO
.\" Not implemented in 2.0.
.\" Implemented in 2.1.11 for getsockopt: always return a zero struct.
.\" Implemented in 2.3.41 for setsockopt, and actually used.
Specify the receiving or sending timeouts until reporting an error.
The argument is a
.IR "struct timeval" .
If an input or output function blocks for this period of time, and
data has been sent or received, the return value of that function
will be the amount of data transferred; if no data has been transferred
and the timeout has been reached, then \-1 is returned with
.I errno
set to
.BR EAGAIN
or
.BR EWOULDBLOCK ,
.\" in fact to EAGAIN
or
.B EINPROGRESS
(for
.BR connect (2)),
just as if the socket was specified to be nonblocking.
If the timeout is set to zero (the default),
then the operation will never time out.
Timeouts only have effect for system calls that perform socket I/O (e.g.,
.BR read (2),
.BR recvmsg (2),
.BR send (2),
.BR sendmsg (2));
timeouts have no effect for
.BR select (2),
.BR poll (2),
.BR epoll_wait (2),
and so on.
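.IP
A minimal C sketch of setting a receive timeout and recognizing its expiry;
the helper names
.I set_rcv_timeout
and
.I recv_timed_out
are hypothetical:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>

/* Hypothetical helper: make blocking receives on fd give up after
   `ms` milliseconds. */
int set_rcv_timeout(int fd, long ms)
{
    struct timeval tv = {
        .tv_sec = ms / 1000,
        .tv_usec = (ms % 1000) * 1000,
    };

    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv));
}

/* Returns 1 if a blocking recv(2) of one byte failed because the
   timeout expired with no data (EAGAIN or EWOULDBLOCK), else 0. */
int recv_timed_out(int fd)
{
    char c;

    return recv(fd, &c, 1, 0) == -1
           && (errno == EAGAIN || errno == EWOULDBLOCK);
}
```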
.TP
.B SO_REUSEADDR
.\" commit c617f398edd4db2b8567a28e899a88f8f574798d
.\" https://lwn.net/Articles/542629/
Indicates that the rules used in validating addresses supplied in a
.BR bind (2)
call should allow reuse of local addresses.
For
.B AF_INET
sockets, this
means that a socket may bind, except when there
is an active listening socket bound to the address.
When the listening socket is bound to
.B INADDR_ANY
with a specific port, it is not possible
to bind to this port for any local address.
The argument is an integer boolean flag.
.TP
.BR SO_REUSEPORT " (since Linux 3.9)"
Permits multiple
.B AF_INET
or
.B AF_INET6
sockets to be bound to an identical socket address.
This option must be set on each socket (including the first socket)
prior to calling
.BR bind (2)
on the socket.
To prevent port hijacking,
all of the processes binding to the same address must have the same
effective UID.
This option can be employed with both TCP and UDP sockets.
.IP
For TCP sockets, this option allows
.BR accept (2)
load distribution in a multi-threaded server to be improved by
using a distinct listener socket for each thread.
This provides improved load distribution as compared
to traditional techniques such as using a single
.BR accept (2)ing
thread that distributes connections,
or having multiple threads that compete to
.BR accept (2)
from the same socket.
.IP
For UDP sockets,
the use of this option can provide better distribution
of incoming datagrams to multiple processes (or threads) as compared
to the traditional technique of having multiple processes
compete to receive datagrams on the same socket.
.TP
.BR SO_RXQ_OVFL " (since Linux 2.6.33)"
.\" commit 3b885787ea4112eaa80945999ea0901bf742707f
Indicates that an unsigned 32-bit value ancillary message (cmsg)
should be attached to received skbs indicating
the number of packets dropped by the socket since its creation.
.TP
.B SO_SNDBUF
Sets or gets the maximum socket send buffer in bytes.
The kernel doubles this value (to allow space for bookkeeping overhead)
when it is set using
.\" Most (all?) other implementations do not do this -- MTK, Dec 05
.\" See also the comment to SO_RCVBUF (17 Jul 2012 LKML mail)
.BR setsockopt (2),
and this doubled value is returned by
.BR getsockopt (2).
The default value is set by the
.I /proc/sys/net/core/wmem_default
file, and the maximum allowed value is set by the
.I /proc/sys/net/core/wmem_max
file.
The minimum (doubled) value for this option is 2048.
.TP
.BR SO_SNDBUFFORCE " (since Linux 2.6.14)"
Using this socket option, a privileged
.RB ( CAP_NET_ADMIN )
process can perform the same task as
.BR SO_SNDBUF ,
but the
.I wmem_max
limit can be overridden.
.TP
.B SO_TIMESTAMP
Enable or disable the receiving of the
.B SO_TIMESTAMP
control message.
The timestamp control message is sent with level
.B SOL_SOCKET
and the
.I cmsg_data
field is a
.I "struct timeval"
indicating the
reception time of the last packet passed to the user in this call.
See
.BR cmsg (3)
for details on control messages.
.TP
.B SO_TYPE
Gets the socket type as an integer (e.g.,
.BR SOCK_STREAM ).
This socket option is read-only.
.TP
.BR SO_BUSY_POLL " (since Linux 3.11)"
Sets the approximate time in microseconds to busy poll on a blocking receive
when there is no data.
Increasing this value requires
.BR CAP_NET_ADMIN .
The default for this option is controlled by the
.I /proc/sys/net/core/busy_read
file.
.IP
The value in the
.I /proc/sys/net/core/busy_poll
file determines how long
.BR select (2)
and
.BR poll (2)
will busy poll when they operate on sockets with
.BR SO_BUSY_POLL
set and no events to report are found.
.IP
In both cases,
busy polling will only be done when the socket last received data
from a network device that supports this option.
.IP
While busy polling may improve latency of some applications,
care must be taken when using it since this will increase
both CPU utilization and power usage.
.SS Signals
When writing onto a connection-oriented socket that has been shut down
(by the local or the remote end),
.B SIGPIPE
is sent to the writing process and
.B EPIPE
is returned.
The signal is not sent when the write call
specified the
.B MSG_NOSIGNAL
flag.
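.PP
A minimal C sketch, using an
.B AF_UNIX
socket pair whose peer has already been closed (the function name
.I nosignal_demo
is hypothetical): passing
.B MSG_NOSIGNAL
to
.BR send (2)
turns what would otherwise be a
.B SIGPIPE
(which terminates the process by default) into a plain
.B EPIPE
error:

```c
#include <errno.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical demo: returns 1 if sending on a stream socket whose
   peer has been closed fails with EPIPE.  Because MSG_NOSIGNAL is
   given, no SIGPIPE is generated, so the process is not killed. */
int nosignal_demo(void)
{
    int sv[2];
    int r;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return 0;
    close(sv[1]);   /* shut the connection down from the remote end */

    r = send(sv[0], "x", 1, MSG_NOSIGNAL) == -1 && errno == EPIPE;
    close(sv[0]);
    return r;
}
```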
.PP
When requested with the
.B F_SETOWN
.BR fcntl (2)
operation or with the
.B FIOSETOWN
or
.B SIOCSPGRP
.BR ioctl (2),
.B SIGIO
is sent when an I/O event occurs,
provided that asynchronous I/O mode is enabled on the socket
(see
.B FIOASYNC
below).
It is possible to use
.BR poll (2)
or
.BR select (2)
in the signal handler to find out which socket the event occurred on.
An alternative (since Linux 2.2) is to select a real-time signal using the
.B F_SETSIG
.BR fcntl (2)
operation;
a handler for that signal that was established with
.B SA_SIGINFO
receives the file descriptor in the
.I si_fd
field of its
.I siginfo_t
argument.
See
.BR fcntl (2)
for more information.
.PP
Under some circumstances (e.g., multiple processes accessing a
single socket), the condition that caused the
.B SIGIO
may have already disappeared when the process reacts to the signal.
If this happens, the process should wait again, because Linux
will resend the signal later.
.\" .SS Ancillary messages
.SS /proc interfaces
The core socket networking parameters can be accessed
via files in the directory
.IR /proc/sys/net/core/ .
.TP
.I rmem_default
contains the default size, in bytes, of the socket receive buffer.
.TP
.I rmem_max
contains the maximum socket receive buffer size, in bytes, that a user may
set by using the
.B SO_RCVBUF
socket option.
.TP
.I wmem_default
contains the default size, in bytes, of the socket send buffer.
.TP
.I wmem_max
contains the maximum socket send buffer size, in bytes, that a user may
set by using the
.B SO_SNDBUF
socket option.
.TP
.IR message_cost " and " message_burst
configure the token bucket filter used to rate-limit warning messages
caused by external network events.
.TP
.I netdev_max_backlog
Maximum number of packets in the global input queue.
.TP
.I optmem_max
Maximum length of ancillary data and user control data, such as the iovecs,
per socket.
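.PP
A minimal C sketch for reading one of these parameters; the helper name
.I core_sysctl
is hypothetical, and it simply returns \-1 if the file cannot be read.
.PP
.in +4n
.EX
```c
#include <stdio.h>

/*
 * Hypothetical helper: read one integer parameter from
 * /proc/sys/net/core/, e.g. core_sysctl("rmem_max").
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
long core_sysctl(const char *name)
{
    char path[128];
    FILE *f;
    long val;

    if (snprintf(path, sizeof(path), "/proc/sys/net/core/%s", name)
        >= (int) sizeof(path))
        return -1;                      /* name too long */

    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &val) != 1)
        val = -1;
    fclose(f);
    return val;
}
```
.EE
.in
.PP
For example, a program might compare the result of
.I core_sysctl("rmem_max")
with the buffer size it intends to request, since a larger
.B SO_RCVBUF
request is capped at that value without an error.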
.\" netdev_fastroute is not documented because it is experimental
.SS Ioctls
These operations can be accessed using
.BR ioctl (2):
.PP
.in +4n
.EX
.IB error " = ioctl(" ip_socket ", " ioctl_type ", " &value_result ");"
.EE
.in
.TP
.B SIOCGSTAMP
Return a
.I struct timeval
with the receive timestamp of the last packet passed to the user.
This is useful for accurate round trip time measurements.
See
.BR setitimer (2)
for a description of
.IR "struct timeval" .
.\"
This ioctl should be used only if the socket option
.B SO_TIMESTAMP
is not set on the socket.
Otherwise, it returns the timestamp of the
last packet that was received while
.B SO_TIMESTAMP
was not set, or it fails if no such packet has been received
(i.e.,
.BR ioctl (2)
returns \-1 with
.I errno
set to
.BR ENOENT ).
.TP
.B SIOCSPGRP
Set the process or process group that is to receive
.B SIGIO
or
.B SIGURG
signals when I/O becomes possible or urgent data is available.
The argument is a pointer to a
.IR pid_t ;
a negative value specifies a process group.
For further details, see the description of
.B F_SETOWN
in
.BR fcntl (2).
.TP
.B FIOASYNC
Change the
.B O_ASYNC
flag to enable or disable asynchronous I/O mode of the socket.
Asynchronous I/O mode means that the
.B SIGIO
signal or the signal set with
.B F_SETSIG
is raised when a new I/O event occurs.
.IP
The argument is a pointer to an integer boolean flag.
(This operation is synonymous with the use of
.BR fcntl (2)
to set the
.B O_ASYNC
flag.)
.\"
.TP
.B SIOCGPGRP
Get the current process or process group that receives
.B SIGIO
or
.B SIGURG
signals, or 0 when none is set.
The argument is a pointer to a
.IR pid_t ;
a process group is returned as a negative value.
.PP
The following
.BR ioctl (2)
operations are synonyms for those above:
.TP
.B FIOGETOWN
The same as
.BR SIOCGPGRP .
.TP
.B FIOSETOWN
The same as
.BR SIOCSPGRP .
.SH VERSIONS
.B SO_BINDTODEVICE
was introduced in Linux 2.0.30.
.B SO_PASSCRED
was introduced in Linux 2.2.
The
.I /proc
interfaces were introduced in Linux 2.2.
.B SO_RCVTIMEO
and
.B SO_SNDTIMEO
are supported since Linux 2.3.41.
Earlier, timeouts were fixed at
a protocol-specific setting and could not be read or written.
.SH NOTES
Linux assumes that half of the send/receive buffer is used for internal
kernel structures; thus the values in the corresponding
.I /proc
files are twice what can be observed on the wire.
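.PP
The doubling can be observed from a program.
The following C sketch (the helper name
.I effective_rcvbuf
is hypothetical) requests a receive buffer with
.B SO_RCVBUF
and returns the size that
.BR getsockopt (2)
then reports.
.PP
.in +4n
.EX
```c
#include <sys/socket.h>

/*
 * Hypothetical helper: request a receive buffer of 'bytes' with
 * SO_RCVBUF and return the size that getsockopt() then reports,
 * or -1 on error.  Because the kernel reserves half of the buffer
 * for its own bookkeeping, it stores (and reports) double the
 * request, after capping the request at rmem_max and subject to a
 * small minimum.
 */
int effective_rcvbuf(int fd, int bytes)
{
    int val;
    socklen_t len = sizeof(val);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) == -1)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) == -1)
        return -1;
    return val;
}
```
.EE
.in
.PP
On a typical configuration, a request for 4096 bytes is reported back
as 8192.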
.PP
Linux allows port reuse with the
.B SO_REUSEADDR
option only when this option was set both in the previous program that
performed a
.BR bind (2)
to the port and in the program that wants to reuse the port.
This differs from some implementations (e.g., FreeBSD)
where only the later program needs to set the
.B SO_REUSEADDR
option.
Typically this difference is invisible, since, for example, a server
program is designed to always set this option.
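.PP
A minimal C sketch of a TCP server socket that sets
.B SO_REUSEADDR
before
.BR bind (2),
as described above; the helper name
.I reusable_listener
and the choice of the IPv4 loopback address are assumptions for
illustration.
.PP
.in +4n
.EX
```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP socket listening on the IPv4
 * loopback address and 'port' (host byte order; 0 lets the kernel
 * choose).  SO_REUSEADDR is set before bind(); on Linux a later
 * bind() to the same port benefits only if the earlier socket bound
 * to it set the option as well.  Returns the socket, or -1 on error.
 */
int reusable_listener(unsigned short port)
{
    struct sockaddr_in addr;
    int one = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) == -1 ||
        bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        listen(fd, 16) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```
.EE
.in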
.\" .SH AUTHORS
.\" This man page was written by Andi Kleen.
.SH SEE ALSO
.BR wireshark (1),
.BR bpf (2),
.BR connect (2),
.BR getsockopt (2),
.BR setsockopt (2),
.BR socket (2),
.BR pcap (3),
.BR address_families (7),
.BR capabilities (7),
.BR ddp (7),
.BR ip (7),
.BR packet (7),
.BR tcp (7),
.BR udp (7),
.BR unix (7),
.BR tcpdump (8)