1 '\" t
2 .\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
3 .\"
4 .\" %%%LICENSE_START(VERBATIM_ONE_PARA)
5 .\" Permission is granted to distribute possibly modified copies
6 .\" of this page provided the header is included verbatim,
7 .\" and in case of nontrivial modification author and date
8 .\" of the modification is added to the header.
9 .\" %%%LICENSE_END
10 .\"
11 .\" $Id: ip.7,v 1.19 2000/12/20 18:10:31 ak Exp $
12 .\"
13 .\" FIXME The following socket options are yet to be documented
14 .\"
15 .\" IP_XFRM_POLICY (2.5.48)
16 .\" Needs CAP_NET_ADMIN
17 .\"
18 .\" IP_IPSEC_POLICY (2.5.47)
19 .\" Needs CAP_NET_ADMIN
20 .\"
21 .\" IP_PASSSEC (2.6.17)
22 .\" Boolean
23 .\" commit 2c7946a7bf45ae86736ab3b43d0085e43947945c
24 .\" Author: Catherine Zhang <cxzhang@watson.ibm.com>
25 .\"
26 .\" IP_MINTTL (2.6.34)
27 .\" commit d218d11133d888f9745802146a50255a4781d37a
28 .\" Author: Stephen Hemminger <shemminger@vyatta.com>
29 .\"
30 .\" MCAST_JOIN_GROUP (2.4.22 / 2.6)
31 .\"
32 .\" MCAST_BLOCK_SOURCE (2.4.22 / 2.6)
33 .\"
34 .\" MCAST_UNBLOCK_SOURCE (2.4.22 / 2.6)
35 .\"
36 .\" MCAST_LEAVE_GROUP (2.4.22 / 2.6)
37 .\"
38 .\" MCAST_JOIN_SOURCE_GROUP (2.4.22 / 2.6)
39 .\"
40 .\" MCAST_LEAVE_SOURCE_GROUP (2.4.22 / 2.6)
41 .\"
42 .\" MCAST_MSFILTER (2.4.22 / 2.6)
43 .\"
44 .\" IP_UNICAST_IF (3.4)
45 .\" commit 76e21053b5bf33a07c76f99d27a74238310e3c71
46 .\" Author: Erich E. Hoover <ehoover@mines.edu>
47 .\"
48 .TH IP 7 2019-03-06 "Linux" "Linux Programmer's Manual"
49 .SH NAME
50 ip \- Linux IPv4 protocol implementation
51 .SH SYNOPSIS
52 .B #include <sys/socket.h>
53 .br
54 .\" .B #include <net/netinet.h> -- does not exist anymore
55 .\" .B #include <linux/errqueue.h> -- never include <linux/foo.h>
56 .B #include <netinet/in.h>
57 .br
58 .B #include <netinet/ip.h> \fR/* superset of previous */
59 .PP
60 .IB tcp_socket " = socket(AF_INET, SOCK_STREAM, 0);"
61 .br
62 .IB udp_socket " = socket(AF_INET, SOCK_DGRAM, 0);"
63 .br
64 .IB raw_socket " = socket(AF_INET, SOCK_RAW, " protocol ");"
65 .SH DESCRIPTION
66 Linux implements the Internet Protocol, version 4,
67 described in RFC\ 791 and RFC\ 1122.
68 .B ip
69 contains a level 2 multicasting implementation conforming to RFC\ 1112.
70 It also contains an IP router including a packet filter.
71 .PP
72 The programming interface is BSD-sockets compatible.
73 For more information on sockets, see
74 .BR socket (7).
75 .PP
76 An IP socket is created using
77 .BR socket (2):
78 .PP
79 socket(AF_INET, socket_type, protocol);
80 .PP
81 Valid socket types are
82 .B SOCK_STREAM
83 to open a
84 .BR tcp (7)
85 socket,
86 .B SOCK_DGRAM
87 to open a
88 .BR udp (7)
89 socket, or
90 .B SOCK_RAW
91 to open a
92 .BR raw (7)
93 socket to access the IP protocol directly.
94 .I protocol
95 is the IP protocol in the IP header to be received or sent.
96 The only valid values for
97 .I protocol
98 are 0 and
99 .B IPPROTO_TCP
100 for TCP sockets, and 0 and
101 .B IPPROTO_UDP
102 for UDP sockets.
103 For
104 .B SOCK_RAW
105 you may specify a valid IANA IP protocol defined in
106 RFC\ 1700 assigned numbers.
107 .PP
108 When a process wants to receive new incoming packets or connections, it
109 should bind a socket to a local interface address using
110 .BR bind (2).
111 In this case, only one IP socket may be bound to any given local
112 (address, port) pair.
113 When
114 .B INADDR_ANY
115 is specified in the bind call, the socket will be bound to
116 .I all
117 local interfaces.
118 When
119 .BR listen (2)
120 is called on an unbound socket, the socket is automatically bound
121 to a random free port with the local address set to
122 .BR INADDR_ANY .
123 When
124 .BR connect (2)
125 is called on an unbound socket, the socket is automatically bound
126 to a random free port or to a usable shared port with the local address
127 set to
128 .BR INADDR_ANY .
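.PP
The binding rules above can be observed with a small sketch.
The helper name
.I bind_any_ephemeral
is hypothetical; it binds a UDP socket to
.B INADDR_ANY
with port 0 and uses
.BR getsockname (2)
to read back the port the kernel picked.
It is an illustration under those assumptions, not a recommended pattern for servers.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a new UDP socket to INADDR_ANY, port 0,
   and report the port chosen by the kernel (host byte order).
   Returns the socket descriptor, or -1 on error. */
int bind_any_ephemeral(uint16_t *port_out)
{
    struct sockaddr_in sa;
    socklen_t len = sizeof(sa);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_addr.s_addr = htonl(INADDR_ANY);  /* all local interfaces */
    sa.sin_port = htons(0);                  /* let the kernel choose */

    if (bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0 ||
        getsockname(fd, (struct sockaddr *)&sa, &len) < 0) {
        close(fd);
        return -1;
    }

    *port_out = ntohs(sa.sin_port);
    return fd;
}
```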
129 .PP
130 A TCP local socket address that has been bound is unavailable for
131 some time after closing, unless the
132 .B SO_REUSEADDR
133 flag has been set.
134 Care should be taken when using this flag as it makes TCP less reliable.
135 .SS Address format
136 An IP socket address is defined as a combination of an IP interface
137 address and a 16-bit port number.
138 The basic IP protocol does not supply port numbers; they
139 are implemented by higher-level protocols like
140 .BR udp (7)
141 and
142 .BR tcp (7).
143 On raw sockets
144 .I sin_port
145 is set to the IP protocol.
146 .PP
147 .in +4n
148 .EX
149 struct sockaddr_in {
150 sa_family_t sin_family; /* address family: AF_INET */
151 in_port_t sin_port; /* port in network byte order */
152 struct in_addr sin_addr; /* internet address */
153 };
154
155 /* Internet address. */
156 struct in_addr {
157 uint32_t s_addr; /* address in network byte order */
158 };
159 .EE
160 .in
161 .PP
162 .I sin_family
163 is always set to
164 .BR AF_INET .
165 This is required; in Linux 2.2 most networking functions return
166 .B EINVAL
167 when this setting is missing.
168 .I sin_port
169 contains the port in network byte order.
170 The port numbers below 1024 are called
171 .IR "privileged ports"
172 (or sometimes:
173 .IR "reserved ports" ).
174 Only a privileged process
175 (on Linux: a process that has the
176 .B CAP_NET_BIND_SERVICE
177 capability in the user namespace governing its network namespace) may
178 .BR bind (2)
179 to these ports.
180 Note that the raw IPv4 protocol as such has no concept of a
181 port; ports are implemented only by higher-level protocols like
182 .BR tcp (7)
183 and
184 .BR udp (7).
185 .PP
186 .I sin_addr
187 is the IP host address.
188 The
189 .I s_addr
190 member of
191 .I struct in_addr
192 contains the host interface address in network byte order.
193 .I in_addr
194 should be assigned one of the
195 .BR INADDR_*
196 values
197 (e.g.,
198 .BR INADDR_LOOPBACK )
199 using
200 .BR htonl (3)
201 or set using the
202 .BR inet_aton (3),
203 .BR inet_addr (3),
204 .BR inet_makeaddr (3)
205 library functions or directly with the name resolver (see
206 .BR gethostbyname (3)).
207 .PP
208 IPv4 addresses are divided into unicast, broadcast,
209 and multicast addresses.
210 Unicast addresses specify a single interface of a host,
211 broadcast addresses specify all hosts on a network, and multicast
212 addresses address all hosts in a multicast group.
213 Datagrams to broadcast addresses can be sent or received only when the
214 .B SO_BROADCAST
215 socket flag is set.
216 In the current implementation, connection-oriented sockets are allowed
217 to use only unicast addresses.
218 .\" Leave a loophole for XTP @)
219 .PP
220 Note that the address and the port are always stored in
221 network byte order.
222 In particular, this means that you need to call
223 .BR htons (3)
224 on the number that is assigned to a port.
225 All address/port manipulation
226 functions in the standard library work in network byte order.
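.PP
As a minimal sketch of the byte-order rules, the following hypothetical helper
.I make_sockaddr_in
fills a
.I sockaddr_in
from a dotted-quad string and a host-order port.
It uses
.BR inet_pton (3),
which this page does not otherwise mention, as a stand-in for
.BR inet_aton (3).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build an AF_INET socket address.
   `text` is a dotted-quad string, `port` is in host byte order.
   Returns 0 on success, -1 if `text` is not a valid IPv4 address. */
int make_sockaddr_in(const char *text, uint16_t port, struct sockaddr_in *sa)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;     /* always required */
    sa->sin_port = htons(port);   /* port stored in network byte order */

    /* inet_pton() stores the address in network byte order already. */
    if (inet_pton(AF_INET, text, &sa->sin_addr) != 1)
        return -1;
    return 0;
}
```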
227 .PP
228 There are several special addresses:
229 .B INADDR_LOOPBACK
230 (127.0.0.1)
231 always refers to the local host via the loopback device;
232 .B INADDR_ANY
233 (0.0.0.0)
234 means any address for binding;
235 .B INADDR_BROADCAST
236 (255.255.255.255)
237 means any host and has the same effect on bind as
238 .B INADDR_ANY
239 for historical reasons.
240 .SS Socket options
241 IP supports some protocol-specific socket options that can be set with
242 .BR setsockopt (2)
243 and read with
244 .BR getsockopt (2).
245 The socket option level for IP is
246 .BR IPPROTO_IP .
247 .\" or SOL_IP on Linux
248 A boolean integer flag is zero when it is false, otherwise true.
249 .PP
250 When an invalid socket option is specified,
251 .BR getsockopt (2)
252 and
253 .BR setsockopt (2)
254 fail with the error
255 .BR ENOPROTOOPT .
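.PP
The general calling pattern can be sketched with an integer option such as
.BR IP_TTL
(described below).
The helper name
.I set_and_get_ttl
is hypothetical; it sets the option at level
.B IPPROTO_IP
and reads it back.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: set IP_TTL on `fd`, then read it back.
   Returns the value reported by the kernel, or -1 on error. */
int set_and_get_ttl(int fd, int ttl)
{
    int value = 0;
    socklen_t len = sizeof(value);

    if (setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) < 0)
        return -1;
    if (getsockopt(fd, IPPROTO_IP, IP_TTL, &value, &len) < 0)
        return -1;
    return value;
}
```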
256 .TP
257 .BR IP_ADD_MEMBERSHIP " (since Linux 1.2)"
258 Join a multicast group.
259 Argument is an
260 .I ip_mreqn
261 structure.
262 .PP
263 .in +4n
264 .EX
265 struct ip_mreqn {
266 struct in_addr imr_multiaddr; /* IP multicast group
267 address */
268 struct in_addr imr_address; /* IP address of local
269 interface */
270 int imr_ifindex; /* interface index */
271 };
272 .EE
273 .in
274 .PP
275 .I imr_multiaddr
276 contains the address of the multicast group the application
277 wants to join or leave.
278 It must be a valid multicast address
279 .\" (i.e., within the 224.0.0.0-239.255.255.255 range)
280 (or
281 .BR setsockopt (2)
282 fails with the error
283 .BR EINVAL ).
284 .I imr_address
285 is the address of the local interface with which the system
286 should join the multicast group; if it is equal to
287 .BR INADDR_ANY ,
288 an appropriate interface is chosen by the system.
289 .I imr_ifindex
290 is the interface index of the interface that should join/leave the
291 .I imr_multiaddr
292 group, or 0 to indicate any interface.
293 .IP
294 The
295 .I ip_mreqn
296 structure is available only since Linux 2.2.
297 For compatibility, the old
298 .I ip_mreq
299 structure (present since Linux 1.2) is still supported;
300 it differs from
301 .I ip_mreqn
302 only by not including the
303 .I imr_ifindex
304 field.
305 (The kernel determines which structure is being passed based
306 on the size passed in
307 .IR optlen .)
308 .IP
309 .B IP_ADD_MEMBERSHIP
310 is valid only for
311 .BR setsockopt (2).
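.IP
A minimal sketch of joining a group with
.IR ip_mreqn
follows.
The helper names
.I fill_mreqn
and
.I join_group
are hypothetical.
The multicast check is done in user space here only to make the
.B EINVAL
case visible before calling the kernel.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes struct ip_mreqn on glibc */
#endif
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *mreq for `group` (dotted quad) on the
   interface with index `ifindex` (0 means any interface).
   Returns 0 on success, -1 if `group` is not a multicast address. */
int fill_mreqn(const char *group, int ifindex, struct ip_mreqn *mreq)
{
    memset(mreq, 0, sizeof(*mreq));
    if (inet_pton(AF_INET, group, &mreq->imr_multiaddr) != 1)
        return -1;
    /* IN_MULTICAST() expects the address in host byte order. */
    if (!IN_MULTICAST(ntohl(mreq->imr_multiaddr.s_addr)))
        return -1;
    mreq->imr_address.s_addr = htonl(INADDR_ANY);  /* system picks */
    mreq->imr_ifindex = ifindex;
    return 0;
}

/* Hypothetical helper: join `group` on socket `fd`.
   Returns 0 on success, -1 on error. */
int join_group(int fd, const char *group, int ifindex)
{
    struct ip_mreqn mreq;

    if (fill_mreqn(group, ifindex, &mreq) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}
```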
312 .\"
313 .TP
314 .BR IP_ADD_SOURCE_MEMBERSHIP " (since Linux 2.4.22 / 2.5.68)"
315 Join a multicast group and allow receiving data only
316 from a specified source.
317 Argument is an
318 .I ip_mreq_source
319 structure.
320 .PP
321 .in +4n
322 .EX
323 struct ip_mreq_source {
324 struct in_addr imr_multiaddr; /* IP multicast group
325 address */
326 struct in_addr imr_interface; /* IP address of local
327 interface */
328 struct in_addr imr_sourceaddr; /* IP address of
329 multicast source */
330 };
331 .EE
332 .in
333 .PP
334 The
335 .I ip_mreq_source
336 structure is similar to
337 .I ip_mreqn
338 described under
339 .BR IP_ADD_MEMBERSHIP .
340 The
341 .I imr_multiaddr
342 field contains the address of the multicast group the application
343 wants to join or leave.
344 The
345 .I imr_interface
346 field is the address of the local interface with which
347 the system should join the multicast group.
348 Finally, the
349 .I imr_sourceaddr
350 field contains the address of the source the
351 application wants to receive data from.
352 .IP
353 This option can be used multiple times to allow
354 receiving data from more than one source.
355 .TP
356 .BR IP_BIND_ADDRESS_NO_PORT " (since Linux 4.2)"
357 .\" commit 90c337da1524863838658078ec34241f45d8394d
358 Inform the kernel to not reserve an ephemeral port when using
359 .BR bind (2)
360 with a port number of 0.
361 The port will later be automatically chosen at
362 .BR connect (2)
363 time,
364 in a way that allows sharing a source port as long as the 4-tuple is unique.
365 .TP
366 .BR IP_BLOCK_SOURCE " (since Linux 2.4.22 / 2.5.68)"
367 Stop receiving multicast data from a specific source in a given group.
368 This is valid only after the application has subscribed
369 to the multicast group using either
370 .BR IP_ADD_MEMBERSHIP
371 or
372 .BR IP_ADD_SOURCE_MEMBERSHIP .
373 .IP
374 Argument is an
375 .I ip_mreq_source
376 structure as described under
377 .BR IP_ADD_SOURCE_MEMBERSHIP .
378 .TP
379 .BR IP_DROP_MEMBERSHIP " (since Linux 1.2)"
380 Leave a multicast group.
381 Argument is an
382 .I ip_mreqn
383 or
384 .I ip_mreq
385 structure similar to
386 .BR IP_ADD_MEMBERSHIP .
387 .TP
388 .BR IP_DROP_SOURCE_MEMBERSHIP " (since Linux 2.4.22 / 2.5.68)"
389 Leave a source-specific group\(emthat is, stop receiving data from
390 a given multicast group that comes from a given source.
391 If the application has subscribed to multiple sources within
392 the same group, data from the remaining sources will still be delivered.
393 To stop receiving data from all sources at once, use
394 .BR IP_DROP_MEMBERSHIP .
395 .IP
396 Argument is an
397 .I ip_mreq_source
398 structure as described under
399 .BR IP_ADD_SOURCE_MEMBERSHIP .
400 .TP
401 .BR IP_FREEBIND " (since Linux 2.4)"
402 .\" Precisely: 2.4.0-test10
403 If enabled, this boolean option allows binding to an IP address
404 that is nonlocal or does not (yet) exist.
405 This permits listening on a socket,
406 without requiring the underlying network interface or the
407 specified dynamic IP address to be up at the time that
408 the application is trying to bind to it.
409 This option is the per-socket equivalent of the
410 .IR ip_nonlocal_bind
411 .I /proc
412 interface described below.
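.IP
As an illustration, the hypothetical helper
.I bind_nonlocal
below enables
.B IP_FREEBIND
on a TCP socket and then binds it to an address that need not be configured on the host.
The address used in any call is the caller's choice; nothing here is prescribed by this page.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, enable IP_FREEBIND, and
   bind it to `addr` (dotted quad) and `port` (host byte order).
   Returns the socket descriptor, or -1 on error. */
int bind_nonlocal(const char *addr, uint16_t port)
{
    struct sockaddr_in sa;
    int on = 1;
    int fd;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, addr, &sa.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* The option must be set before bind(2) to have any effect. */
    if (setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &on, sizeof(on)) < 0 ||
        bind(fd, (struct sockaddr *)&sa, sizeof(sa)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```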
413 .TP
414 .BR IP_HDRINCL " (since Linux 2.0)"
415 If enabled,
416 the user supplies an IP header in front of the user data.
417 Valid only for
418 .B SOCK_RAW
419 sockets; see
420 .BR raw (7)
421 for more information.
422 When this flag is enabled, the values set by
423 .BR IP_OPTIONS ,
424 .BR IP_TTL ,
425 and
426 .B IP_TOS
427 are ignored.
428 .TP
429 .BR IP_MSFILTER " (since Linux 2.4.22 / 2.5.68)"
430 This option provides access to the advanced full-state filtering API.
431 Argument is an
432 .I ip_msfilter
433 structure.
434 .PP
435 .in +4n
436 .EX
437 struct ip_msfilter {
438 struct in_addr imsf_multiaddr; /* IP multicast group
439 address */
440 struct in_addr imsf_interface; /* IP address of local
441 interface */
442 uint32_t imsf_fmode; /* Filter-mode */
443
444 uint32_t imsf_numsrc; /* Number of sources in
445 the following array */
446 struct in_addr imsf_slist[1]; /* Array of source
447 addresses */
448 };
449 .EE
450 .in
451 .PP
452 There are two macros,
453 .BR MCAST_INCLUDE
454 and
455 .BR MCAST_EXCLUDE ,
456 which can be used to specify the filtering mode.
457 Additionally, the
458 .BR IP_MSFILTER_SIZE (n)
459 macro exists to determine how much memory is needed to store an
460 .I ip_msfilter
461 structure with
462 .I n
463 sources in the source list.
464 .IP
465 For the full description of multicast source filtering
466 refer to RFC 3376.
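.IP
A sketch of sizing and filling an
.I ip_msfilter
with
.BR IP_MSFILTER_SIZE ()
follows.
The helper name
.I make_include_filter
is hypothetical, and the choice of
.B MCAST_INCLUDE
is only an example.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes struct ip_msfilter on glibc */
#endif
#include <netinet/in.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: allocate an include-mode filter for `group` on
   the interface `iface`, listing `n` source addresses.
   Returns a malloc'ed structure (caller frees), or NULL on error. */
struct ip_msfilter *make_include_filter(struct in_addr group,
                                        struct in_addr iface,
                                        const struct in_addr *sources,
                                        uint32_t n)
{
    struct ip_msfilter *f = calloc(1, IP_MSFILTER_SIZE(n));

    if (f == NULL)
        return NULL;

    f->imsf_multiaddr = group;
    f->imsf_interface = iface;
    f->imsf_fmode = MCAST_INCLUDE;
    f->imsf_numsrc = n;

    /* Copy the sources into the trailing array by offset, since the
       declared array length is only 1. */
    if (n > 0)
        memcpy((char *)f + offsetof(struct ip_msfilter, imsf_slist),
               sources, n * sizeof(struct in_addr));
    return f;
}
```

The result would then be passed to
.BR setsockopt (2)
with option
.B IP_MSFILTER
and a length of
.IR IP_MSFILTER_SIZE(n) ,
typically on a socket that has already joined the group.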
467 .TP
468 .BR IP_MTU " (since Linux 2.2)"
469 .\" Precisely: 2.1.124
470 Retrieve the current known path MTU of the current socket.
471 Returns an integer.
472 .IP
473 .B IP_MTU
474 is valid only for
475 .BR getsockopt (2)
476 and can be employed only when the socket has been connected.
477 .TP
478 .BR IP_MTU_DISCOVER " (since Linux 2.2)"
479 .\" Precisely: 2.1.124
480 Set or receive the Path MTU Discovery setting for a socket.
481 When enabled, Linux will perform Path MTU Discovery
482 as defined in RFC\ 1191 on
483 .B SOCK_STREAM
484 sockets.
485 For
486 .RB non- SOCK_STREAM
487 sockets,
488 .B IP_PMTUDISC_DO
489 forces the don't-fragment flag to be set on all outgoing packets.
490 It is the user's responsibility to packetize the data
491 in MTU-sized chunks and to do the retransmits if necessary.
492 The kernel will reject (with
493 .BR EMSGSIZE )
494 datagrams that are bigger than the known path MTU.
495 .B IP_PMTUDISC_WANT
496 will fragment a datagram if needed according to the path MTU,
497 or will set the don't-fragment flag otherwise.
498 .IP
499 The system-wide default can be toggled between
500 .B IP_PMTUDISC_WANT
501 and
502 .B IP_PMTUDISC_DONT
503 by writing (respectively, zero and nonzero values) to the
504 .I /proc/sys/net/ipv4/ip_no_pmtu_disc
505 file.
506 .TS
507 tab(:);
508 c l
509 l l.
510 Path MTU discovery value:Meaning
511 IP_PMTUDISC_WANT:Use per-route settings.
512 IP_PMTUDISC_DONT:Never do Path MTU Discovery.
513 IP_PMTUDISC_DO:Always do Path MTU Discovery.
514 IP_PMTUDISC_PROBE:Set DF but ignore Path MTU.
515 .TE
516 .sp 1
517 When PMTU discovery is enabled, the kernel automatically keeps track of
518 the path MTU per destination host.
519 When it is connected to a specific peer with
520 .BR connect (2),
521 the currently known path MTU can be retrieved conveniently using the
522 .B IP_MTU
523 socket option (e.g., after an
524 .B EMSGSIZE
525 error occurred).
526 The path MTU may change over time.
527 For connectionless sockets with many destinations,
528 the new MTU for a given destination can also be accessed using the
529 error queue (see
530 .BR IP_RECVERR ).
531 A new error will be queued for every incoming MTU update.
532 .IP
533 While MTU discovery is in progress, initial packets from datagram sockets
534 may be dropped.
535 Applications using UDP should be aware of this and not
536 take it into account for their packet retransmit strategy.
537 .IP
538 To bootstrap the path MTU discovery process on unconnected sockets, it
539 is possible to start with a big datagram size
540 (up to 64 kilobytes, headers included) and let it shrink by updates of the path MTU.
541 .IP
542 To get an initial estimate of the
543 path MTU, connect a datagram socket to the destination address using
544 .BR connect (2)
545 and retrieve the MTU by calling
546 .BR getsockopt (2)
547 with the
548 .B IP_MTU
549 option.
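.IP
The estimate described above can be sketched as follows.
The helper name
.I initial_path_mtu
is hypothetical; it connects a throwaway datagram socket and reads
.BR IP_MTU .
No packet is sent by
.BR connect (2)
on a datagram socket, so the value reflects only what the kernel currently knows.

```c
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return the currently known path MTU toward
   `dst`, or -1 on error. */
int initial_path_mtu(const struct sockaddr_in *dst)
{
    int mtu = 0;
    socklen_t len = sizeof(mtu);
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;

    /* IP_MTU is valid only on a connected socket. */
    if (connect(fd, (const struct sockaddr *)dst, sizeof(*dst)) < 0 ||
        getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
        mtu = -1;

    close(fd);
    return mtu;
}
```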
550 .IP
551 It is possible to implement RFC 4821 MTU probing with
552 .B SOCK_DGRAM
553 or
554 .B SOCK_RAW
555 sockets by setting a value of
556 .BR IP_PMTUDISC_PROBE
557 (available since Linux 2.6.22).
558 This is also particularly useful for diagnostic tools such as
559 .BR tracepath (8)
560 that wish to deliberately send probe packets larger than
561 the observed Path MTU.
562 .TP
563 .BR IP_MULTICAST_ALL " (since Linux 2.6.31)"
564 This option can be used to modify the delivery policy of multicast messages
565 to sockets bound to the wildcard
566 .B INADDR_ANY
567 address.
568 The argument is a boolean integer (defaults to 1).
569 If set to 1,
570 the socket will receive messages from all the groups that have been joined
571 globally on the whole system.
572 Otherwise, it will deliver messages only from
573 the groups that have been explicitly joined (for example via the
574 .B IP_ADD_MEMBERSHIP
575 option) on this particular socket.
576 .TP
577 .BR IP_MULTICAST_IF " (since Linux 1.2)"
578 Set the local device for a multicast socket.
579 The argument for
580 .BR setsockopt (2)
581 is an
582 .I ip_mreqn
583 or
584 .\" net: IP_MULTICAST_IF setsockopt now recognizes struct mreq
585 .\" Commit: 3a084ddb4bf299a6e898a9a07c89f3917f0713f7
586 (since Linux 3.5)
587 .I ip_mreq
588 structure similar to
589 .BR IP_ADD_MEMBERSHIP ,
590 or an
591 .I in_addr
592 structure.
593 (The kernel determines which structure is being passed based
594 on the size passed in
595 .IR optlen .)
596 For
597 .BR getsockopt (2),
598 the argument is an
599 .I in_addr
600 structure.
601 .TP
602 .BR IP_MULTICAST_LOOP " (since Linux 1.2)"
603 Set or read a boolean integer argument that determines whether
604 sent multicast packets should be looped back to the local sockets.
605 .TP
606 .BR IP_MULTICAST_TTL " (since Linux 1.2)"
607 Set or read the time-to-live value of outgoing multicast packets for this
608 socket.
609 It is very important for multicast packets to set the smallest TTL possible.
610 The default is 1 which means that multicast packets don't leave the local
611 network unless the user program explicitly requests it.
612 Argument is an integer.
613 .TP
614 .BR IP_NODEFRAG " (since Linux 2.6.36)"
615 If enabled (argument is nonzero),
616 the reassembly of outgoing packets is disabled in the netfilter layer.
617 The argument is an integer.
618 .IP
619 This option is valid only for
620 .B SOCK_RAW
621 sockets.
622 .TP
623 .BR IP_OPTIONS " (since Linux 2.0)"
624 .\" Precisely: 1.3.30
625 Set or get the IP options to be sent with every packet from this socket.
626 The arguments are a pointer to a memory buffer containing the options
627 and the option length.
628 The
629 .BR setsockopt (2)
630 call sets the IP options associated with a socket.
631 The maximum option size for IPv4 is 40 bytes.
632 See RFC\ 791 for the allowed options.
633 When the initial connection request packet for a
634 .B SOCK_STREAM
635 socket contains IP options, the IP options will be set automatically
636 to the options from the initial packet with routing headers reversed.
637 Incoming packets are not allowed to change options after the connection
638 is established.
639 The processing of all incoming source routing options
640 is disabled by default and can be enabled by using the
641 .I accept_source_route
642 .I /proc
643 interface.
644 Other options like timestamps are still handled.
645 For datagram sockets, IP options can be set only by the local user.
646 Calling
647 .BR getsockopt (2)
648 with
649 .B IP_OPTIONS
650 puts the current IP options used for sending into the supplied buffer.
651 .TP
652 .BR IP_PKTINFO " (since Linux 2.2)"
653 .\" Precisely: 2.1.68
654 Pass an
655 .B IP_PKTINFO
656 ancillary message that contains a
657 .I in_pktinfo
658 structure that supplies some information about the incoming packet.
659 This works only for datagram-oriented sockets.
660 The argument is a flag that tells the socket whether the
661 .B IP_PKTINFO
662 message should be passed or not.
663 The message itself can be sent/retrieved only
664 as a control message with a packet using
665 .BR recvmsg (2)
666 or
667 .BR sendmsg (2).
668 .IP
669 .in +4n
670 .EX
671 struct in_pktinfo {
672 unsigned int ipi_ifindex; /* Interface index */
673 struct in_addr ipi_spec_dst; /* Local address */
674 struct in_addr ipi_addr; /* Header Destination
675 address */
676 };
677 .EE
678 .in
679 .IP
680 .I ipi_ifindex
681 is the unique index of the interface the packet was received on.
682 .I ipi_spec_dst
683 is the local address of the packet and
684 .I ipi_addr
685 is the destination address in the packet header.
686 If
687 .B IP_PKTINFO
688 is passed to
689 .BR sendmsg (2)
690 and
691 .\" This field is grossly misnamed
692 .I ipi_spec_dst
693 is not zero, then it is used as the local source address for the routing
694 table lookup and for setting up IP source route options.
695 When
696 .I ipi_ifindex
697 is not zero, the primary local address of the interface specified by the
698 index overwrites
699 .I ipi_spec_dst
700 for the routing table lookup.
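.IP
A sketch of the receive side follows.
The helper name
.I recv_with_pktinfo
is hypothetical, and it assumes the caller has already enabled
.B IP_PKTINFO
on the socket with
.BR setsockopt (2).

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* exposes struct in_pktinfo on glibc */
#endif
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: receive one datagram into `buf` and copy the
   accompanying IP_PKTINFO data into *info.
   Returns the number of bytes received, or -1 on error or if no
   IP_PKTINFO control message was attached. */
ssize_t recv_with_pktinfo(int fd, void *buf, size_t len,
                          struct in_pktinfo *info)
{
    union {
        char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;   /* forces suitable alignment */
    } control;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t n;
    int found = 0;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == IPPROTO_IP &&
            cmsg->cmsg_type == IP_PKTINFO) {
            memcpy(info, CMSG_DATA(cmsg), sizeof(*info));
            found = 1;
        }
    }
    return found ? n : -1;
}
```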
701 .TP
702 .BR IP_RECVERR " (since Linux 2.2)"
703 .\" Precisely: 2.1.15
704 Enable extended reliable error message passing.
705 When enabled on a datagram socket, all
706 generated errors will be queued in a per-socket error queue.
707 When the user receives an error from a socket operation,
708 the errors can be received by calling
709 .BR recvmsg (2)
710 with the
711 .B MSG_ERRQUEUE
712 flag set.
713 The
714 .I sock_extended_err
715 structure describing the error will be passed in an ancillary message with
716 the type
717 .B IP_RECVERR
718 and the level
719 .BR IPPROTO_IP .
720 .\" or SOL_IP on Linux
721 This is useful for reliable error handling on unconnected sockets.
722 The received data portion of the error queue contains the error packet.
723 .IP
724 The
725 .B IP_RECVERR
726 control message contains a
727 .I sock_extended_err
728 structure:
729 .IP
730 .in +4n
731 .EX
732 #define SO_EE_ORIGIN_NONE 0
733 #define SO_EE_ORIGIN_LOCAL 1
734 #define SO_EE_ORIGIN_ICMP 2
735 #define SO_EE_ORIGIN_ICMP6 3
736
737 struct sock_extended_err {
738 uint32_t ee_errno; /* error number */
739 uint8_t ee_origin; /* where the error originated */
740 uint8_t ee_type; /* type */
741 uint8_t ee_code; /* code */
742 uint8_t ee_pad;
743 uint32_t ee_info; /* additional information */
744 uint32_t ee_data; /* other data */
745 /* More data may follow */
746 };
747
748 struct sockaddr *SO_EE_OFFENDER(struct sock_extended_err *);
749 .EE
750 .in
751 .IP
752 .I ee_errno
753 contains the
754 .I errno
755 number of the queued error.
756 .I ee_origin
757 is the origin code of where the error originated.
758 The other fields are protocol-specific.
759 The macro
760 .B SO_EE_OFFENDER
761 returns a pointer to the address of the network object
762 where the error originated from given a pointer to the ancillary message.
763 If this address is not known, the
764 .I sa_family
765 member of the
766 .I sockaddr
767 contains
768 .B AF_UNSPEC
769 and the other fields of the
770 .I sockaddr
771 are undefined.
772 .IP
773 IP uses the
774 .I sock_extended_err
775 structure as follows:
776 .I ee_origin
777 is set to
778 .B SO_EE_ORIGIN_ICMP
779 for errors received as an ICMP packet, or
780 .B SO_EE_ORIGIN_LOCAL
781 for locally generated errors.
782 Unknown values should be ignored.
783 .I ee_type
784 and
785 .I ee_code
786 are set from the type and code fields of the ICMP header.
787 .I ee_info
788 contains the discovered MTU for
789 .B EMSGSIZE
790 errors.
791 The message also contains the
792 .I sockaddr_in
793 of the node that caused the error, which can be accessed with the
794 .B SO_EE_OFFENDER
795 macro.
796 The
797 .I sin_family
798 field of the
799 .B SO_EE_OFFENDER
800 address is
801 .B AF_UNSPEC
802 when the source was unknown.
803 When the error originated from the network, all IP options
804 .RB ( IP_OPTIONS ", " IP_TTL ", "
805 etc.) enabled on the socket and contained in the
806 error packet are passed as control messages.
807 The payload of the packet causing the error is returned as normal payload.
808 .\" FIXME . Is it a good idea to document that? It is a dubious feature.
809 .\" On
810 .\" .B SOCK_STREAM
811 .\" sockets,
812 .\" .B IP_RECVERR
813 .\" has slightly different semantics. Instead of
814 .\" saving the errors for the next timeout, it passes all incoming
815 .\" errors immediately to the user.
816 .\" This might be useful for very short-lived TCP connections which
817 .\" need fast error handling. Use this option with care:
818 .\" it makes TCP unreliable
819 .\" by not allowing it to recover properly from routing
820 .\" shifts and other normal
821 .\" conditions and breaks the protocol specification.
822 Note that TCP has no error queue;
823 .B MSG_ERRQUEUE
824 is not permitted on
825 .B SOCK_STREAM
826 sockets.
827 .B IP_RECVERR
828 is valid for TCP, but all errors are returned by socket function return or
829 .B SO_ERROR
830 only.
831 .IP
832 For raw sockets,
833 .B IP_RECVERR
834 enables passing of all received ICMP errors to the
835 application; otherwise, errors are reported only on connected sockets.
836 .IP
837 It sets or retrieves an integer boolean flag.
838 .B IP_RECVERR
839 defaults to off.
840 .TP
841 .BR IP_RECVOPTS " (since Linux 2.2)"
842 .\" Precisely: 2.1.15
843 Pass all incoming IP options to the user in an
844 .B IP_OPTIONS
845 control message.
846 The routing header and other options are already filled in
847 for the local host.
848 Not supported for
849 .B SOCK_STREAM
850 sockets.
851 .TP
852 .BR IP_RECVORIGDSTADDR " (since Linux 2.6.29)"
853 .\" commit e8b2dfe9b4501ed0047459b2756ba26e5a940a69
854 This boolean option enables the
855 .B IP_ORIGDSTADDR
856 ancillary message in
857 .BR recvmsg (2),
858 in which the kernel returns the original destination address
859 of the datagram being received.
860 The ancillary message contains a
861 .IR "struct sockaddr_in" .
862 .TP
863 .BR IP_RECVTOS " (since Linux 2.2)"
864 .\" Precisely: 2.1.68
865 If enabled, the
866 .B IP_TOS
867 ancillary message is passed with incoming packets.
868 It contains a byte which specifies the Type of Service/Precedence
869 field of the packet header.
870 Expects a boolean integer flag.
871 .TP
872 .BR IP_RECVTTL " (since Linux 2.2)"
873 .\" Precisely: 2.1.68
874 When this flag is set, pass an
875 .B IP_TTL
876 control message with the time-to-live
877 field of the received packet as a 32-bit integer.
878 Not supported for
879 .B SOCK_STREAM
880 sockets.
881 .TP
882 .BR IP_RETOPTS " (since Linux 2.2)"
883 .\" Precisely: 2.1.15
884 Identical to
885 .BR IP_RECVOPTS ,
886 but returns raw unprocessed options with timestamp and route record
887 options not filled in for this hop.
888 .TP
889 .BR IP_ROUTER_ALERT " (since Linux 2.2)"
890 .\" Precisely: 2.1.68
891 Pass all to-be forwarded packets with the
892 IP Router Alert option set to this socket.
893 Valid only for raw sockets.
894 This is useful, for instance, for user-space RSVP daemons.
895 The tapped packets are not forwarded by the kernel; it is
896 the user's responsibility to send them out again.
897 Socket binding is ignored;
898 such packets are filtered only by protocol.
899 Expects an integer flag.
900 .TP
901 .BR IP_TOS " (since Linux 1.0)"
902 Set or receive the Type-Of-Service (TOS) field that is sent
903 with every IP packet originating from this socket.
904 It is used to prioritize packets on the network.
905 TOS is a byte.
906 There are some standard TOS flags defined:
907 .B IPTOS_LOWDELAY
908 to minimize delays for interactive traffic,
909 .B IPTOS_THROUGHPUT
910 to optimize throughput,
911 .B IPTOS_RELIABILITY
912 to optimize for reliability, and
913 .B IPTOS_MINCOST
914 for "filler data" where slow transmission doesn't matter.
915 At most one of these TOS values can be specified.
916 Other bits are invalid and shall be cleared.
917 Linux sends
918 .B IPTOS_LOWDELAY
919 datagrams first by default,
920 but the exact behavior depends on the configured queueing discipline.
921 .\" FIXME elaborate on this
922 Some high-priority levels may require superuser privileges (the
923 .B CAP_NET_ADMIN
924 capability).
925 .\" The priority can also be set in a protocol-independent way by the
926 .\" .RB ( SOL_SOCKET ", " SO_PRIORITY )
927 .\" socket option (see
928 .\" .BR socket (7)).
929 .TP
930 .BR IP_TRANSPARENT " (since Linux 2.6.24)"
931 .\" commit f5715aea4564f233767ea1d944b2637a5fd7cd2e
932 .\" This patch introduces the IP_TRANSPARENT socket option: enabling that
933 .\" will make the IPv4 routing omit the non-local source address check on
934 .\" output. Setting IP_TRANSPARENT requires NET_ADMIN capability.
935 .\" http://lwn.net/Articles/252545/
936 Setting this boolean option enables transparent proxying on this socket.
937 This socket option allows
938 the calling application to bind to a nonlocal IP address and operate
939 both as a client and a server with the foreign address as the local endpoint.
940 NOTE: this requires that routing be set up in a way that
941 packets going to the foreign address are routed through the TProxy box
942 (i.e., the system hosting the application that employs the
943 .B IP_TRANSPARENT
944 socket option).
945 Enabling this socket option requires superuser privileges
946 (the
947 .BR CAP_NET_ADMIN
948 capability).
949 .IP
950 TProxy redirection with the iptables TPROXY target also requires that
951 this option be set on the redirected socket.
952 .TP
953 .BR IP_TTL " (since Linux 1.0)"
954 Set or retrieve the current time-to-live field that is used in every packet
955 sent from this socket.
956 .TP
957 .BR IP_UNBLOCK_SOURCE " (since Linux 2.4.22 / 2.5.68)"
958 Unblock a previously blocked multicast source.
959 Returns
960 .BR EADDRNOTAVAIL
961 when the given source is not being blocked.
962 .IP
963 Argument is an
964 .I ip_mreq_source
965 structure as described under
966 .BR IP_ADD_SOURCE_MEMBERSHIP .
967 .SS /proc interfaces
968 The IP protocol
969 supports a set of
970 .I /proc
971 interfaces to configure some global parameters.
972 The parameters can be accessed by reading or writing files in the directory
973 .IR /proc/sys/net/ipv4/ .
974 .\" FIXME As at 2.6.12, 14 Jun 2005, the following are undocumented:
975 .\" ip_queue_maxlen
976 .\" ip_conntrack_max
977 Interfaces described as
978 .I Boolean
979 take an integer value, with a nonzero value ("true") meaning that
980 the corresponding option is enabled, and a zero value ("false")
981 meaning that the option is disabled.
982 .\"
983 .TP
984 .IR ip_always_defrag " (Boolean; since Linux 2.2.13)"
985 [New with kernel 2.2.13; in earlier kernel versions this feature
986 was controlled at compile time by the
987 .B CONFIG_IP_ALWAYS_DEFRAG
988 option; this option is not present in 2.4.x and later]
989 .IP
990 When this boolean flag is enabled (not equal to 0), incoming fragments
991 (parts of IP packets
992 that arose when some host between origin and destination decided
993 that the packets were too large and cut them into pieces) will be
994 reassembled (defragmented) before being processed, even if they are
995 about to be forwarded.
996 .IP
997 Enable only if running either a firewall that is the sole link
998 to your network or a transparent proxy; never ever use it for a
999 normal router or host.
1000 Otherwise, fragmented communication can be disturbed
1001 if the fragments travel over different links.
1002 Defragmentation also has a large memory and CPU time cost.
1003 .IP
1004 This is automatically turned on when masquerading or transparent
1005 proxying is configured.
1006 .\"
1007 .TP
1008 .IR ip_autoconfig " (Linux 2.2 to 2.6.17)"
1009 .\" Precisely: since 2.1.68
1010 .\" FIXME document ip_autoconfig
1011 Not documented.
1012 .\"
1013 .TP
1014 .IR ip_default_ttl " (integer; default: 64; since Linux 2.2)"
1015 .\" Precisely: 2.1.15
1016 Set the default time-to-live value of outgoing packets.
1017 This can be changed per socket with the
1018 .B IP_TTL
1019 option.
1020 .\"
1021 .TP
1022 .IR ip_dynaddr " (Boolean; default: disabled; since Linux 2.0.31)"
1023 Enable dynamic socket address and masquerading entry rewriting on interface
1024 address change.
1025 This is useful for dialup interfaces with changing IP addresses.
1026 0 means no rewriting, 1 turns it on and 2 enables verbose mode.
1027 .\"
1028 .TP
1029 .IR ip_forward " (Boolean; default: disabled; since Linux 1.2)"
1030 Enable IP forwarding with a boolean flag.
1031 IP forwarding can also be set on a per-interface basis.
1032 .\"
1033 .TP
1034 .IR ip_local_port_range " (since Linux 2.2)"
1035 .\" Precisely: since 2.1.68
1036 This file contains two integers that define the default local port range
1037 allocated to sockets that are not explicitly bound to a port number\(emthat
1038 is, the range used for
1039 .IR "ephemeral ports" .
1040 An ephemeral port is allocated to a socket in the following circumstances:
1041 .RS
1042 .IP * 3
1043 the port number in a socket address is specified as 0 when calling
1044 .BR bind (2);
1045 .IP *
1046 .BR listen (2)
1047 is called on a stream socket that was not previously bound;
1048 .IP *
1049 .BR connect (2)
1050 is called on a socket that was not previously bound;
1051 .IP *
1052 .BR sendto (2)
1053 is called on a datagram socket that was not previously bound.
1054 .RE
1055 .IP
1056 Allocation of ephemeral ports starts with the first number in
1057 .IR ip_local_port_range
1058 and ends with the second number.
1059 If the range of ephemeral ports is exhausted,
1060 then the relevant system call returns an error (but see BUGS).
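.IP
The first circumstance listed above can be sketched as follows: the port is
given as 0 in the call to bind, and getsockname reports which ephemeral port
the kernel assigned.
The function name bind_ephemeral_port is hypothetical, and a UDP socket bound
to the wildcard address is assumed for simplicity.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a UDP socket with port 0 and return the
   ephemeral port the kernel assigned, or -1 on error. */
int
bind_ephemeral_port(void)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);       /* 0 requests an ephemeral port */

    int port = -1;
    socklen_t len = sizeof(addr);
    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *) &addr, &len) == 0)
        port = ntohs(addr.sin_port);

    close(fd);
    return port;
}
```

The socket is closed before returning, so the reported port is only
illustrative; a real program would keep the descriptor open.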
1061 .IP
1062 Note that the port range in
1063 .IR ip_local_port_range
1064 should not conflict with the ports used by masquerading
1065 (although the case is handled).
1066 Also, arbitrary choices may cause problems with some firewall packet
1067 filters that make assumptions about the local ports in use.
1068 The first number should be greater than 1024,
1069 or better, greater than 4096, to avoid clashes
1070 with well-known ports and to minimize firewall problems.
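.IP
A program that wants to inspect the range might parse the file contents along
these lines.
This is only a sketch: the function name parse_port_range is hypothetical,
and the text is assumed to consist of two whitespace-separated decimal
integers.

```c
#include <stdio.h>

/* Parse the two integers found in ip_local_port_range.
   On success store them in *first and *last and return 0;
   return -1 if the text is malformed or the range is not sensible. */
int
parse_port_range(const char *text, int *first, int *last)
{
    int lo, hi;

    if (sscanf(text, "%d %d", &lo, &hi) != 2)
        return -1;
    if (lo < 0 || hi > 65535 || lo > hi)
        return -1;

    *first = lo;
    *last = hi;
    return 0;
}
```

The text would typically come from reading
/proc/sys/net/ipv4/ip_local_port_range into a buffer first.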
1071 .\"
1072 .TP
1073 .IR ip_no_pmtu_disc " (Boolean; default: disabled; since Linux 2.2)"
1074 .\" Precisely: 2.1.15
1075 If enabled, don't do Path MTU Discovery for TCP sockets by default.
1076 Path MTU Discovery may fail if misconfigured firewalls (that drop
1077 all ICMP packets) or misconfigured interfaces (e.g., a point-to-point
1078 link where both ends don't agree on the MTU) are on the path.
1079 It is better to fix the broken routers on the path than to turn off
1080 Path MTU Discovery globally, because not doing it incurs a high cost
1081 to the network.
1082 .\"
1083 .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
1084 .TP
1085 .IR ip_nonlocal_bind " (Boolean; default: disabled; since Linux 2.4)"
1086 .\" Precisely: patch-2.4.0-test10
1087 If set, allows processes to
1088 .BR bind (2)
1089 to nonlocal IP addresses,
1090 which can be quite useful, but may break some applications.
1091 .\"
1092 .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
1093 .TP
1094 .IR ip6frag_time " (integer; default: 30)"
1095 Time in seconds to keep an IPv6 fragment in memory.
1096 .\"
1097 .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
1098 .TP
1099 .IR ip6frag_secret_interval " (integer; default: 600)"
1100 Regeneration interval (in seconds) of the hash secret (or lifetime
1101 for the hash secret) for IPv6 fragments.
1102 .TP
1103 .IR ipfrag_high_thresh " (integer), " ipfrag_low_thresh " (integer)"
1104 If the amount of queued IP fragments reaches
1105 .IR ipfrag_high_thresh ,
1106 the queue is pruned down to
1107 .IR ipfrag_low_thresh .
1108 Each file contains an integer specifying a number of bytes.
1109 .TP
1110 .I neigh/*
1111 See
1112 .BR arp (7).
1113 .\" FIXME Document the conf/*/* interfaces
1114 .\"
1115 .\" FIXME Document the route/* interfaces
1116 .SS Ioctls
1117 All ioctls described in
1118 .BR socket (7)
1119 apply to
1120 .BR ip .
1121 .\" 2006-04-02, mtk
1122 .\" commented out the following because ipchains is obsolete
1123 .\" .PP
1124 .\" The ioctls to configure firewalling are documented in
1125 .\" .BR ipfw (4)
1126 .\" from the
1127 .\" .B ipchains
1128 .\" package.
1129 .PP
1130 Ioctls to configure generic device parameters are described in
1131 .BR netdevice (7).
1132 .\" FIXME Add a discussion of multicasting
1133 .SH ERRORS
1134 .\" FIXME document all errors.
1135 .\" We should really fix the kernels to give more uniform
1136 .\" error returns (ENOMEM vs ENOBUFS, EPERM vs EACCES etc.)
1137 .TP
1138 .B EACCES
1139 The user tried to execute an operation without the necessary permissions.
1140 These include:
1141 sending a packet to a broadcast address without having the
1142 .B SO_BROADCAST
1143 flag set;
1144 sending a packet via a
1145 .I prohibit
1146 route;
1147 modifying firewall settings without superuser privileges (the
1148 .B CAP_NET_ADMIN
1149 capability);
1150 binding to a privileged port without superuser privileges (the
1151 .B CAP_NET_BIND_SERVICE
1152 capability).
1153 .TP
1154 .B EADDRINUSE
1155 Tried to bind to an address already in use.
1156 .TP
1157 .B EADDRNOTAVAIL
1158 A nonexistent interface was requested or the requested source
1159 address was not local.
1160 .TP
1161 .B EAGAIN
1162 Operation on a nonblocking socket would block.
1163 .TP
1164 .B EALREADY
1165 A connection operation on a nonblocking socket is already in progress.
1166 .TP
1167 .B ECONNABORTED
1168 A connection was closed during an
1169 .BR accept (2).
1170 .TP
1171 .B EHOSTUNREACH
1172 No valid routing table entry matches the destination address.
1173 This error can be caused by an ICMP message from a remote router or
1174 by the local routing table.
1175 .TP
1176 .B EINVAL
1177 Invalid argument passed.
1178 For send operations this can be caused by sending to a
1179 .I blackhole
1180 route.
1181 .TP
1182 .B EISCONN
1183 .BR connect (2)
1184 was called on an already connected socket.
1185 .TP
1186 .B EMSGSIZE
1187 Datagram is bigger than an MTU on the path and it cannot be fragmented.
1188 .TP
1189 .BR ENOBUFS ", " ENOMEM
1190 Not enough free memory.
1191 This often means that the memory allocation is limited by the socket
1192 buffer limits, not by the system memory, but this is not 100% consistent.
1193 .TP
1194 .B ENOENT
1195 .B SIOCGSTAMP
1196 was called on a socket where no packet arrived.
1197 .TP
1198 .B ENOPKG
1199 A kernel subsystem was not configured.
1200 .TP
1201 .BR ENOPROTOOPT " and " EOPNOTSUPP
1202 Invalid socket option passed.
1203 .TP
1204 .B ENOTCONN
1205 The operation is defined only on a connected socket, but the socket wasn't
1206 connected.
1207 .TP
1208 .B EPERM
1209 User doesn't have permission to set high priority, change configuration,
1210 or send signals to the requested process or group.
1211 .TP
1212 .B EPIPE
1213 The connection was unexpectedly closed or shut down by the other end.
1214 .TP
1215 .B ESOCKTNOSUPPORT
1216 The socket is not configured or an unknown socket type was requested.
1217 .PP
1218 Other errors may be generated by the overlaying protocols; see
1219 .BR tcp (7),
1220 .BR raw (7),
1221 .BR udp (7),
1222 and
1223 .BR socket (7).
1224 .SH NOTES
1225 .BR IP_FREEBIND ,
1226 .BR IP_MSFILTER ,
1227 .BR IP_MTU ,
1228 .BR IP_MTU_DISCOVER ,
1229 .BR IP_RECVORIGDSTADDR ,
1230 .BR IP_PKTINFO ,
1231 .BR IP_RECVERR ,
1232 .BR IP_ROUTER_ALERT ,
1233 and
1234 .BR IP_TRANSPARENT
1235 are Linux-specific.
1236 .\" IP_PASSSEC is Linux-specific
1237 .\" IP_XFRM_POLICY is Linux-specific
1238 .\" IP_IPSEC_POLICY is a nonstandard extension, also present on some BSDs
1239 .PP
1240 Be very careful with the
1241 .B SO_BROADCAST
1242 option \- it is not privileged in Linux.
1243 It is easy to overload the network
1244 with careless broadcasts.
1245 For new application protocols
1246 it is better to use a multicast group instead of broadcasting.
1247 Broadcasting is discouraged.
1248 .PP
1249 Some other BSD sockets implementations provide
1250 .B IP_RCVDSTADDR
1251 and
1252 .B IP_RECVIF
1253 socket options to get the destination address and the interface of
1254 received datagrams.
1255 Linux has the more general
1256 .B IP_PKTINFO
1257 for the same task.
1258 .PP
1259 Some BSD sockets implementations also provide an
1260 .B IP_RECVTTL
1261 option, but there the ancillary message passed with the incoming packet
1262 has type
1263 .BR IP_RECVTTL .
1264 This differs from Linux, where the ancillary message has type
1265 .BR IP_TTL .
1267 .PP
1268 Using the
1269 .B SOL_IP
1270 socket options level isn't portable; BSD-based stacks use the
1271 .B IPPROTO_IP
1272 level.
1273 .PP
1274 .B INADDR_ANY
1275 (0.0.0.0) and
1276 .B INADDR_BROADCAST
1277 (255.255.255.255) are byte-order-neutral.
1278 This means
1279 .BR htonl (3)
1280 has no effect on them.
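.PP
The byte-order neutrality of these two constants can be checked with a short
sketch such as the following.
The function name wildcard_constants_are_neutral is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Return 1 if htonl() leaves INADDR_ANY and INADDR_BROADCAST unchanged,
   0 otherwise.  All-zero and all-one bit patterns read the same in
   either byte order, so this holds on any host. */
int
wildcard_constants_are_neutral(void)
{
    return htonl(INADDR_ANY) == INADDR_ANY &&
           htonl(INADDR_BROADCAST) == INADDR_BROADCAST;
}
```

Other constants, such as INADDR_LOOPBACK, do not share this property on
little-endian hosts and still need htonl.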
1281 .SS Compatibility
1282 For compatibility with Linux 2.0, the obsolete
1283 .BI "socket(AF_INET, SOCK_PACKET, " protocol )
1284 syntax is still supported to open a
1285 .BR packet (7)
1286 socket.
1287 This is deprecated and should be replaced by
1288 .BI "socket(AF_PACKET, SOCK_RAW, " protocol )
1289 instead.
1290 The main difference is the new
1291 .I sockaddr_ll
1292 address structure for generic link layer information instead of the old
1293 .IR sockaddr_pkt .
1294 .SH BUGS
1295 There are too many inconsistent error values.
1296 .PP
1297 The error used to diagnose exhaustion of the ephemeral port range differs
1298 across the various system calls
1299 .RB ( connect (2),
1300 .BR bind (2),
1301 .BR listen (2),
1302 .BR sendto (2))
1303 that can assign ephemeral ports.
1304 .PP
1305 The ioctls to configure IP-specific interface options and ARP tables are
1306 not described.
1307 .\" .PP
1308 .\" Some versions of glibc forget to declare
1309 .\" .IR in_pktinfo .
1310 .\" Workaround currently is to copy it into your program from this man page.
1311 .PP
1312 Receiving the original destination address with
1313 .B MSG_ERRQUEUE
1314 in
1315 .I msg_name
1316 by
1317 .BR recvmsg (2)
1318 does not work in some 2.2 kernels.
1319 .\" .SH AUTHORS
1320 .\" This man page was written by Andi Kleen.
1321 .SH SEE ALSO
1322 .BR recvmsg (2),
1323 .BR sendmsg (2),
1324 .BR byteorder (3),
1325 .BR ipfw (4),
1326 .BR capabilities (7),
1327 .BR icmp (7),
1328 .BR ipv6 (7),
1329 .BR netlink (7),
1330 .BR raw (7),
1331 .BR socket (7),
1332 .BR tcp (7),
1333 .BR udp (7),
1334 .BR ip (8)
1335 .PP
1336 RFC\ 791 for the original IP specification.
1337 RFC\ 1122 for the IPv4 host requirements.
1338 RFC\ 1812 for the IPv4 router requirements.