1 '\" t
2 .\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
3 .\"
4 .\" %%%LICENSE_START(VERBATIM_ONE_PARA)
5 .\" Permission is granted to distribute possibly modified copies
6 .\" of this page provided the header is included verbatim,
7 .\" and in case of nontrivial modification author and date
8 .\" of the modification is added to the header.
9 .\" %%%LICENSE_END
10 .\"
11 .\" $Id: ip.7,v 1.19 2000/12/20 18:10:31 ak Exp $
12 .\"
13 .\" FIXME The following socket options are yet to be documented
14 .\"
15 .\" IP_XFRM_POLICY (2.5.48)
16 .\" Needs CAP_NET_ADMIN
17 .\"
18 .\" IP_IPSEC_POLICY (2.5.47)
19 .\" Needs CAP_NET_ADMIN
20 .\"
21 .\" IP_PASSSEC (2.6.17)
22 .\" Boolean
23 .\" commit 2c7946a7bf45ae86736ab3b43d0085e43947945c
24 .\" Author: Catherine Zhang <cxzhang@watson.ibm.com>
25 .\"
26 .\" IP_MINTTL (2.6.34)
27 .\" commit d218d11133d888f9745802146a50255a4781d37a
28 .\" Author: Stephen Hemminger <shemminger@vyatta.com>
29 .\"
30 .\" MCAST_JOIN_GROUP (2.4.22 / 2.6)
31 .\"
32 .\" MCAST_BLOCK_SOURCE (2.4.22 / 2.6)
33 .\"
34 .\" MCAST_UNBLOCK_SOURCE (2.4.22 / 2.6)
35 .\"
36 .\" MCAST_LEAVE_GROUP (2.4.22 / 2.6)
37 .\"
38 .\" MCAST_JOIN_SOURCE_GROUP (2.4.22 / 2.6)
39 .\"
40 .\" MCAST_LEAVE_SOURCE_GROUP (2.4.22 / 2.6)
41 .\"
42 .\" MCAST_MSFILTER (2.4.22 / 2.6)
43 .\"
44 .\" IP_UNICAST_IF (3.4)
45 .\" commit 76e21053b5bf33a07c76f99d27a74238310e3c71
46 .\" Author: Erich E. Hoover <ehoover@mines.edu>
47 .\"
48 .TH IP 7 2019-03-06 "Linux" "Linux Programmer's Manual"
49 .SH NAME
50 ip \- Linux IPv4 protocol implementation
51 .SH SYNOPSIS
52 .B #include <sys/socket.h>
53 .br
54 .\" .B #include <net/netinet.h> -- does not exist anymore
55 .\" .B #include <linux/errqueue.h> -- never include <linux/foo.h>
56 .B #include <netinet/in.h>
57 .br
58 .B #include <netinet/ip.h> \fR/* superset of previous */
59 .PP
60 .IB tcp_socket " = socket(AF_INET, SOCK_STREAM, 0);"
61 .br
62 .IB udp_socket " = socket(AF_INET, SOCK_DGRAM, 0);"
63 .br
64 .IB raw_socket " = socket(AF_INET, SOCK_RAW, " protocol ");"
65 .SH DESCRIPTION
66 Linux implements the Internet Protocol, version 4,
67 described in RFC\ 791 and RFC\ 1122.
68 .B ip
69 contains a level 2 multicasting implementation conforming to RFC\ 1112.
70 It also contains an IP router including a packet filter.
71 .PP
72 The programming interface is BSD-sockets compatible.
73 For more information on sockets, see
74 .BR socket (7).
75 .PP
76 An IP socket is created using
77 .BR socket (2):
78 .PP
79 socket(AF_INET, socket_type, protocol);
80 .PP
81 Valid socket types include
82 .B SOCK_STREAM
83 to open a stream socket,
84 .B SOCK_DGRAM
85 to open a datagram socket, and
86 .B SOCK_RAW
87 to open a
88 .BR raw (7)
89 socket to access the IP protocol directly.
90 .PP
91 .I protocol
92 is the IP protocol in the IP header to be received or sent.
93 Valid values for
94 .I protocol
95 include:
96 .IP \(bu 2
97 0 and
98 .B IPPROTO_TCP
99 for
100 .BR tcp (7)
101 stream sockets;
102 .IP \(bu
103 0 and
104 .B IPPROTO_UDP
105 for
106 .BR udp (7)
107 datagram sockets;
108 .IP \(bu
109 .B IPPROTO_SCTP
110 for
111 .BR sctp (7)
112 stream sockets; and
113 .IP \(bu
114 .B IPPROTO_UDPLITE
115 for
116 .BR udplite (7)
117 datagram sockets.
118 .PP
119 For
120 .B SOCK_RAW
121 you may specify a valid IANA IP protocol defined in
122 RFC\ 1700 assigned numbers.
123 .PP
124 When a process wants to receive new incoming packets or connections, it
125 should bind a socket to a local interface address using
126 .BR bind (2).
127 In this case, only one IP socket may be bound to any given local
128 (address, port) pair.
129 When
130 .B INADDR_ANY
131 is specified in the bind call, the socket will be bound to
132 .I all
133 local interfaces.
134 When
135 .BR listen (2)
136 is called on an unbound socket, the socket is automatically bound
137 to a random free port with the local address set to
138 .BR INADDR_ANY .
139 When
140 .BR connect (2)
141 is called on an unbound socket, the socket is automatically bound
142 to a random free port or to a usable shared port with the local address
143 set to
144 .BR INADDR_ANY .
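.PP
The automatic binding performed by listen(2) can be observed with getsockname(2).
The following is a minimal sketch, not a definitive implementation;
the helper name listen_unbound and the backlog of 8 are assumptions
made for illustration.

```c
#define _DEFAULT_SOURCE
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket, call listen(2) without a
   prior bind(2), and report the address the kernel chose in *local.
   Returns the socket descriptor, or -1 on error. */
int listen_unbound(struct sockaddr_in *local)
{
    socklen_t len = sizeof(*local);
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    /* No bind(2) here: the kernel picks a free port on INADDR_ANY. */
    if (listen(fd, 8) == -1 ||
        getsockname(fd, (struct sockaddr *) local, &len) == -1) {
        close(fd);
        return -1;
    }
    return fd;
}
```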
145 .PP
146 A TCP local socket address that has been bound is unavailable for
147 some time after closing, unless the
148 .B SO_REUSEADDR
149 flag has been set.
150 Care should be taken when using this flag as it makes TCP less reliable.
151 .SS Address format
152 An IP socket address is defined as a combination of an IP interface
153 address and a 16-bit port number.
The basic IP protocol does not supply port numbers; they
are implemented by higher-level protocols like
156 .BR udp (7)
157 and
158 .BR tcp (7).
159 On raw sockets
160 .I sin_port
161 is set to the IP protocol.
162 .PP
163 .in +4n
164 .EX
165 struct sockaddr_in {
166 sa_family_t sin_family; /* address family: AF_INET */
167 in_port_t sin_port; /* port in network byte order */
168 struct in_addr sin_addr; /* internet address */
169 };
170
171 /* Internet address. */
172 struct in_addr {
173 uint32_t s_addr; /* address in network byte order */
174 };
175 .EE
176 .in
177 .PP
178 .I sin_family
179 is always set to
180 .BR AF_INET .
181 This is required; in Linux 2.2 most networking functions return
182 .B EINVAL
183 when this setting is missing.
184 .I sin_port
185 contains the port in network byte order.
186 The port numbers below 1024 are called
187 .IR "privileged ports"
188 (or sometimes:
189 .IR "reserved ports" ).
190 Only a privileged process
191 (on Linux: a process that has the
192 .B CAP_NET_BIND_SERVICE
193 capability in the user namespace governing its network namespace) may
194 .BR bind (2)
to these ports.
Note that the raw IPv4 protocol as such has no concept of a
port; ports are implemented only by higher-level protocols like
198 .BR tcp (7)
199 and
200 .BR udp (7).
201 .PP
202 .I sin_addr
203 is the IP host address.
204 The
205 .I s_addr
206 member of
207 .I struct in_addr
208 contains the host interface address in network byte order.
209 .I in_addr
210 should be assigned one of the
211 .BR INADDR_*
212 values
213 (e.g.,
214 .BR INADDR_LOOPBACK )
215 using
216 .BR htonl (3)
217 or set using the
218 .BR inet_aton (3),
219 .BR inet_addr (3),
220 .BR inet_makeaddr (3)
221 library functions or directly with the name resolver (see
222 .BR gethostbyname (3)).
223 .PP
224 IPv4 addresses are divided into unicast, broadcast,
225 and multicast addresses.
226 Unicast addresses specify a single interface of a host,
227 broadcast addresses specify all hosts on a network, and multicast
228 addresses address all hosts in a multicast group.
229 Datagrams to broadcast addresses can be sent or received only when the
230 .B SO_BROADCAST
231 socket flag is set.
232 In the current implementation, connection-oriented sockets are allowed
233 to use only unicast addresses.
234 .\" Leave a loophole for XTP @)
235 .PP
236 Note that the address and the port are always stored in
237 network byte order.
238 In particular, this means that you need to call
239 .BR htons (3)
240 on the number that is assigned to a port.
241 All address/port manipulation
242 functions in the standard library work in network byte order.
243 .PP
244 There are several special addresses:
245 .B INADDR_LOOPBACK
246 (127.0.0.1)
247 always refers to the local host via the loopback device;
248 .B INADDR_ANY
249 (0.0.0.0)
250 means any address for binding;
251 .B INADDR_BROADCAST
252 (255.255.255.255)
253 means any host and has the same effect on bind as
254 .B INADDR_ANY
255 for historical reasons.
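.PP
The byte-order rules above can be sketched as a small helper that fills a
sockaddr_in structure from a dotted-quad string and a port in host byte order.
The function name make_sockaddr_in is hypothetical;
it relies only on htons(3) and inet_aton(3).

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *sa from dotted-quad text and a port given
   in host byte order.  Returns 0 on success, or -1 if the text is not
   a valid IPv4 address. */
int make_sockaddr_in(struct sockaddr_in *sa, const char *dotted,
                     uint16_t port)
{
    memset(sa, 0, sizeof(*sa));
    sa->sin_family = AF_INET;          /* always AF_INET */
    sa->sin_port = htons(port);        /* convert to network byte order */

    /* inet_aton(3) stores the address in network byte order. */
    if (inet_aton(dotted, &sa->sin_addr) == 0)
        return -1;
    return 0;
}
```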
256 .SS Socket options
257 IP supports some protocol-specific socket options that can be set with
258 .BR setsockopt (2)
259 and read with
260 .BR getsockopt (2).
261 The socket option level for IP is
262 .BR IPPROTO_IP .
263 .\" or SOL_IP on Linux
264 A boolean integer flag is zero when it is false, otherwise true.
265 .PP
266 When an invalid socket option is specified,
267 .BR getsockopt (2)
268 and
269 .BR setsockopt (2)
270 fail with the error
271 .BR ENOPROTOOPT .
272 .TP
273 .BR IP_ADD_MEMBERSHIP " (since Linux 1.2)"
274 Join a multicast group.
275 Argument is an
276 .I ip_mreqn
277 structure.
.IP
279 .in +4n
280 .EX
281 struct ip_mreqn {
282 struct in_addr imr_multiaddr; /* IP multicast group
283 address */
284 struct in_addr imr_address; /* IP address of local
285 interface */
286 int imr_ifindex; /* interface index */
287 };
288 .EE
289 .in
.IP
291 .I imr_multiaddr
292 contains the address of the multicast group the application
293 wants to join or leave.
294 It must be a valid multicast address
295 .\" (i.e., within the 224.0.0.0-239.255.255.255 range)
296 (or
297 .BR setsockopt (2)
298 fails with the error
299 .BR EINVAL ).
300 .I imr_address
301 is the address of the local interface with which the system
302 should join the multicast group; if it is equal to
303 .BR INADDR_ANY ,
304 an appropriate interface is chosen by the system.
305 .I imr_ifindex
306 is the interface index of the interface that should join/leave the
307 .I imr_multiaddr
308 group, or 0 to indicate any interface.
309 .IP
310 The
311 .I ip_mreqn
312 structure is available only since Linux 2.2.
313 For compatibility, the old
314 .I ip_mreq
315 structure (present since Linux 1.2) is still supported;
316 it differs from
317 .I ip_mreqn
318 only by not including the
319 .I imr_ifindex
320 field.
321 (The kernel determines which structure is being passed based
322 on the size passed in
323 .IR optlen .)
324 .IP
325 .B IP_ADD_MEMBERSHIP
326 is valid only for
327 .BR setsockopt (2).
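.IP
A minimal sketch of joining a group with the ip_mreqn structure follows.
The helper name join_group is hypothetical;
it assumes a datagram socket and leaves the choice of local address to the system.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: join the multicast group given as dotted-quad
   text on the interface with index ifindex (0 means any interface).
   Returns 0 on success, or -1 with errno set. */
int join_group(int fd, const char *group, int ifindex)
{
    struct ip_mreqn mreq;

    memset(&mreq, 0, sizeof(mreq));
    if (inet_aton(group, &mreq.imr_multiaddr) == 0) {
        errno = EINVAL;                 /* not parseable as IPv4 text */
        return -1;
    }
    mreq.imr_address.s_addr = htonl(INADDR_ANY);  /* system chooses */
    mreq.imr_ifindex = ifindex;

    /* Passing sizeof(mreq) tells the kernel this is an ip_mreqn. */
    return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}
```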
328 .\"
329 .TP
330 .BR IP_ADD_SOURCE_MEMBERSHIP " (since Linux 2.4.22 / 2.5.68)"
331 Join a multicast group and allow receiving data only
332 from a specified source.
333 Argument is an
334 .I ip_mreq_source
335 structure.
.IP
337 .in +4n
338 .EX
339 struct ip_mreq_source {
340 struct in_addr imr_multiaddr; /* IP multicast group
341 address */
342 struct in_addr imr_interface; /* IP address of local
343 interface */
344 struct in_addr imr_sourceaddr; /* IP address of
345 multicast source */
346 };
347 .EE
348 .in
.IP
350 The
351 .I ip_mreq_source
352 structure is similar to
353 .I ip_mreqn
354 described under
355 .BR IP_ADD_MEMBERSHIP .
356 The
357 .I imr_multiaddr
358 field contains the address of the multicast group the application
359 wants to join or leave.
360 The
361 .I imr_interface
362 field is the address of the local interface with which
363 the system should join the multicast group.
364 Finally, the
365 .I imr_sourceaddr
366 field contains the address of the source the
367 application wants to receive data from.
368 .IP
369 This option can be used multiple times to allow
370 receiving data from more than one source.
371 .TP
372 .BR IP_BIND_ADDRESS_NO_PORT " (since Linux 4.2)"
373 .\" commit 90c337da1524863838658078ec34241f45d8394d
374 Inform the kernel to not reserve an ephemeral port when using
375 .BR bind (2)
376 with a port number of 0.
377 The port will later be automatically chosen at
378 .BR connect (2)
379 time,
380 in a way that allows sharing a source port as long as the 4-tuple is unique.
381 .TP
382 .BR IP_BLOCK_SOURCE " (since Linux 2.4.22 / 2.5.68)"
383 Stop receiving multicast data from a specific source in a given group.
384 This is valid only after the application has subscribed
385 to the multicast group using either
386 .BR IP_ADD_MEMBERSHIP
387 or
388 .BR IP_ADD_SOURCE_MEMBERSHIP .
389 .IP
390 Argument is an
391 .I ip_mreq_source
392 structure as described under
393 .BR IP_ADD_SOURCE_MEMBERSHIP .
394 .TP
395 .BR IP_DROP_MEMBERSHIP " (since Linux 1.2)"
396 Leave a multicast group.
397 Argument is an
398 .I ip_mreqn
399 or
400 .I ip_mreq
401 structure similar to
402 .BR IP_ADD_MEMBERSHIP .
403 .TP
404 .BR IP_DROP_SOURCE_MEMBERSHIP " (since Linux 2.4.22 / 2.5.68)"
405 Leave a source-specific group\(emthat is, stop receiving data from
a given multicast group that comes from a given source.
407 If the application has subscribed to multiple sources within
408 the same group, data from the remaining sources will still be delivered.
409 To stop receiving data from all sources at once, use
410 .BR IP_DROP_MEMBERSHIP .
411 .IP
412 Argument is an
413 .I ip_mreq_source
414 structure as described under
415 .BR IP_ADD_SOURCE_MEMBERSHIP .
416 .TP
417 .BR IP_FREEBIND " (since Linux 2.4)"
418 .\" Precisely: 2.4.0-test10
419 If enabled, this boolean option allows binding to an IP address
420 that is nonlocal or does not (yet) exist.
421 This permits listening on a socket,
422 without requiring the underlying network interface or the
423 specified dynamic IP address to be up at the time that
424 the application is trying to bind to it.
425 This option is the per-socket equivalent of the
426 .IR ip_nonlocal_bind
427 .I /proc
428 interface described below.
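.IP
The following sketch enables the option and then binds to an address that
need not be configured on any local interface.
The helper name bind_nonlocal is hypothetical.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: enable IP_FREEBIND on fd, then bind it to the
   given dotted-quad address and host-order port.
   Returns 0 on success, or -1 with errno set. */
int bind_nonlocal(int fd, const char *dotted, uint16_t port)
{
    struct sockaddr_in sa;
    int on = 1;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_aton(dotted, &sa.sin_addr) == 0) {
        errno = EINVAL;
        return -1;
    }

    /* The option must be set before bind(2) is called. */
    if (setsockopt(fd, IPPROTO_IP, IP_FREEBIND, &on, sizeof(on)) == -1)
        return -1;
    return bind(fd, (struct sockaddr *) &sa, sizeof(sa));
}
```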
429 .TP
430 .BR IP_HDRINCL " (since Linux 2.0)"
431 If enabled,
432 the user supplies an IP header in front of the user data.
433 Valid only for
434 .B SOCK_RAW
435 sockets; see
436 .BR raw (7)
437 for more information.
438 When this flag is enabled, the values set by
439 .BR IP_OPTIONS ,
440 .BR IP_TTL ,
441 and
442 .B IP_TOS
443 are ignored.
444 .TP
445 .BR IP_MSFILTER " (since Linux 2.4.22 / 2.5.68)"
446 This option provides access to the advanced full-state filtering API.
447 Argument is an
448 .I ip_msfilter
449 structure.
.IP
451 .in +4n
452 .EX
453 struct ip_msfilter {
454 struct in_addr imsf_multiaddr; /* IP multicast group
455 address */
456 struct in_addr imsf_interface; /* IP address of local
457 interface */
458 uint32_t imsf_fmode; /* Filter-mode */
459
460 uint32_t imsf_numsrc; /* Number of sources in
461 the following array */
462 struct in_addr imsf_slist[1]; /* Array of source
463 addresses */
464 };
465 .EE
466 .in
.IP
468 There are two macros,
469 .BR MCAST_INCLUDE
470 and
471 .BR MCAST_EXCLUDE ,
472 which can be used to specify the filtering mode.
473 Additionally, the
474 .BR IP_MSFILTER_SIZE (n)
macro exists to determine how much memory is needed to store an
476 .I ip_msfilter
477 structure with
478 .I n
479 sources in the source list.
480 .IP
481 For the full description of multicast source filtering
482 refer to RFC 3376.
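.IP
As an illustration of the size macro, the sketch below allocates and fills an
ip_msfilter structure in include mode.
The helper names make_include_filter and apply_filter are hypothetical;
the caller is assumed to free the returned structure.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/socket.h>

/* Hypothetical helper: build an include-mode filter for group on the
   interface with address iface, listing n allowed sources.
   Returns a structure obtained from calloc(3), or NULL on failure. */
struct ip_msfilter *make_include_filter(struct in_addr group,
                                        struct in_addr iface,
                                        const struct in_addr *sources,
                                        uint32_t n)
{
    /* IP_MSFILTER_SIZE(n) accounts for the n-entry source list. */
    struct ip_msfilter *f = calloc(1, IP_MSFILTER_SIZE(n));
    uint32_t i;

    if (f == NULL)
        return NULL;
    f->imsf_multiaddr = group;
    f->imsf_interface = iface;
    f->imsf_fmode = MCAST_INCLUDE;
    f->imsf_numsrc = n;
    for (i = 0; i < n; i++)
        f->imsf_slist[i] = sources[i];
    return f;
}

/* Hypothetical helper: install the filter on a socket. */
int apply_filter(int fd, const struct ip_msfilter *f)
{
    return setsockopt(fd, IPPROTO_IP, IP_MSFILTER, f,
                      IP_MSFILTER_SIZE(f->imsf_numsrc));
}
```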
483 .TP
484 .BR IP_MTU " (since Linux 2.2)"
485 .\" Precisely: 2.1.124
486 Retrieve the current known path MTU of the current socket.
487 Returns an integer.
488 .IP
489 .B IP_MTU
490 is valid only for
491 .BR getsockopt (2)
492 and can be employed only when the socket has been connected.
493 .TP
494 .BR IP_MTU_DISCOVER " (since Linux 2.2)"
495 .\" Precisely: 2.1.124
Set or retrieve the Path MTU Discovery setting for a socket.
497 When enabled, Linux will perform Path MTU Discovery
498 as defined in RFC\ 1191 on
499 .B SOCK_STREAM
500 sockets.
501 For
502 .RB non- SOCK_STREAM
503 sockets,
504 .B IP_PMTUDISC_DO
505 forces the don't-fragment flag to be set on all outgoing packets.
506 It is the user's responsibility to packetize the data
507 in MTU-sized chunks and to do the retransmits if necessary.
508 The kernel will reject (with
509 .BR EMSGSIZE )
510 datagrams that are bigger than the known path MTU.
511 .B IP_PMTUDISC_WANT
512 will fragment a datagram if needed according to the path MTU,
513 or will set the don't-fragment flag otherwise.
514 .IP
515 The system-wide default can be toggled between
516 .B IP_PMTUDISC_WANT
517 and
518 .B IP_PMTUDISC_DONT
519 by writing (respectively, zero and nonzero values) to the
520 .I /proc/sys/net/ipv4/ip_no_pmtu_disc
521 file.
522 .TS
523 tab(:);
524 c l
525 l l.
526 Path MTU discovery value:Meaning
527 IP_PMTUDISC_WANT:Use per-route settings.
528 IP_PMTUDISC_DONT:Never do Path MTU Discovery.
529 IP_PMTUDISC_DO:Always do Path MTU Discovery.
530 IP_PMTUDISC_PROBE:Set DF but ignore Path MTU.
531 .TE
532 .sp 1
533 When PMTU discovery is enabled, the kernel automatically keeps track of
534 the path MTU per destination host.
535 When it is connected to a specific peer with
536 .BR connect (2),
537 the currently known path MTU can be retrieved conveniently using the
538 .B IP_MTU
539 socket option (e.g., after an
540 .B EMSGSIZE
541 error occurred).
542 The path MTU may change over time.
543 For connectionless sockets with many destinations,
544 the new MTU for a given destination can also be accessed using the
545 error queue (see
546 .BR IP_RECVERR ).
547 A new error will be queued for every incoming MTU update.
548 .IP
549 While MTU discovery is in progress, initial packets from datagram sockets
550 may be dropped.
551 Applications using UDP should be aware of this and not
552 take it into account for their packet retransmit strategy.
553 .IP
554 To bootstrap the path MTU discovery process on unconnected sockets, it
555 is possible to start with a big datagram size
(up to 64 kilobytes long, including headers) and let it shrink by updates of the path MTU.
557 .IP
558 To get an initial estimate of the
559 path MTU, connect a datagram socket to the destination address using
560 .BR connect (2)
561 and retrieve the MTU by calling
562 .BR getsockopt (2)
563 with the
564 .B IP_MTU
565 option.
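.IP
The procedure just described can be sketched as follows.
The helper name path_mtu_estimate is hypothetical;
no datagram is sent, because connect(2) on a datagram socket only records the peer address.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: return the currently known path MTU toward the
   given dotted-quad address, or -1 on error. */
int path_mtu_estimate(const char *dotted, uint16_t port)
{
    struct sockaddr_in dst;
    int mtu = -1;
    socklen_t len = sizeof(mtu);
    int fd;

    memset(&dst, 0, sizeof(dst));
    dst.sin_family = AF_INET;
    dst.sin_port = htons(port);
    if (inet_aton(dotted, &dst.sin_addr) == 0)
        return -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    /* IP_MTU can be read only once the socket is connected. */
    if (connect(fd, (struct sockaddr *) &dst, sizeof(dst)) == -1 ||
        getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) == -1)
        mtu = -1;

    close(fd);
    return mtu;
}
```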
566 .IP
567 It is possible to implement RFC 4821 MTU probing with
568 .B SOCK_DGRAM
569 or
570 .B SOCK_RAW
571 sockets by setting a value of
572 .BR IP_PMTUDISC_PROBE
573 (available since Linux 2.6.22).
574 This is also particularly useful for diagnostic tools such as
575 .BR tracepath (8)
576 that wish to deliberately send probe packets larger than
577 the observed Path MTU.
578 .TP
579 .BR IP_MULTICAST_ALL " (since Linux 2.6.31)"
580 This option can be used to modify the delivery policy of multicast messages
581 to sockets bound to the wildcard
582 .B INADDR_ANY
583 address.
584 The argument is a boolean integer (defaults to 1).
585 If set to 1,
586 the socket will receive messages from all the groups that have been joined
587 globally on the whole system.
588 Otherwise, it will deliver messages only from
589 the groups that have been explicitly joined (for example via the
590 .B IP_ADD_MEMBERSHIP
591 option) on this particular socket.
592 .TP
593 .BR IP_MULTICAST_IF " (since Linux 1.2)"
594 Set the local device for a multicast socket.
595 The argument for
596 .BR setsockopt (2)
597 is an
598 .I ip_mreqn
599 or
600 .\" net: IP_MULTICAST_IF setsockopt now recognizes struct mreq
601 .\" Commit: 3a084ddb4bf299a6e898a9a07c89f3917f0713f7
602 (since Linux 3.5)
603 .I ip_mreq
604 structure similar to
605 .BR IP_ADD_MEMBERSHIP ,
606 or an
607 .I in_addr
608 structure.
609 (The kernel determines which structure is being passed based
610 on the size passed in
611 .IR optlen .)
612 For
613 .BR getsockopt (2),
614 the argument is an
615 .I in_addr
616 structure.
617 .TP
618 .BR IP_MULTICAST_LOOP " (since Linux 1.2)"
619 Set or read a boolean integer argument that determines whether
620 sent multicast packets should be looped back to the local sockets.
621 .TP
622 .BR IP_MULTICAST_TTL " (since Linux 1.2)"
623 Set or read the time-to-live value of outgoing multicast packets for this
624 socket.
625 It is very important for multicast packets to set the smallest TTL possible.
626 The default is 1 which means that multicast packets don't leave the local
627 network unless the user program explicitly requests it.
628 Argument is an integer.
629 .TP
630 .BR IP_NODEFRAG " (since Linux 2.6.36)"
631 If enabled (argument is nonzero),
632 the reassembly of outgoing packets is disabled in the netfilter layer.
633 The argument is an integer.
634 .IP
635 This option is valid only for
636 .B SOCK_RAW
637 sockets.
638 .TP
639 .BR IP_OPTIONS " (since Linux 2.0)"
640 .\" Precisely: 1.3.30
641 Set or get the IP options to be sent with every packet from this socket.
642 The arguments are a pointer to a memory buffer containing the options
643 and the option length.
644 The
645 .BR setsockopt (2)
646 call sets the IP options associated with a socket.
647 The maximum option size for IPv4 is 40 bytes.
648 See RFC\ 791 for the allowed options.
649 When the initial connection request packet for a
650 .B SOCK_STREAM
651 socket contains IP options, the IP options will be set automatically
652 to the options from the initial packet with routing headers reversed.
653 Incoming packets are not allowed to change options after the connection
654 is established.
655 The processing of all incoming source routing options
656 is disabled by default and can be enabled by using the
657 .I accept_source_route
658 .I /proc
659 interface.
660 Other options like timestamps are still handled.
For datagram sockets, IP options can be set only by the local user.
662 Calling
663 .BR getsockopt (2)
664 with
665 .B IP_OPTIONS
666 puts the current IP options used for sending into the supplied buffer.
667 .TP
668 .BR IP_PKTINFO " (since Linux 2.2)"
669 .\" Precisely: 2.1.68
670 Pass an
671 .B IP_PKTINFO
672 ancillary message that contains a
673 .I pktinfo
674 structure that supplies some information about the incoming packet.
This works only for datagram-oriented sockets.
676 The argument is a flag that tells the socket whether the
677 .B IP_PKTINFO
678 message should be passed or not.
The message itself can be sent/retrieved only
as a control message with a packet using
681 .BR recvmsg (2)
682 or
683 .BR sendmsg (2).
684 .IP
685 .in +4n
686 .EX
687 struct in_pktinfo {
688 unsigned int ipi_ifindex; /* Interface index */
689 struct in_addr ipi_spec_dst; /* Local address */
690 struct in_addr ipi_addr; /* Header Destination
691 address */
692 };
693 .EE
694 .in
695 .IP
696 .I ipi_ifindex
697 is the unique index of the interface the packet was received on.
698 .I ipi_spec_dst
699 is the local address of the packet and
700 .I ipi_addr
701 is the destination address in the packet header.
702 If
703 .B IP_PKTINFO
704 is passed to
705 .BR sendmsg (2)
706 and
707 .\" This field is grossly misnamed
708 .I ipi_spec_dst
709 is not zero, then it is used as the local source address for the routing
710 table lookup and for setting up IP source route options.
711 When
712 .I ipi_ifindex
713 is not zero, the primary local address of the interface specified by the
714 index overwrites
715 .I ipi_spec_dst
716 for the routing table lookup.
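.IP
Receiving the ancillary message might look like the sketch below.
The helper names enable_pktinfo and recv_with_pktinfo are hypothetical;
the control buffer is sized for a single in_pktinfo structure,
which is an assumption that holds only if no other ancillary data is enabled on the socket.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: ask the kernel to attach IP_PKTINFO messages. */
int enable_pktinfo(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Hypothetical helper: receive one datagram and copy its IP_PKTINFO
   data to *info.  Returns the datagram length, or -1 on error or if
   no IP_PKTINFO message was attached. */
ssize_t recv_with_pktinfo(int fd, void *buf, size_t len,
                          struct in_pktinfo *info)
{
    union {
        char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;           /* forces correct alignment */
    } control;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *cmsg;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control.buf;
    msg.msg_controllen = sizeof(control.buf);

    n = recvmsg(fd, &msg, 0);
    if (n == -1)
        return -1;

    /* Walk the control messages looking for IP_PKTINFO. */
    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == IPPROTO_IP &&
            cmsg->cmsg_type == IP_PKTINFO) {
            memcpy(info, CMSG_DATA(cmsg), sizeof(*info));
            return n;
        }
    }
    return -1;
}
```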
717 .TP
718 .BR IP_RECVERR " (since Linux 2.2)"
719 .\" Precisely: 2.1.15
720 Enable extended reliable error message passing.
721 When enabled on a datagram socket, all
722 generated errors will be queued in a per-socket error queue.
723 When the user receives an error from a socket operation,
724 the errors can be received by calling
725 .BR recvmsg (2)
726 with the
727 .B MSG_ERRQUEUE
728 flag set.
729 The
730 .I sock_extended_err
731 structure describing the error will be passed in an ancillary message with
732 the type
733 .B IP_RECVERR
734 and the level
735 .BR IPPROTO_IP .
736 .\" or SOL_IP on Linux
737 This is useful for reliable error handling on unconnected sockets.
738 The received data portion of the error queue contains the error packet.
739 .IP
740 The
741 .B IP_RECVERR
742 control message contains a
743 .I sock_extended_err
744 structure:
745 .IP
746 .in +4n
747 .EX
748 #define SO_EE_ORIGIN_NONE 0
749 #define SO_EE_ORIGIN_LOCAL 1
750 #define SO_EE_ORIGIN_ICMP 2
751 #define SO_EE_ORIGIN_ICMP6 3
752
753 struct sock_extended_err {
754 uint32_t ee_errno; /* error number */
755 uint8_t ee_origin; /* where the error originated */
756 uint8_t ee_type; /* type */
757 uint8_t ee_code; /* code */
758 uint8_t ee_pad;
759 uint32_t ee_info; /* additional information */
760 uint32_t ee_data; /* other data */
761 /* More data may follow */
762 };
763
764 struct sockaddr *SO_EE_OFFENDER(struct sock_extended_err *);
765 .EE
766 .in
767 .IP
768 .I ee_errno
769 contains the
770 .I errno
771 number of the queued error.
772 .I ee_origin
773 is the origin code of where the error originated.
774 The other fields are protocol-specific.
775 The macro
776 .B SO_EE_OFFENDER
777 returns a pointer to the address of the network object
778 where the error originated from given a pointer to the ancillary message.
779 If this address is not known, the
780 .I sa_family
781 member of the
782 .I sockaddr
783 contains
784 .B AF_UNSPEC
785 and the other fields of the
786 .I sockaddr
787 are undefined.
788 .IP
789 IP uses the
790 .I sock_extended_err
791 structure as follows:
792 .I ee_origin
793 is set to
794 .B SO_EE_ORIGIN_ICMP
795 for errors received as an ICMP packet, or
796 .B SO_EE_ORIGIN_LOCAL
797 for locally generated errors.
798 Unknown values should be ignored.
799 .I ee_type
800 and
801 .I ee_code
802 are set from the type and code fields of the ICMP header.
803 .I ee_info
804 contains the discovered MTU for
805 .B EMSGSIZE
806 errors.
807 The message also contains the
.I sockaddr_in
of the node that caused the error, which can be accessed with the
810 .B SO_EE_OFFENDER
811 macro.
812 The
813 .I sin_family
814 field of the
815 .B SO_EE_OFFENDER
816 address is
817 .B AF_UNSPEC
818 when the source was unknown.
819 When the error originated from the network, all IP options
820 .RB ( IP_OPTIONS ", " IP_TTL ", "
821 etc.) enabled on the socket and contained in the
822 error packet are passed as control messages.
823 The payload of the packet causing the error is returned as normal payload.
824 .\" FIXME . Is it a good idea to document that? It is a dubious feature.
825 .\" On
826 .\" .B SOCK_STREAM
827 .\" sockets,
828 .\" .B IP_RECVERR
829 .\" has slightly different semantics. Instead of
830 .\" saving the errors for the next timeout, it passes all incoming
831 .\" errors immediately to the user.
832 .\" This might be useful for very short-lived TCP connections which
833 .\" need fast error handling. Use this option with care:
834 .\" it makes TCP unreliable
835 .\" by not allowing it to recover properly from routing
836 .\" shifts and other normal
837 .\" conditions and breaks the protocol specification.
838 Note that TCP has no error queue;
839 .B MSG_ERRQUEUE
840 is not permitted on
841 .B SOCK_STREAM
842 sockets.
843 .B IP_RECVERR
844 is valid for TCP, but all errors are returned by socket function return or
845 .B SO_ERROR
846 only.
847 .IP
848 For raw sockets,
849 .B IP_RECVERR
850 enables passing of all received ICMP errors to the
application; otherwise, errors are reported only on connected sockets.
852 .IP
853 It sets or retrieves an integer boolean flag.
854 .B IP_RECVERR
855 defaults to off.
856 .TP
857 .BR IP_RECVOPTS " (since Linux 2.2)"
858 .\" Precisely: 2.1.15
Pass all incoming IP options to the user in an
860 .B IP_OPTIONS
861 control message.
862 The routing header and other options are already filled in
863 for the local host.
864 Not supported for
865 .B SOCK_STREAM
866 sockets.
867 .TP
868 .BR IP_RECVORIGDSTADDR " (since Linux 2.6.29)"
869 .\" commit e8b2dfe9b4501ed0047459b2756ba26e5a940a69
870 This boolean option enables the
871 .B IP_ORIGDSTADDR
872 ancillary message in
873 .BR recvmsg (2),
874 in which the kernel returns the original destination address
875 of the datagram being received.
876 The ancillary message contains a
877 .IR "struct sockaddr_in" .
878 .TP
879 .BR IP_RECVTOS " (since Linux 2.2)"
880 .\" Precisely: 2.1.68
881 If enabled, the
882 .B IP_TOS
883 ancillary message is passed with incoming packets.
884 It contains a byte which specifies the Type of Service/Precedence
885 field of the packet header.
886 Expects a boolean integer flag.
887 .TP
888 .BR IP_RECVTTL " (since Linux 2.2)"
889 .\" Precisely: 2.1.68
When this flag is set, pass an
.B IP_TTL
control message with the time-to-live
field of the received packet as a 32-bit integer.
894 Not supported for
895 .B SOCK_STREAM
896 sockets.
897 .TP
898 .BR IP_RETOPTS " (since Linux 2.2)"
899 .\" Precisely: 2.1.15
900 Identical to
901 .BR IP_RECVOPTS ,
902 but returns raw unprocessed options with timestamp and route record
903 options not filled in for this hop.
904 .TP
905 .BR IP_ROUTER_ALERT " (since Linux 2.2)"
906 .\" Precisely: 2.1.68
907 Pass all to-be forwarded packets with the
908 IP Router Alert option set to this socket.
909 Valid only for raw sockets.
910 This is useful, for instance, for user-space RSVP daemons.
911 The tapped packets are not forwarded by the kernel; it is
912 the user's responsibility to send them out again.
Socket binding is ignored;
such packets are filtered only by protocol.
915 Expects an integer flag.
916 .TP
917 .BR IP_TOS " (since Linux 1.0)"
Set or retrieve the Type-Of-Service (TOS) field that is sent
919 with every IP packet originating from this socket.
920 It is used to prioritize packets on the network.
921 TOS is a byte.
922 There are some standard TOS flags defined:
923 .B IPTOS_LOWDELAY
924 to minimize delays for interactive traffic,
925 .B IPTOS_THROUGHPUT
926 to optimize throughput,
927 .B IPTOS_RELIABILITY
to optimize for reliability, and
.B IPTOS_MINCOST
for "filler data" where slow transmission doesn't matter.
931 At most one of these TOS values can be specified.
932 Other bits are invalid and shall be cleared.
933 Linux sends
934 .B IPTOS_LOWDELAY
935 datagrams first by default,
936 but the exact behavior depends on the configured queueing discipline.
937 .\" FIXME elaborate on this
938 Some high-priority levels may require superuser privileges (the
939 .B CAP_NET_ADMIN
940 capability).
941 .\" The priority can also be set in a protocol-independent way by the
942 .\" .RB ( SOL_SOCKET ", " SO_PRIORITY )
943 .\" socket option (see
944 .\" .BR socket (7)).
945 .TP
946 .BR IP_TRANSPARENT " (since Linux 2.6.24)"
947 .\" commit f5715aea4564f233767ea1d944b2637a5fd7cd2e
948 .\" This patch introduces the IP_TRANSPARENT socket option: enabling that
949 .\" will make the IPv4 routing omit the non-local source address check on
950 .\" output. Setting IP_TRANSPARENT requires NET_ADMIN capability.
951 .\" http://lwn.net/Articles/252545/
952 Setting this boolean option enables transparent proxying on this socket.
953 This socket option allows
954 the calling application to bind to a nonlocal IP address and operate
955 both as a client and a server with the foreign address as the local endpoint.
956 NOTE: this requires that routing be set up in a way that
957 packets going to the foreign address are routed through the TProxy box
958 (i.e., the system hosting the application that employs the
959 .B IP_TRANSPARENT
960 socket option).
961 Enabling this socket option requires superuser privileges
962 (the
963 .BR CAP_NET_ADMIN
964 capability).
965 .IP
966 TProxy redirection with the iptables TPROXY target also requires that
967 this option be set on the redirected socket.
968 .TP
969 .BR IP_TTL " (since Linux 1.0)"
970 Set or retrieve the current time-to-live field that is used in every packet
971 sent from this socket.
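.IP
A minimal sketch of setting and reading back the value follows.
The helper names set_ttl and get_ttl are hypothetical;
the argument is assumed to be an integer, as for the other integer-valued options above.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: set the TTL used for packets sent from fd.
   Returns 0 on success, or -1 with errno set. */
int set_ttl(int fd, int ttl)
{
    return setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl));
}

/* Hypothetical helper: return the TTL currently configured on fd,
   or -1 on error. */
int get_ttl(int fd)
{
    int ttl;
    socklen_t len = sizeof(ttl);

    if (getsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, &len) == -1)
        return -1;
    return ttl;
}
```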
972 .TP
973 .BR IP_UNBLOCK_SOURCE " (since Linux 2.4.22 / 2.5.68)"
Unblock a previously blocked multicast source.
Returns
.BR EADDRNOTAVAIL
when the given source is not being blocked.
978 .IP
979 Argument is an
980 .I ip_mreq_source
981 structure as described under
982 .BR IP_ADD_SOURCE_MEMBERSHIP .
983 .SS /proc interfaces
984 The IP protocol
985 supports a set of
986 .I /proc
987 interfaces to configure some global parameters.
988 The parameters can be accessed by reading or writing files in the directory
989 .IR /proc/sys/net/ipv4/ .
990 .\" FIXME As at 2.6.12, 14 Jun 2005, the following are undocumented:
991 .\" ip_queue_maxlen
992 .\" ip_conntrack_max
993 Interfaces described as
994 .I Boolean
995 take an integer value, with a nonzero value ("true") meaning that
996 the corresponding option is enabled, and a zero value ("false")
997 meaning that the option is disabled.
998 .\"
999 .TP
1000 .IR ip_always_defrag " (Boolean; since Linux 2.2.13)"
1001 [New with kernel 2.2.13; in earlier kernel versions this feature
1002 was controlled at compile time by the
1003 .B CONFIG_IP_ALWAYS_DEFRAG
1004 option; this option is not present in 2.4.x and later]
1005 .IP
When this boolean flag is enabled (not equal to 0), incoming fragments
1007 (parts of IP packets
1008 that arose when some host between origin and destination decided
1009 that the packets were too large and cut them into pieces) will be
1010 reassembled (defragmented) before being processed, even if they are
1011 about to be forwarded.
1012 .IP
1013 Enable only if running either a firewall that is the sole link
1014 to your network or a transparent proxy; never ever use it for a
1015 normal router or host.
1016 Otherwise, fragmented communication can be disturbed
1017 if the fragments travel over different links.
1018 Defragmentation also has a large memory and CPU time cost.
1019 .IP
1020 This is automatically turned on when masquerading or transparent
1021 proxying is configured.
1022 .\"
1023 .TP
1024 .IR ip_autoconfig " (Linux 2.2 to 2.6.17)"
1025 .\" Precisely: since 2.1.68
1026 .\" FIXME document ip_autoconfig
1027 Not documented.
1028 .\"
1029 .TP
1030 .IR ip_default_ttl " (integer; default: 64; since Linux 2.2)"
1031 .\" Precisely: 2.1.15
1032 Set the default time-to-live value of outgoing packets.
1033 This can be changed per socket with the
1034 .B IP_TTL
1035 option.
1036 .\"
1037 .TP
1038 .IR ip_dynaddr " (Boolean; default: disabled; since Linux 2.0.31)"
1039 Enable dynamic socket address and masquerading entry rewriting on interface
1040 address change.
1041 This is useful for dialup interfaces with changing IP addresses.
1042 0 means no rewriting, 1 turns it on and 2 enables verbose mode.
1043 .\"
1044 .TP
1045 .IR ip_forward " (Boolean; default: disabled; since Linux 1.2)"
1046 Enable IP forwarding with a boolean flag.
1047 IP forwarding can also be set on a per-interface basis.
1048 .\"
1049 .TP
1050 .IR ip_local_port_range " (since Linux 2.2)"
1051 .\" Precisely: since 2.1.68
1052 This file contains two integers that define the default local port range
1053 allocated to sockets that are not explicitly bound to a port number\(emthat
1054 is, the range used for
1055 .IR "ephemeral ports" .
1056 An ephemeral port is allocated to a socket in the following circumstances:
1057 .RS
1058 .IP * 3
1059 the port number in a socket address is specified as 0 when calling
1060 .BR bind (2);
1061 .IP *
1062 .BR listen (2)
1063 is called on a stream socket that was not previously bound;
1064 .IP *
1065 .BR connect (2)
1066 is called on a socket that was not previously bound;
1067 .IP *
1068 .BR sendto (2)
1069 is called on a datagram socket that was not previously bound.
1070 .RE
1071 .IP
1072 Allocation of ephemeral ports starts with the first number in
1073 .IR ip_local_port_range
1074 and ends with the second number.
1075 If the range of ephemeral ports is exhausted,
1076 then the relevant system call returns an error (but see BUGS).
1077 .IP
1078 Note that the port range in
1079 .IR ip_local_port_range
1080 should not conflict with the ports used by masquerading
1081 (although the case is handled).
1082 Also, arbitrary choices may cause problems with some firewall packet
1083 filters that make assumptions about the local ports in use.
1084 The first number should be greater than 1024,
1085 or better, greater than 4096, to avoid clashes
1086 with well-known ports and to minimize firewall problems.
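.IP
The first of the circumstances listed above can be sketched as follows:
a UDP socket is bound to the loopback address with port 0, and
.BR getsockname (2)
reports the ephemeral port the kernel chose.
The function name
.I ephemeral_port
is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a UDP socket to 127.0.0.1 with port 0 and return the
   ephemeral port assigned by the kernel (host byte order),
   or -1 on error.  Illustrative helper only. */
int
ephemeral_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd, port = -1;

    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);       /* 0 requests an ephemeral port */

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *) &addr, &len) == 0)
        port = ntohs(addr.sin_port);

    close(fd);
    return port;
}
```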
1087 .\"
1088 .TP
1089 .IR ip_no_pmtu_disc " (Boolean; default: disabled; since Linux 2.2)"
1090 .\" Precisely: 2.1.15
1091 If enabled, don't do Path MTU Discovery for TCP sockets by default.
1092 Path MTU discovery may fail if misconfigured firewalls (that drop
1093 all ICMP packets) or misconfigured interfaces (e.g., a point-to-point
1094 link where the two ends don't agree on the MTU) are on the path.
1095 It is better to fix the broken routers on the path than to turn off
1096 Path MTU Discovery globally, because not doing it incurs a high cost
1097 to the network.
1098 .\"
1099 .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
1100 .TP
1101 .IR ip_nonlocal_bind " (Boolean; default: disabled; since Linux 2.4)"
1102 .\" Precisely: patch-2.4.0-test10
1103 If set, allows processes to
1104 .BR bind (2)
1105 to nonlocal IP addresses,
1106 which can be quite useful, but may break some applications.
1107 .\"
1108 .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
1109 .TP
1110 .IR ip6frag_time " (integer; default: 30)"
1111 Time in seconds to keep an IPv6 fragment in memory.
1112 .\"
1113 .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
1114 .TP
1115 .IR ip6frag_secret_interval " (integer; default: 600)"
1116 Regeneration interval (in seconds) of the hash secret (or lifetime
1117 for the hash secret) for IPv6 fragments.
1118 .TP
1119 .IR ipfrag_high_thresh " (integer), " ipfrag_low_thresh " (integer)"
1120 If the amount of queued IP fragments reaches
1121 .IR ipfrag_high_thresh ,
1122 the queue is pruned down to
1123 .IR ipfrag_low_thresh .
1124 Each file contains an integer giving the number of bytes.
1125 .TP
1126 .I neigh/*
1127 See
1128 .BR arp (7).
1129 .\" FIXME Document the conf/*/* interfaces
1130 .\"
1131 .\" FIXME Document the route/* interfaces
1132 .SS Ioctls
1133 All ioctls described in
1134 .BR socket (7)
1135 apply to
1136 .BR ip .
1137 .\" 2006-04-02, mtk
1138 .\" commented out the following because ipchains is obsolete
1139 .\" .PP
1140 .\" The ioctls to configure firewalling are documented in
1141 .\" .BR ipfw (4)
1142 .\" from the
1143 .\" .B ipchains
1144 .\" package.
1145 .PP
1146 Ioctls to configure generic device parameters are described in
1147 .BR netdevice (7).
1148 .\" FIXME Add a discussion of multicasting
1149 .SH ERRORS
1150 .\" FIXME document all errors.
1151 .\" We should really fix the kernels to give more uniform
1152 .\" error returns (ENOMEM vs ENOBUFS, EPERM vs EACCES etc.)
1153 .TP
1154 .B EACCES
1155 The user tried to execute an operation without the necessary permissions.
1156 These include:
1157 sending a packet to a broadcast address without having the
1158 .B SO_BROADCAST
1159 flag set;
1160 sending a packet via a
1161 .I prohibit
1162 route;
1163 modifying firewall settings without superuser privileges (the
1164 .B CAP_NET_ADMIN
1165 capability);
1166 binding to a privileged port without superuser privileges (the
1167 .B CAP_NET_BIND_SERVICE
1168 capability).
1169 .TP
1170 .B EADDRINUSE
1171 Tried to bind to an address already in use.
1172 .TP
1173 .B EADDRNOTAVAIL
1174 A nonexistent interface was requested or the requested source
1175 address was not local.
1176 .TP
1177 .B EAGAIN
1178 Operation on a nonblocking socket would block.
1179 .TP
1180 .B EALREADY
1181 A connection operation on a nonblocking socket is already in progress.
1182 .TP
1183 .B ECONNABORTED
1184 A connection was closed during an
1185 .BR accept (2).
1186 .TP
1187 .B EHOSTUNREACH
1188 No valid routing table entry matches the destination address.
1189 This error can be caused by an ICMP message from a remote router or
1190 by the local routing table.
1191 .TP
1192 .B EINVAL
1193 Invalid argument passed.
1194 For send operations this can be caused by sending to a
1195 .I blackhole
1196 route.
1197 .TP
1198 .B EISCONN
1199 .BR connect (2)
1200 was called on an already connected socket.
1201 .TP
1202 .B EMSGSIZE
1203 Datagram is bigger than an MTU on the path and it cannot be fragmented.
1204 .TP
1205 .BR ENOBUFS ", " ENOMEM
1206 Not enough free memory.
1207 This often means that the memory allocation is limited by the socket
1208 buffer limits, not by the system memory, but this is not 100% consistent.
1209 .TP
1210 .B ENOENT
1211 .B SIOCGSTAMP
1212 was called on a socket where no packet arrived.
1213 .TP
1214 .B ENOPKG
1215 A kernel subsystem was not configured.
1216 .TP
1217 .BR ENOPROTOOPT " and " EOPNOTSUPP
1218 Invalid socket option passed.
1219 .TP
1220 .B ENOTCONN
1221 The operation is defined only on a connected socket, but the socket wasn't
1222 connected.
1223 .TP
1224 .B EPERM
1225 User doesn't have permission to set high priority, change configuration,
1226 or send signals to the requested process or group.
1227 .TP
1228 .B EPIPE
1229 The connection was unexpectedly closed or shut down by the other end.
1230 .TP
1231 .B ESOCKTNOSUPPORT
1232 The socket is not configured or an unknown socket type was requested.
1233 .PP
1234 Other errors may be generated by the overlaying protocols; see
1235 .BR tcp (7),
1236 .BR raw (7),
1237 .BR udp (7),
1238 and
1239 .BR socket (7).
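.PP
As one concrete illustration of these errors, the sketch below provokes
.B EADDRINUSE
by binding two UDP sockets to the same loopback port without setting
.B SO_REUSEADDR
or
.BR SO_REUSEPORT .
The function name
.I second_bind_errno
is hypothetical.

```c
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind two UDP sockets to the same loopback address and port.
   Returns the errno value from the second bind(), 0 if the second
   bind() unexpectedly succeeded, or -1 if the setup failed.
   Illustrative helper only. */
int
second_bind_errno(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int fd1, fd2, err = -1;

    fd1 = socket(AF_INET, SOCK_DGRAM, 0);
    fd2 = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd1 == -1 || fd2 == -1)
        goto out;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);       /* let the kernel pick a port */

    /* First bind picks the port; getsockname() tells us which one. */
    if (bind(fd1, (struct sockaddr *) &addr, sizeof(addr)) == -1 ||
        getsockname(fd1, (struct sockaddr *) &addr, &len) == -1)
        goto out;

    /* Second bind to the same address and port should fail. */
    if (bind(fd2, (struct sockaddr *) &addr, sizeof(addr)) == -1)
        err = errno;
    else
        err = 0;

out:
    if (fd1 != -1)
        close(fd1);
    if (fd2 != -1)
        close(fd2);
    return err;
}
```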
1240 .SH NOTES
1241 .BR IP_FREEBIND ,
1242 .BR IP_MSFILTER ,
1243 .BR IP_MTU ,
1244 .BR IP_MTU_DISCOVER ,
1245 .BR IP_RECVORIGDSTADDR ,
1246 .BR IP_PKTINFO ,
1247 .BR IP_RECVERR ,
1248 .BR IP_ROUTER_ALERT ,
1249 and
1250 .BR IP_TRANSPARENT
1251 are Linux-specific.
1252 .\" IP_PASSSEC is Linux-specific
1253 .\" IP_XFRM_POLICY is Linux-specific
1254 .\" IP_IPSEC_POLICY is a nonstandard extension, also present on some BSDs
1255 .PP
1256 Be very careful with the
1257 .B SO_BROADCAST
1258 option; it is not privileged in Linux.
1259 It is easy to overload the network
1260 with careless broadcasts.
1261 For new application protocols
1262 it is better to use a multicast group instead of broadcasting.
1263 Broadcasting is discouraged.
1264 .PP
1265 Some other BSD sockets implementations provide
1266 .B IP_RCVDSTADDR
1267 and
1268 .B IP_RECVIF
1269 socket options to get the destination address and the interface of
1270 received datagrams.
1271 Linux has the more general
1272 .B IP_PKTINFO
1273 for the same task.
1274 .PP
1275 Some BSD sockets implementations also provide an
1276 .B IP_RECVTTL
1277 option, but an ancillary message with type
1278 .B IP_RECVTTL
1279 is passed with the incoming packet.
1280 This is different from the
1281 .B IP_TTL
1282 option used in Linux.
1283 .PP
1284 Using the
1285 .B SOL_IP
1286 socket options level isn't portable; BSD-based stacks use the
1287 .B IPPROTO_IP
1288 level.
1289 .PP
1290 .B INADDR_ANY
1291 (0.0.0.0) and
1292 .B INADDR_BROADCAST
1293 (255.255.255.255) are byte-order-neutral.
1294 This means
1295 .BR htonl (3)
1296 has no effect on them.
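.PP
A small sketch of that property; the function name
.I wildcards_are_byte_order_neutral
is hypothetical.
The same check would not hold on a little-endian host for a constant such as
.BR INADDR_LOOPBACK ,
which still needs
.BR htonl (3).

```c
#include <arpa/inet.h>
#include <netinet/in.h>

/* Return 1 if htonl() leaves INADDR_ANY and INADDR_BROADCAST
   unchanged, 0 otherwise.  Illustrative helper only. */
int
wildcards_are_byte_order_neutral(void)
{
    return htonl(INADDR_ANY) == INADDR_ANY &&
           htonl(INADDR_BROADCAST) == INADDR_BROADCAST;
}
```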
1297 .SS Compatibility
1298 For compatibility with Linux 2.0, the obsolete
1299 .BI "socket(AF_INET, SOCK_PACKET, " protocol )
1300 syntax is still supported to open a
1301 .BR packet (7)
1302 socket.
1303 This is deprecated and should be replaced by
1304 .BI "socket(AF_PACKET, SOCK_RAW, " protocol )
1305 instead.
1306 The main difference is the new
1307 .I sockaddr_ll
1308 address structure for generic link layer information instead of the old
1309 .IR sockaddr_pkt .
1310 .SH BUGS
1311 There are too many inconsistent error values.
1312 .PP
1313 The error used to diagnose exhaustion of the ephemeral port range differs
1314 across the various system calls
1315 .RB ( connect (2),
1316 .BR bind (2),
1317 .BR listen (2),
1318 .BR sendto (2))
1319 that can assign ephemeral ports.
1320 .PP
1321 The ioctls to configure IP-specific interface options and ARP tables are
1322 not described.
1323 .\" .PP
1324 .\" Some versions of glibc forget to declare
1325 .\" .IR in_pktinfo .
1326 .\" Workaround currently is to copy it into your program from this man page.
1327 .PP
1328 Receiving the original destination address with
1329 .B MSG_ERRQUEUE
1330 in
1331 .I msg_name
1332 by
1333 .BR recvmsg (2)
1334 does not work in some 2.2 kernels.
1335 .\" .SH AUTHORS
1336 .\" This man page was written by Andi Kleen.
1337 .SH SEE ALSO
1338 .BR recvmsg (2),
1339 .BR sendmsg (2),
1340 .BR byteorder (3),
1341 .BR ipfw (4),
1342 .BR capabilities (7),
1343 .BR icmp (7),
1344 .BR ipv6 (7),
1345 .BR netdevice (7),
1346 .BR netlink (7),
1347 .BR raw (7),
1348 .BR socket (7),
1349 .BR tcp (7),
1350 .BR udp (7),
1351 .BR ip (8)
1352 .PP
1353 The kernel source file
1354 .IR Documentation/networking/ip-sysctl.txt .
1355 .PP
1356 RFC\ 791 for the original IP specification.
1357 RFC\ 1122 for the IPv4 host requirements.
1358 RFC\ 1812 for the IPv4 router requirements.