1 '\" t
2 .\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
3 .\"
4 .\" %%%LICENSE_START(VERBATIM_ONE_PARA)
5 .\" Permission is granted to distribute possibly modified copies
6 .\" of this page provided the header is included verbatim,
7 .\" and in case of nontrivial modification author and date
8 .\" of the modification is added to the header.
9 .\" %%%LICENSE_END
10 .\"
11 .\" $Id: ip.7,v 1.19 2000/12/20 18:10:31 ak Exp $
12 .\"
13 .\" FIXME The following socket options are yet to be documented
14 .\" IP_XFRM_POLICY (2.5.48)
15 .\" Needs CAP_NET_ADMIN
16 .\" IP_IPSEC_POLICY (2.5.47)
17 .\" Needs CAP_NET_ADMIN
18 .\" IP_PASSSEC (2.6.17)
19 .\" Boolean
20 .\" commit 2c7946a7bf45ae86736ab3b43d0085e43947945c
21 .\" Author: Catherine Zhang <cxzhang@watson.ibm.com>
22 .\" IP_MINTTL (2.6.34)
23 .\" commit d218d11133d888f9745802146a50255a4781d37a
24 .\" Author: Stephen Hemminger <shemminger@vyatta.com>
25 .\" MCAST_JOIN_GROUP (2.4.22 / 2.6)
26 .\" MCAST_BLOCK_SOURCE (2.4.22 / 2.6)
27 .\" MCAST_UNBLOCK_SOURCE (2.4.22 / 2.6)
28 .\" MCAST_LEAVE_GROUP (2.4.22 / 2.6)
29 .\" MCAST_JOIN_SOURCE_GROUP (2.4.22 / 2.6)
30 .\" MCAST_LEAVE_SOURCE_GROUP (2.4.22 / 2.6)
31 .\" MCAST_MSFILTER (2.4.22 / 2.6)
32 .\" IP_UNICAST_IF (3.4)
33 .\" commit 76e21053b5bf33a07c76f99d27a74238310e3c71
34 .\" Author: Erich E. Hoover <ehoover@mines.edu>
35 .\"
36 .TH IP 7 2016-03-15 "Linux" "Linux Programmer's Manual"
37 .SH NAME
38 ip \- Linux IPv4 protocol implementation
39 .SH SYNOPSIS
40 .B #include <sys/socket.h>
41 .br
42 .\" .B #include <net/netinet.h> -- does not exist anymore
43 .\" .B #include <linux/errqueue.h> -- never include <linux/foo.h>
44 .B #include <netinet/in.h>
45 .br
46 .B #include <netinet/ip.h> \fR/* superset of previous */
47 .sp
48 .IB tcp_socket " = socket(AF_INET, SOCK_STREAM, 0);"
49 .br
50 .IB udp_socket " = socket(AF_INET, SOCK_DGRAM, 0);"
51 .br
52 .IB raw_socket " = socket(AF_INET, SOCK_RAW, " protocol ");"
53 .SH DESCRIPTION
54 Linux implements the Internet Protocol, version 4,
55 described in RFC\ 791 and RFC\ 1122.
56 .B ip
57 contains a level 2 multicasting implementation conforming to RFC\ 1112.
58 It also contains an IP router including a packet filter.
59 .\" FIXME . has someone verified that 2.1 is really 1812 compliant?
60 .PP
61 The programming interface is BSD-sockets compatible.
62 For more information on sockets, see
63 .BR socket (7).
64 .PP
65 An IP socket is created using
66 .BR socket (2):
67
68 socket(AF_INET, socket_type, protocol);
69
70 Valid socket types are
71 .B SOCK_STREAM
72 to open a
73 .BR tcp (7)
74 socket,
75 .B SOCK_DGRAM
76 to open a
77 .BR udp (7)
78 socket, or
79 .B SOCK_RAW
80 to open a
81 .BR raw (7)
82 socket to access the IP protocol directly.
83 .I protocol
84 is the IP protocol in the IP header to be received or sent.
85 The only valid values for
86 .I protocol
87 are 0 and
88 .B IPPROTO_TCP
89 for TCP sockets, and 0 and
90 .B IPPROTO_UDP
91 for UDP sockets.
92 For
93 .B SOCK_RAW
94 you may specify a valid IANA IP protocol defined in
95 RFC\ 1700 assigned numbers.
96 .PP
97 When a process wants to receive new incoming packets or connections, it
98 should bind a socket to a local interface address using
99 .BR bind (2).
100 In this case, only one IP socket may be bound to any given local
101 (address, port) pair.
102 When
103 .B INADDR_ANY
104 is specified in the bind call, the socket will be bound to
105 .I all
106 local interfaces.
107 When
108 .BR listen (2)
109 is called on an unbound socket, the socket is automatically bound
110 to a random free port with the local address set to
111 .BR INADDR_ANY .
112 When
113 .BR connect (2)
114 is called on an unbound socket, the socket is automatically bound
115 to a random free port or to a usable shared port with the local address
116 set to
117 .BR INADDR_ANY .
118
119 A TCP local socket address that has been bound is unavailable for
120 some time after closing, unless the
121 .B SO_REUSEADDR
122 flag has been set.
123 Care should be taken when using this flag as it makes TCP less reliable.
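The binding rules above can be sketched in C as follows. This is a minimal illustration, not a definitive implementation: the helper name make_listener is hypothetical, and the sketch assumes that passing port 0 to bind(2) is acceptable so that the kernel picks a free port.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket bound to all local interfaces
 * (INADDR_ANY) and put it into the listening state.
 * Pass port 0 to let the kernel choose a free port.
 * Returns the descriptor, or -1 on error with errno set. */
int make_listener(uint16_t port)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_ANY);

    /* SO_REUSEADDR lets the address be reused soon after a close. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) == -1 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1 ||
        listen(fd, SOMAXCONN) == -1) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

After a successful call, getsockname(2) reports the port that was actually assigned.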
124 .SS Address format
125 An IP socket address is defined as a combination of an IP interface
126 address and a 16-bit port number.
127 The basic IP protocol does not supply port numbers; they
128 are implemented by higher-level protocols like
129 .BR udp (7)
130 and
131 .BR tcp (7).
132 On raw sockets
133 .I sin_port
134 is set to the IP protocol.
135 .PP
136 .in +4n
137 .nf
138 struct sockaddr_in {
139     sa_family_t    sin_family; /* address family: AF_INET */
140     in_port_t      sin_port;   /* port in network byte order */
141     struct in_addr sin_addr;   /* internet address */
142 };
143
144 /* Internet address. */
145 struct in_addr {
146     uint32_t       s_addr;     /* address in network byte order */
147 };
148 .fi
149 .in
150 .PP
151 .I sin_family
152 is always set to
153 .BR AF_INET .
154 This is required; in Linux 2.2 most networking functions return
155 .B EINVAL
156 when this setting is missing.
157 .I sin_port
158 contains the port in network byte order.
159 The port numbers below 1024 are called
160 .IR "privileged ports"
161 (or sometimes:
162 .IR "reserved ports" ).
163 Only privileged processes (i.e., those having the
164 .B CAP_NET_BIND_SERVICE
165 capability) may
166 .BR bind (2)
167 to these ports.
168 Note that the raw IPv4 protocol as such has no concept of a
169 port; ports are implemented only by higher-level protocols like
170 .BR tcp (7)
171 and
172 .BR udp (7).
173 .PP
174 .I sin_addr
175 is the IP host address.
176 The
177 .I s_addr
178 member of
179 .I struct in_addr
180 contains the host interface address in network byte order.
181 .I in_addr
182 should be assigned one of the
183 .BR INADDR_*
184 values (e.g.,
185 .BR INADDR_ANY )
186 or set using the
187 .BR inet_aton (3),
188 .BR inet_addr (3),
189 .BR inet_makeaddr (3)
190 library functions or directly with the name resolver (see
191 .BR gethostbyname (3)).
192
193 IPv4 addresses are divided into unicast, broadcast
194 and multicast addresses.
195 Unicast addresses specify a single interface of a host,
196 broadcast addresses specify all hosts on a network and multicast
197 addresses address all hosts in a multicast group.
198 Datagrams to broadcast addresses can be sent or received only when the
199 .B SO_BROADCAST
200 socket flag is set.
201 In the current implementation, connection-oriented sockets are allowed
202 to use only unicast addresses.
203 .\" Leave a loophole for XTP @)
204
205 Note that the address and the port are always stored in
206 network byte order.
207 In particular, this means that you need to call
208 .BR htons (3)
209 on the number that is assigned to a port.
210 All address/port manipulation
211 functions in the standard library work in network byte order.
212
213 There are several special addresses:
214 .B INADDR_LOOPBACK
215 (127.0.0.1)
216 always refers to the local host via the loopback device;
217 .B INADDR_ANY
218 (0.0.0.0)
219 means any address for binding;
220 .B INADDR_BROADCAST
221 (255.255.255.255)
222 means any host and has the same effect on bind as
223 .B INADDR_ANY
224 for historical reasons.
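As a minimal sketch of the byte-order rules described in this section, the following C fragment fills in a sockaddr_in from a dotted-quad string and a port given in host byte order. The helper name make_addr is hypothetical and is used only for illustration.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill *out from a dotted-quad string and a port
 * given in host byte order.
 * Returns 0 on success, or -1 if ip is not a valid IPv4 address. */
int make_addr(const char *ip, uint16_t port, struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;          /* always required */
    out->sin_port = htons(port);        /* host to network byte order */

    /* inet_aton() stores the address in network byte order. */
    if (inet_aton(ip, &out->sin_addr) == 0)
        return -1;
    return 0;
}
```

The INADDR_* constants are defined in host byte order, so a direct assignment would be written as htonl(INADDR_LOOPBACK).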
225 .SS Socket options
226 IP supports some protocol-specific socket options that can be set with
227 .BR setsockopt (2)
228 and read with
229 .BR getsockopt (2).
230 The socket option level for IP is
231 .BR IPPROTO_IP .
232 .\" or SOL_IP on Linux
233 A boolean integer flag is zero when it is false, otherwise true.
234
235 When an invalid socket option is specified,
236 .BR getsockopt (2)
237 and
238 .BR setsockopt (2)
239 fail with the error
240 .BR ENOPROTOOPT .
241 .TP
242 .BR IP_ADD_MEMBERSHIP " (since Linux 1.2)"
243 Join a multicast group.
244 Argument is an
245 .I ip_mreqn
246 structure.
247 .sp
248 .in +4n
249 .nf
250 struct ip_mreqn {
251     struct in_addr imr_multiaddr; /* IP multicast group
252                                      address */
253     struct in_addr imr_address;   /* IP address of local
254                                      interface */
255     int            imr_ifindex;   /* interface index */
256 };
257 .fi
258 .in
259 .sp
260 .I imr_multiaddr
261 contains the address of the multicast group the application
262 wants to join or leave.
263 It must be a valid multicast address
264 .\" (i.e., within the 224.0.0.0-239.255.255.255 range)
265 (or
266 .BR setsockopt (2)
267 fails with the error
268 .BR EINVAL ).
269 .I imr_address
270 is the address of the local interface with which the system
271 should join the multicast group; if it is equal to
272 .BR INADDR_ANY ,
273 an appropriate interface is chosen by the system.
274 .I imr_ifindex
275 is the interface index of the interface that should join/leave the
276 .I imr_multiaddr
277 group, or 0 to indicate any interface.
278 .IP
279 The
280 .I ip_mreqn
281 structure is available only since Linux 2.2.
282 For compatibility, the old
283 .I ip_mreq
284 structure (present since Linux 1.2) is still supported;
285 it differs from
286 .I ip_mreqn
287 only by not including the
288 .I imr_ifindex
289 field.
290 (The kernel determines which structure is being passed based
291 on the size passed in
292 .IR optlen .)
293
294 .B IP_ADD_MEMBERSHIP
295 is valid only for
296 .BR setsockopt (2).
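A minimal sketch of joining a group with the ip_mreqn structure might look like this. The helper name join_group is hypothetical; the sketch assumes the group is supplied as a dotted-quad string and leaves the choice of local address to the system.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: join the multicast group named by a dotted-quad
 * string on the interface with index ifindex (0 means any interface).
 * Returns 0 on success, or -1 with errno set; EINVAL indicates a
 * malformed or non-multicast address. */
int join_group(int fd, const char *group, int ifindex)
{
    struct ip_mreqn mreq;

    memset(&mreq, 0, sizeof(mreq));
    if (inet_aton(group, &mreq.imr_multiaddr) == 0) {
        errno = EINVAL;
        return -1;
    }
    mreq.imr_address.s_addr = htonl(INADDR_ANY);  /* system picks */
    mreq.imr_ifindex = ifindex;

    /* optlen tells the kernel that this is an ip_mreqn, not an ip_mreq. */
    return setsockopt(fd, IPPROTO_IP, IP_ADD_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}
```

Passing the same structure with IP_DROP_MEMBERSHIP leaves the group again.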
297 .\"
298 .TP
299 .BR IP_ADD_SOURCE_MEMBERSHIP " (since Linux 2.4.22 / 2.5.68)"
300 Join a multicast group and allow receiving data only
301 from a specified source.
302 Argument is an
303 .I ip_mreq_source
304 structure.
305 .sp
306 .in +4n
307 .nf
308 struct ip_mreq_source {
309     struct in_addr imr_multiaddr;  /* IP multicast group
310                                       address */
311     struct in_addr imr_interface;  /* IP address of local
312                                       interface */
313     struct in_addr imr_sourceaddr; /* IP address of
314                                       multicast source */
315 };
316 .fi
317 .in
318 .sp
319 The
320 .I ip_mreq_source
321 structure is similar to
322 .I ip_mreqn
323 described under
324 .BR IP_ADD_MEMBERSHIP .
325 The
326 .I imr_multiaddr
327 field contains the address of the multicast group the application
328 wants to join or leave.
329 The
330 .I imr_interface
331 field is the address of the local interface with which
332 the system should join the multicast group.
333 Finally, the
334 .I imr_sourceaddr
335 field contains the address of the source the
336 application wants to receive data from.
337 .IP
338 This option can be used multiple times to allow
339 receiving data from more than one source.
340 .TP
341 .BR IP_BIND_ADDRESS_NO_PORT " (since Linux 4.2)"
342 .\" commit 90c337da1524863838658078ec34241f45d8394d
343 Inform the kernel to not reserve an ephemeral port when using
344 .BR bind (2)
345 with a port number of 0.
346 The port will later be automatically chosen at
347 .BR connect (2)
348 time,
349 in a way that allows sharing a source port as long as the 4-tuple is unique.
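The following C sketch shows one way this option might be used: the socket is bound to a source address only, and the port is left for connect(2) to choose. The helper name bind_source_no_port is hypothetical, and the fallback definition of the constant is an assumption for older C library headers (the value 24 is the one used by the Linux kernel headers).

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

#ifndef IP_BIND_ADDRESS_NO_PORT
#define IP_BIND_ADDRESS_NO_PORT 24   /* assumed: value from <linux/in.h> */
#endif

/* Hypothetical helper: create a TCP socket bound to the source address
 * src (dotted quad) with port 0, asking the kernel not to reserve an
 * ephemeral port at bind time.
 * Returns the descriptor, or -1 on error with errno set. */
int bind_source_no_port(const char *src)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = 0;                  /* port chosen at connect time */
    if (inet_aton(src, &addr.sin_addr) == 0) {
        errno = EINVAL;
        return -1;
    }

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return -1;

    if (setsockopt(fd, IPPROTO_IP, IP_BIND_ADDRESS_NO_PORT,
                   &on, sizeof(on)) == -1 ||
        bind(fd, (struct sockaddr *)&addr, sizeof(addr)) == -1) {
        int saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

Until connect(2) is called, getsockname(2) on such a socket is expected to report port 0.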
350 .TP
351 .BR IP_BLOCK_SOURCE " (since Linux 2.4.22 / 2.5.68)"
352 Stop receiving multicast data from a specific source in a given group.
353 This is valid only after the application has subscribed
354 to the multicast group using either
355 .BR IP_ADD_MEMBERSHIP
356 or
357 .BR IP_ADD_SOURCE_MEMBERSHIP .
358 .IP
359 Argument is an
360 .I ip_mreq_source
361 structure as described under
362 .BR IP_ADD_SOURCE_MEMBERSHIP .
363 .TP
364 .BR IP_DROP_MEMBERSHIP " (since Linux 1.2)"
365 Leave a multicast group.
366 Argument is an
367 .I ip_mreqn
368 or
369 .I ip_mreq
370 structure similar to
371 .BR IP_ADD_MEMBERSHIP .
372 .TP
373 .BR IP_DROP_SOURCE_MEMBERSHIP " (since Linux 2.4.22 / 2.5.68)"
374 Leave a source-specific group\(emthat is, stop receiving data from
375 a given multicast group that comes from a given source.
376 If the application has subscribed to multiple sources within
377 the same group, data from the remaining sources will still be delivered.
378 To stop receiving data from all sources at once, use
379 .BR IP_DROP_MEMBERSHIP .
380 .IP
381 Argument is an
382 .I ip_mreq_source
383 structure as described under
384 .BR IP_ADD_SOURCE_MEMBERSHIP .
385 .TP
386 .BR IP_FREEBIND " (since Linux 2.4)"
387 .\" Precisely: 2.4.0-test10
388 If enabled, this boolean option allows binding to an IP address
389 that is nonlocal or does not (yet) exist.
390 This permits listening on a socket,
391 without requiring the underlying network interface or the
392 specified dynamic IP address to be up at the time that
393 the application is trying to bind to it.
394 This option is the per-socket equivalent of the
395 .IR ip_nonlocal_bind
396 .I /proc
397 interface described below.
398 .TP
399 .BR IP_HDRINCL " (since Linux 2.0)"
400 If enabled,
401 the user supplies an IP header in front of the user data.
402 Valid only for
403 .B SOCK_RAW
404 sockets; see
405 .BR raw (7)
406 for more information.
407 When this flag is enabled, the values set by
408 .BR IP_OPTIONS ,
409 .BR IP_TTL ,
410 and
411 .B IP_TOS
412 are ignored.
413 .TP
414 .BR IP_MSFILTER " (since Linux 2.4.22 / 2.5.68)"
415 This option provides access to the advanced full-state filtering API.
416 Argument is an
417 .I ip_msfilter
418 structure.
419 .sp
420 .in +4n
421 .nf
422 struct ip_msfilter {
423     struct in_addr imsf_multiaddr; /* IP multicast group
424                                       address */
425     struct in_addr imsf_interface; /* IP address of local
426                                       interface */
427     uint32_t       imsf_fmode;     /* Filter-mode */
428
429     uint32_t       imsf_numsrc;    /* Number of sources in
430                                       the following array */
431     struct in_addr imsf_slist[1];  /* Array of source
432                                       addresses */
433 };
434 .fi
435 .in
436 .sp
437 There are two macros,
438 .BR MCAST_INCLUDE
439 and
440 .BR MCAST_EXCLUDE ,
441 which can be used to specify the filtering mode.
442 Additionally, the
443 .BR IP_MSFILTER_SIZE (n)
444 macro exists to determine how much memory is needed to store an
445 .I ip_msfilter
446 structure with
447 .I n
448 sources in the source list.
449 .IP
450 For the full description of multicast source filtering
451 refer to RFC 3376.
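As an illustration of how IP_MSFILTER_SIZE is typically used, the sketch below allocates and fills an include-mode filter. The helper name make_include_filter is hypothetical, and the sketch assumes that group and source addresses are supplied as dotted-quad strings.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build an include-mode filter for group that
 * admits only the n listed sources. The caller frees the result.
 * Returns NULL on allocation failure or on a malformed address. */
struct ip_msfilter *make_include_filter(const char *group,
                                        const char *const *sources,
                                        uint32_t n)
{
    /* IP_MSFILTER_SIZE(n) accounts for the n-entry source array. */
    struct ip_msfilter *f = calloc(1, IP_MSFILTER_SIZE(n));
    uint32_t i;

    if (f == NULL)
        return NULL;
    if (inet_aton(group, &f->imsf_multiaddr) == 0)
        goto fail;

    f->imsf_interface.s_addr = htonl(INADDR_ANY);
    f->imsf_fmode = MCAST_INCLUDE;
    f->imsf_numsrc = n;
    for (i = 0; i < n; i++) {
        if (inet_aton(sources[i], &f->imsf_slist[i]) == 0)
            goto fail;
    }
    return f;

fail:
    free(f);
    return NULL;
}
```

The result would then be passed to setsockopt(2) with the option IP_MSFILTER and an option length of IP_MSFILTER_SIZE(n).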
452 .TP
453 .BR IP_MTU " (since Linux 2.2)"
454 .\" Precisely: 2.1.124
455 Retrieve the current known path MTU of the current socket.
456 Returns an integer.
457
458 .B IP_MTU
459 is valid only for
460 .BR getsockopt (2)
461 and can be employed only when the socket has been connected.
462 .TP
463 .BR IP_MTU_DISCOVER " (since Linux 2.2)"
464 .\" Precisely: 2.1.124
465 Set or retrieve the Path MTU Discovery setting for a socket.
466 When enabled, Linux will perform Path MTU Discovery
467 as defined in RFC\ 1191 on
468 .B SOCK_STREAM
469 sockets.
470 For
471 .RB non- SOCK_STREAM
472 sockets,
473 .B IP_PMTUDISC_DO
474 forces the don't-fragment flag to be set on all outgoing packets.
475 It is the user's responsibility to packetize the data
476 in MTU-sized chunks and to do the retransmits if necessary.
477 The kernel will reject (with
478 .BR EMSGSIZE )
479 datagrams that are bigger than the known path MTU.
480 .B IP_PMTUDISC_WANT
481 will fragment a datagram if needed according to the path MTU,
482 or will set the don't-fragment flag otherwise.
483
484 The system-wide default can be toggled between
485 .B IP_PMTUDISC_WANT
486 and
487 .B IP_PMTUDISC_DONT
488 by writing (respectively, zero and nonzero values) to the
489 .I /proc/sys/net/ipv4/ip_no_pmtu_disc
490 file.
491 .TS
492 tab(:);
493 c l
494 l l.
495 Path MTU discovery value:Meaning
496 IP_PMTUDISC_WANT:Use per-route settings.
497 IP_PMTUDISC_DONT:Never do Path MTU Discovery.
498 IP_PMTUDISC_DO:Always do Path MTU Discovery.
499 IP_PMTUDISC_PROBE:Set DF but ignore Path MTU.
500 .TE
501
502 When PMTU discovery is enabled, the kernel automatically keeps track of
503 the path MTU per destination host.
504 When it is connected to a specific peer with
505 .BR connect (2),
506 the currently known path MTU can be retrieved conveniently using the
507 .B IP_MTU
508 socket option (e.g., after an
509 .B EMSGSIZE
510 error occurred).
511 The path MTU may change over time.
512 For connectionless sockets with many destinations,
513 the new MTU for a given destination can also be accessed using the
514 error queue (see
515 .BR IP_RECVERR ).
516 A new error will be queued for every incoming MTU update.
517
518 While MTU discovery is in progress, initial packets from datagram sockets
519 may be dropped.
520 Applications using UDP should be aware of this and not
521 take it into account for their packet retransmit strategy.
522
523 To bootstrap the path MTU discovery process on unconnected sockets, it
524 is possible to start with a big datagram size
525 (up to 64K-headers bytes long) and let it shrink by updates of the path MTU.
526 .\" FIXME . this is an ugly hack
527
528 To get an initial estimate of the
529 path MTU, connect a datagram socket to the destination address using
530 .BR connect (2)
531 and retrieve the MTU by calling
532 .BR getsockopt (2)
533 with the
534 .B IP_MTU
535 option.
536
537 It is possible to implement RFC 4821 MTU probing with
538 .B SOCK_DGRAM
539 or
540 .B SOCK_RAW
541 sockets by setting a value of
542 .BR IP_PMTUDISC_PROBE
543 (available since Linux 2.6.22).
544 This is also particularly useful for diagnostic tools such as
545 .BR tracepath (8)
546 that wish to deliberately send probe packets larger than
547 the observed Path MTU.
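The two options IP_MTU_DISCOVER and IP_MTU can be combined roughly as in the following sketch. The helper names set_pmtu_mode and get_path_mtu are hypothetical and serve only as an illustration.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: select a Path MTU Discovery mode for fd, for
 * example IP_PMTUDISC_DO to set the don't-fragment flag on every
 * outgoing datagram. Returns 0 on success, -1 on error. */
int set_pmtu_mode(int fd, int mode)
{
    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER,
                      &mode, sizeof(mode));
}

/* Hypothetical helper: return the currently known path MTU of a
 * connected socket, or -1 on error (for instance, when the socket has
 * not been connected). */
int get_path_mtu(int fd)
{
    int mtu;
    socklen_t len = sizeof(mtu);

    if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) == -1)
        return -1;
    return mtu;
}
```

A datagram sender might call get_path_mtu() after connect(2), and again after a send fails with EMSGSIZE, to learn the size it should packetize to.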
548 .TP
549 .BR IP_MULTICAST_ALL " (since Linux 2.6.31)"
550 This option can be used to modify the delivery policy of multicast messages
551 to sockets bound to the wildcard
552 .B INADDR_ANY
553 address.
554 The argument is a boolean integer (defaults to 1).
555 If set to 1,
556 the socket will receive messages from all the groups that have been joined
557 globally on the whole system.
558 Otherwise, it will deliver messages only from
559 the groups that have been explicitly joined (for example via the
560 .B IP_ADD_MEMBERSHIP
561 option) on this particular socket.
562 .TP
563 .BR IP_MULTICAST_IF " (since Linux 1.2)"
564 Set the local device for a multicast socket.
565 The argument for
566 .BR setsockopt (2)
567 is an
568 .I ip_mreqn
569 or
570 .\" net: IP_MULTICAST_IF setsockopt now recognizes struct mreq
571 .\" Commit: 3a084ddb4bf299a6e898a9a07c89f3917f0713f7
572 (since Linux 3.5)
573 .I ip_mreq
574 structure similar to
575 .BR IP_ADD_MEMBERSHIP ,
576 or an
577 .I in_addr
578 structure.
579 (The kernel determines which structure is being passed based
580 on the size passed in
581 .IR optlen .)
582 For
583 .BR getsockopt (2),
584 the argument is an
585 .I in_addr
586 structure.
587 .TP
588 .BR IP_MULTICAST_LOOP " (since Linux 1.2)"
589 Set or read a boolean integer argument that determines whether
590 sent multicast packets should be looped back to the local sockets.
591 .TP
592 .BR IP_MULTICAST_TTL " (since Linux 1.2)"
593 Set or read the time-to-live value of outgoing multicast packets for this
594 socket.
595 It is very important for multicast packets to set the smallest TTL possible.
596 The default is 1 which means that multicast packets don't leave the local
597 network unless the user program explicitly requests it.
598 Argument is an integer.
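A sender might set the multicast TTL together with the loopback flag, as in this minimal sketch. The helper name configure_mcast_sender is hypothetical.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: set the TTL of outgoing multicast packets and
 * choose whether they are looped back to local sockets.
 * Returns 0 on success, -1 on error with errno set. */
int configure_mcast_sender(int fd, int ttl, int loop)
{
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL,
                   &ttl, sizeof(ttl)) == -1)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP,
                      &loop, sizeof(loop));
}
```

Keeping ttl at the default of 1 confines the packets to the local network.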
599 .TP
600 .BR IP_NODEFRAG " (since Linux 2.6.36)"
601 If enabled (argument is nonzero),
602 the reassembly of outgoing packets is disabled in the netfilter layer.
603 The argument is an integer.
604
605 This option is valid only for
606 .B SOCK_RAW
607 sockets.
608 .TP
609 .BR IP_OPTIONS " (since Linux 2.0)"
610 .\" Precisely: 1.3.30
611 Set or get the IP options to be sent with every packet from this socket.
612 The arguments are a pointer to a memory buffer containing the options
613 and the option length.
614 The
615 .BR setsockopt (2)
616 call sets the IP options associated with a socket.
617 The maximum option size for IPv4 is 40 bytes.
618 See RFC\ 791 for the allowed options.
619 When the initial connection request packet for a
620 .B SOCK_STREAM
621 socket contains IP options, the IP options will be set automatically
622 to the options from the initial packet with routing headers reversed.
623 Incoming packets are not allowed to change options after the connection
624 is established.
625 The processing of all incoming source routing options
626 is disabled by default and can be enabled by using the
627 .I accept_source_route
628 .I /proc
629 interface.
630 Other options like timestamps are still handled.
631 For datagram sockets, IP options can be set only by the local user.
632 Calling
633 .BR getsockopt (2)
634 with
635 .B IP_OPTIONS
636 puts the current IP options used for sending into the supplied buffer.
637 .TP
638 .BR IP_PKTINFO " (since Linux 2.2)"
639 .\" Precisely: 2.1.68
640 Pass an
641 .B IP_PKTINFO
642 ancillary message that contains a
643 .I in_pktinfo
644 structure that supplies some information about the incoming packet.
645 This works only for datagram-oriented sockets.
646 The argument is a flag that tells the socket whether the
647 .B IP_PKTINFO
648 message should be passed or not.
649 The message itself can only be sent/retrieved
650 as control message with a packet using
651 .BR recvmsg (2)
652 or
653 .BR sendmsg (2).
654 .IP
655 .in +4n
656 .nf
657 struct in_pktinfo {
658     unsigned int   ipi_ifindex;  /* Interface index */
659     struct in_addr ipi_spec_dst; /* Local address */
660     struct in_addr ipi_addr;     /* Header Destination
661                                     address */
662 };
663 .fi
664 .in
665 .IP
666 .\" FIXME . elaborate on that.
667 .I ipi_ifindex
668 is the unique index of the interface the packet was received on.
669 .I ipi_spec_dst
670 is the local address of the packet and
671 .I ipi_addr
672 is the destination address in the packet header.
673 If
674 .B IP_PKTINFO
675 is passed to
676 .BR sendmsg (2)
677 and
678 .\" This field is grossly misnamed
679 .I ipi_spec_dst
680 is not zero, then it is used as the local source address for the routing
681 table lookup and for setting up IP source route options.
682 When
683 .I ipi_ifindex
684 is not zero, the primary local address of the interface specified by the
685 index overwrites
686 .I ipi_spec_dst
687 for the routing table lookup.
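Retrieving the ancillary message on the receiving side can be sketched as follows. The helper names enable_pktinfo and recv_with_pktinfo are hypothetical; the control-message handling uses the standard CMSG_* macros described in cmsg(3).

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: ask the kernel to attach IP_PKTINFO messages. */
int enable_pktinfo(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof(on));
}

/* Hypothetical helper: receive one datagram into buf and copy the
 * accompanying in_pktinfo into *info. Returns the datagram length, or
 * -1 on error or if no IP_PKTINFO message was attached. */
ssize_t recv_with_pktinfo(int fd, void *buf, size_t len,
                          struct in_pktinfo *info)
{
    union {                     /* ensures cmsghdr alignment */
        char buf[CMSG_SPACE(sizeof(struct in_pktinfo))];
        struct cmsghdr align;
    } ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    struct cmsghdr *c;
    ssize_t n;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    n = recvmsg(fd, &msg, 0);
    if (n == -1)
        return -1;

    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            memcpy(info, CMSG_DATA(c), sizeof(*info));
            return n;
        }
    }
    return -1;
}
```

A server bound to INADDR_ANY can use info->ipi_addr to learn which local address a datagram was sent to.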
688 .TP
689 .BR IP_RECVERR " (since Linux 2.2)"
690 .\" Precisely: 2.1.15
691 Enable extended reliable error message passing.
692 When enabled on a datagram socket, all
693 generated errors will be queued in a per-socket error queue.
694 When the user receives an error from a socket operation,
695 the errors can be received by calling
696 .BR recvmsg (2)
697 with the
698 .B MSG_ERRQUEUE
699 flag set.
700 The
701 .I sock_extended_err
702 structure describing the error will be passed in an ancillary message with
703 the type
704 .B IP_RECVERR
705 and the level
706 .BR IPPROTO_IP .
707 .\" or SOL_IP on Linux
708 This is useful for reliable error handling on unconnected sockets.
709 The received data portion of the error queue contains the error packet.
710 .IP
711 The
712 .B IP_RECVERR
713 control message contains a
714 .I sock_extended_err
715 structure:
716 .IP
717 .in +4n
718 .ne 18
719 .nf
720 #define SO_EE_ORIGIN_NONE    0
721 #define SO_EE_ORIGIN_LOCAL   1
722 #define SO_EE_ORIGIN_ICMP    2
723 #define SO_EE_ORIGIN_ICMP6   3
724
725 struct sock_extended_err {
726     uint32_t ee_errno;   /* error number */
727     uint8_t  ee_origin;  /* where the error originated */
728     uint8_t  ee_type;    /* type */
729     uint8_t  ee_code;    /* code */
730     uint8_t  ee_pad;
731     uint32_t ee_info;    /* additional information */
732     uint32_t ee_data;    /* other data */
733     /* More data may follow */
734 };
735
736 struct sockaddr *SO_EE_OFFENDER(struct sock_extended_err *);
737 .fi
738 .in
739 .IP
740 .I ee_errno
741 contains the
742 .I errno
743 number of the queued error.
744 .I ee_origin
745 is the origin code of where the error originated.
746 The other fields are protocol-specific.
747 The macro
748 .B SO_EE_OFFENDER
749 returns a pointer to the address of the network object
750 where the error originated from given a pointer to the ancillary message.
751 If this address is not known, the
752 .I sa_family
753 member of the
754 .I sockaddr
755 contains
756 .B AF_UNSPEC
757 and the other fields of the
758 .I sockaddr
759 are undefined.
760 .IP
761 IP uses the
762 .I sock_extended_err
763 structure as follows:
764 .I ee_origin
765 is set to
766 .B SO_EE_ORIGIN_ICMP
767 for errors received as an ICMP packet, or
768 .B SO_EE_ORIGIN_LOCAL
769 for locally generated errors.
770 Unknown values should be ignored.
771 .I ee_type
772 and
773 .I ee_code
774 are set from the type and code fields of the ICMP header.
775 .I ee_info
776 contains the discovered MTU for
777 .B EMSGSIZE
778 errors.
779 The message also contains the
780 .I sockaddr_in
781 of the node that caused the error, which can be accessed with the
782 .B SO_EE_OFFENDER
783 macro.
784 The
785 .I sin_family
786 field of the
787 .B SO_EE_OFFENDER
788 address is
789 .B AF_UNSPEC
790 when the source was unknown.
791 When the error originated from the network, all IP options
792 .RB ( IP_OPTIONS ", " IP_TTL ", "
793 etc.) enabled on the socket and contained in the
794 error packet are passed as control messages.
795 The payload of the packet causing the error is returned as normal payload.
796 .\" FIXME . Is it a good idea to document that? It is a dubious feature.
797 .\" On
798 .\" .B SOCK_STREAM
799 .\" sockets,
800 .\" .B IP_RECVERR
801 .\" has slightly different semantics. Instead of
802 .\" saving the errors for the next timeout, it passes all incoming
803 .\" errors immediately to the user.
804 .\" This might be useful for very short-lived TCP connections which
805 .\" need fast error handling. Use this option with care:
806 .\" it makes TCP unreliable
807 .\" by not allowing it to recover properly from routing
808 .\" shifts and other normal
809 .\" conditions and breaks the protocol specification.
810 Note that TCP has no error queue;
811 .B MSG_ERRQUEUE
812 is not permitted on
813 .B SOCK_STREAM
814 sockets.
815 .B IP_RECVERR
816 is valid for TCP, but all errors are returned by socket function return or
817 .B SO_ERROR
818 only.
819 .IP
820 For raw sockets,
821 .B IP_RECVERR
822 enables passing of all received ICMP errors to the
823 application; otherwise, errors are reported only on connected sockets.
824 .IP
825 It sets or retrieves an integer boolean flag.
826 .B IP_RECVERR
827 defaults to off.
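Draining the error queue of a datagram socket can be sketched as follows. The helper names enable_recverr and read_queued_error are hypothetical. The sketch assumes that including <linux/errqueue.h> is acceptable in order to obtain the sock_extended_err definition and the SO_EE_OFFENDER macro.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <linux/errqueue.h>

/* Hypothetical helper: turn on queuing of extended errors. */
int enable_recverr(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on));
}

/* Hypothetical helper: dequeue one error from the error queue of fd.
 * On success, copies the error into *ee and the offender address into
 * *offender (sin_family is AF_UNSPEC if unknown), and returns 0.
 * Returns -1 with errno set; EAGAIN means the queue is empty. */
int read_queued_error(int fd, struct sock_extended_err *ee,
                      struct sockaddr_in *offender)
{
    char payload[512];          /* data of the packet that failed */
    union {                     /* ensures cmsghdr alignment */
        char buf[512];
        struct cmsghdr align;
    } ctl;
    struct iovec iov = { .iov_base = payload, .iov_len = sizeof(payload) };
    struct msghdr msg;
    struct cmsghdr *c;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) == -1)
        return -1;

    for (c = CMSG_FIRSTHDR(&msg); c != NULL; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_RECVERR) {
            struct sock_extended_err *e =
                (struct sock_extended_err *)CMSG_DATA(c);

            memcpy(ee, e, sizeof(*ee));
            memcpy(offender, SO_EE_OFFENDER(e), sizeof(*offender));
            return 0;
        }
    }
    errno = ENOMSG;             /* no IP_RECVERR message was attached */
    return -1;
}
```

For an EMSGSIZE error, ee->ee_info would hold the discovered MTU.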
828 .TP
829 .BR IP_RECVOPTS " (since Linux 2.2)"
830 .\" Precisely: 2.1.15
831 Pass all incoming IP options to the user in an
832 .B IP_OPTIONS
833 control message.
834 The routing header and other options are already filled in
835 for the local host.
836 Not supported for
837 .B SOCK_STREAM
838 sockets.
839 .TP
840 .BR IP_RECVORIGDSTADDR " (since Linux 2.6.29)"
841 .\" commit e8b2dfe9b4501ed0047459b2756ba26e5a940a69
842 This boolean option enables the
843 .B IP_ORIGDSTADDR
844 ancillary message in
845 .BR recvmsg (2),
846 in which the kernel returns the original destination address
847 of the datagram being received.
848 The ancillary message contains a
849 .IR "struct sockaddr_in" .
850 .TP
851 .BR IP_RECVTOS " (since Linux 2.2)"
852 .\" Precisely: 2.1.68
853 If enabled, the
854 .B IP_TOS
855 ancillary message is passed with incoming packets.
856 It contains a byte which specifies the Type of Service/Precedence
857 field of the packet header.
858 Expects a boolean integer flag.
859 .TP
860 .BR IP_RECVTTL " (since Linux 2.2)"
861 .\" Precisely: 2.1.68
862 When this flag is set, pass an
863 .B IP_TTL
864 control message with the time-to-live
865 field of the received packet as a byte.
866 Not supported for
867 .B SOCK_STREAM
868 sockets.
869 .TP
870 .BR IP_RETOPTS " (since Linux 2.2)"
871 .\" Precisely: 2.1.15
872 Identical to
873 .BR IP_RECVOPTS ,
874 but returns raw unprocessed options with timestamp and route record
875 options not filled in for this hop.
876 .TP
877 .BR IP_ROUTER_ALERT " (since Linux 2.2)"
878 .\" Precisely: 2.1.68
879 Pass all to-be forwarded packets with the
880 IP Router Alert option set to this socket.
881 Valid only for raw sockets.
882 This is useful, for instance, for user-space RSVP daemons.
883 The tapped packets are not forwarded by the kernel; it is
884 the user's responsibility to send them out again.
885 Socket binding is ignored;
886 such packets are filtered only by protocol.
887 Expects an integer flag.
888 .TP
889 .BR IP_TOS " (since Linux 1.0)"
890 Set or retrieve the Type-Of-Service (TOS) field that is sent
891 with every IP packet originating from this socket.
892 It is used to prioritize packets on the network.
893 TOS is a byte.
894 There are some standard TOS flags defined:
895 .B IPTOS_LOWDELAY
896 to minimize delays for interactive traffic,
897 .B IPTOS_THROUGHPUT
898 to optimize throughput,
899 .B IPTOS_RELIABILITY
900 to optimize for reliability,
901 .B IPTOS_MINCOST
902 should be used for "filler data" where slow transmission doesn't matter.
903 At most one of these TOS values can be specified.
904 Other bits are invalid and shall be cleared.
905 Linux sends
906 .B IPTOS_LOWDELAY
907 datagrams first by default,
908 but the exact behavior depends on the configured queueing discipline.
909 .\" FIXME elaborate on this
910 Some high-priority levels may require superuser privileges (the
911 .B CAP_NET_ADMIN
912 capability).
913 .\" The priority can also be set in a protocol-independent way by the
914 .\" .RB ( SOL_SOCKET ", " SO_PRIORITY )
915 .\" socket option (see
916 .\" .BR socket (7)).
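Setting and reading back the TOS byte might look like the following sketch. The helper names set_tos and get_tos are hypothetical; the IPTOS_* constants come from <netinet/ip.h>.

```c
#define _GNU_SOURCE
#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>

/* Hypothetical helper: set the TOS byte for packets sent from fd, for
 * example IPTOS_LOWDELAY for interactive traffic.
 * Returns 0 on success, -1 on error with errno set. */
int set_tos(int fd, int tos)
{
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Hypothetical helper: return the TOS byte currently configured on fd,
 * or -1 on error. */
int get_tos(int fd)
{
    int tos;
    socklen_t len = sizeof(tos);

    if (getsockopt(fd, IPPROTO_IP, IP_TOS, &tos, &len) == -1)
        return -1;
    return tos;
}
```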
917 .TP
918 .BR IP_TRANSPARENT " (since Linux 2.6.24)"
919 .\" commit f5715aea4564f233767ea1d944b2637a5fd7cd2e
920 .\" This patch introduces the IP_TRANSPARENT socket option: enabling that
921 .\" will make the IPv4 routing omit the non-local source address check on
922 .\" output. Setting IP_TRANSPARENT requires NET_ADMIN capability.
923 .\" http://lwn.net/Articles/252545/
924 Setting this boolean option enables transparent proxying on this socket.
925 This socket option allows
926 the calling application to bind to a nonlocal IP address and operate
927 both as a client and a server with the foreign address as the local endpoint.
928 NOTE: this requires that routing be set up in a way that
929 packets going to the foreign address are routed through the TProxy box
930 (i.e., the system hosting the application that employs the
931 .B IP_TRANSPARENT
932 socket option).
933 Enabling this socket option requires superuser privileges
934 (the
935 .BR CAP_NET_ADMIN
936 capability).
937 .IP
938 TProxy redirection with the iptables TPROXY target also requires that
939 this option be set on the redirected socket.
940 .TP
941 .BR IP_TTL " (since Linux 1.0)"
942 Set or retrieve the current time-to-live field that is used in every packet
943 sent from this socket.
944 .TP
945 .BR IP_UNBLOCK_SOURCE " (since Linux 2.4.22 / 2.5.68)"
946 Unblock a previously blocked multicast source.
947 Returns
948 .BR EADDRNOTAVAIL
949 when the given source is not being blocked.
950 .IP
951 Argument is an
952 .I ip_mreq_source
953 structure as described under
954 .BR IP_ADD_SOURCE_MEMBERSHIP .
955 .SS /proc interfaces
956 The IP protocol
957 supports a set of
958 .I /proc
959 interfaces to configure some global parameters.
960 The parameters can be accessed by reading or writing files in the directory
961 .IR /proc/sys/net/ipv4/ .
962 .\" FIXME As at 2.6.12, 14 Jun 2005, the following are undocumented:
963 .\" ip_queue_maxlen
964 .\" ip_conntrack_max
965 Interfaces described as
966 .I Boolean
967 take an integer value, with a nonzero value ("true") meaning that
968 the corresponding option is enabled, and a zero value ("false")
969 meaning that the option is disabled.
970 .\"
971 .TP
972 .IR ip_always_defrag " (Boolean; since Linux 2.2.13)"
973 [New with kernel 2.2.13; in earlier kernel versions this feature
974 was controlled at compile time by the
975 .B CONFIG_IP_ALWAYS_DEFRAG
976 option; this option is not present in 2.4.x and later]
977
978 When this boolean flag is enabled (not equal to 0), incoming fragments
979 (parts of IP packets
980 that arose when some host between origin and destination decided
981 that the packets were too large and cut them into pieces) will be
982 reassembled (defragmented) before being processed, even if they are
983 about to be forwarded.
984
985 Only enable if running either a firewall that is the sole link
986 to your network or a transparent proxy; never ever use it for a
987 normal router or host.
988 Otherwise, fragmented communication can be disturbed
989 if the fragments travel over different links.
990 Defragmentation also has a large memory and CPU time cost.
991
992 This is automatically turned on when masquerading or transparent
993 proxying is configured.
994 .\"
995 .TP
996 .IR ip_autoconfig " (Linux 2.2 to 2.6.17)"
997 .\" Precisely: since 2.1.68
998 .\" FIXME document ip_autoconfig
999 Not documented.
1000 .\"
1001 .TP
1002 .IR ip_default_ttl " (integer; default: 64; since Linux 2.2)"
1003 .\" Precisely: 2.1.15
1004 Set the default time-to-live value of outgoing packets.
1005 This can be changed per socket with the
1006 .B IP_TTL
1007 option.
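.IP
A minimal sketch of reading this value from a program follows.
It assumes that the file holds a single decimal integer;
the helper name
.I read_ipv4_sysctl_int
is hypothetical.
.IP
.in +4n
.nf
```c
#include <stdio.h>

/* Read a single integer from the named file under /proc/sys/net/ipv4/.
   Returns 0 on success and stores the value in *value; -1 on error. */
int
read_ipv4_sysctl_int(const char *name, int *value)
{
    char path[256];
    FILE *fp;
    int n, ok;

    n = snprintf(path, sizeof(path), "/proc/sys/net/ipv4/%s", name);
    if (n < 0 || n >= (int) sizeof(path))
        return -1;

    fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    ok = (fscanf(fp, "%d", value) == 1);
    fclose(fp);
    return ok ? 0 : -1;
}
```
.fi
.in
.IP
For example, calling the helper with the name
.I ip_default_ttl
would be expected to yield the system-wide default TTL.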
1008 .\"
1009 .TP
1010 .IR ip_dynaddr " (Boolean; default: disabled; since Linux 2.0.31)"
1011 Enable dynamic socket address and masquerading entry rewriting on interface
1012 address change.
1013 This is useful for dialup interfaces with changing IP addresses.
1014 0 means no rewriting, 1 turns it on and 2 enables verbose mode.
1015 .\"
1016 .TP
1017 .IR ip_forward " (Boolean; default: disabled; since Linux 1.2)"
1018 Enable IP forwarding with a boolean flag.
1019 IP forwarding can also be set on a per-interface basis.
1020 .\"
1021 .TP
1022 .IR ip_local_port_range " (since Linux 2.2)"
1023 .\" Precisely: since 2.1.68
1024 This file contains two integers that define the default local port range
1025 allocated to sockets that are not explicitly bound to a port number\(emthat
1026 is, the range used for
1027 .IR "ephemeral ports" .
1028 An ephemeral port is allocated to a socket in the following circumstances:
1029 .RS
1030 .IP * 3
1031 the port number in a socket address is specified as 0 when calling
1032 .BR bind (2);
1033 .IP *
1034 .BR listen (2)
1035 is called on a stream socket that was not previously bound;
1036 .IP *
1037 .BR connect (2)
1038 is called on a socket that was not previously bound;
1039 .IP *
1040 .BR sendto (2)
1041 is called on a datagram socket that was not previously bound.
1042 .RE
1043 .IP
1044 Allocation of ephemeral ports starts with the first number in
1045 .IR ip_local_port_range
1046 and ends with the second number.
1047 If the range of ephemeral ports is exhausted,
1048 then the relevant system call returns an error (but see BUGS).
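.IP
The first circumstance listed above can be sketched as follows:
a datagram socket is bound with port 0, and
.BR getsockname (2)
reports the ephemeral port that was chosen.
The function name
.I pick_ephemeral_port
is hypothetical; binding to
.B INADDR_ANY
is an assumption made to keep the example independent of interface state.
.IP
.in +4n
.nf
```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a UDP socket to port 0 and return the ephemeral port chosen
   by the kernel (in host byte order), or -1 on error. */
int
pick_ephemeral_port(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    int port = -1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd == -1)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(0);    /* 0 asks for an ephemeral port */

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) == 0 &&
        getsockname(fd, (struct sockaddr *) &addr, &len) == 0)
        port = ntohs(addr.sin_port);

    close(fd);    /* the port is released again here */
    return port;
}
```
.fi
.in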
1049 .IP
1050 Note that the port range in
1051 .IR ip_local_port_range
1052 should not conflict with the ports used by masquerading
1053 (although the case is handled).
1054 Also, arbitrary choices may cause problems with some firewall packet
1055 filters that make assumptions about the local ports in use.
1056 The first number should be at least greater than 1024,
1057 or better, greater than 4096, to avoid clashes
1058 with well-known ports and to minimize firewall problems.
1059 .\"
1060 .TP
1061 .IR ip_no_pmtu_disc " (Boolean; default: disabled; since Linux 2.2)"
1062 .\" Precisely: 2.1.15
1063 If enabled, don't do Path MTU Discovery for TCP sockets by default.
1064 Path MTU discovery may fail if misconfigured firewalls (that drop
1065 all ICMP packets) or misconfigured interfaces (e.g., a point-to-point
1066 link where both ends don't agree on the MTU) are on the path.
1067 It is better to fix the broken routers on the path than to turn off
1068 Path MTU Discovery globally, because not doing it incurs a high cost
1069 to the network.
1070 .\"
1071 .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
1072 .TP
1073 .IR ip_nonlocal_bind " (Boolean; default: disabled; since Linux 2.4)"
1074 .\" Precisely: patch-2.4.0-test10
1075 If set, allows processes to
1076 .BR bind (2)
1077 to nonlocal IP addresses,
1078 which can be quite useful, but may break some applications.
1079 .\"
1080 .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
1081 .TP
1082 .IR ip6frag_time " (integer; default: 30)"
1083 Time in seconds to keep an IPv6 fragment in memory.
1084 .\"
1085 .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
1086 .TP
1087 .IR ip6frag_secret_interval " (integer; default: 600)"
1088 Regeneration interval (in seconds) of the hash secret (or lifetime
1089 for the hash secret) for IPv6 fragments.
1090 .TP
1091 .IR ipfrag_high_thresh " (integer), " ipfrag_low_thresh " (integer)"
1092 If the amount of queued IP fragments reaches
1093 .IR ipfrag_high_thresh ,
1094 the queue is pruned down to
1095 .IR ipfrag_low_thresh .
1096 Each file contains an integer giving the number of bytes.
1097 .TP
1098 .I neigh/*
1099 See
1100 .BR arp (7).
1101 .\" FIXME Document the conf/*/* interfaces
1102 .\"
1103 .\" FIXME Document the route/* interfaces
1104 .SS Ioctls
1105 All ioctls described in
1106 .BR socket (7)
1107 apply to
1108 .BR ip .
1109 .\" 2006-04-02, mtk
1110 .\" commented out the following because ipchains is obsolete
1111 .\" .PP
1112 .\" The ioctls to configure firewalling are documented in
1113 .\" .BR ipfw (4)
1114 .\" from the
1115 .\" .B ipchains
1116 .\" package.
1117 .PP
1118 Ioctls to configure generic device parameters are described in
1119 .BR netdevice (7).
1120 .\" FIXME Add a discussion of multicasting
1121 .SH ERRORS
1122 .\" FIXME document all errors.
1123 .\" We should really fix the kernels to give more uniform
1124 .\" error returns (ENOMEM vs ENOBUFS, EPERM vs EACCES etc.)
1125 .TP
1126 .B EACCES
1127 The user tried to execute an operation without the necessary permissions.
1128 These include:
1129 sending a packet to a broadcast address without having the
1130 .B SO_BROADCAST
1131 flag set;
1132 sending a packet via a
1133 .I prohibit
1134 route;
1135 modifying firewall settings without superuser privileges (the
1136 .B CAP_NET_ADMIN
1137 capability);
1138 binding to a privileged port without superuser privileges (the
1139 .B CAP_NET_BIND_SERVICE
1140 capability).
1141 .TP
1142 .B EADDRINUSE
1143 Tried to bind to an address already in use.
1144 .TP
1145 .B EADDRNOTAVAIL
1146 A nonexistent interface was requested or the requested source
1147 address was not local.
1148 .TP
1149 .B EAGAIN
1150 Operation on a nonblocking socket would block.
1151 .TP
1152 .B EALREADY
1153 A connection operation on a nonblocking socket is already in progress.
1154 .TP
1155 .B ECONNABORTED
1156 A connection was closed during an
1157 .BR accept (2).
1158 .TP
1159 .B EHOSTUNREACH
1160 No valid routing table entry matches the destination address.
1161 This error can be caused by an ICMP message from a remote router or
1162 by the local routing table.
1163 .TP
1164 .B EINVAL
1165 Invalid argument passed.
1166 For send operations this can be caused by sending to a
1167 .I blackhole
1168 route.
1169 .TP
1170 .B EISCONN
1171 .BR connect (2)
1172 was called on an already connected socket.
1173 .TP
1174 .B EMSGSIZE
1175 Datagram is bigger than an MTU on the path and it cannot be fragmented.
1176 .TP
1177 .BR ENOBUFS ", " ENOMEM
1178 Not enough free memory.
1179 This often means that the memory allocation is limited by the socket
1180 buffer limits, not by the system memory, but this is not 100% consistent.
1181 .TP
1182 .B ENOENT
1183 .B SIOCGSTAMP
1184 was called on a socket where no packet arrived.
1185 .TP
1186 .B ENOPKG
1187 A kernel subsystem was not configured.
1188 .TP
1189 .BR ENOPROTOOPT " and " EOPNOTSUPP
1190 Invalid socket option passed.
1191 .TP
1192 .B ENOTCONN
1193 The operation is defined only on a connected socket, but the socket wasn't
1194 connected.
1195 .TP
1196 .B EPERM
1197 User doesn't have permission to set high priority, change configuration,
1198 or send signals to the requested process or group.
1199 .TP
1200 .B EPIPE
1201 The connection was unexpectedly closed or shut down by the other end.
1202 .TP
1203 .B ESOCKTNOSUPPORT
1204 The socket is not configured or an unknown socket type was requested.
1205 .PP
1206 Other errors may be generated by the overlaying protocols; see
1207 .BR tcp (7),
1208 .BR raw (7),
1209 .BR udp (7),
1210 and
1211 .BR socket (7).
1212 .SH NOTES
1213 .BR IP_FREEBIND ,
1214 .BR IP_MSFILTER ,
1215 .BR IP_MTU ,
1216 .BR IP_MTU_DISCOVER ,
1217 .BR IP_RECVORIGDSTADDR ,
1218 .BR IP_PKTINFO ,
1219 .BR IP_RECVERR ,
1220 .BR IP_ROUTER_ALERT ,
1221 and
1222 .BR IP_TRANSPARENT
1223 are Linux-specific.
1224 .\" IP_PASSSEC is Linux-specific
1225 .\" IP_XFRM_POLICY is Linux-specific
1226 .\" IP_IPSEC_POLICY is a nonstandard extension, also present on some BSDs
1227
1228 Be very careful with the
1229 .B SO_BROADCAST
1230 option \- it is not privileged in Linux.
1231 It is easy to overload the network
1232 with careless broadcasts.
1233 For new application protocols
1234 it is better to use a multicast group instead of broadcasting.
1235 Broadcasting is discouraged.
1236 .PP
1237 Some other BSD sockets implementations provide
1238 .B IP_RCVDSTADDR
1239 and
1240 .B IP_RECVIF
1241 socket options to get the destination address and the interface of
1242 received datagrams.
1243 Linux has the more general
1244 .B IP_PKTINFO
1245 for the same task.
1246 .PP
1247 Some BSD sockets implementations also provide an
1248 .B IP_RECVTTL
1249 option, but an ancillary message with type
1250 .B IP_RECVTTL
1251 is passed with the incoming packet.
1252 This is different from the
1253 .B IP_TTL
1254 option used in Linux.
1255 .PP
1256 Using the
1257 .B SOL_IP
1258 socket options level isn't portable; BSD-based stacks use the
1259 .B IPPROTO_IP
1260 level.
1261 .SS Compatibility
1262 For compatibility with Linux 2.0, the obsolete
1263 .BI "socket(AF_INET, SOCK_PACKET, " protocol )
1264 syntax is still supported to open a
1265 .BR packet (7)
1266 socket.
1267 This is deprecated and should be replaced by
1268 .BI "socket(AF_PACKET, SOCK_RAW, " protocol )
1269 instead.
1270 The main difference is the new
1271 .I sockaddr_ll
1272 address structure for generic link layer information instead of the old
1273 .BR sockaddr_pkt .
1274 .SH BUGS
1275 There are too many inconsistent error values.
1276 .PP
1277 The error used to diagnose exhaustion of the ephemeral port range differs
1278 across the various system calls
1279 .RB ( connect (2),
1280 .BR bind (2),
1281 .BR listen (2),
1282 .BR sendto (2))
1283 that can assign ephemeral ports.
1284 .PP
1285 The ioctls to configure IP-specific interface options and ARP tables are
1286 not described.
1287 .\" .PP
1288 .\" Some versions of glibc forget to declare
1289 .\" .IR in_pktinfo .
1290 .\" Workaround currently is to copy it into your program from this man page.
1291 .PP
1292 Receiving the original destination address with
1293 .B MSG_ERRQUEUE
1294 in
1295 .I msg_name
1296 by
1297 .BR recvmsg (2)
1298 does not work in some 2.2 kernels.
1299 .\" .SH AUTHORS
1300 .\" This man page was written by Andi Kleen.
1301 .SH SEE ALSO
1302 .BR recvmsg (2),
1303 .BR sendmsg (2),
1304 .BR byteorder (3),
1305 .BR ipfw (4),
1306 .BR capabilities (7),
1307 .BR icmp (7),
1308 .BR ipv6 (7),
1309 .BR netlink (7),
1310 .BR raw (7),
1311 .BR socket (7),
1312 .BR tcp (7),
1313 .BR udp (7)
1314 .PP
1315 RFC\ 791 for the original IP specification.
1316 RFC\ 1122 for the IPv4 host requirements.
1317 RFC\ 1812 for the IPv4 router requirements.