2 .\" Don't change the line above. It tells man that tbl is needed.
3 .\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
4 .\" Permission is granted to distribute possibly modified copies
5 .\" of this page provided the header is included verbatim,
6 .\" and in case of nontrivial modification author and date
7 .\" of the modification is added to the header.
8 .\" $Id: ip.7,v 1.19 2000/12/20 18:10:31 ak Exp $
9 .TH IP 7 2001-06-19 "Linux" "Linux Programmer's Manual"
11 ip \- Linux IPv4 protocol implementation
13 .B #include <sys/socket.h>
15 .\" .B #include <net/netinet.h> -- does not exist anymore
16 .\" .B #include <linux/errqueue.h> -- never include <linux/foo.h>
17 .B #include <netinet/in.h>
19 .B #include <netinet/ip.h> \fR/* superset of previous */
21 .IB tcp_socket " = socket(PF_INET, SOCK_STREAM, 0);"
23 .IB udp_socket " = socket(PF_INET, SOCK_DGRAM, 0);"
25 .IB raw_socket " = socket(PF_INET, SOCK_RAW, " protocol ");"
27 Linux implements the Internet Protocol, version 4,
28 described in RFC\ 791 and RFC\ 1122.
31 multicasting implementation conforming to RFC\ 1112.
32 It also contains an IP router including a packet filter.
33 .\" FIXME has someone verified that 2.1 is really 1812 compliant?
35 The programming interface is BSD sockets compatible.
36 For more information on sockets, see
39 An IP socket is created by calling the
42 .BR "socket(PF_INET, socket_type, protocol)" .
43 Valid socket types are
55 socket to access the IP protocol directly.
57 is the IP protocol in the IP header to be received or sent.
58 The only valid values for
72 a valid IANA IP protocol defined in
76 .\" FIXME ip currently does an autobind in listen, but I'm not sure
77 .\" if that should be documented.
78 When a process wants to receive new incoming packets or connections, it
79 should bind a socket to a local interface address using
81 Only one IP socket may be bound to any given local (address, port) pair.
84 is specified in the bind call, the socket will be bound to
91 are called on an unbound socket, it is automatically bound to a
92 random free port with the local address set to
95 A TCP local socket address that has been bound is unavailable for
96 some time after closing,
100 Care should be taken when using this flag as it
101 makes TCP less reliable.
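The binding rules above can be sketched in C. This is a minimal illustration under stated assumptions, not code taken from any library: bind_any() and bound_port() are hypothetical helper names. The sketch binds to INADDR_ANY, optionally sets SO_REUSEADDR before bind(2), and uses getsockname(2) to learn which port the kernel picked when port 0 was requested.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: create an IPv4 socket of the given type
 * (SOCK_STREAM or SOCK_DGRAM) and bind it to INADDR_ANY:port.
 * port is in host byte order; 0 lets the kernel pick a free port.
 * If reuse is non-zero, SO_REUSEADDR is set before bind(2).
 * Returns the descriptor, or -1 with errno set. */
int bind_any(int type, uint16_t port, int reuse)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, type, 0);

    if (fd < 0)
        return -1;

    if (reuse &&
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &reuse, sizeof(reuse)) < 0)
        goto fail;

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);               /* network byte order */
    addr.sin_addr.s_addr = htonl(INADDR_ANY);  /* all local interfaces */

    if (bind(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0)
        goto fail;
    return fd;

fail:
    {
        int saved = errno;   /* keep the original error across close() */
        close(fd);
        errno = saved;
    }
    return -1;
}

/* Hypothetical helper: return the local port (host byte order) that
 * fd is bound to, or 0 on error. */
uint16_t bound_port(int fd)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);

    if (getsockname(fd, (struct sockaddr *) &addr, &len) < 0)
        return 0;
    return ntohs(addr.sin_port);
}
```

Binding a second socket to a port that is already taken, without SO_REUSEADDR, is expected to fail with EADDRINUSE. For a TCP server that must restart quickly, a caller might pass SOCK_STREAM and reuse = 1, then call listen(2) on the returned descriptor.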
103 An IP socket address is defined as a combination of an IP interface
104 address and a 16-bit port number.
105 The basic IP protocol does not supply port numbers; they
106 are implemented by higher-level protocols like
112 is set to the IP protocol.
117 sa_family_t sin_family; /* address family: AF_INET */
118 uint16_t sin_port; /* port in network byte order */
119 struct in_addr sin_addr; /* internet address */
122 /* Internet address. */
124 uint32_t s_addr; /* address in network byte order */
132 This is required; in Linux 2.2 most networking functions return
134 when this setting is missing.
136 contains the port in network byte order.
137 The port numbers below 1024 are called
138 .IR "reserved ports" .
139 Only privileged processes (i.e., those having the
140 .B CAP_NET_BIND_SERVICE
144 Note that the raw IPv4 protocol as such has no concept of a
145 port; ports are only implemented by higher protocols like
151 is the IP host address.
156 contains the host interface address in network byte order.
158 should be assigned one of the INADDR_* values (e.g.,
163 .BR inet_makeaddr (3)
164 library functions or directly with the name resolver (see
165 .BR gethostbyname (3)).
166 IPv4 addresses are divided into unicast, broadcast
167 and multicast addresses.
168 Unicast addresses specify a single interface of a host,
169 broadcast addresses specify all hosts on a network and multicast
170 addresses address all hosts in a multicast group.
171 Datagrams to broadcast addresses can be sent or received only when the
174 In the current implementation, connection-oriented sockets are only allowed
175 to use unicast addresses.
176 .\" Leave a loophole for XTP @)
178 Note that the address and the port are always stored in
180 In particular, this means that you need to call
182 on the number that is assigned to a port.
183 All address/port manipulation
184 functions in the standard library work in network byte order.
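As a sketch of these byte-order rules, the hypothetical helper make_sockaddr_in() below fills in a struct sockaddr_in from a dotted-quad string and a port given in host byte order. It uses inet_pton(3) rather than the inet_aton(3) family named earlier, on the assumption that an unambiguous error return for malformed input is preferable. Only sin_port needs an explicit htons(3) call, because inet_pton(3) already stores the address in network byte order.

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>

/* Hypothetical helper: build an IPv4 socket address.
 * ip is a dotted-quad string such as "192.0.2.1"; port is in host byte
 * order. Returns 0 on success, -1 if ip is not a valid IPv4 address. */
int make_sockaddr_in(struct sockaddr_in *sa, const char *ip, uint16_t port)
{
    memset(sa, 0, sizeof(*sa));   /* also clears sin_zero */
    sa->sin_family = AF_INET;     /* must always be AF_INET */
    sa->sin_port = htons(port);   /* host -> network byte order */

    /* inet_pton() writes sin_addr already in network byte order. */
    if (inet_pton(AF_INET, ip, &sa->sin_addr) != 1)
        return -1;
    return 0;
}
```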
186 There are several special addresses:
189 always refers to the local host via the loopback device;
192 means any address for binding;
195 means any host and has the same effect on bind as
197 for historical reasons.
199 IP supports some protocol-specific socket options that can be set with
203 The socket option level for IP is
205 .\" or SOL_IP on Linux
206 A boolean integer flag is zero when it is false, otherwise true.
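A minimal sketch of reading and writing such a flag at the IPPROTO_IP level follows. The wrapper names set_ip_flag() and get_ip_flag() are hypothetical, and IP_RECVERR (described below) is used purely as an example of a boolean option.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical wrapper: enable (on != 0) or disable a boolean IP option.
 * Returns 0 on success, -1 with errno set. */
int set_ip_flag(int fd, int optname, int on)
{
    int val = on ? 1 : 0;
    return setsockopt(fd, IPPROTO_IP, optname, &val, sizeof(val));
}

/* Hypothetical wrapper: return 1 if the flag is set, 0 if it is clear,
 * or -1 with errno set on failure. */
int get_ip_flag(int fd, int optname)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_IP, optname, &val, &len) < 0)
        return -1;
    return val != 0;   /* zero is false, anything else is true */
}
```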
208 .\" FIXME Document IP_FREEBIND
212 Set or get the IP options to be sent with every packet from this
214 The arguments are a pointer to a memory buffer containing the options
215 and the option length.
218 call sets the IP options associated with a socket.
219 The maximum option size for IPv4 is 40 bytes.
220 See RFC\ 791 for the allowed
222 When the initial connection request packet for a
224 socket contains IP options, the IP options will be set automatically
225 to the options from the initial packet with routing headers reversed.
226 Incoming packets are not allowed to change options after the connection
228 The processing of all incoming source routing options
229 is disabled by default and can be enabled by using the
230 .B accept_source_route
232 Other options like timestamps are still handled.
233 For datagram sockets, IP options can only be set by the local user.
238 puts the current IP options used for sending into the supplied buffer.
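As a hedged sketch, the options currently used for sending can be read into a buffer sized for the 40-byte IPv4 limit mentioned above. get_ip_options() and MAX_IPV4_OPTIONS are hypothetical names. On a socket where no options have been set, getsockopt(2) is expected to report a length of zero.

```c
#include <netinet/in.h>
#include <sys/socket.h>

#define MAX_IPV4_OPTIONS 40   /* IPv4 header options are at most 40 bytes */

/* Hypothetical helper: copy the IP options used for sending into buf,
 * which must hold MAX_IPV4_OPTIONS bytes. Returns the number of option
 * bytes (0 if none are set), or -1 with errno set. */
int get_ip_options(int fd, unsigned char buf[MAX_IPV4_OPTIONS])
{
    socklen_t len = MAX_IPV4_OPTIONS;

    if (getsockopt(fd, IPPROTO_IP, IP_OPTIONS, buf, &len) < 0)
        return -1;
    return (int) len;
}
```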
243 ancillary message that contains a
245 structure that supplies some information about the incoming packet.
246 This only works for datagram-oriented sockets.
247 The argument is a flag that tells the socket whether the
249 message should be passed or not.
250 The message itself can only be sent/retrieved
251 as a control message with a packet using
259 unsigned int ipi_ifindex; /* Interface index */
260 struct in_addr ipi_spec_dst; /* Local address */
261 struct in_addr ipi_addr; /* Header Destination
267 .\" FIXME elaborate on that.
269 is the unique index of the interface the packet was received on.
271 is the local address of the packet and
273 is the destination address in the packet header.
279 .\" This field is grossly misnamed
281 is not zero, then it is used as the local source address for the routing
282 table lookup and for setting up IP source route options.
285 is not zero, the primary local address of the interface specified by the
288 for the routing table lookup.
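The following sketch shows one way to pull the in_pktinfo structure out of the ancillary data returned by recvmsg(2) after IP_PKTINFO has been enabled. find_pktinfo() is a hypothetical name. _GNU_SOURCE is defined so that glibc declares struct in_pktinfo, which is not part of POSIX.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: scan the ancillary data of a message returned by
 * recvmsg(2) for an IP_PKTINFO control message. On success the structure
 * is copied to *out and 1 is returned; 0 means none was present. */
int find_pktinfo(struct msghdr *msg, struct in_pktinfo *out)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == IPPROTO_IP &&
            cmsg->cmsg_type == IP_PKTINFO &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            /* Copy out rather than cast, to avoid alignment assumptions. */
            memcpy(out, CMSG_DATA(cmsg), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```

In a real receiver, the caller would enable IP_PKTINFO with setsockopt(2), point msg_control at a buffer of at least CMSG_SPACE(sizeof(struct in_pktinfo)) bytes (more if other control messages are enabled), call recvmsg(2), and then pass the filled-in msghdr to find_pktinfo().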
293 ancillary message is passed with incoming packets.
294 It contains a byte which specifies the Type of Service/Precedence
295 field of the packet header.
296 Expects a boolean integer flag.
299 When this flag is set,
302 control message with the time-to-live
303 field of the received packet as a 32-bit integer.
309 Pass all incoming IP options to the user in a
312 The routing header and other options are already filled in
321 but returns raw unprocessed options with timestamp and route record
322 options not filled in for this hop.
325 Set or receive the Type-Of-Service (TOS) field that is sent
326 with every IP packet originating from this socket.
327 It is used to prioritize packets on the network.
329 There are some standard TOS flags defined:
331 to minimize delays for interactive traffic,
333 to optimize throughput,
335 to optimize for reliability,
337 should be used for "filler data" where slow transmission doesn't matter.
338 At most one of these TOS values can be specified.
339 Other bits are invalid and shall be cleared.
342 datagrams first by default,
343 but the exact behavior depends on the configured queueing discipline.
344 .\" FIXME elaborate on this
345 Some high priority levels may require superuser privileges (the
348 The priority can also be set in a protocol independent way by the
349 .RB ( SOL_SOCKET ", " SO_PRIORITY )
354 Set or retrieve the current time-to-live field that is used in every packet
355 sent from this socket.
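A minimal sketch of setting the two per-socket header fields described above, IP_TOS and IP_TTL, and reading them back; set_ttl_and_tos() and get_ip_int() are hypothetical names. IPTOS_LOWDELAY is taken from <netinet/ip.h>, and both options are passed as an int.

```c
#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>

/* Hypothetical helper: set the TTL and TOS used for every packet sent
 * from fd. Returns 0 on success, -1 with errno set. */
int set_ttl_and_tos(int fd, int ttl, int tos)
{
    if (setsockopt(fd, IPPROTO_IP, IP_TTL, &ttl, sizeof(ttl)) < 0)
        return -1;
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Hypothetical helper: read one int-valued IPPROTO_IP option;
 * returns -1 on error. */
int get_ip_int(int fd, int optname)
{
    int val;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, IPPROTO_IP, optname, &val, &len) < 0)
        return -1;
    return val;
}
```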
359 the user supplies an IP header in front of the user data.
365 for more information.
366 When this flag is enabled, the values set by
373 .BR IP_RECVERR " (defined in \fI<linux/errqueue.h>\fP)"
374 Enable extended reliable error message passing.
375 When enabled on a datagram socket, all
376 generated errors will be queued in a per-socket error queue.
378 receives an error from a socket operation, the errors can
379 be received by calling
386 structure describing the error will be passed in an ancillary message with
391 .\" or SOL_IP on Linux
392 This is useful for reliable error handling on unconnected sockets.
393 The received data portion of the error queue
394 contains the error packet.
398 control message contains a
405 #define SO_EE_ORIGIN_NONE 0
406 #define SO_EE_ORIGIN_LOCAL 1
407 #define SO_EE_ORIGIN_ICMP 2
408 #define SO_EE_ORIGIN_ICMP6 3
410 struct sock_extended_err {
411 uint32_t ee_errno; /* error number */
412 uint8_t ee_origin; /* where the error originated */
413 uint8_t ee_type; /* type */
414 uint8_t ee_code; /* code */
416 uint32_t ee_info; /* additional information */
417 uint32_t ee_data; /* other data */
418 /* More data may follow */
421 struct sockaddr *SO_EE_OFFENDER(struct sock_extended_err *);
428 number of the queued error.
430 is the origin code of where the error originated.
431 The other fields are protocol specific.
434 returns a pointer to the address of the network object
435 where the error originated, given a pointer to the ancillary message.
436 If this address is not known, the
442 and the other fields of the
448 structure as follows:
452 for errors received as an ICMP packet, or
453 .B SO_EE_ORIGIN_LOCAL
454 for locally generated errors.
455 Unknown values should be ignored.
459 are set from the type and code fields of the ICMP header.
461 contains the discovered MTU for
464 The message also contains the
465 .IR sockaddr_in " of the node"
466 that caused the error, which can be accessed with the
471 field of the SO_EE_OFFENDER address is
473 when the source was unknown.
474 When the error originated from the network, all IP options
475 .RI ( IP_OPTIONS ", " IP_TTL ", "
476 etc.) enabled on the socket and contained in the
477 error packet are passed as control messages.
478 The payload of the packet
479 causing the error is returned as normal payload.
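A hedged sketch of dequeuing and decoding one error, following the description above. read_one_error() is a hypothetical name and the buffer sizes are arbitrary assumptions. <time.h> is included before <linux/errqueue.h> because some versions of that header use struct timespec without declaring it. On Linux, reading an empty error queue is expected to fail with EAGAIN rather than block.

```c
#include <errno.h>
#include <string.h>
#include <time.h>          /* struct timespec, used by linux/errqueue.h */
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/errqueue.h>

/* Hypothetical helper: dequeue one error from fd's error queue
 * (IP_RECVERR must be enabled). The extended error is copied to *ee and
 * the offender address to *offender; offender->sin_family is AF_UNSPEC
 * when the source is unknown.
 * Returns 1 if an error was read, 0 if the queue is empty, -1 otherwise. */
int read_one_error(int fd, struct sock_extended_err *ee,
                   struct sockaddr_in *offender)
{
    char data[1500];                 /* payload of the offending packet */
    union {
        char buf[512];               /* room for IP_RECVERR and other cmsgs */
        struct cmsghdr align;        /* forces cmsghdr alignment */
    } ctl;
    struct iovec iov = { .iov_base = data, .iov_len = sizeof(data) };
    struct msghdr msg;
    struct cmsghdr *cmsg;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(fd, &msg, MSG_ERRQUEUE) < 0)
        return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;

    for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(&msg, cmsg)) {
        if (cmsg->cmsg_level == IPPROTO_IP && cmsg->cmsg_type == IP_RECVERR) {
            struct sock_extended_err *e =
                (struct sock_extended_err *) CMSG_DATA(cmsg);
            memcpy(ee, e, sizeof(*ee));
            /* SO_EE_OFFENDER() points just past the extended error. */
            memcpy(offender, SO_EE_OFFENDER(e), sizeof(*offender));
            return 1;
        }
    }
    return -1;   /* a message arrived but carried no IP_RECVERR cmsg */
}
```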
480 .\" FIXME . Is it a good idea to document that? It is a dubious feature.
485 .\" has slightly different semantics. Instead of
486 .\" saving the errors for the next timeout, it passes all incoming
487 .\" errors immediately to the user.
488 .\" This might be useful for very short-lived TCP connections which
489 .\" need fast error handling. Use this option with care:
490 .\" it makes TCP unreliable
491 .\" by not allowing it to recover properly from routing
492 .\" shifts and other normal
493 .\" conditions and breaks the protocol specification.
494 Note that TCP has no error queue;
500 is valid for TCP, but all errors are
501 returned by socket function return or
507 enables passing of all received ICMP errors to the
508 application; otherwise, errors are only reported on connected sockets
510 It sets or retrieves an integer boolean flag.
515 Set or receive the Path MTU Discovery setting
517 When enabled, Linux will perform Path MTU Discovery
518 as defined in RFC\ 1191
520 The don't fragment flag is set on all outgoing datagrams.
521 The system-wide default is controlled by the
525 sockets, and disabled on all others.
528 sockets, it is the user's responsibility to packetize the data
529 in MTU-sized chunks and to do the retransmits if necessary.
530 The kernel will reject packets that are bigger than the known
531 path MTU if this flag is set (with
538 Path MTU discovery flags:Meaning
539 IP_PMTUDISC_WANT:Use per-route settings.
540 IP_PMTUDISC_DONT:Never do Path MTU Discovery.
541 IP_PMTUDISC_DO:Always do Path MTU Discovery.
542 IP_PMTUDISC_PROBE:Set DF but ignore Path MTU.
545 When PMTU discovery is enabled, the kernel automatically keeps track of
546 the path MTU per destination host.
547 When it is connected to a specific peer with
549 the currently known path MTU can be retrieved conveniently using the
551 socket option (e.g., after a
554 It may change over time.
555 For connectionless sockets with many destinations,
556 the new MTU for a given destination can also be accessed using the
559 A new error will be queued for every incoming MTU update.
561 While MTU discovery is in progress, initial packets from datagram sockets
563 Applications using UDP should be aware of this and not
564 take it into account for their packet retransmit strategy.
566 To bootstrap the path MTU discovery process on unconnected sockets, it
567 is possible to start with a big datagram size
568 (up to 64K bytes minus the header size) and let it shrink by updates of the
570 .\" FIXME this is an ugly hack
572 To get an initial estimate of the
573 path MTU, connect a datagram socket to the destination address using
575 and retrieve the MTU by calling
581 It is possible to implement RFC\ 4821 MTU probing with
585 sockets by setting a value of
586 .BR IP_PMTUDISC_PROBE .
587 This is also particularly useful for diagnostic tools such as
589 that wish to deliberately send probe packets larger than
590 the observed Path MTU.
593 Retrieve the current known path MTU of the current socket.
594 Only valid when the socket has been connected.
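A sketch of selecting a discovery mode and querying the result, with hypothetical helper names set_pmtu_mode() and current_path_mtu(). As stated above, IP_MTU is only valid on a connected socket; on an unconnected one the query is expected to fail with ENOTCONN.

```c
#include <errno.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: select a Path MTU Discovery mode such as
 * IP_PMTUDISC_DONT, IP_PMTUDISC_WANT, IP_PMTUDISC_DO or IP_PMTUDISC_PROBE.
 * Returns 0 on success, -1 with errno set. */
int set_pmtu_mode(int fd, int mode)
{
    return setsockopt(fd, IPPROTO_IP, IP_MTU_DISCOVER, &mode, sizeof(mode));
}

/* Hypothetical helper: return the currently known path MTU of a
 * connected socket, or -1 with errno set (ENOTCONN if not connected). */
int current_path_mtu(int fd)
{
    int mtu;
    socklen_t len = sizeof(mtu);

    if (getsockopt(fd, IPPROTO_IP, IP_MTU, &mtu, &len) < 0)
        return -1;
    return mtu;
}
```

A typical sequence would be: connect(2) a datagram socket to the destination, call set_pmtu_mode(fd, IP_PMTUDISC_DO), and then read current_path_mtu(fd) before choosing a datagram size.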
601 Pass all to-be-forwarded packets with the
605 Only valid for raw sockets.
606 This is useful, for instance, for user
608 The tapped packets are not forwarded by the kernel; it is
609 the user's responsibility to send them out again.
610 Socket binding is ignored;
611 such packets are only filtered by protocol.
612 Expects an integer flag.
616 Set or read the time-to-live value of outgoing multicast packets for this
618 It is very important for multicast packets to set the smallest TTL possible.
619 The default is 1 which means that multicast packets don't leave the local
620 network unless the user program explicitly requests it.
626 Set or read a boolean integer argument that determines whether sent multicast
627 packets should be looped back to the local sockets.
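A minimal sketch of configuring both multicast sending options at once; set_multicast_send_opts() is a hypothetical name. Both options are passed as an int here.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: configure outgoing multicast on fd.
 * ttl:  TTL for multicast packets (1 keeps them on the local network).
 * loop: non-zero to deliver copies to local sockets, 0 to suppress them.
 * Returns 0 on success, -1 with errno set. */
int set_multicast_send_opts(int fd, int ttl, int loop)
{
    if (setsockopt(fd, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, sizeof(ttl)) < 0)
        return -1;
    loop = loop ? 1 : 0;   /* normalize to a boolean flag */
    return setsockopt(fd, IPPROTO_IP, IP_MULTICAST_LOOP, &loop, sizeof(loop));
}
```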
631 Join a multicast group.
639 struct in_addr imr_multiaddr; /* IP multicast group
641 struct in_addr imr_address; /* IP address of local
643 int imr_ifindex; /* interface index */
649 contains the address of the multicast group the application
650 wants to join or leave.
651 It must be a valid multicast address.
653 is the address of the local interface with which the system
654 should join the multicast
655 group; if it is equal to
657 an appropriate interface is chosen by the system.
659 is the interface index of the interface that should join/leave the
661 group, or 0 to indicate any interface.
663 For compatibility, the old
665 structure is still supported.
668 only by not including
676 .B IP_DROP_MEMBERSHIP
677 Leave a multicast group.
683 .BR IP_ADD_MEMBERSHIP .
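A sketch of joining or leaving a group with struct ip_mreqn; multicast_membership() is a hypothetical name. imr_address is left as INADDR_ANY and the interface is chosen by index, with 0 meaning any interface, as described above. Because the group must be a valid multicast address, passing a unicast address is expected to fail with EINVAL.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical helper: join (join != 0) or leave the IPv4 multicast
 * group given as a dotted-quad string. ifindex selects the interface;
 * 0 lets the system choose one. Returns 0 or -1 with errno set. */
int multicast_membership(int fd, const char *group, int ifindex, int join)
{
    struct ip_mreqn mreq;

    memset(&mreq, 0, sizeof(mreq));   /* imr_address = INADDR_ANY */
    if (inet_pton(AF_INET, group, &mreq.imr_multiaddr) != 1) {
        errno = EINVAL;               /* not a parsable IPv4 address */
        return -1;
    }
    mreq.imr_ifindex = ifindex;

    return setsockopt(fd, IPPROTO_IP,
                      join ? IP_ADD_MEMBERSHIP : IP_DROP_MEMBERSHIP,
                      &mreq, sizeof(mreq));
}
```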
687 Set the local device for a multicast socket.
693 .BR IP_ADD_MEMBERSHIP .
695 When an invalid socket option is passed,
700 supports the sysctl interface to configure some global options.
701 The sysctls can be accessed by reading or writing the
702 .I /proc/sys/net/ipv4/*
704 .\" FIXME As at 2.6.12, 14 Jun 2005, the following are undocumented:
709 Variables described as
711 take an integer value, with a non-zero value ("true") meaning that
712 the corresponding option is enabled, and a zero value ("false")
713 meaning that the option is disabled.
716 .BR ip_always_defrag " (Boolean)"
717 [New with kernel 2.2.13; in earlier kernel versions this feature
718 was controlled at compile time by the
719 .B CONFIG_IP_ALWAYS_DEFRAG
720 option; this option is not present in 2.4.x and later]
722 When this boolean flag is enabled (not equal 0), incoming fragments
724 that arose when some host between origin and destination decided
725 that the packets were too large and cut them into pieces) will be
726 reassembled (defragmented) before being processed, even if they are
727 about to be forwarded.
729 Only enable if running either a firewall that is the sole link
730 to your network or a transparent proxy; never ever use it for a
731 normal router or host.
732 Otherwise fragmented communication can be disturbed
733 if the fragments travel over different links.
734 Defragmentation also has a large memory and CPU time cost.
736 This is automatically turned on when masquerading or transparent
737 proxying are configured.
741 .\" FIXME document ip_autoconfig
745 .BR ip_default_ttl " (integer; default: 64)"
746 Set the default time-to-live value of outgoing packets.
747 This can be changed per socket with the
752 .BR ip_dynaddr " (Boolean; default: disabled)"
753 Enable dynamic socket address and masquerading entry rewriting on interface
755 This is useful for dialup interfaces with changing IP addresses.
756 0 means no rewriting, 1 turns it on and 2 enables verbose mode.
759 .BR ip_forward " (Boolean; default: disabled)"
760 Enable IP forwarding with a boolean flag.
761 IP forwarding can also be set on a per-interface basis.
764 .B ip_local_port_range
765 Contains two integers that define the default local port range
766 allocated to sockets.
767 Allocation starts with the first number and ends with the second number.
768 Note that these should not conflict with the ports used by masquerading
769 (although the case is handled).
770 Also, arbitrary choices may cause problems with some firewall packet
771 filters that make assumptions about the local ports in use.
772 The first number should be greater than 1024, preferably greater than 4096,
773 to avoid clashes with well-known ports and to minimize firewall problems.
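Since the sysctls are plain text files, the range can be read with ordinary stdio calls. In this sketch, parse_port_range() and read_port_range() are hypothetical names, and the sanity checks (values in 1 to 65535, first not greater than last) are assumptions of the sketch rather than rules stated here.

```c
#include <stdio.h>

/* Hypothetical helper: parse the two integers found in
 * /proc/sys/net/ipv4/ip_local_port_range, e.g. "32768\t60999\n".
 * Returns 0 on success, -1 if the text is malformed. */
int parse_port_range(const char *text, int *first, int *last)
{
    if (sscanf(text, "%d %d", first, last) != 2)
        return -1;
    if (*first < 1 || *last > 65535 || *first > *last)
        return -1;
    return 0;
}

/* Hypothetical helper: read and parse the sysctl file itself.
 * Returns 0 on success, -1 on I/O or parse errors. */
int read_port_range(int *first, int *last)
{
    char buf[64];
    FILE *f = fopen("/proc/sys/net/ipv4/ip_local_port_range", "r");

    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof(buf), f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_port_range(buf, first, last);
}
```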
776 .BR ip_no_pmtu_disc " (Boolean; default: disabled)"
777 If enabled, don't do Path MTU Discovery for TCP sockets by default.
778 Path MTU discovery may fail if misconfigured firewalls (that drop
779 all ICMP packets) or misconfigured interfaces (e.g., a point-to-point
780 link where both ends don't agree on the MTU) are on the path.
781 It is better to fix the broken routers on the path than to turn off
782 Path MTU Discovery globally, because not doing it incurs a high cost
785 .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
787 .BR ip_nonlocal_bind " (Boolean; default: disabled)"
788 If set, allows processes to
790 to non-local IP addresses,
791 which can be quite useful, but may break some applications.
793 .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
795 .BR ip6frag_time " (integer; default 30)"
796 Time in seconds to keep an IPv6 fragment in memory.
798 .\" The following is from 2.6.12: Documentation/networking/ip-sysctl.txt
800 .BR ip6frag_secret_interval " (integer; default 600)"
801 Regeneration interval (in seconds) of the hash secret (or lifetime
802 for the hash secret) for IPv6 fragments.
804 .BR ipfrag_high_thresh " (integer), " ipfrag_low_thresh " (integer)"
805 If the amount of queued IP fragments reaches
806 .BR ipfrag_high_thresh ,
809 .BR ipfrag_low_thresh .
810 Contains an integer with the number of
816 .\" FIXME Document the conf/*/* sysctls
817 .\" FIXME Document the route/* sysctls
818 .\" FIXME document them all
820 All ioctls described in
824 .\" commented out the following because ipchains is obsolete
826 .\" The ioctls to configure firewalling are documented in
832 Ioctls to configure generic device parameters are described in
834 .\" FIXME Add a discussion of multicasting
836 .\" FIXME document all errors.
837 .\" We should really fix the kernels to give more uniform
838 .\" error returns (ENOMEM vs ENOBUFS, EPERM vs EACCES etc.)
841 The user tried to execute an operation without the necessary permissions.
843 sending a packet to a broadcast address without having the
846 sending a packet via a
849 modifying firewall settings without superuser privileges (the
852 binding to a reserved port without superuser privileges (the
853 .B CAP_NET_BIND_SERVICE
857 Tried to bind to an address already in use.
860 A non-existent interface was requested or the requested source
865 Operation on a non-blocking socket would block.
868 A connection operation on a non-blocking socket is already in progress.
871 A connection was closed during an
875 No valid routing table entry matches the destination address.
876 This error can be caused by an ICMP message from a remote router or
877 by the local routing table.
880 Invalid argument passed.
881 For send operations this can be caused by sending to a
887 was called on an already connected socket.
890 Datagram is bigger than an MTU on the path and it cannot be fragmented.
892 .BR ENOBUFS ", " ENOMEM
893 Not enough free memory.
894 This often means that the memory allocation is limited by the socket
895 buffer limits, not by the system memory, but this is not
900 was called on a socket where no packet arrived.
903 A kernel subsystem was not configured.
905 .BR ENOPROTOOPT " and " EOPNOTSUPP
906 Invalid socket option passed.
909 The operation is only defined on a connected socket, but the socket wasn't
913 User doesn't have permission to set high priority, change configuration,
914 or send signals to the requested process or group.
917 The connection was unexpectedly closed or shut down by the other end.
920 The socket is not configured or an unknown socket type was requested.
922 Other errors may be generated by the overlaying protocols; see
930 .BR IP_MTU_DISCOVER ,
935 are new options in Linux 2.2.
936 They are also all Linux-specific and should not be used in
937 programs intended to be portable.
940 .\" To be confirmed that IP_PMTUDISC_PROBE makes it into kernel 2.6.22
942 is new in Linux 2.6.22.
946 Linux 2.0 only supported
949 The sysctls were introduced with Linux 2.2.
951 Be very careful with the
953 option \- it is not privileged in Linux.
954 It is easy to overload the network
955 with careless broadcasts.
956 For new application protocols
957 it is better to use a multicast group instead of broadcasting.
958 Broadcasting is discouraged.
960 Some other BSD sockets implementations provide
964 socket options to get the destination address and the interface of
966 Linux has the more general
970 Some BSD sockets implementations also provide an
972 option, but an ancillary message with type
974 is passed with the incoming packet.
975 This is different from the
977 option used in Linux.
981 socket options level isn't portable; BSD-based stacks use
985 For compatibility with Linux 2.0, the obsolete
986 .BI "socket(PF_INET, SOCK_PACKET, " protocol )
987 syntax is still supported to open a
990 This is deprecated and should be replaced by
991 .BI "socket(PF_PACKET, SOCK_RAW, " protocol )
993 The main difference is the new
995 address structure for generic link layer information instead of the old
998 There are too many inconsistent error values.
1000 The ioctls to configure IP-specific interface options and ARP tables are
1003 Some versions of glibc forget to declare
1005 The current workaround is to copy it into your program from this man page.
1007 Receiving the original destination address with
1013 does not work in some 2.2 kernels.
1015 .\" This man page was written by Andi Kleen.
1021 .BR capabilities (7),
1028 RFC\ 791 for the original IP specification.
1030 RFC\ 1122 for the IPv4 host requirements.
1032 RFC\ 1812 for the IPv4 router requirements.
1033 .\" FIXME autobind INADDR REUSEADDR