.\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
.\"
.\" %%%LICENSE_START(VERBATIM_ONE_PARA)
.\" Permission is granted to distribute possibly modified copies
.\" of this page provided the header is included verbatim,
.\" and in case of nontrivial modification author and date
.\" of the modification is added to the header.
.\" %%%LICENSE_END
.\"
.\" $Id: packet.7,v 1.13 2000/08/14 08:03:45 ak Exp $
.\"
.TH PACKET 7 2014-01-05 "Linux" "Linux Programmer's Manual"
.SH NAME
packet \- packet interface on device level
.SH SYNOPSIS
.nf
.B #include <sys/socket.h>
.br
.B #include <netpacket/packet.h>
.br
.B #include <net/ethernet.h> /* the L2 protocols */
.sp
.BI "packet_socket = socket(AF_PACKET, int " socket_type ", int " protocol );
.fi
.SH DESCRIPTION
Packet sockets are used to receive or send raw packets at the device driver
(OSI Layer 2) level.
They allow the user to implement protocol modules in user space
on top of the physical layer.

The
.I socket_type
is either
.B SOCK_RAW
for raw packets including the link-level header or
.B SOCK_DGRAM
for cooked packets with the link-level header removed.
The link-level header information is available in a common format in a
.IR sockaddr_ll .
.I protocol
is the IEEE 802.3 protocol number in network byte order.
See the
.I <linux/if_ether.h>
include file for a list of allowed protocols.
When protocol
is set to
.B htons(ETH_P_ALL)
then all protocols are received.
All incoming packets of that protocol type will be passed to the packet
socket before they are passed to the protocols implemented in the kernel.

Only processes with effective UID 0 or the
.B CAP_NET_RAW
capability may open packet sockets.
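.PP
As a minimal sketch of the call described above, the following helper opens a
packet socket for one protocol.
The name
.I open_packet_socket
is hypothetical and used only for illustration; the precondition check is an
assumption of this sketch, not something the kernel requires of callers.

```c
#include <arpa/inet.h>          /* htons() */
#include <assert.h>
#include <net/ethernet.h>       /* ETH_P_ALL, ETH_P_IP, ... */
#include <netpacket/packet.h>
#include <sys/socket.h>

/* Hypothetical helper: open a packet socket of the given type
   (SOCK_RAW or SOCK_DGRAM) for one protocol such as ETH_P_ALL.
   Returns the descriptor, or -1 with errno set, for example when
   the caller lacks the CAP_NET_RAW capability. */
int
open_packet_socket(int socket_type, unsigned short protocol)
{
    assert(socket_type == SOCK_RAW || socket_type == SOCK_DGRAM);

    /* The protocol number is passed in network byte order. */
    return socket(AF_PACKET, socket_type, htons(protocol));
}
```

A caller would typically check for \-1 and report the error before going on to
bind or receive.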
.PP
.B SOCK_RAW
packets are passed to and from the device driver without any changes in
the packet data.
When receiving a packet, the address is still parsed and
passed in a standard
.I sockaddr_ll
address structure.
When transmitting a packet, the user-supplied buffer
should contain the physical layer header.
That packet is then
queued unmodified to the network driver of the interface defined by the
destination address.
Some device drivers always add other headers.
.B SOCK_RAW
is similar to but not compatible with the obsolete
.B AF_INET/SOCK_PACKET
of Linux 2.0.

.B SOCK_DGRAM
operates on a slightly higher level.
The physical header is removed before the packet is passed to the user.
Packets sent through a
.B SOCK_DGRAM
packet socket get a suitable physical layer header based on the
information in the
.I sockaddr_ll
destination address before they are queued.

By default all packets of the specified protocol type
are passed to a packet socket.
To get packets only from a specific interface use
.BR bind (2)
specifying an address in a
.I struct sockaddr_ll
to bind the packet socket to an interface.
Only the
.I sll_protocol
and the
.I sll_ifindex
address fields are used for purposes of binding.
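.PP
The binding step can be sketched as follows.
The helper names
.I make_bind_addr
and
.I bind_packet_socket
are hypothetical; the sketch assumes the caller already knows the interface
index (for example from
.BR if_nametoindex (3)).

```c
#include <arpa/inet.h>          /* htons() */
#include <assert.h>
#include <net/ethernet.h>       /* ETH_P_* */
#include <netpacket/packet.h>   /* struct sockaddr_ll */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build the address used to bind a packet socket.
   Only sll_protocol and sll_ifindex are used for binding; sll_family
   must still be AF_PACKET and the remaining fields are left at 0. */
struct sockaddr_ll
make_bind_addr(int ifindex, unsigned short protocol)
{
    struct sockaddr_ll addr;

    assert(ifindex >= 0);               /* 0 matches any interface */
    memset(&addr, 0, sizeof(addr));
    addr.sll_family = AF_PACKET;
    addr.sll_protocol = htons(protocol);
    addr.sll_ifindex = ifindex;
    return addr;
}

/* Bind fd to one interface; returns 0, or -1 with errno set. */
int
bind_packet_socket(int fd, int ifindex, unsigned short protocol)
{
    struct sockaddr_ll addr = make_bind_addr(ifindex, protocol);

    return bind(fd, (struct sockaddr *) &addr, sizeof(addr));
}
```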
.PP
The
.BR connect (2)
operation is not supported on packet sockets.

When the
.B MSG_TRUNC
flag is passed to
.BR recvmsg (2),
.BR recv (2),
or
.BR recvfrom (2),
the real length of the packet on the wire is always returned,
even when it is longer than the buffer.
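.PP
A short sketch of how an application might use
.B MSG_TRUNC
to detect truncation follows.
The helper name
.I recv_with_wire_len
and its interface are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: receive one packet into buf (capacity cap).
   Stores the real packet length in *wire_len and returns the number
   of bytes actually placed in buf, or -1 with errno set. */
ssize_t
recv_with_wire_len(int fd, void *buf, size_t cap, size_t *wire_len)
{
    ssize_t n;

    assert(buf != NULL && wire_len != NULL);

    /* With MSG_TRUNC the return value is the real length of the
       packet, even if it is longer than cap; the excess is dropped. */
    n = recv(fd, buf, cap, MSG_TRUNC);
    if (n == -1)
        return -1;

    *wire_len = (size_t) n;
    return ((size_t) n > cap) ? (ssize_t) cap : n;
}
```

If the stored length is smaller than
.IR *wire_len ,
the packet was truncated and the application may want a larger buffer.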
.SS Address types
The
.I sockaddr_ll
structure is a device-independent physical layer address.

.in +4n
.nf
struct sockaddr_ll {
    unsigned short sll_family;   /* Always AF_PACKET */
    unsigned short sll_protocol; /* Physical layer protocol */
    int            sll_ifindex;  /* Interface number */
    unsigned short sll_hatype;   /* ARP hardware type */
    unsigned char  sll_pkttype;  /* Packet type */
    unsigned char  sll_halen;    /* Length of address */
    unsigned char  sll_addr[8];  /* Physical layer address */
};
.fi
.in

.I sll_protocol
is the standard Ethernet protocol type in network byte order as defined
in the
.I <linux/if_ether.h>
include file.
It defaults to the socket's protocol.
.I sll_ifindex
is the interface index of the interface
(see
.BR netdevice (7));
0 matches any interface (only permitted for binding).
.I sll_hatype
is an ARP type as defined in the
.I <linux/if_arp.h>
include file.
.I sll_pkttype
contains the packet type.
Valid types are
.B PACKET_HOST
for a packet addressed to the local host,
.B PACKET_BROADCAST
for a physical layer broadcast packet,
.B PACKET_MULTICAST
for a packet sent to a physical layer multicast address,
.B PACKET_OTHERHOST
for a packet to some other host that has been caught by a device driver
in promiscuous mode, and
.B PACKET_OUTGOING
for a packet originated from the local host that is looped back to a packet
socket.
These types make sense only for receiving.
.I sll_addr
and
.I sll_halen
contain the physical layer (e.g., IEEE 802.3) address and its length.
The exact interpretation depends on the device.

When you send packets it is enough to specify
.IR sll_family ,
.IR sll_addr ,
.IR sll_halen ,
and
.IR sll_ifindex .
The other fields should be 0.
.I sll_hatype
and
.I sll_pkttype
are set on received packets for your information.
For bind only
.I sll_protocol
and
.I sll_ifindex
are used.
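.PP
To illustrate the sending case, the sketch below fills in only the four fields
named above and leaves everything else zero.
The helper name
.I make_dest_addr
is hypothetical.

```c
#include <assert.h>
#include <netpacket/packet.h>   /* struct sockaddr_ll */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build the destination address for sendto(2)
   on a packet socket. Only sll_family, sll_addr, sll_halen and
   sll_ifindex are set; every other field stays 0. */
struct sockaddr_ll
make_dest_addr(int ifindex, const unsigned char *hwaddr,
               unsigned char halen)
{
    struct sockaddr_ll addr;

    assert(hwaddr != NULL && halen <= sizeof(addr.sll_addr));
    memset(&addr, 0, sizeof(addr));
    addr.sll_family = AF_PACKET;
    addr.sll_ifindex = ifindex;
    addr.sll_halen = halen;
    memcpy(addr.sll_addr, hwaddr, halen);
    return addr;
}
```

The result would be passed as the address argument of
.BR sendto (2),
cast to
.IR "struct sockaddr *" .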
.SS Socket options
Packet socket options are configured by calling
.BR setsockopt (2)
with level
.BR SOL_PACKET .
.TP
.BR PACKET_ADD_MEMBERSHIP
.PD 0
.TP
.BR PACKET_DROP_MEMBERSHIP
.PD
Packet sockets can be used to configure physical layer multicasting
and promiscuous mode.
.B PACKET_ADD_MEMBERSHIP
adds a binding and
.B PACKET_DROP_MEMBERSHIP
drops it.
They both expect a
.I packet_mreq
structure as argument:

.in +4n
.nf
struct packet_mreq {
    int            mr_ifindex;    /* interface index */
    unsigned short mr_type;       /* action */
    unsigned short mr_alen;       /* address length */
    unsigned char  mr_address[8]; /* physical layer address */
};
.fi
.in

.B mr_ifindex
contains the interface index for the interface whose status
should be changed.
The
.B mr_type
parameter specifies which action to perform.
.B PACKET_MR_PROMISC
enables receiving all packets on a shared medium (often known as
"promiscuous mode"),
.B PACKET_MR_MULTICAST
binds the socket to the physical layer multicast group specified in
.B mr_address
and
.BR mr_alen ,
and
.B PACKET_MR_ALLMULTI
sets the socket up to receive all multicast packets arriving at
the interface.

In addition, the traditional ioctls
.BR SIOCSIFFLAGS ,
.BR SIOCADDMULTI ,
and
.B SIOCDELMULTI
can be used for the same purpose.
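.sp
As a rough sketch, promiscuous mode could be toggled on one interface like this.
The helper names
.I make_promisc_req
and
.I set_promisc
are hypothetical; the
.B SOL_PACKET
fallback is only there for old C libraries that lack the define.

```c
#include <assert.h>
#include <netpacket/packet.h>   /* struct packet_mreq, PACKET_MR_* */
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif

/* Hypothetical helper: build a request for promiscuous mode.
   mr_alen and mr_address are not needed for this action, so they
   are left at 0. */
struct packet_mreq
make_promisc_req(int ifindex)
{
    struct packet_mreq mreq;

    assert(ifindex > 0);
    memset(&mreq, 0, sizeof(mreq));
    mreq.mr_ifindex = ifindex;
    mreq.mr_type = PACKET_MR_PROMISC;
    return mreq;
}

/* Add (enable != 0) or drop (enable == 0) the promiscuous binding.
   Returns 0, or -1 with errno set. */
int
set_promisc(int fd, int ifindex, int enable)
{
    struct packet_mreq mreq = make_promisc_req(ifindex);
    int opt = enable ? PACKET_ADD_MEMBERSHIP : PACKET_DROP_MEMBERSHIP;

    return setsockopt(fd, SOL_PACKET, opt, &mreq, sizeof(mreq));
}
```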
.TP
.BR PACKET_AUXDATA " (since Linux 2.6.21)"
.\" commit 8dc4194474159660d7f37c495e3fc3f10d0db8cc
If this binary option is enabled, the packet socket passes a metadata
structure along with each packet in the
.BR recvmsg (2)
control field.
The structure can be read with
.BR cmsg (3).
It is defined as

.in +4n
.nf
struct tpacket_auxdata {
    __u32 tp_status;
    __u32 tp_len;      /* packet length */
    __u32 tp_snaplen;  /* captured length */
    __u16 tp_mac;
    __u16 tp_net;
    __u16 tp_vlan_tci;
    __u16 tp_padding;
};
.fi
.in
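.sp
The control-message lookup can be sketched as below.
The helper name
.I find_auxdata
is hypothetical; it assumes the caller has enabled
.B PACKET_AUXDATA
and has just filled
.I msg
with
.BR recvmsg (2).

```c
#include <assert.h>
#include <linux/if_packet.h>    /* struct tpacket_auxdata, PACKET_AUXDATA */
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif

/* Hypothetical helper: scan the control messages returned by
   recvmsg(2) and copy out the PACKET_AUXDATA payload.
   Returns 1 if the metadata was found, 0 otherwise. */
int
find_auxdata(struct msghdr *msg, struct tpacket_auxdata *aux)
{
    struct cmsghdr *cmsg;

    assert(msg != NULL && aux != NULL);
    for (cmsg = CMSG_FIRSTHDR(msg); cmsg != NULL;
         cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_PACKET &&
            cmsg->cmsg_type == PACKET_AUXDATA &&
            cmsg->cmsg_len >= CMSG_LEN(sizeof(*aux))) {
            /* Copy rather than cast: CMSG_DATA() need not be
               suitably aligned for the structure. */
            memcpy(aux, CMSG_DATA(cmsg), sizeof(*aux));
            return 1;
        }
    }
    return 0;
}
```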
.TP
.BR PACKET_FANOUT " (since Linux 3.1)"
.\" commit dc99f600698dcac69b8f56dda9a8a00d645c5ffc
To scale processing across threads, packet sockets can form a fanout
group.
In this mode, each matching packet is enqueued onto only one
socket in the group.
A socket joins a fanout group by calling
.BR setsockopt (2)
with level
.B SOL_PACKET
and option
.BR PACKET_FANOUT .
Each network namespace can have up to 65536 independent groups.
A socket selects a group by encoding the ID in the first (low-order)
16 bits of the integer option value.
The first packet socket to join a group implicitly creates it.
To successfully join an existing group, subsequent packet sockets
must have the same protocol, device settings, fanout mode and
flags (see below).
Packet sockets can leave a fanout group only by closing the socket.
The group is deleted when the last socket is closed.

Fanout supports multiple algorithms to spread traffic between sockets.
The default mode,
.BR PACKET_FANOUT_HASH ,
sends packets from the same flow to the same socket to maintain
per-flow ordering.
For each packet, it chooses a socket by taking the packet flow hash
modulo the number of sockets in the group, where a flow hash is a hash
over network-layer address and optional transport-layer port fields.
The load-balance mode
.BR PACKET_FANOUT_LB
implements a round-robin algorithm.
.BR PACKET_FANOUT_CPU
selects the socket based on the CPU that the packet arrived on.
.BR PACKET_FANOUT_ROLLOVER
processes all data on a single socket, moving to the next when one
becomes backlogged.
.BR PACKET_FANOUT_RND
selects the socket using a pseudo-random number generator.

Fanout modes can take additional options.
IP fragmentation causes packets from the same flow to have different
flow hashes.
The flag
.BR PACKET_FANOUT_FLAG_DEFRAG ,
if set, causes packets to be defragmented before fanout is applied, to
preserve order even in this case.
Fanout mode and options are communicated in the second (high-order)
16 bits of the integer option value.
The flag
.BR PACKET_FANOUT_FLAG_ROLLOVER
enables the rollover mechanism as a backup strategy: if the
original fanout algorithm selects a backlogged socket, the packet
rolls over to the next available one.
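.sp
The encoding of the option value can be sketched as follows: group ID in the
low-order 16 bits, mode and flags in the high-order 16 bits.
The helper names
.I fanout_arg
and
.I join_fanout
are hypothetical.

```c
#include <assert.h>
#include <linux/if_packet.h>    /* PACKET_FANOUT, PACKET_FANOUT_* */
#include <stdint.h>
#include <sys/socket.h>

#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif

/* Hypothetical helper: pack the 32-bit PACKET_FANOUT option value.
   mode_and_flags is one PACKET_FANOUT_* mode, optionally ORed with
   PACKET_FANOUT_FLAG_* flags. */
uint32_t
fanout_arg(uint16_t group_id, uint16_t mode_and_flags)
{
    return (uint32_t) group_id | ((uint32_t) mode_and_flags << 16);
}

/* Join (or implicitly create) a fanout group.
   Returns 0, or -1 with errno set. */
int
join_fanout(int fd, uint16_t group_id, uint16_t mode_and_flags)
{
    uint32_t arg = fanout_arg(group_id, mode_and_flags);

    assert(fd >= 0);
    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

Every socket that is meant to share the group would call
.I join_fanout
with the same arguments, since mode and flags must match to join.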
.TP
.BR PACKET_LOSS " (with " PACKET_TX_RING )
If set, do not silently drop a packet on transmission error, but
return it with status set to
.BR TP_STATUS_WRONG_FORMAT .
.TP
.BR PACKET_RESERVE " (with " PACKET_RX_RING )
By default, a packet receive ring writes packets immediately following the
metadata structure and alignment padding.
This integer option reserves additional headroom.
.TP
.BR PACKET_RX_RING
Create a memory-mapped ring buffer for asynchronous packet reception.
The packet socket reserves a contiguous region of application address
space, lays it out into an array of packet slots and copies packets
(up to
.IR tp_snaplen )
into subsequent slots.
Each packet is preceded by a metadata structure similar to
.IR tpacket_auxdata .
The protocol fields encode the offset to the data
from the start of the metadata header.
.I tp_net
stores the offset to the network layer.
If the packet socket is of type
.BR SOCK_DGRAM ,
then
.I tp_mac
is the same.
If it is of type
.BR SOCK_RAW ,
then that field stores the offset to the link-layer frame.
Packet socket and application communicate the head and tail of the ring
through the
.I tp_status
field.
The packet socket owns all slots with status
.BR TP_STATUS_KERNEL .
After filling a slot, it changes the status of the slot to transfer
ownership to the application.
During normal operation, the new status is
.BR TP_STATUS_USER ,
to signal that a correctly received packet has been stored.
When the application has finished processing a packet, it transfers
ownership of the slot back to the socket by setting the status to
.BR TP_STATUS_KERNEL .
Packet sockets implement multiple variants of the packet ring.
The implementation details are described in
.IR Documentation/networking/packet_mmap.txt
in the Linux kernel source tree.
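.sp
The ownership handshake on
.I tp_status
can be sketched for a
.B TPACKET_V1
ring as below.
This is a simplified illustration, not the kernel's reference code: the names
.I drain_rx_ring
and
.I frame_cb
are hypothetical, and the sketch assumes that the slots are laid out
contiguously (block size a multiple of the frame size) and that the ring has
already been created and mapped.

```c
#include <assert.h>
#include <linux/if_packet.h>    /* struct tpacket_hdr, TP_STATUS_* */
#include <stddef.h>

/* Hypothetical callback invoked for every received packet. */
typedef void (*frame_cb)(const unsigned char *data, unsigned int len);

/* Hypothetical helper: consume the ready slots of a TPACKET_V1
   receive ring, starting at slot *next. Returns the number of slots
   handed back to the packet socket. */
unsigned int
drain_rx_ring(unsigned char *ring, size_t frame_size,
              unsigned int frame_count, unsigned int *next, frame_cb cb)
{
    unsigned int done = 0;

    assert(ring != NULL && next != NULL && *next < frame_count);
    while (done < frame_count) {
        unsigned char *slot = ring + (size_t) *next * frame_size;
        volatile struct tpacket_hdr *hdr =
            (volatile struct tpacket_hdr *) slot;

        if (!(hdr->tp_status & TP_STATUS_USER))
            break;              /* slot is still owned by the socket */

        /* tp_mac is the offset from the start of the header to the
           packet data (the link-layer frame for SOCK_RAW). */
        if (cb != NULL)
            cb(slot + hdr->tp_mac, hdr->tp_snaplen);

        __sync_synchronize();   /* finish reading before releasing */
        hdr->tp_status = TP_STATUS_KERNEL;
        *next = (*next + 1) % frame_count;
        done++;
    }
    return done;
}
```

An application would normally wait with
.BR poll (2)
on the socket and call such a routine whenever the descriptor becomes readable.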
.TP
.BR PACKET_STATISTICS
Retrieve packet socket statistics in the form of a structure

.in +4n
.nf
struct tpacket_stats {
    unsigned int tp_packets;  /* Total packet count */
    unsigned int tp_drops;    /* Dropped packet count */
};
.fi
.in

Receiving statistics resets the internal counters.
The statistics structure differs when using a ring of variant
.BR TPACKET_V3 .
.TP
.BR PACKET_TIMESTAMP " (with " PACKET_RX_RING "; since Linux 2.6.36)"
.\" commit 614f60fa9d73a9e8fdff3df83381907fea7c5649
The packet receive ring always stores a timestamp in the metadata header.
By default, this is a software timestamp generated when the
packet is copied into the ring.
This integer option selects the type of timestamp.
Besides the default, it supports the two hardware formats described in
.IR Documentation/networking/timestamping.txt
in the Linux kernel source tree.
.TP
.BR PACKET_TX_RING " (since Linux 2.6.31)"
.\" commit 69e3c75f4d541a6eb151b3ef91f34033cb3ad6e1
Create a memory-mapped ring buffer for packet transmission.
This option is similar to
.BR PACKET_RX_RING
and takes the same arguments.
The application writes packets into slots with status
.BR TP_STATUS_AVAILABLE
and schedules them for transmission by changing the status to
.BR TP_STATUS_SEND_REQUEST .
When packets are ready to be transmitted, the application calls
.BR send (2)
or a variant thereof.
The
.I buf
and
.I len
fields of this call are ignored.
If an address is passed using
.BR sendto (2)
or
.BR sendmsg (2),
then that overrides the socket default.
On successful transmission, the socket resets the slot to
.BR TP_STATUS_AVAILABLE .
It discards packets silently on error unless
.BR PACKET_LOSS
is set.
.TP
.BR PACKET_VERSION " (with " PACKET_RX_RING "; since Linux 2.6.27)"
.\" commit bbd6ef87c544d88c30e4b762b1b61ef267a7d279
By default,
.BR PACKET_RX_RING
creates a packet receive ring of variant
.BR TPACKET_V1 .
To create another variant, configure the desired variant by setting this
integer option before creating the ring.
.SS Ioctls
.B SIOCGSTAMP
can be used to receive the timestamp of the last received packet.
The argument is a
.I struct timeval
variable.
.\" FIXME Document SIOCGSTAMPNS

In addition, all standard ioctls defined in
.BR netdevice (7)
and
.BR socket (7)
are valid on packet sockets.
.SS Error handling
Packet sockets do no error handling other than for errors that occur
while passing the packet to the device driver.
They don't have the concept of a pending error.
.SH ERRORS
.TP
.B EADDRNOTAVAIL
Unknown multicast group address passed.
.TP
.B EFAULT
User passed invalid memory address.
.TP
.B EINVAL
Invalid argument.
.TP
.B EMSGSIZE
Packet is bigger than interface MTU.
.TP
.B ENETDOWN
Interface is not up.
.TP
.B ENOBUFS
Not enough memory to allocate the packet.
.TP
.B ENODEV
Unknown device name or interface index specified in interface address.
.TP
.B ENOENT
No packet received.
.TP
.B ENOTCONN
No interface address passed.
.TP
.B ENXIO
Interface address contained an invalid interface index.
.TP
.B EPERM
User has insufficient privileges to carry out this operation.

In addition other errors may be generated by the low-level driver.
.SH VERSIONS
.B AF_PACKET
is a new feature in Linux 2.2.
Earlier Linux versions supported only
.BR SOCK_PACKET .
.PP
The include file
.I <netpacket/packet.h>
is present since glibc 2.1.
Older systems need:
.sp
.in +4n
.nf
#include <asm/types.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h> /* The L2 protocols */
.fi
.in
.SH NOTES
For portable programs it is suggested to use
.B AF_PACKET
via
.BR pcap (3);
although this covers only a subset of the
.B AF_PACKET
features.

The
.B SOCK_DGRAM
packet sockets make no attempt to create or parse the IEEE 802.2 LLC
header for an IEEE 802.3 frame.
When
.B ETH_P_802_3
is specified as protocol for sending, the kernel creates the
802.3 frame and fills out the length field; the user has to supply the LLC
header to get a fully conforming packet.
Incoming 802.3 packets are not multiplexed on the DSAP/SSAP protocol
fields; instead they are supplied to the user as protocol
.B ETH_P_802_2
with the LLC header prefixed.
It is thus not possible to bind to
.BR ETH_P_802_3 ;
bind to
.B ETH_P_802_2
instead and do the protocol multiplex yourself.
The default for sending is the standard Ethernet DIX
encapsulation with the protocol filled in.

Packet sockets are not subject to the input or output firewall chains.
.SS Compatibility
In Linux 2.0, the only way to get a packet socket was by calling
.BI "socket(AF_INET, SOCK_PACKET, " protocol )\fR.
This is still supported but strongly deprecated.
The main difference between the two methods is that
.B SOCK_PACKET
uses the old
.I struct sockaddr_pkt
to specify an interface, which doesn't provide physical layer
independence.

.in +4n
.nf
struct sockaddr_pkt {
    unsigned short spkt_family;
    unsigned char  spkt_device[14];
    unsigned short spkt_protocol;
};
.fi
.in

.I spkt_family
contains
the device type,
.I spkt_protocol
is the IEEE 802.3 protocol type as defined in
.I <sys/if_ether.h>
and
.I spkt_device
is the device name as a null-terminated string, for example, eth0.

This structure is obsolete and should not be used in new code.
.SH BUGS
glibc 2.1 does not have a define for
.BR SOL_PACKET .
The suggested workaround is to use:
.in +4n
.nf

#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif

.fi
.in
This is fixed in later glibc versions and also does not occur on
libc5 systems.

The IEEE 802.2/802.3 LLC handling could be considered as a bug.

Socket filters are not documented.

The
.B MSG_TRUNC
.BR recvmsg (2)
extension is an ugly hack and should be replaced by a control message.
There is currently no way to get the original destination address of
packets via
.BR SOCK_DGRAM .
.\" .SH CREDITS
.\" This man page was written by Andi Kleen with help from Matthew Wilcox.
.\" AF_PACKET in Linux 2.2 was implemented
.\" by Alexey Kuznetsov, based on code by Alan Cox and others.
.SH SEE ALSO
.BR socket (2),
.BR pcap (3),
.BR capabilities (7),
.BR ip (7),
.BR raw (7),
.BR socket (7)

RFC\ 894 for the standard IP Ethernet encapsulation.
RFC\ 1700 for the IEEE 802.3 IP encapsulation.

The
.I <linux/if_ether.h>
include file for physical layer protocols.