1 .\" This man page is Copyright (C) 1999 Andi Kleen <ak@muc.de>.
2 .\"
3 .\" %%%LICENSE_START(VERBATIM_ONE_PARA)
4 .\" Permission is granted to distribute possibly modified copies
5 .\" of this page provided the header is included verbatim,
6 .\" and in case of nontrivial modification author and date
7 .\" of the modification is added to the header.
8 .\" %%%LICENSE_END
9 .\"
10 .\" $Id: packet.7,v 1.13 2000/08/14 08:03:45 ak Exp $
11 .\"
12 .TH PACKET 7 2017-09-15 "Linux" "Linux Programmer's Manual"
13 .SH NAME
14 packet \- packet interface on device level
15 .SH SYNOPSIS
16 .nf
17 .B #include <sys/socket.h>
18 .B #include <linux/if_packet.h>
19 .B #include <net/ethernet.h> /* the L2 protocols */
20 .PP
21 .BI "packet_socket = socket(AF_PACKET, int " socket_type ", int " protocol );
22 .fi
23 .SH DESCRIPTION
24 Packet sockets are used to receive or send raw packets at the device driver
25 (OSI Layer 2) level.
26 They allow the user to implement protocol modules in user space
27 on top of the physical layer.
28 .PP
29 The
30 .I socket_type
31 is either
32 .B SOCK_RAW
33 for raw packets including the link-level header or
34 .B SOCK_DGRAM
35 for cooked packets with the link-level header removed.
36 The link-level header information is available in a common format in a
37 .IR sockaddr_ll
38 structure.
39 .I protocol
40 is the IEEE 802.3 protocol number in network byte order.
41 See the
42 .I <linux/if_ether.h>
43 include file for a list of allowed protocols.
44 When protocol
45 is set to
46 .BR htons(ETH_P_ALL) ,
47 then all protocols are received.
48 All incoming packets of that protocol type will be passed to the packet
49 socket before they are passed to the protocols implemented in the kernel.
50 .PP
51 In order to create a packet socket, a process must have the
52 .B CAP_NET_RAW
53 capability in the user namespace that governs its network namespace.
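.PP
As an illustration only, the following sketch opens a packet socket and
reports the most likely cause when creation is refused.
The helper name open_packet_socket is hypothetical and not part of any
library; it takes the protocol in host byte order and converts it with
htons(3).

```c
#include <arpa/inet.h>        /* htons */
#include <errno.h>
#include <linux/if_ether.h>   /* ETH_P_ALL and other protocol numbers */
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: open a packet socket of the given type
 * (SOCK_RAW or SOCK_DGRAM) for one protocol given in host byte order.
 * Returns the descriptor, or -1 with errno set. */
int open_packet_socket(int socket_type, unsigned short proto_host)
{
    int fd = socket(AF_PACKET, socket_type, htons(proto_host));

    if (fd == -1 && (errno == EPERM || errno == EACCES))
        fprintf(stderr, "packet socket: %s (is CAP_NET_RAW missing?)\n",
                strerror(errno));
    return fd;
}
```

For example, open_packet_socket(SOCK_RAW, ETH_P_ALL) asks for every
protocol with the link-level header included.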
54 .PP
55 .B SOCK_RAW
56 packets are passed to and from the device driver without any changes in
57 the packet data.
58 When receiving a packet, the address is still parsed and
59 passed in a standard
60 .I sockaddr_ll
61 address structure.
62 When transmitting a packet, the user-supplied buffer
63 should contain the physical-layer header.
64 That packet is then
65 queued unmodified to the network driver of the interface defined by the
66 destination address.
67 Some device drivers always add other headers.
68 .B SOCK_RAW
69 is similar to but not compatible with the obsolete
70 .B AF_INET/SOCK_PACKET
71 of Linux 2.0.
72 .PP
73 .B SOCK_DGRAM
74 operates on a slightly higher level.
75 The physical header is removed before the packet is passed to the user.
76 Packets sent through a
77 .B SOCK_DGRAM
78 packet socket get a suitable physical-layer header based on the
79 information in the
80 .I sockaddr_ll
81 destination address before they are queued.
82 .PP
83 By default, all packets of the specified protocol type
84 are passed to a packet socket.
85 To get packets only from a specific interface use
86 .BR bind (2)
87 specifying an address in a
88 .I struct sockaddr_ll
89 to bind the packet socket to an interface.
90 Fields used for binding are
91 .IR sll_family
92 (should be
93 .BR AF_PACKET ),
94 .IR sll_protocol ,
95 and
96 .IR sll_ifindex .
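.PP
A minimal sketch of such a bind, assuming the interface index is already
known (for instance from if_nametoindex(3)).
The helper names make_bind_addr and bind_to_interface are hypothetical.

```c
#include <arpa/inet.h>        /* htons */
#include <linux/if_ether.h>   /* ETH_P_* */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: fill in the three fields that bind(2) examines;
 * every other field is left zero. */
struct sockaddr_ll make_bind_addr(int ifindex, unsigned short proto_host)
{
    struct sockaddr_ll sll;

    memset(&sll, 0, sizeof(sll));
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = htons(proto_host);
    sll.sll_ifindex = ifindex;
    return sll;
}

/* Bind an existing packet socket to one interface.
 * Returns 0 on success, or -1 with errno set. */
int bind_to_interface(int fd, int ifindex, unsigned short proto_host)
{
    struct sockaddr_ll sll = make_bind_addr(ifindex, proto_host);

    return bind(fd, (struct sockaddr *)&sll, sizeof(sll));
}
```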
97 .PP
98 The
99 .BR connect (2)
100 operation is not supported on packet sockets.
101 .PP
102 When the
103 .B MSG_TRUNC
104 flag is passed to
105 .BR recvmsg (2),
106 .BR recv (2),
107 or
108 .BR recvfrom (2),
109 the real length of the packet on the wire is always returned,
110 even when it is longer than the buffer.
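.PP
The following sketch shows one way to use this to detect truncation.
The helper name recv_whole_length and its truncated output parameter are
hypothetical.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read one packet into buf (capacity cap).
 * Returns the packet length as reported with MSG_TRUNC, or -1 on error.
 * *truncated is set to 1 when the packet did not fit, in which case
 * only the first cap bytes were stored in buf. */
ssize_t recv_whole_length(int fd, void *buf, size_t cap, int *truncated)
{
    ssize_t wire_len = recv(fd, buf, cap, MSG_TRUNC);

    if (wire_len == -1)
        return -1;
    *truncated = (size_t)wire_len > cap;
    return wire_len;
}
```

A caller that sees truncated set can enlarge its buffer for subsequent
packets; the packet that was cut short cannot be read again.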
111 .SS Address types
112 The
113 .I sockaddr_ll
114 structure is a device-independent physical-layer address.
115 .PP
116 .in +4n
117 .EX
118 struct sockaddr_ll {
119     unsigned short sll_family;   /* Always AF_PACKET */
120     unsigned short sll_protocol; /* Physical-layer protocol */
121     int            sll_ifindex;  /* Interface number */
122     unsigned short sll_hatype;   /* ARP hardware type */
123     unsigned char  sll_pkttype;  /* Packet type */
124     unsigned char  sll_halen;    /* Length of address */
125     unsigned char  sll_addr[8];  /* Physical-layer address */
126 };
127 .EE
128 .in
129 .PP
130 The fields of this structure are as follows:
131 .IP * 3
132 .I sll_protocol
133 is the standard Ethernet protocol type in network byte order as defined
134 in the
135 .I <linux/if_ether.h>
136 include file.
137 It defaults to the socket's protocol.
138 .IP *
139 .I sll_ifindex
140 is the interface index of the interface
141 (see
142 .BR netdevice (7));
143 0 matches any interface (only permitted for binding).
.IP *
144 .I sll_hatype
145 is an ARP type as defined in the
146 .I <linux/if_arp.h>
147 include file.
148 .IP *
149 .I sll_pkttype
150 contains the packet type.
151 Valid types are
152 .B PACKET_HOST
153 for a packet addressed to the local host,
154 .B PACKET_BROADCAST
155 for a physical-layer broadcast packet,
156 .B PACKET_MULTICAST
157 for a packet sent to a physical-layer multicast address,
158 .B PACKET_OTHERHOST
159 for a packet to some other host that has been caught by a device driver
160 in promiscuous mode, and
161 .B PACKET_OUTGOING
162 for a packet originating from the local host that is looped back to a packet
163 socket.
164 These types make sense only for receiving.
165 .IP *
166 .I sll_addr
167 and
168 .I sll_halen
169 contain the physical-layer (e.g., IEEE 802.3) address and its length.
170 The exact interpretation depends on the device.
171 .PP
172 When you send packets, it is enough to specify
173 .IR sll_family ,
174 .IR sll_addr ,
175 .IR sll_halen ,
176 .IR sll_ifindex ,
177 and
178 .IR sll_protocol .
179 The other fields should be 0.
180 .I sll_hatype
181 and
182 .I sll_pkttype
183 are set on received packets for your information.
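.PP
As a sketch of the fields listed above, the following fills in a
destination address for sendto(2) and leaves everything else zero.
The helper names make_dest_addr and send_payload are hypothetical.

```c
#include <arpa/inet.h>        /* htons */
#include <assert.h>
#include <linux/if_ether.h>   /* ETH_P_*, ETH_ALEN */
#include <linux/if_packet.h>  /* struct sockaddr_ll */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build a destination address for sending. */
struct sockaddr_ll make_dest_addr(int ifindex, unsigned short proto_host,
                                  const unsigned char *hwaddr,
                                  unsigned char halen)
{
    struct sockaddr_ll sll;

    assert(halen <= sizeof(sll.sll_addr));
    memset(&sll, 0, sizeof(sll));      /* sll_hatype, sll_pkttype stay 0 */
    sll.sll_family = AF_PACKET;
    sll.sll_protocol = htons(proto_host);
    sll.sll_ifindex = ifindex;
    sll.sll_halen = halen;
    memcpy(sll.sll_addr, hwaddr, halen);
    return sll;
}

/* Send one payload to dest; on a SOCK_DGRAM packet socket the kernel
 * adds the physical-layer header.  Returns bytes sent, or -1. */
ssize_t send_payload(int fd, const struct sockaddr_ll *dest,
                     const void *payload, size_t len)
{
    return sendto(fd, payload, len, 0,
                  (const struct sockaddr *)dest, sizeof(*dest));
}
```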
184 .SS Socket options
185 Packet socket options are configured by calling
186 .BR setsockopt (2)
187 with level
188 .BR SOL_PACKET .
189 .TP
190 .BR PACKET_ADD_MEMBERSHIP
191 .PD 0
192 .TP
193 .BR PACKET_DROP_MEMBERSHIP
194 .PD
195 Packet sockets can be used to configure physical-layer multicasting
196 and promiscuous mode.
197 .B PACKET_ADD_MEMBERSHIP
198 adds a binding and
199 .B PACKET_DROP_MEMBERSHIP
200 drops it.
201 They both expect a
202 .I packet_mreq
203 structure as argument:
204 .IP
205 .in +4n
206 .EX
207 struct packet_mreq {
208     int            mr_ifindex;    /* interface index */
209     unsigned short mr_type;       /* action */
210     unsigned short mr_alen;       /* address length */
211     unsigned char  mr_address[8]; /* physical-layer address */
212 };
213 .EE
214 .in
215 .IP
216 .I mr_ifindex
217 contains the interface index for the interface whose status
218 should be changed.
219 The
220 .I mr_type
221 field specifies which action to perform.
222 .B PACKET_MR_PROMISC
223 enables receiving all packets on a shared medium (often known as
224 "promiscuous mode"),
225 .B PACKET_MR_MULTICAST
226 binds the socket to the physical-layer multicast group specified in
227 .I mr_address
228 and
229 .IR mr_alen ,
230 and
231 .B PACKET_MR_ALLMULTI
232 sets the socket up to receive all multicast packets arriving at
233 the interface.
234 .IP
235 In addition, the traditional ioctls
236 .BR SIOCSIFFLAGS ,
237 .BR SIOCADDMULTI ,
238 .B SIOCDELMULTI
239 can be used for the same purpose.
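.IP
A minimal sketch of toggling promiscuous mode through these options.
The helper names make_promisc_request and set_promisc are hypothetical.

```c
#include <linux/if_packet.h>  /* struct packet_mreq, PACKET_MR_PROMISC */
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: build a membership request for promiscuous mode.
 * mr_alen and mr_address stay zero because no address is involved. */
struct packet_mreq make_promisc_request(int ifindex)
{
    struct packet_mreq mreq;

    memset(&mreq, 0, sizeof(mreq));
    mreq.mr_ifindex = ifindex;
    mreq.mr_type = PACKET_MR_PROMISC;
    return mreq;
}

/* Add (enable != 0) or drop (enable == 0) the promiscuous membership.
 * Returns 0 on success, or -1 with errno set. */
int set_promisc(int fd, int ifindex, int enable)
{
    struct packet_mreq mreq = make_promisc_request(ifindex);
    int option = enable ? PACKET_ADD_MEMBERSHIP : PACKET_DROP_MEMBERSHIP;

    return setsockopt(fd, SOL_PACKET, option, &mreq, sizeof(mreq));
}
```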
240 .TP
241 .BR PACKET_AUXDATA " (since Linux 2.6.21)"
242 .\" commit 8dc4194474159660d7f37c495e3fc3f10d0db8cc
243 If this binary option is enabled, the packet socket passes a metadata
244 structure along with each packet in the
245 .BR recvmsg (2)
246 control field.
247 The structure can be read with
248 .BR cmsg (3).
249 It is defined as
250 .IP
251 .in +4n
252 .EX
253 struct tpacket_auxdata {
254     __u32 tp_status;
255     __u32 tp_len;       /* packet length */
256     __u32 tp_snaplen;   /* captured length */
257     __u16 tp_mac;
258     __u16 tp_net;
259     __u16 tp_vlan_tci;
260     __u16 tp_vlan_tpid; /* Since Linux 3.14; earlier, these
261                            were unused padding bytes */
262 .\" commit a0cdfcf39362410d5ea983f4daf67b38de129408 added tp_vlan_tpid
263 };
264 .EE
265 .in
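.IP
The following sketch enables the option and then looks for the metadata
in the control messages of a msghdr already filled by recvmsg(2).
The helper names enable_auxdata and find_auxdata are hypothetical.

```c
#include <linux/if_packet.h>  /* struct tpacket_auxdata, PACKET_AUXDATA */
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Turn on delivery of the metadata structure.  Returns 0 or -1. */
int enable_auxdata(int fd)
{
    int on = 1;

    return setsockopt(fd, SOL_PACKET, PACKET_AUXDATA, &on, sizeof(on));
}

/* Hypothetical helper: scan the control messages of msg.
 * Returns 1 and copies the metadata into *out if it is present,
 * otherwise returns 0 and leaves *out untouched. */
int find_auxdata(struct msghdr *msg, struct tpacket_auxdata *out)
{
    struct cmsghdr *c;

    for (c = CMSG_FIRSTHDR(msg); c != NULL; c = CMSG_NXTHDR(msg, c)) {
        if (c->cmsg_level == SOL_PACKET &&
            c->cmsg_type == PACKET_AUXDATA &&
            c->cmsg_len >= CMSG_LEN(sizeof(*out))) {
            /* Copy out: the payload may not be suitably aligned. */
            memcpy(out, CMSG_DATA(c), sizeof(*out));
            return 1;
        }
    }
    return 0;
}
```

When the packet was truncated, tp_len in the returned structure exceeds
tp_snaplen.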
266 .TP
267 .BR PACKET_FANOUT " (since Linux 3.1)"
268 .\" commit dc99f600698dcac69b8f56dda9a8a00d645c5ffc
269 To scale processing across threads, packet sockets can form a fanout
270 group.
271 In this mode, each matching packet is enqueued onto only one
272 socket in the group.
273 A socket joins a fanout group by calling
274 .BR setsockopt (2)
275 with level
276 .B SOL_PACKET
277 and option
278 .BR PACKET_FANOUT .
279 Each network namespace can have up to 65536 independent groups.
280 A socket selects a group by encoding the ID in the lower 16 bits of
281 the integer option value.
282 The first packet socket to join a group implicitly creates it.
283 To successfully join an existing group, subsequent packet sockets
284 must have the same protocol, device settings, fanout mode and
285 flags (see below).
286 Packet sockets can leave a fanout group only by closing the socket.
287 The group is deleted when the last socket is closed.
288 .IP
289 Fanout supports multiple algorithms to spread traffic between sockets,
290 as follows:
291 .RS
292 .IP * 3
293 The default mode,
294 .BR PACKET_FANOUT_HASH ,
295 sends packets from the same flow to the same socket to maintain
296 per-flow ordering.
297 For each packet, it chooses a socket by taking the packet flow hash
298 modulo the number of sockets in the group, where a flow hash is a hash
299 over network-layer address and optional transport-layer port fields.
300 .IP *
301 The load-balance mode
302 .BR PACKET_FANOUT_LB
303 implements a round-robin algorithm.
304 .IP *
305 .BR PACKET_FANOUT_CPU
306 selects the socket based on the CPU that the packet arrived on.
307 .IP *
308 .BR PACKET_FANOUT_ROLLOVER
309 processes all data on a single socket, moving to the next when one
310 becomes backlogged.
311 .IP *
312 .BR PACKET_FANOUT_RND
313 selects the socket using a pseudo-random number generator.
314 .IP *
315 .BR PACKET_FANOUT_QM
316 .\" commit 2d36097d26b5991d71a2cf4a20c1a158f0f1bfcd
317 (available since Linux 3.14)
318 selects the socket using the recorded queue_mapping of the received skb.
319 .RE
320 .IP
321 Fanout modes can take additional options.
322 IP fragmentation causes packets from the same flow to have different
323 flow hashes.
324 The flag
325 .BR PACKET_FANOUT_FLAG_DEFRAG ,
326 if set, causes packets to be defragmented before fanout is applied, to
327 preserve order even in this case.
328 Fanout mode and options are communicated in the upper 16 bits of the
329 integer option value.
330 The flag
331 .BR PACKET_FANOUT_FLAG_ROLLOVER
332 enables the roll over mechanism as a backup strategy: if the
333 original fanout algorithm selects a backlogged socket, the packet
334 rolls over to the next available one.
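.IP
A sketch of how the option value might be assembled, assuming the layout
used by the kernel headers: group ID in the low-order 16 bits, mode and
flags combined in the high-order 16 bits.
The helper names fanout_arg and join_fanout are hypothetical.

```c
#include <linux/if_packet.h>  /* PACKET_FANOUT and its modes and flags */
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: pack group ID, mode and flags into one value.
 * Unsigned arithmetic is used because PACKET_FANOUT_FLAG_DEFRAG
 * lands in the top bit after the shift. */
uint32_t fanout_arg(uint16_t group_id, uint16_t mode, uint16_t flags)
{
    return (uint32_t)group_id | ((uint32_t)(mode | flags) << 16);
}

/* Join (or implicitly create) a fanout group.  Returns 0 or -1. */
int join_fanout(int fd, uint16_t group_id, uint16_t mode, uint16_t flags)
{
    uint32_t arg = fanout_arg(group_id, mode, flags);

    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

Every socket that joins the same group has to pass the same mode and
flags, so a program would typically compute the value once and reuse it
for each worker socket.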
335 .TP
336 .BR PACKET_LOSS " (with " PACKET_TX_RING )
337 When a malformed packet is encountered on a transmit ring,
338 the default is to reset its
339 .I tp_status
340 to
341 .BR TP_STATUS_WRONG_FORMAT
342 and abort the transmission immediately.
343 The malformed packet blocks itself and subsequently enqueued packets from
344 being sent.
345 The format error must be fixed, the associated
346 .I tp_status
347 reset to
348 .BR TP_STATUS_SEND_REQUEST ,
349 and the transmission process restarted via
350 .BR send (2).
351 However, if
352 .BR PACKET_LOSS
353 is set, any malformed packet will be skipped, its
354 .I tp_status
355 reset to
356 .BR TP_STATUS_AVAILABLE ,
357 and the transmission process continued.
358 .TP
359 .BR PACKET_RESERVE " (with " PACKET_RX_RING )
360 By default, a packet receive ring writes packets immediately following the
361 metadata structure and alignment padding.
362 This integer option reserves additional headroom.
363 .TP
364 .BR PACKET_RX_RING
365 Create a memory-mapped ring buffer for asynchronous packet reception.
366 The packet socket reserves a contiguous region of application address
367 space, lays it out into an array of packet slots and copies packets
368 (up to
369 .IR tp_snaplen )
370 into subsequent slots.
371 Each packet is preceded by a metadata structure similar to
372 .IR tpacket_auxdata .
373 The protocol fields encode the offset to the data
374 from the start of the metadata header.
375 .I tp_net
376 stores the offset to the network layer.
377 If the packet socket is of type
378 .BR SOCK_DGRAM ,
379 then
380 .I tp_mac
381 is the same.
382 If it is of type
383 .BR SOCK_RAW ,
384 then that field stores the offset to the link-layer frame.
385 Packet socket and application communicate the head and tail of the ring
386 through the
387 .I tp_status
388 field.
389 The packet socket owns all slots with
390 .I tp_status
391 equal to
392 .BR TP_STATUS_KERNEL .
393 After filling a slot, it changes the status of the slot to transfer
394 ownership to the application.
395 During normal operation, the new
396 .I tp_status
397 value has at least the
398 .BR TP_STATUS_USER
399 bit set to signal that a received packet has been stored.
400 When the application has finished processing a packet, it transfers
401 ownership of the slot back to the socket by setting
402 .I tp_status
403 equal to
404 .BR TP_STATUS_KERNEL .
405 .IP
406 Packet sockets implement multiple variants of the packet ring.
407 The implementation details are described in
408 .IR Documentation/networking/packet_mmap.txt
409 in the Linux kernel source tree.
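.IP
The ownership handshake can be sketched as follows for a single slot of
a TPACKET_V1 ring on a SOCK_RAW socket.
Ring creation and mmap(2) are omitted; the names frame_handler,
count_bytes, bytes_seen and consume_slot are hypothetical, and the
__atomic builtins assume GCC or Clang.

```c
#include <linux/if_packet.h>  /* struct tpacket_hdr, TP_STATUS_* */
#include <stddef.h>

/* Hypothetical callback type for received frames. */
typedef void (*frame_handler)(const unsigned char *data, unsigned int len);

/* Example handler: only tallies the captured bytes. */
unsigned long bytes_seen;

void count_bytes(const unsigned char *data, unsigned int len)
{
    (void)data;
    bytes_seen += len;
}

/* Process one slot if the packet socket has handed it over.
 * Returns 1 if a packet was consumed, 0 if the slot is still owned
 * by the packet socket. */
int consume_slot(struct tpacket_hdr *hdr, frame_handler handle)
{
    unsigned long status = __atomic_load_n(&hdr->tp_status,
                                           __ATOMIC_ACQUIRE);

    if (!(status & TP_STATUS_USER))
        return 0;

    /* tp_mac is the offset from the start of the metadata header
     * to the link-layer frame; tp_snaplen is the captured length. */
    handle((const unsigned char *)hdr + hdr->tp_mac, hdr->tp_snaplen);

    /* Hand the slot back to the packet socket. */
    __atomic_store_n(&hdr->tp_status, TP_STATUS_KERNEL, __ATOMIC_RELEASE);
    return 1;
}
```

An application would call consume_slot on successive slots, typically
waiting in poll(2) whenever it returns 0.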
410 .TP
411 .BR PACKET_STATISTICS
412 Retrieve packet socket statistics in the form of a structure
413 .IP
414 .in +4n
415 .EX
416 struct tpacket_stats {
417     unsigned int tp_packets;  /* Total packet count */
418     unsigned int tp_drops;    /* Dropped packet count */
419 };
420 .EE
421 .in
422 .IP
423 Receiving statistics resets the internal counters.
424 The statistics structure differs when using a ring of variant
425 .BR TPACKET_V3 .
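.IP
Because each read resets the counters, a long-running program has to
accumulate the values itself.
The following sketch shows one way to do so for the structure above;
the helper names read_packet_stats and add_stats are hypothetical.

```c
#include <linux/if_packet.h>  /* struct tpacket_stats, PACKET_STATISTICS */
#include <sys/socket.h>

/* Fetch (and thereby reset) the counters.  Returns 0 or -1. */
int read_packet_stats(int fd, struct tpacket_stats *st)
{
    socklen_t len = sizeof(*st);

    return getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, st, &len);
}

/* Hypothetical helper: fold one reading into a running total. */
void add_stats(struct tpacket_stats *total,
               const struct tpacket_stats *delta)
{
    total->tp_packets += delta->tp_packets;
    total->tp_drops += delta->tp_drops;
}
```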
426 .TP
427 .BR PACKET_TIMESTAMP " (with " PACKET_RX_RING "; since Linux 2.6.36)"
428 .\" commit 614f60fa9d73a9e8fdff3df83381907fea7c5649
429 The packet receive ring always stores a timestamp in the metadata header.
430 By default, this is a software timestamp generated when the
431 packet is copied into the ring.
432 This integer option selects the type of timestamp.
433 Besides the default, it supports the two hardware formats described in
434 .IR Documentation/networking/timestamping.txt
435 in the Linux kernel source tree.
436 .TP
437 .BR PACKET_TX_RING " (since Linux 2.6.31)"
438 .\" commit 69e3c75f4d541a6eb151b3ef91f34033cb3ad6e1
439 Create a memory-mapped ring buffer for packet transmission.
440 This option is similar to
441 .BR PACKET_RX_RING
442 and takes the same arguments.
443 The application writes packets into slots with
444 .I tp_status
445 equal to
446 .BR TP_STATUS_AVAILABLE
447 and schedules them for transmission by changing
448 .I tp_status
449 to
450 .BR TP_STATUS_SEND_REQUEST .
451 When packets are ready to be transmitted, the application calls
452 .BR send (2)
453 or a variant thereof.
454 The
455 .I buf
456 and
457 .I len
458 fields of this call are ignored.
459 If an address is passed using
460 .BR sendto (2)
461 or
462 .BR sendmsg (2),
463 then that overrides the socket default.
464 On successful transmission, the socket resets
465 .I tp_status
466 to
467 .BR TP_STATUS_AVAILABLE .
468 It immediately aborts the transmission on error unless
469 .BR PACKET_LOSS
470 is set.
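.IP
A sketch of queueing one frame in a TPACKET_V1 transmit slot.
It assumes that the payload starts at
TPACKET_HDRLEN - sizeof(struct sockaddr_ll) from the start of the slot,
which is the layout given in the kernel's packet_mmap document for this
variant; verify this against the ring variant in use.
The helper names queue_tx_frame and flush_tx_ring are hypothetical, and
the __atomic builtins assume GCC or Clang.

```c
#include <linux/if_packet.h>  /* struct tpacket_hdr, TP_STATUS_* */
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: copy a frame into one transmit slot of
 * slot_size bytes and mark it for sending.
 * Returns 0 if queued, -1 if the slot is busy or the frame is too big. */
int queue_tx_frame(struct tpacket_hdr *hdr, size_t slot_size,
                   const void *frame, unsigned int len)
{
    /* Assumed TPACKET_V1 payload offset; see the lead-in. */
    size_t data_off = TPACKET_HDRLEN - sizeof(struct sockaddr_ll);

    if (__atomic_load_n(&hdr->tp_status, __ATOMIC_ACQUIRE)
            != TP_STATUS_AVAILABLE)
        return -1;
    if (data_off + len > slot_size)
        return -1;

    memcpy((unsigned char *)hdr + data_off, frame, len);
    hdr->tp_len = len;
    __atomic_store_n(&hdr->tp_status, TP_STATUS_SEND_REQUEST,
                     __ATOMIC_RELEASE);
    return 0;
}

/* Ask the socket to transmit everything marked for sending.
 * The buffer and length arguments are ignored for a transmit ring. */
ssize_t flush_tx_ring(int fd)
{
    return send(fd, NULL, 0, 0);
}
```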
471 .TP
472 .BR PACKET_VERSION " (with " PACKET_RX_RING "; since Linux 2.6.27)"
473 .\" commit bbd6ef87c544d88c30e4b762b1b61ef267a7d279
474 By default,
475 .BR PACKET_RX_RING
476 creates a packet receive ring of variant
477 .BR TPACKET_V1 .
478 To create another variant, configure the desired variant by setting this
479 integer option before creating the ring.
480 .TP
481 .BR PACKET_QDISC_BYPASS " (since Linux 3.14)"
482 .\" commit d346a3fae3ff1d99f5d0c819bf86edf9094a26a1
483 By default, packets sent through packet sockets pass through the kernel's
484 qdisc (traffic control) layer, which is fine for the vast majority of use
485 cases.
486 For traffic generator appliances using packet sockets
487 that intend to brute-force flood the network\(emfor example,
488 to test devices under load in a similar
489 fashion to pktgen\(emthis layer can be bypassed by setting
490 this integer option to 1.
491 A side effect is that packet buffering in the qdisc layer is avoided,
492 which will lead to increased drops when network
493 device transmit queues are busy;
494 therefore, use at your own risk.
495 .SS Ioctls
496 .B SIOCGSTAMP
497 can be used to receive the timestamp of the last received packet.
498 The argument is a
499 .I struct timeval
500 variable.
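.PP
A minimal sketch of the call; the helper name last_packet_time is
hypothetical, and the include of linux/sockios.h for the request
constant is an assumption about where current kernel headers define it.

```c
#include <linux/sockios.h>    /* SIOCGSTAMP (assumed location) */
#include <sys/ioctl.h>
#include <sys/time.h>         /* struct timeval */

/* Hypothetical helper: fetch the timestamp of the last packet
 * received on fd.  Returns 0 on success, or -1 with errno set. */
int last_packet_time(int fd, struct timeval *tv)
{
    return ioctl(fd, SIOCGSTAMP, tv);
}
```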
501 .\" FIXME Document SIOCGSTAMPNS
502 .PP
503 In addition, all standard ioctls defined in
504 .BR netdevice (7)
505 and
506 .BR socket (7)
507 are valid on packet sockets.
508 .SS Error handling
509 Packet sockets do no error handling other than for errors that occur
510 while passing the packet to the device driver.
511 They don't have the concept of a pending error.
512 .SH ERRORS
513 .TP
514 .B EADDRNOTAVAIL
515 Unknown multicast group address passed.
516 .TP
517 .B EFAULT
518 User passed invalid memory address.
519 .TP
520 .B EINVAL
521 Invalid argument.
522 .TP
523 .B EMSGSIZE
524 Packet is bigger than interface MTU.
525 .TP
526 .B ENETDOWN
527 Interface is not up.
528 .TP
529 .B ENOBUFS
530 Not enough memory to allocate the packet.
531 .TP
532 .B ENODEV
533 Unknown device name or interface index specified in interface address.
534 .TP
535 .B ENOENT
536 No packet received.
537 .TP
538 .B ENOTCONN
539 No interface address passed.
540 .TP
541 .B ENXIO
542 Interface address contained an invalid interface index.
543 .TP
544 .B EPERM
545 User has insufficient privileges to carry out this operation.
546 .PP
547 In addition, other errors may be generated by the low-level driver.
548 .SH VERSIONS
549 .B AF_PACKET
550 is a new feature in Linux 2.2.
551 Earlier Linux versions supported only
552 .BR SOCK_PACKET .
553 .PP
554 .SH NOTES
555 For portable programs it is suggested to use
556 .B AF_PACKET
557 via
558 .BR pcap (3),
559 although this covers only a subset of the
560 .B AF_PACKET
561 features.
562 .PP
563 The
564 .B SOCK_DGRAM
565 packet sockets make no attempt to create or parse the IEEE 802.2 LLC
566 header for an IEEE 802.3 frame.
567 When
568 .B ETH_P_802_3
569 is specified as the protocol for sending, the kernel creates the
570 802.3 frame and fills out the length field; the user has to supply the LLC
571 header to get a fully conforming packet.
572 Incoming 802.3 packets are not multiplexed on the DSAP/SSAP protocol
573 fields; instead they are supplied to the user as protocol
574 .B ETH_P_802_2
575 with the LLC header prefixed.
576 It is thus not possible to bind to
577 .BR ETH_P_802_3 ;
578 bind to
579 .B ETH_P_802_2
580 instead and do the protocol multiplex yourself.
581 The default for sending is the standard Ethernet DIX
582 encapsulation with the protocol filled in.
583 .PP
584 Packet sockets are not subject to the input or output firewall chains.
585 .SS Compatibility
586 In Linux 2.0, the only way to get a packet socket was with the call:
587 .PP
588 socket(AF_INET, SOCK_PACKET, protocol)
589 .PP
590 This is still supported, but deprecated and strongly discouraged.
591 The main difference between the two methods is that
592 .B SOCK_PACKET
593 uses the old
594 .I struct sockaddr_pkt
595 to specify an interface, which doesn't provide physical-layer
596 independence.
597 .PP
598 .in +4n
599 .EX
600 struct sockaddr_pkt {
601     unsigned short spkt_family;
602     unsigned char  spkt_device[14];
603     unsigned short spkt_protocol;
604 };
605 .EE
606 .in
607 .PP
608 .I spkt_family
609 contains
610 the device type,
611 .I spkt_protocol
612 is the IEEE 802.3 protocol type as defined in
613 .I <linux/if_ether.h>
614 and
615 .I spkt_device
616 is the device name as a null-terminated string, for example, eth0.
617 .PP
618 This structure is obsolete and should not be used in new code.
619 .SH BUGS
620 The IEEE 802.2/802.3 LLC handling could be considered a bug.
621 .PP
622 Socket filters are not documented.
623 .PP
624 The
625 .B MSG_TRUNC
626 .BR recvmsg (2)
627 extension is an ugly hack and should be replaced by a control message.
628 There is currently no way to get the original destination address of
629 packets via
630 .BR SOCK_DGRAM .
631 .\" .SH CREDITS
632 .\" This man page was written by Andi Kleen with help from Matthew Wilcox.
633 .\" AF_PACKET in Linux 2.2 was implemented
634 .\" by Alexey Kuznetsov, based on code by Alan Cox and others.
635 .SH SEE ALSO
636 .BR socket (2),
637 .BR pcap (3),
638 .BR capabilities (7),
639 .BR ip (7),
640 .BR raw (7),
641 .BR socket (7)
642 .PP
643 RFC\ 894 for the standard IP Ethernet encapsulation.
644 RFC\ 1700 for the IEEE 802.3 IP encapsulation.
645 .PP
646 The
647 .I <linux/if_ether.h>
648 include file for physical-layer protocols.
649 .PP
650 The Linux kernel source tree.
651 .IR /Documentation/networking/filter.txt
652 describes how to apply Berkeley Packet Filters to packet sockets.
653 .IR /tools/testing/selftests/net/psock_tpacket.c
654 contains example source code for all available versions of
655 .BR PACKET_RX_RING
656 and
657 .BR PACKET_TX_RING .