.\" Man page generated from reStructuredText.
.\" Copyright (C) All BPF authors and contributors from 2014 to present.
.\" See git log include/uapi/linux/bpf.h in kernel tree for details.
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" Please do not edit this file. It was generated from the documentation
.\" located in file include/uapi/linux/bpf.h of the Linux kernel sources
.\" (helpers description), and from scripts/bpf_helpers_doc.py in the same
.\" repository (header and footer).
.TH BPF-HELPERS 7 2019-03-06 "Linux" "Linux Programmer's Manual"
.SH NAME
BPF-HELPERS \- list of eBPF helper functions
.nr rst2man-indent-level 0
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
58.SH DESCRIPTION
59.sp
The extended Berkeley Packet Filter (eBPF) subsystem consists of programs
written in a pseudo\-assembly language, then attached to one of several
kernel hooks and run in reaction to specific events. This framework differs
from the older, "classic" BPF (or "cBPF") in several aspects, one of them being
the ability to call special functions (or "helpers") from within a program.
These functions are restricted to a white\-list of helpers defined in the
kernel.
67.sp
68These helpers are used by eBPF programs to interact with the system, or with
69the context in which they work. For instance, they can be used to print
70debugging messages, to get the time since the system was booted, to interact
with eBPF maps, or to manipulate network packets. Since there are several eBPF
program types, and they do not all run in the same context, each program type
can only call a subset of those helpers.
.sp
Due to eBPF conventions, a helper cannot have more than five arguments.
76.sp
77Internally, eBPF programs call directly into the compiled helper functions
78without requiring any foreign\-function interface. As a result, calling helpers
79introduces no overhead, thus offering excellent performance.
80.sp
This document attempts to list and document the helpers available to eBPF
developers. They are sorted in chronological order (the oldest helpers in the
kernel at the top).
84.SH HELPERS
85.INDENT 0.0
86.TP
87.B \fBvoid *bpf_map_lookup_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP
88.INDENT 7.0
89.TP
90.B Description
91Perform a lookup in \fImap\fP for an entry associated to \fIkey\fP\&.
92.TP
93.B Return
94Map value associated to \fIkey\fP, or \fBNULL\fP if no entry was
95found.
96.UNINDENT
97.TP
98.B \fBint bpf_map_update_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB, const void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP
99.INDENT 7.0
100.TP
101.B Description
102Add or update the value of the entry associated to \fIkey\fP in
103\fImap\fP with \fIvalue\fP\&. \fIflags\fP is one of:
104.INDENT 7.0
105.TP
106.B \fBBPF_NOEXIST\fP
107The entry for \fIkey\fP must not exist in the map.
108.TP
109.B \fBBPF_EXIST\fP
110The entry for \fIkey\fP must already exist in the map.
111.TP
112.B \fBBPF_ANY\fP
113No condition on the existence of the entry for \fIkey\fP\&.
114.UNINDENT
115.sp
116Flag value \fBBPF_NOEXIST\fP cannot be used for maps of types
117\fBBPF_MAP_TYPE_ARRAY\fP or \fBBPF_MAP_TYPE_PERCPU_ARRAY\fP (all
118elements always exist), the helper would return an error.
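.sp
The three flag values can be read as preconditions on the entry for \fIkey\fP\&. The block below is a minimal user space model of that rule, not kernel code. The function name model_update_result is hypothetical, and the specific error codes (\-EEXIST, \-ENOENT) are an assumption of this sketch, since the text above only promises "a negative error". The numeric flag values are those found in include/uapi/linux/bpf.h.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in include/uapi/linux/bpf.h. */
#define BPF_ANY     0 /* create a new element or update an existing one */
#define BPF_NOEXIST 1 /* create a new element only if it does not exist */
#define BPF_EXIST   2 /* update an existing element only */

/*
 * Hypothetical model: given whether an entry for the key is already
 * present, report what an update with one of the three flag values
 * would return. Other flag bits are not considered here.
 */
static int model_update_result(bool entry_exists, uint64_t flags)
{
	if (flags == BPF_NOEXIST && entry_exists)
		return -EEXIST;	/* assumed error code */
	if (flags == BPF_EXIST && !entry_exists)
		return -ENOENT;	/* assumed error code */
	return 0;
}
```

Under this model an array map, whose elements always exist, can never satisfy \fBBPF_NOEXIST\fP, which matches the restriction stated above.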
119.TP
120.B Return
1210 on success, or a negative error in case of failure.
122.UNINDENT
123.TP
124.B \fBint bpf_map_delete_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP
125.INDENT 7.0
126.TP
127.B Description
128Delete entry with \fIkey\fP from \fImap\fP\&.
129.TP
130.B Return
1310 on success, or a negative error in case of failure.
132.UNINDENT
133.TP
134.B \fBint bpf_map_push_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP
135.INDENT 7.0
136.TP
137.B Description
138Push an element \fIvalue\fP in \fImap\fP\&. \fIflags\fP is one of:
139.sp
140\fBBPF_EXIST\fP
141If the queue/stack is full, the oldest element is removed to
142make room for this.
143.TP
144.B Return
1450 on success, or a negative error in case of failure.
146.UNINDENT
147.TP
148.B \fBint bpf_probe_read(void *\fP\fIdst\fP\fB, u32\fP \fIsize\fP\fB, const void *\fP\fIsrc\fP\fB)\fP
149.INDENT 7.0
150.TP
151.B Description
152For tracing programs, safely attempt to read \fIsize\fP bytes from
153address \fIsrc\fP and store the data in \fIdst\fP\&.
154.TP
155.B Return
1560 on success, or a negative error in case of failure.
157.UNINDENT
158.TP
159.B \fBu64 bpf_ktime_get_ns(void)\fP
160.INDENT 7.0
161.TP
162.B Description
163Return the time elapsed since system boot, in nanoseconds.
164.TP
165.B Return
166Current \fIktime\fP\&.
167.UNINDENT
168.TP
169.B \fBint bpf_trace_printk(const char *\fP\fIfmt\fP\fB, u32\fP \fIfmt_size\fP\fB, ...)\fP
170.INDENT 7.0
171.TP
172.B Description
173This helper is a "printk()\-like" facility for debugging. It
174prints a message defined by format \fIfmt\fP (of size \fIfmt_size\fP)
175to file \fI/sys/kernel/debug/tracing/trace\fP from DebugFS, if
available. It can take up to three additional \fBu64\fP
arguments (as with any eBPF helper, the total number of arguments is
limited to five).
179.sp
180Each time the helper is called, it appends a line to the trace.
181The format of the trace is customizable, and the exact output
182one will get depends on the options set in
183\fI/sys/kernel/debug/tracing/trace_options\fP (see also the
184\fIREADME\fP file under the same directory). However, it usually
185defaults to something like:
186.INDENT 7.0
187.INDENT 3.5
188.sp
189.nf
190.ft C
191telnet\-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg>
192.ft P
193.fi
194.UNINDENT
195.UNINDENT
196.sp
197In the above:
198.INDENT 7.0
199.INDENT 3.5
200.INDENT 0.0
201.IP \(bu 2
202\fBtelnet\fP is the name of the current task.
203.IP \(bu 2
204\fB470\fP is the PID of the current task.
205.IP \(bu 2
206\fB001\fP is the CPU number on which the task is
207running.
208.IP \(bu 2
209In \fB\&.N..\fP, each character refers to a set of
210options (whether irqs are enabled, scheduling
211options, whether hard/softirqs are running, level of
212preempt_disabled respectively). \fBN\fP means that
213\fBTIF_NEED_RESCHED\fP and \fBPREEMPT_NEED_RESCHED\fP
214are set.
215.IP \(bu 2
216\fB419421.045894\fP is a timestamp.
217.IP \(bu 2
218\fB0x00000001\fP is a fake value used by BPF for the
219instruction pointer register.
220.IP \(bu 2
221\fB<formatted msg>\fP is the message formatted with
222\fIfmt\fP\&.
223.UNINDENT
224.UNINDENT
225.UNINDENT
226.sp
The conversion specifiers supported by \fIfmt\fP are similar to, but
more limited than, those of printk(). They are \fB%d\fP, \fB%i\fP,
229\fB%u\fP, \fB%x\fP, \fB%ld\fP, \fB%li\fP, \fB%lu\fP, \fB%lx\fP, \fB%lld\fP,
230\fB%lli\fP, \fB%llu\fP, \fB%llx\fP, \fB%p\fP, \fB%s\fP\&. No modifier (size
231of field, padding with zeroes, etc.) is available, and the
232helper will return \fB\-EINVAL\fP (but print nothing) if it
233encounters an unknown specifier.
234.sp
Also, note that \fBbpf_trace_printk\fP() is slow, and should
only be used for debugging purposes. For this reason, the first
time this helper is used (or more precisely, when
\fBtrace_printk\fP() buffers are allocated), a notice block
(spanning several lines) is printed to kernel logs, stating
that the helper should not be used "for production use".
For passing values to user space, perf events should be preferred.
242.TP
243.B Return
244The number of bytes written to the buffer, or a negative error
245in case of failure.
246.UNINDENT
247.TP
248.B \fBu32 bpf_get_prandom_u32(void)\fP
249.INDENT 7.0
250.TP
251.B Description
252Get a pseudo\-random number.
253.sp
254From a security point of view, this helper uses its own
255pseudo\-random internal state, and cannot be used to infer the
256seed of other random functions in the kernel. However, it is
257essential to note that the generator used by the helper is not
258cryptographically secure.
259.TP
260.B Return
261A random 32\-bit unsigned value.
262.UNINDENT
263.TP
264.B \fBu32 bpf_get_smp_processor_id(void)\fP
265.INDENT 7.0
266.TP
267.B Description
268Get the SMP (symmetric multiprocessing) processor id. Note that
269all programs run with preemption disabled, which means that the
270SMP processor id is stable during all the execution of the
271program.
272.TP
273.B Return
274The SMP id of the processor running the program.
275.UNINDENT
276.TP
277.B \fBint bpf_skb_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP
278.INDENT 7.0
279.TP
280.B Description
281Store \fIlen\fP bytes from address \fIfrom\fP into the packet
282associated to \fIskb\fP, at \fIoffset\fP\&. \fIflags\fP are a combination of
283\fBBPF_F_RECOMPUTE_CSUM\fP (automatically recompute the
284checksum for the packet after storing the bytes) and
285\fBBPF_F_INVALIDATE_HASH\fP (set \fIskb\fP\fB\->hash\fP, \fIskb\fP\fB\->swhash\fP and \fIskb\fP\fB\->l4hash\fP to 0).
286.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
292.TP
293.B Return
2940 on success, or a negative error in case of failure.
295.UNINDENT
296.TP
297.B \fBint bpf_l3_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIsize\fP\fB)\fP
298.INDENT 7.0
299.TP
300.B Description
301Recompute the layer 3 (e.g. IP) checksum for the packet
302associated to \fIskb\fP\&. Computation is incremental, so the helper
303must know the former value of the header field that was
304modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the
305number of bytes (2 or 4) for this field, stored in \fIsize\fP\&.
306Alternatively, it is possible to store the difference between
307the previous and the new values of the header field in \fIto\fP, by
308setting \fIfrom\fP and \fIsize\fP to 0. For both methods, \fIoffset\fP
309indicates the location of the IP checksum within the packet.
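.sp
The incremental computation described above follows the arithmetic of RFC 1624 (equation 3): the old field value is subtracted from the one's complement sum and the new value is added. The block below is a user space sketch of that arithmetic for a 2\-byte field, not the kernel's implementation. The function names fold16, csum_full and csum_replace16 are hypothetical, and byte order handling is left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit one's complement sum. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words (checksum field zeroed). */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~fold16(sum);
}

/* Incremental update for one 16-bit field: HC' = ~(~HC + ~m + m'). */
static uint16_t csum_replace16(uint16_t check, uint16_t from, uint16_t to)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~from;
	sum += to;
	return (uint16_t)~fold16(sum);
}
```

Updating the checksum this way touches only the old checksum and the changed field, so the cost does not depend on the header length.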
310.sp
311This helper works in combination with \fBbpf_csum_diff\fP(),
312which does not update the checksum in\-place, but offers more
313flexibility and can handle sizes larger than 2 or 4 for the
314checksum to update.
315.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
321.TP
322.B Return
3230 on success, or a negative error in case of failure.
324.UNINDENT
325.TP
326.B \fBint bpf_l4_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIflags\fP\fB)\fP
327.INDENT 7.0
328.TP
329.B Description
330Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum for the
331packet associated to \fIskb\fP\&. Computation is incremental, so the
332helper must know the former value of the header field that was
333modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the
number of bytes (2 or 4) for this field, stored in the lowest
four bits of \fIflags\fP\&. Alternatively, it is possible to store
the difference between the previous and the new values of the
header field in \fIto\fP, by setting \fIfrom\fP and the four lowest
bits of \fIflags\fP to 0. For both methods, \fIoffset\fP indicates the
location of the layer 4 checksum within the packet. In addition to
the size of the field, actual flags can be added (bitwise OR) to
\fIflags\fP\&. With \fBBPF_F_MARK_MANGLED_0\fP, a null checksum is left
342untouched (unless \fBBPF_F_MARK_ENFORCE\fP is added as well), and
343for updates resulting in a null checksum the value is set to
344\fBCSUM_MANGLED_0\fP instead. Flag \fBBPF_F_PSEUDO_HDR\fP indicates
345the checksum is to be computed against a pseudo\-header.
346.sp
347This helper works in combination with \fBbpf_csum_diff\fP(),
348which does not update the checksum in\-place, but offers more
349flexibility and can handle sizes larger than 2 or 4 for the
350checksum to update.
351.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
357.TP
358.B Return
3590 on success, or a negative error in case of failure.
360.UNINDENT
361.TP
362.B \fBint bpf_tail_call(void *\fP\fIctx\fP\fB, struct bpf_map *\fP\fIprog_array_map\fP\fB, u32\fP \fIindex\fP\fB)\fP
363.INDENT 7.0
364.TP
365.B Description
366This special helper is used to trigger a "tail call", or in
367other words, to jump into another eBPF program. The same stack
368frame is used (but values on stack and in registers for the
369caller are not accessible to the callee). This mechanism allows
370for program chaining, either for raising the maximum number of
371available eBPF instructions, or to execute given programs in
372conditional blocks. For security reasons, there is an upper
373limit to the number of successive tail calls that can be
374performed.
375.sp
376Upon call of this helper, the program attempts to jump into a
377program referenced at index \fIindex\fP in \fIprog_array_map\fP, a
378special map of type \fBBPF_MAP_TYPE_PROG_ARRAY\fP, and passes
379\fIctx\fP, a pointer to the context.
380.sp
381If the call succeeds, the kernel immediately runs the first
382instruction of the new program. This is not a function call,
383and it never returns to the previous program. If the call
384fails, then the helper has no effect, and the caller continues
385to run its subsequent instructions. A call can fail if the
destination program for the jump does not exist (i.e. \fIindex\fP
is greater than or equal to the number of entries in \fIprog_array_map\fP), or
388if the maximum number of tail calls has been reached for this
389chain of programs. This limit is defined in the kernel by the
390macro \fBMAX_TAIL_CALL_CNT\fP (not accessible to user space),
391which is currently set to 32.
392.TP
393.B Return
3940 on success, or a negative error in case of failure.
395.UNINDENT
396.TP
397.B \fBint bpf_clone_redirect(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP
398.INDENT 7.0
399.TP
400.B Description
401Clone and redirect the packet associated to \fIskb\fP to another
402net device of index \fIifindex\fP\&. Both ingress and egress
403interfaces can be used for redirection. The \fBBPF_F_INGRESS\fP
404value in \fIflags\fP is used to make the distinction (ingress path
405is selected if the flag is present, egress path otherwise).
406This is the only flag supported for now.
407.sp
In comparison with the \fBbpf_redirect\fP() helper,
\fBbpf_clone_redirect\fP() has the associated cost of
duplicating the packet buffer, but the redirection is carried out
from within the eBPF program. Conversely, \fBbpf_redirect\fP() is more
efficient, but it is handled through an action code where the
redirection happens only after the eBPF program has returned.
414.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
420.TP
421.B Return
4220 on success, or a negative error in case of failure.
423.UNINDENT
424.TP
425.B \fBu64 bpf_get_current_pid_tgid(void)\fP
426.INDENT 7.0
427.TP
428.B Return
429A 64\-bit integer containing the current tgid and pid, and
430created as such:
431\fIcurrent_task\fP\fB\->tgid << 32 |\fP
432\fIcurrent_task\fP\fB\->pid\fP\&.
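.sp
The two halves of the returned value can be separated with a shift and a truncating cast. The helper names below (tgid_of, pid_of, pack_pid_tgid) are hypothetical and only illustrate the packing described above. Note that the kernel's tgid is what user space usually calls the process ID, while the kernel's pid identifies a single thread.

```c
#include <stdint.h>

/* Upper 32 bits: tgid (the process ID as seen from user space). */
static uint32_t tgid_of(uint64_t pid_tgid)
{
	return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: pid (the kernel's per-thread ID). */
static uint32_t pid_of(uint64_t pid_tgid)
{
	return (uint32_t)pid_tgid;
}

/* Same packing as described above, useful for checking the split. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	return (uint64_t)tgid << 32 | pid;
}
```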
433.UNINDENT
434.TP
435.B \fBu64 bpf_get_current_uid_gid(void)\fP
436.INDENT 7.0
437.TP
438.B Return
439A 64\-bit integer containing the current GID and UID, and
440created as such: \fIcurrent_gid\fP \fB<< 32 |\fP \fIcurrent_uid\fP\&.
441.UNINDENT
442.TP
443.B \fBint bpf_get_current_comm(char *\fP\fIbuf\fP\fB, u32\fP \fIsize_of_buf\fP\fB)\fP
444.INDENT 7.0
445.TP
446.B Description
447Copy the \fBcomm\fP attribute of the current task into \fIbuf\fP of
448\fIsize_of_buf\fP\&. The \fBcomm\fP attribute contains the name of
449the executable (excluding the path) for the current task. The
450\fIsize_of_buf\fP must be strictly positive. On success, the
451helper makes sure that the \fIbuf\fP is NUL\-terminated. On failure,
452it is filled with zeroes.
453.TP
454.B Return
4550 on success, or a negative error in case of failure.
456.UNINDENT
457.TP
458.B \fBu32 bpf_get_cgroup_classid(struct sk_buff *\fP\fIskb\fP\fB)\fP
459.INDENT 7.0
460.TP
461.B Description
462Retrieve the classid for the current task, i.e. for the net_cls
463cgroup to which \fIskb\fP belongs.
464.sp
465This helper can be used on TC egress path, but not on ingress.
466.sp
467The net_cls cgroup provides an interface to tag network packets
468based on a user\-provided identifier for all traffic coming from
469the tasks belonging to the related cgroup. See also the related
470kernel documentation, available from the Linux sources in file
471\fIDocumentation/cgroup\-v1/net_cls.txt\fP\&.
472.sp
473The Linux kernel has two versions for cgroups: there are
474cgroups v1 and cgroups v2. Both are available to users, who can
475use a mixture of them, but note that the net_cls cgroup is for
476cgroup v1 only. This makes it incompatible with BPF programs
477run on cgroups, which is a cgroup\-v2\-only feature (a socket can
478only hold data for one version of cgroups at a time).
479.sp
This helper is only available if the kernel was compiled with
the \fBCONFIG_CGROUP_NET_CLASSID\fP configuration option set to
"\fBy\fP" or to "\fBm\fP".
483.TP
484.B Return
485The classid, or 0 for the default unconfigured classid.
486.UNINDENT
487.TP
488.B \fBint bpf_skb_vlan_push(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIvlan_proto\fP\fB, u16\fP \fIvlan_tci\fP\fB)\fP
489.INDENT 7.0
490.TP
491.B Description
492Push a \fIvlan_tci\fP (VLAN tag control information) of protocol
493\fIvlan_proto\fP to the packet associated to \fIskb\fP, then update
494the checksum. Note that if \fIvlan_proto\fP is different from
495\fBETH_P_8021Q\fP and \fBETH_P_8021AD\fP, it is considered to
496be \fBETH_P_8021Q\fP\&.
497.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
503.TP
504.B Return
5050 on success, or a negative error in case of failure.
506.UNINDENT
507.TP
508.B \fBint bpf_skb_vlan_pop(struct sk_buff *\fP\fIskb\fP\fB)\fP
509.INDENT 7.0
510.TP
511.B Description
512Pop a VLAN header from the packet associated to \fIskb\fP\&.
513.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
519.TP
520.B Return
5210 on success, or a negative error in case of failure.
522.UNINDENT
523.TP
524.B \fBint bpf_skb_get_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP
525.INDENT 7.0
526.TP
527.B Description
Get tunnel metadata. This helper takes a pointer \fIkey\fP to an
empty \fBstruct bpf_tunnel_key\fP of \fIsize\fP, which will be
filled with tunnel metadata for the packet associated to \fIskb\fP\&.
531The \fIflags\fP can be set to \fBBPF_F_TUNINFO_IPV6\fP, which
532indicates that the tunnel is based on IPv6 protocol instead of
533IPv4.
534.sp
535The \fBstruct bpf_tunnel_key\fP is an object that generalizes the
536principal parameters used by various tunneling protocols into a
537single struct. This way, it can be used to easily make a
538decision based on the contents of the encapsulation header,
539"summarized" in this struct. In particular, it holds the IP
540address of the remote end (IPv4 or IPv6, depending on the case)
541in \fIkey\fP\fB\->remote_ipv4\fP or \fIkey\fP\fB\->remote_ipv6\fP\&. Also,
542this struct exposes the \fIkey\fP\fB\->tunnel_id\fP, which is
543generally mapped to a VNI (Virtual Network Identifier), making
544it programmable together with the \fBbpf_skb_set_tunnel_key\fP() helper.
545.sp
546Let\(aqs imagine that the following code is part of a program
547attached to the TC ingress interface, on one end of a GRE
548tunnel, and is supposed to filter out all messages coming from
549remote ends with IPv4 address other than 10.0.0.1:
550.INDENT 7.0
551.INDENT 3.5
552.sp
553.nf
554.ft C
int ret;
struct bpf_tunnel_key key = {};

ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
if (ret < 0)
        return TC_ACT_SHOT;     // drop packet

if (key.remote_ipv4 != 0x0a000001)
        return TC_ACT_SHOT;     // drop packet

return TC_ACT_OK;               // accept packet
566.ft P
567.fi
568.UNINDENT
569.UNINDENT
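.sp
The constant 0x0a000001 in the comparison above is the address 10.0.0.1 written as a single integer, with the first octet in the most significant byte. A small sketch for building such constants follows. The function name ipv4_u32 is hypothetical and is not part of any kernel or libbpf header.

```c
#include <stdint.h>

/* Build the integer form of a.b.c.d with a in the most significant byte. */
static uint32_t ipv4_u32(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return (uint32_t)a << 24 | (uint32_t)b << 16 | (uint32_t)c << 8 | d;
}
```

With it, the check could be written against ipv4_u32(10, 0, 0, 1) instead of a raw hexadecimal literal.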
570.sp
571This interface can also be used with all encapsulation devices
572that can operate in "collect metadata" mode: instead of having
573one network device per specific configuration, the "collect
574metadata" mode only requires a single device where the
575configuration can be extracted from this helper.
576.sp
This can be used together with various tunnels such as VXLAN,
Geneve, GRE or IP in IP (IPIP).
579.TP
580.B Return
5810 on success, or a negative error in case of failure.
582.UNINDENT
583.TP
584.B \fBint bpf_skb_set_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP
585.INDENT 7.0
586.TP
587.B Description
Populate tunnel metadata for the packet associated to \fIskb\fP\&. The
tunnel metadata is set to the contents of \fIkey\fP, of \fIsize\fP\&. The
\fIflags\fP can be set to a combination of the following values:
591.INDENT 7.0
592.TP
593.B \fBBPF_F_TUNINFO_IPV6\fP
594Indicate that the tunnel is based on IPv6 protocol
595instead of IPv4.
596.TP
597.B \fBBPF_F_ZERO_CSUM_TX\fP
598For IPv4 packets, add a flag to tunnel metadata
599indicating that checksum computation should be skipped
600and checksum set to zeroes.
601.TP
602.B \fBBPF_F_DONT_FRAGMENT\fP
603Add a flag to tunnel metadata indicating that the
604packet should not be fragmented.
605.TP
606.B \fBBPF_F_SEQ_NUMBER\fP
607Add a flag to tunnel metadata indicating that a
608sequence number should be added to tunnel header before
609sending the packet. This flag was added for GRE
610encapsulation, but might be used with other protocols
611as well in the future.
612.UNINDENT
613.sp
614Here is a typical usage on the transmit path:
615.INDENT 7.0
616.INDENT 3.5
617.sp
618.nf
619.ft C
struct bpf_tunnel_key key;
/* populate key ... */
bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
624.ft P
625.fi
626.UNINDENT
627.UNINDENT
628.sp
629See also the description of the \fBbpf_skb_get_tunnel_key\fP()
630helper for additional information.
631.TP
632.B Return
6330 on success, or a negative error in case of failure.
634.UNINDENT
635.TP
636.B \fBu64 bpf_perf_event_read(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP
637.INDENT 7.0
638.TP
639.B Description
640Read the value of a perf event counter. This helper relies on a
641\fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of
642the perf event counter is selected when \fImap\fP is updated with
643perf event file descriptors. The \fImap\fP is an array whose size
644is the number of available CPUs, and each cell contains a value
645relative to one CPU. The value to retrieve is indicated by
646\fIflags\fP, that contains the index of the CPU to look up, masked
647with \fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to
648\fBBPF_F_CURRENT_CPU\fP to indicate that the value for the
649current CPU should be retrieved.
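.sp
As a sketch of how \fIflags\fP is formed: the CPU index occupies the low 32 bits, and the all\-ones index is reserved to mean "current CPU". The macro values below are those found in include/uapi/linux/bpf.h, while the function name flags_for_cpu is hypothetical.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/bpf.h. */
#define BPF_F_INDEX_MASK  0xffffffffULL
#define BPF_F_CURRENT_CPU BPF_F_INDEX_MASK

/* Hypothetical helper: flags selecting the map slot of a given CPU. */
static uint64_t flags_for_cpu(uint32_t cpu)
{
	return (uint64_t)cpu & BPF_F_INDEX_MASK;
}
```

Because \fBBPF_F_CURRENT_CPU\fP equals the mask itself, an explicit index of 0xffffffff cannot be told apart from a request for the current CPU.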
650.sp
Note that before Linux 4.13, only hardware perf events can be
retrieved.
653.sp
654Also, be aware that the newer helper
655\fBbpf_perf_event_read_value\fP() is recommended over
656\fBbpf_perf_event_read\fP() in general. The latter has some ABI
657quirks where error and counter value are used as a return code
658(which is wrong to do since ranges may overlap). This issue is
659fixed with \fBbpf_perf_event_read_value\fP(), which at the same
660time provides more features over the \fBbpf_perf_event_read\fP() interface. Please refer to the description of
661\fBbpf_perf_event_read_value\fP() for details.
662.TP
663.B Return
664The value of the perf event counter read from the map, or a
665negative error code in case of failure.
666.UNINDENT
667.TP
668.B \fBint bpf_redirect(u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP
669.INDENT 7.0
670.TP
671.B Description
672Redirect the packet to another net device of index \fIifindex\fP\&.
673This helper is somewhat similar to \fBbpf_clone_redirect\fP(), except that the packet is not cloned, which provides
674increased performance.
675.sp
676Except for XDP, both ingress and egress interfaces can be used
677for redirection. The \fBBPF_F_INGRESS\fP value in \fIflags\fP is used
678to make the distinction (ingress path is selected if the flag
679is present, egress path otherwise). Currently, XDP only
680supports redirection to the egress interface, and accepts no
681flag at all.
682.sp
683The same effect can be attained with the more generic
684\fBbpf_redirect_map\fP(), which requires specific maps to be
685used but offers better performance.
686.TP
687.B Return
688For XDP, the helper returns \fBXDP_REDIRECT\fP on success or
689\fBXDP_ABORTED\fP on error. For other program types, the values
690are \fBTC_ACT_REDIRECT\fP on success or \fBTC_ACT_SHOT\fP on
691error.
692.UNINDENT
693.TP
694.B \fBu32 bpf_get_route_realm(struct sk_buff *\fP\fIskb\fP\fB)\fP
695.INDENT 7.0
696.TP
697.B Description
Retrieve the realm of the route, that is to say the
\fBtclassid\fP field of the destination for the \fIskb\fP\&. The
identifier retrieved is a user\-provided tag, similar to the
one used with the net_cls cgroup (see description for
\fBbpf_get_cgroup_classid\fP() helper), but here this tag is
held by a route (a destination entry), not by a task.
704.sp
705Retrieving this identifier works with the clsact TC egress hook
706(see also \fBtc\-bpf(8)\fP), or alternatively on conventional
707classful egress qdiscs, but not on TC ingress path. In case of
708clsact TC egress hook, this has the advantage that, internally,
709the destination entry has not been dropped yet in the transmit
710path. Therefore, the destination entry does not need to be
711artificially held via \fBnetif_keep_dst\fP() for a classful
712qdisc until the \fIskb\fP is freed.
713.sp
714This helper is available only if the kernel was compiled with
715\fBCONFIG_IP_ROUTE_CLASSID\fP configuration option.
716.TP
717.B Return
718The realm of the route for the packet associated to \fIskb\fP, or 0
719if none was found.
720.UNINDENT
721.TP
.B \fBint bpf_perf_event_output(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, void *\fP\fIdata\fP\fB, u64\fP \fIsize\fP\fB)\fP
723.INDENT 7.0
724.TP
725.B Description
726Write raw \fIdata\fP blob into a special BPF perf event held by
727\fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. This perf
728event must have the following attributes: \fBPERF_SAMPLE_RAW\fP
729as \fBsample_type\fP, \fBPERF_TYPE_SOFTWARE\fP as \fBtype\fP, and
730\fBPERF_COUNT_SW_BPF_OUTPUT\fP as \fBconfig\fP\&.
731.sp
732The \fIflags\fP are used to indicate the index in \fImap\fP for which
733the value must be put, masked with \fBBPF_F_INDEX_MASK\fP\&.
734Alternatively, \fIflags\fP can be set to \fBBPF_F_CURRENT_CPU\fP
735to indicate that the index of the current CPU core should be
736used.
737.sp
The value to write, of \fIsize\fP, is passed through the eBPF stack and
pointed to by \fIdata\fP\&.
.sp
The context of the program, \fIctx\fP, must also be passed to the
helper.
.sp
In user space, a program willing to read the values needs to
call \fBperf_event_open\fP() on the perf event (either for
one or for all CPUs) and to store the file descriptor into the
\fImap\fP\&. This must be done before the eBPF program can send data
748into it. An example is available in file
749\fIsamples/bpf/trace_output_user.c\fP in the Linux kernel source
750tree (the eBPF program counterpart is in
751\fIsamples/bpf/trace_output_kern.c\fP).
752.sp
\fBbpf_perf_event_output\fP() achieves better performance
than \fBbpf_trace_printk\fP() for sharing data with user
space, and is much better suited for streaming data from eBPF
programs.
757.sp
758Note that this helper is not restricted to tracing use cases
759and can be used with programs attached to TC or XDP as well,
760where it allows for passing data to user space listeners. Data
761can be:
762.INDENT 7.0
763.IP \(bu 2
764Only custom structs,
765.IP \(bu 2
766Only the packet payload, or
767.IP \(bu 2
768A combination of both.
769.UNINDENT
770.TP
771.B Return
7720 on success, or a negative error in case of failure.
773.UNINDENT
774.TP
775.B \fBint bpf_skb_load_bytes(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB)\fP
776.INDENT 7.0
777.TP
778.B Description
779This helper was provided as an easy way to load data from a
780packet. It can be used to load \fIlen\fP bytes from \fIoffset\fP from
the packet associated to \fIskb\fP, into the buffer pointed to by
\fIto\fP\&.
783.sp
784Since Linux 4.7, usage of this helper has mostly been replaced
785by "direct packet access", enabling packet data to be
786manipulated with \fIskb\fP\fB\->data\fP and \fIskb\fP\fB\->data_end\fP
787pointing respectively to the first byte of packet data and to
788the byte after the last byte of packet data. However, it
789remains useful if one wishes to read large quantities of data
790at once from a packet into the eBPF stack.
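The two access styles described above can be contrasted with a small user\-space model. This is only a sketch under stated assumptions: the type struct pkt_model and the functions load_bytes_model() and direct_read_u8() are hypothetical stand\-ins, not kernel APIs, and the error values are chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the packet view seen by an eBPF program. */
struct pkt_model {
    const uint8_t *data;      /* first byte of packet data */
    const uint8_t *data_end;  /* byte after the last byte of packet data */
};

/* Helper style: copy len bytes found at offset into the buffer to. */
static int load_bytes_model(const struct pkt_model *p, uint32_t offset,
                            void *to, uint32_t len)
{
    size_t avail = (size_t)(p->data_end - p->data);

    if (offset > avail || len > avail - offset)
        return -EFAULT;  /* assumed error value for this model */
    memcpy(to, p->data + offset, len);
    return 0;
}

/* Direct access style: test against data_end, then dereference. */
static int direct_read_u8(const struct pkt_model *p, uint32_t offset,
                          uint8_t *out)
{
    size_t avail = (size_t)(p->data_end - p->data);

    /* Same test as "data + offset + 1 > data_end", written without
     * forming an out-of-range pointer. */
    if (offset >= avail)
        return -1;
    *out = p->data[offset];
    return 0;
}
```

In this model the copy variant suits reading a larger region in one call, while the direct variant needs one boundary test per access.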
791.TP
792.B Return
7930 on success, or a negative error in case of failure.
794.UNINDENT
795.TP
.B \fBint bpf_get_stackid(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP
797.INDENT 7.0
798.TP
799.B Description
800Walk a user or a kernel stack and return its id. To achieve
801this, the helper needs \fIctx\fP, which is a pointer to the context
802on which the tracing program is executed, and a pointer to a
803\fImap\fP of type \fBBPF_MAP_TYPE_STACK_TRACE\fP\&.
804.sp
805The last argument, \fIflags\fP, holds the number of stack frames to
806skip (from 0 to 255), masked with
807\fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set
808a combination of the following flags:
809.INDENT 7.0
810.TP
811.B \fBBPF_F_USER_STACK\fP
812Collect a user space stack instead of a kernel stack.
813.TP
814.B \fBBPF_F_FAST_STACK_CMP\fP
815Compare stacks by hash only.
816.TP
817.B \fBBPF_F_REUSE_STACKID\fP
818If two different stacks hash into the same \fIstackid\fP,
819discard the old one.
820.UNINDENT
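The layout of the flags argument described above (skip count in the low 8 bits, option bits above it) can be sketched as follows. The constants are redefined with a MODEL_ prefix so that the sketch builds standalone; their values are meant to mirror those in include/uapi/linux/bpf.h. The function stackid_flags() is a hypothetical convenience, not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Values intended to mirror include/uapi/linux/bpf.h. */
#define MODEL_BPF_F_SKIP_FIELD_MASK 0xffULL
#define MODEL_BPF_F_USER_STACK      (1ULL << 8)
#define MODEL_BPF_F_FAST_STACK_CMP  (1ULL << 9)
#define MODEL_BPF_F_REUSE_STACKID   (1ULL << 10)

/* Hypothetical helper: combine a frame skip count (0 to 255) with
 * a set of option bits into a single flags value. */
static uint64_t stackid_flags(unsigned int skip, uint64_t options)
{
    assert(skip <= 255);
    return ((uint64_t)skip & MODEL_BPF_F_SKIP_FIELD_MASK) | options;
}
```

For instance, skipping two frames of a user space stack would be expressed as stackid_flags(2, MODEL_BPF_F_USER_STACK) in this model.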
821.sp
The stack id retrieved is a 32\-bit integer handle which
823can be further combined with other data (including other stack
824ids) and used as a key into maps. This can be useful for
825generating a variety of graphs (such as flame graphs or off\-cpu
826graphs).
827.sp
828For walking a stack, this helper is an improvement over
829\fBbpf_probe_read\fP(), which can be used with unrolled loops
830but is not efficient and consumes a lot of eBPF instructions.
831Instead, \fBbpf_get_stackid\fP() can collect up to
832\fBPERF_MAX_STACK_DEPTH\fP both kernel and user frames. Note that
833this limit can be controlled with the \fBsysctl\fP program, and
834that it should be manually increased in order to profile long
835user stacks (such as stacks for Java programs). To do so, use:
836.INDENT 7.0
837.INDENT 3.5
838.sp
839.nf
840.ft C
841# sysctl kernel.perf_event_max_stack=<new value>
842.ft P
843.fi
844.UNINDENT
845.UNINDENT
846.TP
847.B Return
848The positive or null stack id on success, or a negative error
849in case of failure.
850.UNINDENT
851.TP
852.B \fBs64 bpf_csum_diff(__be32 *\fP\fIfrom\fP\fB, u32\fP \fIfrom_size\fP\fB, __be32 *\fP\fIto\fP\fB, u32\fP \fIto_size\fP\fB, __wsum\fP \fIseed\fP\fB)\fP
853.INDENT 7.0
854.TP
855.B Description
856Compute a checksum difference, from the raw buffer pointed by
857\fIfrom\fP, of length \fIfrom_size\fP (that must be a multiple of 4),
858towards the raw buffer pointed by \fIto\fP, of size \fIto_size\fP
859(same remark). An optional \fIseed\fP can be added to the value
860(this can be cascaded, the seed may come from a previous call
861to the helper).
862.sp
863This is flexible enough to be used in several ways:
864.INDENT 7.0
865.IP \(bu 2
866With \fIfrom_size\fP == 0, \fIto_size\fP > 0 and \fIseed\fP set to
867checksum, it can be used when pushing new data.
868.IP \(bu 2
869With \fIfrom_size\fP > 0, \fIto_size\fP == 0 and \fIseed\fP set to
870checksum, it can be used when removing data from a packet.
871.IP \(bu 2
872With \fIfrom_size\fP > 0, \fIto_size\fP > 0 and \fIseed\fP set to 0, it
873can be used to compute a diff. Note that \fIfrom_size\fP and
874\fIto_size\fP do not need to be equal.
875.UNINDENT
876.sp
877This helper can be used in combination with
878\fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(), to
879which one can feed in the difference computed with
880\fBbpf_csum_diff\fP().
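The arithmetic behind a checksum difference can be illustrated with a user\-space model. This is a sketch, not the kernel implementation: it assumes a 32\-bit ones' complement accumulator in which the words of the old buffer are added in inverted form and the words of the new buffer are added as they are. All function names are hypothetical, sizes are given in 32\-bit words rather than bytes, and byte order handling is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ones' complement addition with end-around carry. */
static uint32_t csum_add32(uint32_t a, uint32_t b)
{
    uint64_t s = (uint64_t)a + b;

    return (uint32_t)(s + (s >> 32));
}

/* Model of a checksum difference: subtract the old words (by adding
 * their complement), add the new words, starting from seed. */
static uint32_t csum_diff_model(const uint32_t *from, size_t from_words,
                                const uint32_t *to, size_t to_words,
                                uint32_t seed)
{
    uint32_t acc = seed;
    size_t i;

    for (i = 0; i < from_words; i++)
        acc = csum_add32(acc, ~from[i]);
    for (i = 0; i < to_words; i++)
        acc = csum_add32(acc, to[i]);
    return acc;
}

/* Fold a 32-bit accumulator down to 16 bits. */
static uint16_t csum_fold16(uint32_t x)
{
    x = (x & 0xffff) + (x >> 16);
    x = (x & 0xffff) + (x >> 16);
    return (uint16_t)x;
}
```

With an empty old buffer the model simply sums the new data onto the seed (the "pushing new data" case). With both buffers present and a zero seed it yields a difference that, once added to the old sum, folds to the same 16\-bit value as a sum recomputed from scratch.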
881.TP
882.B Return
883The checksum result, or a negative error code in case of
884failure.
885.UNINDENT
886.TP
887.B \fBint bpf_skb_get_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP
888.INDENT 7.0
889.TP
890.B Description
891Retrieve tunnel options metadata for the packet associated to
892\fIskb\fP, and store the raw tunnel option data to the buffer \fIopt\fP
893of \fIsize\fP\&.
894.sp
895This helper can be used with encapsulation devices that can
896operate in "collect metadata" mode (please refer to the related
897note in the description of \fBbpf_skb_get_tunnel_key\fP() for
more details). A particular example where this can be used is
in combination with the Geneve encapsulation protocol, where it
allows for pushing (with the \fBbpf_skb_set_tunnel_opt\fP() helper)
and retrieving arbitrary TLVs (Type\-Length\-Value headers) from
902the eBPF program. This allows for full customization of these
903headers.
904.TP
905.B Return
906The size of the option data retrieved.
907.UNINDENT
908.TP
909.B \fBint bpf_skb_set_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP
910.INDENT 7.0
911.TP
912.B Description
913Set tunnel options metadata for the packet associated to \fIskb\fP
914to the option data contained in the raw buffer \fIopt\fP of \fIsize\fP\&.
915.sp
916See also the description of the \fBbpf_skb_get_tunnel_opt\fP()
917helper for additional information.
918.TP
919.B Return
9200 on success, or a negative error in case of failure.
921.UNINDENT
922.TP
923.B \fBint bpf_skb_change_proto(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIproto\fP\fB, u64\fP \fIflags\fP\fB)\fP
924.INDENT 7.0
925.TP
926.B Description
927Change the protocol of the \fIskb\fP to \fIproto\fP\&. Currently
928supported are transition from IPv4 to IPv6, and from IPv6 to
929IPv4. The helper takes care of the groundwork for the
930transition, including resizing the socket buffer. The eBPF
program is expected to fill the new headers, if any, via
\fBbpf_skb_store_bytes\fP() and to recompute the checksums with
\fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(). The
main use case for this helper is to perform NAT64 operations out of
an eBPF program.
935.sp
936Internally, the GSO type is marked as dodgy so that headers are
937checked and segments are recalculated by the GSO/GRO engine.
938The size for GSO target is adapted as well.
939.sp
940All values for \fIflags\fP are reserved for future usage, and must
941be left at zero.
942.sp
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again, if the
helper is used in combination with direct packet access.
948.TP
949.B Return
9500 on success, or a negative error in case of failure.
951.UNINDENT
952.TP
953.B \fBint bpf_skb_change_type(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB)\fP
954.INDENT 7.0
955.TP
956.B Description
Change the packet type for the packet associated to \fIskb\fP\&. This
comes down to setting \fIskb\fP\fB\->pkt_type\fP to \fItype\fP, except
that the eBPF program has no write access to
\fIskb\fP\fB\->pkt_type\fP other than through this helper. Using a
helper here allows for graceful handling of errors.
.sp
The major use case is to change incoming \fIskb\fPs to
\fBPACKET_HOST\fP in a programmatic way instead of having to
recirculate via \fBbpf_redirect\fP(..., \fBBPF_F_INGRESS\fP), for
example.
966.sp
967Note that \fItype\fP only allows certain values. At this time, they
968are:
969.INDENT 7.0
970.TP
971.B \fBPACKET_HOST\fP
972Packet is for us.
973.TP
974.B \fBPACKET_BROADCAST\fP
975Send packet to all.
976.TP
977.B \fBPACKET_MULTICAST\fP
978Send packet to group.
979.TP
980.B \fBPACKET_OTHERHOST\fP
981Send packet to someone else.
982.UNINDENT
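The restriction on the accepted values can be sketched with a user\-space model. The constants are redefined with a MODEL_ prefix and are meant to mirror the values in include/uapi/linux/if_packet.h. The type struct skb_model, the function change_type_model() and the \-EINVAL error value are assumptions made for illustration only.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values intended to mirror include/uapi/linux/if_packet.h. */
#define MODEL_PACKET_HOST      0u
#define MODEL_PACKET_BROADCAST 1u
#define MODEL_PACKET_MULTICAST 2u
#define MODEL_PACKET_OTHERHOST 3u

/* Hypothetical stand-in for the socket buffer. */
struct skb_model {
    uint32_t pkt_type;
};

/* Accept only the four packet types listed above. */
static int change_type_model(struct skb_model *skb, uint32_t type)
{
    if (type > MODEL_PACKET_OTHERHOST)
        return -EINVAL;  /* assumed error value for this model */
    skb->pkt_type = type;
    return 0;
}
```

A rejected request leaves the packet type untouched, which is the kind of graceful error handling a helper makes possible.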
983.TP
984.B Return
9850 on success, or a negative error in case of failure.
986.UNINDENT
987.TP
988.B \fBint bpf_skb_under_cgroup(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP
989.INDENT 7.0
990.TP
991.B Description
992Check whether \fIskb\fP is a descendant of the cgroup2 held by
993\fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&.
994.TP
995.B Return
996The return value depends on the result of the test, and can be:
997.INDENT 7.0
998.IP \(bu 2
9990, if the \fIskb\fP failed the cgroup2 descendant test.
1000.IP \(bu 2
1, if the \fIskb\fP passed the cgroup2 descendant test.
1002.IP \(bu 2
1003A negative error code, if an error occurred.
1004.UNINDENT
1005.UNINDENT
1006.TP
1007.B \fBu32 bpf_get_hash_recalc(struct sk_buff *\fP\fIskb\fP\fB)\fP
1008.INDENT 7.0
1009.TP
1010.B Description
1011Retrieve the hash of the packet, \fIskb\fP\fB\->hash\fP\&. If it is
1012not set, in particular if the hash was cleared due to mangling,
1013recompute this hash. Later accesses to the hash can be done
1014directly with \fIskb\fP\fB\->hash\fP\&.
1015.sp
Calling \fBbpf_set_hash_invalid\fP(), changing a packet's
protocol with \fBbpf_skb_change_proto\fP(), or calling
\fBbpf_skb_store_bytes\fP() with the
\fBBPF_F_INVALIDATE_HASH\fP flag are actions that may clear
the hash and trigger a new computation on the next call to
\fBbpf_get_hash_recalc\fP().
1022.TP
1023.B Return
1024The 32\-bit hash.
1025.UNINDENT
1026.TP
1027.B \fBu64 bpf_get_current_task(void)\fP
1028.INDENT 7.0
1029.TP
1030.B Return
1031A pointer to the current task struct.
1032.UNINDENT
1033.TP
1034.B \fBint bpf_probe_write_user(void *\fP\fIdst\fP\fB, const void *\fP\fIsrc\fP\fB, u32\fP \fIlen\fP\fB)\fP
1035.INDENT 7.0
1036.TP
1037.B Description
1038Attempt in a safe way to write \fIlen\fP bytes from the buffer
1039\fIsrc\fP to \fIdst\fP in memory. It only works for threads that are in
1040user context, and \fIdst\fP must be a valid user space address.
1041.sp
1042This helper should not be used to implement any kind of
1043security mechanism because of TOC\-TOU attacks, but rather to
1044debug, divert, and manipulate execution of semi\-cooperative
1045processes.
1046.sp
1047Keep in mind that this feature is meant for experiments, and it
1048has a risk of crashing the system and running programs.
1049Therefore, when an eBPF program using this helper is attached,
1050a warning including PID and process name is printed to kernel
1051logs.
1052.TP
1053.B Return
10540 on success, or a negative error in case of failure.
1055.UNINDENT
1056.TP
1057.B \fBint bpf_current_task_under_cgroup(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP
1058.INDENT 7.0
1059.TP
1060.B Description
Check whether the probe is being run in the context of a given
1062subset of the cgroup2 hierarchy. The cgroup2 to test is held by
1063\fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&.
1064.TP
1065.B Return
1066The return value depends on the result of the test, and can be:
1067.INDENT 7.0
1068.IP \(bu 2
0, if the current task does not belong to the cgroup2.
.IP \(bu 2
1, if the current task belongs to the cgroup2.
1072.IP \(bu 2
1073A negative error code, if an error occurred.
1074.UNINDENT
1075.UNINDENT
1076.TP
1077.B \fBint bpf_skb_change_tail(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP
1078.INDENT 7.0
1079.TP
1080.B Description
1081Resize (trim or grow) the packet associated to \fIskb\fP to the
1082new \fIlen\fP\&. The \fIflags\fP are reserved for future usage, and must
1083be left at zero.
1084.sp
1085The basic idea is that the helper performs the needed work to
1086change the size of the packet, then the eBPF program rewrites
the rest via helpers like \fBbpf_skb_store_bytes\fP(),
\fBbpf_l3_csum_replace\fP(), \fBbpf_l4_csum_replace\fP()
1089and others. This helper is a slow path utility intended for
1090replies with control messages. And because it is targeted for
1091slow path, the helper itself can afford to be slow: it
1092implicitly linearizes, unclones and drops offloads from the
1093\fIskb\fP\&.
1094.sp
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again, if the
helper is used in combination with direct packet access.
1100.TP
1101.B Return
11020 on success, or a negative error in case of failure.
1103.UNINDENT
1104.TP
1105.B \fBint bpf_skb_pull_data(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB)\fP
1106.INDENT 7.0
1107.TP
1108.B Description
1109Pull in non\-linear data in case the \fIskb\fP is non\-linear and not
1110all of \fIlen\fP are part of the linear section. Make \fIlen\fP bytes
1111from \fIskb\fP readable and writable. If a zero value is passed for
1112\fIlen\fP, then the whole length of the \fIskb\fP is pulled.
1113.sp
1114This helper is only needed for reading and writing with direct
1115packet access.
1116.sp
For direct packet access, testing that offsets to access
are within packet boundaries (test on \fIskb\fP\fB\->data_end\fP)
may fail if offsets are invalid, or if the requested
data is in non\-linear parts of the \fIskb\fP\&. On failure the
program can just bail out, or in the case of a non\-linear
buffer, use a helper to make the data available. The
\fBbpf_skb_load_bytes\fP() helper is a first solution to access
the data. Another one consists in using \fBbpf_skb_pull_data\fP()
to pull in the non\-linear parts once, then retesting and
finally accessing the data.
1127.sp
1128At the same time, this also makes sure the \fIskb\fP is uncloned,
1129which is a necessary condition for direct write. As this needs
1130to be an invariant for the write part only, the verifier
1131detects writes and adds a prologue that is calling
1132\fBbpf_skb_pull_data()\fP to effectively unclone the \fIskb\fP from
1133the very beginning in case it is indeed cloned.
1134.sp
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again, if the
helper is used in combination with direct packet access.
1140.TP
1141.B Return
11420 on success, or a negative error in case of failure.
1143.UNINDENT
1144.TP
1145.B \fBs64 bpf_csum_update(struct sk_buff *\fP\fIskb\fP\fB, __wsum\fP \fIcsum\fP\fB)\fP
1146.INDENT 7.0
1147.TP
1148.B Description
1149Add the checksum \fIcsum\fP into \fIskb\fP\fB\->csum\fP in case the
1150driver has supplied a checksum for the entire packet into that
1151field. Return an error otherwise. This helper is intended to be
1152used in combination with \fBbpf_csum_diff\fP(), in particular
1153when the checksum needs to be updated after data has been
1154written into the packet through direct packet access.
1155.TP
1156.B Return
1157The checksum on success, or a negative error code in case of
1158failure.
1159.UNINDENT
1160.TP
1161.B \fBvoid bpf_set_hash_invalid(struct sk_buff *\fP\fIskb\fP\fB)\fP
1162.INDENT 7.0
1163.TP
1164.B Description
1165Invalidate the current \fIskb\fP\fB\->hash\fP\&. It can be used after
1166mangling on headers through direct packet access, in order to
1167indicate that the hash is outdated and to trigger a
1168recalculation the next time the kernel tries to access this
1169hash or when the \fBbpf_get_hash_recalc\fP() helper is called.
1170.UNINDENT
1171.TP
1172.B \fBint bpf_get_numa_node_id(void)\fP
1173.INDENT 7.0
1174.TP
1175.B Description
1176Return the id of the current NUMA node. The primary use case
1177for this helper is the selection of sockets for the local NUMA
1178node, when the program is attached to sockets using the
1179\fBSO_ATTACH_REUSEPORT_EBPF\fP option (see also \fBsocket(7)\fP),
1180but the helper is also available to other eBPF program types,
1181similarly to \fBbpf_get_smp_processor_id\fP().
1182.TP
1183.B Return
1184The id of current NUMA node.
1185.UNINDENT
1186.TP
1187.B \fBint bpf_skb_change_head(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP
1188.INDENT 7.0
1189.TP
1190.B Description
1191Grows headroom of packet associated to \fIskb\fP and adjusts the
1192offset of the MAC header accordingly, adding \fIlen\fP bytes of
1193space. It automatically extends and reallocates memory as
1194required.
1195.sp
1196This helper can be used on a layer 3 \fIskb\fP to push a MAC header
1197for redirection into a layer 2 device.
1198.sp
1199All values for \fIflags\fP are reserved for future usage, and must
1200be left at zero.
1201.sp
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again, if the
helper is used in combination with direct packet access.
1207.TP
1208.B Return
12090 on success, or a negative error in case of failure.
1210.UNINDENT
1211.TP
1212.B \fBint bpf_xdp_adjust_head(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP
1213.INDENT 7.0
1214.TP
1215.B Description
1216Adjust (move) \fIxdp_md\fP\fB\->data\fP by \fIdelta\fP bytes. Note that
1217it is possible to use a negative value for \fIdelta\fP\&. This helper
1218can be used to prepare the packet for pushing or popping
1219headers.
1220.sp
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again, if the
helper is used in combination with direct packet access.
1226.TP
1227.B Return
12280 on success, or a negative error in case of failure.
1229.UNINDENT
1230.TP
1231.B \fBint bpf_probe_read_str(void *\fP\fIdst\fP\fB, int\fP \fIsize\fP\fB, const void *\fP\fIunsafe_ptr\fP\fB)\fP
1232.INDENT 7.0
1233.TP
1234.B Description
1235Copy a NUL terminated string from an unsafe address
1236\fIunsafe_ptr\fP to \fIdst\fP\&. The \fIsize\fP should include the
1237terminating NUL byte. In case the string length is smaller than
1238\fIsize\fP, the target is not padded with further NUL bytes. If the
1239string length is larger than \fIsize\fP, just \fIsize\fP\-1 bytes are
1240copied and the last byte is set to NUL.
1241.sp
1242On success, the length of the copied string is returned. This
1243makes this helper useful in tracing programs for reading
1244strings, and more importantly to get its length at runtime. See
1245the following snippet:
1246.INDENT 7.0
1247.INDENT 3.5
1248.sp
1249.nf
1250.ft C
SEC("kprobe/sys_open")
void bpf_sys_open(struct pt_regs *ctx)
{
        char buf[PATHLEN]; // PATHLEN is defined to 256
        int res = bpf_probe_read_str(buf, sizeof(buf),
                                     ctx\->di);

        // Consume buf, for example push it to
        // userspace via bpf_perf_event_output(); we
        // can use res (the string length) as event
        // size, after checking its boundaries.
}
1261.ft P
1262.fi
1263.UNINDENT
1264.UNINDENT
1265.sp
In comparison, using the \fBbpf_probe_read()\fP helper here instead
to read the string would require estimating the length at
compile time, and would often result in copying more memory
than necessary.
1270.sp
1271Another useful use case is when parsing individual process
1272arguments or individual environment variables navigating
1273\fIcurrent\fP\fB\->mm\->arg_start\fP and \fIcurrent\fP\fB\->mm\->env_start\fP: using this helper and the return value,
1274one can quickly iterate at the right offset of the memory area.
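The copy and return value semantics described above, and the way the return value helps to step through a packed area of NUL terminated strings, can be modeled in user space. This is a sketch only: read_str_model() and count_strings() are hypothetical names, ordinary memory stands in for the unsafe address, and every string is assumed to fit into the intermediate buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Copy at most size bytes, terminating NUL included, and return the
 * number of bytes written (string length plus the NUL). */
static int read_str_model(char *dst, int size, const char *src)
{
    int i;

    assert(size > 0);
    for (i = 0; i < size - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0';  /* no padding beyond this single NUL */
    return i + 1;
}

/* Walk an area holding strings laid out back to back, as in
 * "a\0bc\0", using each return value as the next offset. */
static int count_strings(const char *area, size_t area_len)
{
    char buf[16];
    size_t off = 0;
    int n = 0;

    while (off < area_len) {
        int res = read_str_model(buf, (int)sizeof(buf), area + off);

        off += (size_t)res;  /* valid only if the string fit in buf */
        n++;
    }
    return n;
}
```

When a string is truncated, the model returns size, so a caller comparing the result with the buffer size can detect that the source string was possibly longer.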
1275.TP
1276.B Return
1277On success, the strictly positive length of the string,
1278including the trailing NUL character. On error, a negative
1279value.
1280.UNINDENT
1281.TP
1282.B \fBu64 bpf_get_socket_cookie(struct sk_buff *\fP\fIskb\fP\fB)\fP
1283.INDENT 7.0
1284.TP
1285.B Description
1286If the \fBstruct sk_buff\fP pointed by \fIskb\fP has a known socket,
1287retrieve the cookie (generated by the kernel) of this socket.
1288If no cookie has been set yet, generate a new cookie. Once
1289generated, the socket cookie remains stable for the life of the
1290socket. This helper can be useful for monitoring per socket
1291networking traffic statistics as it provides a unique socket
1292identifier per namespace.
1293.TP
1294.B Return
An 8\-byte long non\-decreasing number on success, or 0 if the
1296socket field is missing inside \fIskb\fP\&.
1297.UNINDENT
1298.TP
1299.B \fBu64 bpf_get_socket_cookie(struct bpf_sock_addr *\fP\fIctx\fP\fB)\fP
1300.INDENT 7.0
1301.TP
1302.B Description
Equivalent to the \fBbpf_get_socket_cookie\fP() helper that accepts
\fIskb\fP, but gets the socket from the \fBstruct bpf_sock_addr\fP context.
.TP
.B Return
An 8\-byte long non\-decreasing number.
1308.UNINDENT
1309.TP
1310.B \fBu64 bpf_get_socket_cookie(struct bpf_sock_ops *\fP\fIctx\fP\fB)\fP
1311.INDENT 7.0
1312.TP
1313.B Description
Equivalent to the \fBbpf_get_socket_cookie\fP() helper that accepts
\fIskb\fP, but gets the socket from the \fBstruct bpf_sock_ops\fP context.
.TP
.B Return
An 8\-byte long non\-decreasing number.
1319.UNINDENT
1320.TP
1321.B \fBu32 bpf_get_socket_uid(struct sk_buff *\fP\fIskb\fP\fB)\fP
1322.INDENT 7.0
1323.TP
1324.B Return
1325The owner UID of the socket associated to \fIskb\fP\&. If the socket
1326is \fBNULL\fP, or if it is not a full socket (i.e. if it is a
1327time\-wait or a request socket instead), \fBoverflowuid\fP value
1328is returned (note that \fBoverflowuid\fP might also be the actual
1329UID value for the socket).
1330.UNINDENT
1331.TP
1332.B \fBu32 bpf_set_hash(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIhash\fP\fB)\fP
1333.INDENT 7.0
1334.TP
1335.B Description
1336Set the full hash for \fIskb\fP (set the field \fIskb\fP\fB\->hash\fP)
1337to value \fIhash\fP\&.
1338.TP
1339.B Return
0
1340.UNINDENT
1341.TP
1342.B \fBint bpf_setsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP
1343.INDENT 7.0
1344.TP
1345.B Description
1346Emulate a call to \fBsetsockopt()\fP on the socket associated to
1347\fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at
1348which the option resides and the name \fIoptname\fP of the option
1349must be specified, see \fBsetsockopt(2)\fP for more information.
1350The option value of length \fIoptlen\fP is pointed by \fIoptval\fP\&.
1351.sp
1352This helper actually implements a subset of \fBsetsockopt()\fP\&.
1353It supports the following \fIlevel\fPs:
1354.INDENT 7.0
1355.IP \(bu 2
1356\fBSOL_SOCKET\fP, which supports the following \fIoptname\fPs:
1357\fBSO_RCVBUF\fP, \fBSO_SNDBUF\fP, \fBSO_MAX_PACING_RATE\fP,
1358\fBSO_PRIORITY\fP, \fBSO_RCVLOWAT\fP, \fBSO_MARK\fP\&.
1359.IP \(bu 2
1360\fBIPPROTO_TCP\fP, which supports the following \fIoptname\fPs:
1361\fBTCP_CONGESTION\fP, \fBTCP_BPF_IW\fP,
1362\fBTCP_BPF_SNDCWND_CLAMP\fP\&.
1363.IP \(bu 2
1364\fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&.
1365.IP \(bu 2
1366\fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&.
1367.UNINDENT
1368.TP
1369.B Return
13700 on success, or a negative error in case of failure.
1371.UNINDENT
1372.TP
.B \fBint bpf_skb_adjust_room(struct sk_buff *\fP\fIskb\fP\fB, s32\fP \fIlen_diff\fP\fB, u32\fP \fImode\fP\fB, u64\fP \fIflags\fP\fB)\fP
1374.INDENT 7.0
1375.TP
1376.B Description
1377Grow or shrink the room for data in the packet associated to
1378\fIskb\fP by \fIlen_diff\fP, and according to the selected \fImode\fP\&.
1379.sp
1380There is a single supported mode at this time:
1381.INDENT 7.0
1382.IP \(bu 2
1383\fBBPF_ADJ_ROOM_NET\fP: Adjust room at the network layer
1384(room space is added or removed below the layer 3 header).
1385.UNINDENT
1386.sp
1387All values for \fIflags\fP are reserved for future usage, and must
1388be left at zero.
1389.sp
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again, if the
helper is used in combination with direct packet access.
1395.TP
1396.B Return
13970 on success, or a negative error in case of failure.
1398.UNINDENT
1399.TP
1400.B \fBint bpf_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
1401.INDENT 7.0
1402.TP
1403.B Description
1404Redirect the packet to the endpoint referenced by \fImap\fP at
1405index \fIkey\fP\&. Depending on its type, this \fImap\fP can contain
1406references to net devices (for forwarding packets through other
1407ports), or to CPUs (for redirecting XDP frames to another CPU;
1408but this is only implemented for native XDP (with driver
1409support) as of this writing).
1410.sp
1411All values for \fIflags\fP are reserved for future usage, and must
1412be left at zero.
1413.sp
1414When used to redirect packets to net devices, this helper
1415provides a high performance increase over \fBbpf_redirect\fP().
1416This is due to various implementation details of the underlying
mechanisms, one of which is the fact that
\fBbpf_redirect_map\fP() tries to send packets in bulk to the device.
1418.TP
1419.B Return
1420\fBXDP_REDIRECT\fP on success, or \fBXDP_ABORTED\fP on error.
1421.UNINDENT
1422.TP
1423.B \fBint bpf_sk_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
1424.INDENT 7.0
1425.TP
1426.B Description
1427Redirect the packet to the socket referenced by \fImap\fP (of type
1428\fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and
1429egress interfaces can be used for redirection. The
1430\fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the
1431distinction (ingress path is selected if the flag is present,
1432egress path otherwise). This is the only flag supported for now.
1433.TP
1434.B Return
1435\fBSK_PASS\fP on success, or \fBSK_DROP\fP on error.
1436.UNINDENT
1437.TP
1438.B \fBint bpf_sock_map_update(struct bpf_sock_ops *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
1439.INDENT 7.0
1440.TP
1441.B Description
1442Add an entry to, or update a \fImap\fP referencing sockets. The
1443\fIskops\fP is used as a new value for the entry associated to
1444\fIkey\fP\&. \fIflags\fP is one of:
1445.INDENT 7.0
1446.TP
1447.B \fBBPF_NOEXIST\fP
1448The entry for \fIkey\fP must not exist in the map.
1449.TP
1450.B \fBBPF_EXIST\fP
1451The entry for \fIkey\fP must already exist in the map.
1452.TP
1453.B \fBBPF_ANY\fP
1454No condition on the existence of the entry for \fIkey\fP\&.
1455.UNINDENT
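The three update modes listed above can be modeled with a small fixed\-size table. This is a user\-space sketch under stated assumptions: the MODEL_ constants are meant to mirror the BPF_ANY, BPF_NOEXIST and BPF_EXIST values from include/uapi/linux/bpf.h, while struct sockmap_model, sockmap_update_model() and the error values returned are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values intended to mirror include/uapi/linux/bpf.h. */
#define MODEL_BPF_ANY     0ULL
#define MODEL_BPF_NOEXIST 1ULL
#define MODEL_BPF_EXIST   2ULL

#define MODEL_SLOTS 4u

/* Hypothetical stand-in for a map referencing sockets. */
struct sockmap_model {
    int used[MODEL_SLOTS];
    int sock[MODEL_SLOTS];
};

static int sockmap_update_model(struct sockmap_model *m, uint32_t key,
                                int sock, uint64_t flags)
{
    if (flags > MODEL_BPF_EXIST)
        return -EINVAL;  /* error values are assumptions of the model */
    if (key >= MODEL_SLOTS)
        return -E2BIG;
    if (flags == MODEL_BPF_NOEXIST && m->used[key])
        return -EEXIST;  /* entry must not exist yet */
    if (flags == MODEL_BPF_EXIST && !m->used[key])
        return -ENOENT;  /* entry must already exist */
    m->used[key] = 1;
    m->sock[key] = sock;
    return 0;
}
```

In this model MODEL_BPF_ANY behaves as "create or replace", while the other two modes turn an unexpected state of the entry into an error.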
1456.sp
1457If the \fImap\fP has eBPF programs (parser and verdict), those will
1458be inherited by the socket being added. If the socket is
1459already attached to eBPF programs, this results in an error.
1460.TP
1461.B Return
14620 on success, or a negative error in case of failure.
1463.UNINDENT
1464.TP
1465.B \fBint bpf_xdp_adjust_meta(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP
1466.INDENT 7.0
1467.TP
1468.B Description
1469Adjust the address pointed by \fIxdp_md\fP\fB\->data_meta\fP by
1470\fIdelta\fP (which can be positive or negative). Note that this
1471operation modifies the address stored in \fIxdp_md\fP\fB\->data\fP,
1472so the latter must be loaded only after the helper has been
1473called.
1474.sp
1475The use of \fIxdp_md\fP\fB\->data_meta\fP is optional and programs
1476are not required to use it. The rationale is that when the
1477packet is processed with XDP (e.g. as DoS filter), it is
1478possible to push further meta data along with it before passing
1479to the stack, and to give the guarantee that an ingress eBPF
1480program attached as a TC classifier on the same device can pick
1481this up for further post\-processing. Since TC works with socket
1482buffers, it remains possible to set from XDP the \fBmark\fP or
1483\fBpriority\fP pointers, or other pointers for the socket buffer.
1484Having this scratch space generic and programmable allows for
1485more flexibility as the user is free to store whatever meta
1486data they need.
1487.sp
A call to this helper may change the underlying packet buffer.
Therefore, at load time, all checks on pointers previously done by
the verifier are invalidated and must be performed again, if the
helper is used in combination with direct packet access.
1493.TP
1494.B Return
14950 on success, or a negative error in case of failure.
1496.UNINDENT
1497.TP
1498.B \fBint bpf_perf_event_read_value(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP
1499.INDENT 7.0
1500.TP
1501.B Description
1502Read the value of a perf event counter, and store it into \fIbuf\fP
1503of size \fIbuf_size\fP\&. This helper relies on a \fImap\fP of type
1504\fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of the perf event
1505counter is selected when \fImap\fP is updated with perf event file
1506descriptors. The \fImap\fP is an array whose size is the number of
1507available CPUs, and each cell contains a value relative to one
1508CPU. The value to retrieve is indicated by \fIflags\fP, that
1509contains the index of the CPU to look up, masked with
1510\fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to
1511\fBBPF_F_CURRENT_CPU\fP to indicate that the value for the
1512current CPU should be retrieved.
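How the cell index is derived from the flags can be sketched in user space. The MODEL_ constants are meant to mirror BPF_F_INDEX_MASK and BPF_F_CURRENT_CPU from include/uapi/linux/bpf.h. The function resolve_index(), its parameters and its error values are hypothetical and only model the selection logic described above.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values intended to mirror include/uapi/linux/bpf.h. */
#define MODEL_BPF_F_INDEX_MASK  0xffffffffULL
#define MODEL_BPF_F_CURRENT_CPU MODEL_BPF_F_INDEX_MASK

/* Return the array cell selected by flags, or a negative error. */
static int64_t resolve_index(uint64_t flags, uint32_t current_cpu,
                             uint32_t nr_cells)
{
    uint64_t index = flags & MODEL_BPF_F_INDEX_MASK;

    if (flags & ~MODEL_BPF_F_INDEX_MASK)
        return -EINVAL;  /* bits outside the index mask (assumed error) */
    if (index == MODEL_BPF_F_CURRENT_CPU)
        index = current_cpu;
    if (index >= nr_cells)
        return -E2BIG;   /* assumed error value */
    return (int64_t)index;
}
```

Because the "current CPU" marker is the all\-ones index, it can never collide with a real cell index in this model.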
1513.sp
1514This helper behaves in a way close to
1515\fBbpf_perf_event_read\fP() helper, save that instead of
1516just returning the value observed, it fills the \fIbuf\fP
1517structure. This allows for additional data to be retrieved: in
1518particular, the enabled and running times (in \fIbuf\fP\fB\->enabled\fP and \fIbuf\fP\fB\->running\fP, respectively) are
1519copied. In general, \fBbpf_perf_event_read_value\fP() is
1520recommended over \fBbpf_perf_event_read\fP(), which has some
1521ABI issues and provides fewer functionalities.
1522.sp
These values are interesting, because hardware PMU (Performance
Monitoring Unit) counters are limited resources. When there are
more PMU based perf events opened than available counters, the
kernel multiplexes these events so that each event gets a certain
percentage (but not all) of the PMU time. When multiplexing
happens, the number of samples or the counter value does not
match what would be observed without multiplexing. This makes
comparison between different runs difficult.
Typically, the counter value should be normalized before
comparing to other experiments. The usual normalization is done
as follows.
1534.INDENT 7.0
1535.INDENT 3.5
1536.sp
1537.nf
1538.ft C
1539normalized_counter = counter * t_enabled / t_running
1540.ft P
1541.fi
1542.UNINDENT
1543.UNINDENT
1544.sp
Where t_enabled is the time enabled for the event and t_running is
the time running for the event since the last normalization. The
enabled and running times are accumulated since the perf event
was opened. To obtain a scaling factor between two invocations of an
eBPF program, users can use the CPU id as the key (which is
typical for the perf array usage model) to remember the previous
value and do the calculation inside the eBPF program.
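The normalization between two readings can be worked through with a user\-space model. This is a sketch: struct perf_value_model only imitates the counter, enabled and running fields mentioned above, scaled_delta() is a hypothetical name, and floating point is used for brevity even though an eBPF program itself would need integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in holding one reading of a perf event. */
struct perf_value_model {
    uint64_t counter;
    uint64_t enabled;  /* accumulated time enabled */
    uint64_t running;  /* accumulated time running */
};

/* Scale the counter increase between two readings by the ratio of
 * enabled time to running time over the same interval. */
static double scaled_delta(const struct perf_value_model *prev,
                           const struct perf_value_model *cur)
{
    uint64_t d_counter = cur->counter - prev->counter;
    uint64_t d_enabled = cur->enabled - prev->enabled;
    uint64_t d_running = cur->running - prev->running;

    if (d_running == 0)
        return 0.0;  /* event never ran in the interval */
    return (double)d_counter * (double)d_enabled / (double)d_running;
}
```

For example, an event that counted 200 while running for half of the interval in which it was enabled is scaled to 400 by this model.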
1552.TP
1553.B Return
15540 on success, or a negative error in case of failure.
1555.UNINDENT
1556.TP
1557.B \fBint bpf_perf_prog_read_value(struct bpf_perf_event_data *\fP\fIctx\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP
1558.INDENT 7.0
1559.TP
1560.B Description
For an eBPF program attached to a perf event, retrieve the
1562value of the event counter associated to \fIctx\fP and store it in
1563the structure pointed by \fIbuf\fP and of size \fIbuf_size\fP\&. Enabled
1564and running times are also stored in the structure (see
1565description of helper \fBbpf_perf_event_read_value\fP() for
1566more details).
1567.TP
1568.B Return
15690 on success, or a negative error in case of failure.
1570.UNINDENT
1571.TP
1572.B \fBint bpf_getsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP
1573.INDENT 7.0
1574.TP
1575.B Description
1576Emulate a call to \fBgetsockopt()\fP on the socket associated to
1577\fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at
1578which the option resides and the name \fIoptname\fP of the option
1579must be specified, see \fBgetsockopt(2)\fP for more information.
1580The retrieved value is stored in the structure pointed by
1581\fIoptval\fP and of length \fIoptlen\fP\&.
1582.sp
1583This helper actually implements a subset of \fBgetsockopt()\fP\&.
1584It supports the following \fIlevel\fPs:
1585.INDENT 7.0
1586.IP \(bu 2
1587\fBIPPROTO_TCP\fP, which supports \fIoptname\fP
1588\fBTCP_CONGESTION\fP\&.
1589.IP \(bu 2
1590\fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&.
1591.IP \(bu 2
1592\fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&.
1593.UNINDENT
1594.TP
1595.B Return
15960 on success, or a negative error in case of failure.
1597.UNINDENT
1598.TP
1599.B \fBint bpf_override_return(struct pt_regs *\fP\fIregs\fP\fB, u64\fP \fIrc\fP\fB)\fP
1600.INDENT 7.0
1601.TP
1602.B Description
1603Used for error injection, this helper uses kprobes to override
1604the return value of the probed function, and to set it to \fIrc\fP\&.
1605The first argument is the context \fIregs\fP on which the kprobe
1606works.
1607.sp
1608This helper works by setting the PC (program counter)
1609to an override function which is run in place of the original
1610probed function. This means the probed function is not run at
1611all. The replacement function just returns with the required
1612value.
1613.sp
1614This helper has security implications, and thus is subject to
1615restrictions. It is only available if the kernel was compiled
1616with the \fBCONFIG_BPF_KPROBE_OVERRIDE\fP configuration
1617option, and in this case it only works on functions tagged with
1618\fBALLOW_ERROR_INJECTION\fP in the kernel code.
1619.sp
1620Also, the helper is only available for the architectures having
1621the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing,
1622x86 architecture is the only one to support this feature.
1623.TP
1624.B Return
0
1625.UNINDENT
1626.TP
1627.B \fBint bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *\fP\fIbpf_sock\fP\fB, int\fP \fIargval\fP\fB)\fP
1628.INDENT 7.0
1629.TP
1630.B Description
1631Attempt to set the value of the \fBbpf_sock_ops_cb_flags\fP field
1632for the full TCP socket associated to \fIbpf_sock\fP to
1633\fIargval\fP\&.
1634.sp
1635The primary use of this field is to determine if there should
1636be calls to eBPF programs of type
1637\fBBPF_PROG_TYPE_SOCK_OPS\fP at various points in the TCP
1638code. A program of the same type can change its value, per
1639connection and as necessary, when the connection is
1640established. This field is directly accessible for reading, but
1641this helper must be used for updates in order to return an
1642error if an eBPF program tries to set a callback that is not
1643supported in the current kernel.
1644.sp
1645The supported callback values that \fIargval\fP can combine are:
1646.INDENT 7.0
1647.IP \(bu 2
1648\fBBPF_SOCK_OPS_RTO_CB_FLAG\fP (retransmission time out)
1649.IP \(bu 2
1650\fBBPF_SOCK_OPS_RETRANS_CB_FLAG\fP (retransmission)
1651.IP \(bu 2
1652\fBBPF_SOCK_OPS_STATE_CB_FLAG\fP (TCP state change)
1653.UNINDENT
1654.sp
1655Here are some examples of where one could call such eBPF
1656program:
1657.INDENT 7.0
1658.IP \(bu 2
1659When RTO fires.
1660.IP \(bu 2
1661When a packet is retransmitted.
1662.IP \(bu 2
1663When the connection terminates.
1664.IP \(bu 2
1665When a packet is sent.
1666.IP \(bu 2
1667When a packet is received.
1668.UNINDENT
1669.TP
1670.B Return
1671Code \fB\-EINVAL\fP if the socket is not a full TCP socket;
1672otherwise, a positive number containing the bits that could not
1673be set is returned (which comes down to 0 if all bits were set
1674as required).
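.sp
The return convention can be illustrated with a small sketch in plain C. The flag values below are assumptions taken from the kernel uapi header as of this writing, and the function name is hypothetical; the sketch only models the "bits that could not be set" part of the return value, not the \-EINVAL case.

```c
/* Assumed flag values (check include/uapi/linux/bpf.h for your kernel). */
#define BPF_SOCK_OPS_RTO_CB_FLAG     (1 << 0)
#define BPF_SOCK_OPS_RETRANS_CB_FLAG (1 << 1)
#define BPF_SOCK_OPS_STATE_CB_FLAG   (1 << 2)

#define SUPPORTED_CB_FLAGS (BPF_SOCK_OPS_RTO_CB_FLAG | \
                            BPF_SOCK_OPS_RETRANS_CB_FLAG | \
                            BPF_SOCK_OPS_STATE_CB_FLAG)

/* Hypothetical model: return the requested bits that are not supported,
 * which is 0 when every requested bit could be set. */
int unsupported_cb_bits(int argval)
{
    return argval & ~SUPPORTED_CB_FLAGS;
}
```

A caller can therefore treat any positive return value as the set of callbacks that the running kernel does not know about.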
1675.UNINDENT
1676.TP
1677.B \fBint bpf_msg_redirect_map(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
1678.INDENT 7.0
1679.TP
1680.B Description
1681This helper is used in programs implementing policies at the
1682socket level. If the message \fImsg\fP is allowed to pass (i.e. if
1683the verdict eBPF program returns \fBSK_PASS\fP), redirect it to
1684the socket referenced by \fImap\fP (of type
1685\fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and
1686egress interfaces can be used for redirection. The
1687\fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the
1688distinction (ingress path is selected if the flag is present,
1689egress path otherwise). This is the only flag supported for now.
1690.TP
1691.B Return
1692\fBSK_PASS\fP on success, or \fBSK_DROP\fP on error.
1693.UNINDENT
1694.TP
1695.B \fBint bpf_msg_apply_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP
1696.INDENT 7.0
1697.TP
1698.B Description
1699For socket policies, apply the verdict of the eBPF program to
1700the next \fIbytes\fP (number of bytes) of message \fImsg\fP\&.
1701.sp
1702For example, this helper can be used in the following cases:
1703.INDENT 7.0
1704.IP \(bu 2
1705A single \fBsendmsg\fP() or \fBsendfile\fP() system call
1706contains multiple logical messages that the eBPF program is
1707supposed to read and for which it should apply a verdict.
1708.IP \(bu 2
1709An eBPF program only cares to read the first \fIbytes\fP of a
1710\fImsg\fP\&. If the message has a large payload, then setting up
1711and calling the eBPF program repeatedly for all bytes, even
1712though the verdict is already known, would create unnecessary
1713overhead.
1714.UNINDENT
1715.sp
1716When called from within an eBPF program, the helper sets a
1717counter internal to the BPF infrastructure, that is used to
1718apply the last verdict to the next \fIbytes\fP\&. If \fIbytes\fP is
1719smaller than the current data being processed from a
1720\fBsendmsg\fP() or \fBsendfile\fP() system call, the first
1721\fIbytes\fP will be sent and the eBPF program will be re\-run with
1722the pointer for start of data pointing to byte number \fIbytes\fP
1723\fB+ 1\fP\&. If \fIbytes\fP is larger than the current data being
1724processed, then the eBPF verdict will be applied to multiple
1725\fBsendmsg\fP() or \fBsendfile\fP() calls until \fIbytes\fP are
1726consumed.
1727.sp
1728Note that if a socket closes with the internal counter holding
1729a non\-zero value, this is not a problem because data is not
1730being buffered for \fIbytes\fP and is sent as it is received.
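.sp
The counter behaviour described above can be approximated with a toy model in plain C. This is not the kernel implementation: the function name is hypothetical, and the model assumes that the program requests the same number of bytes on every run. It counts how often the verdict program would be run for a sequence of send sizes.

```c
#include <stddef.h>

/* Toy model: count verdict-program runs when each run applies its
 * verdict to the next `bytes` bytes, across several send calls. */
unsigned count_program_runs(const size_t *send_sizes, size_t n_sends,
                            size_t bytes)
{
    size_t pending = 0;     /* bytes still covered by the last verdict */
    unsigned runs = 0;

    for (size_t i = 0; i < n_sends; i++) {
        size_t left = send_sizes[i];

        while (left > 0) {
            if (pending == 0) {
                runs++;                         /* program is (re-)run */
                pending = bytes ? bytes : left; /* 0: cover current data */
            }
            size_t take = pending < left ? pending : left;
            left -= take;
            pending -= take;
        }
    }
    return runs;
}
```

In this model, one 25\-byte send with \fIbytes\fP set to 10 runs the program three times, while three 4\-byte sends run it only twice because the first verdict spans several calls.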
1731.TP
1732.B Return
0
1733.UNINDENT
1734.TP
1735.B \fBint bpf_msg_cork_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP
1736.INDENT 7.0
1737.TP
1738.B Description
1739For socket policies, prevent the execution of the verdict eBPF
1740program for message \fImsg\fP until \fIbytes\fP (byte number) have been
1741accumulated.
1742.sp
1743This can be used when one needs a specific number of bytes
1744before a verdict can be assigned, even if the data spans
1745multiple \fBsendmsg\fP() or \fBsendfile\fP() calls. The extreme
1746case would be a user calling \fBsendmsg\fP() repeatedly with
17471\-byte long message segments. Obviously, this is bad for
1748performance, but it is still valid. If the eBPF program needs
1749\fIbytes\fP bytes to validate a header, this helper can be used to
1750prevent the eBPF program from being called again until \fIbytes\fP have
1751been accumulated.
1752.TP
1753.B Return
0
1754.UNINDENT
1755.TP
1756.B \fBint bpf_msg_pull_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIend\fP\fB, u64\fP \fIflags\fP\fB)\fP
1757.INDENT 7.0
1758.TP
1759.B Description
1760For socket policies, pull in non\-linear data from user space
1761for \fImsg\fP and set pointers \fImsg\fP\fB\->data\fP and \fImsg\fP\fB\->data_end\fP to \fIstart\fP and \fIend\fP bytes offsets into \fImsg\fP,
1762respectively.
1763.sp
1764If a program of type \fBBPF_PROG_TYPE_SK_MSG\fP is run on a
1765\fImsg\fP it can only parse data that the (\fBdata\fP, \fBdata_end\fP)
1766pointers have already consumed. For \fBsendmsg\fP() hooks this
1767is likely the first scatterlist element. But for calls relying
1768on the \fBsendpage\fP handler (e.g. \fBsendfile\fP()) this will
1769be the range (\fB0\fP, \fB0\fP) because the data is shared with
1770user space and by default the objective is to avoid allowing
1771user space to modify data while (or after) eBPF verdict is
1772being decided. This helper can be used to pull in data and to
1773set the start and end pointer to given values. Data will be
1774copied if necessary (i.e. if data was not linear and if start
1775and end pointers do not point to the same chunk).
1776.sp
1777A call to this helper may change the underlying
1778packet buffer. Therefore, at load time, all checks on pointers
1779previously done by the verifier are invalidated and must be
1780performed again, if the helper is used in combination with
1781direct packet access.
1782.sp
1783All values for \fIflags\fP are reserved for future usage, and must
1784be left at zero.
1785.TP
1786.B Return
17870 on success, or a negative error in case of failure.
1788.UNINDENT
1789.TP
1790.B \fBint bpf_bind(struct bpf_sock_addr *\fP\fIctx\fP\fB, struct sockaddr *\fP\fIaddr\fP\fB, int\fP \fIaddr_len\fP\fB)\fP
1791.INDENT 7.0
1792.TP
1793.B Description
1794Bind the socket associated to \fIctx\fP to the address pointed by
1795\fIaddr\fP, of length \fIaddr_len\fP\&. This allows for making outgoing
1796connection from the desired IP address, which can be useful for
1797example when all processes inside a cgroup should use one
1798single IP address on a host that has multiple IP addresses configured.
1799.sp
1800This helper works for IPv4 and IPv6, TCP and UDP sockets. The
1801domain (\fIaddr\fP\fB\->sa_family\fP) must be \fBAF_INET\fP (or
1802\fBAF_INET6\fP). Looking for a free port to bind to can be
1803expensive, therefore binding to port is not permitted by the
1804helper: \fIaddr\fP\fB\->sin_port\fP (or \fBsin6_port\fP, respectively)
1805must be set to zero.
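.sp
The address constraints above (family \fBAF_INET\fP, port left at zero) can be sketched in user\-space C as follows. The function name is hypothetical, and \fBinet_pton\fP() is used only for illustration; it is not available inside an eBPF program, where the address would be filled in as a constant or read from a map.

```c
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Fill an IPv4 address in the shape the helper accepts: AF_INET family
 * and port zero. Returns 0 on success, -1 if `ip` is not a valid
 * dotted-quad string. */
int make_bind_addr(const char *ip, struct sockaddr_in *out)
{
    memset(out, 0, sizeof(*out));
    out->sin_family = AF_INET;
    out->sin_port = 0;      /* binding to a specific port is not permitted */
    return inet_pton(AF_INET, ip, &out->sin_addr) == 1 ? 0 : -1;
}
```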
1806.TP
1807.B Return
18080 on success, or a negative error in case of failure.
1809.UNINDENT
1810.TP
1811.B \fBint bpf_xdp_adjust_tail(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP
1812.INDENT 7.0
1813.TP
1814.B Description
1815Adjust (move) \fIxdp_md\fP\fB\->data_end\fP by \fIdelta\fP bytes. It is
1816only possible to shrink the packet as of this writing,
1817therefore \fIdelta\fP must be a negative integer.
1818.sp
1819A call to this helper may change the underlying
1820packet buffer. Therefore, at load time, all checks on pointers
1821previously done by the verifier are invalidated and must be
1822performed again, if the helper is used in combination with
1823direct packet access.
1824.TP
1825.B Return
18260 on success, or a negative error in case of failure.
1827.UNINDENT
1828.TP
1829.B \fBint bpf_skb_get_xfrm_state(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIindex\fP\fB, struct bpf_xfrm_state *\fP\fIxfrm_state\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP
1830.INDENT 7.0
1831.TP
1832.B Description
1833Retrieve the XFRM state (IP transform framework, see also
1834\fBip\-xfrm(8)\fP) at \fIindex\fP in XFRM "security path" for \fIskb\fP\&.
1835.sp
1836The retrieved value is stored in the \fBstruct bpf_xfrm_state\fP
1837pointed by \fIxfrm_state\fP and of length \fIsize\fP\&.
1838.sp
1839All values for \fIflags\fP are reserved for future usage, and must
1840be left at zero.
1841.sp
1842This helper is available only if the kernel was compiled with
1843\fBCONFIG_XFRM\fP configuration option.
1844.TP
1845.B Return
18460 on success, or a negative error in case of failure.
1847.UNINDENT
1848.TP
1849.B \fBint bpf_get_stack(struct pt_regs *\fP\fIregs\fP\fB, void *\fP\fIbuf\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP
1850.INDENT 7.0
1851.TP
1852.B Description
1853Return a user or a kernel stack in bpf program provided buffer.
1854To achieve this, the helper needs \fIregs\fP, which is a pointer
1855to the context on which the tracing program is executed.
1856To store the stacktrace, the bpf program provides \fIbuf\fP with
1857a nonnegative \fIsize\fP\&.
1858.sp
1859The last argument, \fIflags\fP, holds the number of stack frames to
1860skip (from 0 to 255), masked with
1861\fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set
1862the following flags:
1863.INDENT 7.0
1864.TP
1865.B \fBBPF_F_USER_STACK\fP
1866Collect a user space stack instead of a kernel stack.
1867.TP
1868.B \fBBPF_F_USER_BUILD_ID\fP
1869Collect buildid+offset instead of ips for user stack,
1870only valid if \fBBPF_F_USER_STACK\fP is also specified.
1871.UNINDENT
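.sp
A sketch of how the \fIflags\fP argument can be composed is shown below. The numeric values are assumptions taken from the kernel uapi header as of this writing, and the function name is hypothetical.

```c
#include <stdint.h>

/* Assumed values (check include/uapi/linux/bpf.h for your kernel). */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_USER_BUILD_ID   (1ULL << 11)

/* Build the flags word: the low 8 bits hold the number of frames to
 * skip, higher bits select user stack and build-id collection. */
uint64_t stack_flags(uint64_t skip, int user_stack, int build_id)
{
    uint64_t flags = skip & BPF_F_SKIP_FIELD_MASK;

    if (user_stack) {
        flags |= BPF_F_USER_STACK;
        if (build_id)       /* only valid together with the user stack */
            flags |= BPF_F_USER_BUILD_ID;
    }
    return flags;
}
```

Skip counts above 255 are silently truncated by the mask in this sketch, so callers may prefer to reject them explicitly.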
1872.sp
1873\fBbpf_get_stack\fP() can collect up to
1874\fBPERF_MAX_STACK_DEPTH\fP both kernel and user frames, subject
1875to a sufficiently large buffer size. Note that
1876this limit can be controlled with the \fBsysctl\fP program, and
1877that it should be manually increased in order to profile long
1878user stacks (such as stacks for Java programs). To do so, use:
1879.INDENT 7.0
1880.INDENT 3.5
1881.sp
1882.nf
1883.ft C
1884# sysctl kernel.perf_event_max_stack=<new value>
1885.ft P
1886.fi
1887.UNINDENT
1888.UNINDENT
1889.TP
1890.B Return
1891A non\-negative value equal to or less than \fIsize\fP on success,
1892or a negative error in case of failure.
1893.UNINDENT
1894.TP
1895.B \fBint bpf_skb_load_bytes_relative(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB, u32\fP \fIstart_header\fP\fB)\fP
1896.INDENT 7.0
1897.TP
1898.B Description
1899This helper is similar to \fBbpf_skb_load_bytes\fP() in that
1900it provides an easy way to load \fIlen\fP bytes from \fIoffset\fP
1901from the packet associated to \fIskb\fP, into the buffer pointed
1902by \fIto\fP\&. The difference to \fBbpf_skb_load_bytes\fP() is that
1903a fifth argument \fIstart_header\fP exists in order to select a
1904base offset to start from. \fIstart_header\fP can be one of:
1905.INDENT 7.0
1906.TP
1907.B \fBBPF_HDR_START_MAC\fP
1908Base offset to load data from is \fIskb\fP\(aqs mac header.
1909.TP
1910.B \fBBPF_HDR_START_NET\fP
1911Base offset to load data from is \fIskb\fP\(aqs network header.
1912.UNINDENT
1913.sp
1914In general, "direct packet access" is the preferred method to
1915access packet data, however, this helper is in particular useful
1916in socket filters where \fIskb\fP\fB\->data\fP does not always point
1917to the start of the mac header and where "direct packet access"
1918is not available.
1919.TP
1920.B Return
19210 on success, or a negative error in case of failure.
1922.UNINDENT
1923.TP
1924.B \fBint bpf_fib_lookup(void *\fP\fIctx\fP\fB, struct bpf_fib_lookup *\fP\fIparams\fP\fB, int\fP \fIplen\fP\fB, u32\fP \fIflags\fP\fB)\fP
1925.INDENT 7.0
1926.TP
1927.B Description
1928Do FIB lookup in kernel tables using parameters in \fIparams\fP\&.
1929If lookup is successful and result shows packet is to be
1930forwarded, the neighbor tables are searched for the nexthop.
1931If successful (i.e., FIB lookup shows forwarding and nexthop
1932is resolved), the nexthop address is returned in ipv4_dst
1933or ipv6_dst based on family, smac is set to mac address of
1934egress device, dmac is set to nexthop mac address, rt_metric
1935is set to metric from route (IPv4/IPv6 only), and ifindex
1936is set to the device index of the nexthop from the FIB lookup.
1937.sp
1938\fIplen\fP argument is the size of the passed in struct.
1939\fIflags\fP argument can be a combination of one or more of the
1940following values:
1941.INDENT 7.0
1942.TP
1943.B \fBBPF_FIB_LOOKUP_DIRECT\fP
1944Do a direct table lookup vs full lookup using FIB
1945rules.
1946.TP
1947.B \fBBPF_FIB_LOOKUP_OUTPUT\fP
1948Perform lookup from an egress perspective (default is
1949ingress).
1950.UNINDENT
1951.sp
1952\fIctx\fP is either \fBstruct xdp_md\fP for XDP programs or
1953\fBstruct sk_buff\fP for tc cls_act programs.
1954.TP
1955.B Return
1956.INDENT 7.0
1957.IP \(bu 2
1958< 0 if any input argument is invalid
1959.IP \(bu 2
19600 on success (packet is forwarded, nexthop neighbor exists)
1961.IP \(bu 2
1962> 0 one of \fBBPF_FIB_LKUP_RET_\fP codes explaining why the
1963packet is not forwarded or needs assist from full stack
1964.UNINDENT
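.sp
The three return ranges listed above can be handled as in the following sketch. The enum and function names are hypothetical; a real program would additionally inspect the specific \fBBPF_FIB_LKUP_RET_\fP code in the positive case.

```c
/* Hypothetical classification of the helper's return value. */
enum fib_outcome {
    FIB_INVALID_ARGS,   /* rc < 0: an input argument is invalid */
    FIB_FORWARD,        /* rc == 0: forward, nexthop neighbor exists */
    FIB_NOT_FORWARDED   /* rc > 0: not forwarded or needs the full stack */
};

enum fib_outcome classify_fib_result(int rc)
{
    if (rc < 0)
        return FIB_INVALID_ARGS;
    if (rc == 0)
        return FIB_FORWARD;
    return FIB_NOT_FORWARDED;
}
```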
1965.UNINDENT
1966.TP
1967.B \fBint bpf_sock_hash_update(struct bpf_sock_ops_kern *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
1968.INDENT 7.0
1969.TP
1970.B Description
1971Add an entry to, or update a sockhash \fImap\fP referencing sockets.
1972The \fIskops\fP is used as a new value for the entry associated to
1973\fIkey\fP\&. \fIflags\fP is one of:
1974.INDENT 7.0
1975.TP
1976.B \fBBPF_NOEXIST\fP
1977The entry for \fIkey\fP must not exist in the map.
1978.TP
1979.B \fBBPF_EXIST\fP
1980The entry for \fIkey\fP must already exist in the map.
1981.TP
1982.B \fBBPF_ANY\fP
1983No condition on the existence of the entry for \fIkey\fP\&.
1984.UNINDENT
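.sp
The three \fIflags\fP values can be modelled with the sketch below. The numeric values are assumptions taken from the kernel uapi header, the function name is hypothetical, and the error codes are an assumption that mirrors the usual map update semantics rather than something stated in this page.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed values (check include/uapi/linux/bpf.h for your kernel). */
#define BPF_ANY     0
#define BPF_NOEXIST 1
#define BPF_EXIST   2

/* Return 0 if an update with `flags` is acceptable given whether the
 * entry for the key already exists, or a negative errno otherwise. */
int check_update_flags(int entry_exists, uint64_t flags)
{
    switch (flags) {
    case BPF_ANY:
        return 0;
    case BPF_NOEXIST:
        return entry_exists ? -EEXIST : 0;
    case BPF_EXIST:
        return entry_exists ? 0 : -ENOENT;
    default:
        return -EINVAL;     /* unknown flag value */
    }
}
```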
1985.sp
1986If the \fImap\fP has eBPF programs (parser and verdict), those will
1987be inherited by the socket being added. If the socket is
1988already attached to eBPF programs, this results in an error.
1989.TP
1990.B Return
19910 on success, or a negative error in case of failure.
1992.UNINDENT
1993.TP
1994.B \fBint bpf_msg_redirect_hash(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
1995.INDENT 7.0
1996.TP
1997.B Description
1998This helper is used in programs implementing policies at the
1999socket level. If the message \fImsg\fP is allowed to pass (i.e. if
2000the verdict eBPF program returns \fBSK_PASS\fP), redirect it to
2001the socket referenced by \fImap\fP (of type
2002\fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and
2003egress interfaces can be used for redirection. The
2004\fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the
2005distinction (ingress path is selected if the flag is present,
2006egress path otherwise). This is the only flag supported for now.
2007.TP
2008.B Return
2009\fBSK_PASS\fP on success, or \fBSK_DROP\fP on error.
2010.UNINDENT
2011.TP
2012.B \fBint bpf_sk_redirect_hash(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
2013.INDENT 7.0
2014.TP
2015.B Description
2016This helper is used in programs implementing policies at the
2017skb socket level. If the sk_buff \fIskb\fP is allowed to pass (i.e.
2018if the verdict eBPF program returns \fBSK_PASS\fP), redirect it
2019to the socket referenced by \fImap\fP (of type
2020\fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and
2021egress interfaces can be used for redirection. The
2022\fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the
2023distinction (ingress path is selected if the flag is present,
2024egress otherwise). This is the only flag supported for now.
2025.TP
2026.B Return
2027\fBSK_PASS\fP on success, or \fBSK_DROP\fP on error.
2028.UNINDENT
2029.TP
2030.B \fBint bpf_lwt_push_encap(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB, void *\fP\fIhdr\fP\fB, u32\fP \fIlen\fP\fB)\fP
2031.INDENT 7.0
2032.TP
2033.B Description
2034Encapsulate the packet associated to \fIskb\fP within a Layer 3
2035protocol header. This header is provided in the buffer at
2036address \fIhdr\fP, with \fIlen\fP its size in bytes. \fItype\fP indicates
2037the protocol of the header and can be one of:
2038.INDENT 7.0
2039.TP
2040.B \fBBPF_LWT_ENCAP_SEG6\fP
2041IPv6 encapsulation with Segment Routing Header
2042(\fBstruct ipv6_sr_hdr\fP). \fIhdr\fP only contains the SRH,
2043the IPv6 header is computed by the kernel.
2044.TP
2045.B \fBBPF_LWT_ENCAP_SEG6_INLINE\fP
2046Only works if \fIskb\fP contains an IPv6 packet. Insert a
2047Segment Routing Header (\fBstruct ipv6_sr_hdr\fP) inside
2048the IPv6 header.
2049.UNINDENT
2050.sp
2051A call to this helper may change the underlying
2052packet buffer. Therefore, at load time, all checks on pointers
2053previously done by the verifier are invalidated and must be
2054performed again, if the helper is used in combination with
2055direct packet access.
2056.TP
2057.B Return
20580 on success, or a negative error in case of failure.
2059.UNINDENT
2060.TP
2061.B \fBint bpf_lwt_seg6_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB)\fP
2062.INDENT 7.0
2063.TP
2064.B Description
2065Store \fIlen\fP bytes from address \fIfrom\fP into the packet
2066associated to \fIskb\fP, at \fIoffset\fP\&. Only the flags, tag and TLVs
2067inside the outermost IPv6 Segment Routing Header can be
2068modified through this helper.
2069.sp
2070A call to this helper may change the underlying
2071packet buffer. Therefore, at load time, all checks on pointers
2072previously done by the verifier are invalidated and must be
2073performed again, if the helper is used in combination with
2074direct packet access.
2075.TP
2076.B Return
20770 on success, or a negative error in case of failure.
2078.UNINDENT
2079.TP
2080.B \fBint bpf_lwt_seg6_adjust_srh(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, s32\fP \fIdelta\fP\fB)\fP
2081.INDENT 7.0
2082.TP
2083.B Description
2084Adjust the size allocated to TLVs in the outermost IPv6
2085Segment Routing Header contained in the packet associated to
2086\fIskb\fP, at position \fIoffset\fP by \fIdelta\fP bytes. Only offsets
2087after the segments are accepted. \fIdelta\fP can be either
2088positive (growing) or negative (shrinking).
2089.sp
2090A call to this helper may change the underlying
2091packet buffer. Therefore, at load time, all checks on pointers
2092previously done by the verifier are invalidated and must be
2093performed again, if the helper is used in combination with
2094direct packet access.
2095.TP
2096.B Return
20970 on success, or a negative error in case of failure.
2098.UNINDENT
2099.TP
2100.B \fBint bpf_lwt_seg6_action(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIaction\fP\fB, void *\fP\fIparam\fP\fB, u32\fP \fIparam_len\fP\fB)\fP
2101.INDENT 7.0
2102.TP
2103.B Description
2104Apply an IPv6 Segment Routing action of type \fIaction\fP to the
2105packet associated to \fIskb\fP\&. Each action takes a parameter
2106contained at address \fIparam\fP, and of length \fIparam_len\fP bytes.
2107\fIaction\fP can be one of:
2108.INDENT 7.0
2109.TP
2110.B \fBSEG6_LOCAL_ACTION_END_X\fP
2111End.X action: Endpoint with Layer\-3 cross\-connect.
2112Type of \fIparam\fP: \fBstruct in6_addr\fP\&.
2113.TP
2114.B \fBSEG6_LOCAL_ACTION_END_T\fP
2115End.T action: Endpoint with specific IPv6 table lookup.
2116Type of \fIparam\fP: \fBint\fP\&.
2117.TP
2118.B \fBSEG6_LOCAL_ACTION_END_B6\fP
2119End.B6 action: Endpoint bound to an SRv6 policy.
2120Type of \fIparam\fP: \fBstruct ipv6_sr_hdr\fP\&.
2121.TP
2122.B \fBSEG6_LOCAL_ACTION_END_B6_ENCAP\fP
2123End.B6.Encap action: Endpoint bound to an SRv6
2124encapsulation policy.
2125Type of \fIparam\fP: \fBstruct ipv6_sr_hdr\fP\&.
2126.UNINDENT
2127.sp
2128A call to this helper may change the underlying
2129packet buffer. Therefore, at load time, all checks on pointers
2130previously done by the verifier are invalidated and must be
2131performed again, if the helper is used in combination with
2132direct packet access.
2133.TP
2134.B Return
21350 on success, or a negative error in case of failure.
2136.UNINDENT
2137.TP
2138.B \fBint bpf_rc_keydown(void *\fP\fIctx\fP\fB, u32\fP \fIprotocol\fP\fB, u64\fP \fIscancode\fP\fB, u32\fP \fItoggle\fP\fB)\fP
2139.INDENT 7.0
2140.TP
2141.B Description
2142This helper is used in programs implementing IR decoding, to
2143report a successfully decoded key press with \fIscancode\fP,
2144\fItoggle\fP value in the given \fIprotocol\fP\&. The scancode will be
2145translated to a keycode using the rc keymap, and reported as
2146an input key down event. After a period a key up event is
2147generated. This period can be extended by calling either
2148\fBbpf_rc_keydown\fP() again with the same values, or calling
2149\fBbpf_rc_repeat\fP().
2150.sp
2151Some protocols include a toggle bit, in case the button was
2152released and pressed again between consecutive scancodes.
2153.sp
2154The \fIctx\fP should point to the lirc sample as passed into
2155the program.
2156.sp
2157The \fIprotocol\fP is the decoded protocol number (see
2158\fBenum rc_proto\fP for some predefined values).
2159.sp
2160This helper is only available if the kernel was compiled with
2161the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to
2162"\fBy\fP".
2163.TP
2164.B Return
0
2165.UNINDENT
2166.TP
2167.B \fBint bpf_rc_repeat(void *\fP\fIctx\fP\fB)\fP
2168.INDENT 7.0
2169.TP
2170.B Description
2171This helper is used in programs implementing IR decoding, to
2172report a successfully decoded repeat key message. This delays
2173the generation of a key up event for previously generated
2174key down event.
2175.sp
2176Some IR protocols like NEC have a special IR message for
2177repeating last button, for when a button is held down.
2178.sp
2179The \fIctx\fP should point to the lirc sample as passed into
2180the program.
2181.sp
2182This helper is only available if the kernel was compiled with
2183the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to
2184"\fBy\fP".
2185.TP
2186.B Return
0
2187.UNINDENT
2188.TP
2189.B \fBuint64_t bpf_skb_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB)\fP
2190.INDENT 7.0
2191.TP
2192.B Description
2193Return the cgroup v2 id of the socket associated with the \fIskb\fP\&.
2194This is roughly similar to the \fBbpf_get_cgroup_classid\fP()
2195helper for cgroup v1 by providing a tag or identifier that
2196can be matched on or used for map lookups e.g. to implement
2197policy. The cgroup v2 id of a given path in the hierarchy is
2198exposed in user space through the f_handle API in order to get
2199to the same 64\-bit id.
2200.sp
2201This helper can be used on TC egress path, but not on ingress,
2202and is available only if the kernel was compiled with the
2203\fBCONFIG_SOCK_CGROUP_DATA\fP configuration option.
2204.TP
2205.B Return
2206The id is returned or 0 in case the id could not be retrieved.
2207.UNINDENT
2208.TP
2209.B \fBu64 bpf_skb_ancestor_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB, int\fP \fIancestor_level\fP\fB)\fP
2210.INDENT 7.0
2211.TP
2212.B Description
2213Return id of cgroup v2 that is ancestor of cgroup associated
2214with the \fIskb\fP at the \fIancestor_level\fP\&. The root cgroup is at
2215\fIancestor_level\fP zero and each step down the hierarchy
2216increments the level. If \fIancestor_level\fP == level of cgroup
2217associated with \fIskb\fP, then return value will be same as that
2218of \fBbpf_skb_cgroup_id\fP().
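.sp
The level numbering can be illustrated with a small model in plain C, where a cgroup is represented by the list of ids on its path from the root. The function name and the representation are hypothetical and serve only to show which id each \fIancestor_level\fP selects.

```c
#include <stddef.h>
#include <stdint.h>

/* `path` holds the ids from the root (index 0) down to the cgroup
 * itself (index path_len - 1). Return the id at `ancestor_level`,
 * or 0 if that level does not exist on the path. */
uint64_t ancestor_id(const uint64_t *path, size_t path_len,
                     int ancestor_level)
{
    if (ancestor_level < 0 || (size_t)ancestor_level >= path_len)
        return 0;
    return path[ancestor_level];
}
```

For a path with ids 1, 10 and 42, level 0 selects the root (1) and level 2 selects the cgroup itself (42), which matches what \fBbpf_skb_cgroup_id\fP() would return in this model.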
2219.sp
2220The helper is useful to implement policies based on cgroups
2221that are upper in hierarchy than immediate cgroup associated
2222with \fIskb\fP\&.
2223.sp
2224The format of returned id and helper limitations are same as in
2225\fBbpf_skb_cgroup_id\fP().
2226.TP
2227.B Return
2228The id is returned or 0 in case the id could not be retrieved.
2229.UNINDENT
2230.TP
2231.B \fBu64 bpf_get_current_cgroup_id(void)\fP
2232.INDENT 7.0
2233.TP
2234.B Return
2235A 64\-bit integer containing the current cgroup id based
2236on the cgroup within which the current task is running.
2237.UNINDENT
2238.TP
2239.B \fBvoid *bpf_get_local_storage(void *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP
2240.INDENT 7.0
2241.TP
2242.B Description
2243Get the pointer to the local storage area.
2244The type and the size of the local storage is defined
2245by the \fImap\fP argument.
2246The \fIflags\fP meaning is specific for each map type,
2247and has to be 0 for cgroup local storage.
2248.sp
2249Depending on the BPF program type, a local storage area
2250can be shared between multiple instances of the BPF program,
2251running simultaneously.
2252.sp
2253The user is responsible for synchronization,
2254for example by using the \fBBPF_STX_XADD\fP instruction to alter
2255the shared data.
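.sp
In C source, an atomic add is commonly written with the compiler builtin shown below. That the compiler lowers this builtin to \fBBPF_STX_XADD\fP when targeting BPF is an assumption about the toolchain; the function name is hypothetical, and the snippet also compiles as ordinary user\-space C.

```c
#include <stdint.h>

/* Atomically add `n` to a counter that may be shared between
 * program instances running simultaneously. */
void count_event(uint64_t *shared_counter, uint64_t n)
{
    /* The result is deliberately unused: only the add is needed. */
    __sync_fetch_and_add(shared_counter, n);
}
```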
2256.TP
2257.B Return
2258A pointer to the local storage area.
2259.UNINDENT
2260.TP
2261.B \fBint bpf_sk_select_reuseport(struct sk_reuseport_md *\fP\fIreuse\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
2262.INDENT 7.0
2263.TP
2264.B Description
2265Select a \fBSO_REUSEPORT\fP socket from a
2266\fBBPF_MAP_TYPE_REUSEPORT_ARRAY\fP \fImap\fP\&.
2267It checks that the selected socket matches the incoming
2268request in the socket buffer.
2269.TP
2270.B Return
22710 on success, or a negative error in case of failure.
2272.UNINDENT
2273.TP
2274.B \fBstruct bpf_sock *bpf_sk_lookup_tcp(void *\fP\fIctx\fP\fB, struct bpf_sock_tuple *\fP\fItuple\fP\fB, u32\fP \fItuple_size\fP\fB, u64\fP \fInetns\fP\fB, u64\fP \fIflags\fP\fB)\fP
2275.INDENT 7.0
2276.TP
2277.B Description
2278Look for TCP socket matching \fItuple\fP, optionally in a child
2279network namespace \fInetns\fP\&. The return value must be checked,
2280and if non\-\fBNULL\fP, released via \fBbpf_sk_release\fP().
2281.sp
2282The \fIctx\fP should point to the context of the program, such as
2283the skb or socket (depending on the hook in use). This is used
2284to determine the base network namespace for the lookup.
2285.sp
2286\fItuple_size\fP must be one of:
2287.INDENT 7.0
2288.TP
2289.B \fBsizeof\fP(\fItuple\fP\fB\->ipv4\fP)
2290Look for an IPv4 socket.
2291.TP
2292.B \fBsizeof\fP(\fItuple\fP\fB\->ipv6\fP)
2293Look for an IPv6 socket.
2294.UNINDENT
2295.sp
2296If the \fInetns\fP is a negative signed 32\-bit integer, then the
2297socket lookup table in the netns associated with the \fIctx\fP
2298will be used. For the TC hooks, this is the netns of the device
2299in the skb. For socket hooks, this is the netns of the socket.
2300If \fInetns\fP is any other signed 32\-bit value greater than or
2301equal to zero then it specifies the ID of the netns relative to
2302the netns associated with the \fIctx\fP\&. \fInetns\fP values beyond the
2303range of 32\-bit integers are reserved for future use.
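.sp
The three \fInetns\fP ranges can be sketched as follows. This is a literal reading of the paragraph above, in which a negative 32\-bit value is assumed to arrive sign\-extended in the 64\-bit argument; the kernel's own check may differ in detail, and the enum and function names are hypothetical.

```c
#include <stdint.h>

enum netns_kind {
    NETNS_CURRENT,      /* negative 32-bit value: use the netns of ctx */
    NETNS_RELATIVE_ID,  /* 0..INT32_MAX: netns id relative to ctx */
    NETNS_RESERVED      /* outside the 32-bit range: reserved */
};

enum netns_kind classify_netns(uint64_t netns)
{
    if (netns <= (uint64_t)INT32_MAX)
        return NETNS_RELATIVE_ID;
    /* Sign-extended negative 32-bit values occupy the top of the range. */
    if (netns >= (uint64_t)(int64_t)INT32_MIN)
        return NETNS_CURRENT;
    return NETNS_RESERVED;
}
```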
2304.sp
2305All values for \fIflags\fP are reserved for future usage, and must
2306be left at zero.
2307.sp
2308This helper is available only if the kernel was compiled with
2309\fBCONFIG_NET\fP configuration option.
2310.TP
2311.B Return
2312Pointer to \fBstruct bpf_sock\fP, or \fBNULL\fP in case of failure.
2313For sockets with reuseport option, the \fBstruct bpf_sock\fP
2314result is from \fBreuse\->socks\fP[] using the hash of the tuple.
2315.UNINDENT
2316.TP
2317.B \fBstruct bpf_sock *bpf_sk_lookup_udp(void *\fP\fIctx\fP\fB, struct bpf_sock_tuple *\fP\fItuple\fP\fB, u32\fP \fItuple_size\fP\fB, u64\fP \fInetns\fP\fB, u64\fP \fIflags\fP\fB)\fP
2318.INDENT 7.0
2319.TP
2320.B Description
2321Look for UDP socket matching \fItuple\fP, optionally in a child
2322network namespace \fInetns\fP\&. The return value must be checked,
2323and if non\-\fBNULL\fP, released via \fBbpf_sk_release\fP().
2324.sp
2325The \fIctx\fP should point to the context of the program, such as
2326the skb or socket (depending on the hook in use). This is used
2327to determine the base network namespace for the lookup.
2328.sp
2329\fItuple_size\fP must be one of:
2330.INDENT 7.0
2331.TP
2332.B \fBsizeof\fP(\fItuple\fP\fB\->ipv4\fP)
2333Look for an IPv4 socket.
2334.TP
2335.B \fBsizeof\fP(\fItuple\fP\fB\->ipv6\fP)
2336Look for an IPv6 socket.
2337.UNINDENT
2338.sp
2339If the \fInetns\fP is a negative signed 32\-bit integer, then the
2340socket lookup table in the netns associated with the \fIctx\fP
2341will be used. For the TC hooks, this is the netns of the device
2342in the skb. For socket hooks, this is the netns of the socket.
2343If \fInetns\fP is any other signed 32\-bit value greater than or
2344equal to zero then it specifies the ID of the netns relative to
2345the netns associated with the \fIctx\fP\&. \fInetns\fP values beyond the
2346range of 32\-bit integers are reserved for future use.
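.sp
The three ranges described above can be sketched as a user\-space
classifier over the 64\-bit \fInetns\fP argument. This is an illustration
of the documented ranges only, not the check the kernel performs; the names
\fBnetns_kind\fP and \fBclassify_netns\fP are hypothetical.
.sp
.nf
.ft C
```c
#include <stdint.h>

enum netns_kind {
	NETNS_CURRENT,   /* negative s32: use the netns associated with ctx */
	NETNS_RELATIVE,  /* 0..INT32_MAX: netns ID relative to the ctx netns */
	NETNS_RESERVED   /* outside the s32 range: reserved for future use */
};

enum netns_kind classify_netns(uint64_t netns)
{
	/* Smallest u64 value that is a sign-extended negative s32 (INT32_MIN). */
	const uint64_t neg_s32_floor = UINT64_MAX - (uint64_t)INT32_MAX;

	if (netns <= (uint64_t)INT32_MAX)
		return NETNS_RELATIVE;
	if (netns >= neg_s32_floor)
		return NETNS_CURRENT;
	return NETNS_RESERVED;
}
```
.ft P
.fi
.sp
The comparison is done on unsigned values so that the sketch does not rely
on implementation\-defined conversions to signed types.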
2347.sp
2348All values for \fIflags\fP are reserved for future usage, and must
2349be left at zero.
2350.sp
This helper is available only if the kernel was compiled with
the \fBCONFIG_NET\fP configuration option.
2353.TP
2354.B Return
2355Pointer to \fBstruct bpf_sock\fP, or \fBNULL\fP in case of failure.
2356For sockets with reuseport option, the \fBstruct bpf_sock\fP
2357result is from \fBreuse\->socks\fP[] using the hash of the tuple.
2358.UNINDENT
2359.TP
2360.B \fBint bpf_sk_release(struct bpf_sock *\fP\fIsock\fP\fB)\fP
2361.INDENT 7.0
2362.TP
2363.B Description
2364Release the reference held by \fIsock\fP\&. \fIsock\fP must be a
2365non\-\fBNULL\fP pointer that was returned from
2366\fBbpf_sk_lookup_xxx\fP().
2367.TP
2368.B Return
0 on success, or a negative error in case of failure.
2370.UNINDENT
2371.TP
2372.B \fBint bpf_map_pop_elem(struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIvalue\fP\fB)\fP
2373.INDENT 7.0
2374.TP
2375.B Description
2376Pop an element from \fImap\fP\&.
2377.TP
2378.B Return
0 on success, or a negative error in case of failure.
2380.UNINDENT
2381.TP
2382.B \fBint bpf_map_peek_elem(struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIvalue\fP\fB)\fP
2383.INDENT 7.0
2384.TP
2385.B Description
2386Get an element from \fImap\fP without removing it.
2387.TP
2388.B Return
0 on success, or a negative error in case of failure.
2390.UNINDENT
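.sp
The difference between popping and peeking can be modelled in user space
with a toy FIFO. This sketch assumes a queue\-like map and picks its own
error codes; the type \fBtoy_queue\fP and the functions \fBtoy_push\fP,
\fBtoy_peek\fP and \fBtoy_pop\fP are hypothetical and are not kernel
interfaces.
.sp
.nf
.ft C
```c
#include <errno.h>

#define TOY_CAP 4

struct toy_queue {
	int items[TOY_CAP];
	unsigned int head;   /* index of the oldest element */
	unsigned int count;  /* number of stored elements */
};

/* Append a value; fails when the toy queue is full. */
int toy_push(struct toy_queue *q, int value)
{
	if (q->count == TOY_CAP)
		return -E2BIG;
	q->items[(q->head + q->count) % TOY_CAP] = value;
	q->count++;
	return 0;
}

/* Copy the oldest element into *value without removing it. */
int toy_peek(const struct toy_queue *q, int *value)
{
	if (q->count == 0)
		return -ENOENT;
	*value = q->items[q->head];
	return 0;
}

/* Copy the oldest element into *value and remove it. */
int toy_pop(struct toy_queue *q, int *value)
{
	int err = toy_peek(q, value);

	if (err)
		return err;
	q->head = (q->head + 1) % TOY_CAP;
	q->count--;
	return 0;
}
```
.ft P
.fi
.sp
As with the helpers, each function returns 0 on success and a negative
error otherwise, so callers must check the result before using \fIvalue\fP\&.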
2391.TP
.B \fBint bpf_msg_push_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP
2393.INDENT 7.0
2394.TP
2395.B Description
2396For socket policies, insert \fIlen\fP bytes into \fImsg\fP at offset
2397\fIstart\fP\&.
2398.sp
2399If a program of type \fBBPF_PROG_TYPE_SK_MSG\fP is run on a
2400\fImsg\fP it may want to insert metadata or options into the \fImsg\fP\&.
2401This can later be read and used by any of the lower layer BPF
2402hooks.
2403.sp
This helper may fail under memory pressure (if an allocation
fails). In that case the BPF program receives an appropriate
error and needs to handle it.
2407.TP
2408.B Return
0 on success, or a negative error in case of failure.
2410.UNINDENT
2411.TP
2412.B \fBint bpf_msg_pop_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIpop\fP\fB, u64\fP \fIflags\fP\fB)\fP
2413.INDENT 7.0
2414.TP
2415.B Description
Remove \fIpop\fP bytes from a \fImsg\fP starting at byte \fIstart\fP\&.
This may result in \fBENOMEM\fP errors under certain situations if
an allocation and copy are required due to a full ring buffer.
However, the helper will try to avoid doing the allocation
if possible. Other errors can occur if input parameters are
invalid, either because the \fIstart\fP byte is not a valid part of
the \fImsg\fP payload or because the \fIpop\fP value is too large.
2423.TP
2424.B Return
0 on success, or a negative error in case of failure.
2426.UNINDENT
2427.TP
2428.B \fBint bpf_rc_pointer_rel(void *\fP\fIctx\fP\fB, s32\fP \fIrel_x\fP\fB, s32\fP \fIrel_y\fP\fB)\fP
2429.INDENT 7.0
2430.TP
2431.B Description
2432This helper is used in programs implementing IR decoding, to
2433report a successfully decoded pointer movement.
2434.sp
2435The \fIctx\fP should point to the lirc sample as passed into
2436the program.
2437.sp
This helper is only available if the kernel was compiled with
the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to
"\fBy\fP".
2441.TP
.B Return
0
2443.UNINDENT
2444.UNINDENT
2445.SH EXAMPLES
2446.sp
Example usage for most of the eBPF helpers listed in this manual page is
available within the Linux kernel sources, at the following locations:
2449.INDENT 0.0
2450.IP \(bu 2
2451\fIsamples/bpf/\fP
2452.IP \(bu 2
2453\fItools/testing/selftests/bpf/\fP
2454.UNINDENT
2455.SH LICENSE
2456.sp
2457eBPF programs can have an associated license, passed along with the bytecode
2458instructions to the kernel when the programs are loaded. The format for that
2459string is identical to the one in use for kernel modules (Dual licenses, such
2460as "Dual BSD/GPL", may be used). Some helper functions are only accessible to
programs that are compatible with the GNU General Public License (GPL).
2462.sp
2463In order to use such helpers, the eBPF program must be loaded with the correct
2464license string passed (via \fBattr\fP) to the \fBbpf\fP() system call, and this
2465generally translates into the C source code of the program containing a line
2466similar to the following:
2467.INDENT 0.0
2468.INDENT 3.5
2469.sp
2470.nf
2471.ft C
2472char ____license[] __attribute__((section("license"), used)) = "GPL";
2473.ft P
2474.fi
2475.UNINDENT
2476.UNINDENT
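.sp
A GPL\-compatibility test of the kind described above can be sketched in
user space as a plain string comparison. The list of accepted strings
below is an assumption based on the kernel\-module convention and should be
checked against the kernel sources of interest; the function name
\fBlicense_allows_gpl_helpers\fP is hypothetical.
.sp
.nf
.ft C
```c
#include <string.h>

/* Strings assumed to count as GPL-compatible (kernel-module convention). */
static const char *const gpl_compatible[] = {
	"GPL",
	"GPL v2",
	"GPL and additional rights",
	"Dual BSD/GPL",
	"Dual MIT/GPL",
	"Dual MPL/GPL",
};

/* Return 1 if the license string exactly matches a listed entry, else 0. */
int license_allows_gpl_helpers(const char *license)
{
	size_t i;

	for (i = 0; i < sizeof(gpl_compatible) / sizeof(gpl_compatible[0]); i++)
		if (strcmp(license, gpl_compatible[i]) == 0)
			return 1;
	return 0;
}
```
.ft P
.fi
.sp
The match is exact and case\-sensitive in this sketch, so a string such as
"gpl" would not qualify.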
2477.SH IMPLEMENTATION
2478.sp
2479This manual page is an effort to document the existing eBPF helper functions.
2480But as of this writing, the BPF sub\-system is under heavy development. New eBPF
2481program or map types are added, along with new helper functions. Some helpers
2482are occasionally made available for additional program types. So in spite of
the efforts of the community, this page might not be up\-to\-date. If you want to
check for yourself what helper functions exist in your kernel, or what types of
programs they can support, here are some files in the kernel tree that you
may be interested in:
2487.INDENT 0.0
2488.IP \(bu 2
2489\fIinclude/uapi/linux/bpf.h\fP is the main BPF header. It contains the full list
2490of all helper functions, as well as many other BPF definitions including most
2491of the flags, structs or constants used by the helpers.
2492.IP \(bu 2
2493\fInet/core/filter.c\fP contains the definition of most network\-related helper
2494functions, and the list of program types from which they can be used.
2495.IP \(bu 2
2496\fIkernel/trace/bpf_trace.c\fP is the equivalent for most tracing program\-related
2497helpers.
2498.IP \(bu 2
2499\fIkernel/bpf/verifier.c\fP contains the functions used to check that valid types
2500of eBPF maps are used with a given helper function.
2501.IP \(bu 2
The \fIkernel/bpf/\fP directory contains other files in which additional helpers are
defined (for cgroups, sockmaps, etc.).
2504.UNINDENT
2505.sp
2506Compatibility between helper functions and program types can generally be found
2507in the files where helper functions are defined. Look for the \fBstruct
2508bpf_func_proto\fP objects and for functions returning them: these functions
2509contain a list of helpers that a given program type can call. Note that the
2510\fBdefault:\fP label of the \fBswitch ... case\fP used to filter helpers can call
other functions, themselves allowing access to additional helpers. The
requirement for a GPL license is also recorded in those \fBstruct bpf_func_proto\fP objects.
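.sp
The lookup pattern described above, including the \fBdefault:\fP label
that delegates to another function, can be sketched with a toy model.
Every identifier below is a hypothetical stand\-in chosen for illustration;
none of them is a kernel definition.
.sp
.nf
.ft C
```c
#include <stddef.h>

/* Hypothetical helper IDs for the toy model. */
enum toy_func_id {
	TOY_COMMON_HELPER,
	TOY_GPL_HELPER,
	TOY_NET_HELPER,
	TOY_OTHER_HELPER
};

/* Toy counterpart of a helper prototype object. */
struct toy_func_proto {
	const char *name;
	int gpl_only;   /* nonzero: helper needs a GPL-compatible license */
};

static const struct toy_func_proto toy_common_proto = { "common", 0 };
static const struct toy_func_proto toy_gpl_proto = { "gpl_only", 1 };
static const struct toy_func_proto toy_net_proto = { "net", 0 };

/* Helpers that every toy program type may call. */
const struct toy_func_proto *toy_base_func_proto(enum toy_func_id id)
{
	switch (id) {
	case TOY_COMMON_HELPER:
		return &toy_common_proto;
	case TOY_GPL_HELPER:
		return &toy_gpl_proto;
	default:
		return NULL;   /* helper not available */
	}
}

/* Helpers for one toy network program type. */
const struct toy_func_proto *toy_net_func_proto(enum toy_func_id id)
{
	switch (id) {
	case TOY_NET_HELPER:
		return &toy_net_proto;
	default:
		/* Fall back to the shared list, widening the allowed set. */
		return toy_base_func_proto(id);
	}
}
```
.ft P
.fi
.sp
In this model a \fBNULL\fP result means that the program type cannot call
the helper, and the \fBgpl_only\fP field carries the license requirement.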
2513.sp
2514Compatibility between helper functions and map types can be found in the
2515\fBcheck_map_func_compatibility\fP() function in file \fIkernel/bpf/verifier.c\fP\&.
2516.sp
2517Helper functions that invalidate the checks on \fBdata\fP and \fBdata_end\fP
2518pointers for network processing are listed in function
2519\fBbpf_helper_changes_pkt_data\fP() in file \fInet/core/filter.c\fP\&.
2520.SH SEE ALSO
2521.sp
2522\fBbpf\fP(2),
2523\fBcgroups\fP(7),
2524\fBip\fP(8),
2525\fBperf_event_open\fP(2),
2526\fBsendmsg\fP(2),
2527\fBsocket\fP(7),
2528\fBtc\-bpf\fP(8)
2529.\" Generated by docutils manpage writer.