.\" Man page generated from reStructuredText.
.
.TH BPF-HELPERS 7 2019-11-19 "Linux" "Linux Programmer's Manual"
.SH NAME
BPF-HELPERS \- list of eBPF helper functions
.
.nr rst2man-indent-level 0
.
.de1 rstReportMargin
\\$1 \\n[an-margin]
level \\n[rst2man-indent-level]
level margin: \\n[rst2man-indent\\n[rst2man-indent-level]]
-
\\n[rst2man-indent0]
\\n[rst2man-indent1]
\\n[rst2man-indent2]
..
.de1 INDENT
.\" .rstReportMargin pre:
. RS \\$1
. nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin]
. nr rst2man-indent-level +1
.\" .rstReportMargin post:
..
.de UNINDENT
. RE
.\" indent \\n[an-margin]
.\" old: \\n[rst2man-indent\\n[rst2man-indent-level]]
.nr rst2man-indent-level -1
.\" new: \\n[rst2man-indent\\n[rst2man-indent-level]]
.in \\n[rst2man-indent\\n[rst2man-indent-level]]u
..
.\" Copyright (C) All BPF authors and contributors from 2014 to present.
.\" See git log include/uapi/linux/bpf.h in kernel tree for details.
.\"
.\" %%%LICENSE_START(VERBATIM)
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.\"
.\" Since the Linux kernel and libraries are constantly changing, this
.\" manual page may be incorrect or out-of-date. The author(s) assume no
.\" responsibility for errors or omissions, or for damages resulting from
.\" the use of the information contained herein. The author(s) may not
.\" have taken the same level of care in the production of this manual,
.\" which is licensed free of charge, as they might when working
.\" professionally.
.\"
.\" Formatted or processed versions of this manual, if unaccompanied by
.\" the source, must acknowledge the copyright and authors of this work.
.\" %%%LICENSE_END
.\"
.\" Please do not edit this file. It was generated from the documentation
.\" located in file include/uapi/linux/bpf.h of the Linux kernel sources
.\" (helpers description), and from scripts/bpf_helpers_doc.py in the same
.\" repository (header and footer).
.SH DESCRIPTION
.sp
The extended Berkeley Packet Filter (eBPF) subsystem consists of programs
written in a pseudo\-assembly language, then attached to one of the several
kernel hooks and run in response to specific events. This framework differs
from the older, "classic" BPF (or "cBPF") in several aspects, one of them being
the ability to call special functions (or "helpers") from within a program.
These functions are restricted to a white\-list of helpers defined in the
kernel.
.sp
These helpers are used by eBPF programs to interact with the system, or with
the context in which they work. For instance, they can be used to print
debugging messages, to get the time since the system was booted, to interact
with eBPF maps, or to manipulate network packets. Since there are several eBPF
program types, and they do not all run in the same context, each program type
can only call a subset of those helpers.
.sp
Due to eBPF conventions, a helper cannot have more than five arguments.
.sp
Internally, eBPF programs call directly into the compiled helper functions
without requiring any foreign\-function interface. As a result, calling helpers
introduces no overhead, thus offering excellent performance.
.sp
This document is an attempt to list and document the helpers available to eBPF
developers. They are sorted in chronological order (the oldest helpers in the
kernel at the top).
.SH HELPERS
.INDENT 0.0
.TP
.B \fBvoid *bpf_map_lookup_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Perform a lookup in \fImap\fP for an entry associated with \fIkey\fP\&.
.TP
.B Return
Map value associated with \fIkey\fP, or \fBNULL\fP if no entry was
found.
.UNINDENT
.TP
.B \fBint bpf_map_update_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB, const void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Add or update the value of the entry associated with \fIkey\fP in
\fImap\fP with \fIvalue\fP\&. \fIflags\fP is one of:
.INDENT 7.0
.TP
.B \fBBPF_NOEXIST\fP
The entry for \fIkey\fP must not exist in the map.
.TP
.B \fBBPF_EXIST\fP
The entry for \fIkey\fP must already exist in the map.
.TP
.B \fBBPF_ANY\fP
No condition on the existence of the entry for \fIkey\fP\&.
.UNINDENT
.sp
Flag value \fBBPF_NOEXIST\fP cannot be used for maps of types
\fBBPF_MAP_TYPE_ARRAY\fP or \fBBPF_MAP_TYPE_PERCPU_ARRAY\fP (all
elements always exist); the helper would return an error.
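.sp
The following user\-space C sketch models these flag semantics on a
small fixed\-size table. It illustrates the rules only and is not the
kernel map implementation. The flag values repeat those of
\fI<linux/bpf.h>\fP so that the sketch is self\-contained. The error
codes (\fB\-EEXIST\fP, \fB\-ENOENT\fP, \fB\-E2BIG\fP, \fB\-EINVAL\fP) are
choices made for the model; they resemble what hash maps report, but
they may differ by map type. The names \fBtoy_map\fP and
\fBtoy_update\fP are hypothetical:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <errno.h>
#include <stdint.h>

/* Values repeated from linux/bpf.h for a self-contained sketch. */
#define BPF_ANY     0
#define BPF_NOEXIST 1
#define BPF_EXIST   2

#define TOY_MAX_ENTRIES 4

/* Hypothetical toy map: u32 keys, u64 values, linear search. */
struct toy_map {
    int used[TOY_MAX_ENTRIES];
    uint32_t keys[TOY_MAX_ENTRIES];
    uint64_t values[TOY_MAX_ENTRIES];
};

static int toy_find(const struct toy_map *m, uint32_t key)
{
    for (int i = 0; i < TOY_MAX_ENTRIES; i++)
        if (m->used[i] && m->keys[i] == key)
            return i;
    return -1;
}

/* Model of the BPF_NOEXIST / BPF_EXIST / BPF_ANY checks. */
int toy_update(struct toy_map *m, uint32_t key, uint64_t value,
               uint64_t flags)
{
    int slot = toy_find(m, key);

    if (flags > BPF_EXIST)
        return -EINVAL;          /* unknown flag value */
    if (flags == BPF_NOEXIST && slot >= 0)
        return -EEXIST;          /* entry must not exist yet */
    if (flags == BPF_EXIST && slot < 0)
        return -ENOENT;          /* entry must already exist */

    if (slot < 0) {              /* insert into the first free slot */
        for (int i = 0; i < TOY_MAX_ENTRIES; i++) {
            if (!m->used[i]) {
                slot = i;
                break;
            }
        }
        if (slot < 0)
            return -E2BIG;       /* table is full */
        m->used[slot] = 1;
        m->keys[slot] = key;
    }
    m->values[slot] = value;
    return 0;
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In an array map every index already exists, which is why, as noted
above, \fBBPF_NOEXIST\fP can never succeed there.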
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_map_delete_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Delete entry with \fIkey\fP from \fImap\fP\&.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_probe_read(void *\fP\fIdst\fP\fB, u32\fP \fIsize\fP\fB, const void *\fP\fIsrc\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
For tracing programs, safely attempt to read \fIsize\fP bytes from
address \fIsrc\fP and store the data in \fIdst\fP\&.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBu64 bpf_ktime_get_ns(void)\fP
.INDENT 7.0
.TP
.B Description
Return the time elapsed since system boot, in nanoseconds.
This does not include time during which the system was suspended
(see \fBclock_gettime\fP(2) with \fBCLOCK_MONOTONIC\fP).
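.sp
Because this clock corresponds to \fBCLOCK_MONOTONIC\fP, a user\-space
program can read the same timebase, for instance to compare its own
timestamps with those recorded by an eBPF program. The following sketch
assumes a POSIX system; the function name \fBmonotonic_ns\fP is
hypothetical:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

/* Read CLOCK_MONOTONIC in nanoseconds: the timebase that
 * bpf_ktime_get_ns() reports. Returns 0 on error. */
uint64_t monotonic_ns(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}
.ft P
.fi
.UNINDENT
.UNINDENT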
.TP
.B Return
Current \fIktime\fP\&.
.UNINDENT
.TP
.B \fBint bpf_trace_printk(const char *\fP\fIfmt\fP\fB, u32\fP \fIfmt_size\fP\fB, ...)\fP
.INDENT 7.0
.TP
.B Description
This helper is a "printk()\-like" facility for debugging. It
prints a message defined by format \fIfmt\fP (of size \fIfmt_size\fP)
to file \fI/sys/kernel/debug/tracing/trace\fP from DebugFS, if
available. It can take up to three additional \fBu64\fP
arguments (as with any eBPF helper, the total number of arguments is
limited to five).
.sp
Each time the helper is called, it appends a line to the trace.
Lines are discarded while \fI/sys/kernel/debug/tracing/trace\fP is
open; use \fI/sys/kernel/debug/tracing/trace_pipe\fP to avoid this.
The format of the trace is customizable, and the exact output
one will get depends on the options set in
\fI/sys/kernel/debug/tracing/trace_options\fP (see also the
\fIREADME\fP file under the same directory). However, it usually
defaults to something like:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
telnet\-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg>
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In the above:
.INDENT 7.0
.INDENT 3.5
.INDENT 0.0
.IP \(bu 2
\fBtelnet\fP is the name of the current task.
.IP \(bu 2
\fB470\fP is the PID of the current task.
.IP \(bu 2
\fB001\fP is the CPU number on which the task is
running.
.IP \(bu 2
In \fB\&.N..\fP, each character refers to a set of
options (whether irqs are enabled, scheduling
options, whether hard/softirqs are running, level of
preempt_disabled respectively). \fBN\fP means that
\fBTIF_NEED_RESCHED\fP and \fBPREEMPT_NEED_RESCHED\fP
are set.
.IP \(bu 2
\fB419421.045894\fP is a timestamp.
.IP \(bu 2
\fB0x00000001\fP is a fake value used by BPF for the
instruction pointer register.
.IP \(bu 2
\fB<formatted msg>\fP is the message formatted with
\fIfmt\fP\&.
.UNINDENT
.UNINDENT
.UNINDENT
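.sp
To illustrate this default layout, the following user\-space C sketch
splits one such line into its fields. It assumes the default
\fItrace_options\fP shown above and a task name that does not contain
the sequence " [". Other option settings change the layout, so this is
not a robust consumer of the trace. The structure and function names
are hypothetical:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct trace_line {
    char comm[32];       /* task name, e.g. "telnet" */
    int pid;             /* e.g. 470 */
    int cpu;             /* e.g. 1 */
    char flags[8];       /* e.g. ".N.." */
    double timestamp;    /* seconds, e.g. 419421.045894 */
    const char *msg;     /* points into the input line */
};

/* Parse "comm-pid [cpu] flags timestamp: ip: msg".
 * Returns 0 on success, -1 if the line does not match. */
int parse_trace_line(const char *line, struct trace_line *out)
{
    const char *bracket = strstr(line, " [");
    const char *dash, *start = line, *p;
    size_t len;

    if (bracket == NULL)
        return -1;
    /* The PID follows the last dash before " [", because the task
     * name itself may contain dashes. */
    for (dash = bracket; dash > line && *dash != '-'; dash--)
        ;
    if (*dash != '-')
        return -1;
    while (*start == ' ')
        start++;                 /* skip padding before the name */
    if (dash < start)
        return -1;
    len = (size_t)(dash - start);
    if (len >= sizeof(out->comm))
        return -1;
    memcpy(out->comm, start, len);
    out->comm[len] = 0;
    out->pid = atoi(dash + 1);

    if (sscanf(bracket, " [%d] %7s %lf:", &out->cpu, out->flags,
               &out->timestamp) != 3)
        return -1;
    /* Skip "timestamp: " and "ip: " to reach the message. */
    p = strstr(bracket, ": ");
    if (p == NULL)
        return -1;
    p = strstr(p + 2, ": ");
    if (p == NULL)
        return -1;
    out->msg = p + 2;
    return 0;
}
.ft P
.fi
.UNINDENT
.UNINDENT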
.sp
The conversion specifiers supported by \fIfmt\fP are similar to, but
more limited than, those of printk(). They are \fB%d\fP, \fB%i\fP,
\fB%u\fP, \fB%x\fP, \fB%ld\fP, \fB%li\fP, \fB%lu\fP, \fB%lx\fP, \fB%lld\fP,
\fB%lli\fP, \fB%llu\fP, \fB%llx\fP, \fB%p\fP, \fB%s\fP\&. No modifier (size
of field, padding with zeroes, etc.) is available, and the
helper will return \fB\-EINVAL\fP (but print nothing) if it
encounters an unknown specifier.
.sp
Also, note that \fBbpf_trace_printk\fP() is slow, and should
only be used for debugging purposes. For this reason, the first time
this helper is used (or more precisely, when \fBtrace_printk\fP()
buffers are allocated), a notice block (spanning several lines) is
printed to the kernel logs, stating that the helper should not be
used "for production use". For passing values to user space, perf
events should be preferred.
.TP
.B Return
The number of bytes written to the buffer, or a negative error
in case of failure.
.UNINDENT
.TP
.B \fBu32 bpf_get_prandom_u32(void)\fP
.INDENT 7.0
.TP
.B Description
Get a pseudo\-random number.
.sp
From a security point of view, this helper uses its own
pseudo\-random internal state, and cannot be used to infer the
seed of other random functions in the kernel. However, it is
essential to note that the generator used by the helper is not
cryptographically secure.
.TP
.B Return
A random 32\-bit unsigned value.
.UNINDENT
.TP
.B \fBu32 bpf_get_smp_processor_id(void)\fP
.INDENT 7.0
.TP
.B Description
Get the SMP (symmetric multiprocessing) processor id. Note that
all programs run with preemption disabled, which means that the
SMP processor id is stable throughout the execution of the
program.
.TP
.B Return
The SMP id of the processor running the program.
.UNINDENT
.TP
.B \fBint bpf_skb_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Store \fIlen\fP bytes from address \fIfrom\fP into the packet
associated with \fIskb\fP, at \fIoffset\fP\&. \fIflags\fP are a combination of
\fBBPF_F_RECOMPUTE_CSUM\fP (automatically recompute the
checksum for the packet after storing the bytes) and
\fBBPF_F_INVALIDATE_HASH\fP (set \fIskb\fP\fB\->hash\fP, \fIskb\fP\fB\->swhash\fP and \fIskb\fP\fB\->l4hash\fP to 0).
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_l3_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIsize\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Recompute the layer 3 (e.g. IP) checksum for the packet
associated with \fIskb\fP\&. Computation is incremental, so the helper
must know the former value of the header field that was
modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the
number of bytes (2 or 4) for this field, stored in \fIsize\fP\&.
Alternatively, it is possible to store the difference between
the previous and the new values of the header field in \fIto\fP, by
setting \fIfrom\fP and \fIsize\fP to 0. For both methods, \fIoffset\fP
indicates the location of the IP checksum within the packet.
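.sp
An incremental update of this kind can be expressed with the
one\(aqs\-complement arithmetic of RFC 1624 (equation 3). The
following user\-space C sketch applies that equation to a 2\-byte field
and checks it against a full recomputation over a sample IPv4 header.
It illustrates the arithmetic only and is not the kernel code behind
this helper. Values are handled in host byte order for simplicity, and
all function names are hypothetical:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement sum into 16 bits. */
static uint16_t csum_fold32(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full Internet checksum over an array of 16-bit words. */
uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~csum_fold32(sum);
}

/* Incremental update when one 16-bit field changes from old_val to
 * new_val: HC' = ~(~HC + ~m + m') (RFC 1624, equation 3). */
uint16_t csum_replace2(uint16_t check, uint16_t old_val, uint16_t new_val)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_val;
    sum += new_val;
    return (uint16_t)~csum_fold32(sum);
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For example, decrementing the TTL in the word 0x4011 of the header
4500 0073 0000 4000 4011 b861 c0a8 0001 c0a8 00c7 changes the
checksum from 0xb861 to 0xb961, whichever of the two functions is used.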
.sp
This helper works in combination with \fBbpf_csum_diff\fP(),
which does not update the checksum in\-place, but offers more
flexibility and can handle sizes larger than 2 or 4 for the
checksum to update.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_l4_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum for the
packet associated with \fIskb\fP\&. Computation is incremental, so the
helper must know the former value of the header field that was
modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the
number of bytes (2 or 4) for this field, stored in the lowest
four bits of \fIflags\fP\&. Alternatively, it is possible to store
the difference between the previous and the new values of the
header field in \fIto\fP, by setting \fIfrom\fP and the four lowest
bits of \fIflags\fP to 0. For both methods, \fIoffset\fP indicates the
location of the layer 4 checksum within the packet. In addition to
the size of the field, actual flags can be added (bitwise OR) to
\fIflags\fP\&. With \fBBPF_F_MARK_MANGLED_0\fP, a null checksum is left
untouched (unless \fBBPF_F_MARK_ENFORCE\fP is added as well), and
for updates resulting in a null checksum the value is set to
\fBCSUM_MANGLED_0\fP instead. Flag \fBBPF_F_PSEUDO_HDR\fP indicates
that the checksum is to be computed against a pseudo\-header.
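.sp
The following C sketch shows how the \fIflags\fP argument packs the
field size together with the actual flags, and models the
\fBBPF_F_MARK_MANGLED_0\fP rule described above. The constants repeat
the values defined in \fI<linux/bpf.h>\fP at the time of writing so
that the sketch is self\-contained (real programs should include the
header instead). The function names are hypothetical, and the model
ignores everything except the null\-checksum rule:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stdint.h>

/* Repeated from linux/bpf.h so that the sketch is self-contained. */
#define BPF_F_HDR_FIELD_MASK 0xfULL
#define BPF_F_PSEUDO_HDR     (1ULL << 4)
#define BPF_F_MARK_MANGLED_0 (1ULL << 5)
#define BPF_F_MARK_ENFORCE   (1ULL << 6)
#define CSUM_MANGLED_0       0xffff

/* Size of the modified field, taken from the lowest four bits. */
unsigned int l4_field_size(uint64_t flags)
{
    return (unsigned int)(flags & BPF_F_HDR_FIELD_MASK);
}

/* Model of the mangled-0 rule: returns the checksum value that would
 * be stored, or -1 if an existing null checksum is left untouched. */
int32_t l4_mangled_result(uint64_t flags, uint16_t old_check,
                          uint16_t new_check)
{
    int mangled = (flags & BPF_F_MARK_MANGLED_0) != 0;
    int enforce = (flags & BPF_F_MARK_ENFORCE) != 0;

    if (mangled && !enforce && old_check == 0)
        return -1;               /* null checksum left untouched */
    if (mangled && new_check == 0)
        return CSUM_MANGLED_0;   /* never store a null checksum */
    return new_check;
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For a 2\-byte field in a UDP header whose checksum is covered by a
pseudo\-header, the \fIflags\fP value would therefore be
\fB2 | BPF_F_PSEUDO_HDR | BPF_F_MARK_MANGLED_0\fP\&.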
.sp
This helper works in combination with \fBbpf_csum_diff\fP(),
which does not update the checksum in\-place, but offers more
flexibility and can handle sizes larger than 2 or 4 for the
checksum to update.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_tail_call(void *\fP\fIctx\fP\fB, struct bpf_map *\fP\fIprog_array_map\fP\fB, u32\fP \fIindex\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
This special helper is used to trigger a "tail call", or in
other words, to jump into another eBPF program. The same stack
frame is used (but values on stack and in registers for the
caller are not accessible to the callee). This mechanism allows
for program chaining, either for raising the maximum number of
available eBPF instructions, or to execute given programs in
conditional blocks. For security reasons, there is an upper
limit to the number of successive tail calls that can be
performed.
.sp
Upon call of this helper, the program attempts to jump into a
program referenced at index \fIindex\fP in \fIprog_array_map\fP, a
special map of type \fBBPF_MAP_TYPE_PROG_ARRAY\fP, and passes
\fIctx\fP, a pointer to the context.
.sp
If the call succeeds, the kernel immediately runs the first
instruction of the new program. This is not a function call,
and it never returns to the previous program. If the call
fails, then the helper has no effect, and the caller continues
to run its subsequent instructions. A call can fail if the
destination program for the jump does not exist (i.e. \fIindex\fP
is greater than or equal to the number of entries in
\fIprog_array_map\fP, or the entry at \fIindex\fP is empty), or
if the maximum number of tail calls has been reached for this
chain of programs. This limit is defined in the kernel by the
macro \fBMAX_TAIL_CALL_CNT\fP (not accessible to user space),
which is currently set to 32.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_clone_redirect(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Clone and redirect the packet associated with \fIskb\fP to another
net device of index \fIifindex\fP\&. Both ingress and egress
interfaces can be used for redirection. The \fBBPF_F_INGRESS\fP
value in \fIflags\fP is used to make the distinction (ingress path
is selected if the flag is present, egress path otherwise).
This is the only flag supported for now.
.sp
In comparison with the \fBbpf_redirect\fP() helper,
\fBbpf_clone_redirect\fP() has the associated cost of
duplicating the packet buffer, but the redirection is performed
immediately, while the eBPF program is still running. Conversely,
\fBbpf_redirect\fP() is more efficient, but it is handled through an
action code where the redirection happens only after the eBPF
program has returned.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBu64 bpf_get_current_pid_tgid(void)\fP
.INDENT 7.0
.TP
.B Return
A 64\-bit integer containing the current tgid and pid, built
as follows:
\fIcurrent_task\fP\fB\->tgid << 32 |\fP
\fIcurrent_task\fP\fB\->pid\fP\&.
.UNINDENT
.TP
.B \fBu64 bpf_get_current_uid_gid(void)\fP
.INDENT 7.0
.TP
.B Return
A 64\-bit integer containing the current GID and UID, built
as follows: \fIcurrent_gid\fP \fB<< 32 |\fP \fIcurrent_uid\fP\&.
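.sp
Both this helper and \fBbpf_get_current_pid_tgid\fP() pack two 32\-bit
values into one \fBu64\fP\&. The following C sketch shows how a program,
or user space receiving the raw value, can split them again. The
function names are hypothetical. Note that, in kernel terms, the tgid
is what user space calls the process ID, while the pid is the thread
ID:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stdint.h>

/* Upper 32 bits of bpf_get_current_pid_tgid(): tgid (process ID). */
uint32_t tgid_from_pid_tgid(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Lower 32 bits of bpf_get_current_pid_tgid(): pid (thread ID). */
uint32_t pid_from_pid_tgid(uint64_t v)
{
    return (uint32_t)v;
}

/* Upper 32 bits of bpf_get_current_uid_gid(): GID. */
uint32_t gid_from_uid_gid(uint64_t v)
{
    return (uint32_t)(v >> 32);
}

/* Lower 32 bits of bpf_get_current_uid_gid(): UID. */
uint32_t uid_from_uid_gid(uint64_t v)
{
    return (uint32_t)(v & 0xffffffffULL);
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
A tracing program that should only report events for one process
would typically compare the upper half of
\fBbpf_get_current_pid_tgid\fP() with the target process ID.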
.UNINDENT
.TP
.B \fBint bpf_get_current_comm(char *\fP\fIbuf\fP\fB, u32\fP \fIsize_of_buf\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Copy the \fBcomm\fP attribute of the current task into \fIbuf\fP, a
buffer of \fIsize_of_buf\fP bytes. The \fBcomm\fP attribute contains the name of
the executable (excluding the path) for the current task. The
\fIsize_of_buf\fP must be strictly positive. On success, the
helper makes sure that the \fIbuf\fP is NUL\-terminated. On failure,
it is filled with zeroes.
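.sp
The following user\-space C sketch models the success case: the name is
truncated if necessary so that it always fits, NUL terminator included.
It is an illustration, not the kernel code. The function name is
hypothetical, and it only assumes that the source name is
NUL\-terminated within the 16 bytes of the \fBcomm\fP field
(\fBTASK_COMM_LEN\fP on current kernels):
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Copy at most size - 1 characters of comm into buf and always
 * NUL-terminate. The helper requires a strictly positive size. */
int toy_get_comm(const char *comm, char *buf, uint32_t size)
{
    if (size == 0)
        return -EINVAL;
    strncpy(buf, comm, size);    /* pads with NULs if comm is short */
    buf[size - 1] = 0;           /* guarantee termination */
    return 0;
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
A 16\-byte buffer is therefore enough to receive any task name without
truncation.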
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBu32 bpf_get_cgroup_classid(struct sk_buff *\fP\fIskb\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Retrieve the classid for the current task, i.e. for the net_cls
cgroup to which \fIskb\fP belongs.
.sp
This helper can be used on the TC egress path, but not on ingress.
.sp
The net_cls cgroup provides an interface to tag network packets
based on a user\-provided identifier for all traffic coming from
the tasks belonging to the related cgroup. See also the related
kernel documentation, available from the Linux sources in file
\fIDocumentation/admin\-guide/cgroup\-v1/net_cls.rst\fP\&.
.sp
The Linux kernel has two versions of cgroups: there are
cgroups v1 and cgroups v2. Both are available to users, who can
use a mixture of them, but note that the net_cls cgroup is for
cgroup v1 only. This makes it incompatible with BPF programs
run on cgroups, since attaching BPF programs to cgroups is a
cgroup\-v2\-only feature (a socket can only hold data for one
version of cgroups at a time).
.sp
This helper is only available if the kernel was compiled with
the \fBCONFIG_CGROUP_NET_CLASSID\fP configuration option set to
"\fBy\fP" or to "\fBm\fP".
.TP
.B Return
The classid, or 0 for the default unconfigured classid.
.UNINDENT
.TP
.B \fBint bpf_skb_vlan_push(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIvlan_proto\fP\fB, u16\fP \fIvlan_tci\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Push a \fIvlan_tci\fP (VLAN tag control information) of protocol
\fIvlan_proto\fP to the packet associated with \fIskb\fP, then update
the checksum. Note that if \fIvlan_proto\fP is different from
\fBETH_P_8021Q\fP and \fBETH_P_8021AD\fP, it is considered to
be \fBETH_P_8021Q\fP\&.
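.sp
The 16\-bit \fIvlan_tci\fP follows the IEEE 802.1Q layout: a 3\-bit
priority code point (PCP), a 1\-bit drop eligible indicator (DEI), and a
12\-bit VLAN identifier (VID). Note also that \fIvlan_proto\fP is a
\fB__be16\fP, so it is passed in network byte order, for instance
through the \fBbpf_htons\fP() macro provided by libbpf. The following
C sketch builds and decodes a host\-order TCI; the function names are
hypothetical:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stdint.h>

/* Build an 802.1Q TCI: PCP (3 bits) | DEI (1 bit) | VID (12 bits). */
uint16_t make_vlan_tci(unsigned int pcp, unsigned int dei, unsigned int vid)
{
    return (uint16_t)(((pcp & 0x7) << 13) | ((dei & 0x1) << 12) |
                      (vid & 0xfff));
}

unsigned int vlan_tci_pcp(uint16_t tci) { return (tci >> 13) & 0x7; }
unsigned int vlan_tci_dei(uint16_t tci) { return (tci >> 12) & 0x1; }
unsigned int vlan_tci_vid(uint16_t tci) { return tci & 0xfff; }
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For example, VLAN 100 with priority 5 gives the TCI 0xa064, which an
eBPF program could then push with
\fBbpf_skb_vlan_push(skb, bpf_htons(ETH_P_8021Q), 0xa064)\fP\&.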
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_skb_vlan_pop(struct sk_buff *\fP\fIskb\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Pop a VLAN header from the packet associated with \fIskb\fP\&.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_skb_get_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Get tunnel metadata. This helper takes a pointer \fIkey\fP to an
empty \fBstruct bpf_tunnel_key\fP of \fIsize\fP, that will be
filled with tunnel metadata for the packet associated with \fIskb\fP\&.
The \fIflags\fP can be set to \fBBPF_F_TUNINFO_IPV6\fP, which
indicates that the tunnel is based on IPv6 protocol instead of
IPv4.
.sp
The \fBstruct bpf_tunnel_key\fP is an object that generalizes the
principal parameters used by various tunneling protocols into a
single struct. This way, it can be used to easily make a
decision based on the contents of the encapsulation header,
"summarized" in this struct. In particular, it holds the IP
address of the remote end (IPv4 or IPv6, depending on the case)
in \fIkey\fP\fB\->remote_ipv4\fP or \fIkey\fP\fB\->remote_ipv6\fP\&. Also,
this struct exposes the \fIkey\fP\fB\->tunnel_id\fP, which is
generally mapped to a VNI (Virtual Network Identifier), making
it programmable together with the \fBbpf_skb_set_tunnel_key\fP() helper.
.sp
Let\(aqs imagine that the following code is part of a program
attached to the TC ingress interface, on one end of a GRE
tunnel, and is supposed to filter out all messages coming from
remote ends with IPv4 address other than 10.0.0.1:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
int ret;
struct bpf_tunnel_key key = {};

ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0);
if (ret < 0)
        return TC_ACT_SHOT;     // drop packet

if (key.remote_ipv4 != 0x0a000001)      // 10.0.0.1, host byte order
        return TC_ACT_SHOT;     // drop packet

return TC_ACT_OK;               // accept packet
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This interface can also be used with all encapsulation devices
that can operate in "collect metadata" mode: instead of having
one network device per specific configuration, the "collect
metadata" mode only requires a single device where the
configuration can be extracted from this helper.
.sp
This can be used together with various tunnels such as VXLAN,
Geneve, GRE or IP in IP (IPIP).
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_skb_set_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Populate tunnel metadata for the packet associated with \fIskb\fP\&. The
tunnel metadata is set to the contents of \fIkey\fP, of \fIsize\fP\&. The
\fIflags\fP can be set to a combination of the following values:
.INDENT 7.0
.TP
.B \fBBPF_F_TUNINFO_IPV6\fP
Indicate that the tunnel is based on IPv6 protocol
instead of IPv4.
.TP
.B \fBBPF_F_ZERO_CSUM_TX\fP
For IPv4 packets, add a flag to tunnel metadata
indicating that checksum computation should be skipped
and checksum set to zeroes.
.TP
.B \fBBPF_F_DONT_FRAGMENT\fP
Add a flag to tunnel metadata indicating that the
packet should not be fragmented.
.TP
.B \fBBPF_F_SEQ_NUMBER\fP
Add a flag to tunnel metadata indicating that a
sequence number should be added to tunnel header before
sending the packet. This flag was added for GRE
encapsulation, but might be used with other protocols
as well in the future.
.UNINDENT
.sp
Here is a typical usage on the transmit path:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
struct bpf_tunnel_key key = {};

/* populate key ... */
bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0);
bpf_clone_redirect(skb, vxlan_dev_ifindex, 0);
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
See also the description of the \fBbpf_skb_get_tunnel_key\fP()
helper for additional information.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBu64 bpf_perf_event_read(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Read the value of a perf event counter. This helper relies on a
\fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of
the perf event counter is selected when \fImap\fP is updated with
perf event file descriptors. The \fImap\fP is an array whose size
is the number of available CPUs, and each cell contains a value
relative to one CPU. The value to retrieve is indicated by
\fIflags\fP, that contains the index of the CPU to look up, masked
with \fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to
\fBBPF_F_CURRENT_CPU\fP to indicate that the value for the
current CPU should be retrieved.
.sp
Note that before Linux 4.13, only hardware perf events could be
retrieved.
.sp
Also, be aware that the newer helper
\fBbpf_perf_event_read_value\fP() is recommended over
\fBbpf_perf_event_read\fP() in general. The latter has some ABI
quirks where error and counter value are used as a return code
(which is wrong to do since ranges may overlap). This issue is
fixed with \fBbpf_perf_event_read_value\fP(), which at the same
time provides more features over the \fBbpf_perf_event_read\fP()
interface. Please refer to the description of
\fBbpf_perf_event_read_value\fP() for details.
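.sp
The overlap can be seen with plain integer arithmetic: because this
helper returns a \fBu64\fP, an error such as \fB\-ENOENT\fP reaches the
program as 0xfffffffffffffffe, which is also a valid (if unlikely)
counter value. The following C sketch makes this concrete. The
range check in \fBlooks_like_error\fP is a hypothetical caller\-side
heuristic, not part of any API:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <errno.h>
#include <stdint.h>

/* How a negative error code looks once stored in a u64 return value. */
uint64_t error_as_u64(int err)
{
    return (uint64_t)(int64_t)-err;
}

/* Hypothetical heuristic: treat the last 4095 values of the u64 range
 * as errors. A counter holding one of these values is misclassified,
 * which is the ambiguity that bpf_perf_event_read_value() removes. */
int looks_like_error(uint64_t ret)
{
    return (int64_t)ret < 0 && (int64_t)ret >= -4095;
}
.ft P
.fi
.UNINDENT
.UNINDENT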
.TP
.B Return
The value of the perf event counter read from the map, or a
negative error code in case of failure.
.UNINDENT
.TP
.B \fBint bpf_redirect(u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Redirect the packet to another net device of index \fIifindex\fP\&.
This helper is somewhat similar to \fBbpf_clone_redirect\fP(),
except that the packet is not cloned, which provides
increased performance.
.sp
Except for XDP, both ingress and egress interfaces can be used
for redirection. The \fBBPF_F_INGRESS\fP value in \fIflags\fP is used
to make the distinction (ingress path is selected if the flag
is present, egress path otherwise). Currently, XDP only
supports redirection to the egress interface, and accepts no
flag at all.
.sp
The same effect can be attained with the more generic
\fBbpf_redirect_map\fP(), which requires specific maps to be
used but offers better performance.
.TP
.B Return
For XDP, the helper returns \fBXDP_REDIRECT\fP on success or
\fBXDP_ABORTED\fP on error. For other program types, the values
are \fBTC_ACT_REDIRECT\fP on success or \fBTC_ACT_SHOT\fP on
error.
.UNINDENT
.TP
.B \fBu32 bpf_get_route_realm(struct sk_buff *\fP\fIskb\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Retrieve the realm of the route, that is to say the
\fBtclassid\fP field of the destination for the \fIskb\fP\&. The
identifier retrieved is a user\-provided tag, similar to the
one used with the net_cls cgroup (see description for
\fBbpf_get_cgroup_classid\fP() helper), but here this tag is
held by a route (a destination entry), not by a task.
.sp
Retrieving this identifier works with the clsact TC egress hook
(see also \fBtc\-bpf\fP(8)), or alternatively on conventional
classful egress qdiscs, but not on the TC ingress path. In the case
of the clsact TC egress hook, this has the advantage that, internally,
the destination entry has not been dropped yet in the transmit
path. Therefore, the destination entry does not need to be
artificially held via \fBnetif_keep_dst\fP() for a classful
qdisc until the \fIskb\fP is freed.
.sp
This helper is available only if the kernel was compiled with
the \fBCONFIG_IP_ROUTE_CLASSID\fP configuration option.
.TP
.B Return
The realm of the route for the packet associated with \fIskb\fP, or 0
if none was found.
.UNINDENT
.TP
.B \fBint bpf_perf_event_output(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, void *\fP\fIdata\fP\fB, u64\fP \fIsize\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Write raw \fIdata\fP blob into a special BPF perf event held by
\fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. This perf
event must have the following attributes: \fBPERF_SAMPLE_RAW\fP
as \fBsample_type\fP, \fBPERF_TYPE_SOFTWARE\fP as \fBtype\fP, and
\fBPERF_COUNT_SW_BPF_OUTPUT\fP as \fBconfig\fP\&.
.sp
The \fIflags\fP are used to indicate the index in \fImap\fP for which
the value must be put, masked with \fBBPF_F_INDEX_MASK\fP\&.
Alternatively, \fIflags\fP can be set to \fBBPF_F_CURRENT_CPU\fP
to indicate that the index of the current CPU core should be
used.
.sp
The value to write, of \fIsize\fP, is passed through the eBPF stack and
pointed to by \fIdata\fP\&.
.sp
The context of the program, \fIctx\fP, also needs to be passed to the
helper.
.sp
In user space, a program willing to read the values needs to
call \fBperf_event_open\fP() on the perf event (either for
one or for all CPUs) and to store the file descriptor into the
\fImap\fP\&. This must be done before the eBPF program can send data
into it. An example is available in file
\fIsamples/bpf/trace_output_user.c\fP in the Linux kernel source
tree (the eBPF program counterpart is in
\fIsamples/bpf/trace_output_kern.c\fP).
.sp
\fBbpf_perf_event_output\fP() achieves better performance
than \fBbpf_trace_printk\fP() for sharing data with user
space, and is much better suited for streaming data from eBPF
programs.
.sp
Note that this helper is not restricted to tracing use cases
and can be used with programs attached to TC or XDP as well,
where it allows for passing data to user space listeners. Data
can be:
.INDENT 7.0
.IP \(bu 2
Only custom structs,
.IP \(bu 2
Only the packet payload, or
.IP \(bu 2
A combination of both.
.UNINDENT
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_skb_load_bytes(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
This helper was provided as an easy way to load data from a
packet. It can be used to load \fIlen\fP bytes from \fIoffset\fP from
the packet associated with \fIskb\fP, into the buffer pointed to by
\fIto\fP\&.
.sp
Since Linux 4.7, usage of this helper has mostly been replaced
by "direct packet access", enabling packet data to be
manipulated with \fIskb\fP\fB\->data\fP and \fIskb\fP\fB\->data_end\fP
pointing respectively to the first byte of packet data and to
the byte after the last byte of packet data. However, it
remains useful if one wishes to read large quantities of data
at once from a packet into the eBPF stack.
783.TP
784.B Return
7850 on success, or a negative error in case of failure.
786.UNINDENT
787.TP
.B \fBint bpf_get_stackid(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Walk a user or a kernel stack and return its id. To achieve
this, the helper needs \fIctx\fP, which is a pointer to the context
on which the tracing program is executed, and a pointer to a
\fImap\fP of type \fBBPF_MAP_TYPE_STACK_TRACE\fP\&.
.sp
The last argument, \fIflags\fP, holds the number of stack frames to
skip (from 0 to 255), masked with
\fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set
a combination of the following flags:
.INDENT 7.0
.TP
.B \fBBPF_F_USER_STACK\fP
Collect a user space stack instead of a kernel stack.
.TP
.B \fBBPF_F_FAST_STACK_CMP\fP
Compare stacks by hash only.
.TP
.B \fBBPF_F_REUSE_STACKID\fP
If two different stacks hash into the same \fIstackid\fP,
discard the old one.
.UNINDENT
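.sp
As an illustration of this layout, the userspace sketch below
composes a \fIflags\fP value from a skip count and the flags above.
The constant values are copied from \fIlinux/bpf.h\fP so that the
sketch compiles on its own, and the function name
\fIstackid_flags\fP is hypothetical.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <assert.h>
#include <stdint.h>

/* Values copied from linux/bpf.h so that this sketch builds on its own. */
#define BPF_F_SKIP_FIELD_MASK  0xffULL
#define BPF_F_USER_STACK       (1ULL << 8)
#define BPF_F_FAST_STACK_CMP   (1ULL << 9)
#define BPF_F_REUSE_STACKID    (1ULL << 10)

/* Hypothetical helper: skip count in the low 8 bits, flags above them. */
static uint64_t stackid_flags(unsigned int skip, uint64_t extra)
{
    assert(skip <= 255);
    /* extra must only hold flag bits, never skip count bits */
    assert((extra & BPF_F_SKIP_FIELD_MASK) == 0);
    return ((uint64_t)skip & BPF_F_SKIP_FIELD_MASK) | extra;
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For example, \fIstackid_flags\fP(2, \fBBPF_F_USER_STACK\fP) yields
0x102: a user space stack is collected and its first two frames
are skipped.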
.sp
The stack id retrieved is a 32\-bit integer handle which
can be further combined with other data (including other stack
ids) and used as a key into maps. This can be useful for
generating a variety of graphs (such as flame graphs or off\-cpu
graphs).
.sp
For walking a stack, this helper is an improvement over
\fBbpf_probe_read\fP(), which can be used with unrolled loops
but is not efficient and consumes a lot of eBPF instructions.
Instead, \fBbpf_get_stackid\fP() can collect up to
\fBPERF_MAX_STACK_DEPTH\fP kernel and user frames. Note that
this limit can be controlled with the \fBsysctl\fP program, and
that it should be manually increased in order to profile long
user stacks (such as stacks for Java programs). To do so, use:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
# sysctl kernel.perf_event_max_stack=<new value>
.ft P
.fi
.UNINDENT
.UNINDENT
.TP
.B Return
The non\-negative stack id on success, or a negative error
in case of failure.
.UNINDENT
.TP
.B \fBs64 bpf_csum_diff(__be32 *\fP\fIfrom\fP\fB, u32\fP \fIfrom_size\fP\fB, __be32 *\fP\fIto\fP\fB, u32\fP \fIto_size\fP\fB, __wsum\fP \fIseed\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Compute a checksum difference, from the raw buffer pointed to by
\fIfrom\fP, of length \fIfrom_size\fP (which must be a multiple of 4),
towards the raw buffer pointed to by \fIto\fP, of size \fIto_size\fP
(same remark). An optional \fIseed\fP can be added to the value
(this can be cascaded: the seed may come from a previous call
to the helper).
.sp
This is flexible enough to be used in several ways:
.INDENT 7.0
.IP \(bu 2
With \fIfrom_size\fP == 0, \fIto_size\fP > 0 and \fIseed\fP set to
checksum, it can be used when pushing new data.
.IP \(bu 2
With \fIfrom_size\fP > 0, \fIto_size\fP == 0 and \fIseed\fP set to
checksum, it can be used when removing data from a packet.
.IP \(bu 2
With \fIfrom_size\fP > 0, \fIto_size\fP > 0 and \fIseed\fP set to 0, it
can be used to compute a diff. Note that \fIfrom_size\fP and
\fIto_size\fP do not need to be equal.
.UNINDENT
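.sp
The arithmetic behind these three cases can be modelled in user
space with ones' complement sums. The sketch below is not the
kernel implementation: it works on host\-order 32\-bit words, counts
sizes in words rather than bytes, and all function names are
hypothetical. It assumes the helper adds the complement of each
\fIfrom\fP word and each \fIto\fP word unchanged on top of \fIseed\fP,
and it folds the result to 16 bits only at the end.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stddef.h>
#include <stdint.h>

/* Fold a wide accumulator into a 16 bit ones' complement sum. */
static uint16_t fold16(uint64_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Add 32 bit words (optionally inverted) as pairs of 16 bit halves. */
static uint64_t add_words(uint64_t sum, const uint32_t *w, size_t n,
                          int invert)
{
    for (size_t i = 0; i < n; i++) {
        uint32_t v = invert ? ~w[i] : w[i];
        sum += (v >> 16) + (v & 0xffff);
    }
    return sum;
}

/* Model of the diff: inverted "from" words plus "to" words, over seed. */
static uint64_t csum_diff_model(const uint32_t *from, size_t from_words,
                                const uint32_t *to, size_t to_words,
                                uint64_t seed)
{
    seed = add_words(seed, from, from_words, 1);
    return add_words(seed, to, to_words, 0);
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Seeding the diff with the sum of the original data and folding the
result gives the same 16\-bit sum as recomputing it over the
modified data, which is the property the three uses listed above
rely on.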
.sp
This helper can be used in combination with
\fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(), to
which one can feed the difference computed with
\fBbpf_csum_diff\fP().
.TP
.B Return
The checksum result, or a negative error code in case of
failure.
.UNINDENT
.TP
.B \fBint bpf_skb_get_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Retrieve tunnel options metadata for the packet associated with
\fIskb\fP, and store the raw tunnel option data in the buffer \fIopt\fP
of \fIsize\fP bytes.
.sp
This helper can be used with encapsulation devices that can
operate in "collect metadata" mode (please refer to the related
note in the description of \fBbpf_skb_get_tunnel_key\fP() for
more details). A particular example where this can be used is
in combination with the Geneve encapsulation protocol, where it
allows for pushing (with the \fBbpf_skb_set_tunnel_opt\fP() helper)
and retrieving (with this helper) arbitrary TLVs
(Type\-Length\-Value headers) from the eBPF program. This allows
for full customization of these headers.
.TP
.B Return
The size of the option data retrieved.
.UNINDENT
.TP
.B \fBint bpf_skb_set_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Set tunnel options metadata for the packet associated with \fIskb\fP
to the option data contained in the raw buffer \fIopt\fP of
\fIsize\fP bytes.
.sp
See also the description of the \fBbpf_skb_get_tunnel_opt\fP()
helper for additional information.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_skb_change_proto(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIproto\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Change the protocol of the \fIskb\fP to \fIproto\fP\&. Currently,
the supported transitions are from IPv4 to IPv6 and from IPv6 to
IPv4. The helper takes care of the groundwork for the
transition, including resizing the socket buffer. The eBPF
program is expected to fill the new headers, if any, via
\fBbpf_skb_store_bytes\fP() and to recompute the checksums with
\fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(). The
main use case for this helper is to perform NAT64 operations
out of an eBPF program.
.sp
Internally, the GSO type is marked as dodgy so that headers are
checked and segments are recalculated by the GSO/GRO engine.
The GSO target size is adapted as well.
.sp
All values for \fIflags\fP are reserved for future usage, and must
be left at zero.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_skb_change_type(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Change the packet type for the packet associated with \fIskb\fP\&. This
comes down to setting \fIskb\fP\fB\->pkt_type\fP to \fItype\fP, except
the eBPF program does not have write access to
\fIskb\fP\fB\->pkt_type\fP other than through this helper. Using a
helper here allows for graceful handling of errors.
.sp
The major use case is to change incoming \fIskb\fPs to
\fBPACKET_HOST\fP in a programmatic way instead of having to
recirculate via \fBbpf_redirect\fP(..., \fBBPF_F_INGRESS\fP), for
example.
.sp
Note that \fItype\fP only allows certain values. At this time, they
are:
.INDENT 7.0
.TP
.B \fBPACKET_HOST\fP
Packet is for us.
.TP
.B \fBPACKET_BROADCAST\fP
Send packet to all.
.TP
.B \fBPACKET_MULTICAST\fP
Send packet to group.
.TP
.B \fBPACKET_OTHERHOST\fP
Send packet to someone else.
.UNINDENT
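.sp
These four constants are the first four \fBPACKET_\fP values of
\fIlinux/if_packet.h\fP (0 to 3), so the restriction amounts to a
range check. The userspace sketch below copies those values and
uses a hypothetical function name; it only mirrors the rule stated
above and is not the kernel code.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stdbool.h>
#include <stdint.h>

/* Values copied from linux/if_packet.h. */
#define PACKET_HOST       0
#define PACKET_BROADCAST  1
#define PACKET_MULTICAST  2
#define PACKET_OTHERHOST  3

/* Hypothetical check mirroring the list of accepted types above. */
static bool pkt_type_accepted(uint32_t type)
{
    return type <= PACKET_OTHERHOST;
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For instance, \fBPACKET_OUTGOING\fP (4 in \fIlinux/if_packet.h\fP) is
rejected by this check.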
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_skb_under_cgroup(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Check whether \fIskb\fP is a descendant of the cgroup2 held by
\fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&.
.TP
.B Return
The return value depends on the result of the test, and can be:
.INDENT 7.0
.IP \(bu 2
0, if the \fIskb\fP failed the cgroup2 descendant test.
.IP \(bu 2
1, if the \fIskb\fP succeeded the cgroup2 descendant test.
.IP \(bu 2
A negative error code, if an error occurred.
.UNINDENT
.UNINDENT
.TP
.B \fBu32 bpf_get_hash_recalc(struct sk_buff *\fP\fIskb\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Retrieve the hash of the packet, \fIskb\fP\fB\->hash\fP\&. If it is
not set, in particular if the hash was cleared due to mangling,
recompute this hash. Later accesses to the hash can be done
directly with \fIskb\fP\fB\->hash\fP\&.
.sp
Calling \fBbpf_set_hash_invalid\fP(), changing a packet
protocol with \fBbpf_skb_change_proto\fP(), or calling
\fBbpf_skb_store_bytes\fP() with the
\fBBPF_F_INVALIDATE_HASH\fP flag are actions that may clear
the hash and trigger a new computation for the next call to
\fBbpf_get_hash_recalc\fP().
.TP
.B Return
The 32\-bit hash.
.UNINDENT
.TP
.B \fBu64 bpf_get_current_task(void)\fP
.INDENT 7.0
.TP
.B Return
A pointer to the current task struct.
.UNINDENT
.TP
.B \fBint bpf_probe_write_user(void *\fP\fIdst\fP\fB, const void *\fP\fIsrc\fP\fB, u32\fP \fIlen\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Attempt in a safe way to write \fIlen\fP bytes from the buffer
\fIsrc\fP to \fIdst\fP in memory. It only works for threads that are in
user context, and \fIdst\fP must be a valid user space address.
.sp
This helper should not be used to implement any kind of
security mechanism because of TOC\-TOU attacks, but rather to
debug, divert, and manipulate execution of semi\-cooperative
processes.
.sp
Keep in mind that this feature is meant for experiments, and it
carries a risk of crashing the system and running programs.
Therefore, when an eBPF program using this helper is attached,
a warning including PID and process name is printed to the kernel
logs.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_current_task_under_cgroup(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Check whether the probe is being run in the context of a given
subset of the cgroup2 hierarchy. The cgroup2 to test is held by
\fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&.
.TP
.B Return
The return value depends on the result of the test, and can be:
.INDENT 7.0
.IP \(bu 2
1, if the current task belongs to the cgroup2.
.IP \(bu 2
0, if the current task does not belong to the cgroup2.
.IP \(bu 2
A negative error code, if an error occurred.
.UNINDENT
.UNINDENT
.TP
.B \fBint bpf_skb_change_tail(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Resize (trim or grow) the packet associated with \fIskb\fP to the
new \fIlen\fP\&. The \fIflags\fP are reserved for future usage, and must
be left at zero.
.sp
The basic idea is that the helper performs the needed work to
change the size of the packet, then the eBPF program rewrites
the rest via helpers like \fBbpf_skb_store_bytes\fP(),
\fBbpf_l3_csum_replace\fP(), \fBbpf_l4_csum_replace\fP()
and others. This helper is a slow path utility intended for
replies with control messages. Because it targets the
slow path, the helper itself can afford to be slow: it
implicitly linearizes, unclones and drops offloads from the
\fIskb\fP\&.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_skb_pull_data(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Pull in non\-linear data in case the \fIskb\fP is non\-linear and not
all of the \fIlen\fP bytes are part of the linear section. Make
\fIlen\fP bytes from \fIskb\fP readable and writable. If a zero value
is passed for \fIlen\fP, then the whole length of the \fIskb\fP is
pulled.
.sp
This helper is only needed for reading and writing with direct
packet access.
.sp
For direct packet access, testing that offsets to access
are within packet boundaries (test on \fIskb\fP\fB\->data_end\fP) may
fail if offsets are invalid, or if the requested
data is in non\-linear parts of the \fIskb\fP\&. On failure the
program can just bail out, or in the case of a non\-linear
buffer, use a helper to make the data available. The
\fBbpf_skb_load_bytes\fP() helper is a first solution to access
the data. Another one consists in using \fBbpf_skb_pull_data\fP()
to pull in the non\-linear parts once, then retesting and
eventually accessing the data.
.sp
At the same time, this also makes sure the \fIskb\fP is uncloned,
which is a necessary condition for direct write. As this needs
to be an invariant for the write part only, the verifier
detects writes and adds a prologue that calls
\fBbpf_skb_pull_data\fP() to effectively unclone the \fIskb\fP from
the very beginning in case it is indeed cloned.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBs64 bpf_csum_update(struct sk_buff *\fP\fIskb\fP\fB, __wsum\fP \fIcsum\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Add the checksum \fIcsum\fP into \fIskb\fP\fB\->csum\fP in case the
driver has supplied a checksum for the entire packet into that
field. Return an error otherwise. This helper is intended to be
used in combination with \fBbpf_csum_diff\fP(), in particular
when the checksum needs to be updated after data has been
written into the packet through direct packet access.
.TP
.B Return
The checksum on success, or a negative error code in case of
failure.
.UNINDENT
.TP
.B \fBvoid bpf_set_hash_invalid(struct sk_buff *\fP\fIskb\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Invalidate the current \fIskb\fP\fB\->hash\fP\&. It can be used after
mangling headers through direct packet access, in order to
indicate that the hash is outdated and to trigger a
recalculation the next time the kernel tries to access this
hash or when the \fBbpf_get_hash_recalc\fP() helper is called.
.UNINDENT
.TP
.B \fBint bpf_get_numa_node_id(void)\fP
.INDENT 7.0
.TP
.B Description
Return the id of the current NUMA node. The primary use case
for this helper is the selection of sockets for the local NUMA
node, when the program is attached to sockets using the
\fBSO_ATTACH_REUSEPORT_EBPF\fP option (see also \fBsocket\fP(7)),
but the helper is also available to other eBPF program types,
similarly to \fBbpf_get_smp_processor_id\fP().
.TP
.B Return
The id of the current NUMA node.
.UNINDENT
.TP
.B \fBint bpf_skb_change_head(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Grow the headroom of the packet associated with \fIskb\fP and adjust
the offset of the MAC header accordingly, adding \fIlen\fP bytes of
space. It automatically extends and reallocates memory as
required.
.sp
This helper can be used on a layer 3 \fIskb\fP to push a MAC header
for redirection into a layer 2 device.
.sp
All values for \fIflags\fP are reserved for future usage, and must
be left at zero.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_xdp_adjust_head(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Adjust (move) \fIxdp_md\fP\fB\->data\fP by \fIdelta\fP bytes. Note that
it is possible to use a negative value for \fIdelta\fP\&. This helper
can be used to prepare the packet for pushing or popping
headers.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_probe_read_str(void *\fP\fIdst\fP\fB, int\fP \fIsize\fP\fB, const void *\fP\fIunsafe_ptr\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Copy a NUL\-terminated string from an unsafe address
\fIunsafe_ptr\fP to \fIdst\fP\&. The \fIsize\fP should include the
terminating NUL byte. In case the string length is smaller than
\fIsize\fP, the target is not padded with further NUL bytes. If the
string length is larger than \fIsize\fP, only \fIsize\fP\-1 bytes are
copied and the last byte is set to NUL.
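.sp
The copy rule above can be modelled in user space as follows. This
is only a sketch of the documented semantics, not the kernel
implementation: the function name is hypothetical, and reading from
\fIsrc\fP is assumed to be safe here, which is exactly what the real
helper cannot assume.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: copy at most size bytes including the final NUL,
   never pad, and return the number of bytes written (NUL included). */
static long probe_read_str_model(char *dst, size_t size, const char *src)
{
    size_t i = 0;

    assert(size > 0);
    while (i + 1 < size && src[i] != 0) {
        dst[i] = src[i];
        i++;
    }
    dst[i] = 0;             /* terminate, possibly truncating */
    return (long)(i + 1);   /* length including the NUL byte */
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
With an 8\-byte buffer and the string "abc", the model returns 4 and
leaves bytes 4 to 7 untouched; with a 4\-byte buffer and "abcdef",
it stores "abc" followed by a NUL byte and also returns 4.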
.sp
On success, the length of the copied string is returned. This
makes this helper useful in tracing programs for reading
strings, and more importantly to get their length at runtime. See
the following snippet:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
SEC("kprobe/sys_open")
int bpf_sys_open(struct pt_regs *ctx)
{
    char buf[PATHLEN]; // PATHLEN is defined as 256
    // On x86_64, ctx\->di holds the first argument of the probed function
    int res = bpf_probe_read_str(buf, sizeof(buf),
                                 (void *)ctx\->di);

    // Consume buf, for example push it to
    // userspace via bpf_perf_event_output(); we
    // can use res (the string length) as event
    // size, after checking its boundaries.
    return 0;
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In comparison, using the \fBbpf_probe_read\fP() helper to read
the string would require estimating the length at compile time,
and would often result in copying more memory than necessary.
.sp
Another use case is parsing individual process arguments or
individual environment variables by navigating
\fIcurrent\fP\fB\->mm\->arg_start\fP and
\fIcurrent\fP\fB\->mm\->env_start\fP: using this helper and the
return value, one can quickly iterate at the right offset of the
memory area.
.TP
.B Return
On success, the strictly positive length of the string,
including the trailing NUL character. On error, a negative
value.
.UNINDENT
.TP
.B \fBu64 bpf_get_socket_cookie(struct sk_buff *\fP\fIskb\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
If the \fBstruct sk_buff\fP pointed to by \fIskb\fP has a known socket,
retrieve the cookie (generated by the kernel) of this socket.
If no cookie has been set yet, generate a new cookie. Once
generated, the socket cookie remains stable for the life of the
socket. This helper can be useful for monitoring per\-socket
networking traffic statistics as it provides a global socket
identifier that can be assumed unique.
.TP
.B Return
An 8\-byte long non\-decreasing number on success, or 0 if the
socket field is missing inside \fIskb\fP\&.
.UNINDENT
.TP
.B \fBu64 bpf_get_socket_cookie(struct bpf_sock_addr *\fP\fIctx\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Equivalent to the \fBbpf_get_socket_cookie\fP() helper that accepts
\fIskb\fP, but gets the socket from the \fBstruct bpf_sock_addr\fP
context.
.TP
.B Return
An 8\-byte long non\-decreasing number.
.UNINDENT
.TP
.B \fBu64 bpf_get_socket_cookie(struct bpf_sock_ops *\fP\fIctx\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Equivalent to the \fBbpf_get_socket_cookie\fP() helper that accepts
\fIskb\fP, but gets the socket from the \fBstruct bpf_sock_ops\fP
context.
.TP
.B Return
An 8\-byte long non\-decreasing number.
.UNINDENT
.TP
.B \fBu32 bpf_get_socket_uid(struct sk_buff *\fP\fIskb\fP\fB)\fP
.INDENT 7.0
.TP
.B Return
The owner UID of the socket associated with \fIskb\fP\&. If the socket
is \fBNULL\fP, or if it is not a full socket (i.e. if it is a
time\-wait or a request socket instead), the \fBoverflowuid\fP value
is returned (note that \fBoverflowuid\fP might also be the actual
UID value for the socket).
.UNINDENT
.TP
.B \fBu32 bpf_set_hash(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIhash\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Set the full hash for \fIskb\fP (set the field \fIskb\fP\fB\->hash\fP)
to value \fIhash\fP\&.
.TP
.B Return
0
.UNINDENT
.TP
.B \fBint bpf_setsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Emulate a call to \fBsetsockopt()\fP on the socket associated with
\fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at
which the option resides and the name \fIoptname\fP of the option
must be specified, see \fBsetsockopt\fP(2) for more information.
The option value of length \fIoptlen\fP is pointed to by \fIoptval\fP\&.
.sp
This helper implements only a subset of \fBsetsockopt()\fP\&.
It supports the following \fIlevel\fPs:
.INDENT 7.0
.IP \(bu 2
\fBSOL_SOCKET\fP, which supports the following \fIoptname\fPs:
\fBSO_RCVBUF\fP, \fBSO_SNDBUF\fP, \fBSO_MAX_PACING_RATE\fP,
\fBSO_PRIORITY\fP, \fBSO_RCVLOWAT\fP, \fBSO_MARK\fP\&.
.IP \(bu 2
\fBIPPROTO_TCP\fP, which supports the following \fIoptname\fPs:
\fBTCP_CONGESTION\fP, \fBTCP_BPF_IW\fP,
\fBTCP_BPF_SNDCWND_CLAMP\fP\&.
.IP \(bu 2
\fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&.
.IP \(bu 2
\fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&.
.UNINDENT
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_skb_adjust_room(struct sk_buff *\fP\fIskb\fP\fB, s32\fP \fIlen_diff\fP\fB, u32\fP \fImode\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Grow or shrink the room for data in the packet associated with
\fIskb\fP by \fIlen_diff\fP, and according to the selected \fImode\fP\&.
.sp
There are two supported modes at this time:
.INDENT 7.0
.IP \(bu 2
\fBBPF_ADJ_ROOM_MAC\fP: Adjust room at the MAC layer
(room space is added or removed below the layer 2 header).
.IP \(bu 2
\fBBPF_ADJ_ROOM_NET\fP: Adjust room at the network layer
(room space is added or removed below the layer 3 header).
.UNINDENT
.sp
The following flags are supported at this time:
.INDENT 7.0
.IP \(bu 2
\fBBPF_F_ADJ_ROOM_FIXED_GSO\fP: Do not adjust gso_size.
Adjusting mss in this way is not allowed for datagrams.
.IP \(bu 2
\fBBPF_F_ADJ_ROOM_ENCAP_L3_IPV4\fP,
\fBBPF_F_ADJ_ROOM_ENCAP_L3_IPV6\fP:
Any new space is reserved to hold a tunnel header.
Configure skb offsets and other fields accordingly.
.IP \(bu 2
\fBBPF_F_ADJ_ROOM_ENCAP_L4_GRE\fP,
\fBBPF_F_ADJ_ROOM_ENCAP_L4_UDP\fP:
Use with ENCAP_L3 flags to further specify the tunnel type.
.IP \(bu 2
\fBBPF_F_ADJ_ROOM_ENCAP_L2\fP(\fIlen\fP):
Use with ENCAP_L3/L4 flags to further specify the tunnel
type; \fIlen\fP is the length of the inner MAC header.
.UNINDENT
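.sp
To make the flag layout concrete, the userspace sketch below builds
the \fIflags\fP for a GRE tunnel over IPv4 that carries a 14\-byte
inner Ethernet header. The constants are copied from
\fIlinux/bpf.h\fP so that the sketch compiles without kernel headers
(real programs should include that header instead), and
\fIencap_l2_flag\fP is a hypothetical function equivalent to the
\fBBPF_F_ADJ_ROOM_ENCAP_L2\fP(\fIlen\fP) macro, which stores \fIlen\fP in
the top 8 bits of \fIflags\fP\&.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stdint.h>

/* Copied from linux/bpf.h so that this sketch builds on its own. */
#define BPF_F_ADJ_ROOM_FIXED_GSO      (1ULL << 0)
#define BPF_F_ADJ_ROOM_ENCAP_L3_IPV4  (1ULL << 1)
#define BPF_F_ADJ_ROOM_ENCAP_L3_IPV6  (1ULL << 2)
#define BPF_F_ADJ_ROOM_ENCAP_L4_GRE   (1ULL << 3)
#define BPF_F_ADJ_ROOM_ENCAP_L4_UDP   (1ULL << 4)
#define BPF_ADJ_ROOM_ENCAP_L2_MASK    0xff
#define BPF_ADJ_ROOM_ENCAP_L2_SHIFT   56

/* Hypothetical equivalent of BPF_F_ADJ_ROOM_ENCAP_L2(len). */
static uint64_t encap_l2_flag(uint32_t len)
{
    return ((uint64_t)len & BPF_ADJ_ROOM_ENCAP_L2_MASK)
           << BPF_ADJ_ROOM_ENCAP_L2_SHIFT;
}

/* Flags for GRE over IPv4 carrying a 14 byte inner Ethernet header. */
static uint64_t gre4_eth_flags(void)
{
    return BPF_F_ADJ_ROOM_ENCAP_L3_IPV4 |
           BPF_F_ADJ_ROOM_ENCAP_L4_GRE |
           encap_l2_flag(14);
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Decoding works the other way round: shifting \fIflags\fP right by 56
and masking with 0xff recovers the inner MAC header length.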
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Redirect the packet to the endpoint referenced by \fImap\fP at
index \fIkey\fP\&. Depending on its type, this \fImap\fP can contain
references to net devices (for forwarding packets through other
ports), or to CPUs (for redirecting XDP frames to another CPU;
as of this writing, this is only implemented for native XDP,
with driver support).
.sp
The lower two bits of \fIflags\fP are used as the return code if
the map lookup fails. This is so that the return value can be
one of the XDP program return codes up to \fBXDP_TX\fP, as chosen by
the caller. Any higher bits in the \fIflags\fP argument must be
unset.
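.sp
In other words, the verdict returned when the lookup fails is
\fIflags\fP masked with 3. The userspace sketch below spells this
out; the \fBxdp_action\fP values are copied from \fIlinux/bpf.h\fP
(\fBXDP_ABORTED\fP to \fBXDP_TX\fP are 0 to 3, \fBXDP_REDIRECT\fP is 4),
and both function names are hypothetical.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stdbool.h>
#include <stdint.h>

/* XDP action values copied from linux/bpf.h. */
enum xdp_action {
    XDP_ABORTED = 0,
    XDP_DROP,
    XDP_PASS,
    XDP_TX,
    XDP_REDIRECT,
};

/* Hypothetical: flags are valid only if no bit above the lower two is set. */
static bool redirect_map_flags_valid(uint64_t flags)
{
    return (flags & ~3ULL) == 0;
}

/* Hypothetical: verdict used when the map lookup fails. */
static enum xdp_action redirect_map_fallback(uint64_t flags)
{
    return (enum xdp_action)(flags & 3);
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
For example, a program that prefers to let packets continue up the
stack when no entry exists would pass \fBXDP_PASS\fP as \fIflags\fP;
\fBXDP_REDIRECT\fP (4) does not fit in two bits, which is why the
fallback is limited to codes up to \fBXDP_TX\fP\&.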
.sp
When used to redirect packets to net devices, this helper
provides a significant performance increase over \fBbpf_redirect\fP().
This is due to various implementation details of the underlying
mechanisms, one of which is the fact that \fBbpf_redirect_map\fP()
tries to send packets to the device in bulk.
.TP
.B Return
\fBXDP_REDIRECT\fP on success, or \fBXDP_ABORTED\fP on error.
.UNINDENT
.TP
.B \fBint bpf_sk_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Redirect the packet to the socket referenced by \fImap\fP (of type
\fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and
egress interfaces can be used for redirection. The
\fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the
distinction (ingress path is selected if the flag is present,
egress path otherwise). This is the only flag supported for now.
.TP
.B Return
\fBSK_PASS\fP on success, or \fBSK_DROP\fP on error.
.UNINDENT
.TP
.B \fBint bpf_sock_map_update(struct bpf_sock_ops *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Add an entry to, or update, a \fImap\fP referencing sockets. The
\fIskops\fP is used as a new value for the entry associated with
\fIkey\fP\&. \fIflags\fP is one of:
.INDENT 7.0
.TP
.B \fBBPF_NOEXIST\fP
The entry for \fIkey\fP must not exist in the map.
.TP
.B \fBBPF_EXIST\fP
The entry for \fIkey\fP must already exist in the map.
.TP
.B \fBBPF_ANY\fP
No condition on the existence of the entry for \fIkey\fP\&.
.UNINDENT
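.sp
The three flags amount to a precondition on whether an entry for
\fIkey\fP already exists. The userspace sketch below models only
that precondition: the flag values are copied from
\fIlinux/bpf.h\fP, the function name is hypothetical, and the error
codes returned by the real helper are not modelled.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stdbool.h>
#include <stdint.h>

/* Values copied from linux/bpf.h. */
#define BPF_ANY      0
#define BPF_NOEXIST  1
#define BPF_EXIST    2

/* Hypothetical: may an update with these flags proceed? */
static bool update_allowed(uint64_t flags, bool key_exists)
{
    switch (flags) {
    case BPF_ANY:
        return true;
    case BPF_NOEXIST:
        return !key_exists;
    case BPF_EXIST:
        return key_exists;
    default:
        return false;   /* the model rejects any other value */
    }
}
.ft P
.fi
.UNINDENT
.UNINDENT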
.sp
If the \fImap\fP has eBPF programs (parser and verdict), those will
be inherited by the socket being added. If the socket is
already attached to eBPF programs, this results in an error.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_xdp_adjust_meta(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Adjust the address pointed to by \fIxdp_md\fP\fB\->data_meta\fP by
\fIdelta\fP (which can be positive or negative). Note that this
operation modifies the address stored in \fIxdp_md\fP\fB\->data\fP,
so the latter must be loaded only after the helper has been
called.
.sp
The use of \fIxdp_md\fP\fB\->data_meta\fP is optional and programs
are not required to use it. The rationale is that when the
packet is processed with XDP (e.g. as a DoS filter), it is
possible to push further metadata along with it before passing
to the stack, and to give the guarantee that an ingress eBPF
program attached as a TC classifier on the same device can pick
this up for further post\-processing. Since TC works with socket
buffers, it remains possible to set from XDP the \fBmark\fP or
\fBpriority\fP pointers, or other pointers for the socket buffer.
Having this scratch space generic and programmable allows for
more flexibility as the user is free to store whatever metadata
they need.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_perf_event_read_value(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Read the value of a perf event counter, and store it into \fIbuf\fP
of size \fIbuf_size\fP\&. This helper relies on a \fImap\fP of type
\fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of the perf event
counter is selected when \fImap\fP is updated with perf event file
descriptors. The \fImap\fP is an array whose size is the number of
available CPUs, and each cell contains a value relative to one
CPU. The value to retrieve is indicated by \fIflags\fP, which
contains the index of the CPU to look up, masked with
\fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to
\fBBPF_F_CURRENT_CPU\fP to indicate that the value for the
current CPU should be retrieved.
.sp
This helper behaves in a way close to the
\fBbpf_perf_event_read\fP() helper, except that instead of
just returning the value observed, it fills the \fIbuf\fP
structure. This allows for additional data to be retrieved: in
particular, the enabled and running times (in \fIbuf\fP\fB\->enabled\fP and \fIbuf\fP\fB\->running\fP, respectively) are
copied. In general, \fBbpf_perf_event_read_value\fP() is
recommended over \fBbpf_perf_event_read\fP(), which has some
ABI issues and provides less functionality.
.sp
These values are interesting because hardware PMU (Performance
Monitoring Unit) counters are limited resources. When there are
more PMU\-based perf events opened than available counters, the
kernel multiplexes these events so that each event gets a certain
percentage (but not all) of the PMU time. When multiplexing
happens, the number of samples or the counter value no longer
reflects what would have been observed without multiplexing.
This makes comparison between different runs difficult.
Typically, the counter value should be normalized before
comparing to other experiments. The usual normalization is done
as follows.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
normalized_counter = counter * t_enabled / t_running
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Where t_enabled is the time the event was enabled and t_running is
the time the event was running since the last normalization. The
enabled and running times are accumulated since the perf event
was opened. To compute the scaling factor between two invocations of an
eBPF program, users can use the CPU id as the key (which is
typical for the perf array usage model) to remember the previous
value and do the calculation inside the eBPF program.
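.sp
As an illustration only, the normalization above can be sketched in
plain C that also compiles outside of a BPF program. The structure
\fBperf_snapshot\fP and the function \fBnormalized_delta\fP() are
hypothetical; the structure merely mirrors the \fBcounter\fP,
\fBenabled\fP, and \fBrunning\fP fields of
\fBstruct bpf_perf_event_value\fP\&. The sketch assumes that the
previous snapshot has been saved (for example, in a map keyed by CPU id)
and that the intermediate product fits in 64 bits.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the counter, enabled and running fields of
 * struct bpf_perf_event_value; not the kernel definition itself. */
struct perf_snapshot {
    uint64_t counter;
    uint64_t enabled;
    uint64_t running;
};

/* Scale the counter delta between two snapshots:
 *   delta_counter * delta_enabled / delta_running
 * Returns 0 if the event did not run at all during the interval.
 * Assumes delta_counter * delta_enabled fits in 64 bits. */
uint64_t normalized_delta(const struct perf_snapshot *prev,
                          const struct perf_snapshot *cur)
{
    uint64_t d_counter, d_enabled, d_running;

    /* Times accumulate since the event was opened, so they never
     * decrease between two reads. */
    assert(cur->running >= prev->running);

    d_counter = cur->counter - prev->counter;
    d_enabled = cur->enabled - prev->enabled;
    d_running = cur->running - prev->running;

    if (d_running == 0)
        return 0;
    return d_counter * d_enabled / d_running;
}
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In an eBPF program, \fIcur\fP would come from
\fBbpf_perf_event_read_value\fP() and \fIprev\fP from the map entry,
which would then be overwritten with \fIcur\fP\&.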
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_perf_prog_read_value(struct bpf_perf_event_data *\fP\fIctx\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
For an eBPF program attached to a perf event, retrieve the
value of the event counter associated with \fIctx\fP and store it in
the structure pointed to by \fIbuf\fP and of size \fIbuf_size\fP\&. Enabled
and running times are also stored in the structure (see the
description of the \fBbpf_perf_event_read_value\fP() helper for
more details).
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_getsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Emulate a call to \fBgetsockopt\fP() on the socket associated with
\fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at
which the option resides and the name \fIoptname\fP of the option
must be specified; see \fBgetsockopt\fP(2) for more information.
The retrieved value is stored in the structure pointed to by
\fIoptval\fP and of length \fIoptlen\fP\&.
.sp
This helper actually implements a subset of \fBgetsockopt\fP().
It supports the following \fIlevel\fPs:
.INDENT 7.0
.IP \(bu 2
\fBIPPROTO_TCP\fP, which supports \fIoptname\fP
\fBTCP_CONGESTION\fP\&.
.IP \(bu 2
\fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&.
.IP \(bu 2
\fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&.
.UNINDENT
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_override_return(struct pt_regs *\fP\fIregs\fP\fB, u64\fP \fIrc\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Used for error injection, this helper uses kprobes to override
the return value of the probed function, and to set it to \fIrc\fP\&.
The first argument is the context \fIregs\fP on which the kprobe
works.
.sp
This helper works by setting the PC (program counter)
to an override function which is run in place of the original
probed function. This means the probed function is not run at
all. The replacement function just returns with the required
value.
.sp
This helper has security implications, and thus is subject to
restrictions. It is only available if the kernel was compiled
with the \fBCONFIG_BPF_KPROBE_OVERRIDE\fP configuration
option, and in this case it only works on functions tagged with
\fBALLOW_ERROR_INJECTION\fP in the kernel code.
.sp
Also, the helper is only available for the architectures having
the \fBCONFIG_FUNCTION_ERROR_INJECTION\fP option. As of this writing,
x86 is the only architecture to support this feature.
.TP
.B Return
0
.UNINDENT
.TP
.B \fBint bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *\fP\fIbpf_sock\fP\fB, int\fP \fIargval\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Attempt to set the value of the \fBbpf_sock_ops_cb_flags\fP field
for the full TCP socket associated with \fIbpf_sock\fP to
\fIargval\fP\&.
.sp
The primary use of this field is to determine if there should
be calls to eBPF programs of type
\fBBPF_PROG_TYPE_SOCK_OPS\fP at various points in the TCP
code. A program of the same type can change its value, per
connection and as necessary, when the connection is
established. This field is directly accessible for reading, but
this helper must be used for updates in order to return an
error if an eBPF program tries to set a callback that is not
supported in the current kernel.
.sp
\fIargval\fP is a flag mask which can combine these flags:
.INDENT 7.0
.IP \(bu 2
\fBBPF_SOCK_OPS_RTO_CB_FLAG\fP (retransmission time out)
.IP \(bu 2
\fBBPF_SOCK_OPS_RETRANS_CB_FLAG\fP (retransmission)
.IP \(bu 2
\fBBPF_SOCK_OPS_STATE_CB_FLAG\fP (TCP state change)
.IP \(bu 2
\fBBPF_SOCK_OPS_RTT_CB_FLAG\fP (every RTT)
.UNINDENT
.sp
Since \fIargval\fP replaces the whole field, this function can also
be used to clear a callback flag by setting the appropriate bit to
zero. For example, to disable the RTO callback:
.INDENT 7.0
.TP
.B \fBbpf_sock_ops_cb_flags_set(bpf_sock,\fP
\fBbpf_sock\->bpf_sock_ops_cb_flags & ~BPF_SOCK_OPS_RTO_CB_FLAG)\fP
.UNINDENT
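.sp
The bit arithmetic behind this, and behind the return value
described below, can be sketched in plain C. This is a hypothetical,
user\-space sketch: the helper is not called, the function names are
illustrative, and the flag values are local copies (an assumption
based on \fIinclude/uapi/linux/bpf.h\fP at the time of writing) that
real programs should take from \fI<linux/bpf.h>\fP instead.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the callback flags, defined here only so that this
 * sketch compiles on its own. */
enum {
    BPF_SOCK_OPS_RTO_CB_FLAG     = 1 << 0,
    BPF_SOCK_OPS_RETRANS_CB_FLAG = 1 << 1,
    BPF_SOCK_OPS_STATE_CB_FLAG   = 1 << 2,
    BPF_SOCK_OPS_RTT_CB_FLAG     = 1 << 3,
};

/* Value to pass as argval in order to keep every current callback
 * except "flag" (argval replaces the whole field). */
uint32_t argval_without(uint32_t current, uint32_t flag)
{
    return current & ~flag;
}

/* Given the requested argval and a non-negative return value of
 * bpf_sock_ops_cb_flags_set(), return the callbacks that are now
 * enabled: the return value holds the bits that could not be set. */
uint32_t effective_flags(uint32_t requested, int ret)
{
    assert(ret >= 0); /* -EINVAL: not a full TCP socket */
    return requested & ~(uint32_t)ret;
}
```
.ft P
.fi
.UNINDENT
.UNINDENT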
.sp
Here are some examples of where such an eBPF program could be
called:
.INDENT 7.0
.IP \(bu 2
When RTO fires.
.IP \(bu 2
When a packet is retransmitted.
.IP \(bu 2
When the connection terminates.
.IP \(bu 2
When a packet is sent.
.IP \(bu 2
When a packet is received.
.UNINDENT
.TP
.B Return
Code \fB\-EINVAL\fP if the socket is not a full TCP socket;
otherwise, a non\-negative number containing the bits that could not
be set is returned (which comes down to 0 if all bits were set
as required).
.UNINDENT
.TP
.B \fBint bpf_msg_redirect_map(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
This helper is used in programs implementing policies at the
socket level. If the message \fImsg\fP is allowed to pass (i.e. if
the verdict eBPF program returns \fBSK_PASS\fP), redirect it to
the socket referenced by \fImap\fP (of type
\fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and
egress interfaces can be used for redirection. The
\fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the
distinction (ingress path is selected if the flag is present,
egress path otherwise). This is the only flag supported for now.
.TP
.B Return
\fBSK_PASS\fP on success, or \fBSK_DROP\fP on error.
.UNINDENT
.TP
.B \fBint bpf_msg_apply_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
For socket policies, apply the verdict of the eBPF program to
the next \fIbytes\fP (number of bytes) of message \fImsg\fP\&.
.sp
For example, this helper can be used in the following cases:
.INDENT 7.0
.IP \(bu 2
A single \fBsendmsg\fP() or \fBsendfile\fP() system call
contains multiple logical messages that the eBPF program is
supposed to read and for which it should apply a verdict.
.IP \(bu 2
An eBPF program only cares to read the first \fIbytes\fP of a
\fImsg\fP\&. If the message has a large payload, then setting up
and calling the eBPF program repeatedly for all bytes, even
though the verdict is already known, would create unnecessary
overhead.
.UNINDENT
.sp
When called from within an eBPF program, the helper sets a
counter internal to the BPF infrastructure, which is used to
apply the last verdict to the next \fIbytes\fP\&. If \fIbytes\fP is
smaller than the current data being processed from a
\fBsendmsg\fP() or \fBsendfile\fP() system call, the first
\fIbytes\fP will be sent and the eBPF program will be re\-run with
the pointer for start of data pointing to byte number \fIbytes\fP
\fB+ 1\fP\&. If \fIbytes\fP is larger than the current data being
processed, then the eBPF verdict will be applied to multiple
\fBsendmsg\fP() or \fBsendfile\fP() calls until \fIbytes\fP are
consumed.
.sp
Note that if a socket closes with the internal counter holding
a non\-zero value, this is not a problem because data is not
being buffered for \fIbytes\fP and is sent as it is received.
.TP
.B Return
0
.UNINDENT
.TP
.B \fBint bpf_msg_cork_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
For socket policies, prevent the execution of the verdict eBPF
program for message \fImsg\fP until \fIbytes\fP (byte number) have been
accumulated.
.sp
This can be used when one needs a specific number of bytes
before a verdict can be assigned, even if the data spans
multiple \fBsendmsg\fP() or \fBsendfile\fP() calls. The extreme
case would be a user calling \fBsendmsg\fP() repeatedly with
1\-byte long message segments. Obviously, this is bad for
performance, but it is still valid. If the eBPF program needs
\fIbytes\fP bytes to validate a header, this helper can be used to
prevent the eBPF program from being called again until \fIbytes\fP
have been accumulated.
.TP
.B Return
0
.UNINDENT
.TP
.B \fBint bpf_msg_pull_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIend\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
For socket policies, pull in non\-linear data from user space
for \fImsg\fP and set pointers \fImsg\fP\fB\->data\fP and \fImsg\fP\fB\->data_end\fP to the \fIstart\fP and \fIend\fP byte offsets into \fImsg\fP,
respectively.
.sp
If a program of type \fBBPF_PROG_TYPE_SK_MSG\fP is run on a
\fImsg\fP, it can only parse data that the (\fBdata\fP, \fBdata_end\fP)
pointers have already consumed. For \fBsendmsg\fP() hooks, this
is likely the first scatterlist element. But for calls relying
on the \fBsendpage\fP handler (e.g. \fBsendfile\fP()), this will
be the range (\fB0\fP, \fB0\fP) because the data is shared with
user space and by default the objective is to avoid allowing
user space to modify data while (or after) the eBPF verdict is
being decided. This helper can be used to pull in data and to
set the start and end pointers to given values. Data will be
copied if necessary (i.e. if data was not linear and if start
and end pointers do not point to the same chunk).
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.sp
All values for \fIflags\fP are reserved for future usage, and must
be left at zero.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_bind(struct bpf_sock_addr *\fP\fIctx\fP\fB, struct sockaddr *\fP\fIaddr\fP\fB, int\fP \fIaddr_len\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Bind the socket associated with \fIctx\fP to the address pointed to by
\fIaddr\fP, of length \fIaddr_len\fP\&. This allows for making outgoing
connections from the desired IP address, which can be useful, for
example, when all processes inside a cgroup should use one
single IP address on a host that has multiple IP addresses configured.
.sp
This helper works for IPv4 and IPv6, TCP and UDP sockets. The
domain (\fIaddr\fP\fB\->sa_family\fP) must be \fBAF_INET\fP (or
\fBAF_INET6\fP). Looking for a free port to bind to can be
expensive, therefore binding to a port is not permitted by the
helper: \fIaddr\fP\fB\->sin_port\fP (or \fBsin6_port\fP, respectively)
must be set to zero.
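.sp
A minimal sketch, in plain C, of preparing such an IPv4 address. The
function \fBfill_bind_addr4\fP() is hypothetical, and the sketch only
shows the required field values (\fBAF_INET\fP family, port zero), not
a complete BPF program.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/* Prepare an IPv4 address for bpf_bind(): AF_INET family, the desired
 * source address, and sin_port left at zero, since the helper does not
 * allow binding to a port. ip_host_order is in host byte order, for
 * example 0x0A000005 for 10.0.0.5. */
void fill_bind_addr4(struct sockaddr_in *sa, uint32_t ip_host_order)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof(*sa));   /* zeroes sin_port as well */
    sa->sin_family = AF_INET;
    sa->sin_addr.s_addr = htonl(ip_host_order);
}
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In a BPF program (for instance, one attached to a cgroup connect
hook), the structure would then be passed to \fBbpf_bind\fP() together
with its size, with a BPF\-compatible byte\-order macro used in place
of \fBhtonl\fP(3).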
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_xdp_adjust_tail(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Adjust (move) \fIxdp_md\fP\fB\->data_end\fP by \fIdelta\fP bytes. It is
only possible to shrink the packet as of this writing,
therefore \fIdelta\fP must be a negative integer.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_skb_get_xfrm_state(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIindex\fP\fB, struct bpf_xfrm_state *\fP\fIxfrm_state\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Retrieve the XFRM state (IP transform framework, see also
\fBip\-xfrm\fP(8)) at \fIindex\fP in the XFRM "security path" for \fIskb\fP\&.
.sp
The retrieved value is stored in the \fBstruct bpf_xfrm_state\fP
pointed to by \fIxfrm_state\fP and of length \fIsize\fP\&.
.sp
All values for \fIflags\fP are reserved for future usage, and must
be left at zero.
.sp
This helper is available only if the kernel was compiled with the
\fBCONFIG_XFRM\fP configuration option.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_get_stack(struct pt_regs *\fP\fIregs\fP\fB, void *\fP\fIbuf\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Return a user or a kernel stack in a buffer provided by the BPF program.
To achieve this, the helper needs \fIregs\fP, which is a pointer
to the context on which the tracing program is executed.
To store the stack trace, the BPF program provides \fIbuf\fP with
a nonnegative \fIsize\fP\&.
.sp
The last argument, \fIflags\fP, holds the number of stack frames to
skip (from 0 to 255), masked with
\fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set
the following flags:
.INDENT 7.0
.TP
.B \fBBPF_F_USER_STACK\fP
Collect a user space stack instead of a kernel stack.
.TP
.B \fBBPF_F_USER_BUILD_ID\fP
Collect build ID + offset instead of IPs for the user stack;
only valid if \fBBPF_F_USER_STACK\fP is also specified.
.UNINDENT
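.sp
As a hypothetical sketch, the \fIflags\fP argument could be composed
as follows in plain C. The flag values are local copies (an assumption
based on \fIinclude/uapi/linux/bpf.h\fP at the time of writing), and
\fBget_stack_flags\fP() and \fBframes_from_ret\fP() are illustrative
names. The second function assumes that, without
\fBBPF_F_USER_BUILD_ID\fP, each collected frame occupies one 64\-bit
instruction pointer in \fIbuf\fP\&.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local copies of the flag values, so that the sketch compiles on its
 * own; real programs take them from <linux/bpf.h>. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_USER_BUILD_ID   (1ULL << 11)

/* Compose the flags argument of bpf_get_stack(). */
uint64_t get_stack_flags(unsigned int skip, int user_stack, int build_id)
{
    uint64_t flags;

    assert(skip <= 255);               /* must fit in the skip field */
    assert(!build_id || user_stack);   /* build ID needs a user stack */

    flags = (uint64_t)skip & BPF_F_SKIP_FIELD_MASK;
    if (user_stack)
        flags |= BPF_F_USER_STACK;
    if (build_id)
        flags |= BPF_F_USER_BUILD_ID;
    return flags;
}

/* Number of frames held in buf for a return value of ret bytes,
 * assuming plain 64-bit instruction pointers (no build ID). */
size_t frames_from_ret(long ret)
{
    return ret < 0 ? 0 : (size_t)ret / sizeof(uint64_t);
}
```
.ft P
.fi
.UNINDENT
.UNINDENT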
.sp
\fBbpf_get_stack\fP() can collect up to
\fBPERF_MAX_STACK_DEPTH\fP both kernel and user frames, subject
to a sufficiently large buffer size. Note that
this limit can be controlled with the \fBsysctl\fP program, and
that it should be manually increased in order to profile long
user stacks (such as stacks for Java programs). To do so, use:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
# sysctl kernel.perf_event_max_stack=<new value>
.ft P
.fi
.UNINDENT
.UNINDENT
.TP
.B Return
A non\-negative value equal to or less than \fIsize\fP on success,
or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_skb_load_bytes_relative(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB, u32\fP \fIstart_header\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
This helper is similar to \fBbpf_skb_load_bytes\fP() in that
it provides an easy way to load \fIlen\fP bytes from \fIoffset\fP
from the packet associated with \fIskb\fP, into the buffer pointed
to by \fIto\fP\&. The difference from \fBbpf_skb_load_bytes\fP() is that
a fifth argument \fIstart_header\fP exists in order to select a
base offset to start from. \fIstart_header\fP can be one of:
.INDENT 7.0
.TP
.B \fBBPF_HDR_START_MAC\fP
Base offset to load data from is \fIskb\fP\(aqs MAC header.
.TP
.B \fBBPF_HDR_START_NET\fP
Base offset to load data from is \fIskb\fP\(aqs network header.
.UNINDENT
.sp
In general, "direct packet access" is the preferred method to
access packet data; however, this helper is particularly useful
in socket filters where \fIskb\fP\fB\->data\fP does not always point
to the start of the MAC header and where "direct packet access"
is not available.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_fib_lookup(void *\fP\fIctx\fP\fB, struct bpf_fib_lookup *\fP\fIparams\fP\fB, int\fP \fIplen\fP\fB, u32\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Do a FIB lookup in kernel tables using parameters in \fIparams\fP\&.
If the lookup is successful and the result shows that the packet is
to be forwarded, the neighbor tables are searched for the nexthop.
If successful (i.e., the FIB lookup shows forwarding and the nexthop
is resolved), the nexthop address is returned in \fBipv4_dst\fP
or \fBipv6_dst\fP based on family, \fBsmac\fP is set to the MAC address
of the egress device, \fBdmac\fP is set to the nexthop MAC address,
\fBrt_metric\fP is set to the metric from the route (IPv4/IPv6 only),
and \fBifindex\fP is set to the device index of the nexthop from the
FIB lookup.
.sp
The \fIplen\fP argument is the size of the passed\-in struct.
The \fIflags\fP argument can be a combination of one or more of the
following values:
.INDENT 7.0
.TP
.B \fBBPF_FIB_LOOKUP_DIRECT\fP
Do a direct table lookup instead of a full lookup using FIB
rules.
.TP
.B \fBBPF_FIB_LOOKUP_OUTPUT\fP
Perform lookup from an egress perspective (default is
ingress).
.UNINDENT
.sp
\fIctx\fP is either \fBstruct xdp_md\fP for XDP programs or
\fBstruct sk_buff\fP for tc cls_act programs.
.TP
.B Return
.INDENT 7.0
.IP \(bu 2
< 0 if any input argument is invalid
.IP \(bu 2
0 on success (packet is forwarded, nexthop neighbor exists)
.IP \(bu 2
> 0: one of the \fBBPF_FIB_LKUP_RET_\fP codes explaining why the
packet is not forwarded or needs assistance from the full stack
.UNINDENT
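.sp
The numeric values of the \fBBPF_FIB_LKUP_RET_\fP codes are not listed
here; the following plain C sketch assumes the enumeration found in
\fIinclude/uapi/linux/bpf.h\fP at the time of writing (check your
headers). It shows how a forwarding program might map the return
value to a hypothetical verdict: drop the packet for blackholed,
unreachable, or prohibited destinations, and hand every other
non\-forwarded packet to the full stack.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <assert.h>

/* Local copy of the return codes (assumed values, see above). */
enum {
    BPF_FIB_LKUP_RET_SUCCESS,      /* lookup successful */
    BPF_FIB_LKUP_RET_BLACKHOLE,    /* dest is blackholed; can be dropped */
    BPF_FIB_LKUP_RET_UNREACHABLE,  /* dest is unreachable; can be dropped */
    BPF_FIB_LKUP_RET_PROHIBIT,     /* dest not allowed; can be dropped */
    BPF_FIB_LKUP_RET_NOT_FWDED,    /* packet is not forwarded */
    BPF_FIB_LKUP_RET_FWD_DISABLED, /* fwding is not enabled on ingress */
    BPF_FIB_LKUP_RET_UNSUPP_LWT,   /* fwd requires encapsulation */
    BPF_FIB_LKUP_RET_NO_NEIGH,     /* no neighbor entry for nh */
    BPF_FIB_LKUP_RET_FRAG_NEEDED,  /* fragmentation required to fwd */
};

/* Hypothetical verdicts a forwarding program might choose. */
enum fib_verdict {
    VERDICT_ERROR,    /* invalid input argument */
    VERDICT_FORWARD,  /* smac, dmac and ifindex are filled in */
    VERDICT_DROP,
    VERDICT_TO_STACK, /* let the full stack handle the packet */
};

enum fib_verdict classify_fib_result(int rc)
{
    if (rc < 0)
        return VERDICT_ERROR;
    switch (rc) {
    case BPF_FIB_LKUP_RET_SUCCESS:
        return VERDICT_FORWARD;
    case BPF_FIB_LKUP_RET_BLACKHOLE:
    case BPF_FIB_LKUP_RET_UNREACHABLE:
    case BPF_FIB_LKUP_RET_PROHIBIT:
        return VERDICT_DROP;
    default:
        return VERDICT_TO_STACK;
    }
}
```
.ft P
.fi
.UNINDENT
.UNINDENT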
.UNINDENT
.TP
.B \fBint bpf_sock_hash_update(struct bpf_sock_ops_kern *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Add an entry to, or update a sockhash \fImap\fP referencing sockets.
The \fIskops\fP is used as a new value for the entry associated to
\fIkey\fP\&. \fIflags\fP is one of:
.INDENT 7.0
.TP
.B \fBBPF_NOEXIST\fP
The entry for \fIkey\fP must not exist in the map.
.TP
.B \fBBPF_EXIST\fP
The entry for \fIkey\fP must already exist in the map.
.TP
.B \fBBPF_ANY\fP
No condition on the existence of the entry for \fIkey\fP\&.
.UNINDENT
.sp
If the \fImap\fP has eBPF programs (parser and verdict), those will
be inherited by the socket being added. If the socket is
already attached to eBPF programs, this results in an error.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_msg_redirect_hash(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
This helper is used in programs implementing policies at the
socket level. If the message \fImsg\fP is allowed to pass (i.e. if
the verdict eBPF program returns \fBSK_PASS\fP), redirect it to
the socket referenced by \fImap\fP (of type
\fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and
egress interfaces can be used for redirection. The
\fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the
distinction (ingress path is selected if the flag is present,
egress path otherwise). This is the only flag supported for now.
.TP
.B Return
\fBSK_PASS\fP on success, or \fBSK_DROP\fP on error.
.UNINDENT
.TP
.B \fBint bpf_sk_redirect_hash(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
This helper is used in programs implementing policies at the
skb socket level. If the sk_buff \fIskb\fP is allowed to pass (i.e.
if the verdict eBPF program returns \fBSK_PASS\fP), redirect it
to the socket referenced by \fImap\fP (of type
\fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and
egress interfaces can be used for redirection. The
\fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the
distinction (ingress path is selected if the flag is present,
egress path otherwise). This is the only flag supported for now.
.TP
.B Return
\fBSK_PASS\fP on success, or \fBSK_DROP\fP on error.
.UNINDENT
.TP
.B \fBint bpf_lwt_push_encap(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB, void *\fP\fIhdr\fP\fB, u32\fP \fIlen\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Encapsulate the packet associated with \fIskb\fP within a Layer 3
protocol header. This header is provided in the buffer at
address \fIhdr\fP, with \fIlen\fP its size in bytes. \fItype\fP indicates
the protocol of the header and can be one of:
.INDENT 7.0
.TP
.B \fBBPF_LWT_ENCAP_SEG6\fP
IPv6 encapsulation with Segment Routing Header
(\fBstruct ipv6_sr_hdr\fP). \fIhdr\fP only contains the SRH;
the IPv6 header is computed by the kernel.
.TP
.B \fBBPF_LWT_ENCAP_SEG6_INLINE\fP
Only works if \fIskb\fP contains an IPv6 packet. Insert a
Segment Routing Header (\fBstruct ipv6_sr_hdr\fP) inside
the IPv6 header.
.TP
.B \fBBPF_LWT_ENCAP_IP\fP
IP encapsulation (GRE/GUE/IPIP/etc). The outer header
must be IPv4 or IPv6, followed by zero or more
additional headers, up to \fBLWT_BPF_MAX_HEADROOM\fP
total bytes in all prepended headers. Please note that
if \fBskb_is_gso\fP(\fIskb\fP) is true, no more than two
headers can be prepended, and the inner header, if
present, should be either GRE or UDP/GUE.
.UNINDENT
.sp
The \fBBPF_LWT_ENCAP_SEG6\fP* types can be used by BPF programs
of type \fBBPF_PROG_TYPE_LWT_IN\fP; the \fBBPF_LWT_ENCAP_IP\fP type can
be used by BPF programs of types \fBBPF_PROG_TYPE_LWT_IN\fP and
\fBBPF_PROG_TYPE_LWT_XMIT\fP\&.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_lwt_seg6_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Store \fIlen\fP bytes from address \fIfrom\fP into the packet
associated with \fIskb\fP, at \fIoffset\fP\&. Only the flags, tag and TLVs
inside the outermost IPv6 Segment Routing Header can be
modified through this helper.
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_lwt_seg6_adjust_srh(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, s32\fP \fIdelta\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Adjust the size allocated to TLVs in the outermost IPv6
Segment Routing Header contained in the packet associated with
\fIskb\fP, at position \fIoffset\fP by \fIdelta\fP bytes. Only offsets
after the segments are accepted. \fIdelta\fP can be either
positive (growing) or negative (shrinking).
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_lwt_seg6_action(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIaction\fP\fB, void *\fP\fIparam\fP\fB, u32\fP \fIparam_len\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Apply an IPv6 Segment Routing action of type \fIaction\fP to the
packet associated with \fIskb\fP\&. Each action takes a parameter
contained at address \fIparam\fP, and of length \fIparam_len\fP bytes.
\fIaction\fP can be one of:
.INDENT 7.0
.TP
.B \fBSEG6_LOCAL_ACTION_END_X\fP
End.X action: Endpoint with Layer\-3 cross\-connect.
Type of \fIparam\fP: \fBstruct in6_addr\fP\&.
.TP
.B \fBSEG6_LOCAL_ACTION_END_T\fP
End.T action: Endpoint with specific IPv6 table lookup.
Type of \fIparam\fP: \fBint\fP\&.
.TP
.B \fBSEG6_LOCAL_ACTION_END_B6\fP
End.B6 action: Endpoint bound to an SRv6 policy.
Type of \fIparam\fP: \fBstruct ipv6_sr_hdr\fP\&.
.TP
.B \fBSEG6_LOCAL_ACTION_END_B6_ENCAP\fP
End.B6.Encap action: Endpoint bound to an SRv6
encapsulation policy.
Type of \fIparam\fP: \fBstruct ipv6_sr_hdr\fP\&.
.UNINDENT
.sp
A call to this helper may change the underlying
packet buffer. Therefore, at load time, all checks on pointers
previously done by the verifier are invalidated and must be
performed again, if the helper is used in combination with
direct packet access.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_rc_repeat(void *\fP\fIctx\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
This helper is used in programs implementing IR decoding, to
report a successfully decoded repeat key message. This delays
the generation of a key up event for a previously generated
key down event.
.sp
Some IR protocols like NEC have a special IR message for
repeating the last button, for when a button is held down.
.sp
The \fIctx\fP should point to the lirc sample as passed into
the program.
.sp
This helper is only available if the kernel was compiled with
the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to
"\fBy\fP".
.TP
.B Return
0
.UNINDENT
.TP
.B \fBint bpf_rc_keydown(void *\fP\fIctx\fP\fB, u32\fP \fIprotocol\fP\fB, u64\fP \fIscancode\fP\fB, u32\fP \fItoggle\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
This helper is used in programs implementing IR decoding, to
report a successfully decoded key press with \fIscancode\fP,
\fItoggle\fP value in the given \fIprotocol\fP\&. The scancode will be
translated to a keycode using the rc keymap, and reported as
an input key down event. After a period, a key up event is
generated. This period can be extended by calling either
\fBbpf_rc_keydown\fP() again with the same values, or calling
\fBbpf_rc_repeat\fP().
.sp
Some protocols include a toggle bit, in case the button was
released and pressed again between consecutive scancodes.
.sp
The \fIctx\fP should point to the lirc sample as passed into
the program.
.sp
The \fIprotocol\fP is the decoded protocol number (see
\fBenum rc_proto\fP for some predefined values).
.sp
This helper is only available if the kernel was compiled with
the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to
"\fBy\fP".
.TP
.B Return
0
.UNINDENT
.TP
.B \fBu64 bpf_skb_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Return the cgroup v2 id of the socket associated with the \fIskb\fP\&.
This is roughly similar to the \fBbpf_get_cgroup_classid\fP()
helper for cgroup v1 by providing a tag or identifier that
can be matched on or used for map lookups, e.g., to implement
policy. The cgroup v2 id of a given path in the hierarchy is
exposed in user space through the f_handle API in order to get
to the same 64\-bit id.
.sp
This helper can be used on the TC egress path, but not on ingress,
and is available only if the kernel was compiled with the
\fBCONFIG_SOCK_CGROUP_DATA\fP configuration option.
.TP
.B Return
The id is returned or 0 in case the id could not be retrieved.
.UNINDENT
.TP
.B \fBu64 bpf_get_current_cgroup_id(void)\fP
.INDENT 7.0
.TP
.B Return
A 64\-bit integer containing the current cgroup id based
on the cgroup within which the current task is running.
.UNINDENT
.TP
.B \fBvoid *bpf_get_local_storage(void *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Get the pointer to the local storage area.
The type and the size of the local storage are defined
by the \fImap\fP argument.
The meaning of \fIflags\fP is specific to each map type,
and has to be 0 for cgroup local storage.
.sp
Depending on the BPF program type, a local storage area
can be shared between multiple instances of the BPF program,
running simultaneously.
.sp
Users are responsible for synchronization themselves,
for example, by using the \fBBPF_STX_XADD\fP instruction to alter
the shared data.
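.sp
A minimal sketch, assuming a hypothetical \fBstruct cg_counters\fP
layout for the storage area. With the compilers commonly used for BPF,
a \fB__sync_fetch_and_add\fP() call whose result is unused is the usual
way to emit \fBBPF_STX_XADD\fP; compiled for the host, the same plain C
code performs an ordinary atomic addition, which is what lets it be
tested outside the kernel.
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout of a cgroup local storage value that several
 * instances of the program may update at the same time. */
struct cg_counters {
    uint64_t packets;
    uint64_t bytes;
};

/* Account for one packet of len bytes. Each field is updated with an
 * atomic add whose result is discarded; when targeting BPF, this is
 * typically compiled to BPF_STX_XADD. */
void account_packet(struct cg_counters *c, uint32_t len)
{
    __sync_fetch_and_add(&c->packets, 1);
    __sync_fetch_and_add(&c->bytes, len);
}
```
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
In a BPF program, \fIc\fP would be the pointer returned by
\fBbpf_get_local_storage\fP(&\fIstorage_map\fP, 0), where
\fIstorage_map\fP is a hypothetical cgroup storage map.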
2282.TP
2283.B Return
2223d7df 2284A pointer to the local storage area.
53666f6c
MK
2285.UNINDENT
2286.TP
2287.B \fBint bpf_sk_select_reuseport(struct sk_reuseport_md *\fP\fIreuse\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP
2288.INDENT 7.0
2289.TP
2290.B Description
2223d7df
MK
2291Select a \fBSO_REUSEPORT\fP socket from a
2292\fBBPF_MAP_TYPE_REUSEPORT_ARRAY\fP \fImap\fP\&.
2293It checks the selected socket is matching the incoming
2294request in the socket buffer.
53666f6c
MK
2295.TP
2296.B Return
22970 on success, or a negative error in case of failure.
2298.UNINDENT
2223d7df 2299.TP
e6107b29
MK
2300.B \fBu64 bpf_skb_ancestor_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB, int\fP \fIancestor_level\fP\fB)\fP
2301.INDENT 7.0
2302.TP
2303.B Description
2304Return id of cgroup v2 that is ancestor of cgroup associated
2305with the \fIskb\fP at the \fIancestor_level\fP\&. The root cgroup is at
2306\fIancestor_level\fP zero and each step down the hierarchy
2307increments the level. If \fIancestor_level\fP == level of cgroup
2308associated with \fIskb\fP, then return value will be same as that
2309of \fBbpf_skb_cgroup_id\fP().
2310.sp
2311The helper is useful to implement policies based on cgroups
2312that are upper in hierarchy than immediate cgroup associated
2313with \fIskb\fP\&.
2314.sp
2315The format of returned id and helper limitations are same as in
2316\fBbpf_skb_cgroup_id\fP().
.TP
.B Return
The id is returned or 0 in case the id could not be retrieved.
.UNINDENT
.TP
.B \fBstruct bpf_sock *bpf_sk_lookup_tcp(void *\fP\fIctx\fP\fB, struct bpf_sock_tuple *\fP\fItuple\fP\fB, u32\fP \fItuple_size\fP\fB, u64\fP \fInetns\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Look for a TCP socket matching \fItuple\fP, optionally in a child
network namespace \fInetns\fP\&. The return value must be checked,
and if non\-\fBNULL\fP, released via \fBbpf_sk_release\fP().
.sp
The \fIctx\fP should point to the context of the program, such as
the skb or socket (depending on the hook in use). This is used
to determine the base network namespace for the lookup.
.sp
\fItuple_size\fP must be one of:
.INDENT 7.0
.TP
.B \fBsizeof\fP(\fItuple\fP\fB\->ipv4\fP)
Look for an IPv4 socket.
.TP
.B \fBsizeof\fP(\fItuple\fP\fB\->ipv6\fP)
Look for an IPv6 socket.
.UNINDENT
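.sp
The two accepted sizes follow from the layout of \fBstruct bpf_sock_tuple\fP.
The sketch below copies that union into a local type,
\fBstruct sock_tuple_model\fP, and computes \fItuple_size\fP for each
address family. The local type is a hypothetical stand\-in that lets the
sketch compile without the kernel UAPI headers. The authoritative
definition is in \fIinclude/uapi/linux/bpf.h\fP.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stdint.h>
#include <sys/socket.h>   /* AF_INET, AF_INET6 */

/* Local mirror of the struct bpf_sock_tuple layout; addresses and
   ports are stored in network byte order. */
struct sock_tuple_model {
    union {
        struct {
            uint32_t saddr;
            uint32_t daddr;
            uint16_t sport;
            uint16_t dport;
        } ipv4;
        struct {
            uint32_t saddr[4];
            uint32_t daddr[4];
            uint16_t sport;
            uint16_t dport;
        } ipv6;
    };
};

/* Return the tuple_size for a lookup, or 0 for an unsupported family. */
static uint32_t tuple_size_for(int family)
{
    struct sock_tuple_model t;

    if (family == AF_INET)
        return sizeof(t.ipv4);
    if (family == AF_INET6)
        return sizeof(t.ipv6);
    return 0;
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
This yields 12 bytes for IPv4 and 36 bytes for IPv6.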
.sp
If \fInetns\fP is a negative signed 32\-bit integer, then the
socket lookup table in the netns associated with the \fIctx\fP
will be used. For the TC hooks, this is the netns of the device
in the skb. For socket hooks, this is the netns of the socket.
If \fInetns\fP is any other signed 32\-bit value greater than or
equal to zero, then it specifies the ID of the netns relative to
the netns associated with the \fIctx\fP\&. \fInetns\fP values beyond the
range of 32\-bit integers are reserved for future use.
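.sp
The rule above can be expressed as a small classification function.
The names in this user\-space sketch are hypothetical. The cast to
\fBint32_t\fP relies on GCC and Clang reducing the value modulo
2\(ha32. The C standard leaves that conversion implementation\-defined.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stdint.h>

enum netns_choice { NETNS_CURRENT, NETNS_BY_ID, NETNS_RESERVED };

/* Classify the netns argument of the socket lookup helpers. */
static enum netns_choice classify_netns(uint64_t netns)
{
    if ((int32_t)netns < 0)
        return NETNS_CURRENT;     /* negative s32: netns of ctx */
    if (netns <= INT32_MAX)
        return NETNS_BY_ID;       /* 0 .. INT32_MAX: relative netns ID */
    return NETNS_RESERVED;        /* beyond 32-bit range: reserved */
}
.ft P
.fi
.UNINDENT
.UNINDENT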
.sp
All values for \fIflags\fP are reserved for future usage, and must
be left at zero.
.sp
This helper is available only if the kernel was compiled with
the \fBCONFIG_NET\fP configuration option.
.TP
.B Return
Pointer to \fBstruct bpf_sock\fP, or \fBNULL\fP in case of failure.
For sockets with the reuseport option, the \fBstruct bpf_sock\fP
result is from \fIreuse\fP\fB\->socks\fP[] using the hash of the
tuple.
.UNINDENT
.TP
.B \fBstruct bpf_sock *bpf_sk_lookup_udp(void *\fP\fIctx\fP\fB, struct bpf_sock_tuple *\fP\fItuple\fP\fB, u32\fP \fItuple_size\fP\fB, u64\fP \fInetns\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Look for a UDP socket matching \fItuple\fP, optionally in a child
network namespace \fInetns\fP\&. The return value must be checked,
and if non\-\fBNULL\fP, released via \fBbpf_sk_release\fP().
.sp
The \fIctx\fP should point to the context of the program, such as
the skb or socket (depending on the hook in use). This is used
to determine the base network namespace for the lookup.
.sp
\fItuple_size\fP must be one of:
.INDENT 7.0
.TP
.B \fBsizeof\fP(\fItuple\fP\fB\->ipv4\fP)
Look for an IPv4 socket.
.TP
.B \fBsizeof\fP(\fItuple\fP\fB\->ipv6\fP)
Look for an IPv6 socket.
.UNINDENT
.sp
If \fInetns\fP is a negative signed 32\-bit integer, then the
socket lookup table in the netns associated with the \fIctx\fP
will be used. For the TC hooks, this is the netns of the device
in the skb. For socket hooks, this is the netns of the socket.
If \fInetns\fP is any other signed 32\-bit value greater than or
equal to zero, then it specifies the ID of the netns relative to
the netns associated with the \fIctx\fP\&. \fInetns\fP values beyond the
range of 32\-bit integers are reserved for future use.
.sp
All values for \fIflags\fP are reserved for future usage, and must
be left at zero.
.sp
This helper is available only if the kernel was compiled with
the \fBCONFIG_NET\fP configuration option.
.TP
.B Return
Pointer to \fBstruct bpf_sock\fP, or \fBNULL\fP in case of failure.
For sockets with the reuseport option, the \fBstruct bpf_sock\fP
result is from \fIreuse\fP\fB\->socks\fP[] using the hash of the
tuple.
.UNINDENT
.TP
.B \fBint bpf_sk_release(struct bpf_sock *\fP\fIsock\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Release the reference held by \fIsock\fP\&. \fIsock\fP must be a
non\-\fBNULL\fP pointer that was returned from
\fBbpf_sk_lookup_xxx\fP().
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_map_push_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Push an element \fIvalue\fP onto \fImap\fP\&. \fIflags\fP is one of:
.INDENT 7.0
.TP
.B \fBBPF_EXIST\fP
If the queue/stack is full, the oldest element is
removed to make room for the new one.
.UNINDENT
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_map_pop_elem(struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIvalue\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Pop an element from \fImap\fP\&.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_map_peek_elem(struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIvalue\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Get an element from \fImap\fP without removing it.
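.sp
For a \fBBPF_MAP_TYPE_QUEUE\fP map, the push, pop, and peek semantics
described above can be modelled in user space with a fixed\-capacity
FIFO ring. The following are assumptions of this sketch:
.INDENT 7.0
.IP \(bu 2
all names;
.IP \(bu 2
the capacity of 4;
.IP \(bu 2
the error codes for a full or an empty queue (\fB\-E2BIG\fP and
\fB\-ENOENT\fP).
.UNINDENT
.sp
A \fBBPF_MAP_TYPE_STACK\fP map would instead pop the most recently
pushed element.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#include <errno.h>
#include <stdint.h>

#define QCAP 4                    /* hypothetical max_entries */

struct queue_model {
    uint32_t vals[QCAP];
    unsigned head;                /* index of the oldest element */
    unsigned count;
};

/* Push; with exist set (modelling BPF_EXIST), evict the oldest when full. */
static int q_push(struct queue_model *q, uint32_t v, int exist)
{
    if (q->count == QCAP) {
        if (!exist)
            return -E2BIG;
        q->head = (q->head + 1) % QCAP;   /* drop the oldest element */
        q->count--;
    }
    q->vals[(q->head + q->count) % QCAP] = v;
    q->count++;
    return 0;
}

/* Peek copies the oldest element without removing it. */
static int q_peek(const struct queue_model *q, uint32_t *v)
{
    if (q->count == 0)
        return -ENOENT;
    *v = q->vals[q->head];
    return 0;
}

/* Pop copies the oldest element and removes it. */
static int q_pop(struct queue_model *q, uint32_t *v)
{
    int err = q_peek(q, v);

    if (err)
        return err;
    q->head = (q->head + 1) % QCAP;
    q->count--;
    return 0;
}
.ft P
.fi
.UNINDENT
.UNINDENT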
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_msg_push_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
For socket policies, insert \fIlen\fP bytes into \fImsg\fP at offset
\fIstart\fP\&.
.sp
If a program of type \fBBPF_PROG_TYPE_SK_MSG\fP is run on a
\fImsg\fP, it may want to insert metadata or options into the \fImsg\fP\&.
This can later be read and used by any of the lower layer BPF
hooks.
.sp
This helper may fail under memory pressure (if a malloc
fails). In these cases, BPF programs will get an appropriate
error and will need to handle it.
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_msg_pop_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIpop\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Remove \fIpop\fP bytes from a \fImsg\fP starting at byte \fIstart\fP\&.
This may result in \fBENOMEM\fP errors under certain situations if
an allocation and copy are required due to a full ring buffer.
However, the helper will try to avoid doing the allocation
if possible. Other errors can occur if input parameters are
invalid, either because the \fIstart\fP byte is not a valid part of
the \fImsg\fP payload or because the \fIpop\fP value is too large.
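.sp
The effect of \fBbpf_msg_push_data\fP() and \fBbpf_msg_pop_data\fP() on the
payload can be sketched on a flat byte buffer. The sketch ignores the
scatter\-gather ring that backs a real \fImsg\fP. The function names and
the error codes for bad arguments are assumptions. The sketch zero\-fills
the inserted bytes. The description above does not specify their
contents, so a program should write them before relying on them.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Insert len zero bytes at offset start; *used grows by len. */
static int push_bytes(unsigned char *buf, size_t cap, size_t *used,
                      size_t start, size_t len)
{
    if (start > *used)
        return -EINVAL;               /* start outside the payload */
    if (len > cap - *used)
        return -ENOMEM;               /* no room: models allocation failure */
    memmove(buf + start + len, buf + start, *used - start);
    memset(buf + start, 0, len);
    *used += len;
    return 0;
}

/* Remove pop bytes starting at offset start; *used shrinks by pop. */
static int pop_bytes(unsigned char *buf, size_t *used,
                     size_t start, size_t pop)
{
    if (start >= *used || pop > *used - start)
        return -EINVAL;               /* bad start or pop too large */
    memmove(buf + start, buf + start + pop, *used - start - pop);
    *used -= pop;
    return 0;
}
.ft P
.fi
.UNINDENT
.UNINDENT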
.TP
.B Return
0 on success, or a negative error in case of failure.
.UNINDENT
.TP
.B \fBint bpf_rc_pointer_rel(void *\fP\fIctx\fP\fB, s32\fP \fIrel_x\fP\fB, s32\fP \fIrel_y\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
This helper is used in programs implementing IR decoding, to
report a successfully decoded pointer movement.
.sp
The \fIctx\fP should point to the lirc sample as passed into
the program.
.sp
This helper is only available if the kernel was compiled with
the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to
"\fBy\fP".
.TP
.B Return
0
.UNINDENT
.TP
.B \fBint bpf_spin_lock(struct bpf_spin_lock *\fP\fIlock\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Acquire a spinlock represented by the pointer \fIlock\fP, which is
stored as part of a value of a map. Taking the lock makes it possible
to safely update the rest of the fields in that value. The
spinlock can (and must) later be released with a call to
\fBbpf_spin_unlock\fP(\fIlock\fP).
.sp
Spinlocks in BPF programs come with a number of restrictions
and constraints:
.INDENT 7.0
.IP \(bu 2
\fBbpf_spin_lock\fP objects are only allowed inside maps of
types \fBBPF_MAP_TYPE_HASH\fP and \fBBPF_MAP_TYPE_ARRAY\fP (this
list could be extended in the future).
.IP \(bu 2
A BTF description of the map is mandatory.
.IP \(bu 2
The BPF program can take ONE lock at a time, since taking two
or more could cause deadlocks.
.IP \(bu 2
Only one \fBstruct bpf_spin_lock\fP is allowed per map element.
.IP \(bu 2
When the lock is taken, calls (either BPF to BPF or helpers)
are not allowed.
.IP \(bu 2
The \fBBPF_LD_ABS\fP and \fBBPF_LD_IND\fP instructions are not
allowed inside a spinlock\-ed region.
.IP \(bu 2
The BPF program MUST call \fBbpf_spin_unlock\fP() to release
the lock, on all execution paths, before it returns.
.IP \(bu 2
The BPF program can access \fBstruct bpf_spin_lock\fP only via
the \fBbpf_spin_lock\fP() and \fBbpf_spin_unlock\fP()
helpers. Loading or storing data into the \fBstruct
bpf_spin_lock\fP \fIlock\fP\fB;\fP field of a map is not allowed.
.IP \(bu 2
To use the \fBbpf_spin_lock\fP() helper, the BTF description
of the map value must be a struct and have a \fBstruct
bpf_spin_lock\fP \fIanyname\fP\fB;\fP field at the top level.
A lock nested inside another struct is not allowed.
.IP \(bu 2
The \fBstruct bpf_spin_lock\fP \fIlock\fP field in a map value must
be aligned on a multiple of 4 bytes in that value.
.IP \(bu 2
The syscall with command \fBBPF_MAP_LOOKUP_ELEM\fP does not copy
the \fBbpf_spin_lock\fP field to user space.
.IP \(bu 2
The syscall with command \fBBPF_MAP_UPDATE_ELEM\fP, or an update from
a BPF program, does not update the \fBbpf_spin_lock\fP field.
.IP \(bu 2
\fBbpf_spin_lock\fP cannot be on the stack or inside a
networking packet (it can only be inside a map value).
.IP \(bu 2
\fBbpf_spin_lock\fP is available to root only.
.IP \(bu 2
Tracing programs and socket filter programs cannot use
\fBbpf_spin_lock\fP() due to insufficient preemption checks
(but this may change in the future).
.IP \(bu 2
\fBbpf_spin_lock\fP is not allowed in inner maps of map\-in\-map.
.UNINDENT
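.sp
The layout rules above can be checked at compile time. This user\-space
sketch defines a local mirror of \fBstruct bpf_spin_lock\fP. In
\fIinclude/uapi/linux/bpf.h\fP, that struct wraps a single 32\-bit value.
The sketch also defines a hypothetical map value, \fBstruct counter_value\fP.
The BPF\-side usage appears only in a comment and is not compiled here.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the UAPI struct; drop it when including linux/bpf.h. */
struct bpf_spin_lock {
    uint32_t val;
};

/* Hypothetical map value: exactly one lock, at the top level (not
   nested in an inner struct), guarding the other fields. In a BPF
   program the fields would be updated as:
       bpf_spin_lock(&v->lock);
       v->packets++;
       v->bytes += len;
       bpf_spin_unlock(&v->lock);
   with no calls between the lock and the unlock. */
struct counter_value {
    struct bpf_spin_lock lock;
    uint64_t packets;
    uint64_t bytes;
};

_Static_assert(offsetof(struct counter_value, lock) % 4 == 0,
               "the lock must be 4-byte aligned within the value");
.ft P
.fi
.UNINDENT
.UNINDENT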
.TP
.B Return
0
.UNINDENT
.TP
.B \fBint bpf_spin_unlock(struct bpf_spin_lock *\fP\fIlock\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Release the \fIlock\fP previously locked by a call to
\fBbpf_spin_lock\fP(\fIlock\fP).
.TP
.B Return
0
.UNINDENT
.TP
.B \fBstruct bpf_sock *bpf_sk_fullsock(struct bpf_sock *\fP\fIsk\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
This helper gets a \fBstruct bpf_sock\fP pointer such
that all the fields in this \fBbpf_sock\fP can be accessed.
.TP
.B Return
A \fBstruct bpf_sock\fP pointer on success, or \fBNULL\fP in
case of failure.
.UNINDENT
.TP
.B \fBstruct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *\fP\fIsk\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
This helper gets a \fBstruct bpf_tcp_sock\fP pointer from a
\fBstruct bpf_sock\fP pointer.
.TP
.B Return
A \fBstruct bpf_tcp_sock\fP pointer on success, or \fBNULL\fP in
case of failure.
.UNINDENT
.TP
.B \fBint bpf_skb_ecn_set_ce(struct sk_buff *\fP\fIskb\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Set the ECN (Explicit Congestion Notification) field of the IP header
to \fBCE\fP (Congestion Encountered) if the current value is \fBECT\fP
(ECN Capable Transport). Otherwise, do nothing. Works with both IPv6
and IPv4.
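.sp
The ECN field is the two low\-order bits of the IPv4 TOS byte or of
the IPv6 traffic class. Its values are defined in RFC 3168:
.INDENT 7.0
.IP \(bu 2
00 is Not\-ECT.
.IP \(bu 2
01 is ECT(1) and 10 is ECT(0).
.IP \(bu 2
11 is CE.
.UNINDENT
.sp
The user\-space sketch below applies the rule above to such a byte.
The function name is hypothetical, and the sketch ignores checksums.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stdint.h>

#define ECN_MASK 0x03u   /* two low-order bits of TOS / traffic class */
#define ECN_CE   0x03u

/* Returns 1 if CE is set afterwards, 0 otherwise. */
static int ecn_set_ce(uint8_t *tos)
{
    if ((*tos & ECN_MASK) == 0)   /* Not-ECT: do nothing */
        return 0;
    *tos |= ECN_CE;               /* ECT(0) or ECT(1) becomes CE; DSCP kept */
    return 1;                     /* CE set now, or already present */
}
.ft P
.fi
.UNINDENT
.UNINDENT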
.TP
.B Return
1 if the \fBCE\fP flag is set (either by the current helper call
or because it was already present), 0 if it is not set.
.UNINDENT
.TP
.B \fBstruct bpf_sock *bpf_get_listener_sock(struct bpf_sock *\fP\fIsk\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Return a \fBstruct bpf_sock\fP pointer in \fBTCP_LISTEN\fP state.
\fBbpf_sk_release\fP() is unnecessary and not allowed.
.TP
.B Return
A \fBstruct bpf_sock\fP pointer on success, or \fBNULL\fP in
case of failure.
.UNINDENT
.TP
.B \fBstruct bpf_sock *bpf_skc_lookup_tcp(void *\fP\fIctx\fP\fB, struct bpf_sock_tuple *\fP\fItuple\fP\fB, u32\fP \fItuple_size\fP\fB, u64\fP \fInetns\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Look for a TCP socket matching \fItuple\fP, optionally in a child
network namespace \fInetns\fP\&. The return value must be checked,
and if non\-\fBNULL\fP, released via \fBbpf_sk_release\fP().
.sp
This function is identical to \fBbpf_sk_lookup_tcp\fP(), except
that it also returns timewait or request sockets. Use
\fBbpf_sk_fullsock\fP() or \fBbpf_tcp_sock\fP() to access the
full structure.
.sp
This helper is available only if the kernel was compiled with
the \fBCONFIG_NET\fP configuration option.
.TP
.B Return
Pointer to \fBstruct bpf_sock\fP, or \fBNULL\fP in case of failure.
For sockets with the reuseport option, the \fBstruct bpf_sock\fP
result is from \fIreuse\fP\fB\->socks\fP[] using the hash of the
tuple.
.UNINDENT
.TP
.B \fBint bpf_tcp_check_syncookie(struct bpf_sock *\fP\fIsk\fP\fB, void *\fP\fIiph\fP\fB, u32\fP \fIiph_len\fP\fB, struct tcphdr *\fP\fIth\fP\fB, u32\fP \fIth_len\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Check whether \fIiph\fP and \fIth\fP contain a valid SYN cookie ACK for
the listening socket in \fIsk\fP\&.
.sp
\fIiph\fP points to the start of the IPv4 or IPv6 header, while
\fIiph_len\fP contains \fBsizeof\fP(\fBstruct iphdr\fP) or
\fBsizeof\fP(\fBstruct ipv6hdr\fP).
.sp
\fIth\fP points to the start of the TCP header, while \fIth_len\fP
contains \fBsizeof\fP(\fBstruct tcphdr\fP).
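.sp
In user space, glibc offers comparable fixed\-size header types:
.INDENT 7.0
.IP \(bu 2
\fBstruct iphdr\fP in \fI<netinet/ip.h>\fP;
.IP \(bu 2
\fBstruct ip6_hdr\fP in \fI<netinet/ip6.h>\fP;
.IP \(bu 2
\fBstruct tcphdr\fP in \fI<netinet/tcp.h>\fP.
.UNINDENT
.sp
The sketch below picks the \fIiph_len\fP value from the version nibble
of the first header byte. Its function name is hypothetical. It assumes
that these glibc types have the same sizes (20, 40, and 20 bytes) as
the kernel ones.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#define _DEFAULT_SOURCE            /* expose the glibc header structs */
#include <stdint.h>
#include <netinet/ip.h>            /* struct iphdr */
#include <netinet/ip6.h>           /* struct ip6_hdr */
#include <netinet/tcp.h>           /* struct tcphdr */

/* Return the iph_len to pass for a packet starting at pkt, or 0 if
   the IP version is neither 4 nor 6. */
static uint32_t iph_len_for(const uint8_t *pkt)
{
    switch (pkt[0] >> 4) {         /* version is the high nibble */
    case 4:
        return sizeof(struct iphdr);
    case 6:
        return sizeof(struct ip6_hdr);
    default:
        return 0;
    }
}
.ft P
.fi
.UNINDENT
.UNINDENT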
.TP
.B Return
0 if \fIiph\fP and \fIth\fP are a valid SYN cookie ACK, or a negative
error otherwise.
.UNINDENT
.TP
.B \fBint bpf_sysctl_get_name(struct bpf_sysctl *\fP\fIctx\fP\fB, char *\fP\fIbuf\fP\fB, size_t\fP \fIbuf_len\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Get the name of the sysctl in /proc/sys/ and copy it into the buffer
\fIbuf\fP of size \fIbuf_len\fP provided by the program.
.sp
The buffer is always NUL\-terminated, unless it\(aqs zero\-sized.
.sp
If \fIflags\fP is zero, the full name (e.g. "net/ipv4/tcp_mem") is
copied. Use the \fBBPF_F_SYSCTL_BASE_NAME\fP flag to copy only the
base name (e.g. "tcp_mem").
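.sp
The copy rules can be sketched in user space as follows. The function
and its \fIbase_only\fP parameter are hypothetical. \fIbase_only\fP
stands in for \fBBPF_F_SYSCTL_BASE_NAME\fP. The result for a zero\-sized
buffer is an assumption.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Copy the full or base name of a sysctl into buf, always
   NUL-terminating a non-empty buffer. Returns the number of characters
   copied (excluding the NUL), or -E2BIG if the name was truncated. */
static int copy_sysctl_name(const char *full, int base_only,
                            char *buf, size_t buf_len)
{
    const char *name = full;
    size_t len;

    if (base_only) {
        const char *slash = strrchr(full, '/');
        if (slash)
            name = slash + 1;
    }
    len = strlen(name);
    if (buf_len == 0)
        return -E2BIG;             /* assumption: nothing can be stored */
    if (len >= buf_len) {
        memcpy(buf, name, buf_len - 1);
        buf[buf_len - 1] = 0;      /* truncated but terminated */
        return -E2BIG;
    }
    memcpy(buf, name, len + 1);    /* includes the terminating NUL */
    return (int)len;
}
.ft P
.fi
.UNINDENT
.UNINDENT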
.TP
.B Return
Number of characters copied (not including the trailing NUL).
.sp
\fB\-E2BIG\fP if the buffer wasn\(aqt big enough (\fIbuf\fP will contain
a truncated name in this case).
.UNINDENT
.TP
.B \fBint bpf_sysctl_get_current_value(struct bpf_sysctl *\fP\fIctx\fP\fB, char *\fP\fIbuf\fP\fB, size_t\fP \fIbuf_len\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Get the current value of the sysctl as it is presented in /proc/sys
(including the newline, etc.), and copy it as a string into the buffer
\fIbuf\fP of size \fIbuf_len\fP provided by the program.
.sp
The whole value is copied, regardless of the file position at which
user space issued, for example, sys_read.
.sp
The buffer is always NUL\-terminated, unless it\(aqs zero\-sized.
.TP
.B Return
Number of characters copied (not including the trailing NUL).
.sp
\fB\-E2BIG\fP if the buffer wasn\(aqt big enough (\fIbuf\fP will contain
a truncated value in this case).
.sp
\fB\-EINVAL\fP if the current value was unavailable, e.g. because the
sysctl is uninitialized and read returns \-EIO for it.
.UNINDENT
.TP
.B \fBint bpf_sysctl_get_new_value(struct bpf_sysctl *\fP\fIctx\fP\fB, char *\fP\fIbuf\fP\fB, size_t\fP \fIbuf_len\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Get the new value being written by user space to the sysctl (before
the actual write happens) and copy it as a string into the buffer
\fIbuf\fP of size \fIbuf_len\fP provided by the program.
.sp
User space may write the new value at a file position > 0.
.sp
The buffer is always NUL\-terminated, unless it\(aqs zero\-sized.
.TP
.B Return
Number of characters copied (not including the trailing NUL).
.sp
\fB\-E2BIG\fP if the buffer wasn\(aqt big enough (\fIbuf\fP will contain
a truncated value in this case).
.sp
\fB\-EINVAL\fP if the sysctl is being read.
.UNINDENT
.TP
.B \fBint bpf_sysctl_set_new_value(struct bpf_sysctl *\fP\fIctx\fP\fB, const char *\fP\fIbuf\fP\fB, size_t\fP \fIbuf_len\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Override the new value being written by user space to the sysctl with
the value provided by the program in buffer \fIbuf\fP of size \fIbuf_len\fP\&.
.sp
\fIbuf\fP should contain a string in the same form as provided by user
space on sysctl write.
.sp
User space may write the new value at a file position > 0. To override
the whole sysctl value, the file position should be set to zero.
.TP
.B Return
0 on success.
.sp
\fB\-E2BIG\fP if the \fIbuf_len\fP is too big.
.sp
\fB\-EINVAL\fP if the sysctl is being read.
.UNINDENT
.TP
.B \fBint bpf_strtol(const char *\fP\fIbuf\fP\fB, size_t\fP \fIbuf_len\fP\fB, u64\fP \fIflags\fP\fB, long *\fP\fIres\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Convert the initial part of the string from buffer \fIbuf\fP of
size \fIbuf_len\fP to a long integer according to the given base
and save the result in \fIres\fP\&.
.sp
The string may begin with an arbitrary amount of white space
(as determined by \fBisspace\fP(3)) followed by a single
optional \(aq\fB\-\fP\(aq sign.
.sp
The five least significant bits of \fIflags\fP encode the base; the
other bits are currently unused.
.sp
The base must be either 8, 10, 16, or 0 to detect it automatically,
similar to user\-space \fBstrtol\fP(3).
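.sp
The following user\-space sketch approximates this behavior with
\fBstrtol\fP(3). The function name and its 64\-byte scratch buffer are
assumptions. Note also that \fBstrtol\fP(3) accepts a leading
\(aq\fB+\fP\(aq sign, which the description above does not mention.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Approximate bpf_strtol(): the base comes from the five least
   significant bits of flags, and buf need not be NUL-terminated. */
static int model_strtol(const char *buf, size_t buf_len,
                        unsigned long long flags, long *res)
{
    char tmp[64];
    int base = (int)(flags & 0x1f);
    char *end;
    long val;

    if (base != 0 && base != 8 && base != 10 && base != 16)
        return -EINVAL;               /* unsupported base */
    if (buf_len >= sizeof(tmp))
        return -EINVAL;               /* limit of this sketch only */
    memcpy(tmp, buf, buf_len);
    tmp[buf_len] = 0;

    errno = 0;
    val = strtol(tmp, &end, base);
    if (end == tmp)
        return -EINVAL;               /* no valid digits */
    if (errno == ERANGE)
        return -ERANGE;               /* value out of range */
    *res = val;
    return (int)(end - tmp);          /* characters consumed */
}
.ft P
.fi
.UNINDENT
.UNINDENT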
.TP
.B Return
Number of characters consumed on success. Must be positive but
no more than \fIbuf_len\fP\&.
.sp
\fB\-EINVAL\fP if no valid digits were found or an unsupported base
was provided.
.sp
\fB\-ERANGE\fP if the resulting value was out of range.
.UNINDENT
.TP
.B \fBint bpf_strtoul(const char *\fP\fIbuf\fP\fB, size_t\fP \fIbuf_len\fP\fB, u64\fP \fIflags\fP\fB, unsigned long *\fP\fIres\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Convert the initial part of the string from buffer \fIbuf\fP of
size \fIbuf_len\fP to an unsigned long integer according to the
given base and save the result in \fIres\fP\&.
.sp
The string may begin with an arbitrary amount of white space
(as determined by \fBisspace\fP(3)).
.sp
The five least significant bits of \fIflags\fP encode the base; the
other bits are currently unused.
.sp
The base must be either 8, 10, 16, or 0 to detect it automatically,
similar to user\-space \fBstrtoul\fP(3).
.TP
.B Return
Number of characters consumed on success. Must be positive but
no more than \fIbuf_len\fP\&.
.sp
\fB\-EINVAL\fP if no valid digits were found or an unsupported base
was provided.
.sp
\fB\-ERANGE\fP if the resulting value was out of range.
.UNINDENT
.TP
.B \fBvoid *bpf_sk_storage_get(struct bpf_map *\fP\fImap\fP\fB, struct bpf_sock *\fP\fIsk\fP\fB, void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Get a bpf\-local\-storage from a \fIsk\fP\&.
.sp
Logically, it can be thought of as getting the value from
a \fImap\fP with \fIsk\fP as the \fBkey\fP\&. From this
perspective, the usage is not much different from
\fBbpf_map_lookup_elem\fP(\fImap\fP, \fB&\fP\fIsk\fP), except that this
helper enforces that the key must be a full socket and that the map
must be of type \fBBPF_MAP_TYPE_SK_STORAGE\fP\&.
.sp
Underneath, the value is stored locally at \fIsk\fP instead of
the \fImap\fP\&. The \fImap\fP is used as the bpf\-local\-storage
"type". The bpf\-local\-storage "type" (i.e. the \fImap\fP) is
searched against all bpf\-local\-storages residing at \fIsk\fP\&.
.sp
The optional flag \fBBPF_SK_STORAGE_GET_F_CREATE\fP can be passed
in \fIflags\fP so that a new bpf\-local\-storage is
created if one does not exist. \fIvalue\fP can be used
together with \fBBPF_SK_STORAGE_GET_F_CREATE\fP to specify
the initial value of a bpf\-local\-storage. If \fIvalue\fP is
\fBNULL\fP, the new bpf\-local\-storage will be zero initialized.
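.sp
The following user\-space model illustrates that the storage lives with
the socket, while the map only names its type. All names are hypothetical:
.INDENT 7.0
.IP \(bu 2
The integer \fImap_id\fP stands in for \fImap\fP.
.IP \(bu 2
\fIcreate\fP stands in for \fBBPF_SK_STORAGE_GET_F_CREATE\fP.
.IP \(bu 2
The limit of four storage types is arbitrary.
.IP \(bu 2
\fBsk_storage_delete\fP() mirrors the return values documented for
\fBbpf_sk_storage_delete\fP().
.UNINDENT
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define MAX_STORAGE_TYPES 4           /* hypothetical limit */

struct value { long rx; long tx; };   /* hypothetical storage value */

/* Each socket carries its own storages, indexed by storage "type". */
struct sock_model {
    struct value *storage[MAX_STORAGE_TYPES];
};

static struct value *sk_storage_get(int map_id, struct sock_model *sk,
                                    const struct value *init, int create)
{
    struct value *v;

    if (map_id < 0 || map_id >= MAX_STORAGE_TYPES)
        return NULL;
    if (sk->storage[map_id] || !create)
        return sk->storage[map_id];   /* existing storage, or NULL */
    v = calloc(1, sizeof(*v));        /* zero-initialized by default */
    if (!v)
        return NULL;
    if (init)
        memcpy(v, init, sizeof(*v));  /* optional initial value */
    sk->storage[map_id] = v;
    return v;
}

static int sk_storage_delete(int map_id, struct sock_model *sk)
{
    if (map_id < 0 || map_id >= MAX_STORAGE_TYPES || !sk->storage[map_id])
        return -ENOENT;
    free(sk->storage[map_id]);
    sk->storage[map_id] = NULL;
    return 0;
}
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
An existing storage is returned as is; the initial value only matters
when a storage is created.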
.TP
.B Return
A bpf\-local\-storage pointer is returned on success.
.sp
\fBNULL\fP if not found or there was an error in adding
a new bpf\-local\-storage.
.UNINDENT
.TP
.B \fBint bpf_sk_storage_delete(struct bpf_map *\fP\fImap\fP\fB, struct bpf_sock *\fP\fIsk\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Delete a bpf\-local\-storage from a \fIsk\fP\&.
.TP
.B Return
0 on success.
.sp
\fB\-ENOENT\fP if the bpf\-local\-storage cannot be found.
.UNINDENT
.TP
.B \fBint bpf_send_signal(u32\fP \fIsig\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Send signal \fIsig\fP to the current task.
.TP
.B Return
0 on success or if the signal was successfully queued.
.sp
\fB\-EBUSY\fP if the work queue under NMI is full.
.sp
\fB\-EINVAL\fP if \fIsig\fP is invalid.
.sp
\fB\-EPERM\fP if there is no permission to send the \fIsig\fP\&.
.sp
\fB\-EAGAIN\fP if the BPF program can try again.
.UNINDENT
.TP
.B \fBs64 bpf_tcp_gen_syncookie(struct bpf_sock *\fP\fIsk\fP\fB, void *\fP\fIiph\fP\fB, u32\fP \fIiph_len\fP\fB, struct tcphdr *\fP\fIth\fP\fB, u32\fP \fIth_len\fP\fB)\fP
.INDENT 7.0
.TP
.B Description
Try to issue a SYN cookie for the packet with corresponding
IP/TCP headers, \fIiph\fP and \fIth\fP, on the listening socket in \fIsk\fP\&.
.sp
\fIiph\fP points to the start of the IPv4 or IPv6 header, while
\fIiph_len\fP contains \fBsizeof\fP(\fBstruct iphdr\fP) or
\fBsizeof\fP(\fBstruct ipv6hdr\fP).
.sp
\fIth\fP points to the start of the TCP header, while \fIth_len\fP
contains the length of the TCP header.
.TP
.B Return
On success, the lower 32 bits hold the generated SYN cookie,
followed by 16 bits that hold the MSS value for that cookie;
the top 16 bits are unused.
.sp
On failure, the returned value is one of the following:
.sp
\fB\-EINVAL\fP SYN cookie cannot be issued due to error
.sp
\fB\-ENOENT\fP SYN cookie should not be issued (no SYN flood)
.sp
\fB\-EOPNOTSUPP\fP kernel configuration does not enable SYN cookies
.sp
\fB\-EPROTONOSUPPORT\fP IP packet version is not 4 or 6
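.sp
A caller can split a successful return value into its fields with
shifts and masks. This plain C sketch uses a hypothetical function
name and passes negative errors through unchanged.
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stdint.h>

/* Split a bpf_tcp_gen_syncookie() return value. Returns 0 and fills
   cookie and mss on success, or passes a negative error through. */
static int decode_syncookie(int64_t ret, uint32_t *cookie, uint16_t *mss)
{
    if (ret < 0)
        return (int)ret;                       /* e.g. -ENOENT: no SYN flood */
    *cookie = (uint32_t)(ret & 0xffffffff);    /* lower 32 bits */
    *mss = (uint16_t)((ret >> 32) & 0xffff);   /* next 16 bits */
    return 0;
}
.ft P
.fi
.UNINDENT
.UNINDENT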
.UNINDENT
.UNINDENT
.SH EXAMPLES
.sp
Example usage for most of the eBPF helpers listed in this manual page is
available within the Linux kernel sources, at the following locations:
.INDENT 0.0
.IP \(bu 2
\fIsamples/bpf/\fP
.IP \(bu 2
\fItools/testing/selftests/bpf/\fP
.UNINDENT
.SH LICENSE
.sp
eBPF programs can have an associated license, passed along with the bytecode
instructions to the kernel when the programs are loaded. The format for that
string is identical to the one in use for kernel modules (dual licenses, such
as "Dual BSD/GPL", may be used). Some helper functions are only accessible to
programs that are compatible with the GNU General Public License (GPL).
.sp
In order to use such helpers, the eBPF program must be loaded with the correct
license string passed (via \fBattr\fP) to the \fBbpf\fP() system call, and this
generally translates into the C source code of the program containing a line
similar to the following:
.INDENT 0.0
.INDENT 3.5
.sp
.nf
.ft C
char ____license[] __attribute__((section("license"), used)) = "GPL";
.ft P
.fi
.UNINDENT
.UNINDENT
.SH IMPLEMENTATION
.sp
This manual page is an effort to document the existing eBPF helper functions.
However, as of this writing, the BPF subsystem is under heavy development. New
eBPF program or map types are added, along with new helper functions. Some
helpers are occasionally made available for additional program types. So in
spite of the efforts of the community, this page might not be up\-to\-date. If
you want to check for yourself which helper functions exist in your kernel, or
what types of programs they can support, here are some files in the kernel tree
that you may be interested in:
.INDENT 0.0
.IP \(bu 2
\fIinclude/uapi/linux/bpf.h\fP is the main BPF header. It contains the full list
of all helper functions, as well as many other BPF definitions including most
of the flags, structs or constants used by the helpers.
.IP \(bu 2
\fInet/core/filter.c\fP contains the definition of most network\-related helper
functions, and the list of program types from which they can be used.
.IP \(bu 2
\fIkernel/trace/bpf_trace.c\fP is the equivalent for most tracing program\-related
helpers.
.IP \(bu 2
\fIkernel/bpf/verifier.c\fP contains the functions used to check that valid types
of eBPF maps are used with a given helper function.
.IP \(bu 2
The \fIkernel/bpf/\fP directory contains other files in which additional helpers
are defined (for cgroups, sockmaps, etc.).
.UNINDENT
.sp
Compatibility between helper functions and program types can generally be found
in the files where helper functions are defined. Look for the \fBstruct
bpf_func_proto\fP objects and for functions returning them: these functions
contain a list of helpers that a given program type can call. Note that the
\fBdefault:\fP label of the \fBswitch ... case\fP used to filter helpers can call
other functions, themselves allowing access to additional helpers. The
requirement for a GPL license is also specified in those \fBstruct bpf_func_proto\fP\&.
.sp
Compatibility between helper functions and map types can be found in the
\fBcheck_map_func_compatibility\fP() function in file \fIkernel/bpf/verifier.c\fP\&.
.sp
Helper functions that invalidate the checks on \fBdata\fP and \fBdata_end\fP
pointers for network processing are listed in function
\fBbpf_helper_changes_pkt_data\fP() in file \fInet/core/filter.c\fP\&.
.SH SEE ALSO
.sp
\fBbpf\fP(2),
\fBcgroups\fP(7),
\fBip\fP(8),
\fBperf_event_open\fP(2),
\fBsendmsg\fP(2),
\fBsocket\fP(7),
\fBtc\-bpf\fP(8)
.\" Generated by docutils manpage writer.
.