Commit | Line | Data |
---|---|---|
53666f6c MK |
1 | .\" Man page generated from reStructuredText. |
2 | .\" Copyright (C) All BPF authors and contributors from 2014 to present. | |
3 | .\" See git log include/uapi/linux/bpf.h in kernel tree for details. | |
880c3f67 | 4 | .\" |
53666f6c MK |
5 | .\" %%%LICENSE_START(VERBATIM) |
6 | .\" Permission is granted to make and distribute verbatim copies of this | |
7 | .\" manual provided the copyright notice and this permission notice are | |
8 | .\" preserved on all copies. | |
880c3f67 | 9 | .\" |
53666f6c MK |
10 | .\" Permission is granted to copy and distribute modified versions of this |
11 | .\" manual under the conditions for verbatim copying, provided that the | |
12 | .\" entire resulting derived work is distributed under the terms of a | |
13 | .\" permission notice identical to this one. | |
880c3f67 | 14 | .\" |
53666f6c MK |
15 | .\" Since the Linux kernel and libraries are constantly changing, this |
16 | .\" manual page may be incorrect or out-of-date. The author(s) assume no | |
17 | .\" responsibility for errors or omissions, or for damages resulting from | |
18 | .\" the use of the information contained herein. The author(s) may not | |
19 | .\" have taken the same level of care in the production of this manual, | |
20 | .\" which is licensed free of charge, as they might when working | |
21 | .\" professionally. | |
880c3f67 | 22 | .\" |
53666f6c MK |
23 | .\" Formatted or processed versions of this manual, if unaccompanied by |
24 | .\" the source, must acknowledge the copyright and authors of this work. | |
25 | .\" %%%LICENSE_END | |
880c3f67 | 26 | .\" |
53666f6c MK |
27 | .\" Please do not edit this file. It was generated from the documentation |
28 | .\" located in file include/uapi/linux/bpf.h of the Linux kernel sources | |
29 | .\" (helpers description), and from scripts/bpf_helpers_doc.py in the same | |
30 | .\" repository (header and footer). | |
8d1b260e | 31 | .TH BPF-HELPERS 7 2019-03-06 "Linux" "Linux Programmer's Manual" |
53666f6c MK |
32 | .SH NAME |
33 | BPF-HELPERS \- list of eBPF helper functions | |
53666f6c | 34 | .nr rst2man-indent-level 0 |
53666f6c MK |
35 | .de1 rstReportMargin |
36 | \\$1 \\n[an-margin] | |
37 | level \\n[rst2man-indent-level] | |
38 | level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] | |
53666f6c MK |
39 | \\n[rst2man-indent0] |
40 | \\n[rst2man-indent1] | |
41 | \\n[rst2man-indent2] | |
42 | .. | |
43 | .de1 INDENT | |
44 | .\" .rstReportMargin pre: | |
45 | . RS \\$1 | |
46 | . nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] | |
47 | . nr rst2man-indent-level +1 | |
48 | .\" .rstReportMargin post: | |
49 | .. | |
50 | .de UNINDENT | |
51 | . RE | |
52 | .\" indent \\n[an-margin] | |
53 | .\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] | |
54 | .nr rst2man-indent-level -1 | |
55 | .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] | |
56 | .in \\n[rst2man-indent\\n[rst2man-indent-level]]u | |
57 | .. | |
58 | .SH DESCRIPTION | |
59 | .sp | |
60 | The extended Berkeley Packet Filter (eBPF) subsystem consists of programs | |
61 | written in a pseudo\-assembly language, then attached to one of several | |
62 | kernel hooks and run in reaction to specific events. This framework differs | |
63 | from the older, "classic" BPF (or "cBPF") in several aspects, one of them being | |
64 | the ability to call special functions (or "helpers") from within a program. | |
65 | These functions are restricted to a white\-list of helpers defined in the | |
66 | kernel. | |
67 | .sp | |
68 | These helpers are used by eBPF programs to interact with the system, or with | |
69 | the context in which they work. For instance, they can be used to print | |
70 | debugging messages, to get the time since the system was booted, to interact | |
71 | with eBPF maps, or to manipulate network packets. Since there are several eBPF | |
72 | program types, and since they do not all run in the same context, each program type | |
73 | can only call a subset of those helpers. | |
74 | .sp | |
75 | Due to eBPF conventions, a helper cannot have more than five arguments. | |
76 | .sp | |
77 | Internally, eBPF programs call directly into the compiled helper functions | |
78 | without requiring any foreign\-function interface. As a result, calling helpers | |
79 | introduces no overhead, thus offering excellent performance. | |
80 | .sp | |
81 | This document is an attempt to list and document the helpers available to eBPF | |
82 | developers. They are listed in chronological order (the oldest helpers in the | |
83 | kernel at the top). | |
84 | .SH HELPERS | |
85 | .INDENT 0.0 | |
86 | .TP | |
87 | .B \fBvoid *bpf_map_lookup_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP | |
88 | .INDENT 7.0 | |
89 | .TP | |
90 | .B Description | |
91 | Perform a lookup in \fImap\fP for an entry associated with \fIkey\fP\&. | |
92 | .TP | |
93 | .B Return | |
94 | Map value associated with \fIkey\fP, or \fBNULL\fP if no entry was | |
95 | found. | |
96 | .UNINDENT | |
97 | .TP | |
98 | .B \fBint bpf_map_update_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB, const void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
99 | .INDENT 7.0 | |
100 | .TP | |
101 | .B Description | |
102 | Add or update the value of the entry associated with \fIkey\fP in | |
103 | \fImap\fP with \fIvalue\fP\&. \fIflags\fP is one of: | |
104 | .INDENT 7.0 | |
105 | .TP | |
106 | .B \fBBPF_NOEXIST\fP | |
107 | The entry for \fIkey\fP must not exist in the map. | |
108 | .TP | |
109 | .B \fBBPF_EXIST\fP | |
110 | The entry for \fIkey\fP must already exist in the map. | |
111 | .TP | |
112 | .B \fBBPF_ANY\fP | |
113 | No condition on the existence of the entry for \fIkey\fP\&. | |
114 | .UNINDENT | |
115 | .sp | |
116 | Flag value \fBBPF_NOEXIST\fP cannot be used for maps of types | |
117 | \fBBPF_MAP_TYPE_ARRAY\fP or \fBBPF_MAP_TYPE_PERCPU_ARRAY\fP (all | |
118 | elements always exist); the helper would return an error. | |
119 | .TP | |
120 | .B Return | |
121 | 0 on success, or a negative error in case of failure. | |
122 | .UNINDENT | |
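The existence conditions attached to the three flag values above can be modelled in plain user-space C. The following is only a sketch: the function name and the `SKETCH_` constants are hypothetical, the numeric values are assumed to mirror the definitions in include/uapi/linux/bpf.h, and the specific error codes (`EEXIST`, `ENOENT`, `EINVAL`) are an assumption, since the text above only promises "a negative error".

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror BPF_ANY, BPF_NOEXIST and BPF_EXIST in the UAPI header. */
enum {
	SKETCH_BPF_ANY = 0,
	SKETCH_BPF_NOEXIST = 1,
	SKETCH_BPF_EXIST = 2,
};

/*
 * Hypothetical model of the existence check only. Given whether an entry
 * for the key is already present, decide whether an update may proceed.
 * Returns 0 if allowed, or a negative errno value otherwise (the exact
 * error codes are an assumption of this sketch).
 */
static int sketch_check_update_flags(bool entry_exists, uint64_t flags)
{
	if (flags > SKETCH_BPF_EXIST)
		return -EINVAL;	/* not one of the three values listed */
	if (flags == SKETCH_BPF_NOEXIST && entry_exists)
		return -EEXIST;	/* the entry must not exist yet */
	if (flags == SKETCH_BPF_EXIST && !entry_exists)
		return -ENOENT;	/* the entry must already exist */
	return 0;
}
```

Under this model, an array map, where every element always exists, corresponds to `entry_exists` being permanently true, which is why `BPF_NOEXIST` can never succeed there.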
123 | .TP | |
124 | .B \fBint bpf_map_delete_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP | |
125 | .INDENT 7.0 | |
126 | .TP | |
127 | .B Description | |
128 | Delete entry with \fIkey\fP from \fImap\fP\&. | |
129 | .TP | |
130 | .B Return | |
131 | 0 on success, or a negative error in case of failure. | |
132 | .UNINDENT | |
133 | .TP | |
2223d7df MK |
134 | .B \fBint bpf_map_push_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP |
135 | .INDENT 7.0 | |
136 | .TP | |
137 | .B Description | |
138 | Push an element \fIvalue\fP into \fImap\fP\&. \fIflags\fP is one of: | |
139 | .sp | |
140 | \fBBPF_EXIST\fP | |
141 | If the queue/stack is full, the oldest element is removed to | |
142 | make room for the new one. | |
143 | .TP | |
144 | .B Return | |
145 | 0 on success, or a negative error in case of failure. | |
146 | .UNINDENT | |
147 | .TP | |
53666f6c MK |
148 | .B \fBint bpf_probe_read(void *\fP\fIdst\fP\fB, u32\fP \fIsize\fP\fB, const void *\fP\fIsrc\fP\fB)\fP |
149 | .INDENT 7.0 | |
150 | .TP | |
151 | .B Description | |
152 | For tracing programs, safely attempt to read \fIsize\fP bytes from | |
153 | address \fIsrc\fP and store the data in \fIdst\fP\&. | |
154 | .TP | |
155 | .B Return | |
156 | 0 on success, or a negative error in case of failure. | |
157 | .UNINDENT | |
158 | .TP | |
159 | .B \fBu64 bpf_ktime_get_ns(void)\fP | |
160 | .INDENT 7.0 | |
161 | .TP | |
162 | .B Description | |
163 | Return the time elapsed since system boot, in nanoseconds. | |
164 | .TP | |
165 | .B Return | |
166 | Current \fIktime\fP\&. | |
167 | .UNINDENT | |
168 | .TP | |
169 | .B \fBint bpf_trace_printk(const char *\fP\fIfmt\fP\fB, u32\fP \fIfmt_size\fP\fB, ...)\fP | |
170 | .INDENT 7.0 | |
171 | .TP | |
172 | .B Description | |
173 | This helper is a "printk()\-like" facility for debugging. It | |
174 | prints a message defined by format \fIfmt\fP (of size \fIfmt_size\fP) | |
175 | to file \fI/sys/kernel/debug/tracing/trace\fP from DebugFS, if | |
176 | available. It can take up to three additional \fBu64\fP | |
177 | arguments (as for any eBPF helper, the total number of arguments is | |
178 | limited to five). | |
179 | .sp | |
180 | Each time the helper is called, it appends a line to the trace. | |
181 | The format of the trace is customizable, and the exact output | |
182 | one will get depends on the options set in | |
183 | \fI/sys/kernel/debug/tracing/trace_options\fP (see also the | |
184 | \fIREADME\fP file under the same directory). However, it usually | |
185 | defaults to something like: | |
186 | .INDENT 7.0 | |
187 | .INDENT 3.5 | |
188 | .sp | |
189 | .nf | |
190 | .ft C | |
191 | telnet\-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg> | |
192 | .ft P | |
193 | .fi | |
194 | .UNINDENT | |
195 | .UNINDENT | |
196 | .sp | |
197 | In the above: | |
198 | .INDENT 7.0 | |
199 | .INDENT 3.5 | |
200 | .INDENT 0.0 | |
201 | .IP \(bu 2 | |
202 | \fBtelnet\fP is the name of the current task. | |
203 | .IP \(bu 2 | |
204 | \fB470\fP is the PID of the current task. | |
205 | .IP \(bu 2 | |
206 | \fB001\fP is the CPU number on which the task is | |
207 | running. | |
208 | .IP \(bu 2 | |
209 | In \fB\&.N..\fP, each character refers to a set of | |
210 | options (whether irqs are enabled, scheduling | |
211 | options, whether hard/softirqs are running, level of | |
212 | preempt_disabled respectively). \fBN\fP means that | |
213 | \fBTIF_NEED_RESCHED\fP and \fBPREEMPT_NEED_RESCHED\fP | |
214 | are set. | |
215 | .IP \(bu 2 | |
216 | \fB419421.045894\fP is a timestamp. | |
217 | .IP \(bu 2 | |
218 | \fB0x00000001\fP is a fake value used by BPF for the | |
219 | instruction pointer register. | |
220 | .IP \(bu 2 | |
221 | \fB<formatted msg>\fP is the message formatted with | |
222 | \fIfmt\fP\&. | |
223 | .UNINDENT | |
224 | .UNINDENT | |
225 | .UNINDENT | |
226 | .sp | |
227 | The conversion specifiers supported by \fIfmt\fP are similar to, but | |
228 | more limited than, those of printk(). They are \fB%d\fP, \fB%i\fP, | |
229 | \fB%u\fP, \fB%x\fP, \fB%ld\fP, \fB%li\fP, \fB%lu\fP, \fB%lx\fP, \fB%lld\fP, | |
230 | \fB%lli\fP, \fB%llu\fP, \fB%llx\fP, \fB%p\fP, \fB%s\fP\&. No modifier (size | |
231 | of field, padding with zeroes, etc.) is available, and the | |
232 | helper will return \fB\-EINVAL\fP (but print nothing) if it | |
233 | encounters an unknown specifier. | |
234 | .sp | |
235 | Also, note that \fBbpf_trace_printk\fP() is slow, and should | |
236 | only be used for debugging purposes. For this reason, the first | |
237 | time this helper is used (or more precisely, when | |
238 | \fBtrace_printk\fP() buffers are allocated), a notice block | |
239 | (spanning several lines) is printed to kernel logs, stating that | |
240 | the helper should not be used "for production use". For passing values | |
241 | to user space, perf events should be preferred. | |
242 | .TP | |
243 | .B Return | |
244 | The number of bytes written to the buffer, or a negative error | |
245 | in case of failure. | |
246 | .UNINDENT | |
247 | .TP | |
248 | .B \fBu32 bpf_get_prandom_u32(void)\fP | |
249 | .INDENT 7.0 | |
250 | .TP | |
251 | .B Description | |
252 | Get a pseudo\-random number. | |
253 | .sp | |
254 | From a security point of view, this helper uses its own | |
255 | pseudo\-random internal state, and cannot be used to infer the | |
256 | seed of other random functions in the kernel. However, it is | |
257 | essential to note that the generator used by the helper is not | |
258 | cryptographically secure. | |
259 | .TP | |
260 | .B Return | |
261 | A random 32\-bit unsigned value. | |
262 | .UNINDENT | |
263 | .TP | |
264 | .B \fBu32 bpf_get_smp_processor_id(void)\fP | |
265 | .INDENT 7.0 | |
266 | .TP | |
267 | .B Description | |
268 | Get the SMP (symmetric multiprocessing) processor id. Note that | |
269 | all programs run with preemption disabled, which means that the | |
270 | SMP processor id is stable throughout the execution of the | |
271 | program. | |
272 | .TP | |
273 | .B Return | |
274 | The SMP id of the processor running the program. | |
275 | .UNINDENT | |
276 | .TP | |
277 | .B \fBint bpf_skb_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
278 | .INDENT 7.0 | |
279 | .TP | |
280 | .B Description | |
281 | Store \fIlen\fP bytes from address \fIfrom\fP into the packet | |
282 | associated with \fIskb\fP, at \fIoffset\fP\&. \fIflags\fP are a combination of | |
283 | \fBBPF_F_RECOMPUTE_CSUM\fP (automatically recompute the | |
284 | checksum for the packet after storing the bytes) and | |
285 | \fBBPF_F_INVALIDATE_HASH\fP (set \fIskb\fP\fB\->hash\fP, \fIskb\fP\fB\->swhash\fP and \fIskb\fP\fB\->l4hash\fP to 0). | |
286 | .sp | |
287 | A call to this helper may change the underlying | |
288 | packet buffer. Therefore, at load time, all checks on pointers | |
289 | previously done by the verifier are invalidated and must be | |
290 | performed again, if the helper is used in combination with | |
291 | direct packet access. | |
292 | .TP | |
293 | .B Return | |
294 | 0 on success, or a negative error in case of failure. | |
295 | .UNINDENT | |
296 | .TP | |
297 | .B \fBint bpf_l3_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIsize\fP\fB)\fP | |
298 | .INDENT 7.0 | |
299 | .TP | |
300 | .B Description | |
301 | Recompute the layer 3 (e.g. IP) checksum for the packet | |
302 | associated with \fIskb\fP\&. Computation is incremental, so the helper | |
303 | must know the former value of the header field that was | |
304 | modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the | |
305 | number of bytes (2 or 4) for this field, stored in \fIsize\fP\&. | |
306 | Alternatively, it is possible to store the difference between | |
307 | the previous and the new values of the header field in \fIto\fP, by | |
308 | setting \fIfrom\fP and \fIsize\fP to 0. For both methods, \fIoffset\fP | |
309 | indicates the location of the IP checksum within the packet. | |
310 | .sp | |
311 | This helper works in combination with \fBbpf_csum_diff\fP(), | |
312 | which does not update the checksum in\-place, but offers more | |
313 | flexibility and can handle sizes larger than 2 or 4 for the | |
314 | checksum to update. | |
315 | .sp | |
316 | A call to this helper may change the underlying | |
317 | packet buffer. Therefore, at load time, all checks on pointers | |
318 | previously done by the verifier are invalidated and must be | |
319 | performed again, if the helper is used in combination with | |
320 | direct packet access. | |
321 | .TP | |
322 | .B Return | |
323 | 0 on success, or a negative error in case of failure. | |
324 | .UNINDENT | |
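The incremental computation described for this helper can be illustrated in user space with one's complement arithmetic. This is a minimal sketch of the general technique for a 2\-byte field (in the style of RFC 1624), not the kernel's implementation; all function names are hypothetical and the header words in the usage note are made\-up sample data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits using end-around carry. */
static uint16_t sketch_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full checksum over n 16-bit words (checksum word assumed zeroed). */
static uint16_t sketch_csum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~sketch_fold(sum);
}

/*
 * Incremental update for one 2-byte field: given the old checksum, the
 * former field value (from) and the new one (to), compute the new
 * checksum without reading the rest of the header.
 */
static uint16_t sketch_csum_replace2(uint16_t check, uint16_t from, uint16_t to)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~from;
	sum += to;
	return (uint16_t)~sketch_fold(sum);
}
```

For sample words, calling `sketch_csum_replace2()` with the old checksum, the old field value and the new field value is expected to give the same result as zeroing the checksum word, patching the field, and calling `sketch_csum_full()` again. This is why the helper needs \fIfrom\fP, \fIto\fP and the field size rather than the whole packet.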
325 | .TP | |
326 | .B \fBint bpf_l4_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
327 | .INDENT 7.0 | |
328 | .TP | |
329 | .B Description | |
330 | Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum for the | |
331 | packet associated with \fIskb\fP\&. Computation is incremental, so the | |
332 | helper must know the former value of the header field that was | |
333 | modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the | |
334 | number of bytes (2 or 4) for this field, stored on the lowest | |
335 | four bits of \fIflags\fP\&. Alternatively, it is possible to store | |
336 | the difference between the previous and the new values of the | |
337 | header field in \fIto\fP, by setting \fIfrom\fP and the four lowest | |
338 | bits of \fIflags\fP to 0. For both methods, \fIoffset\fP indicates the | |
339 | location of the layer 4 checksum within the packet. In addition to | |
340 | the size of the field, actual flags can be added to \fIflags\fP (bitwise | |
341 | OR). With \fBBPF_F_MARK_MANGLED_0\fP, a null checksum is left | |
342 | untouched (unless \fBBPF_F_MARK_ENFORCE\fP is added as well), and | |
343 | for updates resulting in a null checksum the value is set to | |
344 | \fBCSUM_MANGLED_0\fP instead. Flag \fBBPF_F_PSEUDO_HDR\fP indicates | |
345 | the checksum is to be computed against a pseudo\-header. | |
346 | .sp | |
347 | This helper works in combination with \fBbpf_csum_diff\fP(), | |
348 | which does not update the checksum in\-place, but offers more | |
349 | flexibility and can handle sizes larger than 2 or 4 for the | |
350 | checksum to update. | |
351 | .sp | |
352 | A call to this helper may change the underlying | |
353 | packet buffer. Therefore, at load time, all checks on pointers | |
354 | previously done by the verifier are invalidated and must be | |
355 | performed again, if the helper is used in combination with | |
356 | direct packet access. | |
357 | .TP | |
358 | .B Return | |
359 | 0 on success, or a negative error in case of failure. | |
360 | .UNINDENT | |
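Because the field size shares the \fIflags\fP argument with the actual flags, it can help to see the packing spelled out. The sketch below is plain user\-space C; the `SKETCH_` constants are local copies whose values are assumed to match `BPF_F_HDR_FIELD_MASK`, `BPF_F_PSEUDO_HDR` and `BPF_F_MARK_MANGLED_0` in include/uapi/linux/bpf.h, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies; values assumed to mirror include/uapi/linux/bpf.h. */
#define SKETCH_BPF_F_HDR_FIELD_MASK	0xfULL
#define SKETCH_BPF_F_PSEUDO_HDR		(1ULL << 4)
#define SKETCH_BPF_F_MARK_MANGLED_0	(1ULL << 5)

/*
 * Combine the field size (0, 2 or 4, in the lowest four bits) with the
 * actual flags (which must not overlap those four bits).
 */
static uint64_t sketch_l4_flags(uint64_t field_size, uint64_t extra_flags)
{
	assert(field_size == 0 || field_size == 2 || field_size == 4);
	assert((extra_flags & SKETCH_BPF_F_HDR_FIELD_MASK) == 0);
	return field_size | extra_flags;
}

/* Recover the field size from a combined flags value. */
static uint64_t sketch_l4_field_size(uint64_t flags)
{
	return flags & SKETCH_BPF_F_HDR_FIELD_MASK;
}
```

For example, `sketch_l4_flags(4, SKETCH_BPF_F_PSEUDO_HDR)` describes a 4\-byte field whose checksum covers a pseudo\-header, which is the typical case when an IPv4 address is rewritten under a TCP or UDP header.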
361 | .TP | |
362 | .B \fBint bpf_tail_call(void *\fP\fIctx\fP\fB, struct bpf_map *\fP\fIprog_array_map\fP\fB, u32\fP \fIindex\fP\fB)\fP | |
363 | .INDENT 7.0 | |
364 | .TP | |
365 | .B Description | |
366 | This special helper is used to trigger a "tail call", or in | |
367 | other words, to jump into another eBPF program. The same stack | |
368 | frame is used (but values on stack and in registers for the | |
369 | caller are not accessible to the callee). This mechanism allows | |
370 | for program chaining, either to raise the maximum number of | |
371 | available eBPF instructions, or to execute given programs in | |
372 | conditional blocks. For security reasons, there is an upper | |
373 | limit to the number of successive tail calls that can be | |
374 | performed. | |
375 | .sp | |
376 | Upon call of this helper, the program attempts to jump into a | |
377 | program referenced at index \fIindex\fP in \fIprog_array_map\fP, a | |
378 | special map of type \fBBPF_MAP_TYPE_PROG_ARRAY\fP, and passes | |
379 | \fIctx\fP, a pointer to the context. | |
380 | .sp | |
381 | If the call succeeds, the kernel immediately runs the first | |
382 | instruction of the new program. This is not a function call, | |
383 | and it never returns to the previous program. If the call | |
384 | fails, then the helper has no effect, and the caller continues | |
385 | to run its subsequent instructions. A call can fail if the | |
386 | destination program for the jump does not exist (i.e. \fIindex\fP | |
387 | is greater than or equal to the number of entries in \fIprog_array_map\fP), or | |
388 | if the maximum number of tail calls has been reached for this | |
389 | chain of programs. This limit is defined in the kernel by the | |
390 | macro \fBMAX_TAIL_CALL_CNT\fP (not accessible to user space), | |
391 | which is currently set to 32. | |
392 | .TP | |
393 | .B Return | |
394 | 0 on success, or a negative error in case of failure. | |
395 | .UNINDENT | |
396 | .TP | |
397 | .B \fBint bpf_clone_redirect(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
398 | .INDENT 7.0 | |
399 | .TP | |
400 | .B Description | |
401 | Clone and redirect the packet associated with \fIskb\fP to another | |
402 | net device of index \fIifindex\fP\&. Both ingress and egress | |
403 | interfaces can be used for redirection. The \fBBPF_F_INGRESS\fP | |
404 | value in \fIflags\fP is used to make the distinction (ingress path | |
405 | is selected if the flag is present, egress path otherwise). | |
406 | This is the only flag supported for now. | |
407 | .sp | |
408 | In comparison with the \fBbpf_redirect\fP() helper, | |
409 | \fBbpf_clone_redirect\fP() has the associated cost of | |
410 | duplicating the packet buffer, but the redirection is carried out from within | |
411 | the eBPF program. Conversely, \fBbpf_redirect\fP() is more | |
412 | efficient, but it is handled through an action code where the | |
413 | redirection happens only after the eBPF program has returned. | |
414 | .sp | |
415 | A call to this helper may change the underlying | |
416 | packet buffer. Therefore, at load time, all checks on pointers | |
417 | previously done by the verifier are invalidated and must be | |
418 | performed again, if the helper is used in combination with | |
419 | direct packet access. | |
420 | .TP | |
421 | .B Return | |
422 | 0 on success, or a negative error in case of failure. | |
423 | .UNINDENT | |
424 | .TP | |
425 | .B \fBu64 bpf_get_current_pid_tgid(void)\fP | |
426 | .INDENT 7.0 | |
427 | .TP | |
428 | .B Return | |
429 | A 64\-bit integer containing the current tgid and pid, and | |
430 | created as such: | |
431 | \fIcurrent_task\fP\fB\->tgid << 32 |\fP | |
432 | \fIcurrent_task\fP\fB\->pid\fP\&. | |
433 | .UNINDENT | |
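The packing described above can be reproduced and undone with ordinary shifts and casts. This is a user\-space sketch with hypothetical function names; in an eBPF program, the 64\-bit value would come from the helper itself.

```c
#include <assert.h>
#include <stdint.h>

/* Build the 64-bit value the way the Return section describes it. */
static uint64_t sketch_pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	return ((uint64_t)tgid << 32) | pid;
}

/* Upper 32 bits: the tgid (what user space usually calls the PID). */
static uint32_t sketch_tgid_of(uint64_t pid_tgid)
{
	return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: the kernel pid (what user space calls the thread ID). */
static uint32_t sketch_pid_of(uint64_t pid_tgid)
{
	return (uint32_t)pid_tgid;
}
```

The cast to `uint64_t` before shifting matters: shifting a 32\-bit value left by 32 bits would be undefined behavior in C.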
434 | .TP | |
435 | .B \fBu64 bpf_get_current_uid_gid(void)\fP | |
436 | .INDENT 7.0 | |
437 | .TP | |
438 | .B Return | |
439 | A 64\-bit integer containing the current GID and UID, and | |
440 | created as such: \fIcurrent_gid\fP \fB<< 32 |\fP \fIcurrent_uid\fP\&. | |
441 | .UNINDENT | |
442 | .TP | |
443 | .B \fBint bpf_get_current_comm(char *\fP\fIbuf\fP\fB, u32\fP \fIsize_of_buf\fP\fB)\fP | |
444 | .INDENT 7.0 | |
445 | .TP | |
446 | .B Description | |
447 | Copy the \fBcomm\fP attribute of the current task into \fIbuf\fP of | |
448 | \fIsize_of_buf\fP\&. The \fBcomm\fP attribute contains the name of | |
449 | the executable (excluding the path) for the current task. The | |
450 | \fIsize_of_buf\fP must be strictly positive. On success, the | |
451 | helper makes sure that the \fIbuf\fP is NUL\-terminated. On failure, | |
452 | it is filled with zeroes. | |
453 | .TP | |
454 | .B Return | |
455 | 0 on success, or a negative error in case of failure. | |
456 | .UNINDENT | |
457 | .TP | |
458 | .B \fBu32 bpf_get_cgroup_classid(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
459 | .INDENT 7.0 | |
460 | .TP | |
461 | .B Description | |
462 | Retrieve the classid for the current task, i.e. for the net_cls | |
463 | cgroup to which \fIskb\fP belongs. | |
464 | .sp | |
465 | This helper can be used on TC egress path, but not on ingress. | |
466 | .sp | |
467 | The net_cls cgroup provides an interface to tag network packets | |
468 | based on a user\-provided identifier for all traffic coming from | |
469 | the tasks belonging to the related cgroup. See also the related | |
470 | kernel documentation, available from the Linux sources in file | |
471 | \fIDocumentation/cgroup\-v1/net_cls.txt\fP\&. | |
472 | .sp | |
473 | The Linux kernel has two versions for cgroups: there are | |
474 | cgroups v1 and cgroups v2. Both are available to users, who can | |
475 | use a mixture of them, but note that the net_cls cgroup is for | |
476 | cgroup v1 only. This makes it incompatible with BPF programs | |
477 | run on cgroups, which is a cgroup\-v2\-only feature (a socket can | |
478 | only hold data for one version of cgroups at a time). | |
479 | .sp | |
480 | This helper is only available if the kernel was compiled with | |
481 | the \fBCONFIG_CGROUP_NET_CLASSID\fP configuration option set to | |
482 | "\fBy\fP" or to "\fBm\fP". | |
483 | .TP | |
484 | .B Return | |
485 | The classid, or 0 for the default unconfigured classid. | |
486 | .UNINDENT | |
487 | .TP | |
488 | .B \fBint bpf_skb_vlan_push(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIvlan_proto\fP\fB, u16\fP \fIvlan_tci\fP\fB)\fP | |
489 | .INDENT 7.0 | |
490 | .TP | |
491 | .B Description | |
492 | Push a \fIvlan_tci\fP (VLAN tag control information) of protocol | |
493 | \fIvlan_proto\fP to the packet associated with \fIskb\fP, then update | |
494 | the checksum. Note that if \fIvlan_proto\fP is different from | |
495 | \fBETH_P_8021Q\fP and \fBETH_P_8021AD\fP, it is considered to | |
496 | be \fBETH_P_8021Q\fP\&. | |
497 | .sp | |
498 | A call to this helper may change the underlying | |
499 | packet buffer. Therefore, at load time, all checks on pointers | |
500 | previously done by the verifier are invalidated and must be | |
501 | performed again, if the helper is used in combination with | |
502 | direct packet access. | |
503 | .TP | |
504 | .B Return | |
505 | 0 on success, or a negative error in case of failure. | |
506 | .UNINDENT | |
507 | .TP | |
508 | .B \fBint bpf_skb_vlan_pop(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
509 | .INDENT 7.0 | |
510 | .TP | |
511 | .B Description | |
512 | Pop a VLAN header from the packet associated with \fIskb\fP\&. | |
513 | .sp | |
514 | A call to this helper may change the underlying | |
515 | packet buffer. Therefore, at load time, all checks on pointers | |
516 | previously done by the verifier are invalidated and must be | |
517 | performed again, if the helper is used in combination with | |
518 | direct packet access. | |
519 | .TP | |
520 | .B Return | |
521 | 0 on success, or a negative error in case of failure. | |
522 | .UNINDENT | |
523 | .TP | |
524 | .B \fBint bpf_skb_get_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
525 | .INDENT 7.0 | |
526 | .TP | |
527 | .B Description | |
528 | Get tunnel metadata. This helper takes a pointer \fIkey\fP to an | |
529 | empty \fBstruct bpf_tunnel_key\fP of \fIsize\fP, that will be | |
530 | filled with tunnel metadata for the packet associated with \fIskb\fP\&. | |
531 | The \fIflags\fP can be set to \fBBPF_F_TUNINFO_IPV6\fP, which | |
532 | indicates that the tunnel is based on IPv6 protocol instead of | |
533 | IPv4. | |
534 | .sp | |
535 | The \fBstruct bpf_tunnel_key\fP is an object that generalizes the | |
536 | principal parameters used by various tunneling protocols into a | |
537 | single struct. This way, it can be used to easily make a | |
538 | decision based on the contents of the encapsulation header, | |
539 | "summarized" in this struct. In particular, it holds the IP | |
540 | address of the remote end (IPv4 or IPv6, depending on the case) | |
541 | in \fIkey\fP\fB\->remote_ipv4\fP or \fIkey\fP\fB\->remote_ipv6\fP\&. Also, | |
542 | this struct exposes the \fIkey\fP\fB\->tunnel_id\fP, which is | |
543 | generally mapped to a VNI (Virtual Network Identifier), making | |
544 | it programmable together with the \fBbpf_skb_set_tunnel_key\fP() helper. | |
545 | .sp | |
546 | Let\(aqs imagine that the following code is part of a program | |
547 | attached to the TC ingress interface, on one end of a GRE | |
548 | tunnel, and is supposed to filter out all messages coming from | |
549 | remote ends with IPv4 address other than 10.0.0.1: | |
550 | .INDENT 7.0 | |
551 | .INDENT 3.5 | |
552 | .sp | |
553 | .nf | |
554 | .ft C | |
555 | int ret; | |
556 | struct bpf_tunnel_key key = {}; | |
557 | ||
558 | ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0); | |
559 | if (ret < 0) | |
560 | return TC_ACT_SHOT; // drop packet | |
561 | ||
562 | if (key.remote_ipv4 != 0x0a000001) | |
563 | return TC_ACT_SHOT; // drop packet | |
564 | ||
565 | return TC_ACT_OK; // accept packet | |
566 | .ft P | |
567 | .fi | |
568 | .UNINDENT | |
569 | .UNINDENT | |
570 | .sp | |
571 | This interface can also be used with all encapsulation devices | |
572 | that can operate in "collect metadata" mode: instead of having | |
573 | one network device per specific configuration, the "collect | |
574 | metadata" mode only requires a single device where the | |
575 | configuration can be extracted from this helper. | |
576 | .sp | |
577 | This can be used together with various tunnels such as VXLAN, | |
578 | Geneve, GRE or IP in IP (IPIP). | |
579 | .TP | |
580 | .B Return | |
581 | 0 on success, or a negative error in case of failure. | |
582 | .UNINDENT | |
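The constant 0x0a000001 in the filtering example above is the address 10.0.0.1 written as a single 32\-bit integer. The sketch below shows the correspondence; the function name is hypothetical, and it assumes, as the comparison in the example implies, that \fIkey\fP\fB\->remote_ipv4\fP holds the address in host byte order.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Build the host-byte-order integer for the dotted-quad a.b.c.d, with
 * the first octet in the most significant byte.
 */
static uint32_t sketch_ipv4_host_order(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}
```

With this, `sketch_ipv4_host_order(10, 0, 0, 1)` yields the value used in the comparison, which may be easier to read and maintain than a hand\-written hexadecimal constant.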
583 | .TP | |
584 | .B \fBint bpf_skb_set_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
585 | .INDENT 7.0 | |
586 | .TP | |
587 | .B Description | |
588 | Populate tunnel metadata for the packet associated with \fIskb\fP\&. The | |
589 | tunnel metadata is set to the contents of \fIkey\fP, of \fIsize\fP\&. The | |
590 | \fIflags\fP can be set to a combination of the following values: | |
591 | .INDENT 7.0 | |
592 | .TP | |
593 | .B \fBBPF_F_TUNINFO_IPV6\fP | |
594 | Indicate that the tunnel is based on IPv6 protocol | |
595 | instead of IPv4. | |
596 | .TP | |
597 | .B \fBBPF_F_ZERO_CSUM_TX\fP | |
598 | For IPv4 packets, add a flag to tunnel metadata | |
599 | indicating that checksum computation should be skipped | |
600 | and checksum set to zeroes. | |
601 | .TP | |
602 | .B \fBBPF_F_DONT_FRAGMENT\fP | |
603 | Add a flag to tunnel metadata indicating that the | |
604 | packet should not be fragmented. | |
605 | .TP | |
606 | .B \fBBPF_F_SEQ_NUMBER\fP | |
607 | Add a flag to tunnel metadata indicating that a | |
608 | sequence number should be added to tunnel header before | |
609 | sending the packet. This flag was added for GRE | |
610 | encapsulation, but might be used with other protocols | |
611 | as well in the future. | |
612 | .UNINDENT | |
613 | .sp | |
614 | Here is a typical usage on the transmit path: | |
615 | .INDENT 7.0 | |
616 | .INDENT 3.5 | |
617 | .sp | |
618 | .nf | |
619 | .ft C | |
620 | struct bpf_tunnel_key key = {}; | |
621 | /* populate key ... */ | |
622 | bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0); | |
623 | bpf_clone_redirect(skb, vxlan_dev_ifindex, 0); | |
624 | .ft P | |
625 | .fi | |
626 | .UNINDENT | |
627 | .UNINDENT | |
628 | .sp | |
629 | See also the description of the \fBbpf_skb_get_tunnel_key\fP() | |
630 | helper for additional information. | |
631 | .TP | |
632 | .B Return | |
633 | 0 on success, or a negative error in case of failure. | |
634 | .UNINDENT | |
635 | .TP | |
636 | .B \fBu64 bpf_perf_event_read(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
637 | .INDENT 7.0 | |
638 | .TP | |
639 | .B Description | |
640 | Read the value of a perf event counter. This helper relies on a | |
641 | \fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of | |
642 | the perf event counter is selected when \fImap\fP is updated with | |
643 | perf event file descriptors. The \fImap\fP is an array whose size | |
644 | is the number of available CPUs, and each cell contains a value | |
645 | relative to one CPU. The value to retrieve is indicated by | |
646 | \fIflags\fP, that contains the index of the CPU to look up, masked | |
647 | with \fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to | |
648 | \fBBPF_F_CURRENT_CPU\fP to indicate that the value for the | |
649 | current CPU should be retrieved. | |
650 | .sp | |
651 | Note that before Linux 4.13, only hardware perf events can be | |
652 | retrieved. | |
653 | .sp | |
654 | Also, be aware that the newer helper | |
655 | \fBbpf_perf_event_read_value\fP() is recommended over | |
656 | \fBbpf_perf_event_read\fP() in general. The latter has some ABI | |
657 | quirks where error and counter value are used as a return code | |
658 | (which is wrong to do since ranges may overlap). This issue is | |
659 | fixed with \fBbpf_perf_event_read_value\fP(), which at the same | |
660 | time provides more features than the \fBbpf_perf_event_read\fP() interface. Please refer to the description of | |
661 | \fBbpf_perf_event_read_value\fP() for details. | |
662 | .TP | |
663 | .B Return | |
664 | The value of the perf event counter read from the map, or a | |
665 | negative error code in case of failure. | |
666 | .UNINDENT | |
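The way \fIflags\fP selects a CPU can be sketched in user\-space C. The `SKETCH_` constants are local copies whose values are assumed to match `BPF_F_INDEX_MASK` and `BPF_F_CURRENT_CPU` in include/uapi/linux/bpf.h, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies; values assumed to mirror include/uapi/linux/bpf.h. */
#define SKETCH_BPF_F_INDEX_MASK		0xffffffffULL
#define SKETCH_BPF_F_CURRENT_CPU	SKETCH_BPF_F_INDEX_MASK

/* Flags value selecting the counter of one explicit CPU index. */
static uint64_t sketch_perf_flags_for_cpu(uint32_t cpu)
{
	return (uint64_t)cpu & SKETCH_BPF_F_INDEX_MASK;
}

/* True if the flags value asks for the CPU the program is running on. */
static bool sketch_is_current_cpu(uint64_t flags)
{
	return (flags & SKETCH_BPF_F_INDEX_MASK) == SKETCH_BPF_F_CURRENT_CPU;
}
```

Under these assumed values, the "current CPU" marker is simply the all\-ones index, so that one index value is reserved and cannot designate a real CPU.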
667 | .TP | |
668 | .B \fBint bpf_redirect(u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
669 | .INDENT 7.0 | |
670 | .TP | |
671 | .B Description | |
672 | Redirect the packet to another net device of index \fIifindex\fP\&. | |
673 | This helper is somewhat similar to \fBbpf_clone_redirect\fP(), except that the packet is not cloned, which provides | |
674 | increased performance. | |
675 | .sp | |
676 | Except for XDP, both ingress and egress interfaces can be used | |
677 | for redirection. The \fBBPF_F_INGRESS\fP value in \fIflags\fP is used | |
678 | to make the distinction (ingress path is selected if the flag | |
679 | is present, egress path otherwise). Currently, XDP only | |
680 | supports redirection to the egress interface, and accepts no | |
681 | flag at all. | |
682 | .sp | |
683 | The same effect can be attained with the more generic | |
684 | \fBbpf_redirect_map\fP(), which requires specific maps to be | |
685 | used but offers better performance. | |
686 | .TP | |
687 | .B Return | |
688 | For XDP, the helper returns \fBXDP_REDIRECT\fP on success or | |
689 | \fBXDP_ABORTED\fP on error. For other program types, the values | |
690 | are \fBTC_ACT_REDIRECT\fP on success or \fBTC_ACT_SHOT\fP on | |
691 | error. | |
692 | .UNINDENT | |
693 | .TP | |
694 | .B \fBu32 bpf_get_route_realm(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
695 | .INDENT 7.0 | |
696 | .TP | |
697 | .B Description | |
698 | Retrieve the realm of the route, that is to say the | |
699 | \fBtclassid\fP field of the destination for the \fIskb\fP\&. The | |
700 | identifier retrieved is a user\-provided tag, similar to the | |
701 | one used with the net_cls cgroup (see description for | |
702 | \fBbpf_get_cgroup_classid\fP() helper), but here this tag is | |
703 | held by a route (a destination entry), not by a task. | |
704 | .sp | |
705 | Retrieving this identifier works with the clsact TC egress hook | |
706 | (see also \fBtc\-bpf(8)\fP), or alternatively on conventional | |
707 | classful egress qdiscs, but not on TC ingress path. In case of | |
708 | clsact TC egress hook, this has the advantage that, internally, | |
709 | the destination entry has not been dropped yet in the transmit | |
710 | path. Therefore, the destination entry does not need to be | |
711 | artificially held via \fBnetif_keep_dst\fP() for a classful | |
712 | qdisc until the \fIskb\fP is freed. | |
713 | .sp | |
714 | This helper is available only if the kernel was compiled with the | |
715 | \fBCONFIG_IP_ROUTE_CLASSID\fP configuration option. | |
716 | .TP | |
717 | .B Return | |
718 | The realm of the route for the packet associated to \fIskb\fP, or 0 | |
719 | if none was found. | |
720 | .UNINDENT | |
721 | .TP | |
722 | .B \fBint bpf_perf_event_output(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, void *\fP\fIdata\fP\fB, u64\fP \fIsize\fP\fB)\fP | |
723 | .INDENT 7.0 | |
724 | .TP | |
725 | .B Description | |
726 | Write raw \fIdata\fP blob into a special BPF perf event held by | |
727 | \fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. This perf | |
728 | event must have the following attributes: \fBPERF_SAMPLE_RAW\fP | |
729 | as \fBsample_type\fP, \fBPERF_TYPE_SOFTWARE\fP as \fBtype\fP, and | |
730 | \fBPERF_COUNT_SW_BPF_OUTPUT\fP as \fBconfig\fP\&. | |
731 | .sp | |
732 | The \fIflags\fP are used to indicate the index in \fImap\fP for which | |
733 | the value must be put, masked with \fBBPF_F_INDEX_MASK\fP\&. | |
734 | Alternatively, \fIflags\fP can be set to \fBBPF_F_CURRENT_CPU\fP | |
735 | to indicate that the index of the current CPU core should be | |
736 | used. | |
737 | .sp | |
738 | The value to write, of \fIsize\fP, is passed through the eBPF stack and | |
739 | pointed to by \fIdata\fP\&. | |
740 | .sp | |
741 | The context of the program, \fIctx\fP, also needs to be passed to the | |
742 | helper. | |
743 | .sp | |
744 | In user space, a program willing to read the values needs to | |
745 | call \fBperf_event_open\fP() on the perf event (either for | |
746 | one or for all CPUs) and to store the file descriptor into the | |
747 | \fImap\fP\&. This must be done before the eBPF program can send data | |
748 | into it. An example is available in file | |
749 | \fIsamples/bpf/trace_output_user.c\fP in the Linux kernel source | |
750 | tree (the eBPF program counterpart is in | |
751 | \fIsamples/bpf/trace_output_kern.c\fP). | |
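As a minimal user-space sketch of the attributes listed above, the perf event could be described as follows. The function name is hypothetical, and only the three fields named in this description (plus the mandatory \fBsize\fP) are set; a real reader may want further fields, which this sketch does not attempt to cover.

```c
#include <assert.h>
#include <string.h>
#include <linux/perf_event.h>

/* Hypothetical helper: fill a perf_event_attr with the attributes that
 * the description above requires for a BPF output event. */
static struct perf_event_attr bpf_output_attr(void)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.type = PERF_TYPE_SOFTWARE;
    attr.config = PERF_COUNT_SW_BPF_OUTPUT;
    attr.sample_type = PERF_SAMPLE_RAW;
    return attr;
}
```

The resulting structure would then be passed to \fBperf_event_open\fP(2), and the returned file descriptor stored in the \fImap\fP at the index of the CPU it was opened for.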
752 | .sp | |
753 | \fBbpf_perf_event_output\fP() achieves better performance | |
754 | than \fBbpf_trace_printk\fP() for sharing data with user | |
755 | space, and is much better suited for streaming data from eBPF | |
756 | programs. | |
757 | .sp | |
758 | Note that this helper is not restricted to tracing use cases | |
759 | and can be used with programs attached to TC or XDP as well, | |
760 | where it allows for passing data to user space listeners. Data | |
761 | can be: | |
762 | .INDENT 7.0 | |
763 | .IP \(bu 2 | |
764 | Only custom structs, | |
765 | .IP \(bu 2 | |
766 | Only the packet payload, or | |
767 | .IP \(bu 2 | |
768 | A combination of both. | |
769 | .UNINDENT | |
770 | .TP | |
771 | .B Return | |
772 | 0 on success, or a negative error in case of failure. | |
773 | .UNINDENT | |
774 | .TP | |
775 | .B \fBint bpf_skb_load_bytes(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
776 | .INDENT 7.0 | |
777 | .TP | |
778 | .B Description | |
779 | This helper was provided as an easy way to load data from a | |
780 | packet. It can be used to load \fIlen\fP bytes, starting at \fIoffset\fP, from | |
781 | the packet associated to \fIskb\fP into the buffer pointed to by | |
782 | \fIto\fP\&. | |
783 | .sp | |
784 | Since Linux 4.7, usage of this helper has mostly been replaced | |
785 | by "direct packet access", enabling packet data to be | |
786 | manipulated with \fIskb\fP\fB\->data\fP and \fIskb\fP\fB\->data_end\fP | |
787 | pointing respectively to the first byte of packet data and to | |
788 | the byte after the last byte of packet data. However, it | |
789 | remains useful if one wishes to read large quantities of data | |
790 | at once from a packet into the eBPF stack. | |
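The bounds test that direct packet access relies on can be modelled in user space. The sketch below is not BPF code: \fBstruct pkt_view\fP and \fBpkt_load_bytes\fP() are hypothetical stand-ins for the \fIdata\fP / \fIdata_end\fP pair and for a checked copy, written only to show the comparison against \fIdata_end\fP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the data/data_end pair seen by a program. */
struct pkt_view {
    const uint8_t *data;     /* first byte of packet data */
    const uint8_t *data_end; /* byte after the last byte of packet data */
};

/* Copy len bytes found at offset into to.
 * Returns 0 on success, -1 if the range is not fully inside the packet. */
static int pkt_load_bytes(const struct pkt_view *p, size_t offset,
                          void *to, size_t len)
{
    size_t avail;

    assert(p->data <= p->data_end);
    avail = (size_t)(p->data_end - p->data);
    if (offset > avail || len > avail - offset)
        return -1;
    memcpy(to, p->data + offset, len);
    return 0;
}
```

In an eBPF program the same comparison has to appear explicitly before each direct access so that the verifier can prove the access is in bounds.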
791 | .TP | |
792 | .B Return | |
793 | 0 on success, or a negative error in case of failure. | |
794 | .UNINDENT | |
795 | .TP | |
796 | .B \fBint bpf_get_stackid(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
797 | .INDENT 7.0 | |
798 | .TP | |
799 | .B Description | |
800 | Walk a user or a kernel stack and return its id. To achieve | |
801 | this, the helper needs \fIctx\fP, which is a pointer to the context | |
802 | on which the tracing program is executed, and a pointer to a | |
803 | \fImap\fP of type \fBBPF_MAP_TYPE_STACK_TRACE\fP\&. | |
804 | .sp | |
805 | The last argument, \fIflags\fP, holds the number of stack frames to | |
806 | skip (from 0 to 255), masked with | |
807 | \fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set | |
808 | a combination of the following flags: | |
809 | .INDENT 7.0 | |
810 | .TP | |
811 | .B \fBBPF_F_USER_STACK\fP | |
812 | Collect a user space stack instead of a kernel stack. | |
813 | .TP | |
814 | .B \fBBPF_F_FAST_STACK_CMP\fP | |
815 | Compare stacks by hash only. | |
816 | .TP | |
817 | .B \fBBPF_F_REUSE_STACKID\fP | |
818 | If two different stacks hash into the same \fIstackid\fP, | |
819 | discard the old one. | |
820 | .UNINDENT | |
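How the skip count and the option bits share the \fIflags\fP argument can be sketched as follows. The constants mirror include/uapi/linux/bpf.h; the function \fBstackid_flags\fP() is a hypothetical convenience wrapper, not part of any API.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/bpf.h. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_FAST_STACK_CMP  (1ULL << 9)
#define BPF_F_REUSE_STACKID   (1ULL << 10)

/* Hypothetical helper: combine a frame skip count (0 to 255) with any
 * combination of the three option flags. */
static uint64_t stackid_flags(unsigned int skip, uint64_t options)
{
    const uint64_t known = BPF_F_USER_STACK | BPF_F_FAST_STACK_CMP |
                           BPF_F_REUSE_STACKID;

    assert(skip <= 255);
    assert((options & ~known) == 0);
    return ((uint64_t)skip & BPF_F_SKIP_FIELD_MASK) | options;
}
```

For instance, skipping two frames of a user stack would use \fBstackid_flags\fP(2, \fBBPF_F_USER_STACK\fP) as the last argument.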
821 | .sp | |
822 | The stack id retrieved is a 32\-bit integer handle which | |
823 | can be further combined with other data (including other stack | |
824 | ids) and used as a key into maps. This can be useful for | |
825 | generating a variety of graphs (such as flame graphs or off\-cpu | |
826 | graphs). | |
827 | .sp | |
828 | For walking a stack, this helper is an improvement over | |
829 | \fBbpf_probe_read\fP(), which can be used with unrolled loops | |
830 | but is not efficient and consumes a lot of eBPF instructions. | |
831 | Instead, \fBbpf_get_stackid\fP() can collect up to | |
832 | \fBPERF_MAX_STACK_DEPTH\fP kernel and user frames. Note that | |
833 | this limit can be controlled with the \fBsysctl\fP program, and | |
834 | that it should be manually increased in order to profile long | |
835 | user stacks (such as stacks for Java programs). To do so, use: | |
836 | .INDENT 7.0 | |
837 | .INDENT 3.5 | |
838 | .sp | |
839 | .nf | |
840 | .ft C | |
841 | # sysctl kernel.perf_event_max_stack=<new value> | |
842 | .ft P | |
843 | .fi | |
844 | .UNINDENT | |
845 | .UNINDENT | |
846 | .TP | |
847 | .B Return | |
848 | The stack id (zero or positive) on success, or a negative error | |
849 | in case of failure. | |
850 | .UNINDENT | |
851 | .TP | |
852 | .B \fBs64 bpf_csum_diff(__be32 *\fP\fIfrom\fP\fB, u32\fP \fIfrom_size\fP\fB, __be32 *\fP\fIto\fP\fB, u32\fP \fIto_size\fP\fB, __wsum\fP \fIseed\fP\fB)\fP | |
853 | .INDENT 7.0 | |
854 | .TP | |
855 | .B Description | |
856 | Compute a checksum difference, from the raw buffer pointed to by | |
857 | \fIfrom\fP, of length \fIfrom_size\fP (that must be a multiple of 4), | |
858 | towards the raw buffer pointed to by \fIto\fP, of size \fIto_size\fP | |
859 | (which must also be a multiple of 4). An optional \fIseed\fP can be added to the value | |
860 | (this can be cascaded, the seed may come from a previous call | |
861 | to the helper). | |
862 | .sp | |
863 | This is flexible enough to be used in several ways: | |
864 | .INDENT 7.0 | |
865 | .IP \(bu 2 | |
866 | With \fIfrom_size\fP == 0, \fIto_size\fP > 0 and \fIseed\fP set to | |
867 | checksum, it can be used when pushing new data. | |
868 | .IP \(bu 2 | |
869 | With \fIfrom_size\fP > 0, \fIto_size\fP == 0 and \fIseed\fP set to | |
870 | checksum, it can be used when removing data from a packet. | |
871 | .IP \(bu 2 | |
872 | With \fIfrom_size\fP > 0, \fIto_size\fP > 0 and \fIseed\fP set to 0, it | |
873 | can be used to compute a diff. Note that \fIfrom_size\fP and | |
874 | \fIto_size\fP do not need to be equal. | |
875 | .UNINDENT | |
876 | .sp | |
877 | This helper can be used in combination with | |
878 | \fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(), to | |
879 | which one can feed in the difference computed with | |
880 | \fBbpf_csum_diff\fP(). | |
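The arithmetic behind a checksum difference can be illustrated with a user-space model. This is a sketch of one's-complement checksum math under stated assumptions, not the kernel implementation: the function names are hypothetical, the words are treated as opaque 32-bit values, and the model subtracts \fIfrom\fP by adding its complement, then adds \fIto\fP and the \fIseed\fP.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement addition of two 32-bit sums (end-around carry). */
static uint32_t csum_add32(uint32_t a, uint32_t b)
{
    uint64_t s = (uint64_t)a + b;

    return (uint32_t)(s + (s >> 32));
}

/* Model: seed + sum(~from[i]) + sum(to[i]); sizes are in bytes and
 * must be multiples of 4, as required by the helper. */
static uint32_t csum_diff_model(const uint32_t *from, size_t from_size,
                                const uint32_t *to, size_t to_size,
                                uint32_t seed)
{
    uint32_t sum = seed;
    size_t i;

    assert(from_size % 4 == 0 && to_size % 4 == 0);
    for (i = 0; i < from_size / 4; i++)
        sum = csum_add32(sum, ~from[i]);
    for (i = 0; i < to_size / 4; i++)
        sum = csum_add32(sum, to[i]);
    return sum;
}

/* Fold a 32-bit sum into the 16-bit complemented Internet checksum. */
static uint16_t csum_fold16(uint32_t sum)
{
    sum = (sum & 0xffff) + (sum >> 16);
    sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}
```

With this model, replacing one word and seeding the call with the old sum folds to the same 16-bit checksum as summing the modified buffer from scratch, which is what makes incremental updates possible.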
881 | .TP | |
882 | .B Return | |
883 | The checksum result, or a negative error code in case of | |
884 | failure. | |
885 | .UNINDENT | |
886 | .TP | |
887 | .B \fBint bpf_skb_get_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP | |
888 | .INDENT 7.0 | |
889 | .TP | |
890 | .B Description | |
891 | Retrieve tunnel options metadata for the packet associated to | |
892 | \fIskb\fP, and store the raw tunnel option data to the buffer \fIopt\fP | |
893 | of \fIsize\fP\&. | |
894 | .sp | |
895 | This helper can be used with encapsulation devices that can | |
896 | operate in "collect metadata" mode (please refer to the related | |
897 | note in the description of \fBbpf_skb_get_tunnel_key\fP() for | |
898 | more details). A particular example where this can be used is | |
899 | in combination with the Geneve encapsulation protocol, where it | |
900 | allows for pushing (with \fBbpf_skb_set_tunnel_opt\fP() helper) | |
901 | and retrieving arbitrary TLVs (Type\-Length\-Value headers) from | |
902 | the eBPF program. This allows for full customization of these | |
903 | headers. | |
904 | .TP | |
905 | .B Return | |
906 | The size of the option data retrieved. | |
907 | .UNINDENT | |
908 | .TP | |
909 | .B \fBint bpf_skb_set_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP | |
910 | .INDENT 7.0 | |
911 | .TP | |
912 | .B Description | |
913 | Set tunnel options metadata for the packet associated to \fIskb\fP | |
914 | to the option data contained in the raw buffer \fIopt\fP of \fIsize\fP\&. | |
915 | .sp | |
916 | See also the description of the \fBbpf_skb_get_tunnel_opt\fP() | |
917 | helper for additional information. | |
918 | .TP | |
919 | .B Return | |
920 | 0 on success, or a negative error in case of failure. | |
921 | .UNINDENT | |
922 | .TP | |
923 | .B \fBint bpf_skb_change_proto(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIproto\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
924 | .INDENT 7.0 | |
925 | .TP | |
926 | .B Description | |
927 | Change the protocol of the \fIskb\fP to \fIproto\fP\&. Currently | |
928 | supported are transition from IPv4 to IPv6, and from IPv6 to | |
929 | IPv4. The helper takes care of the groundwork for the | |
930 | transition, including resizing the socket buffer. The eBPF | |
931 | program is expected to fill the new headers, if any, via | |
932 | \fBbpf_skb_store_bytes\fP() and to recompute the checksums with | |
933 | \fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(). The main case for this helper is to perform NAT64 | |
934 | operations out of an eBPF program. | |
935 | .sp | |
936 | Internally, the GSO type is marked as dodgy so that headers are | |
937 | checked and segments are recalculated by the GSO/GRO engine. | |
938 | The size for GSO target is adapted as well. | |
939 | .sp | |
940 | All values for \fIflags\fP are reserved for future usage, and must | |
941 | be left at zero. | |
942 | .sp | |
943 | A call to this helper may change the underlying | |
944 | packet buffer. Therefore, at load time, all checks on pointers | |
945 | previously done by the verifier are invalidated and must be | |
946 | performed again, if the helper is used in combination with | |
947 | direct packet access. | |
948 | .TP | |
949 | .B Return | |
950 | 0 on success, or a negative error in case of failure. | |
951 | .UNINDENT | |
952 | .TP | |
953 | .B \fBint bpf_skb_change_type(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB)\fP | |
954 | .INDENT 7.0 | |
955 | .TP | |
956 | .B Description | |
957 | Change the packet type for the packet associated to \fIskb\fP\&. This | |
958 | comes down to setting \fIskb\fP\fB\->pkt_type\fP to \fItype\fP, except | |
959 | the eBPF program does not have write access to \fIskb\fP\fB\->pkt_type\fP other than through this helper. Using a helper here allows | |
960 | for graceful handling of errors. | |
961 | .sp | |
962 | The major use case is to change incoming \fIskb\fPs to | |
963 | \fBPACKET_HOST\fP in a programmatic way instead of having to | |
964 | recirculate via \fBbpf_redirect\fP(..., \fBBPF_F_INGRESS\fP), for | |
965 | example. | |
966 | .sp | |
967 | Note that \fItype\fP only allows certain values. At this time, they | |
968 | are: | |
969 | .INDENT 7.0 | |
970 | .TP | |
971 | .B \fBPACKET_HOST\fP | |
972 | Packet is for us. | |
973 | .TP | |
974 | .B \fBPACKET_BROADCAST\fP | |
975 | Send packet to all. | |
976 | .TP | |
977 | .B \fBPACKET_MULTICAST\fP | |
978 | Send packet to group. | |
979 | .TP | |
980 | .B \fBPACKET_OTHERHOST\fP | |
981 | Send packet to someone else. | |
982 | .UNINDENT | |
983 | .TP | |
984 | .B Return | |
985 | 0 on success, or a negative error in case of failure. | |
986 | .UNINDENT | |
987 | .TP | |
988 | .B \fBint bpf_skb_under_cgroup(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP | |
989 | .INDENT 7.0 | |
990 | .TP | |
991 | .B Description | |
992 | Check whether \fIskb\fP is a descendant of the cgroup2 held by | |
993 | \fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&. | |
994 | .TP | |
995 | .B Return | |
996 | The return value depends on the result of the test, and can be: | |
997 | .INDENT 7.0 | |
998 | .IP \(bu 2 | |
999 | 0, if the \fIskb\fP failed the cgroup2 descendant test. | |
1000 | .IP \(bu 2 | |
1001 | 1, if the \fIskb\fP passed the cgroup2 descendant test. | |
1002 | .IP \(bu 2 | |
1003 | A negative error code, if an error occurred. | |
1004 | .UNINDENT | |
1005 | .UNINDENT | |
1006 | .TP | |
1007 | .B \fBu32 bpf_get_hash_recalc(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
1008 | .INDENT 7.0 | |
1009 | .TP | |
1010 | .B Description | |
1011 | Retrieve the hash of the packet, \fIskb\fP\fB\->hash\fP\&. If it is | |
1012 | not set, in particular if the hash was cleared due to mangling, | |
1013 | recompute this hash. Later accesses to the hash can be done | |
1014 | directly with \fIskb\fP\fB\->hash\fP\&. | |
1015 | .sp | |
1016 | Calling \fBbpf_set_hash_invalid\fP(), changing a packet | |
1017 | protocol with \fBbpf_skb_change_proto\fP(), or calling | |
1018 | \fBbpf_skb_store_bytes\fP() with the | |
1019 | \fBBPF_F_INVALIDATE_HASH\fP flag are actions that may clear | |
1020 | the hash and trigger a new computation for the next call to | |
1021 | \fBbpf_get_hash_recalc\fP(). | |
1022 | .TP | |
1023 | .B Return | |
1024 | The 32\-bit hash. | |
1025 | .UNINDENT | |
1026 | .TP | |
1027 | .B \fBu64 bpf_get_current_task(void)\fP | |
1028 | .INDENT 7.0 | |
1029 | .TP | |
1030 | .B Return | |
1031 | A pointer to the current task struct. | |
1032 | .UNINDENT | |
1033 | .TP | |
1034 | .B \fBint bpf_probe_write_user(void *\fP\fIdst\fP\fB, const void *\fP\fIsrc\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
1035 | .INDENT 7.0 | |
1036 | .TP | |
1037 | .B Description | |
1038 | Attempt in a safe way to write \fIlen\fP bytes from the buffer | |
1039 | \fIsrc\fP to \fIdst\fP in memory. It only works for threads that are in | |
1040 | user context, and \fIdst\fP must be a valid user space address. | |
1041 | .sp | |
1042 | This helper should not be used to implement any kind of | |
1043 | security mechanism because of TOC\-TOU attacks, but rather to | |
1044 | debug, divert, and manipulate execution of semi\-cooperative | |
1045 | processes. | |
1046 | .sp | |
1047 | Keep in mind that this feature is meant for experiments, and it | |
1048 | has a risk of crashing the system and running programs. | |
1049 | Therefore, when an eBPF program using this helper is attached, | |
1050 | a warning including PID and process name is printed to kernel | |
1051 | logs. | |
1052 | .TP | |
1053 | .B Return | |
1054 | 0 on success, or a negative error in case of failure. | |
1055 | .UNINDENT | |
1056 | .TP | |
1057 | .B \fBint bpf_current_task_under_cgroup(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP | |
1058 | .INDENT 7.0 | |
1059 | .TP | |
1060 | .B Description | |
1061 | Check whether the probe is being run in the context of a given | |
1062 | subset of the cgroup2 hierarchy. The cgroup2 to test is held by | |
1063 | \fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&. | |
1064 | .TP | |
1065 | .B Return | |
1066 | The return value depends on the result of the test, and can be: | |
1067 | .INDENT 7.0 | |
1068 | .IP \(bu 2 | |
1069 | 0, if the current task does not belong to the cgroup2. | |
1070 | .IP \(bu 2 | |
1071 | 1, if the current task belongs to the cgroup2. | |
1072 | .IP \(bu 2 | |
1073 | A negative error code, if an error occurred. | |
1074 | .UNINDENT | |
1075 | .UNINDENT | |
1076 | .TP | |
1077 | .B \fBint bpf_skb_change_tail(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1078 | .INDENT 7.0 | |
1079 | .TP | |
1080 | .B Description | |
1081 | Resize (trim or grow) the packet associated to \fIskb\fP to the | |
1082 | new \fIlen\fP\&. The \fIflags\fP are reserved for future usage, and must | |
1083 | be left at zero. | |
1084 | .sp | |
1085 | The basic idea is that the helper performs the needed work to | |
1086 | change the size of the packet, then the eBPF program rewrites | |
1087 | the rest via helpers like \fBbpf_skb_store_bytes\fP(), | |
1088 | \fBbpf_l3_csum_replace\fP(), \fBbpf_l4_csum_replace\fP() | |
1089 | and others. This helper is a slow path utility intended for | |
1090 | replies with control messages. And because it is targeted for | |
1091 | slow path, the helper itself can afford to be slow: it | |
1092 | implicitly linearizes, unclones and drops offloads from the | |
1093 | \fIskb\fP\&. | |
1094 | .sp | |
1095 | A call to this helper may change the underlying | |
1096 | packet buffer. Therefore, at load time, all checks on pointers | |
1097 | previously done by the verifier are invalidated and must be | |
1098 | performed again, if the helper is used in combination with | |
1099 | direct packet access. | |
1100 | .TP | |
1101 | .B Return | |
1102 | 0 on success, or a negative error in case of failure. | |
1103 | .UNINDENT | |
1104 | .TP | |
1105 | .B \fBint bpf_skb_pull_data(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
1106 | .INDENT 7.0 | |
1107 | .TP | |
1108 | .B Description | |
1109 | Pull in non\-linear data in case the \fIskb\fP is non\-linear and not | |
1110 | all of \fIlen\fP are part of the linear section. Make \fIlen\fP bytes | |
1111 | from \fIskb\fP readable and writable. If a zero value is passed for | |
1112 | \fIlen\fP, then the whole length of the \fIskb\fP is pulled. | |
1113 | .sp | |
1114 | This helper is only needed for reading and writing with direct | |
1115 | packet access. | |
1116 | .sp | |
1117 | For direct packet access, testing that offsets to access | |
1118 | are within packet boundaries (test on \fIskb\fP\fB\->data_end\fP) | |
1119 | may fail if offsets are invalid, or if the requested | |
1120 | data is in non\-linear parts of the \fIskb\fP\&. On failure the | |
1121 | program can just bail out, or in the case of a non\-linear | |
1122 | buffer, use a helper to make the data available. The | |
1123 | \fBbpf_skb_load_bytes\fP() helper is a first solution to access | |
1124 | the data. Another one consists in using \fBbpf_skb_pull_data\fP() | |
1125 | to pull in the non\-linear parts once, then retesting and | |
1126 | finally accessing the data. | |
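The test, pull, retest sequence can be modelled in user space. Everything in this sketch is hypothetical: \fBstruct skb_model\fP only tracks how many bytes are linear, \fBpull_data_model\fP() imitates the documented effect of the helper (a zero length pulls the whole packet), and the failure on an over-long request is an assumption of the model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an skb: only the first linear_len bytes are
 * reachable through direct packet access. */
struct skb_model {
    size_t total_len;
    size_t linear_len;
};

/* Model of the bounds test against data_end. */
static int in_linear(const struct skb_model *skb, size_t offset, size_t len)
{
    return offset <= skb->linear_len && len <= skb->linear_len - offset;
}

/* Model of the pull: make the first len bytes linear (0 means all).
 * Assumption: asking for more than the packet holds is an error. */
static int pull_data_model(struct skb_model *skb, size_t len)
{
    assert(skb->linear_len <= skb->total_len);
    if (len == 0)
        len = skb->total_len;
    if (len > skb->total_len)
        return -1;
    if (len > skb->linear_len)
        skb->linear_len = len;
    return 0;
}

/* Test, pull once on failure, then retest before accessing. */
static int access_with_pull(struct skb_model *skb, size_t offset, size_t len)
{
    if (in_linear(skb, offset, len))
        return 0;
    if (pull_data_model(skb, offset + len) < 0)
        return -1;
    return in_linear(skb, offset, len) ? 0 : -1;
}
```

The second bounds test is not redundant: in a real program the packet pointers must be reloaded and checked again after the pull, because the earlier checks no longer hold.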
1127 | .sp | |
1128 | At the same time, this also makes sure the \fIskb\fP is uncloned, | |
1129 | which is a necessary condition for direct write. As this needs | |
1130 | to be an invariant for the write part only, the verifier | |
1131 | detects writes and adds a prologue that calls | |
1132 | \fBbpf_skb_pull_data()\fP to effectively unclone the \fIskb\fP from | |
1133 | the very beginning in case it is indeed cloned. | |
1134 | .sp | |
1135 | A call to this helper may change the underlying | |
1136 | packet buffer. Therefore, at load time, all checks on pointers | |
1137 | previously done by the verifier are invalidated and must be | |
1138 | performed again, if the helper is used in combination with | |
1139 | direct packet access. | |
1140 | .TP | |
1141 | .B Return | |
1142 | 0 on success, or a negative error in case of failure. | |
1143 | .UNINDENT | |
1144 | .TP | |
1145 | .B \fBs64 bpf_csum_update(struct sk_buff *\fP\fIskb\fP\fB, __wsum\fP \fIcsum\fP\fB)\fP | |
1146 | .INDENT 7.0 | |
1147 | .TP | |
1148 | .B Description | |
1149 | Add the checksum \fIcsum\fP into \fIskb\fP\fB\->csum\fP in case the | |
1150 | driver has supplied a checksum for the entire packet into that | |
1151 | field. Return an error otherwise. This helper is intended to be | |
1152 | used in combination with \fBbpf_csum_diff\fP(), in particular | |
1153 | when the checksum needs to be updated after data has been | |
1154 | written into the packet through direct packet access. | |
1155 | .TP | |
1156 | .B Return | |
1157 | The checksum on success, or a negative error code in case of | |
1158 | failure. | |
1159 | .UNINDENT | |
1160 | .TP | |
1161 | .B \fBvoid bpf_set_hash_invalid(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
1162 | .INDENT 7.0 | |
1163 | .TP | |
1164 | .B Description | |
1165 | Invalidate the current \fIskb\fP\fB\->hash\fP\&. It can be used after | |
1166 | mangling on headers through direct packet access, in order to | |
1167 | indicate that the hash is outdated and to trigger a | |
1168 | recalculation the next time the kernel tries to access this | |
1169 | hash or when the \fBbpf_get_hash_recalc\fP() helper is called. | |
1170 | .UNINDENT | |
1171 | .TP | |
1172 | .B \fBint bpf_get_numa_node_id(void)\fP | |
1173 | .INDENT 7.0 | |
1174 | .TP | |
1175 | .B Description | |
1176 | Return the id of the current NUMA node. The primary use case | |
1177 | for this helper is the selection of sockets for the local NUMA | |
1178 | node, when the program is attached to sockets using the | |
1179 | \fBSO_ATTACH_REUSEPORT_EBPF\fP option (see also \fBsocket(7)\fP), | |
1180 | but the helper is also available to other eBPF program types, | |
1181 | similarly to \fBbpf_get_smp_processor_id\fP(). | |
1182 | .TP | |
1183 | .B Return | |
1184 | The id of the current NUMA node. | |
1185 | .UNINDENT | |
1186 | .TP | |
1187 | .B \fBint bpf_skb_change_head(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1188 | .INDENT 7.0 | |
1189 | .TP | |
1190 | .B Description | |
1191 | Grow the headroom of the packet associated to \fIskb\fP and adjust the | |
1192 | offset of the MAC header accordingly, adding \fIlen\fP bytes of | |
1193 | space. It automatically extends and reallocates memory as | |
1194 | required. | |
1195 | .sp | |
1196 | This helper can be used on a layer 3 \fIskb\fP to push a MAC header | |
1197 | for redirection into a layer 2 device. | |
1198 | .sp | |
1199 | All values for \fIflags\fP are reserved for future usage, and must | |
1200 | be left at zero. | |
1201 | .sp | |
1202 | A call to this helper may change the underlying | |
1203 | packet buffer. Therefore, at load time, all checks on pointers | |
1204 | previously done by the verifier are invalidated and must be | |
1205 | performed again, if the helper is used in combination with | |
1206 | direct packet access. | |
1207 | .TP | |
1208 | .B Return | |
1209 | 0 on success, or a negative error in case of failure. | |
1210 | .UNINDENT | |
1211 | .TP | |
1212 | .B \fBint bpf_xdp_adjust_head(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP | |
1213 | .INDENT 7.0 | |
1214 | .TP | |
1215 | .B Description | |
1216 | Adjust (move) \fIxdp_md\fP\fB\->data\fP by \fIdelta\fP bytes. Note that | |
1217 | it is possible to use a negative value for \fIdelta\fP\&. This helper | |
1218 | can be used to prepare the packet for pushing or popping | |
1219 | headers. | |
1220 | .sp | |
1221 | A call to this helper may change the underlying | |
1222 | packet buffer. Therefore, at load time, all checks on pointers | |
1223 | previously done by the verifier are invalidated and must be | |
1224 | performed again, if the helper is used in combination with | |
1225 | direct packet access. | |
1226 | .TP | |
1227 | .B Return | |
1228 | 0 on success, or a negative error in case of failure. | |
1229 | .UNINDENT | |
1230 | .TP | |
1231 | .B \fBint bpf_probe_read_str(void *\fP\fIdst\fP\fB, int\fP \fIsize\fP\fB, const void *\fP\fIunsafe_ptr\fP\fB)\fP | |
1232 | .INDENT 7.0 | |
1233 | .TP | |
1234 | .B Description | |
1235 | Copy a NUL\-terminated string from an unsafe address | |
1236 | \fIunsafe_ptr\fP to \fIdst\fP\&. The \fIsize\fP should include the | |
1237 | terminating NUL byte. In case the string length is smaller than | |
1238 | \fIsize\fP, the target is not padded with further NUL bytes. If the | |
1239 | string length is larger than \fIsize\fP, just \fIsize\fP\-1 bytes are | |
1240 | copied and the last byte is set to NUL. | |
1241 | .sp | |
1242 | On success, the length of the copied string is returned. This | |
1243 | makes this helper useful in tracing programs for reading | |
1244 | strings, and more importantly to get their length at runtime. See | |
1245 | the following snippet: | |
1246 | .INDENT 7.0 | |
1247 | .INDENT 3.5 | |
1248 | .sp | |
1249 | .nf | |
1250 | .ft C | |
1251 | SEC("kprobe/sys_open") | |
1252 | void bpf_sys_open(struct pt_regs *ctx) | |
| { | |
1253 | char buf[PATHLEN]; // PATHLEN is defined to 256 | |
1254 | int res = bpf_probe_read_str(buf, sizeof(buf), | |
1255 | (const void *)ctx\->di); | |
1256 | ||
1257 | // Consume buf, for example push it to | |
1258 | // userspace via bpf_perf_event_output(); we | |
1259 | // can use res (the string length) as event | |
1260 | // size, after checking its boundaries. | |
| } | |
1261 | .ft P |
1262 | .fi | |
1263 | .UNINDENT | |
1264 | .UNINDENT | |
1265 | .sp | |
1266 | In comparison, using the \fBbpf_probe_read()\fP helper here instead | |
1267 | to read the string would require estimating the length at | |
1268 | compile time, and would often result in copying more memory | |
1269 | than necessary. | |
1270 | .sp | |
1271 | Another useful use case is parsing individual process | |
1272 | arguments or individual environment variables by navigating | |
1273 | \fIcurrent\fP\fB\->mm\->arg_start\fP and \fIcurrent\fP\fB\->mm\->env_start\fP: using this helper and the return value, | |
1274 | one can quickly iterate to the right offset in the memory area. | |
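The copy semantics and the use of the return value as an offset can be modelled in user space. This sketch is not the helper itself: \fBread_str_model\fP() and \fBcount_args\fP() are hypothetical, the model assumes a readable source that ends with a NUL byte, and it assumes every string fits in the local buffer.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model of the copy semantics described above. Returns the
 * number of bytes written, including the trailing NUL, or -1 if size
 * is not positive (a choice made for this model only). */
static int read_str_model(char *dst, int size, const char *src)
{
    int i;

    if (size <= 0)
        return -1;
    for (i = 0; i < size - 1 && src[i] != '\0'; i++)
        dst[i] = src[i];
    dst[i] = '\0'; /* no further padding is written */
    return i + 1;
}

/* Count the NUL-separated strings in [area, area + area_len), moving
 * forward by the returned length each time. Assumes the area ends with
 * a NUL byte and that each string fits in buf. */
static int count_args(const char *area, size_t area_len)
{
    char buf[64];
    size_t off = 0;
    int n = 0;

    while (off < area_len) {
        int res = read_str_model(buf, (int)sizeof(buf), area + off);

        assert(res > 0);
        off += (size_t)res;
        n++;
    }
    return n;
}
```

Because the returned length includes the trailing NUL, adding it to the current offset lands on the first byte of the next string.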
1275 | .TP | |
1276 | .B Return | |
1277 | On success, the strictly positive length of the string, | |
1278 | including the trailing NUL character. On error, a negative | |
1279 | value. | |
1280 | .UNINDENT | |
1281 | .TP | |
1282 | .B \fBu64 bpf_get_socket_cookie(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
1283 | .INDENT 7.0 | |
1284 | .TP | |
1285 | .B Description | |
1286 | If the \fBstruct sk_buff\fP pointed to by \fIskb\fP has a known socket, | |
1287 | retrieve the cookie (generated by the kernel) of this socket. | |
1288 | If no cookie has been set yet, generate a new cookie. Once | |
1289 | generated, the socket cookie remains stable for the life of the | |
1290 | socket. This helper can be useful for monitoring per socket | |
1291 | networking traffic statistics as it provides a unique socket | |
1292 | identifier per namespace. | |
1293 | .TP | |
1294 | .B Return | |
1295 | An 8\-byte long non\-decreasing number on success, or 0 if the | |
1296 | socket field is missing inside \fIskb\fP\&. | |
1297 | .UNINDENT | |
1298 | .TP | |
1299 | .B \fBu64 bpf_get_socket_cookie(struct bpf_sock_addr *\fP\fIctx\fP\fB)\fP | |
1300 | .INDENT 7.0 | |
1301 | .TP | |
1302 | .B Description | |
1303 | Equivalent to the \fBbpf_get_socket_cookie\fP() helper that accepts | |
1304 | \fIskb\fP, but gets the socket from the \fBstruct bpf_sock_addr\fP context. | |
1305 | .TP | |
1306 | .B Return | |
1307 | An 8\-byte long non\-decreasing number. | |
1308 | .UNINDENT | |
1309 | .TP | |
1310 | .B \fBu64 bpf_get_socket_cookie(struct bpf_sock_ops *\fP\fIctx\fP\fB)\fP | |
1311 | .INDENT 7.0 | |
1312 | .TP | |
1313 | .B Description | |
1314 | Equivalent to the \fBbpf_get_socket_cookie\fP() helper that accepts | |
1315 | \fIskb\fP, but gets the socket from the \fBstruct bpf_sock_ops\fP context. | |
1316 | .TP | |
1317 | .B Return | |
1318 | An 8\-byte long non\-decreasing number. | |
1319 | .UNINDENT | |
1320 | .TP | |
1321 | .B \fBu32 bpf_get_socket_uid(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
1322 | .INDENT 7.0 | |
1323 | .TP | |
1324 | .B Return | |
1325 | The owner UID of the socket associated to \fIskb\fP\&. If the socket | |
1326 | is \fBNULL\fP, or if it is not a full socket (i.e. if it is a | |
1327 | time\-wait or a request socket instead), \fBoverflowuid\fP value | |
1328 | is returned (note that \fBoverflowuid\fP might also be the actual | |
1329 | UID value for the socket). | |
1330 | .UNINDENT | |
1331 | .TP | |
1332 | .B \fBu32 bpf_set_hash(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIhash\fP\fB)\fP | |
1333 | .INDENT 7.0 | |
1334 | .TP | |
1335 | .B Description | |
1336 | Set the full hash for \fIskb\fP (set the field \fIskb\fP\fB\->hash\fP) | |
1337 | to value \fIhash\fP\&. | |
1338 | .TP | |
1339 | .B Return | |
| 0 | |
1340 | .UNINDENT |
1341 | .TP | |
1342 | .B \fBint bpf_setsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP | |
1343 | .INDENT 7.0 | |
1344 | .TP | |
1345 | .B Description | |
1346 | Emulate a call to \fBsetsockopt()\fP on the socket associated to | |
1347 | \fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at | |
1348 | which the option resides and the name \fIoptname\fP of the option | |
1349 | must be specified; see \fBsetsockopt(2)\fP for more information. | |
1350 | The option value of length \fIoptlen\fP is pointed to by \fIoptval\fP\&. | |
1351 | .sp | |
1352 | This helper actually implements a subset of \fBsetsockopt()\fP\&. | |
1353 | It supports the following \fIlevel\fPs: | |
1354 | .INDENT 7.0 | |
1355 | .IP \(bu 2 | |
1356 | \fBSOL_SOCKET\fP, which supports the following \fIoptname\fPs: | |
1357 | \fBSO_RCVBUF\fP, \fBSO_SNDBUF\fP, \fBSO_MAX_PACING_RATE\fP, | |
1358 | \fBSO_PRIORITY\fP, \fBSO_RCVLOWAT\fP, \fBSO_MARK\fP\&. | |
1359 | .IP \(bu 2 | |
1360 | \fBIPPROTO_TCP\fP, which supports the following \fIoptname\fPs: | |
1361 | \fBTCP_CONGESTION\fP, \fBTCP_BPF_IW\fP, | |
1362 | \fBTCP_BPF_SNDCWND_CLAMP\fP\&. | |
1363 | .IP \(bu 2 | |
1364 | \fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&. | |
1365 | .IP \(bu 2 | |
1366 | \fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&. | |
1367 | .UNINDENT | |
1368 | .TP | |
1369 | .B Return | |
1370 | 0 on success, or a negative error in case of failure. | |
1371 | .UNINDENT | |
1372 | .TP | |
2223d7df | 1373 | .B \fBint bpf_skb_adjust_room(struct sk_buff *\fP\fIskb\fP\fB, s32\fP \fIlen_diff\fP\fB, u32\fP \fImode\fP\fB, u64\fP \fIflags\fP\fB)\fP |
53666f6c MK |
1374 | .INDENT 7.0 |
1375 | .TP | |
1376 | .B Description | |
1377 | Grow or shrink the room for data in the packet associated to | |
1378 | \fIskb\fP by \fIlen_diff\fP, and according to the selected \fImode\fP\&. | |
1379 | .sp | |
1380 | There is a single supported mode at this time: | |
1381 | .INDENT 7.0 | |
1382 | .IP \(bu 2 | |
1383 | \fBBPF_ADJ_ROOM_NET\fP: Adjust room at the network layer | |
1384 | (room space is added or removed below the layer 3 header). | |
1385 | .UNINDENT | |
1386 | .sp | |
1387 | All values for \fIflags\fP are reserved for future usage, and must | |
1388 | be left at zero. | |
1389 | .sp | |
1390 | A call to this helper may change the underlying | |
1391 | packet buffer. Therefore, at load time, all checks on pointers | |
1392 | previously done by the verifier are invalidated and must be | |
1393 | performed again, if the helper is used in combination with | |
1394 | direct packet access. | |
1395 | .TP | |
1396 | .B Return | |
1397 | 0 on success, or a negative error in case of failure. | |
1398 | .UNINDENT | |
1399 | .TP | |
1400 | .B \fBint bpf_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1401 | .INDENT 7.0 | |
1402 | .TP | |
1403 | .B Description | |
1404 | Redirect the packet to the endpoint referenced by \fImap\fP at | |
1405 | index \fIkey\fP\&. Depending on its type, this \fImap\fP can contain | |
1406 | references to net devices (for forwarding packets through other | |
1407 | ports), or to CPUs (for redirecting XDP frames to another CPU; | |
1408 | but this is only implemented for native XDP (with driver | |
1409 | support) as of this writing). | |
1410 | .sp | |
1411 | All values for \fIflags\fP are reserved for future usage, and must | |
1412 | be left at zero. | |
1413 | .sp | |
1414 | When used to redirect packets to net devices, this helper | |
1415 | provides a high performance increase over \fBbpf_redirect\fP(). | |
1416 | This is due to various implementation details of the underlying | |
1417 | mechanisms, one of which is the fact that \fBbpf_redirect_map\fP() tries to send packets to the device in bulk. | |
1418 | .TP | |
1419 | .B Return | |
1420 | \fBXDP_REDIRECT\fP on success, or \fBXDP_ABORTED\fP on error. | |
1421 | .UNINDENT | |
1422 | .TP | |
1423 | .B \fBint bpf_sk_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1424 | .INDENT 7.0 | |
1425 | .TP | |
1426 | .B Description | |
1427 | Redirect the packet to the socket referenced by \fImap\fP (of type | |
1428 | \fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and | |
1429 | egress interfaces can be used for redirection. The | |
1430 | \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the | |
1431 | distinction (ingress path is selected if the flag is present, | |
1432 | egress path otherwise). This is the only flag supported for now. | |
1433 | .TP | |
1434 | .B Return | |
1435 | \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. | |
1436 | .UNINDENT | |
1437 | .TP | |
1438 | .B \fBint bpf_sock_map_update(struct bpf_sock_ops *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1439 | .INDENT 7.0 | |
1440 | .TP | |
1441 | .B Description | |
1442 | Add an entry to, or update a \fImap\fP referencing sockets. The | |
1443 | \fIskops\fP is used as a new value for the entry associated to | |
1444 | \fIkey\fP\&. \fIflags\fP is one of: | |
1445 | .INDENT 7.0 | |
1446 | .TP | |
1447 | .B \fBBPF_NOEXIST\fP | |
1448 | The entry for \fIkey\fP must not exist in the map. | |
1449 | .TP | |
1450 | .B \fBBPF_EXIST\fP | |
1451 | The entry for \fIkey\fP must already exist in the map. | |
1452 | .TP | |
1453 | .B \fBBPF_ANY\fP | |
1454 | No condition on the existence of the entry for \fIkey\fP\&. | |
1455 | .UNINDENT | |
1456 | .sp | |
1457 | If the \fImap\fP has eBPF programs (parser and verdict), those will | |
1458 | be inherited by the socket being added. If the socket is | |
1459 | already attached to eBPF programs, this results in an error. | |
1460 | .TP | |
1461 | .B Return | |
1462 | 0 on success, or a negative error in case of failure. | |
1463 | .UNINDENT | |
1464 | .TP | |
1465 | .B \fBint bpf_xdp_adjust_meta(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP | |
1466 | .INDENT 7.0 | |
1467 | .TP | |
1468 | .B Description | |
1469 | Adjust the address pointed by \fIxdp_md\fP\fB\->data_meta\fP by | |
1470 | \fIdelta\fP (which can be positive or negative). Note that this | |
1471 | operation modifies the address stored in \fIxdp_md\fP\fB\->data\fP, | |
1472 | so the latter must be loaded only after the helper has been | |
1473 | called. | |
1474 | .sp | |
1475 | The use of \fIxdp_md\fP\fB\->data_meta\fP is optional and programs | |
1476 | are not required to use it. The rationale is that when the | |
1477 | packet is processed with XDP (e.g. as DoS filter), it is | |
1478 | possible to push further meta data along with it before passing | |
1479 | to the stack, and to give the guarantee that an ingress eBPF | |
1480 | program attached as a TC classifier on the same device can pick | |
1481 | this up for further post\-processing. Since TC works with socket | |
1482 | buffers, it remains possible to set from XDP the \fBmark\fP or | |
1483 | \fBpriority\fP pointers, or other pointers for the socket buffer. | |
1484 | Having this scratch space generic and programmable allows for | |
1485 | more flexibility as the user is free to store whatever meta | |
1486 | data they need. | |
1487 | .sp | |
1488 | A call to this helper may change the underlying | |
1489 | packet buffer. Therefore, at load time, all checks on pointers | |
1490 | previously done by the verifier are invalidated and must be | |
1491 | performed again, if the helper is used in combination with | |
1492 | direct packet access. | |
1493 | .TP | |
1494 | .B Return | |
1495 | 0 on success, or a negative error in case of failure. | |
1496 | .UNINDENT | |
1497 | .TP | |
1498 | .B \fBint bpf_perf_event_read_value(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP | |
1499 | .INDENT 7.0 | |
1500 | .TP | |
1501 | .B Description | |
1502 | Read the value of a perf event counter, and store it into \fIbuf\fP | |
1503 | of size \fIbuf_size\fP\&. This helper relies on a \fImap\fP of type | |
1504 | \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of the perf event | |
1505 | counter is selected when \fImap\fP is updated with perf event file | |
1506 | descriptors. The \fImap\fP is an array whose size is the number of | |
1507 | available CPUs, and each cell contains a value relative to one | |
1508 | CPU. The value to retrieve is indicated by \fIflags\fP, that | |
1509 | contains the index of the CPU to look up, masked with | |
1510 | \fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to | |
1511 | \fBBPF_F_CURRENT_CPU\fP to indicate that the value for the | |
1512 | current CPU should be retrieved. | |
1513 | .sp | |
1514 | This helper behaves in a way close to | |
1515 | \fBbpf_perf_event_read\fP() helper, save that instead of | |
1516 | just returning the value observed, it fills the \fIbuf\fP | |
1517 | structure. This allows for additional data to be retrieved: in | |
1518 | particular, the enabled and running times (in \fIbuf\fP\fB\->enabled\fP and \fIbuf\fP\fB\->running\fP, respectively) are | |
1519 | copied. In general, \fBbpf_perf_event_read_value\fP() is | |
1520 | recommended over \fBbpf_perf_event_read\fP(), which has some | |
1521 | ABI issues and provides fewer functionalities. | |
1522 | .sp | |
1523 | These values are interesting, because hardware PMU (Performance | |
1524 | Monitoring Unit) counters are limited resources. When there are | |
1525 | more PMU based perf events opened than available counters, | |
1526 | the kernel will multiplex these events so each event gets a certain | |
1527 | percentage (but not all) of the PMU time. In case that | |
1528 | multiplexing happens, the number of samples or counter value | |
1529 | will not match what would be measured when no multiplexing | |
1530 | occurs. This makes comparison between different runs difficult. | |
1531 | Typically, the counter value should be normalized before | |
1532 | comparing to other experiments. The usual normalization is done | |
1533 | as follows. | |
1534 | .INDENT 7.0 | |
1535 | .INDENT 3.5 | |
1536 | .sp | |
1537 | .nf | |
1538 | .ft C | |
1539 | normalized_counter = counter * t_enabled / t_running | |
1540 | .ft P | |
1541 | .fi | |
1542 | .UNINDENT | |
1543 | .UNINDENT | |
1544 | .sp | |
1545 | Where t_enabled is the time enabled for event and t_running is | |
1546 | the time running for event since last normalization. The | |
1547 | enabled and running times are accumulated since the perf event | |
1548 | open. To compute the scaling factor between two invocations of an | |
1549 | eBPF program, users can use the CPU id as the key (which is | |
1550 | typical for perf array usage model) to remember the previous | |
1551 | value and do the calculation inside the eBPF program. | |
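The normalization between two readings can be sketched in plain user\-space C as follows. This is only an illustration under stated assumptions: \fBstruct perf_sample\fP and \fBnormalized_delta\fP() are hypothetical names that mirror the three values discussed above (counter, enabled time, running time), and the sketch uses floating point for brevity, which is not available inside an eBPF program, where integer arithmetic would be needed instead.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the three values discussed above;
 * this is not the kernel's own structure definition. */
struct perf_sample {
    uint64_t counter;
    uint64_t enabled;
    uint64_t running;
};

/* Scale the counter delta between two readings by the ratio of
 * enabled time to running time over the same interval:
 *     normalized_counter = counter * t_enabled / t_running
 * Returns 0 if the event did not run at all during the interval. */
uint64_t normalized_delta(const struct perf_sample *prev,
                          const struct perf_sample *cur)
{
    uint64_t d_counter = cur->counter - prev->counter;
    uint64_t d_enabled = cur->enabled - prev->enabled;
    uint64_t d_running = cur->running - prev->running;

    if (d_running == 0)
        return 0;
    return (uint64_t)((double)d_counter * (double)d_enabled /
                      (double)d_running);
}

/* Usage example: the event ran for a quarter of the time it was
 * enabled, so the raw delta of 500 is scaled up by a factor of 4. */
void normalized_delta_example(void)
{
    struct perf_sample prev = { 1000, 2000, 1000 };
    struct perf_sample cur  = { 1500, 4000, 1500 };

    assert(normalized_delta(&prev, &cur) == 2000);
}
```

Keeping the previous reading per CPU, as suggested above, lets each invocation compute the delta against the last value seen on the same CPU.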
1552 | .TP | |
1553 | .B Return | |
1554 | 0 on success, or a negative error in case of failure. | |
1555 | .UNINDENT | |
1556 | .TP | |
1557 | .B \fBint bpf_perf_prog_read_value(struct bpf_perf_event_data *\fP\fIctx\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP | |
1558 | .INDENT 7.0 | |
1559 | .TP | |
1560 | .B Description | |
1561 | For an eBPF program attached to a perf event, retrieve the | |
1562 | value of the event counter associated to \fIctx\fP and store it in | |
1563 | the structure pointed by \fIbuf\fP and of size \fIbuf_size\fP\&. Enabled | |
1564 | and running times are also stored in the structure (see | |
1565 | description of helper \fBbpf_perf_event_read_value\fP() for | |
1566 | more details). | |
1567 | .TP | |
1568 | .B Return | |
1569 | 0 on success, or a negative error in case of failure. | |
1570 | .UNINDENT | |
1571 | .TP | |
1572 | .B \fBint bpf_getsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP | |
1573 | .INDENT 7.0 | |
1574 | .TP | |
1575 | .B Description | |
1576 | Emulate a call to \fBgetsockopt()\fP on the socket associated to | |
1577 | \fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at | |
1578 | which the option resides and the name \fIoptname\fP of the option | |
1579 | must be specified, see \fBgetsockopt(2)\fP for more information. | |
1580 | The retrieved value is stored in the structure pointed by | |
1581 | \fIoptval\fP and of length \fIoptlen\fP\&. | |
1582 | .sp | |
1583 | This helper actually implements a subset of \fBgetsockopt()\fP\&. | |
1584 | It supports the following \fIlevel\fPs: | |
1585 | .INDENT 7.0 | |
1586 | .IP \(bu 2 | |
1587 | \fBIPPROTO_TCP\fP, which supports \fIoptname\fP | |
1588 | \fBTCP_CONGESTION\fP\&. | |
1589 | .IP \(bu 2 | |
1590 | \fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&. | |
1591 | .IP \(bu 2 | |
1592 | \fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&. | |
1593 | .UNINDENT | |
1594 | .TP | |
1595 | .B Return | |
1596 | 0 on success, or a negative error in case of failure. | |
1597 | .UNINDENT | |
1598 | .TP | |
1599 | .B \fBint bpf_override_return(struct pt_regs *\fP\fIregs\fP\fB, u64\fP \fIrc\fP\fB)\fP | |
1600 | .INDENT 7.0 | |
1601 | .TP | |
1602 | .B Description | |
1603 | Used for error injection, this helper uses kprobes to override | |
1604 | the return value of the probed function, and to set it to \fIrc\fP\&. | |
1605 | The first argument is the context \fIregs\fP on which the kprobe | |
1606 | works. | |
1607 | .sp | |
1608 | This helper works by setting the PC (program counter) | |
1609 | to an override function which is run in place of the original | |
1610 | probed function. This means the probed function is not run at | |
1611 | all. The replacement function just returns with the required | |
1612 | value. | |
1613 | .sp | |
1614 | This helper has security implications, and thus is subject to | |
1615 | restrictions. It is only available if the kernel was compiled | |
1616 | with the \fBCONFIG_BPF_KPROBE_OVERRIDE\fP configuration | |
1617 | option, and in this case it only works on functions tagged with | |
1618 | \fBALLOW_ERROR_INJECTION\fP in the kernel code. | |
1619 | .sp | |
1620 | Also, the helper is only available for the architectures having | |
1621 | the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing, | |
1622 | x86 architecture is the only one to support this feature. | |
1623 | .TP | |
1624 | .B Return | |
53666f6c MK |
1625 | .UNINDENT |
1626 | .TP | |
1627 | .B \fBint bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *\fP\fIbpf_sock\fP\fB, int\fP \fIargval\fP\fB)\fP | |
1628 | .INDENT 7.0 | |
1629 | .TP | |
1630 | .B Description | |
1631 | Attempt to set the value of the \fBbpf_sock_ops_cb_flags\fP field | |
1632 | for the full TCP socket associated to \fIbpf_sock\fP to | |
1633 | \fIargval\fP\&. | |
1634 | .sp | |
1635 | The primary use of this field is to determine if there should | |
1636 | be calls to eBPF programs of type | |
1637 | \fBBPF_PROG_TYPE_SOCK_OPS\fP at various points in the TCP | |
1638 | code. A program of the same type can change its value, per | |
1639 | connection and as necessary, when the connection is | |
1640 | established. This field is directly accessible for reading, but | |
1641 | this helper must be used for updates in order to return an | |
1642 | error if an eBPF program tries to set a callback that is not | |
1643 | supported in the current kernel. | |
1644 | .sp | |
1645 | The supported callback values that \fIargval\fP can combine are: | |
1646 | .INDENT 7.0 | |
1647 | .IP \(bu 2 | |
1648 | \fBBPF_SOCK_OPS_RTO_CB_FLAG\fP (retransmission time out) | |
1649 | .IP \(bu 2 | |
1650 | \fBBPF_SOCK_OPS_RETRANS_CB_FLAG\fP (retransmission) | |
1651 | .IP \(bu 2 | |
1652 | \fBBPF_SOCK_OPS_STATE_CB_FLAG\fP (TCP state change) | |
1653 | .UNINDENT | |
1654 | .sp | |
1655 | Here are some examples of where one could call such eBPF | |
1656 | program: | |
1657 | .INDENT 7.0 | |
1658 | .IP \(bu 2 | |
1659 | When RTO fires. | |
1660 | .IP \(bu 2 | |
1661 | When a packet is retransmitted. | |
1662 | .IP \(bu 2 | |
1663 | When the connection terminates. | |
1664 | .IP \(bu 2 | |
1665 | When a packet is sent. | |
1666 | .IP \(bu 2 | |
1667 | When a packet is received. | |
1668 | .UNINDENT | |
1669 | .TP | |
1670 | .B Return | |
1671 | Code \fB\-EINVAL\fP if the socket is not a full TCP socket; | |
1672 | otherwise, a positive number containing the bits that could not | |
1673 | be set is returned (which comes down to 0 if all bits were set | |
1674 | as required). | |
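The return convention described above can be modelled in user\-space C as follows. This is a sketch, not the kernel implementation: \fBstruct sketch_sock\fP and \fBsketch_cb_flags_set\fP() are hypothetical, and the flag values are copied here on the assumption that they match the bit positions used by the kernel header; a real program would take them from \fB<linux/bpf.h>\fP.

```c
#include <assert.h>
#include <errno.h>

/* Assumed bit positions, for illustration only. */
#define SKETCH_RTO_CB_FLAG     (1 << 0)
#define SKETCH_RETRANS_CB_FLAG (1 << 1)
#define SKETCH_STATE_CB_FLAG   (1 << 2)
#define SKETCH_ALL_CB_FLAGS    (SKETCH_RTO_CB_FLAG | \
                                SKETCH_RETRANS_CB_FLAG | \
                                SKETCH_STATE_CB_FLAG)

/* Hypothetical stand-in for a socket. */
struct sketch_sock {
    int is_full_tcp;   /* non-zero for a full TCP socket */
    int cb_flags;      /* callback flags currently stored */
};

/* Store the supported bits of argval and return the bits that
 * could not be set (0 if every requested bit is supported). */
int sketch_cb_flags_set(struct sketch_sock *sk, int argval)
{
    if (!sk->is_full_tcp)
        return -EINVAL;
    sk->cb_flags = argval & SKETCH_ALL_CB_FLAGS;
    return argval & ~SKETCH_ALL_CB_FLAGS;
}

/* Usage example: one supported bit plus one unknown bit (0x40). */
void sketch_cb_flags_example(void)
{
    struct sketch_sock sk = { 1, 0 };
    int rejected = sketch_cb_flags_set(&sk, SKETCH_STATE_CB_FLAG | 0x40);

    assert(rejected == 0x40);
    assert(sk.cb_flags == SKETCH_STATE_CB_FLAG);
}
```

A caller can therefore treat a positive return value as the set of callbacks that the running kernel does not know about, and fall back accordingly.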
1675 | .UNINDENT | |
1676 | .TP | |
1677 | .B \fBint bpf_msg_redirect_map(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1678 | .INDENT 7.0 | |
1679 | .TP | |
1680 | .B Description | |
1681 | This helper is used in programs implementing policies at the | |
1682 | socket level. If the message \fImsg\fP is allowed to pass (i.e. if | |
1683 | the verdict eBPF program returns \fBSK_PASS\fP), redirect it to | |
1684 | the socket referenced by \fImap\fP (of type | |
1685 | \fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and | |
1686 | egress interfaces can be used for redirection. The | |
1687 | \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the | |
1688 | distinction (ingress path is selected if the flag is present, | |
1689 | egress path otherwise). This is the only flag supported for now. | |
1690 | .TP | |
1691 | .B Return | |
1692 | \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. | |
1693 | .UNINDENT | |
1694 | .TP | |
1695 | .B \fBint bpf_msg_apply_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP | |
1696 | .INDENT 7.0 | |
1697 | .TP | |
1698 | .B Description | |
1699 | For socket policies, apply the verdict of the eBPF program to | |
1700 | the next \fIbytes\fP (number of bytes) of message \fImsg\fP\&. | |
1701 | .sp | |
1702 | For example, this helper can be used in the following cases: | |
1703 | .INDENT 7.0 | |
1704 | .IP \(bu 2 | |
1705 | A single \fBsendmsg\fP() or \fBsendfile\fP() system call | |
1706 | contains multiple logical messages that the eBPF program is | |
1707 | supposed to read and for which it should apply a verdict. | |
1708 | .IP \(bu 2 | |
1709 | An eBPF program only cares to read the first \fIbytes\fP of a | |
1710 | \fImsg\fP\&. If the message has a large payload, then setting up | |
1711 | and calling the eBPF program repeatedly for all bytes, even | |
1712 | though the verdict is already known, would create unnecessary | |
1713 | overhead. | |
1714 | .UNINDENT | |
1715 | .sp | |
1716 | When called from within an eBPF program, the helper sets a | |
1717 | counter internal to the BPF infrastructure, that is used to | |
1718 | apply the last verdict to the next \fIbytes\fP\&. If \fIbytes\fP is | |
1719 | smaller than the current data being processed from a | |
1720 | \fBsendmsg\fP() or \fBsendfile\fP() system call, the first | |
1721 | \fIbytes\fP will be sent and the eBPF program will be re\-run with | |
1722 | the pointer for start of data pointing to byte number \fIbytes\fP | |
1723 | \fB+ 1\fP\&. If \fIbytes\fP is larger than the current data being | |
1724 | processed, then the eBPF verdict will be applied to multiple | |
1725 | \fBsendmsg\fP() or \fBsendfile\fP() calls until \fIbytes\fP are | |
1726 | consumed. | |
1727 | .sp | |
1728 | Note that if a socket closes with the internal counter holding | |
1729 | a non\-zero value, this is not a problem because data is not | |
1730 | being buffered for \fIbytes\fP and is sent as it is received. | |
1731 | .TP | |
1732 | .B Return | |
53666f6c MK |
1733 | .UNINDENT |
1734 | .TP | |
1735 | .B \fBint bpf_msg_cork_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP | |
1736 | .INDENT 7.0 | |
1737 | .TP | |
1738 | .B Description | |
1739 | For socket policies, prevent the execution of the verdict eBPF | |
1740 | program for message \fImsg\fP until \fIbytes\fP (byte number) have been | |
1741 | accumulated. | |
1742 | .sp | |
1743 | This can be used when one needs a specific number of bytes | |
1744 | before a verdict can be assigned, even if the data spans | |
1745 | multiple \fBsendmsg\fP() or \fBsendfile\fP() calls. The extreme | |
1746 | case would be a user calling \fBsendmsg\fP() repeatedly with | |
1747 | 1\-byte long message segments. Obviously, this is bad for | |
1748 | performance, but it is still valid. If the eBPF program needs | |
1749 | \fIbytes\fP bytes to validate a header, this helper can be used to | |
1750 | prevent the eBPF program from being called again until \fIbytes\fP have | |
1751 | been accumulated. | |
1752 | .TP | |
1753 | .B Return | |
53666f6c MK |
1754 | .UNINDENT |
1755 | .TP | |
1756 | .B \fBint bpf_msg_pull_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIend\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1757 | .INDENT 7.0 | |
1758 | .TP | |
1759 | .B Description | |
1760 | For socket policies, pull in non\-linear data from user space | |
1761 | for \fImsg\fP and set pointers \fImsg\fP\fB\->data\fP and \fImsg\fP\fB\->data_end\fP to \fIstart\fP and \fIend\fP byte offsets into \fImsg\fP, | |
1762 | respectively. | |
1763 | .sp | |
1764 | If a program of type \fBBPF_PROG_TYPE_SK_MSG\fP is run on a | |
1765 | \fImsg\fP it can only parse data that the (\fBdata\fP, \fBdata_end\fP) | |
1766 | pointers have already consumed. For \fBsendmsg\fP() hooks this | |
1767 | is likely the first scatterlist element. But for calls relying | |
1768 | on the \fBsendpage\fP handler (e.g. \fBsendfile\fP()) this will | |
1769 | be the range (\fB0\fP, \fB0\fP) because the data is shared with | |
1770 | user space and by default the objective is to avoid allowing | |
1771 | user space to modify data while (or after) eBPF verdict is | |
1772 | being decided. This helper can be used to pull in data and to | |
1773 | set the start and end pointer to given values. Data will be | |
1774 | copied if necessary (i.e. if data was not linear and if start | |
1775 | and end pointers do not point to the same chunk). | |
1776 | .sp | |
1777 | A call to this helper may change the underlying | |
1778 | packet buffer. Therefore, at load time, all checks on pointers | |
1779 | previously done by the verifier are invalidated and must be | |
1780 | performed again, if the helper is used in combination with | |
1781 | direct packet access. | |
1782 | .sp | |
1783 | All values for \fIflags\fP are reserved for future usage, and must | |
1784 | be left at zero. | |
1785 | .TP | |
1786 | .B Return | |
1787 | 0 on success, or a negative error in case of failure. | |
1788 | .UNINDENT | |
1789 | .TP | |
1790 | .B \fBint bpf_bind(struct bpf_sock_addr *\fP\fIctx\fP\fB, struct sockaddr *\fP\fIaddr\fP\fB, int\fP \fIaddr_len\fP\fB)\fP | |
1791 | .INDENT 7.0 | |
1792 | .TP | |
1793 | .B Description | |
1794 | Bind the socket associated to \fIctx\fP to the address pointed by | |
1795 | \fIaddr\fP, of length \fIaddr_len\fP\&. This allows for making outgoing | |
1796 | connections from the desired IP address, which can be useful for | |
1797 | example when all processes inside a cgroup should use one | |
1798 | single IP address on a host that has multiple IP addresses configured. | |
1799 | .sp | |
1800 | This helper works for IPv4 and IPv6, TCP and UDP sockets. The | |
1801 | domain (\fIaddr\fP\fB\->sa_family\fP) must be \fBAF_INET\fP (or | |
1802 | \fBAF_INET6\fP). Looking for a free port to bind to can be | |
1803 | expensive, therefore binding to a port is not permitted by the | |
1804 | helper: \fIaddr\fP\fB\->sin_port\fP (or \fBsin6_port\fP, respectively) | |
1805 | must be set to zero. | |
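The address constraints above can be illustrated with ordinary socket structures in user\-space C. This is a sketch under stated assumptions: \fBfill_bind_addr_v4\fP() and \fBbind_addr_is_acceptable\fP() are hypothetical names, and the check only reflects the two rules stated in this description (supported family, port set to zero), not every validation the kernel may perform.

```c
#include <assert.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Fill addr with the IPv4 address given as text and leave the port
 * at zero. Returns 0 on success, -1 if the text is not a valid
 * IPv4 address. */
int fill_bind_addr_v4(struct sockaddr_in *addr, const char *ip)
{
    memset(addr, 0, sizeof(*addr));
    addr->sin_family = AF_INET;
    addr->sin_port = 0;   /* binding to a port is not permitted */
    return inet_pton(AF_INET, ip, &addr->sin_addr) == 1 ? 0 : -1;
}

/* Return 1 if addr satisfies the rules described above:
 * family AF_INET or AF_INET6, and a zero port. */
int bind_addr_is_acceptable(const struct sockaddr *addr)
{
    if (addr->sa_family == AF_INET)
        return ((const struct sockaddr_in *)addr)->sin_port == 0;
    if (addr->sa_family == AF_INET6)
        return ((const struct sockaddr_in6 *)addr)->sin6_port == 0;
    return 0;
}

/* Usage example with a documentation-range address. */
void bind_addr_example(void)
{
    struct sockaddr_in addr;

    assert(fill_bind_addr_v4(&addr, "192.0.2.10") == 0);
    assert(bind_addr_is_acceptable((const struct sockaddr *)&addr));
}
```

In an eBPF program, an address prepared along these lines would be what is passed as \fIaddr\fP, with \fIaddr_len\fP set to the size of the structure.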
1806 | .TP | |
1807 | .B Return | |
1808 | 0 on success, or a negative error in case of failure. | |
1809 | .UNINDENT | |
1810 | .TP | |
1811 | .B \fBint bpf_xdp_adjust_tail(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP | |
1812 | .INDENT 7.0 | |
1813 | .TP | |
1814 | .B Description | |
1815 | Adjust (move) \fIxdp_md\fP\fB\->data_end\fP by \fIdelta\fP bytes. It is | |
1816 | only possible to shrink the packet as of this writing, | |
1817 | therefore \fIdelta\fP must be a negative integer. | |
1818 | .sp | |
1819 | A call to this helper may change the underlying | |
1820 | packet buffer. Therefore, at load time, all checks on pointers | |
1821 | previously done by the verifier are invalidated and must be | |
1822 | performed again, if the helper is used in combination with | |
1823 | direct packet access. | |
1824 | .TP | |
1825 | .B Return | |
1826 | 0 on success, or a negative error in case of failure. | |
1827 | .UNINDENT | |
1828 | .TP | |
1829 | .B \fBint bpf_skb_get_xfrm_state(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIindex\fP\fB, struct bpf_xfrm_state *\fP\fIxfrm_state\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1830 | .INDENT 7.0 | |
1831 | .TP | |
1832 | .B Description | |
1833 | Retrieve the XFRM state (IP transform framework, see also | |
1834 | \fBip\-xfrm(8)\fP) at \fIindex\fP in XFRM "security path" for \fIskb\fP\&. | |
1835 | .sp | |
1836 | The retrieved value is stored in the \fBstruct bpf_xfrm_state\fP | |
1837 | pointed by \fIxfrm_state\fP and of length \fIsize\fP\&. | |
1838 | .sp | |
1839 | All values for \fIflags\fP are reserved for future usage, and must | |
1840 | be left at zero. | |
1841 | .sp | |
1842 | This helper is available only if the kernel was compiled with | |
1843 | \fBCONFIG_XFRM\fP configuration option. | |
1844 | .TP | |
1845 | .B Return | |
1846 | 0 on success, or a negative error in case of failure. | |
1847 | .UNINDENT | |
1848 | .TP | |
1849 | .B \fBint bpf_get_stack(struct pt_regs *\fP\fIregs\fP\fB, void *\fP\fIbuf\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1850 | .INDENT 7.0 | |
1851 | .TP | |
1852 | .B Description | |
1853 | Return a user or a kernel stack in bpf program provided buffer. | |
1854 | To achieve this, the helper needs \fIregs\fP, which is a pointer | |
1855 | to the context on which the tracing program is executed. | |
1856 | To store the stacktrace, the bpf program provides \fIbuf\fP with | |
1857 | a nonnegative \fIsize\fP\&. | |
1858 | .sp | |
1859 | The last argument, \fIflags\fP, holds the number of stack frames to | |
1860 | skip (from 0 to 255), masked with | |
1861 | \fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set | |
1862 | the following flags: | |
1863 | .INDENT 7.0 | |
1864 | .TP | |
1865 | .B \fBBPF_F_USER_STACK\fP | |
1866 | Collect a user space stack instead of a kernel stack. | |
1867 | .TP | |
1868 | .B \fBBPF_F_USER_BUILD_ID\fP | |
1869 | Collect buildid+offset instead of ips for user stack, | |
1870 | only valid if \fBBPF_F_USER_STACK\fP is also specified. | |
1871 | .UNINDENT | |
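How the skip count and the flag bits share the \fIflags\fP argument can be sketched in user\-space C as follows. The helper \fBmake_stack_flags\fP() is hypothetical, and the constant values are copied here on the assumption that they match the kernel header; a real program would use the definitions from \fB<linux/bpf.h>\fP.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, for illustration only. */
#define SKETCH_F_SKIP_FIELD_MASK 0xffULL
#define SKETCH_F_USER_STACK      (1ULL << 8)
#define SKETCH_F_USER_BUILD_ID   (1ULL << 11)

/* Combine the number of frames to skip (0 to 255) with the
 * optional flag bits into a single flags value. */
uint64_t make_stack_flags(unsigned int skip, int user_stack, int build_id)
{
    uint64_t flags;

    assert(skip <= 255);
    /* Build IDs are only valid together with a user stack. */
    assert(!build_id || user_stack);

    flags = (uint64_t)skip & SKETCH_F_SKIP_FIELD_MASK;
    if (user_stack)
        flags |= SKETCH_F_USER_STACK;
    if (build_id)
        flags |= SKETCH_F_USER_BUILD_ID;
    return flags;
}

/* Usage example: user stack, skipping the two innermost frames. */
void make_stack_flags_example(void)
{
    assert(make_stack_flags(2, 1, 0) == 0x102ULL);
}
```

The low eight bits carry the skip count, so the flag bits never collide with it.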
1872 | .sp | |
1873 | \fBbpf_get_stack\fP() can collect up to | |
1874 | \fBPERF_MAX_STACK_DEPTH\fP both kernel and user frames, subject | |
1875 | to a sufficiently large buffer size. Note that | |
1876 | this limit can be controlled with the \fBsysctl\fP program, and | |
1877 | that it should be manually increased in order to profile long | |
1878 | user stacks (such as stacks for Java programs). To do so, use: | |
1879 | .INDENT 7.0 | |
1880 | .INDENT 3.5 | |
1881 | .sp | |
1882 | .nf | |
1883 | .ft C | |
1884 | # sysctl kernel.perf_event_max_stack=<new value> | |
1885 | .ft P | |
1886 | .fi | |
1887 | .UNINDENT | |
1888 | .UNINDENT | |
1889 | .TP | |
1890 | .B Return | |
1891 | A non\-negative value equal to or less than \fIsize\fP on success, | |
1892 | or a negative error in case of failure. | |
1893 | .UNINDENT | |
1894 | .TP | |
1895 | .B \fBint bpf_skb_load_bytes_relative(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB, u32\fP \fIstart_header\fP\fB)\fP | |
1896 | .INDENT 7.0 | |
1897 | .TP | |
1898 | .B Description | |
1899 | This helper is similar to \fBbpf_skb_load_bytes\fP() in that | |
1900 | it provides an easy way to load \fIlen\fP bytes from \fIoffset\fP | |
1901 | from the packet associated to \fIskb\fP, into the buffer pointed | |
1902 | by \fIto\fP\&. The difference to \fBbpf_skb_load_bytes\fP() is that | |
1903 | a fifth argument \fIstart_header\fP exists in order to select a | |
1904 | base offset to start from. \fIstart_header\fP can be one of: | |
1905 | .INDENT 7.0 | |
1906 | .TP | |
1907 | .B \fBBPF_HDR_START_MAC\fP | |
1908 | Base offset to load data from is \fIskb\fP\(aqs mac header. | |
1909 | .TP | |
1910 | .B \fBBPF_HDR_START_NET\fP | |
1911 | Base offset to load data from is \fIskb\fP\(aqs network header. | |
1912 | .UNINDENT | |
1913 | .sp | |
1914 | In general, "direct packet access" is the preferred method to | |
1915 | access packet data, however, this helper is in particular useful | |
1916 | in socket filters where \fIskb\fP\fB\->data\fP does not always point | |
1917 | to the start of the mac header and where "direct packet access" | |
1918 | is not available. | |
1919 | .TP | |
1920 | .B Return | |
1921 | 0 on success, or a negative error in case of failure. | |
1922 | .UNINDENT | |
1923 | .TP | |
1924 | .B \fBint bpf_fib_lookup(void *\fP\fIctx\fP\fB, struct bpf_fib_lookup *\fP\fIparams\fP\fB, int\fP \fIplen\fP\fB, u32\fP \fIflags\fP\fB)\fP | |
1925 | .INDENT 7.0 | |
1926 | .TP | |
1927 | .B Description | |
1928 | Do FIB lookup in kernel tables using parameters in \fIparams\fP\&. | |
1929 | If lookup is successful and result shows packet is to be | |
1930 | forwarded, the neighbor tables are searched for the nexthop. | |
1931 | If successful (i.e., FIB lookup shows forwarding and nexthop | |
1932 | is resolved), the nexthop address is returned in ipv4_dst | |
1933 | or ipv6_dst based on family, smac is set to mac address of | |
1934 | egress device, dmac is set to nexthop mac address, rt_metric | |
1935 | is set to metric from route (IPv4/IPv6 only), and ifindex | |
1936 | is set to the device index of the nexthop from the FIB lookup. | |
1937 | .sp | |
1938 | \fIplen\fP argument is the size of the passed in struct. | |
1939 | \fIflags\fP argument can be a combination of one or more of the | |
1940 | following values: | |
1941 | .INDENT 7.0 | |
1942 | .TP | |
1943 | .B \fBBPF_FIB_LOOKUP_DIRECT\fP | |
1944 | Do a direct table lookup vs full lookup using FIB | |
1945 | rules. | |
1946 | .TP | |
1947 | .B \fBBPF_FIB_LOOKUP_OUTPUT\fP | |
1948 | Perform lookup from an egress perspective (default is | |
1949 | ingress). | |
1950 | .UNINDENT | |
1951 | .sp | |
1952 | \fIctx\fP is either \fBstruct xdp_md\fP for XDP programs or | |
1953 | \fBstruct sk_buff\fP for tc cls_act programs. | |
1954 | .TP | |
1955 | .B Return | |
1956 | .INDENT 7.0 | |
1957 | .IP \(bu 2 | |
1958 | < 0 if any input argument is invalid | |
1959 | .IP \(bu 2 | |
1960 | 0 on success (packet is forwarded, nexthop neighbor exists) | |
1961 | .IP \(bu 2 | |
1962 | > 0 one of \fBBPF_FIB_LKUP_RET_\fP codes explaining why the | |
1963 | packet is not forwarded or needs assist from full stack | |
1964 | .UNINDENT | |
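The three\-way return convention listed above can be handled with a small dispatch, sketched here in plain C. The enumeration and \fBclassify_fib_result\fP() are hypothetical names introduced for illustration; they only encode the sign of the return value and do not decode the individual \fBBPF_FIB_LKUP_RET_\fP codes.

```c
#include <assert.h>

/* Hypothetical outcome categories, matching the three cases above. */
enum fib_outcome {
    FIB_BAD_INPUT,      /* rc < 0: an input argument is invalid */
    FIB_FORWARD,        /* rc == 0: forward, nexthop neighbor exists */
    FIB_NOT_FORWARDED   /* rc > 0: not forwarded, or needs the full stack */
};

/* Map the raw return value of the lookup to one of the categories. */
enum fib_outcome classify_fib_result(int rc)
{
    if (rc < 0)
        return FIB_BAD_INPUT;
    if (rc == 0)
        return FIB_FORWARD;
    return FIB_NOT_FORWARDED;
}

/* Usage example covering each of the three cases. */
void classify_fib_result_example(void)
{
    assert(classify_fib_result(-22) == FIB_BAD_INPUT);
    assert(classify_fib_result(0) == FIB_FORWARD);
    assert(classify_fib_result(4) == FIB_NOT_FORWARDED);
}
```

Only in the forwarding case are the output fields of \fIparams\fP (such as the MAC addresses and \fIifindex\fP) meaningful for rewriting and redirecting the packet.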
1965 | .UNINDENT | |
1966 | .TP | |
1967 | .B \fBint bpf_sock_hash_update(struct bpf_sock_ops_kern *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1968 | .INDENT 7.0 | |
1969 | .TP | |
1970 | .B Description | |
1971 | Add an entry to, or update a sockhash \fImap\fP referencing sockets. | |
1972 | The \fIskops\fP is used as a new value for the entry associated to | |
1973 | \fIkey\fP\&. \fIflags\fP is one of: | |
1974 | .INDENT 7.0 | |
1975 | .TP | |
1976 | .B \fBBPF_NOEXIST\fP | |
1977 | The entry for \fIkey\fP must not exist in the map. | |
1978 | .TP | |
1979 | .B \fBBPF_EXIST\fP | |
1980 | The entry for \fIkey\fP must already exist in the map. | |
1981 | .TP | |
1982 | .B \fBBPF_ANY\fP | |
1983 | No condition on the existence of the entry for \fIkey\fP\&. | |
1984 | .UNINDENT | |
1985 | .sp | |
1986 | If the \fImap\fP has eBPF programs (parser and verdict), those will | |
1987 | be inherited by the socket being added. If the socket is | |
1988 | already attached to eBPF programs, this results in an error. | |
1989 | .TP | |
1990 | .B Return | |
1991 | 0 on success, or a negative error in case of failure. | |
1992 | .UNINDENT | |
1993 | .TP | |
1994 | .B \fBint bpf_msg_redirect_hash(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1995 | .INDENT 7.0 | |
1996 | .TP | |
1997 | .B Description | |
1998 | This helper is used in programs implementing policies at the | |
1999 | socket level. If the message \fImsg\fP is allowed to pass (i.e. if | |
2000 | the verdict eBPF program returns \fBSK_PASS\fP), redirect it to | |
2001 | the socket referenced by \fImap\fP (of type | |
2002 | \fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and | |
2003 | egress interfaces can be used for redirection. The | |
2004 | \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the | |
2005 | distinction (ingress path is selected if the flag is present, | |
2006 | egress path otherwise). This is the only flag supported for now. | |
2007 | .TP | |
2008 | .B Return | |
2009 | \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. | |
2010 | .UNINDENT | |
2011 | .TP | |
2012 | .B \fBint bpf_sk_redirect_hash(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2013 | .INDENT 7.0 | |
2014 | .TP | |
2015 | .B Description | |
2016 | This helper is used in programs implementing policies at the | |
2017 | skb socket level. If the sk_buff \fIskb\fP is allowed to pass (i.e. | |
2018 | if the verdict eBPF program returns \fBSK_PASS\fP), redirect it | |
2019 | to the socket referenced by \fImap\fP (of type | |
2020 | \fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and | |
2021 | egress interfaces can be used for redirection. The | |
2022 | \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the | |
2023 | distinction (ingress path is selected if the flag is present, | |
2024 | egress otherwise). This is the only flag supported for now. | |
2025 | .TP | |
2026 | .B Return | |
2027 | \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. | |
2028 | .UNINDENT | |
2029 | .TP | |
2030 | .B \fBint bpf_lwt_push_encap(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB, void *\fP\fIhdr\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
2031 | .INDENT 7.0 | |
2032 | .TP | |
2033 | .B Description | |
2034 | Encapsulate the packet associated to \fIskb\fP within a Layer 3 | |
2035 | protocol header. This header is provided in the buffer at | |
2036 | address \fIhdr\fP, with \fIlen\fP its size in bytes. \fItype\fP indicates | |
2037 | the protocol of the header and can be one of: | |
2038 | .INDENT 7.0 | |
2039 | .TP | |
2040 | .B \fBBPF_LWT_ENCAP_SEG6\fP | |
2041 | IPv6 encapsulation with Segment Routing Header | |
2042 | (\fBstruct ipv6_sr_hdr\fP). \fIhdr\fP only contains the SRH, | |
2043 | the IPv6 header is computed by the kernel. | |
2044 | .TP | |
2045 | .B \fBBPF_LWT_ENCAP_SEG6_INLINE\fP | |
2046 | Only works if \fIskb\fP contains an IPv6 packet. Insert a | |
2047 | Segment Routing Header (\fBstruct ipv6_sr_hdr\fP) inside | |
2048 | the IPv6 header. | |
2049 | .UNINDENT | |
2050 | .sp | |
2051 | A call to this helper may change the underlying | |
2052 | packet buffer. Therefore, at load time, all checks on pointers | |
2053 | previously done by the verifier are invalidated and must be | |
2054 | performed again, if the helper is used in combination with | |
2055 | direct packet access. | |
2056 | .TP | |
2057 | .B Return | |
2058 | 0 on success, or a negative error in case of failure. | |
2059 | .UNINDENT | |
2060 | .TP | |
2061 | .B \fBint bpf_lwt_seg6_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
2062 | .INDENT 7.0 | |
2063 | .TP | |
2064 | .B Description | |
2065 | Store \fIlen\fP bytes from address \fIfrom\fP into the packet | |
2066 | associated to \fIskb\fP, at \fIoffset\fP\&. Only the flags, tag and TLVs | |
2067 | inside the outermost IPv6 Segment Routing Header can be | |
2068 | modified through this helper. | |
2069 | .sp | |
2070 | A call to this helper may change the underlying | |
2071 | packet buffer. Therefore, at load time, all checks on pointers | |
2072 | previously done by the verifier are invalidated and must be | |
2073 | performed again, if the helper is used in combination with | |
2074 | direct packet access. | |
2075 | .TP | |
2076 | .B Return | |
2077 | 0 on success, or a negative error in case of failure. | |
2078 | .UNINDENT | |
2079 | .TP | |
2080 | .B \fBint bpf_lwt_seg6_adjust_srh(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, s32\fP \fIdelta\fP\fB)\fP | |
2081 | .INDENT 7.0 | |
2082 | .TP | |
2083 | .B Description | |
2084 | Adjust the size allocated to TLVs in the outermost IPv6 | |
2085 | Segment Routing Header contained in the packet associated to | |
2086 | \fIskb\fP, at position \fIoffset\fP by \fIdelta\fP bytes. Only offsets | |
2087 | after the segments are accepted. \fIdelta\fP can be either | |
2088 | positive (growing) or negative (shrinking). | |
2089 | .sp | |
2090 | A call to this helper may change the underlying | |
2091 | packet buffer. Therefore, at load time, all checks on pointers | |
2092 | previously done by the verifier are invalidated and must be | |
2093 | performed again, if the helper is used in combination with | |
2094 | direct packet access. | |
2095 | .TP | |
2096 | .B Return | |
2097 | 0 on success, or a negative error in case of failure. | |
2098 | .UNINDENT | |
2099 | .TP | |
2100 | .B \fBint bpf_lwt_seg6_action(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIaction\fP\fB, void *\fP\fIparam\fP\fB, u32\fP \fIparam_len\fP\fB)\fP | |
2101 | .INDENT 7.0 | |
2102 | .TP | |
2103 | .B Description | |
2104 | Apply an IPv6 Segment Routing action of type \fIaction\fP to the | |
2105 | packet associated to \fIskb\fP\&. Each action takes a parameter | |
2106 | contained at address \fIparam\fP, and of length \fIparam_len\fP bytes. | |
2107 | \fIaction\fP can be one of: | |
2108 | .INDENT 7.0 | |
2109 | .TP | |
2110 | .B \fBSEG6_LOCAL_ACTION_END_X\fP | |
2111 | End.X action: Endpoint with Layer\-3 cross\-connect. | |
2112 | Type of \fIparam\fP: \fBstruct in6_addr\fP\&. | |
2113 | .TP | |
2114 | .B \fBSEG6_LOCAL_ACTION_END_T\fP | |
2115 | End.T action: Endpoint with specific IPv6 table lookup. | |
2116 | Type of \fIparam\fP: \fBint\fP\&. | |
2117 | .TP | |
2118 | .B \fBSEG6_LOCAL_ACTION_END_B6\fP | |
2119 | End.B6 action: Endpoint bound to an SRv6 policy. | |
2120 | Type of \fIparam\fP: \fBstruct ipv6_sr_hdr\fP\&. | |
2121 | .TP | |
2122 | .B \fBSEG6_LOCAL_ACTION_END_B6_ENCAP\fP | |
2123 | End.B6.Encap action: Endpoint bound to an SRv6 | |
2124 | encapsulation policy. | |
2125 | Type of \fIparam\fP: \fBstruct ipv6_sr_hdr\fP\&. | |
2126 | .UNINDENT | |
2127 | .sp | |
2128 | A call to this helper may change the underlying | |
2129 | packet buffer. Therefore, at load time, all checks on pointers | |
2130 | previously done by the verifier are invalidated and must be | |
2131 | performed again, if the helper is used in combination with | |
2132 | direct packet access. | |
2133 | .TP | |
2134 | .B Return | |
2135 | 0 on success, or a negative error in case of failure. | |
2136 | .UNINDENT | |
2137 | .TP | |
2138 | .B \fBint bpf_rc_keydown(void *\fP\fIctx\fP\fB, u32\fP \fIprotocol\fP\fB, u64\fP \fIscancode\fP\fB, u32\fP \fItoggle\fP\fB)\fP | |
2139 | .INDENT 7.0 | |
2140 | .TP | |
2141 | .B Description | |
2142 | This helper is used in programs implementing IR decoding, to | |
2143 | report a successfully decoded key press with \fIscancode\fP, | |
2144 | \fItoggle\fP value in the given \fIprotocol\fP\&. The scancode will be | |
2145 | translated to a keycode using the rc keymap, and reported as | |
2146 | an input key down event. After a period a key up event is | |
2147 | generated. This period can be extended by calling either | |
2223d7df MK |
2148 | \fBbpf_rc_keydown\fP() again with the same values, or calling |
2149 | \fBbpf_rc_repeat\fP(). | |
53666f6c MK |
2150 | .sp |
2151 | Some protocols include a toggle bit, in case the button was | |
2152 | released and pressed again between consecutive scancodes. | |
2153 | .sp | |
2154 | The \fIctx\fP should point to the lirc sample as passed into | |
2155 | the program. | |
2156 | .sp | |
2157 | The \fIprotocol\fP is the decoded protocol number (see | |
2158 | \fBenum rc_proto\fP for some predefined values). | |
2159 | .sp | |
2160 | This helper is only available if the kernel was compiled with | |
2161 | the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to | |
2162 | "\fBy\fP". | |
2163 | .TP | |
2164 | .B Return | |
 | 0 | |
53666f6c MK |
2165 | .UNINDENT |
2166 | .TP | |
2167 | .B \fBint bpf_rc_repeat(void *\fP\fIctx\fP\fB)\fP | |
2168 | .INDENT 7.0 | |
2169 | .TP | |
2170 | .B Description | |
2171 | This helper is used in programs implementing IR decoding, to | |
2172 | report a successfully decoded repeat key message. This delays | |
2173 | the generation of a key up event for the previously generated | |
2174 | key down event. | |
2175 | .sp | |
2176 | Some IR protocols like NEC have a special IR message for | |
2177 | repeating the last button, for when a button is held down. | |
2178 | .sp | |
2179 | The \fIctx\fP should point to the lirc sample as passed into | |
2180 | the program. | |
2181 | .sp | |
2182 | This helper is only available if the kernel was compiled with | |
2183 | the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to | |
2184 | "\fBy\fP". | |
2185 | .TP | |
2186 | .B Return | |
 | 0 | |
53666f6c MK |
2187 | .UNINDENT |
2188 | .TP | |
2189 | .B \fBuint64_t bpf_skb_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
2190 | .INDENT 7.0 | |
2191 | .TP | |
2192 | .B Description | |
2193 | Return the cgroup v2 id of the socket associated with the \fIskb\fP\&. | |
2194 | This is roughly similar to the \fBbpf_get_cgroup_classid\fP() | |
2195 | helper for cgroup v1 by providing a tag or identifier that | |
2196 | can be matched on or used for map lookups, e.g. to implement | |
2197 | policy. The cgroup v2 id of a given path in the hierarchy is | |
2198 | exposed in user space through the f_handle API in order to get | |
2199 | to the same 64\-bit id. | |
2200 | .sp | |
2201 | This helper can be used on TC egress path, but not on ingress, | |
2202 | and is available only if the kernel was compiled with the | |
2203 | \fBCONFIG_SOCK_CGROUP_DATA\fP configuration option. | |
2204 | .TP | |
2205 | .B Return | |
2206 | The id is returned or 0 in case the id could not be retrieved. | |
2207 | .UNINDENT | |
2208 | .TP | |
2209 | .B \fBu64 bpf_skb_ancestor_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB, int\fP \fIancestor_level\fP\fB)\fP | |
2210 | .INDENT 7.0 | |
2211 | .TP | |
2212 | .B Description | |
2213 | Return the id of the cgroup v2 that is the ancestor of the cgroup | |
2214 | associated with the \fIskb\fP at the \fIancestor_level\fP\&. The root cgroup is at | |
2215 | \fIancestor_level\fP zero and each step down the hierarchy | |
2216 | increments the level. If \fIancestor_level\fP == level of the cgroup | |
2217 | associated with \fIskb\fP, then the return value will be the same as that | |
2218 | of \fBbpf_skb_cgroup_id\fP(). | |
2219 | .sp | |
2220 | The helper is useful to implement policies based on cgroups | |
2221 | that are higher in the hierarchy than the immediate cgroup associated | |
2222 | with \fIskb\fP\&. | |
2223 | .sp | |
2224 | The format of the returned id and the helper limitations are the same as in | |
2225 | \fBbpf_skb_cgroup_id\fP(). | |
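The level numbering described above can be pictured with a user-space model. This is a sketch under stated assumptions: \fBancestor_id_model\fP() is a hypothetical name, the cgroup path is represented as a plain array of ids, and treating out-of-range levels as "could not be retrieved" (0) is an assumption of the model.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model: ids[] lists the cgroup ids along one path,
 * with ids[0] the root cgroup and ids[depth] the cgroup associated
 * with the skb. Returns the id found at ancestor_level, or 0 when
 * the level lies outside the path (an assumption of this sketch).
 */
static uint64_t ancestor_id_model(const uint64_t *ids, int depth,
				  int ancestor_level)
{
	assert(ids && depth >= 0);
	if (ancestor_level < 0 || ancestor_level > depth)
		return 0;
	return ids[ancestor_level];
}
```

With a path of depth 2, level 0 yields the root id and level 2 yields the same id that the model would report for the cgroup of the skb itself.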
2226 | .TP | |
2227 | .B Return | |
2228 | The id is returned or 0 in case the id could not be retrieved. | |
2229 | .UNINDENT | |
2230 | .TP | |
2231 | .B \fBu64 bpf_get_current_cgroup_id(void)\fP | |
2232 | .INDENT 7.0 | |
2233 | .TP | |
2234 | .B Return | |
2235 | A 64\-bit integer containing the current cgroup id based | |
2236 | on the cgroup within which the current task is running. | |
2237 | .UNINDENT | |
2238 | .TP | |
2239 | .B \fBvoid *bpf_get_local_storage(void *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2240 | .INDENT 7.0 | |
2241 | .TP | |
2242 | .B Description | |
2243 | Get the pointer to the local storage area. | |
2244 | The type and the size of the local storage is defined | |
2245 | by the \fImap\fP argument. | |
2246 | The \fIflags\fP meaning is specific for each map type, | |
2247 | and has to be 0 for cgroup local storage. | |
2248 | .sp | |
2223d7df MK |
2249 | Depending on the BPF program type, a local storage area |
2250 | can be shared between multiple instances of the BPF program, | |
53666f6c MK |
2251 | running simultaneously. |
2252 | .sp | |
2253 | The user is responsible for the synchronization. | |
2223d7df | 2254 | For example, by using the \fBBPF_STX_XADD\fP instruction to alter |
53666f6c MK |
2255 | the shared data. |
2256 | .TP | |
2257 | .B Return | |
2223d7df | 2258 | A pointer to the local storage area. |
53666f6c MK |
2259 | .UNINDENT |
2260 | .TP | |
2261 | .B \fBint bpf_sk_select_reuseport(struct sk_reuseport_md *\fP\fIreuse\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2262 | .INDENT 7.0 | |
2263 | .TP | |
2264 | .B Description | |
2223d7df MK |
2265 | Select a \fBSO_REUSEPORT\fP socket from a |
2266 | \fBBPF_MAP_TYPE_REUSEPORT_ARRAY\fP \fImap\fP\&. | |
2267 | It checks that the selected socket matches the incoming | |
2268 | request in the socket buffer. | |
53666f6c MK |
2269 | .TP |
2270 | .B Return | |
2271 | 0 on success, or a negative error in case of failure. | |
2272 | .UNINDENT | |
2223d7df MK |
2273 | .TP |
2274 | .B \fBstruct bpf_sock *bpf_sk_lookup_tcp(void *\fP\fIctx\fP\fB, struct bpf_sock_tuple *\fP\fItuple\fP\fB, u32\fP \fItuple_size\fP\fB, u64\fP \fInetns\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2275 | .INDENT 7.0 | |
2276 | .TP | |
2277 | .B Description | |
2278 | Look for a TCP socket matching \fItuple\fP, optionally in a child | |
2279 | network namespace \fInetns\fP\&. The return value must be checked, | |
2280 | and if non\-\fBNULL\fP, released via \fBbpf_sk_release\fP(). | |
2281 | .sp | |
2282 | The \fIctx\fP should point to the context of the program, such as | |
2283 | the skb or socket (depending on the hook in use). This is used | |
2284 | to determine the base network namespace for the lookup. | |
2285 | .sp | |
2286 | \fItuple_size\fP must be one of: | |
2287 | .INDENT 7.0 | |
2288 | .TP | |
2289 | .B \fBsizeof\fP(\fItuple\fP\fB\->ipv4\fP) | |
2290 | Look for an IPv4 socket. | |
2291 | .TP | |
2292 | .B \fBsizeof\fP(\fItuple\fP\fB\->ipv6\fP) | |
2293 | Look for an IPv6 socket. | |
2294 | .UNINDENT | |
2295 | .sp | |
2296 | If the \fInetns\fP is a negative signed 32\-bit integer, then the | |
2297 | socket lookup table in the netns associated with the \fIctx\fP | |
2298 | will be used. For the TC hooks, this is the netns of the device | |
2299 | in the skb. For socket hooks, this is the netns of the socket. | |
2300 | If \fInetns\fP is any other signed 32\-bit value greater than or | |
2301 | equal to zero then it specifies the ID of the netns relative to | |
2302 | the netns associated with the \fIctx\fP\&. \fInetns\fP values beyond the | |
2303 | range of 32\-bit integers are reserved for future use. | |
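The three cases for \fInetns\fP can be summarised with a small classifier. This is a sketch, not kernel code: \fBenum netns_kind\fP, \fBclassify_netns\fP() and \fBnetns_id\fP() are hypothetical names, and the sign test looks only at the low 32 bits of the 64-bit argument, which is an assumption about how "negative signed 32-bit integer" is to be read.

```c
#include <assert.h>
#include <stdint.h>

enum netns_kind {
	NETNS_FROM_CTX,	/* use the netns associated with ctx */
	NETNS_BY_ID,	/* ID relative to the netns of ctx */
	NETNS_RESERVED	/* beyond 32 bits: reserved for future use */
};

/* Classify a netns argument according to the rules described above. */
static enum netns_kind classify_netns(uint64_t netns)
{
	/* Bit 31 set: negative when viewed as a signed 32-bit value. */
	if (netns & UINT64_C(0x80000000))
		return NETNS_FROM_CTX;
	if (netns <= INT32_MAX)
		return NETNS_BY_ID;
	return NETNS_RESERVED;
}

/* Extract the relative netns ID; only meaningful for NETNS_BY_ID. */
static int32_t netns_id(uint64_t netns)
{
	assert(classify_netns(netns) == NETNS_BY_ID);
	return (int32_t)netns;
}
```

Under this reading, passing \-1 selects the namespace of \fIctx\fP, and any value from 0 up to the largest signed 32-bit integer is taken as a relative namespace ID.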
2304 | .sp | |
2305 | All values for \fIflags\fP are reserved for future usage, and must | |
2306 | be left at zero. | |
2307 | .sp | |
2308 | This helper is available only if the kernel was compiled with the | |
2309 | \fBCONFIG_NET\fP configuration option. | |
2310 | .TP | |
2311 | .B Return | |
2312 | Pointer to \fBstruct bpf_sock\fP, or \fBNULL\fP in case of failure. | |
2313 | For sockets with reuseport option, the \fBstruct bpf_sock\fP | |
2314 | result is from \fBreuse\->socks\fP[] using the hash of the tuple. | |
2315 | .UNINDENT | |
2316 | .TP | |
2317 | .B \fBstruct bpf_sock *bpf_sk_lookup_udp(void *\fP\fIctx\fP\fB, struct bpf_sock_tuple *\fP\fItuple\fP\fB, u32\fP \fItuple_size\fP\fB, u64\fP \fInetns\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2318 | .INDENT 7.0 | |
2319 | .TP | |
2320 | .B Description | |
2321 | Look for a UDP socket matching \fItuple\fP, optionally in a child | |
2322 | network namespace \fInetns\fP\&. The return value must be checked, | |
2323 | and if non\-\fBNULL\fP, released via \fBbpf_sk_release\fP(). | |
2324 | .sp | |
2325 | The \fIctx\fP should point to the context of the program, such as | |
2326 | the skb or socket (depending on the hook in use). This is used | |
2327 | to determine the base network namespace for the lookup. | |
2328 | .sp | |
2329 | \fItuple_size\fP must be one of: | |
2330 | .INDENT 7.0 | |
2331 | .TP | |
2332 | .B \fBsizeof\fP(\fItuple\fP\fB\->ipv4\fP) | |
2333 | Look for an IPv4 socket. | |
2334 | .TP | |
2335 | .B \fBsizeof\fP(\fItuple\fP\fB\->ipv6\fP) | |
2336 | Look for an IPv6 socket. | |
2337 | .UNINDENT | |
2338 | .sp | |
2339 | If the \fInetns\fP is a negative signed 32\-bit integer, then the | |
2340 | socket lookup table in the netns associated with the \fIctx\fP | |
2341 | will be used. For the TC hooks, this is the netns of the device | |
2342 | in the skb. For socket hooks, this is the netns of the socket. | |
2343 | If \fInetns\fP is any other signed 32\-bit value greater than or | |
2344 | equal to zero then it specifies the ID of the netns relative to | |
2345 | the netns associated with the \fIctx\fP\&. \fInetns\fP values beyond the | |
2346 | range of 32\-bit integers are reserved for future use. | |
2347 | .sp | |
2348 | All values for \fIflags\fP are reserved for future usage, and must | |
2349 | be left at zero. | |
2350 | .sp | |
2351 | This helper is available only if the kernel was compiled with the | |
2352 | \fBCONFIG_NET\fP configuration option. | |
2353 | .TP | |
2354 | .B Return | |
2355 | Pointer to \fBstruct bpf_sock\fP, or \fBNULL\fP in case of failure. | |
2356 | For sockets with reuseport option, the \fBstruct bpf_sock\fP | |
2357 | result is from \fBreuse\->socks\fP[] using the hash of the tuple. | |
2358 | .UNINDENT | |
2359 | .TP | |
2360 | .B \fBint bpf_sk_release(struct bpf_sock *\fP\fIsock\fP\fB)\fP | |
2361 | .INDENT 7.0 | |
2362 | .TP | |
2363 | .B Description | |
2364 | Release the reference held by \fIsock\fP\&. \fIsock\fP must be a | |
2365 | non\-\fBNULL\fP pointer that was returned from | |
2366 | \fBbpf_sk_lookup_xxx\fP(). | |
2367 | .TP | |
2368 | .B Return | |
2369 | 0 on success, or a negative error in case of failure. | |
2370 | .UNINDENT | |
2371 | .TP | |
2372 | .B \fBint bpf_map_pop_elem(struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIvalue\fP\fB)\fP | |
2373 | .INDENT 7.0 | |
2374 | .TP | |
2375 | .B Description | |
2376 | Pop an element from \fImap\fP\&. | |
2377 | .TP | |
2378 | .B Return | |
2379 | 0 on success, or a negative error in case of failure. | |
2380 | .UNINDENT | |
2381 | .TP | |
2382 | .B \fBint bpf_map_peek_elem(struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIvalue\fP\fB)\fP | |
2383 | .INDENT 7.0 | |
2384 | .TP | |
2385 | .B Description | |
2386 | Get an element from \fImap\fP without removing it. | |
2387 | .TP | |
2388 | .B Return | |
2389 | 0 on success, or a negative error in case of failure. | |
2390 | .UNINDENT | |
2391 | .TP | |
2392 | .B \fBint bpf_msg_push_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2393 | .INDENT 7.0 | |
2394 | .TP | |
2395 | .B Description | |
2396 | For socket policies, insert \fIlen\fP bytes into \fImsg\fP at offset | |
2397 | \fIstart\fP\&. | |
2398 | .sp | |
2399 | If a program of type \fBBPF_PROG_TYPE_SK_MSG\fP is run on a | |
2400 | \fImsg\fP it may want to insert metadata or options into the \fImsg\fP\&. | |
2401 | This can later be read and used by any of the lower layer BPF | |
2402 | hooks. | |
2403 | .sp | |
2404 | This helper may fail under memory pressure (if an allocation | |
2405 | fails). In that case, the BPF program gets an appropriate | |
2406 | error and needs to handle it. | |
2407 | .TP | |
2408 | .B Return | |
2409 | 0 on success, or a negative error in case of failure. | |
2410 | .UNINDENT | |
2411 | .TP | |
2412 | .B \fBint bpf_msg_pop_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIpop\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2413 | .INDENT 7.0 | |
2414 | .TP | |
2415 | .B Description | |
2416 | Remove \fIpop\fP bytes from a \fImsg\fP starting at byte \fIstart\fP\&. | |
2417 | This may result in \fBENOMEM\fP errors under certain situations if | |
2418 | an allocation and copy are required due to a full ring buffer. | |
2419 | However, the helper will try to avoid doing the allocation | |
2420 | if possible. Other errors can occur if input parameters are | |
2421 | invalid, either because the \fIstart\fP byte is not a valid part of the \fImsg\fP | |
2422 | payload or because the \fIpop\fP value is too large. | |
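The parameter checks described above can be sketched on a flat buffer. This is a user-space model only: \fBpop_data_model\fP() is a hypothetical name, the message is assumed to be one contiguous buffer (so the allocation case cannot arise), and the use of \fBEINVAL\fP for invalid parameters is an assumption of the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical flat-buffer model: remove pop bytes starting at byte
 * start from a buffer holding *len valid bytes. Returns 0 on success,
 * or a negative errno when start or pop does not fit in the payload.
 */
static int pop_data_model(uint8_t *buf, uint32_t *len, uint32_t start,
			  uint32_t pop)
{
	assert(buf && len);
	if (start >= *len || pop > *len - start)
		return -EINVAL;
	/* Close the gap by shifting the tail of the payload down. */
	memmove(buf + start, buf + start + pop, *len - start - pop);
	*len -= pop;
	return 0;
}
```

Writing the range check as a comparison against \fB*len \- start\fP avoids overflow in \fBstart + pop\fP for large 32-bit values.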
2423 | .TP | |
2424 | .B Return | |
2425 | 0 on success, or a negative error in case of failure. | |
2426 | .UNINDENT | |
2427 | .TP | |
2428 | .B \fBint bpf_rc_pointer_rel(void *\fP\fIctx\fP\fB, s32\fP \fIrel_x\fP\fB, s32\fP \fIrel_y\fP\fB)\fP | |
2429 | .INDENT 7.0 | |
2430 | .TP | |
2431 | .B Description | |
2432 | This helper is used in programs implementing IR decoding, to | |
2433 | report a successfully decoded pointer movement. | |
2434 | .sp | |
2435 | The \fIctx\fP should point to the lirc sample as passed into | |
2436 | the program. | |
2437 | .sp | |
2438 | This helper is only available if the kernel was compiled with | |
2439 | the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to | |
2440 | "\fBy\fP". | |
2441 | .TP | |
2442 | .B Return | |
 | 0 | |
2443 | .UNINDENT | |
53666f6c MK |
2444 | .UNINDENT |
2445 | .SH EXAMPLES | |
2446 | .sp | |
2447 | Examples of usage for most of the eBPF helpers listed in this manual page are | |
2448 | available within the Linux kernel sources, at the following locations: | |
2449 | .INDENT 0.0 | |
2450 | .IP \(bu 2 | |
2451 | \fIsamples/bpf/\fP | |
2452 | .IP \(bu 2 | |
2453 | \fItools/testing/selftests/bpf/\fP | |
2454 | .UNINDENT | |
2455 | .SH LICENSE | |
2456 | .sp | |
2457 | eBPF programs can have an associated license, passed along with the bytecode | |
2458 | instructions to the kernel when the programs are loaded. The format for that | |
2459 | string is identical to the one in use for kernel modules (Dual licenses, such | |
2460 | as "Dual BSD/GPL", may be used). Some helper functions are only accessible to | |
2461 | programs that are compatible with the GNU General Public License (GPL). | |
2462 | .sp | |
2463 | In order to use such helpers, the eBPF program must be loaded with the correct | |
2464 | license string passed (via \fBattr\fP) to the \fBbpf\fP() system call, and this | |
2465 | generally translates into the C source code of the program containing a line | |
2466 | similar to the following: | |
2467 | .INDENT 0.0 | |
2468 | .INDENT 3.5 | |
2469 | .sp | |
2470 | .nf | |
2471 | .ft C | |
2472 | char ____license[] __attribute__((section("license"), used)) = "GPL"; | |
2473 | .ft P | |
2474 | .fi | |
2475 | .UNINDENT | |
2476 | .UNINDENT | |
2477 | .SH IMPLEMENTATION | |
2478 | .sp | |
2479 | This manual page is an effort to document the existing eBPF helper functions. | |
2480 | But as of this writing, the BPF sub\-system is under heavy development. New eBPF | |
2481 | program or map types are added, along with new helper functions. Some helpers | |
2482 | are occasionally made available for additional program types. So in spite of | |
2483 | the efforts of the community, this page might not be up\-to\-date. If you want to | |
2484 | check for yourself what helper functions exist in your kernel, or what types of | |
2485 | programs they can support, here are some files in the kernel tree that you | |
2486 | may be interested in: | |
2487 | .INDENT 0.0 | |
2488 | .IP \(bu 2 | |
2489 | \fIinclude/uapi/linux/bpf.h\fP is the main BPF header. It contains the full list | |
2490 | of all helper functions, as well as many other BPF definitions including most | |
2491 | of the flags, structs or constants used by the helpers. | |
2492 | .IP \(bu 2 | |
2493 | \fInet/core/filter.c\fP contains the definition of most network\-related helper | |
2494 | functions, and the list of program types from which they can be used. | |
2495 | .IP \(bu 2 | |
2496 | \fIkernel/trace/bpf_trace.c\fP is the equivalent for most tracing program\-related | |
2497 | helpers. | |
2498 | .IP \(bu 2 | |
2499 | \fIkernel/bpf/verifier.c\fP contains the functions used to check that valid types | |
2500 | of eBPF maps are used with a given helper function. | |
2501 | .IP \(bu 2 | |
2502 | \fIkernel/bpf/\fP directory contains other files in which additional helpers are | |
2503 | defined (for cgroups, sockmaps, etc.). | |
2504 | .UNINDENT | |
2505 | .sp | |
2506 | Compatibility between helper functions and program types can generally be found | |
2507 | in the files where helper functions are defined. Look for the \fBstruct | |
2508 | bpf_func_proto\fP objects and for functions returning them: these functions | |
2509 | contain a list of helpers that a given program type can call. Note that the | |
2510 | \fBdefault:\fP label of the \fBswitch ... case\fP used to filter helpers can call | |
2511 | other functions, themselves allowing access to additional helpers. The | |
2512 | requirement for GPL license is also in those \fBstruct bpf_func_proto\fP\&. | |
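The lookup pattern described above can be sketched in user space. All names here are hypothetical stand-ins (\fBstruct func_proto_model\fP, \fBbase_func_proto\fP(), \fBtc_func_proto\fP(), \fBhelper_allowed\fP()), and the helper ids and GPL flags are illustrative values, not a copy of the kernel tables.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for kernel ids and types. */
enum func_id_model {
	FUNC_MAP_LOOKUP,
	FUNC_TRACE_PRINTK,
	FUNC_SKB_STORE_BYTES,
	FUNC_UNLISTED
};

struct func_proto_model {
	const char *name;
	int gpl_only;	/* restricted to GPL-compatible programs */
};

static const struct func_proto_model map_lookup_proto = { "map_lookup_elem", 0 };
static const struct func_proto_model trace_printk_proto = { "trace_printk", 1 };
static const struct func_proto_model store_bytes_proto = { "skb_store_bytes", 0 };

/* Helpers shared by several program types. */
static const struct func_proto_model *base_func_proto(enum func_id_model id)
{
	switch (id) {
	case FUNC_MAP_LOOKUP:
		return &map_lookup_proto;
	case FUNC_TRACE_PRINTK:
		return &trace_printk_proto;
	default:
		return NULL;	/* helper not available */
	}
}

/* Helpers for one program type; default: defers to the shared list. */
static const struct func_proto_model *tc_func_proto(enum func_id_model id)
{
	switch (id) {
	case FUNC_SKB_STORE_BYTES:
		return &store_bytes_proto;
	default:
		return base_func_proto(id);
	}
}

/* A helper is usable if it is listed and its license rule is met. */
static int helper_allowed(enum func_id_model id, int prog_is_gpl)
{
	const struct func_proto_model *p = tc_func_proto(id);

	assert(prog_is_gpl == 0 || prog_is_gpl == 1);
	return p != NULL && (!p->gpl_only || prog_is_gpl);
}
```

Reading only the \fBcase\fP labels of \fBtc_func_proto\fP() would miss the helpers reachable through its \fBdefault:\fP branch, which is why the chained functions have to be followed as well.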
2513 | .sp | |
2514 | Compatibility between helper functions and map types can be found in the | |
2515 | \fBcheck_map_func_compatibility\fP() function in file \fIkernel/bpf/verifier.c\fP\&. | |
2516 | .sp | |
2517 | Helper functions that invalidate the checks on \fBdata\fP and \fBdata_end\fP | |
2518 | pointers for network processing are listed in function | |
2519 | \fBbpf_helper_changes_pkt_data\fP() in file \fInet/core/filter.c\fP\&. | |
2520 | .SH SEE ALSO | |
2521 | .sp | |
2522 | \fBbpf\fP(2), | |
2523 | \fBcgroups\fP(7), | |
2524 | \fBip\fP(8), | |
2525 | \fBperf_event_open\fP(2), | |
2526 | \fBsendmsg\fP(2), | |
2527 | \fBsocket\fP(7), | |
2528 | \fBtc\-bpf\fP(8) | |
2529 | .\" Generated by docutils manpage writer. |