Commit | Line | Data |
---|---|---|
53666f6c MK |
1 | .\" Man page generated from reStructuredText. |
2 | .\" Copyright (C) All BPF authors and contributors from 2014 to present. | |
3 | .\" See git log include/uapi/linux/bpf.h in kernel tree for details. | |
4 | .\" | |
5 | .\" %%%LICENSE_START(VERBATIM) | |
6 | .\" Permission is granted to make and distribute verbatim copies of this | |
7 | .\" manual provided the copyright notice and this permission notice are | |
8 | .\" preserved on all copies. | |
9 | .\" | |
10 | .\" Permission is granted to copy and distribute modified versions of this | |
11 | .\" manual under the conditions for verbatim copying, provided that the | |
12 | .\" entire resulting derived work is distributed under the terms of a | |
13 | .\" permission notice identical to this one. | |
14 | .\" | |
15 | .\" Since the Linux kernel and libraries are constantly changing, this | |
16 | .\" manual page may be incorrect or out-of-date. The author(s) assume no | |
17 | .\" responsibility for errors or omissions, or for damages resulting from | |
18 | .\" the use of the information contained herein. The author(s) may not | |
19 | .\" have taken the same level of care in the production of this manual, | |
20 | .\" which is licensed free of charge, as they might when working | |
21 | .\" professionally. | |
22 | .\" | |
23 | .\" Formatted or processed versions of this manual, if unaccompanied by | |
24 | .\" the source, must acknowledge the copyright and authors of this work. | |
25 | .\" %%%LICENSE_END | |
26 | .\" | |
27 | .\" Please do not edit this file. It was generated from the documentation | |
28 | .\" located in file include/uapi/linux/bpf.h of the Linux kernel sources | |
29 | .\" (helpers description), and from scripts/bpf_helpers_doc.py in the same | |
30 | .\" repository (header and footer). | |
31 | . | |
32 | .TH BPF-HELPERS 7 "" "" "" | |
33 | .SH NAME | |
34 | BPF-HELPERS \- list of eBPF helper functions | |
35 | . | |
36 | .nr rst2man-indent-level 0 | |
37 | . | |
38 | .de1 rstReportMargin | |
39 | \\$1 \\n[an-margin] | |
40 | level \\n[rst2man-indent-level] | |
41 | level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] | |
42 | - | |
43 | \\n[rst2man-indent0] | |
44 | \\n[rst2man-indent1] | |
45 | \\n[rst2man-indent2] | |
46 | .. | |
47 | .de1 INDENT | |
48 | .\" .rstReportMargin pre: | |
49 | . RS \\$1 | |
50 | . nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] | |
51 | . nr rst2man-indent-level +1 | |
52 | .\" .rstReportMargin post: | |
53 | .. | |
54 | .de UNINDENT | |
55 | . RE | |
56 | .\" indent \\n[an-margin] | |
57 | .\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] | |
58 | .nr rst2man-indent-level -1 | |
59 | .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] | |
60 | .in \\n[rst2man-indent\\n[rst2man-indent-level]]u | |
61 | .. | |
62 | .SH DESCRIPTION | |
63 | .sp | |
64 | The extended Berkeley Packet Filter (eBPF) subsystem consists of programs | |
65 | written in a pseudo\-assembly language, then attached to one of the several | |
66 | kernel hooks and run in reaction to specific events. This framework differs | |
67 | from the older, "classic" BPF (or "cBPF") in several aspects, one of them being | |
68 | the ability to call special functions (or "helpers") from within a program. | |
69 | These functions are restricted to a white\-list of helpers defined in the | |
70 | kernel. | |
71 | .sp | |
72 | These helpers are used by eBPF programs to interact with the system, or with | |
73 | the context in which they work. For instance, they can be used to print | |
74 | debugging messages, to get the time since the system was booted, to interact | |
75 | with eBPF maps, or to manipulate network packets. Since there are several eBPF | |
76 | program types, and they do not all run in the same context, each program type | |
77 | can only call a subset of those helpers. | |
78 | .sp | |
79 | Due to eBPF conventions, a helper cannot have more than five arguments. | |
80 | .sp | |
81 | Internally, eBPF programs call directly into the compiled helper functions | |
82 | without requiring any foreign\-function interface. As a result, calling helpers | |
83 | introduces no overhead, thus offering excellent performance. | |
84 | .sp | |
85 | This document is an attempt to list and document the helpers available to eBPF | |
86 | developers. They are listed in chronological order (the oldest helpers in the | |
87 | kernel at the top). | |
88 | .SH HELPERS | |
89 | .INDENT 0.0 | |
90 | .TP | |
91 | .B \fBvoid *bpf_map_lookup_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP | |
92 | .INDENT 7.0 | |
93 | .TP | |
94 | .B Description | |
95 | Perform a lookup in \fImap\fP for an entry associated to \fIkey\fP\&. | |
96 | .TP | |
97 | .B Return | |
98 | Map value associated to \fIkey\fP, or \fBNULL\fP if no entry was | |
99 | found. | |
100 | .UNINDENT | |
101 | .TP | |
102 | .B \fBint bpf_map_update_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB, const void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
103 | .INDENT 7.0 | |
104 | .TP | |
105 | .B Description | |
106 | Add or update the value of the entry associated to \fIkey\fP in | |
107 | \fImap\fP with \fIvalue\fP\&. \fIflags\fP is one of: | |
108 | .INDENT 7.0 | |
109 | .TP | |
110 | .B \fBBPF_NOEXIST\fP | |
111 | The entry for \fIkey\fP must not exist in the map. | |
112 | .TP | |
113 | .B \fBBPF_EXIST\fP | |
114 | The entry for \fIkey\fP must already exist in the map. | |
115 | .TP | |
116 | .B \fBBPF_ANY\fP | |
117 | No condition on the existence of the entry for \fIkey\fP\&. | |
118 | .UNINDENT | |
119 | .sp | |
120 | Flag value \fBBPF_NOEXIST\fP cannot be used for maps of types | |
121 | \fBBPF_MAP_TYPE_ARRAY\fP or \fBBPF_MAP_TYPE_PERCPU_ARRAY\fP (all | |
122 | elements always exist), the helper would return an error. | |
123 | .TP | |
124 | .B Return | |
125 | 0 on success, or a negative error in case of failure. | |
126 | .UNINDENT | |
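The flag semantics described for bpf_map_update_elem() can be sketched as a small user-space model. This is only an illustration, not the kernel implementation: check_update_flags() is a hypothetical function, and the specific error codes it returns (-EEXIST, -ENOENT, -EINVAL) are an assumption based on how hash maps commonly behave; the text above only promises "a negative error". The three flag values match those defined in include/uapi/linux/bpf.h.

```c
#include <errno.h>
#include <stdbool.h>

/* Flag values as defined in include/uapi/linux/bpf.h. */
enum { BPF_ANY = 0, BPF_NOEXIST = 1, BPF_EXIST = 2 };

/*
 * Hypothetical model: given whether an entry for the key already exists,
 * decide whether an update with these flags would be accepted.
 * Returns 0 on success or a negative errno-style value (assumed codes).
 */
static int check_update_flags(bool key_exists, unsigned long long flags)
{
	switch (flags) {
	case BPF_ANY:
		return 0;				/* no condition on existence */
	case BPF_NOEXIST:
		return key_exists ? -EEXIST : 0;	/* entry must not exist yet */
	case BPF_EXIST:
		return key_exists ? 0 : -ENOENT;	/* entry must already exist */
	default:
		return -EINVAL;				/* unknown flag in this model */
	}
}
```

In this model an array map corresponds to key_exists always being true, which is why BPF_NOEXIST can never succeed there.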
127 | .TP | |
128 | .B \fBint bpf_map_delete_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP | |
129 | .INDENT 7.0 | |
130 | .TP | |
131 | .B Description | |
132 | Delete entry with \fIkey\fP from \fImap\fP\&. | |
133 | .TP | |
134 | .B Return | |
135 | 0 on success, or a negative error in case of failure. | |
136 | .UNINDENT | |
137 | .TP | |
138 | .B \fBint bpf_probe_read(void *\fP\fIdst\fP\fB, u32\fP \fIsize\fP\fB, const void *\fP\fIsrc\fP\fB)\fP | |
139 | .INDENT 7.0 | |
140 | .TP | |
141 | .B Description | |
142 | For tracing programs, safely attempt to read \fIsize\fP bytes from | |
143 | address \fIsrc\fP and store the data in \fIdst\fP\&. | |
144 | .TP | |
145 | .B Return | |
146 | 0 on success, or a negative error in case of failure. | |
147 | .UNINDENT | |
148 | .TP | |
149 | .B \fBu64 bpf_ktime_get_ns(void)\fP | |
150 | .INDENT 7.0 | |
151 | .TP | |
152 | .B Description | |
153 | Return the time elapsed since system boot, in nanoseconds. | |
154 | .TP | |
155 | .B Return | |
156 | Current \fIktime\fP\&. | |
157 | .UNINDENT | |
158 | .TP | |
159 | .B \fBint bpf_trace_printk(const char *\fP\fIfmt\fP\fB, u32\fP \fIfmt_size\fP\fB, ...)\fP | |
160 | .INDENT 7.0 | |
161 | .TP | |
162 | .B Description | |
163 | This helper is a "printk()\-like" facility for debugging. It | |
164 | prints a message defined by format \fIfmt\fP (of size \fIfmt_size\fP) | |
165 | to file \fI/sys/kernel/debug/tracing/trace\fP from DebugFS, if | |
166 | available. It can take up to three additional \fBu64\fP | |
167 | arguments (as with any eBPF helper, the total number of arguments is | |
168 | limited to five). | |
169 | .sp | |
170 | Each time the helper is called, it appends a line to the trace. | |
171 | The format of the trace is customizable, and the exact output | |
172 | one will get depends on the options set in | |
173 | \fI/sys/kernel/debug/tracing/trace_options\fP (see also the | |
174 | \fIREADME\fP file under the same directory). However, it usually | |
175 | defaults to something like: | |
176 | .INDENT 7.0 | |
177 | .INDENT 3.5 | |
178 | .sp | |
179 | .nf | |
180 | .ft C | |
181 | telnet\-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg> | |
182 | .ft P | |
183 | .fi | |
184 | .UNINDENT | |
185 | .UNINDENT | |
186 | .sp | |
187 | In the above: | |
188 | .INDENT 7.0 | |
189 | .INDENT 3.5 | |
190 | .INDENT 0.0 | |
191 | .IP \(bu 2 | |
192 | \fBtelnet\fP is the name of the current task. | |
193 | .IP \(bu 2 | |
194 | \fB470\fP is the PID of the current task. | |
195 | .IP \(bu 2 | |
196 | \fB001\fP is the CPU number on which the task is | |
197 | running. | |
198 | .IP \(bu 2 | |
199 | In \fB\&.N..\fP, each character refers to a set of | |
200 | options (whether irqs are enabled, scheduling | |
201 | options, whether hard/softirqs are running, level of | |
202 | preempt_disabled respectively). \fBN\fP means that | |
203 | \fBTIF_NEED_RESCHED\fP and \fBPREEMPT_NEED_RESCHED\fP | |
204 | are set. | |
205 | .IP \(bu 2 | |
206 | \fB419421.045894\fP is a timestamp. | |
207 | .IP \(bu 2 | |
208 | \fB0x00000001\fP is a fake value used by BPF for the | |
209 | instruction pointer register. | |
210 | .IP \(bu 2 | |
211 | \fB<formatted msg>\fP is the message formatted with | |
212 | \fIfmt\fP\&. | |
213 | .UNINDENT | |
214 | .UNINDENT | |
215 | .UNINDENT | |
216 | .sp | |
217 | The conversion specifiers supported by \fIfmt\fP are similar to, but | |
218 | more limited than, those of printk(). They are \fB%d\fP, \fB%i\fP, | |
219 | \fB%u\fP, \fB%x\fP, \fB%ld\fP, \fB%li\fP, \fB%lu\fP, \fB%lx\fP, \fB%lld\fP, | |
220 | \fB%lli\fP, \fB%llu\fP, \fB%llx\fP, \fB%p\fP, \fB%s\fP\&. No modifier (size | |
221 | of field, padding with zeroes, etc.) is available, and the | |
222 | helper will return \fB\-EINVAL\fP (but print nothing) if it | |
223 | encounters an unknown specifier. | |
224 | .sp | |
225 | Also, note that \fBbpf_trace_printk\fP() is slow, and should | |
226 | only be used for debugging purposes. For this reason, a notice | |
227 | block (spanning several lines) is printed to kernel logs and | |
228 | states that the helper should not be used "for production use" | |
229 | the first time this helper is used (or more precisely, when | |
230 | \fBtrace_printk\fP() buffers are allocated). For passing values | |
231 | to user space, perf events should be preferred. | |
232 | .TP | |
233 | .B Return | |
234 | The number of bytes written to the buffer, or a negative error | |
235 | in case of failure. | |
236 | .UNINDENT | |
237 | .TP | |
238 | .B \fBu32 bpf_get_prandom_u32(void)\fP | |
239 | .INDENT 7.0 | |
240 | .TP | |
241 | .B Description | |
242 | Get a pseudo\-random number. | |
243 | .sp | |
244 | From a security point of view, this helper uses its own | |
245 | pseudo\-random internal state, and cannot be used to infer the | |
246 | seed of other random functions in the kernel. However, it is | |
247 | essential to note that the generator used by the helper is not | |
248 | cryptographically secure. | |
249 | .TP | |
250 | .B Return | |
251 | A random 32\-bit unsigned value. | |
252 | .UNINDENT | |
253 | .TP | |
254 | .B \fBu32 bpf_get_smp_processor_id(void)\fP | |
255 | .INDENT 7.0 | |
256 | .TP | |
257 | .B Description | |
258 | Get the SMP (symmetric multiprocessing) processor id. Note that | |
259 | all programs run with preemption disabled, which means that the | |
260 | SMP processor id is stable throughout the execution of the | |
261 | program. | |
262 | .TP | |
263 | .B Return | |
264 | The SMP id of the processor running the program. | |
265 | .UNINDENT | |
266 | .TP | |
267 | .B \fBint bpf_skb_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
268 | .INDENT 7.0 | |
269 | .TP | |
270 | .B Description | |
271 | Store \fIlen\fP bytes from address \fIfrom\fP into the packet | |
272 | associated to \fIskb\fP, at \fIoffset\fP\&. \fIflags\fP are a combination of | |
273 | \fBBPF_F_RECOMPUTE_CSUM\fP (automatically recompute the | |
274 | checksum for the packet after storing the bytes) and | |
275 | \fBBPF_F_INVALIDATE_HASH\fP (set \fIskb\fP\fB\->hash\fP, \fIskb\fP\fB\->swhash\fP and \fIskb\fP\fB\->l4hash\fP to 0). | |
276 | .sp | |
277 | A call to this helper may change the underlying | |
278 | packet buffer. Therefore, at load time, all checks on pointers | |
279 | previously done by the verifier are invalidated and must be | |
280 | performed again, if the helper is used in combination with | |
281 | direct packet access. | |
282 | .TP | |
283 | .B Return | |
284 | 0 on success, or a negative error in case of failure. | |
285 | .UNINDENT | |
286 | .TP | |
287 | .B \fBint bpf_l3_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIsize\fP\fB)\fP | |
288 | .INDENT 7.0 | |
289 | .TP | |
290 | .B Description | |
291 | Recompute the layer 3 (e.g. IP) checksum for the packet | |
292 | associated to \fIskb\fP\&. Computation is incremental, so the helper | |
293 | must know the former value of the header field that was | |
294 | modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the | |
295 | number of bytes (2 or 4) for this field, stored in \fIsize\fP\&. | |
296 | Alternatively, it is possible to store the difference between | |
297 | the previous and the new values of the header field in \fIto\fP, by | |
298 | setting \fIfrom\fP and \fIsize\fP to 0. For both methods, \fIoffset\fP | |
299 | indicates the location of the IP checksum within the packet. | |
300 | .sp | |
301 | This helper works in combination with \fBbpf_csum_diff\fP(), | |
302 | which does not update the checksum in\-place, but offers more | |
303 | flexibility and can handle sizes larger than 2 or 4 for the | |
304 | checksum to update. | |
305 | .sp | |
306 | A call to this helper may change the underlying | |
307 | packet buffer. Therefore, at load time, all checks on pointers | |
308 | previously done by the verifier are invalidated and must be | |
309 | performed again, if the helper is used in combination with | |
310 | direct packet access. | |
311 | .TP | |
312 | .B Return | |
313 | 0 on success, or a negative error in case of failure. | |
314 | .UNINDENT | |
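The incremental computation described for bpf_l3_csum_replace() follows the usual one's-complement update rule (RFC 1624, equation 3). The sketch below is a user-space illustration of that arithmetic for a 16-bit field only; it is not the kernel code, the function names (fold16, csum_full, csum_replace16) are hypothetical, and byte-order handling is left out on the assumption that all words are read consistently.

```c
#include <stdint.h>

/* Fold a 32-bit accumulator into 16 bits with end-around carry. */
static uint16_t fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full recomputation: one's-complement sum over n 16-bit words, inverted. */
static uint16_t csum_full(const uint16_t *words, int n)
{
	uint32_t sum = 0;

	for (int i = 0; i < n; i++)
		sum += words[i];
	return (uint16_t)~fold16(sum);
}

/*
 * Incremental update: new = ~(~old + ~from + to), where "from" is the
 * former value of the modified field and "to" is its new value.
 */
static uint16_t csum_replace16(uint16_t check, uint16_t from, uint16_t to)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~from;
	sum += to;
	return (uint16_t)~fold16(sum);
}
```

For a header whose words sum to a checksum of 0x0a61, changing one word from 0x4001 to 0x3f01 (for example a decremented TTL) gives 0x0b61 with either csum_replace16() or a full recomputation; only the incremental form avoids re-reading the whole header.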
315 | .TP | |
316 | .B \fBint bpf_l4_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
317 | .INDENT 7.0 | |
318 | .TP | |
319 | .B Description | |
320 | Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum for the | |
321 | packet associated to \fIskb\fP\&. Computation is incremental, so the | |
322 | helper must know the former value of the header field that was | |
323 | modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the | |
324 | number of bytes (2 or 4) for this field, stored on the lowest | |
325 | four bits of \fIflags\fP\&. Alternatively, it is possible to store | |
326 | the difference between the previous and the new values of the | |
327 | header field in \fIto\fP, by setting \fIfrom\fP and the four lowest | |
328 | bits of \fIflags\fP to 0. For both methods, \fIoffset\fP indicates the | |
329 | location of the IP checksum within the packet. In addition to | |
330 | the size of the field, actual flags can be combined into \fIflags\fP (bitwise | |
331 | OR). With \fBBPF_F_MARK_MANGLED_0\fP, a null checksum is left | |
332 | untouched (unless \fBBPF_F_MARK_ENFORCE\fP is added as well), and | |
333 | for updates resulting in a null checksum the value is set to | |
334 | \fBCSUM_MANGLED_0\fP instead. Flag \fBBPF_F_PSEUDO_HDR\fP indicates | |
335 | the checksum is to be computed against a pseudo\-header. | |
336 | .sp | |
337 | This helper works in combination with \fBbpf_csum_diff\fP(), | |
338 | which does not update the checksum in\-place, but offers more | |
339 | flexibility and can handle sizes larger than 2 or 4 for the | |
340 | checksum to update. | |
341 | .sp | |
342 | A call to this helper may change the underlying | |
343 | packet buffer. Therefore, at load time, all checks on pointers | |
344 | previously done by the verifier are invalidated and must be | |
345 | performed again, if the helper is used in combination with | |
346 | direct packet access. | |
347 | .TP | |
348 | .B Return | |
349 | 0 on success, or a negative error in case of failure. | |
350 | .UNINDENT | |
351 | .TP | |
352 | .B \fBint bpf_tail_call(void *\fP\fIctx\fP\fB, struct bpf_map *\fP\fIprog_array_map\fP\fB, u32\fP \fIindex\fP\fB)\fP | |
353 | .INDENT 7.0 | |
354 | .TP | |
355 | .B Description | |
356 | This special helper is used to trigger a "tail call", or in | |
357 | other words, to jump into another eBPF program. The same stack | |
358 | frame is used (but values on stack and in registers for the | |
359 | caller are not accessible to the callee). This mechanism allows | |
360 | for program chaining, either for raising the maximum number of | |
361 | available eBPF instructions, or to execute given programs in | |
362 | conditional blocks. For security reasons, there is an upper | |
363 | limit to the number of successive tail calls that can be | |
364 | performed. | |
365 | .sp | |
366 | Upon call of this helper, the program attempts to jump into a | |
367 | program referenced at index \fIindex\fP in \fIprog_array_map\fP, a | |
368 | special map of type \fBBPF_MAP_TYPE_PROG_ARRAY\fP, and passes | |
369 | \fIctx\fP, a pointer to the context. | |
370 | .sp | |
371 | If the call succeeds, the kernel immediately runs the first | |
372 | instruction of the new program. This is not a function call, | |
373 | and it never returns to the previous program. If the call | |
374 | fails, then the helper has no effect, and the caller continues | |
375 | to run its subsequent instructions. A call can fail if the | |
376 | destination program for the jump does not exist (i.e. \fIindex\fP | |
377 | is greater than or equal to the number of entries in \fIprog_array_map\fP), or | |
378 | if the maximum number of tail calls has been reached for this | |
379 | chain of programs. This limit is defined in the kernel by the | |
380 | macro \fBMAX_TAIL_CALL_CNT\fP (not accessible to user space), | |
381 | which is currently set to 32. | |
382 | .TP | |
383 | .B Return | |
384 | 0 on success, or a negative error in case of failure. | |
385 | .UNINDENT | |
386 | .TP | |
387 | .B \fBint bpf_clone_redirect(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
388 | .INDENT 7.0 | |
389 | .TP | |
390 | .B Description | |
391 | Clone and redirect the packet associated to \fIskb\fP to another | |
392 | net device of index \fIifindex\fP\&. Both ingress and egress | |
393 | interfaces can be used for redirection. The \fBBPF_F_INGRESS\fP | |
394 | value in \fIflags\fP is used to make the distinction (ingress path | |
395 | is selected if the flag is present, egress path otherwise). | |
396 | This is the only flag supported for now. | |
397 | .sp | |
398 | In comparison with \fBbpf_redirect\fP() helper, | |
399 | \fBbpf_clone_redirect\fP() has the associated cost of | |
400 | duplicating the packet buffer, but this can be executed out of | |
401 | the eBPF program. Conversely, \fBbpf_redirect\fP() is more | |
402 | efficient, but it is handled through an action code where the | |
403 | redirection happens only after the eBPF program has returned. | |
404 | .sp | |
405 | A call to this helper may change the underlying | |
406 | packet buffer. Therefore, at load time, all checks on pointers | |
407 | previously done by the verifier are invalidated and must be | |
408 | performed again, if the helper is used in combination with | |
409 | direct packet access. | |
410 | .TP | |
411 | .B Return | |
412 | 0 on success, or a negative error in case of failure. | |
413 | .UNINDENT | |
414 | .TP | |
415 | .B \fBu64 bpf_get_current_pid_tgid(void)\fP | |
416 | .INDENT 7.0 | |
417 | .TP | |
418 | .B Return | |
419 | A 64\-bit integer containing the current tgid and pid, and | |
420 | created as such: | |
421 | \fIcurrent_task\fP\fB\->tgid << 32 |\fP | |
422 | \fIcurrent_task\fP\fB\->pid\fP\&. | |
423 | .UNINDENT | |
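The packing used by bpf_get_current_pid_tgid() can be undone with a shift and a truncation. The following is a minimal user-space sketch of that layout; the function names are hypothetical and exist only for illustration. Note that the tgid in the upper half is what user space usually calls the process ID, while the pid in the lower half is the thread ID.

```c
#include <stdint.h>

/* Upper 32 bits: tgid (the process ID as seen from user space). */
static inline uint32_t tgid_of(uint64_t pid_tgid)
{
	return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: pid (the thread ID as seen from user space). */
static inline uint32_t pid_of(uint64_t pid_tgid)
{
	return (uint32_t)pid_tgid;
}

/* Hypothetical helper mirroring the formula: tgid << 32 | pid. */
static inline uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
	return ((uint64_t)tgid << 32) | pid;
}
```

The same split applies to the value returned by bpf_get_current_uid_gid(), with the GID in the upper half and the UID in the lower half.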
424 | .TP | |
425 | .B \fBu64 bpf_get_current_uid_gid(void)\fP | |
426 | .INDENT 7.0 | |
427 | .TP | |
428 | .B Return | |
429 | A 64\-bit integer containing the current GID and UID, and | |
430 | created as such: \fIcurrent_gid\fP \fB<< 32 |\fP \fIcurrent_uid\fP\&. | |
431 | .UNINDENT | |
432 | .TP | |
433 | .B \fBint bpf_get_current_comm(char *\fP\fIbuf\fP\fB, u32\fP \fIsize_of_buf\fP\fB)\fP | |
434 | .INDENT 7.0 | |
435 | .TP | |
436 | .B Description | |
437 | Copy the \fBcomm\fP attribute of the current task into \fIbuf\fP of | |
438 | \fIsize_of_buf\fP\&. The \fBcomm\fP attribute contains the name of | |
439 | the executable (excluding the path) for the current task. The | |
440 | \fIsize_of_buf\fP must be strictly positive. On success, the | |
441 | helper makes sure that the \fIbuf\fP is NUL\-terminated. On failure, | |
442 | it is filled with zeroes. | |
443 | .TP | |
444 | .B Return | |
445 | 0 on success, or a negative error in case of failure. | |
446 | .UNINDENT | |
447 | .TP | |
448 | .B \fBu32 bpf_get_cgroup_classid(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
449 | .INDENT 7.0 | |
450 | .TP | |
451 | .B Description | |
452 | Retrieve the classid for the current task, i.e. for the net_cls | |
453 | cgroup to which \fIskb\fP belongs. | |
454 | .sp | |
455 | This helper can be used on TC egress path, but not on ingress. | |
456 | .sp | |
457 | The net_cls cgroup provides an interface to tag network packets | |
458 | based on a user\-provided identifier for all traffic coming from | |
459 | the tasks belonging to the related cgroup. See also the related | |
460 | kernel documentation, available from the Linux sources in file | |
461 | \fIDocumentation/cgroup\-v1/net_cls.txt\fP\&. | |
462 | .sp | |
463 | The Linux kernel has two versions for cgroups: there are | |
464 | cgroups v1 and cgroups v2. Both are available to users, who can | |
465 | use a mixture of them, but note that the net_cls cgroup is for | |
466 | cgroup v1 only. This makes it incompatible with BPF programs | |
467 | run on cgroups, which is a cgroup\-v2\-only feature (a socket can | |
468 | only hold data for one version of cgroups at a time). | |
469 | .sp | |
470 | This helper is only available if the kernel was compiled with | |
471 | the \fBCONFIG_CGROUP_NET_CLASSID\fP configuration option set to | |
472 | "\fBy\fP" or to "\fBm\fP". | |
473 | .TP | |
474 | .B Return | |
475 | The classid, or 0 for the default unconfigured classid. | |
476 | .UNINDENT | |
477 | .TP | |
478 | .B \fBint bpf_skb_vlan_push(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIvlan_proto\fP\fB, u16\fP \fIvlan_tci\fP\fB)\fP | |
479 | .INDENT 7.0 | |
480 | .TP | |
481 | .B Description | |
482 | Push a \fIvlan_tci\fP (VLAN tag control information) of protocol | |
483 | \fIvlan_proto\fP to the packet associated to \fIskb\fP, then update | |
484 | the checksum. Note that if \fIvlan_proto\fP is different from | |
485 | \fBETH_P_8021Q\fP and \fBETH_P_8021AD\fP, it is considered to | |
486 | be \fBETH_P_8021Q\fP\&. | |
487 | .sp | |
488 | A call to this helper may change the underlying | |
489 | packet buffer. Therefore, at load time, all checks on pointers | |
490 | previously done by the verifier are invalidated and must be | |
491 | performed again, if the helper is used in combination with | |
492 | direct packet access. | |
493 | .TP | |
494 | .B Return | |
495 | 0 on success, or a negative error in case of failure. | |
496 | .UNINDENT | |
497 | .TP | |
498 | .B \fBint bpf_skb_vlan_pop(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
499 | .INDENT 7.0 | |
500 | .TP | |
501 | .B Description | |
502 | Pop a VLAN header from the packet associated to \fIskb\fP\&. | |
503 | .sp | |
504 | A call to this helper may change the underlying | |
505 | packet buffer. Therefore, at load time, all checks on pointers | |
506 | previously done by the verifier are invalidated and must be | |
507 | performed again, if the helper is used in combination with | |
508 | direct packet access. | |
509 | .TP | |
510 | .B Return | |
511 | 0 on success, or a negative error in case of failure. | |
512 | .UNINDENT | |
513 | .TP | |
514 | .B \fBint bpf_skb_get_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
515 | .INDENT 7.0 | |
516 | .TP | |
517 | .B Description | |
518 | Get tunnel metadata. This helper takes a pointer \fIkey\fP to an | |
519 | empty \fBstruct bpf_tunnel_key\fP of \fBsize\fP, that will be | |
520 | filled with tunnel metadata for the packet associated to \fIskb\fP\&. | |
521 | The \fIflags\fP can be set to \fBBPF_F_TUNINFO_IPV6\fP, which | |
522 | indicates that the tunnel is based on IPv6 protocol instead of | |
523 | IPv4. | |
524 | .sp | |
525 | The \fBstruct bpf_tunnel_key\fP is an object that generalizes the | |
526 | principal parameters used by various tunneling protocols into a | |
527 | single struct. This way, it can be used to easily make a | |
528 | decision based on the contents of the encapsulation header, | |
529 | "summarized" in this struct. In particular, it holds the IP | |
530 | address of the remote end (IPv4 or IPv6, depending on the case) | |
531 | in \fIkey\fP\fB\->remote_ipv4\fP or \fIkey\fP\fB\->remote_ipv6\fP\&. Also, | |
532 | this struct exposes the \fIkey\fP\fB\->tunnel_id\fP, which is | |
533 | generally mapped to a VNI (Virtual Network Identifier), making | |
534 | it programmable together with the \fBbpf_skb_set_tunnel_key\fP() helper. | |
535 | .sp | |
536 | Let\(aqs imagine that the following code is part of a program | |
537 | attached to the TC ingress interface, on one end of a GRE | |
538 | tunnel, and is supposed to filter out all messages coming from | |
539 | remote ends with IPv4 address other than 10.0.0.1: | |
540 | .INDENT 7.0 | |
541 | .INDENT 3.5 | |
542 | .sp | |
543 | .nf | |
544 | .ft C | |
545 | int ret; | |
546 | struct bpf_tunnel_key key = {}; | |
547 | ||
548 | ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0); | |
549 | if (ret < 0) | |
550 | return TC_ACT_SHOT; // drop packet | |
551 | ||
552 | if (key.remote_ipv4 != 0x0a000001) | |
553 | return TC_ACT_SHOT; // drop packet | |
554 | ||
555 | return TC_ACT_OK; // accept packet | |
556 | .ft P | |
557 | .fi | |
558 | .UNINDENT | |
559 | .UNINDENT | |
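The constant 0x0a000001 in the example above is the address 10.0.0.1 written as a plain 32\-bit integer, with the first octet in the most significant byte. The sketch below shows how such a constant can be built; ipv4_host_order() is a hypothetical helper, and the assumption that the field is compared in this (host) byte order is taken from the example itself.

```c
#include <stdint.h>

/*
 * Hypothetical helper: build the integer form of a dotted-quad IPv4
 * address a.b.c.d, with "a" in the most significant byte.
 */
static inline uint32_t ipv4_host_order(uint8_t a, uint8_t b,
				       uint8_t c, uint8_t d)
{
	return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
	       ((uint32_t)c << 8) | (uint32_t)d;
}
```

With this helper, ipv4_host_order(10, 0, 0, 1) yields 0x0a000001, the value tested against key.remote_ipv4.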
560 | .sp | |
561 | This interface can also be used with all encapsulation devices | |
562 | that can operate in "collect metadata" mode: instead of having | |
563 | one network device per specific configuration, the "collect | |
564 | metadata" mode only requires a single device where the | |
565 | configuration can be extracted from this helper. | |
566 | .sp | |
567 | This can be used together with various tunnels such as VXLAN, | |
568 | Geneve, GRE or IP in IP (IPIP). | |
569 | .TP | |
570 | .B Return | |
571 | 0 on success, or a negative error in case of failure. | |
572 | .UNINDENT | |
573 | .TP | |
574 | .B \fBint bpf_skb_set_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
575 | .INDENT 7.0 | |
576 | .TP | |
577 | .B Description | |
578 | Populate tunnel metadata for the packet associated to \fIskb\fP\&. The | |
579 | tunnel metadata is set to the contents of \fIkey\fP, of \fIsize\fP\&. The | |
580 | \fIflags\fP can be set to a combination of the following values: | |
581 | .INDENT 7.0 | |
582 | .TP | |
583 | .B \fBBPF_F_TUNINFO_IPV6\fP | |
584 | Indicate that the tunnel is based on IPv6 protocol | |
585 | instead of IPv4. | |
586 | .TP | |
587 | .B \fBBPF_F_ZERO_CSUM_TX\fP | |
588 | For IPv4 packets, add a flag to tunnel metadata | |
589 | indicating that checksum computation should be skipped | |
590 | and checksum set to zeroes. | |
591 | .TP | |
592 | .B \fBBPF_F_DONT_FRAGMENT\fP | |
593 | Add a flag to tunnel metadata indicating that the | |
594 | packet should not be fragmented. | |
595 | .TP | |
596 | .B \fBBPF_F_SEQ_NUMBER\fP | |
597 | Add a flag to tunnel metadata indicating that a | |
598 | sequence number should be added to tunnel header before | |
599 | sending the packet. This flag was added for GRE | |
600 | encapsulation, but might be used with other protocols | |
601 | as well in the future. | |
602 | .UNINDENT | |
603 | .sp | |
604 | Here is a typical usage on the transmit path: | |
605 | .INDENT 7.0 | |
606 | .INDENT 3.5 | |
607 | .sp | |
608 | .nf | |
609 | .ft C | |
610 | struct bpf_tunnel_key key; | |
611 | /* populate key ... */ | |
612 | bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0); | |
613 | bpf_clone_redirect(skb, vxlan_dev_ifindex, 0); | |
614 | .ft P | |
615 | .fi | |
616 | .UNINDENT | |
617 | .UNINDENT | |
618 | .sp | |
619 | See also the description of the \fBbpf_skb_get_tunnel_key\fP() | |
620 | helper for additional information. | |
621 | .TP | |
622 | .B Return | |
623 | 0 on success, or a negative error in case of failure. | |
624 | .UNINDENT | |
625 | .TP | |
626 | .B \fBu64 bpf_perf_event_read(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
627 | .INDENT 7.0 | |
628 | .TP | |
629 | .B Description | |
630 | Read the value of a perf event counter. This helper relies on a | |
631 | \fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of | |
632 | the perf event counter is selected when \fImap\fP is updated with | |
633 | perf event file descriptors. The \fImap\fP is an array whose size | |
634 | is the number of available CPUs, and each cell contains a value | |
635 | relative to one CPU. The value to retrieve is indicated by | |
636 | \fIflags\fP, that contains the index of the CPU to look up, masked | |
637 | with \fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to | |
638 | \fBBPF_F_CURRENT_CPU\fP to indicate that the value for the | |
639 | current CPU should be retrieved. | |
640 | .sp | |
641 | Note that before Linux 4.13, only hardware perf events can be | |
642 | retrieved. | |
643 | .sp | |
644 | Also, be aware that the newer helper | |
645 | \fBbpf_perf_event_read_value\fP() is recommended over | |
646 | \fBbpf_perf_event_read\fP() in general. The latter has some ABI | |
647 | quirks where error and counter value are used as a return code | |
648 | (which is wrong to do since ranges may overlap). This issue is | |
649 | fixed with \fBbpf_perf_event_read_value\fP(), which at the same | |
650 | time provides more features over the \fBbpf_perf_event_read\fP() interface. Please refer to the description of | |
651 | \fBbpf_perf_event_read_value\fP() for details. | |
652 | .TP | |
653 | .B Return | |
654 | The value of the perf event counter read from the map, or a | |
655 | negative error code in case of failure. | |
656 | .UNINDENT | |
657 | .TP | |
658 | .B \fBint bpf_redirect(u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
659 | .INDENT 7.0 | |
660 | .TP | |
661 | .B Description | |
662 | Redirect the packet to another net device of index \fIifindex\fP\&. | |
663 | This helper is somewhat similar to \fBbpf_clone_redirect\fP(), except that the packet is not cloned, which provides | |
664 | increased performance. | |
665 | .sp | |
666 | Except for XDP, both ingress and egress interfaces can be used | |
667 | for redirection. The \fBBPF_F_INGRESS\fP value in \fIflags\fP is used | |
668 | to make the distinction (ingress path is selected if the flag | |
669 | is present, egress path otherwise). Currently, XDP only | |
670 | supports redirection to the egress interface, and accepts no | |
671 | flag at all. | |
672 | .sp | |
673 | The same effect can be attained with the more generic | |
674 | \fBbpf_redirect_map\fP(), which requires specific maps to be | |
675 | used but offers better performance. | |
676 | .TP | |
677 | .B Return | |
678 | For XDP, the helper returns \fBXDP_REDIRECT\fP on success or | |
679 | \fBXDP_ABORTED\fP on error. For other program types, the values | |
680 | are \fBTC_ACT_REDIRECT\fP on success or \fBTC_ACT_SHOT\fP on | |
681 | error. | |
682 | .UNINDENT | |
683 | .TP | |
684 | .B \fBu32 bpf_get_route_realm(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
685 | .INDENT 7.0 | |
686 | .TP | |
687 | .B Description | |
688 | Retrieve the realm of the route, that is to say the | |
689 | \fBtclassid\fP field of the destination for the \fIskb\fP\&. The | |
690 | identifier retrieved is a user\-provided tag, similar to the | |
691 | one used with the net_cls cgroup (see description for | |
692 | \fBbpf_get_cgroup_classid\fP() helper), but here this tag is | |
693 | held by a route (a destination entry), not by a task. | |
694 | .sp | |
695 | Retrieving this identifier works with the clsact TC egress hook | |
696 | (see also \fBtc\-bpf(8)\fP), or alternatively on conventional | |
697 | classful egress qdiscs, but not on TC ingress path. In case of | |
698 | clsact TC egress hook, this has the advantage that, internally, | |
699 | the destination entry has not been dropped yet in the transmit | |
700 | path. Therefore, the destination entry does not need to be | |
701 | artificially held via \fBnetif_keep_dst\fP() for a classful | |
702 | qdisc until the \fIskb\fP is freed. | |
703 | .sp | |
704 | This helper is available only if the kernel was compiled with | |
705 | \fBCONFIG_IP_ROUTE_CLASSID\fP configuration option. | |
706 | .TP | |
707 | .B Return | |
708 | The realm of the route for the packet associated to \fIskb\fP, or 0 | |
709 | if none was found. | |
710 | .UNINDENT | |
711 | .TP | |
712 | .B \fBint bpf_perf_event_output(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, void *\fP\fIdata\fP\fB, u64\fP \fIsize\fP\fB)\fP | |
713 | .INDENT 7.0 | |
714 | .TP | |
715 | .B Description | |
716 | Write raw \fIdata\fP blob into a special BPF perf event held by | |
717 | \fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. This perf | |
718 | event must have the following attributes: \fBPERF_SAMPLE_RAW\fP | |
719 | as \fBsample_type\fP, \fBPERF_TYPE_SOFTWARE\fP as \fBtype\fP, and | |
720 | \fBPERF_COUNT_SW_BPF_OUTPUT\fP as \fBconfig\fP\&. | |
721 | .sp | |
722 | The \fIflags\fP are used to indicate the index in \fImap\fP for which | |
723 | the value must be put, masked with \fBBPF_F_INDEX_MASK\fP\&. | |
724 | Alternatively, \fIflags\fP can be set to \fBBPF_F_CURRENT_CPU\fP | |
725 | to indicate that the index of the current CPU core should be | |
726 | used. | |
727 | .sp | |
728 | The value to write, of \fIsize\fP, is passed through eBPF stack and | |
729 | pointed to by \fIdata\fP\&. | |
730 | .sp | |
731 | The context of the program \fIctx\fP must also be passed to the | |
732 | helper. | |
733 | .sp | |
734 | In user space, a program willing to read the values needs to | |
735 | call \fBperf_event_open\fP() on the perf event (either for | |
736 | one or for all CPUs) and to store the file descriptor into the | |
737 | \fImap\fP\&. This must be done before the eBPF program can send data | |
738 | into it. An example is available in file | |
739 | \fIsamples/bpf/trace_output_user.c\fP in the Linux kernel source | |
740 | tree (the eBPF program counterpart is in | |
741 | \fIsamples/bpf/trace_output_kern.c\fP). | |
742 | .sp | |
743 | \fBbpf_perf_event_output\fP() achieves better performance | |
744 | than \fBbpf_trace_printk\fP() for sharing data with user | |
745 | space, and is much better suited for streaming data from eBPF | |
746 | programs. | |
747 | .sp | |
748 | Note that this helper is not restricted to tracing use cases | |
749 | and can be used with programs attached to TC or XDP as well, | |
750 | where it allows for passing data to user space listeners. Data | |
751 | can be: | |
752 | .INDENT 7.0 | |
753 | .IP \(bu 2 | |
754 | Only custom structs, | |
755 | .IP \(bu 2 | |
756 | Only the packet payload, or | |
757 | .IP \(bu 2 | |
758 | A combination of both. | |
759 | .UNINDENT | |
760 | .TP | |
761 | .B Return | |
762 | 0 on success, or a negative error in case of failure. | |
763 | .UNINDENT | |
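The way \fIflags\fP selects a slot in the perf event array can be sketched in plain user\-space C. This is only a model of the rule described above for bpf_perf_event_output(), not kernel code: the two constants mirror the values in include/uapi/linux/bpf.h, while the function name perf_output_index() and the rejection of bits outside the mask (modelled on the tracing variant of the helper) are assumptions of this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/bpf.h. */
#define BPF_F_INDEX_MASK  0xffffffffULL
#define BPF_F_CURRENT_CPU BPF_F_INDEX_MASK

/*
 * Hypothetical model: return the map index that a given flags value
 * designates, or -EINVAL if bits outside BPF_F_INDEX_MASK are set
 * (an assumption based on the tracing variant of the helper).
 */
static int64_t perf_output_index(uint64_t flags, uint32_t current_cpu)
{
	uint64_t index = flags & BPF_F_INDEX_MASK;

	if (flags & ~BPF_F_INDEX_MASK)
		return -EINVAL;
	/* The all-ones index is the "use the current CPU" marker. */
	if (index == BPF_F_CURRENT_CPU)
		index = current_cpu;
	return (int64_t)index;
}
```

With this model, an explicit index is returned unchanged, and BPF_F_CURRENT_CPU resolves to whatever CPU number the caller supplies.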
764 | .TP | |
765 | .B \fBint bpf_skb_load_bytes(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
766 | .INDENT 7.0 | |
767 | .TP | |
768 | .B Description | |
769 | This helper was provided as an easy way to load data from a | |
770 | packet. It can be used to load \fIlen\fP bytes from \fIoffset\fP from | |
771 | the packet associated to \fIskb\fP, into the buffer pointed to by | |
772 | \fIto\fP\&. | |
773 | .sp | |
774 | Since Linux 4.7, usage of this helper has mostly been replaced | |
775 | by "direct packet access", enabling packet data to be | |
776 | manipulated with \fIskb\fP\fB\->data\fP and \fIskb\fP\fB\->data_end\fP | |
777 | pointing respectively to the first byte of packet data and to | |
778 | the byte after the last byte of packet data. However, it | |
779 | remains useful if one wishes to read large quantities of data | |
780 | at once from a packet into the eBPF stack. | |
781 | .TP | |
782 | .B Return | |
783 | 0 on success, or a negative error in case of failure. | |
784 | .UNINDENT | |
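The bounds test that both bpf_skb_load_bytes() and direct packet access rely on can be illustrated with a small user\-space model. It is a sketch under stated assumptions: struct pkt and pkt_read() are hypothetical names standing in for the \fIdata\fP / \fIdata_end\fP pair, and the \-1 error value is a placeholder rather than the helper's actual return code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the skb->data / skb->data_end pair. */
struct pkt {
	const uint8_t *data;     /* first byte of packet data */
	const uint8_t *data_end; /* byte after the last byte of packet data */
};

/*
 * Copy len bytes starting at offset into the buffer "to", but only if
 * the whole range lies inside the packet. Returns 0 on success and -1
 * (placeholder error value) if the range is out of bounds.
 */
static int pkt_read(const struct pkt *p, uint32_t offset, void *to,
		    uint32_t len)
{
	size_t avail = (size_t)(p->data_end - p->data);

	if (offset > avail || len > avail - offset)
		return -1;
	memcpy(to, p->data + offset, len);
	return 0;
}
```

In a real eBPF program the verifier requires an equivalent comparison against \fIdata_end\fP before any direct access to packet bytes.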
785 | .TP | |
786 | .B \fBint bpf_get_stackid(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
787 | .INDENT 7.0 | |
788 | .TP | |
789 | .B Description | |
790 | Walk a user or a kernel stack and return its id. To achieve | |
791 | this, the helper needs \fIctx\fP, which is a pointer to the context | |
792 | on which the tracing program is executed, and a pointer to a | |
793 | \fImap\fP of type \fBBPF_MAP_TYPE_STACK_TRACE\fP\&. | |
794 | .sp | |
795 | The last argument, \fIflags\fP, holds the number of stack frames to | |
796 | skip (from 0 to 255), masked with | |
797 | \fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set | |
798 | a combination of the following flags: | |
799 | .INDENT 7.0 | |
800 | .TP | |
801 | .B \fBBPF_F_USER_STACK\fP | |
802 | Collect a user space stack instead of a kernel stack. | |
803 | .TP | |
804 | .B \fBBPF_F_FAST_STACK_CMP\fP | |
805 | Compare stacks by hash only. | |
806 | .TP | |
807 | .B \fBBPF_F_REUSE_STACKID\fP | |
808 | If two different stacks hash into the same \fIstackid\fP, | |
809 | discard the old one. | |
810 | .UNINDENT | |
811 | .sp | |
812 | The stack id retrieved is a 32 bit long integer handle which | |
813 | can be further combined with other data (including other stack | |
814 | ids) and used as a key into maps. This can be useful for | |
815 | generating a variety of graphs (such as flame graphs or off\-cpu | |
816 | graphs). | |
817 | .sp | |
818 | For walking a stack, this helper is an improvement over | |
819 | \fBbpf_probe_read\fP(), which can be used with unrolled loops | |
820 | but is not efficient and consumes a lot of eBPF instructions. | |
821 | Instead, \fBbpf_get_stackid\fP() can collect up to | |
822 | \fBPERF_MAX_STACK_DEPTH\fP both kernel and user frames. Note that | |
823 | this limit can be controlled with the \fBsysctl\fP program, and | |
824 | that it should be manually increased in order to profile long | |
825 | user stacks (such as stacks for Java programs). To do so, use: | |
826 | .INDENT 7.0 | |
827 | .INDENT 3.5 | |
828 | .sp | |
829 | .nf | |
830 | .ft C | |
831 | # sysctl kernel.perf_event_max_stack=<new value> | |
832 | .ft P | |
833 | .fi | |
834 | .UNINDENT | |
835 | .UNINDENT | |
836 | .TP | |
837 | .B Return | |
838 | The positive or null stack id on success, or a negative error | |
839 | in case of failure. | |
840 | .UNINDENT | |
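The layout of the \fIflags\fP argument of bpf_get_stackid() (skip count in the low 8 bits, option flags in the next bits) can be sketched as follows. The constants mirror include/uapi/linux/bpf.h; the function name stackid_flags() is hypothetical and exists only for illustration.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/bpf.h. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_FAST_STACK_CMP  (1ULL << 9)
#define BPF_F_REUSE_STACKID   (1ULL << 10)

/*
 * Hypothetical helper: combine a number of frames to skip (truncated
 * to the 0..255 range by the mask) with a set of option flags.
 */
static uint64_t stackid_flags(uint64_t skip, uint64_t opts)
{
	return (skip & BPF_F_SKIP_FIELD_MASK) | opts;
}
```

For example, skipping two frames of a user space stack gives stackid_flags(2, BPF_F_USER_STACK). Note that a skip count above 255 is silently truncated by the mask in this sketch.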
841 | .TP | |
842 | .B \fBs64 bpf_csum_diff(__be32 *\fP\fIfrom\fP\fB, u32\fP \fIfrom_size\fP\fB, __be32 *\fP\fIto\fP\fB, u32\fP \fIto_size\fP\fB, __wsum\fP \fIseed\fP\fB)\fP | |
843 | .INDENT 7.0 | |
844 | .TP | |
845 | .B Description | |
846 | Compute a checksum difference, from the raw buffer pointed to by | |
847 | \fIfrom\fP, of length \fIfrom_size\fP (that must be a multiple of 4), | |
848 | towards the raw buffer pointed to by \fIto\fP, of size \fIto_size\fP | |
849 | (same remark). An optional \fIseed\fP can be added to the value | |
850 | (this can be cascaded, the seed may come from a previous call | |
851 | to the helper). | |
852 | .sp | |
853 | This is flexible enough to be used in several ways: | |
854 | .INDENT 7.0 | |
855 | .IP \(bu 2 | |
856 | With \fIfrom_size\fP == 0, \fIto_size\fP > 0 and \fIseed\fP set to | |
857 | checksum, it can be used when pushing new data. | |
858 | .IP \(bu 2 | |
859 | With \fIfrom_size\fP > 0, \fIto_size\fP == 0 and \fIseed\fP set to | |
860 | checksum, it can be used when removing data from a packet. | |
861 | .IP \(bu 2 | |
862 | With \fIfrom_size\fP > 0, \fIto_size\fP > 0 and \fIseed\fP set to 0, it | |
863 | can be used to compute a diff. Note that \fIfrom_size\fP and | |
864 | \fIto_size\fP do not need to be equal. | |
865 | .UNINDENT | |
866 | .sp | |
867 | This helper can be used in combination with | |
868 | \fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(), to | |
869 | which one can feed in the difference computed with | |
870 | \fBbpf_csum_diff\fP(). | |
871 | .TP | |
872 | .B Return | |
873 | The checksum result, or a negative error code in case of | |
874 | failure. | |
875 | .UNINDENT | |
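The arithmetic behind a checksum difference can be modelled in user\-space C with ones' complement addition: the complement of each old word and each new word are added to the seed. This is a simplified sketch, not the kernel implementation: model_csum_diff(), csum_add32() and csum_fold16() are hypothetical names, words are treated as host integers, and the 32\-bit value produced may differ from what a given architecture computes even though it folds to the same 16\-bit sum.

```c
#include <errno.h>
#include <stdint.h>

/* Ones' complement 32-bit addition (end-around carry). */
static uint32_t csum_add32(uint32_t a, uint32_t b)
{
	uint32_t res = a + b;

	return res + (res < b);
}

/*
 * Hypothetical model of a checksum difference: sizes are in bytes and
 * must be multiples of 4, as stated in the description above.
 */
static int64_t model_csum_diff(const uint32_t *from, uint32_t from_size,
			       const uint32_t *to, uint32_t to_size,
			       uint32_t seed)
{
	uint32_t sum = seed;
	uint32_t i;

	if ((from_size | to_size) & 3)
		return -EINVAL;
	for (i = 0; i < from_size / 4; i++)
		sum = csum_add32(sum, ~from[i]); /* remove old data */
	for (i = 0; i < to_size / 4; i++)
		sum = csum_add32(sum, to[i]);    /* add new data */
	return sum;
}

/* Fold a 32-bit partial sum to 16 bits (not complemented). */
static uint16_t csum_fold16(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}
```

In this model, adding the difference between an old and a new word to the sum of the original buffer yields the same folded value as summing the modified buffer from scratch, which is the property that incremental checksum updates depend on.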
876 | .TP | |
877 | .B \fBint bpf_skb_get_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP | |
878 | .INDENT 7.0 | |
879 | .TP | |
880 | .B Description | |
881 | Retrieve tunnel options metadata for the packet associated to | |
882 | \fIskb\fP, and store the raw tunnel option data to the buffer \fIopt\fP | |
883 | of \fIsize\fP\&. | |
884 | .sp | |
885 | This helper can be used with encapsulation devices that can | |
886 | operate in "collect metadata" mode (please refer to the related | |
887 | note in the description of \fBbpf_skb_get_tunnel_key\fP() for | |
888 | more details). A particular example where this can be used is | |
889 | in combination with the Geneve encapsulation protocol, where it | |
890 | allows for pushing (with \fBbpf_skb_set_tunnel_opt\fP() helper) | |
891 | and retrieving arbitrary TLVs (Type\-Length\-Value headers) from | |
892 | the eBPF program. This allows for full customization of these | |
893 | headers. | |
894 | .TP | |
895 | .B Return | |
896 | The size of the option data retrieved. | |
897 | .UNINDENT | |
898 | .TP | |
899 | .B \fBint bpf_skb_set_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP | |
900 | .INDENT 7.0 | |
901 | .TP | |
902 | .B Description | |
903 | Set tunnel options metadata for the packet associated to \fIskb\fP | |
904 | to the option data contained in the raw buffer \fIopt\fP of \fIsize\fP\&. | |
905 | .sp | |
906 | See also the description of the \fBbpf_skb_get_tunnel_opt\fP() | |
907 | helper for additional information. | |
908 | .TP | |
909 | .B Return | |
910 | 0 on success, or a negative error in case of failure. | |
911 | .UNINDENT | |
912 | .TP | |
913 | .B \fBint bpf_skb_change_proto(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIproto\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
914 | .INDENT 7.0 | |
915 | .TP | |
916 | .B Description | |
917 | Change the protocol of the \fIskb\fP to \fIproto\fP\&. Currently | |
918 | supported are transition from IPv4 to IPv6, and from IPv6 to | |
919 | IPv4. The helper takes care of the groundwork for the | |
920 | transition, including resizing the socket buffer. The eBPF | |
921 | program is expected to fill the new headers, if any, via | |
922 | \fBbpf_skb_store_bytes\fP() and to recompute the checksums with | |
923 | \fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(). The main case for this helper is to perform NAT64 | |
924 | operations out of an eBPF program. | |
925 | .sp | |
926 | Internally, the GSO type is marked as dodgy so that headers are | |
927 | checked and segments are recalculated by the GSO/GRO engine. | |
928 | The size for GSO target is adapted as well. | |
929 | .sp | |
930 | All values for \fIflags\fP are reserved for future usage, and must | |
931 | be left at zero. | |
932 | .sp | |
933 | A call to this helper may change the underlying | |
934 | packet buffer. Therefore, at load time, all checks on pointers | |
935 | previously done by the verifier are invalidated and must be | |
936 | performed again, if the helper is used in combination with | |
937 | direct packet access. | |
938 | .TP | |
939 | .B Return | |
940 | 0 on success, or a negative error in case of failure. | |
941 | .UNINDENT | |
942 | .TP | |
943 | .B \fBint bpf_skb_change_type(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB)\fP | |
944 | .INDENT 7.0 | |
945 | .TP | |
946 | .B Description | |
947 | Change the packet type for the packet associated to \fIskb\fP\&. This | |
948 | comes down to setting \fIskb\fP\fB\->pkt_type\fP to \fItype\fP, except | |
949 | the eBPF program has no write access to \fIskb\fP\fB\->pkt_type\fP other than through this helper. Using a helper here allows | |
950 | for graceful handling of errors. | |
951 | .sp | |
952 | The major use case is to change incoming \fIskb\fPs to | |
953 | \fBPACKET_HOST\fP in a programmatic way instead of having to | |
954 | recirculate via \fBbpf_redirect\fP(..., \fBBPF_F_INGRESS\fP), for | |
955 | example. | |
956 | .sp | |
957 | Note that \fItype\fP only allows certain values. At this time, they | |
958 | are: | |
959 | .INDENT 7.0 | |
960 | .TP | |
961 | .B \fBPACKET_HOST\fP | |
962 | Packet is for us. | |
963 | .TP | |
964 | .B \fBPACKET_BROADCAST\fP | |
965 | Send packet to all. | |
966 | .TP | |
967 | .B \fBPACKET_MULTICAST\fP | |
968 | Send packet to group. | |
969 | .TP | |
970 | .B \fBPACKET_OTHERHOST\fP | |
971 | Send packet to someone else. | |
972 | .UNINDENT | |
973 | .TP | |
974 | .B Return | |
975 | 0 on success, or a negative error in case of failure. | |
976 | .UNINDENT | |
977 | .TP | |
978 | .B \fBint bpf_skb_under_cgroup(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP | |
979 | .INDENT 7.0 | |
980 | .TP | |
981 | .B Description | |
982 | Check whether \fIskb\fP is a descendant of the cgroup2 held by | |
983 | \fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&. | |
984 | .TP | |
985 | .B Return | |
986 | The return value depends on the result of the test, and can be: | |
987 | .INDENT 7.0 | |
988 | .IP \(bu 2 | |
989 | 0, if the \fIskb\fP failed the cgroup2 descendant test. | |
990 | .IP \(bu 2 | |
991 | 1, if the \fIskb\fP passed the cgroup2 descendant test. | |
992 | .IP \(bu 2 | |
993 | A negative error code, if an error occurred. | |
994 | .UNINDENT | |
995 | .UNINDENT | |
996 | .TP | |
997 | .B \fBu32 bpf_get_hash_recalc(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
998 | .INDENT 7.0 | |
999 | .TP | |
1000 | .B Description | |
1001 | Retrieve the hash of the packet, \fIskb\fP\fB\->hash\fP\&. If it is | |
1002 | not set, in particular if the hash was cleared due to mangling, | |
1003 | recompute this hash. Later accesses to the hash can be done | |
1004 | directly with \fIskb\fP\fB\->hash\fP\&. | |
1005 | .sp | |
1006 | Calling \fBbpf_set_hash_invalid\fP(), changing a packet | |
1007 | protocol with \fBbpf_skb_change_proto\fP(), or calling | |
1008 | \fBbpf_skb_store_bytes\fP() with the | |
1009 | \fBBPF_F_INVALIDATE_HASH\fP flag are actions that may clear | |
1010 | the hash and trigger a new computation on the next call to | |
1011 | \fBbpf_get_hash_recalc\fP(). | |
1012 | .TP | |
1013 | .B Return | |
1014 | The 32\-bit hash. | |
1015 | .UNINDENT | |
1016 | .TP | |
1017 | .B \fBu64 bpf_get_current_task(void)\fP | |
1018 | .INDENT 7.0 | |
1019 | .TP | |
1020 | .B Return | |
1021 | A pointer to the current task struct. | |
1022 | .UNINDENT | |
1023 | .TP | |
1024 | .B \fBint bpf_probe_write_user(void *\fP\fIdst\fP\fB, const void *\fP\fIsrc\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
1025 | .INDENT 7.0 | |
1026 | .TP | |
1027 | .B Description | |
1028 | Attempt in a safe way to write \fIlen\fP bytes from the buffer | |
1029 | \fIsrc\fP to \fIdst\fP in memory. It only works for threads that are in | |
1030 | user context, and \fIdst\fP must be a valid user space address. | |
1031 | .sp | |
1032 | This helper should not be used to implement any kind of | |
1033 | security mechanism because of TOC\-TOU attacks, but rather to | |
1034 | debug, divert, and manipulate execution of semi\-cooperative | |
1035 | processes. | |
1036 | .sp | |
1037 | Keep in mind that this feature is meant for experiments, and it | |
1038 | has a risk of crashing the system and running programs. | |
1039 | Therefore, when an eBPF program using this helper is attached, | |
1040 | a warning including PID and process name is printed to kernel | |
1041 | logs. | |
1042 | .TP | |
1043 | .B Return | |
1044 | 0 on success, or a negative error in case of failure. | |
1045 | .UNINDENT | |
1046 | .TP | |
1047 | .B \fBint bpf_current_task_under_cgroup(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP | |
1048 | .INDENT 7.0 | |
1049 | .TP | |
1050 | .B Description | |
1051 | Check whether the probe is being run in the context of a given | |
1052 | subset of the cgroup2 hierarchy. The cgroup2 to test is held by | |
1053 | \fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&. | |
1054 | .TP | |
1055 | .B Return | |
1056 | The return value depends on the result of the test, and can be: | |
1057 | .INDENT 7.0 | |
1058 | .IP \(bu 2 | |
1059 | 0, if the current task does not belong to the cgroup2. | |
1060 | .IP \(bu 2 | |
1061 | 1, if the current task belongs to the cgroup2. | |
1062 | .IP \(bu 2 | |
1063 | A negative error code, if an error occurred. | |
1064 | .UNINDENT | |
1065 | .UNINDENT | |
1066 | .TP | |
1067 | .B \fBint bpf_skb_change_tail(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1068 | .INDENT 7.0 | |
1069 | .TP | |
1070 | .B Description | |
1071 | Resize (trim or grow) the packet associated to \fIskb\fP to the | |
1072 | new \fIlen\fP\&. The \fIflags\fP are reserved for future usage, and must | |
1073 | be left at zero. | |
1074 | .sp | |
1075 | The basic idea is that the helper performs the needed work to | |
1076 | change the size of the packet, then the eBPF program rewrites | |
1077 | the rest via helpers like \fBbpf_skb_store_bytes\fP(), | |
1078 | \fBbpf_l3_csum_replace\fP(), \fBbpf_l4_csum_replace\fP() | |
1079 | and others. This helper is a slow path utility intended for | |
1080 | replies with control messages. And because it is targeted for | |
1081 | slow path, the helper itself can afford to be slow: it | |
1082 | implicitly linearizes, unclones and drops offloads from the | |
1083 | \fIskb\fP\&. | |
1084 | .sp | |
1085 | A call to this helper may change the underlying | |
1086 | packet buffer. Therefore, at load time, all checks on pointers | |
1087 | previously done by the verifier are invalidated and must be | |
1088 | performed again, if the helper is used in combination with | |
1089 | direct packet access. | |
1090 | .TP | |
1091 | .B Return | |
1092 | 0 on success, or a negative error in case of failure. | |
1093 | .UNINDENT | |
1094 | .TP | |
1095 | .B \fBint bpf_skb_pull_data(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
1096 | .INDENT 7.0 | |
1097 | .TP | |
1098 | .B Description | |
1099 | Pull in non\-linear data in case the \fIskb\fP is non\-linear and not | |
1100 | all of \fIlen\fP are part of the linear section. Make \fIlen\fP bytes | |
1101 | from \fIskb\fP readable and writable. If a zero value is passed for | |
1102 | \fIlen\fP, then the whole length of the \fIskb\fP is pulled. | |
1103 | .sp | |
1104 | This helper is only needed for reading and writing with direct | |
1105 | packet access. | |
1106 | .sp | |
1107 | For direct packet access, testing that offsets to access | |
1108 | are within packet boundaries (test on \fIskb\fP\fB\->data_end\fP) is | |
1109 | susceptible to fail if offsets are invalid, or if the requested | |
1110 | data is in non\-linear parts of the \fIskb\fP\&. On failure the | |
1111 | program can just bail out, or in the case of a non\-linear | |
1112 | buffer, use a helper to make the data available. The | |
1113 | \fBbpf_skb_load_bytes\fP() helper is a first solution to access | |
1114 | the data. Another one consists in using \fBbpf_skb_pull_data\fP() | |
1115 | to pull in the non\-linear parts once, then retesting and | |
1116 | finally accessing the data. | |
1117 | .sp | |
1118 | At the same time, this also makes sure the \fIskb\fP is uncloned, | |
1119 | which is a necessary condition for direct write. As this needs | |
1120 | to be an invariant for the write part only, the verifier | |
1121 | detects writes and adds a prologue that is calling | |
1122 | \fBbpf_skb_pull_data()\fP to effectively unclone the \fIskb\fP from | |
1123 | the very beginning in case it is indeed cloned. | |
1124 | .sp | |
1125 | A call to this helper may change the underlying | |
1126 | packet buffer. Therefore, at load time, all checks on pointers | |
1127 | previously done by the verifier are invalidated and must be | |
1128 | performed again, if the helper is used in combination with | |
1129 | direct packet access. | |
1130 | .TP | |
1131 | .B Return | |
1132 | 0 on success, or a negative error in case of failure. | |
1133 | .UNINDENT | |
1134 | .TP | |
1135 | .B \fBs64 bpf_csum_update(struct sk_buff *\fP\fIskb\fP\fB, __wsum\fP \fIcsum\fP\fB)\fP | |
1136 | .INDENT 7.0 | |
1137 | .TP | |
1138 | .B Description | |
1139 | Add the checksum \fIcsum\fP into \fIskb\fP\fB\->csum\fP in case the | |
1140 | driver has supplied a checksum for the entire packet into that | |
1141 | field. Return an error otherwise. This helper is intended to be | |
1142 | used in combination with \fBbpf_csum_diff\fP(), in particular | |
1143 | when the checksum needs to be updated after data has been | |
1144 | written into the packet through direct packet access. | |
1145 | .TP | |
1146 | .B Return | |
1147 | The checksum on success, or a negative error code in case of | |
1148 | failure. | |
1149 | .UNINDENT | |
1150 | .TP | |
1151 | .B \fBvoid bpf_set_hash_invalid(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
1152 | .INDENT 7.0 | |
1153 | .TP | |
1154 | .B Description | |
1155 | Invalidate the current \fIskb\fP\fB\->hash\fP\&. It can be used after | |
1156 | mangling on headers through direct packet access, in order to | |
1157 | indicate that the hash is outdated and to trigger a | |
1158 | recalculation the next time the kernel tries to access this | |
1159 | hash or when the \fBbpf_get_hash_recalc\fP() helper is called. | |
1160 | .UNINDENT | |
1161 | .TP | |
1162 | .B \fBint bpf_get_numa_node_id(void)\fP | |
1163 | .INDENT 7.0 | |
1164 | .TP | |
1165 | .B Description | |
1166 | Return the id of the current NUMA node. The primary use case | |
1167 | for this helper is the selection of sockets for the local NUMA | |
1168 | node, when the program is attached to sockets using the | |
1169 | \fBSO_ATTACH_REUSEPORT_EBPF\fP option (see also \fBsocket(7)\fP), | |
1170 | but the helper is also available to other eBPF program types, | |
1171 | similarly to \fBbpf_get_smp_processor_id\fP(). | |
1172 | .TP | |
1173 | .B Return | |
1174 | The id of current NUMA node. | |
1175 | .UNINDENT | |
1176 | .TP | |
1177 | .B \fBint bpf_skb_change_head(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1178 | .INDENT 7.0 | |
1179 | .TP | |
1180 | .B Description | |
1181 | Grows headroom of packet associated to \fIskb\fP and adjusts the | |
1182 | offset of the MAC header accordingly, adding \fIlen\fP bytes of | |
1183 | space. It automatically extends and reallocates memory as | |
1184 | required. | |
1185 | .sp | |
1186 | This helper can be used on a layer 3 \fIskb\fP to push a MAC header | |
1187 | for redirection into a layer 2 device. | |
1188 | .sp | |
1189 | All values for \fIflags\fP are reserved for future usage, and must | |
1190 | be left at zero. | |
1191 | .sp | |
1192 | A call to this helper may change the underlying | |
1193 | packet buffer. Therefore, at load time, all checks on pointers | |
1194 | previously done by the verifier are invalidated and must be | |
1195 | performed again, if the helper is used in combination with | |
1196 | direct packet access. | |
1197 | .TP | |
1198 | .B Return | |
1199 | 0 on success, or a negative error in case of failure. | |
1200 | .UNINDENT | |
1201 | .TP | |
1202 | .B \fBint bpf_xdp_adjust_head(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP | |
1203 | .INDENT 7.0 | |
1204 | .TP | |
1205 | .B Description | |
1206 | Adjust (move) \fIxdp_md\fP\fB\->data\fP by \fIdelta\fP bytes. Note that | |
1207 | it is possible to use a negative value for \fIdelta\fP\&. This helper | |
1208 | can be used to prepare the packet for pushing or popping | |
1209 | headers. | |
1210 | .sp | |
1211 | A call to this helper may change the underlying | |
1212 | packet buffer. Therefore, at load time, all checks on pointers | |
1213 | previously done by the verifier are invalidated and must be | |
1214 | performed again, if the helper is used in combination with | |
1215 | direct packet access. | |
1216 | .TP | |
1217 | .B Return | |
1218 | 0 on success, or a negative error in case of failure. | |
1219 | .UNINDENT | |
1220 | .TP | |
1221 | .B \fBint bpf_probe_read_str(void *\fP\fIdst\fP\fB, int\fP \fIsize\fP\fB, const void *\fP\fIunsafe_ptr\fP\fB)\fP | |
1222 | .INDENT 7.0 | |
1223 | .TP | |
1224 | .B Description | |
1225 | Copy a NUL terminated string from an unsafe address | |
1226 | \fIunsafe_ptr\fP to \fIdst\fP\&. The \fIsize\fP should include the | |
1227 | terminating NUL byte. In case the string length is smaller than | |
1228 | \fIsize\fP, the target is not padded with further NUL bytes. If the | |
1229 | string length is larger than \fIsize\fP, just \fIsize\fP\-1 bytes are | |
1230 | copied and the last byte is set to NUL. | |
1231 | .sp | |
1232 | On success, the length of the copied string is returned. This | |
1233 | makes this helper useful in tracing programs for reading | |
1234 | strings, and more importantly to get their length at runtime. See | |
1235 | the following snippet: | |
1236 | .INDENT 7.0 | |
1237 | .INDENT 3.5 | |
1238 | .sp | |
1239 | .nf | |
1240 | .ft C | |
1241 | SEC("kprobe/sys_open") | |
1242 | void bpf_sys_open(struct pt_regs *ctx) | |
1243 | { | |
1244 | char buf[PATHLEN]; // PATHLEN is defined to 256 | |
1245 | int res = bpf_probe_read_str(buf, sizeof(buf), | |
1246 | (const void *)ctx\->di); | |
1247 | ||
1248 | // Consume buf, for example push it to | |
1249 | // userspace via bpf_perf_event_output(); we | |
1250 | // can use res (the string length) as event | |
1251 | // size, after checking its boundaries. | |
1252 | } | |
1253 | .ft P | |
1254 | .fi | |
1255 | .UNINDENT | |
1256 | .UNINDENT | |
1257 | .sp | |
1258 | In comparison, using \fBbpf_probe_read()\fP helper here instead | |
1259 | to read the string would require estimating the length at | |
1260 | compile time, and would often result in copying more memory | |
1261 | than necessary. | |
1262 | .sp | |
1263 | Another useful use case is when parsing individual process | |
1264 | arguments or individual environment variables navigating | |
1265 | \fIcurrent\fP\fB\->mm\->arg_start\fP and \fIcurrent\fP\fB\->mm\->env_start\fP: using this helper and the return value, | |
1266 | one can quickly iterate at the right offset of the memory area. | |
1267 | .TP | |
1268 | .B Return | |
1269 | On success, the strictly positive length of the string, | |
1270 | including the trailing NUL character. On error, a negative | |
1271 | value. | |
1272 | .UNINDENT | |
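The copy and return\-value semantics described for bpf_probe_read_str() can be mimicked in ordinary user\-space C, which also shows how the return value lets a program step through a block of consecutive NUL terminated strings such as an argument area. Both function names, model_read_str() and count_args(), are hypothetical; the model reads from normal memory rather than an unsafe address, returns 0 for a non\-positive \fIsize\fP as an assumption, and count_args() assumes every string is shorter than its 64\-byte scratch buffer.

```c
/*
 * Hypothetical model: copy at most size - 1 bytes from src, always
 * NUL terminate, do not pad, and return the copied length including
 * the trailing NUL.
 */
static int model_read_str(char *dst, int size, const char *src)
{
	int i;

	if (size <= 0)
		return 0; /* assumption of this sketch */
	for (i = 0; i < size - 1; i++) {
		dst[i] = src[i];
		if (src[i] == '\0')
			return i + 1;
	}
	dst[size - 1] = '\0'; /* string was truncated */
	return size;
}

/*
 * Count the strings stored back to back in area, advancing by the
 * length returned for each one.
 */
static int count_args(const char *area, int area_len)
{
	char buf[64];
	int off = 0;
	int n = 0;

	while (off < area_len) {
		int len = model_read_str(buf, (int)sizeof(buf), area + off);

		if (len <= 0)
			break;
		off += len;
		n++;
	}
	return n;
}
```

Because the returned length includes the NUL byte, adding it to the current offset lands exactly on the start of the next string.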
1273 | .TP | |
1274 | .B \fBu64 bpf_get_socket_cookie(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
1275 | .INDENT 7.0 | |
1276 | .TP | |
1277 | .B Description | |
1278 | If the \fBstruct sk_buff\fP pointed to by \fIskb\fP has a known socket, | |
1279 | retrieve the cookie (generated by the kernel) of this socket. | |
1280 | If no cookie has been set yet, generate a new cookie. Once | |
1281 | generated, the socket cookie remains stable for the life of the | |
1282 | socket. This helper can be useful for monitoring per socket | |
1283 | networking traffic statistics as it provides a unique socket | |
1284 | identifier per namespace. | |
1285 | .TP | |
1286 | .B Return | |
1287 | An 8\-byte long non\-decreasing number on success, or 0 if the | |
1288 | socket field is missing inside \fIskb\fP\&. | |
1289 | .UNINDENT | |
1290 | .TP | |
1291 | .B \fBu64 bpf_get_socket_cookie(struct bpf_sock_addr *\fP\fIctx\fP\fB)\fP | |
1292 | .INDENT 7.0 | |
1293 | .TP | |
1294 | .B Description | |
1295 | Equivalent to bpf_get_socket_cookie() helper that accepts | |
1296 | \fIskb\fP, but gets socket from \fBstruct bpf_sock_addr\fP context. | |
1297 | .TP | |
1298 | .B Return | |
1299 | An 8\-byte long non\-decreasing number. | |
1300 | .UNINDENT | |
1301 | .TP | |
1302 | .B \fBu64 bpf_get_socket_cookie(struct bpf_sock_ops *\fP\fIctx\fP\fB)\fP | |
1303 | .INDENT 7.0 | |
1304 | .TP | |
1305 | .B Description | |
1306 | Equivalent to bpf_get_socket_cookie() helper that accepts | |
1307 | \fIskb\fP, but gets socket from \fBstruct bpf_sock_ops\fP context. | |
1308 | .TP | |
1309 | .B Return | |
1310 | An 8\-byte long non\-decreasing number. | |
1311 | .UNINDENT | |
1312 | .TP | |
1313 | .B \fBu32 bpf_get_socket_uid(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
1314 | .INDENT 7.0 | |
1315 | .TP | |
1316 | .B Return | |
1317 | The owner UID of the socket associated to \fIskb\fP\&. If the socket | |
1318 | is \fBNULL\fP, or if it is not a full socket (i.e. if it is a | |
1319 | time\-wait or a request socket instead), \fBoverflowuid\fP value | |
1320 | is returned (note that \fBoverflowuid\fP might also be the actual | |
1321 | UID value for the socket). | |
1322 | .UNINDENT | |
1323 | .TP | |
1324 | .B \fBu32 bpf_set_hash(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIhash\fP\fB)\fP | |
1325 | .INDENT 7.0 | |
1326 | .TP | |
1327 | .B Description | |
1328 | Set the full hash for \fIskb\fP (set the field \fIskb\fP\fB\->hash\fP) | |
1329 | to value \fIhash\fP\&. | |
1330 | .TP | |
1331 | .B Return | |
1332 | 0 | |
1333 | .UNINDENT | |
1334 | .TP | |
1335 | .B \fBint bpf_setsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP | |
1336 | .INDENT 7.0 | |
1337 | .TP | |
1338 | .B Description | |
1339 | Emulate a call to \fBsetsockopt()\fP on the socket associated to | |
1340 | \fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at | |
1341 | which the option resides and the name \fIoptname\fP of the option | |
1342 | must be specified, see \fBsetsockopt(2)\fP for more information. | |
1343 | The option value of length \fIoptlen\fP is pointed to by \fIoptval\fP\&. | |
1344 | .sp | |
1345 | This helper actually implements a subset of \fBsetsockopt()\fP\&. | |
1346 | It supports the following \fIlevel\fPs: | |
1347 | .INDENT 7.0 | |
1348 | .IP \(bu 2 | |
1349 | \fBSOL_SOCKET\fP, which supports the following \fIoptname\fPs: | |
1350 | \fBSO_RCVBUF\fP, \fBSO_SNDBUF\fP, \fBSO_MAX_PACING_RATE\fP, | |
1351 | \fBSO_PRIORITY\fP, \fBSO_RCVLOWAT\fP, \fBSO_MARK\fP\&. | |
1352 | .IP \(bu 2 | |
1353 | \fBIPPROTO_TCP\fP, which supports the following \fIoptname\fPs: | |
1354 | \fBTCP_CONGESTION\fP, \fBTCP_BPF_IW\fP, | |
1355 | \fBTCP_BPF_SNDCWND_CLAMP\fP\&. | |
1356 | .IP \(bu 2 | |
1357 | \fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&. | |
1358 | .IP \(bu 2 | |
1359 | \fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&. | |
1360 | .UNINDENT | |
1361 | .TP | |
1362 | .B Return | |
1363 | 0 on success, or a negative error in case of failure. | |
1364 | .UNINDENT | |
1365 | .TP | |
1366 | .B \fBint bpf_skb_adjust_room(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen_diff\fP\fB, u32\fP \fImode\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1367 | .INDENT 7.0 | |
1368 | .TP | |
1369 | .B Description | |
1370 | Grow or shrink the room for data in the packet associated to | |
1371 | \fIskb\fP by \fIlen_diff\fP, and according to the selected \fImode\fP\&. | |
1372 | .sp | |
1373 | There is a single supported mode at this time: | |
1374 | .INDENT 7.0 | |
1375 | .IP \(bu 2 | |
1376 | \fBBPF_ADJ_ROOM_NET\fP: Adjust room at the network layer | |
1377 | (room space is added or removed below the layer 3 header). | |
1378 | .UNINDENT | |
1379 | .sp | |
1380 | All values for \fIflags\fP are reserved for future usage, and must | |
1381 | be left at zero. | |
1382 | .sp | |
1383 | A call to this helper may change the underlying | |
1384 | packet buffer. Therefore, at load time, all checks on pointers | |
1385 | previously done by the verifier are invalidated and must be | |
1386 | performed again, if the helper is used in combination with | |
1387 | direct packet access. | |
1388 | .TP | |
1389 | .B Return | |
1390 | 0 on success, or a negative error in case of failure. | |
1391 | .UNINDENT | |
1392 | .TP | |
1393 | .B \fBint bpf_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1394 | .INDENT 7.0 | |
1395 | .TP | |
1396 | .B Description | |
1397 | Redirect the packet to the endpoint referenced by \fImap\fP at | |
1398 | index \fIkey\fP\&. Depending on its type, this \fImap\fP can contain | |
1399 | references to net devices (for forwarding packets through other | |
1400 | ports), or to CPUs (for redirecting XDP frames to another CPU; | |
1401 | but this is only implemented for native XDP (with driver | |
1402 | support) as of this writing). | |
1403 | .sp | |
1404 | All values for \fIflags\fP are reserved for future usage, and must | |
1405 | be left at zero. | |
1406 | .sp | |
1407 | When used to redirect packets to net devices, this helper | |
1408 | provides a significant performance increase over \fBbpf_redirect\fP(). | |
1409 | This is due to various implementation details of the underlying | |
1410 | mechanisms, one of which is that \fBbpf_redirect_map\fP() tries to send packets to the device in bulk. | |
1411 | .TP | |
1412 | .B Return | |
1413 | \fBXDP_REDIRECT\fP on success, or \fBXDP_ABORTED\fP on error. | |
1414 | .UNINDENT | |
1415 | .TP | |
1416 | .B \fBint bpf_sk_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1417 | .INDENT 7.0 | |
1418 | .TP | |
1419 | .B Description | |
1420 | Redirect the packet to the socket referenced by \fImap\fP (of type | |
1421 | \fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and | |
1422 | egress interfaces can be used for redirection. The | |
1423 | \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the | |
1424 | distinction (ingress path is selected if the flag is present, | |
1425 | egress path otherwise). This is the only flag supported for now. | |
1426 | .TP | |
1427 | .B Return | |
1428 | \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. | |
1429 | .UNINDENT | |
1430 | .TP | |
1431 | .B \fBint bpf_sock_map_update(struct bpf_sock_ops *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1432 | .INDENT 7.0 | |
1433 | .TP | |
1434 | .B Description | |
1435 | Add an entry to, or update a \fImap\fP referencing sockets. The | |
1436 | \fIskops\fP is used as a new value for the entry associated to | |
1437 | \fIkey\fP\&. \fIflags\fP is one of: | |
1438 | .INDENT 7.0 | |
1439 | .TP | |
1440 | .B \fBBPF_NOEXIST\fP | |
1441 | The entry for \fIkey\fP must not exist in the map. | |
1442 | .TP | |
1443 | .B \fBBPF_EXIST\fP | |
1444 | The entry for \fIkey\fP must already exist in the map. | |
1445 | .TP | |
1446 | .B \fBBPF_ANY\fP | |
1447 | No condition on the existence of the entry for \fIkey\fP\&. | |
1448 | .UNINDENT | |
1449 | .sp | |
1450 | If the \fImap\fP has eBPF programs (parser and verdict), those will | |
1451 | be inherited by the socket being added. If the socket is | |
1452 | already attached to eBPF programs, this results in an error. | |
1453 | .TP | |
1454 | .B Return | |
1455 | 0 on success, or a negative error in case of failure. | |
1456 | .UNINDENT | |
1457 | .TP | |
1458 | .B \fBint bpf_xdp_adjust_meta(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP | |
1459 | .INDENT 7.0 | |
1460 | .TP | |
1461 | .B Description | |
1462 | Adjust the address pointed to by \fIxdp_md\fP\fB\->data_meta\fP by | |
1463 | \fIdelta\fP (which can be positive or negative). Note that this | |
1464 | operation modifies the address stored in \fIxdp_md\fP\fB\->data\fP, | |
1465 | so the latter must be loaded only after the helper has been | |
1466 | called. | |
1467 | .sp | |
1468 | The use of \fIxdp_md\fP\fB\->data_meta\fP is optional and programs | |
1469 | are not required to use it. The rationale is that when the | |
1470 | packet is processed with XDP (e.g. as DoS filter), it is | |
1471 | possible to push further meta data along with it before passing | |
1472 | to the stack, and to give the guarantee that an ingress eBPF | |
1473 | program attached as a TC classifier on the same device can pick | |
1474 | this up for further post\-processing. Since TC works with socket | |
1475 | buffers, it remains possible to set from XDP the \fBmark\fP or | |
1476 | \fBpriority\fP pointers, or other pointers for the socket buffer. | |
1477 | Having this scratch space generic and programmable allows for | |
1478 | more flexibility as the user is free to store whatever meta | |
1479 | data they need. | |
1480 | .sp | |
1481 | A call to this helper may change the underlying | |
1482 | packet buffer. Therefore, at load time, all checks on pointers | |
1483 | previously done by the verifier are invalidated and must be | |
1484 | performed again, if the helper is used in combination with | |
1485 | direct packet access. | |
1486 | .TP | |
1487 | .B Return | |
1488 | 0 on success, or a negative error in case of failure. | |
1489 | .UNINDENT | |
1490 | .TP | |
1491 | .B \fBint bpf_perf_event_read_value(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP | |
1492 | .INDENT 7.0 | |
1493 | .TP | |
1494 | .B Description | |
1495 | Read the value of a perf event counter, and store it into \fIbuf\fP | |
1496 | of size \fIbuf_size\fP\&. This helper relies on a \fImap\fP of type | |
1497 | \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of the perf event | |
1498 | counter is selected when \fImap\fP is updated with perf event file | |
1499 | descriptors. The \fImap\fP is an array whose size is the number of | |
1500 | available CPUs, and each cell contains a value relative to one | |
1501 | CPU. The value to retrieve is indicated by \fIflags\fP, that | |
1502 | contains the index of the CPU to look up, masked with | |
1503 | \fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to | |
1504 | \fBBPF_F_CURRENT_CPU\fP to indicate that the value for the | |
1505 | current CPU should be retrieved. | |
1506 | .sp | |
1507 | This helper behaves in a way close to | |
1508 | \fBbpf_perf_event_read\fP() helper, save that instead of | |
1509 | just returning the value observed, it fills the \fIbuf\fP | |
1510 | structure. This allows for additional data to be retrieved: in | |
1511 | particular, the enabled and running times (in \fIbuf\fP\fB\->enabled\fP and \fIbuf\fP\fB\->running\fP, respectively) are | |
1512 | copied. In general, \fBbpf_perf_event_read_value\fP() is | |
1513 | recommended over \fBbpf_perf_event_read\fP(), which has some | |
1514 | ABI issues and provides fewer functionalities. | |
1515 | .sp | |
1516 | These values are interesting, because hardware PMU (Performance | |
1517 | Monitoring Unit) counters are limited resources. When there are | |
1518 | more PMU\-based perf events opened than available counters, the | |
1519 | kernel multiplexes these events so that each event gets a certain | |
1520 | percentage (but not all) of the PMU time. When multiplexing | |
1521 | happens, the number of samples or the counter value does not | |
1522 | reflect what would be observed without multiplexing. | |
1523 | This makes comparison between different runs difficult. | |
1524 | Typically, the counter value should be normalized before | |
1525 | comparing it to other experiments. The usual normalization is | |
1526 | done as follows: | |
1527 | .INDENT 7.0 | |
1528 | .INDENT 3.5 | |
1529 | .sp | |
1530 | .nf | |
1531 | .ft C | |
1532 | normalized_counter = counter * t_enabled / t_running | |
1533 | .ft P | |
1534 | .fi | |
1535 | .UNINDENT | |
1536 | .UNINDENT | |
1537 | .sp | |
1538 | Where t_enabled is the time the event was enabled and t_running is | |
1539 | the time the event was running since the last normalization. The | |
1540 | enabled and running times are accumulated since the perf event | |
1541 | was opened. To obtain a scaling factor between two invocations of an | |
1542 | eBPF program, users can use the CPU id as the key (which is | |
1543 | typical for the perf array usage model) to remember the previous | |
1544 | values and do the calculation inside the eBPF program. | |
1545 | .TP | |
1546 | .B Return | |
1547 | 0 on success, or a negative error in case of failure. | |
1548 | .UNINDENT | |
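The normalization described for bpf_perf_event_read_value() can be sketched in plain C as follows. This is only an illustration under stated assumptions: struct perf_value is a hypothetical stand-in that mirrors the counter, enabled and running fields of struct bpf_perf_event_value, and normalized_delta() is a hypothetical name, not a kernel helper. The sketch scales the counter delta between two readings by the ratio of the enabled and running time deltas.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in mirroring struct bpf_perf_event_value. */
struct perf_value {
    uint64_t counter;
    uint64_t enabled;
    uint64_t running;
};

/*
 * Scale the counter delta between two readings by
 * delta_enabled / delta_running. Returns 0 if the event did
 * not run at all during the interval.
 */
static uint64_t normalized_delta(const struct perf_value *prev,
                                 const struct perf_value *cur)
{
    uint64_t d_counter, d_enabled, d_running;

    /* All three fields accumulate since the event was opened. */
    assert(cur->counter >= prev->counter);
    assert(cur->enabled >= prev->enabled);
    assert(cur->running >= prev->running);

    d_counter = cur->counter - prev->counter;
    d_enabled = cur->enabled - prev->enabled;
    d_running = cur->running - prev->running;

    if (d_running == 0)
        return 0;
    /* The product may overflow for very large deltas. */
    return d_counter * d_enabled / d_running;
}
```

In an eBPF program, the previous reading would typically be kept in a map keyed by CPU id, as described above. The arithmetic itself stays the same.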
1549 | .TP | |
1550 | .B \fBint bpf_perf_prog_read_value(struct bpf_perf_event_data *\fP\fIctx\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP | |
1551 | .INDENT 7.0 | |
1552 | .TP | |
1553 | .B Description | |
1554 | For an eBPF program attached to a perf event, retrieve the | |
1555 | value of the event counter associated with \fIctx\fP and store it in | |
1556 | the structure pointed to by \fIbuf\fP and of size \fIbuf_size\fP\&. Enabled | |
1557 | and running times are also stored in the structure (see | |
1558 | description of helper \fBbpf_perf_event_read_value\fP() for | |
1559 | more details). | |
1560 | .TP | |
1561 | .B Return | |
1562 | 0 on success, or a negative error in case of failure. | |
1563 | .UNINDENT | |
1564 | .TP | |
1565 | .B \fBint bpf_getsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP | |
1566 | .INDENT 7.0 | |
1567 | .TP | |
1568 | .B Description | |
1569 | Emulate a call to \fBgetsockopt()\fP on the socket associated to | |
1570 | \fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at | |
1571 | which the option resides and the name \fIoptname\fP of the option | |
1572 | must be specified, see \fBgetsockopt(2)\fP for more information. | |
1573 | The retrieved value is stored in the structure pointed to by | |
1574 | \fIoptval\fP and of length \fIoptlen\fP\&. | |
1575 | .sp | |
1576 | This helper actually implements a subset of \fBgetsockopt()\fP\&. | |
1577 | It supports the following \fIlevel\fPs: | |
1578 | .INDENT 7.0 | |
1579 | .IP \(bu 2 | |
1580 | \fBIPPROTO_TCP\fP, which supports \fIoptname\fP | |
1581 | \fBTCP_CONGESTION\fP\&. | |
1582 | .IP \(bu 2 | |
1583 | \fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&. | |
1584 | .IP \(bu 2 | |
1585 | \fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&. | |
1586 | .UNINDENT | |
1587 | .TP | |
1588 | .B Return | |
1589 | 0 on success, or a negative error in case of failure. | |
1590 | .UNINDENT | |
1591 | .TP | |
1592 | .B \fBint bpf_override_return(struct pt_regs *\fP\fIregs\fP\fB, u64\fP \fIrc\fP\fB)\fP | |
1593 | .INDENT 7.0 | |
1594 | .TP | |
1595 | .B Description | |
1596 | Used for error injection, this helper uses kprobes to override | |
1597 | the return value of the probed function, and to set it to \fIrc\fP\&. | |
1598 | The first argument is the context \fIregs\fP on which the kprobe | |
1599 | works. | |
1600 | .sp | |
1601 | This helper works by setting the PC (program counter) | |
1602 | to an override function which is run in place of the original | |
1603 | probed function. This means the probed function is not run at | |
1604 | all. The replacement function just returns with the required | |
1605 | value. | |
1606 | .sp | |
1607 | This helper has security implications, and thus is subject to | |
1608 | restrictions. It is only available if the kernel was compiled | |
1609 | with the \fBCONFIG_BPF_KPROBE_OVERRIDE\fP configuration | |
1610 | option, and in this case it only works on functions tagged with | |
1611 | \fBALLOW_ERROR_INJECTION\fP in the kernel code. | |
1612 | .sp | |
1613 | Also, the helper is only available for the architectures having | |
1614 | the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing, | |
1615 | x86 architecture is the only one to support this feature. | |
1616 | .TP | |
1617 | .B Return | |
1618 | 0 | |
1619 | .UNINDENT | |
1620 | .TP | |
1621 | .B \fBint bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *\fP\fIbpf_sock\fP\fB, int\fP \fIargval\fP\fB)\fP | |
1622 | .INDENT 7.0 | |
1623 | .TP | |
1624 | .B Description | |
1625 | Attempt to set the value of the \fBbpf_sock_ops_cb_flags\fP field | |
1626 | for the full TCP socket associated with \fIbpf_sock\fP to | |
1627 | \fIargval\fP\&. | |
1628 | .sp | |
1629 | The primary use of this field is to determine if there should | |
1630 | be calls to eBPF programs of type | |
1631 | \fBBPF_PROG_TYPE_SOCK_OPS\fP at various points in the TCP | |
1632 | code. A program of the same type can change its value, per | |
1633 | connection and as necessary, when the connection is | |
1634 | established. This field is directly accessible for reading, but | |
1635 | this helper must be used for updates in order to return an | |
1636 | error if an eBPF program tries to set a callback that is not | |
1637 | supported in the current kernel. | |
1638 | .sp | |
1639 | The supported callback values that \fIargval\fP can combine are: | |
1640 | .INDENT 7.0 | |
1641 | .IP \(bu 2 | |
1642 | \fBBPF_SOCK_OPS_RTO_CB_FLAG\fP (retransmission time out) | |
1643 | .IP \(bu 2 | |
1644 | \fBBPF_SOCK_OPS_RETRANS_CB_FLAG\fP (retransmission) | |
1645 | .IP \(bu 2 | |
1646 | \fBBPF_SOCK_OPS_STATE_CB_FLAG\fP (TCP state change) | |
1647 | .UNINDENT | |
1648 | .sp | |
1649 | Here are some examples of where one could call such eBPF | |
1650 | program: | |
1651 | .INDENT 7.0 | |
1652 | .IP \(bu 2 | |
1653 | When RTO fires. | |
1654 | .IP \(bu 2 | |
1655 | When a packet is retransmitted. | |
1656 | .IP \(bu 2 | |
1657 | When the connection terminates. | |
1658 | .IP \(bu 2 | |
1659 | When a packet is sent. | |
1660 | .IP \(bu 2 | |
1661 | When a packet is received. | |
1662 | .UNINDENT | |
1663 | .TP | |
1664 | .B Return | |
1665 | Code \fB\-EINVAL\fP if the socket is not a full TCP socket; | |
1666 | otherwise, a positive number containing the bits that could not | |
1667 | be set is returned (which comes down to 0 if all bits were set | |
1668 | as required). | |
1669 | .UNINDENT | |
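The return value of bpf_sock_ops_cb_flags_set() described above (the bits that could not be set) can be modelled in user space as follows. This is a sketch, not the kernel implementation: the flag values are assumed to mirror those in include/uapi/linux/bpf.h for a kernel supporting exactly the three callbacks listed, and SUPPORTED_CB_FLAGS and unsupported_cb_bits() are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Values assumed to mirror include/uapi/linux/bpf.h. */
#define BPF_SOCK_OPS_RTO_CB_FLAG     (1 << 0)
#define BPF_SOCK_OPS_RETRANS_CB_FLAG (1 << 1)
#define BPF_SOCK_OPS_STATE_CB_FLAG   (1 << 2)

/* Hypothetical: every callback flag this model supports. */
#define SUPPORTED_CB_FLAGS (BPF_SOCK_OPS_RTO_CB_FLAG | \
                            BPF_SOCK_OPS_RETRANS_CB_FLAG | \
                            BPF_SOCK_OPS_STATE_CB_FLAG)

/*
 * Model of the documented return value: the requested bits
 * that are not supported, or 0 if all bits could be set.
 */
static int unsupported_cb_bits(int argval)
{
    /* A callback mask is expected to be non-negative. */
    assert(argval >= 0);
    return argval & ~SUPPORTED_CB_FLAGS;
}
```

A program can compare the return value with 0 to find out whether every requested callback is available on the running kernel.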
1670 | .TP | |
1671 | .B \fBint bpf_msg_redirect_map(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1672 | .INDENT 7.0 | |
1673 | .TP | |
1674 | .B Description | |
1675 | This helper is used in programs implementing policies at the | |
1676 | socket level. If the message \fImsg\fP is allowed to pass (i.e. if | |
1677 | the verdict eBPF program returns \fBSK_PASS\fP), redirect it to | |
1678 | the socket referenced by \fImap\fP (of type | |
1679 | \fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and | |
1680 | egress interfaces can be used for redirection. The | |
1681 | \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the | |
1682 | distinction (ingress path is selected if the flag is present, | |
1683 | egress path otherwise). This is the only flag supported for now. | |
1684 | .TP | |
1685 | .B Return | |
1686 | \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. | |
1687 | .UNINDENT | |
1688 | .TP | |
1689 | .B \fBint bpf_msg_apply_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP | |
1690 | .INDENT 7.0 | |
1691 | .TP | |
1692 | .B Description | |
1693 | For socket policies, apply the verdict of the eBPF program to | |
1694 | the next \fIbytes\fP (number of bytes) of message \fImsg\fP\&. | |
1695 | .sp | |
1696 | For example, this helper can be used in the following cases: | |
1697 | .INDENT 7.0 | |
1698 | .IP \(bu 2 | |
1699 | A single \fBsendmsg\fP() or \fBsendfile\fP() system call | |
1700 | contains multiple logical messages that the eBPF program is | |
1701 | supposed to read and for which it should apply a verdict. | |
1702 | .IP \(bu 2 | |
1703 | An eBPF program only cares to read the first \fIbytes\fP of a | |
1704 | \fImsg\fP\&. If the message has a large payload, then setting up | |
1705 | and calling the eBPF program repeatedly for all bytes, even | |
1706 | though the verdict is already known, would create unnecessary | |
1707 | overhead. | |
1708 | .UNINDENT | |
1709 | .sp | |
1710 | When called from within an eBPF program, the helper sets a | |
1711 | counter internal to the BPF infrastructure, that is used to | |
1712 | apply the last verdict to the next \fIbytes\fP\&. If \fIbytes\fP is | |
1713 | smaller than the current data being processed from a | |
1714 | \fBsendmsg\fP() or \fBsendfile\fP() system call, the first | |
1715 | \fIbytes\fP will be sent and the eBPF program will be re\-run with | |
1716 | the pointer for start of data pointing to byte number \fIbytes\fP | |
1717 | \fB+ 1\fP\&. If \fIbytes\fP is larger than the current data being | |
1718 | processed, then the eBPF verdict will be applied to multiple | |
1719 | \fBsendmsg\fP() or \fBsendfile\fP() calls until \fIbytes\fP are | |
1720 | consumed. | |
1721 | .sp | |
1722 | Note that if a socket closes with the internal counter holding | |
1723 | a non\-zero value, this is not a problem because data is not | |
1724 | being buffered for \fIbytes\fP and is sent as it is received. | |
1725 | .TP | |
1726 | .B Return | |
1727 | 0 | |
1728 | .UNINDENT | |
1729 | .TP | |
1730 | .B \fBint bpf_msg_cork_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP | |
1731 | .INDENT 7.0 | |
1732 | .TP | |
1733 | .B Description | |
1734 | For socket policies, prevent the execution of the verdict eBPF | |
1735 | program for message \fImsg\fP until \fIbytes\fP (byte number) have been | |
1736 | accumulated. | |
1737 | .sp | |
1738 | This can be used when one needs a specific number of bytes | |
1739 | before a verdict can be assigned, even if the data spans | |
1740 | multiple \fBsendmsg\fP() or \fBsendfile\fP() calls. The extreme | |
1741 | case would be a user calling \fBsendmsg\fP() repeatedly with | |
1742 | 1\-byte long message segments. Obviously, this is bad for | |
1743 | performance, but it is still valid. If the eBPF program needs | |
1744 | \fIbytes\fP bytes to validate a header, this helper can be used to | |
1745 | prevent the eBPF program from being called again until \fIbytes\fP have | |
1746 | been accumulated. | |
1747 | .TP | |
1748 | .B Return | |
1749 | 0 | |
1750 | .UNINDENT | |
1751 | .TP | |
1752 | .B \fBint bpf_msg_pull_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIend\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1753 | .INDENT 7.0 | |
1754 | .TP | |
1755 | .B Description | |
1756 | For socket policies, pull in non\-linear data from user space | |
1757 | for \fImsg\fP and set pointers \fImsg\fP\fB\->data\fP and \fImsg\fP\fB\->data_end\fP to \fIstart\fP and \fIend\fP byte offsets into \fImsg\fP, | |
1758 | respectively. | |
1759 | .sp | |
1760 | If a program of type \fBBPF_PROG_TYPE_SK_MSG\fP is run on a | |
1761 | \fImsg\fP it can only parse data that the (\fBdata\fP, \fBdata_end\fP) | |
1762 | pointers have already consumed. For \fBsendmsg\fP() hooks this | |
1763 | is likely the first scatterlist element. But for calls relying | |
1764 | on the \fBsendpage\fP handler (e.g. \fBsendfile\fP()) this will | |
1765 | be the range (\fB0\fP, \fB0\fP) because the data is shared with | |
1766 | user space and by default the objective is to avoid allowing | |
1767 | user space to modify data while (or after) eBPF verdict is | |
1768 | being decided. This helper can be used to pull in data and to | |
1769 | set the start and end pointer to given values. Data will be | |
1770 | copied if necessary (i.e. if data was not linear and if start | |
1771 | and end pointers do not point to the same chunk). | |
1772 | .sp | |
1773 | A call to this helper may change the underlying | |
1774 | packet buffer. Therefore, at load time, all checks on pointers | |
1775 | previously done by the verifier are invalidated and must be | |
1776 | performed again, if the helper is used in combination with | |
1777 | direct packet access. | |
1778 | .sp | |
1779 | All values for \fIflags\fP are reserved for future usage, and must | |
1780 | be left at zero. | |
1781 | .TP | |
1782 | .B Return | |
1783 | 0 on success, or a negative error in case of failure. | |
1784 | .UNINDENT | |
1785 | .TP | |
1786 | .B \fBint bpf_bind(struct bpf_sock_addr *\fP\fIctx\fP\fB, struct sockaddr *\fP\fIaddr\fP\fB, int\fP \fIaddr_len\fP\fB)\fP | |
1787 | .INDENT 7.0 | |
1788 | .TP | |
1789 | .B Description | |
1790 | Bind the socket associated with \fIctx\fP to the address pointed to by | |
1791 | \fIaddr\fP, of length \fIaddr_len\fP\&. This allows for making outgoing | |
1792 | connections from the desired IP address, which can be useful, for | |
1793 | example, when all processes inside a cgroup should use a | |
1794 | single IP address on a host that has multiple IP addresses configured. | |
1795 | .sp | |
1796 | This helper works for IPv4 and IPv6, TCP and UDP sockets. The | |
1797 | domain (\fIaddr\fP\fB\->sa_family\fP) must be \fBAF_INET\fP (or | |
1798 | \fBAF_INET6\fP). Looking for a free port to bind to can be | |
1799 | expensive; therefore, binding to a port is not permitted by the | |
1800 | helper: \fIaddr\fP\fB\->sin_port\fP (or \fBsin6_port\fP, respectively) | |
1801 | must be set to zero. | |
1802 | .TP | |
1803 | .B Return | |
1804 | 0 on success, or a negative error in case of failure. | |
1805 | .UNINDENT | |
1806 | .TP | |
1807 | .B \fBint bpf_xdp_adjust_tail(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP | |
1808 | .INDENT 7.0 | |
1809 | .TP | |
1810 | .B Description | |
1811 | Adjust (move) \fIxdp_md\fP\fB\->data_end\fP by \fIdelta\fP bytes. It is | |
1812 | only possible to shrink the packet as of this writing, | |
1813 | therefore \fIdelta\fP must be a negative integer. | |
1814 | .sp | |
1815 | A call to this helper may change the underlying | |
1816 | packet buffer. Therefore, at load time, all checks on pointers | |
1817 | previously done by the verifier are invalidated and must be | |
1818 | performed again, if the helper is used in combination with | |
1819 | direct packet access. | |
1820 | .TP | |
1821 | .B Return | |
1822 | 0 on success, or a negative error in case of failure. | |
1823 | .UNINDENT | |
1824 | .TP | |
1825 | .B \fBint bpf_skb_get_xfrm_state(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIindex\fP\fB, struct bpf_xfrm_state *\fP\fIxfrm_state\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1826 | .INDENT 7.0 | |
1827 | .TP | |
1828 | .B Description | |
1829 | Retrieve the XFRM state (IP transform framework, see also | |
1830 | \fBip\-xfrm(8)\fP) at \fIindex\fP in XFRM "security path" for \fIskb\fP\&. | |
1831 | .sp | |
1832 | The retrieved value is stored in the \fBstruct bpf_xfrm_state\fP | |
1833 | pointed by \fIxfrm_state\fP and of length \fIsize\fP\&. | |
1834 | .sp | |
1835 | All values for \fIflags\fP are reserved for future usage, and must | |
1836 | be left at zero. | |
1837 | .sp | |
1838 | This helper is available only if the kernel was compiled with | |
1839 | \fBCONFIG_XFRM\fP configuration option. | |
1840 | .TP | |
1841 | .B Return | |
1842 | 0 on success, or a negative error in case of failure. | |
1843 | .UNINDENT | |
1844 | .TP | |
1845 | .B \fBint bpf_get_stack(struct pt_regs *\fP\fIregs\fP\fB, void *\fP\fIbuf\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1846 | .INDENT 7.0 | |
1847 | .TP | |
1848 | .B Description | |
1849 | Return a user or a kernel stack in a buffer provided by the BPF program. | |
1850 | To achieve this, the helper needs \fIregs\fP, which is a pointer | |
1851 | to the context on which the tracing program is executed. | |
1852 | To store the stacktrace, the BPF program provides \fIbuf\fP with | |
1853 | a nonnegative \fIsize\fP\&. | |
1854 | .sp | |
1855 | The last argument, \fIflags\fP, holds the number of stack frames to | |
1856 | skip (from 0 to 255), masked with | |
1857 | \fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set | |
1858 | the following flags: | |
1859 | .INDENT 7.0 | |
1860 | .TP | |
1861 | .B \fBBPF_F_USER_STACK\fP | |
1862 | Collect a user space stack instead of a kernel stack. | |
1863 | .TP | |
1864 | .B \fBBPF_F_USER_BUILD_ID\fP | |
1865 | Collect buildid+offset instead of ips for user stack, | |
1866 | only valid if \fBBPF_F_USER_STACK\fP is also specified. | |
1867 | .UNINDENT | |
1868 | .sp | |
1869 | \fBbpf_get_stack\fP() can collect up to | |
1870 | \fBPERF_MAX_STACK_DEPTH\fP both kernel and user frames, provided | |
1871 | that the buffer is sufficiently large. Note that | |
1872 | this limit can be controlled with the \fBsysctl\fP program, and | |
1873 | that it should be manually increased in order to profile long | |
1874 | user stacks (such as stacks for Java programs). To do so, use: | |
1875 | .INDENT 7.0 | |
1876 | .INDENT 3.5 | |
1877 | .sp | |
1878 | .nf | |
1879 | .ft C | |
1880 | # sysctl kernel.perf_event_max_stack=<new value> | |
1881 | .ft P | |
1882 | .fi | |
1883 | .UNINDENT | |
1884 | .UNINDENT | |
1885 | .TP | |
1886 | .B Return | |
1887 | A non\-negative value equal to or less than \fIsize\fP on success, | |
1888 | or a negative error in case of failure. | |
1889 | .UNINDENT | |
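The layout of the \fIflags\fP argument of bpf_get_stack() (skip count in the low bits, option flags above) can be sketched as follows. The constant values are assumed to mirror include/uapi/linux/bpf.h, and stack_flags() is a hypothetical helper introduced for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Values assumed to mirror include/uapi/linux/bpf.h. */
#define BPF_F_SKIP_FIELD_MASK 0xffULL
#define BPF_F_USER_STACK      (1ULL << 8)
#define BPF_F_USER_BUILD_ID   (1ULL << 11)

/*
 * Build a flags value: the number of frames to skip goes in
 * the low 8 bits, the option flags in the bits above.
 */
static uint64_t stack_flags(unsigned int skip, int user, int build_id)
{
    uint64_t flags = (uint64_t)skip & BPF_F_SKIP_FIELD_MASK;

    /* Build IDs are only valid together with a user stack. */
    assert(!build_id || user);
    if (user)
        flags |= BPF_F_USER_STACK;
    if (build_id)
        flags |= BPF_F_USER_BUILD_ID;
    return flags;
}
```

Note that this sketch silently truncates a skip count above 255 through the mask. A real program may prefer to reject such a value.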
1890 | .TP | |
1891 | .B \fBint bpf_skb_load_bytes_relative(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB, u32\fP \fIstart_header\fP\fB)\fP | |
1892 | .INDENT 7.0 | |
1893 | .TP | |
1894 | .B Description | |
1895 | This helper is similar to \fBbpf_skb_load_bytes\fP() in that | |
1896 | it provides an easy way to load \fIlen\fP bytes from \fIoffset\fP | |
1897 | from the packet associated to \fIskb\fP, into the buffer pointed | |
1898 | by \fIto\fP\&. The difference to \fBbpf_skb_load_bytes\fP() is that | |
1899 | a fifth argument \fIstart_header\fP exists in order to select a | |
1900 | base offset to start from. \fIstart_header\fP can be one of: | |
1901 | .INDENT 7.0 | |
1902 | .TP | |
1903 | .B \fBBPF_HDR_START_MAC\fP | |
1904 | Base offset to load data from is \fIskb\fP\(aqs mac header. | |
1905 | .TP | |
1906 | .B \fBBPF_HDR_START_NET\fP | |
1907 | Base offset to load data from is \fIskb\fP\(aqs network header. | |
1908 | .UNINDENT | |
1909 | .sp | |
1910 | In general, "direct packet access" is the preferred method to | |
1911 | access packet data, however, this helper is in particular useful | |
1912 | in socket filters where \fIskb\fP\fB\->data\fP does not always point | |
1913 | to the start of the mac header and where "direct packet access" | |
1914 | is not available. | |
1915 | .TP | |
1916 | .B Return | |
1917 | 0 on success, or a negative error in case of failure. | |
1918 | .UNINDENT | |
1919 | .TP | |
1920 | .B \fBint bpf_fib_lookup(void *\fP\fIctx\fP\fB, struct bpf_fib_lookup *\fP\fIparams\fP\fB, int\fP \fIplen\fP\fB, u32\fP \fIflags\fP\fB)\fP | |
1921 | .INDENT 7.0 | |
1922 | .TP | |
1923 | .B Description | |
1924 | Do FIB lookup in kernel tables using parameters in \fIparams\fP\&. | |
1925 | If lookup is successful and result shows packet is to be | |
1926 | forwarded, the neighbor tables are searched for the nexthop. | |
1927 | If successful (ie., FIB lookup shows forwarding and nexthop | |
1928 | is resolved), the nexthop address is returned in ipv4_dst | |
1929 | or ipv6_dst based on family, smac is set to mac address of | |
1930 | egress device, dmac is set to nexthop mac address, rt_metric | |
1931 | is set to metric from route (IPv4/IPv6 only), and ifindex | |
1932 | is set to the device index of the nexthop from the FIB lookup. | |
1933 | .sp | |
1934 | \fIplen\fP argument is the size of the passed in struct. | |
1935 | \fIflags\fP argument can be a combination of one or more of the | |
1936 | following values: | |
1937 | .INDENT 7.0 | |
1938 | .TP | |
1939 | .B \fBBPF_FIB_LOOKUP_DIRECT\fP | |
1940 | Do a direct table lookup vs full lookup using FIB | |
1941 | rules. | |
1942 | .TP | |
1943 | .B \fBBPF_FIB_LOOKUP_OUTPUT\fP | |
1944 | Perform lookup from an egress perspective (default is | |
1945 | ingress). | |
1946 | .UNINDENT | |
1947 | .sp | |
1948 | \fIctx\fP is either \fBstruct xdp_md\fP for XDP programs or | |
1949 | \fBstruct sk_buff\fP for tc cls_act programs. | |
1950 | .TP | |
1951 | .B Return | |
1952 | .INDENT 7.0 | |
1953 | .IP \(bu 2 | |
1954 | < 0 if any input argument is invalid | |
1955 | .IP \(bu 2 | |
1956 | 0 on success (packet is forwarded, nexthop neighbor exists) | |
1957 | .IP \(bu 2 | |
1958 | > 0 one of \fBBPF_FIB_LKUP_RET_\fP codes explaining why the | |
1959 | packet is not forwarded or needs assist from full stack | |
1960 | .UNINDENT | |
1961 | .UNINDENT | |
1962 | .TP | |
1963 | .B \fBint bpf_sock_hash_update(struct bpf_sock_ops_kern *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1964 | .INDENT 7.0 | |
1965 | .TP | |
1966 | .B Description | |
1967 | Add an entry to, or update a sockhash \fImap\fP referencing sockets. | |
1968 | The \fIskops\fP is used as a new value for the entry associated to | |
1969 | \fIkey\fP\&. \fIflags\fP is one of: | |
1970 | .INDENT 7.0 | |
1971 | .TP | |
1972 | .B \fBBPF_NOEXIST\fP | |
1973 | The entry for \fIkey\fP must not exist in the map. | |
1974 | .TP | |
1975 | .B \fBBPF_EXIST\fP | |
1976 | The entry for \fIkey\fP must already exist in the map. | |
1977 | .TP | |
1978 | .B \fBBPF_ANY\fP | |
1979 | No condition on the existence of the entry for \fIkey\fP\&. | |
1980 | .UNINDENT | |
1981 | .sp | |
1982 | If the \fImap\fP has eBPF programs (parser and verdict), those will | |
1983 | be inherited by the socket being added. If the socket is | |
1984 | already attached to eBPF programs, this results in an error. | |
1985 | .TP | |
1986 | .B Return | |
1987 | 0 on success, or a negative error in case of failure. | |
1988 | .UNINDENT | |
1989 | .TP | |
1990 | .B \fBint bpf_msg_redirect_hash(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1991 | .INDENT 7.0 | |
1992 | .TP | |
1993 | .B Description | |
1994 | This helper is used in programs implementing policies at the | |
1995 | socket level. If the message \fImsg\fP is allowed to pass (i.e. if | |
1996 | the verdict eBPF program returns \fBSK_PASS\fP), redirect it to | |
1997 | the socket referenced by \fImap\fP (of type | |
1998 | \fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and | |
1999 | egress interfaces can be used for redirection. The | |
2000 | \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the | |
2001 | distinction (ingress path is selected if the flag is present, | |
2002 | egress path otherwise). This is the only flag supported for now. | |
2003 | .TP | |
2004 | .B Return | |
2005 | \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. | |
2006 | .UNINDENT | |
2007 | .TP | |
2008 | .B \fBint bpf_sk_redirect_hash(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2009 | .INDENT 7.0 | |
2010 | .TP | |
2011 | .B Description | |
2012 | This helper is used in programs implementing policies at the | |
2013 | skb socket level. If the sk_buff \fIskb\fP is allowed to pass (i.e. | |
2014 | if the verdict eBPF program returns \fBSK_PASS\fP), redirect it | |
2015 | to the socket referenced by \fImap\fP (of type | |
2016 | \fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and | |
2017 | egress interfaces can be used for redirection. The | |
2018 | \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the | |
2019 | distinction (ingress path is selected if the flag is present, | |
2020 | egress otherwise). This is the only flag supported for now. | |
2021 | .TP | |
2022 | .B Return | |
2023 | \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. | |
2024 | .UNINDENT | |
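.sp
The verdict logic shared by the two redirect helpers above can be sketched in
plain user\-space C. This is only an illustrative model, not kernel code:
model_sock and model_redirect_hash are hypothetical names, and dropping on an
unsupported flag or a failed lookup is an assumption consistent with the
Return descriptions above. The constant values are believed to match
include/uapi/linux/bpf.h.
.sp
.nf
.ft C
```c
#include <stddef.h>
#include <stdint.h>

/* Values believed to mirror include/uapi/linux/bpf.h. */
#define BPF_F_INGRESS (1ULL << 0)
enum sk_action { SK_DROP = 0, SK_PASS = 1 };

/* Hypothetical stand-in for a socket stored in a sockhash map. */
struct model_sock { int id; };

/*
 * Hypothetical model of the verdict computed by a redirect helper.
 * 'sk' is the result of the map lookup (NULL when the key is absent).
 * On success, '*ingress' reports which path was selected.
 */
static int model_redirect_hash(const struct model_sock *sk, uint64_t flags,
                               int *ingress)
{
    if (flags & ~BPF_F_INGRESS)   /* assumed: unknown flags are an error */
        return SK_DROP;
    if (sk == NULL)               /* no socket stored under the key */
        return SK_DROP;
    *ingress = (flags & BPF_F_INGRESS) != 0;
    return SK_PASS;
}
```
.ft P
.fi
.sp
With \fIflags\fP set to 0 the model selects the egress path; with
\fBBPF_F_INGRESS\fP it selects the ingress path.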
2025 | .TP | |
2026 | .B \fBint bpf_lwt_push_encap(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB, void *\fP\fIhdr\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
2027 | .INDENT 7.0 | |
2028 | .TP | |
2029 | .B Description | |
2030 | Encapsulate the packet associated to \fIskb\fP within a Layer 3 | |
2031 | protocol header. This header is provided in the buffer at | |
2032 | address \fIhdr\fP, with \fIlen\fP its size in bytes. \fItype\fP indicates | |
2033 | the protocol of the header and can be one of: | |
2034 | .INDENT 7.0 | |
2035 | .TP | |
2036 | .B \fBBPF_LWT_ENCAP_SEG6\fP | |
2037 | IPv6 encapsulation with Segment Routing Header | |
2038 | (\fBstruct ipv6_sr_hdr\fP). \fIhdr\fP only contains the SRH, | |
2039 | the IPv6 header is computed by the kernel. | |
2040 | .TP | |
2041 | .B \fBBPF_LWT_ENCAP_SEG6_INLINE\fP | |
2042 | Only works if \fIskb\fP contains an IPv6 packet. Insert a | |
2043 | Segment Routing Header (\fBstruct ipv6_sr_hdr\fP) inside | |
2044 | the IPv6 header. | |
2045 | .UNINDENT | |
2046 | .sp | |
2047 | A call to this helper may change the underlying | |
2048 | packet buffer. Therefore, at load time, all checks on pointers | |
2049 | previously done by the verifier are invalidated and must be | |
2050 | performed again, if the helper is used in combination with | |
2051 | direct packet access. | |
2052 | .TP | |
2053 | .B Return | |
2054 | 0 on success, or a negative error in case of failure. | |
2055 | .UNINDENT | |
2056 | .TP | |
2057 | .B \fBint bpf_lwt_seg6_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
2058 | .INDENT 7.0 | |
2059 | .TP | |
2060 | .B Description | |
2061 | Store \fIlen\fP bytes from address \fIfrom\fP into the packet | |
2062 | associated to \fIskb\fP, at \fIoffset\fP\&. Only the flags, tag and TLVs | |
2063 | inside the outermost IPv6 Segment Routing Header can be | |
2064 | modified through this helper. | |
2065 | .sp | |
2066 | A call to this helper may change the underlying | |
2067 | packet buffer. Therefore, at load time, all checks on pointers | |
2068 | previously done by the verifier are invalidated and must be | |
2069 | performed again, if the helper is used in combination with | |
2070 | direct packet access. | |
2071 | .TP | |
2072 | .B Return | |
2073 | 0 on success, or a negative error in case of failure. | |
2074 | .UNINDENT | |
2075 | .TP | |
2076 | .B \fBint bpf_lwt_seg6_adjust_srh(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, s32\fP \fIdelta\fP\fB)\fP | |
2077 | .INDENT 7.0 | |
2078 | .TP | |
2079 | .B Description | |
2080 | Adjust the size allocated to TLVs in the outermost IPv6 | |
2081 | Segment Routing Header contained in the packet associated to | |
2082 | \fIskb\fP, at position \fIoffset\fP by \fIdelta\fP bytes. Only offsets | |
2083 | after the segments are accepted. \fIdelta\fP can be either | |
2084 | positive (growing) or negative (shrinking). | |
2085 | .sp | |
2086 | A call to this helper may change the underlying | |
2087 | packet buffer. Therefore, at load time, all checks on pointers | |
2088 | previously done by the verifier are invalidated and must be | |
2089 | performed again, if the helper is used in combination with | |
2090 | direct packet access. | |
2091 | .TP | |
2092 | .B Return | |
2093 | 0 on success, or a negative error in case of failure. | |
2094 | .UNINDENT | |
2095 | .TP | |
2096 | .B \fBint bpf_lwt_seg6_action(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIaction\fP\fB, void *\fP\fIparam\fP\fB, u32\fP \fIparam_len\fP\fB)\fP | |
2097 | .INDENT 7.0 | |
2098 | .TP | |
2099 | .B Description | |
2100 | Apply an IPv6 Segment Routing action of type \fIaction\fP to the | |
2101 | packet associated to \fIskb\fP\&. Each action takes a parameter | |
2102 | contained at address \fIparam\fP, and of length \fIparam_len\fP bytes. | |
2103 | \fIaction\fP can be one of: | |
2104 | .INDENT 7.0 | |
2105 | .TP | |
2106 | .B \fBSEG6_LOCAL_ACTION_END_X\fP | |
2107 | End.X action: Endpoint with Layer\-3 cross\-connect. | |
2108 | Type of \fIparam\fP: \fBstruct in6_addr\fP\&. | |
2109 | .TP | |
2110 | .B \fBSEG6_LOCAL_ACTION_END_T\fP | |
2111 | End.T action: Endpoint with specific IPv6 table lookup. | |
2112 | Type of \fIparam\fP: \fBint\fP\&. | |
2113 | .TP | |
2114 | .B \fBSEG6_LOCAL_ACTION_END_B6\fP | |
2115 | End.B6 action: Endpoint bound to an SRv6 policy. | |
2116 | Type of \fIparam\fP: \fBstruct ipv6_sr_hdr\fP\&. | |
2117 | .TP | |
2118 | .B \fBSEG6_LOCAL_ACTION_END_B6_ENCAP\fP | |
2119 | End.B6.Encap action: Endpoint bound to an SRv6 | |
2120 | encapsulation policy. | |
2121 | Type of \fIparam\fP: \fBstruct ipv6_sr_hdr\fP\&. | |
2122 | .UNINDENT | |
2123 | .sp | |
2124 | A call to this helper may change the underlying | |
2125 | packet buffer. Therefore, at load time, all checks on pointers | |
2126 | previously done by the verifier are invalidated and must be | |
2127 | performed again, if the helper is used in combination with | |
2128 | direct packet access. | |
2129 | .TP | |
2130 | .B Return | |
2131 | 0 on success, or a negative error in case of failure. | |
2132 | .UNINDENT | |
2133 | .TP | |
2134 | .B \fBint bpf_rc_keydown(void *\fP\fIctx\fP\fB, u32\fP \fIprotocol\fP\fB, u64\fP \fIscancode\fP\fB, u32\fP \fItoggle\fP\fB)\fP | |
2135 | .INDENT 7.0 | |
2136 | .TP | |
2137 | .B Description | |
2138 | This helper is used in programs implementing IR decoding, to | |
2139 | report a successfully decoded key press with \fIscancode\fP, | |
2140 | \fItoggle\fP value in the given \fIprotocol\fP\&. The scancode will be | |
2141 | translated to a keycode using the rc keymap, and reported as | |
2142 | an input key down event. After a period a key up event is | |
2143 | generated. This period can be extended by calling either | |
2144 | \fBbpf_rc_keydown\fP() again with the same values, or calling | |
2145 | \fBbpf_rc_repeat\fP(). | |
2146 | .sp | |
2147 | Some protocols include a toggle bit, in case the button was | |
2148 | released and pressed again between consecutive scancodes. | |
2149 | .sp | |
2150 | The \fIctx\fP should point to the lirc sample as passed into | |
2151 | the program. | |
2152 | .sp | |
2153 | The \fIprotocol\fP is the decoded protocol number (see | |
2154 | \fBenum rc_proto\fP for some predefined values). | |
2155 | .sp | |
2156 | This helper is only available if the kernel was compiled with | |
2157 | the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to | |
2158 | "\fBy\fP". | |
2159 | .TP | |
2160 | .B Return | |
2161 | 0 | |
2162 | .UNINDENT | |
2163 | .TP | |
2164 | .B \fBint bpf_rc_repeat(void *\fP\fIctx\fP\fB)\fP | |
2165 | .INDENT 7.0 | |
2166 | .TP | |
2167 | .B Description | |
2168 | This helper is used in programs implementing IR decoding, to | |
2169 | report a successfully decoded repeat key message. This delays | |
2170 | the generation of a key up event for the previously generated | |
2171 | key down event. | |
2172 | .sp | |
2173 | Some IR protocols, such as NEC, have a special IR message for | |
2174 | repeating the last button while a button is held down. | |
2175 | .sp | |
2176 | The \fIctx\fP should point to the lirc sample as passed into | |
2177 | the program. | |
2178 | .sp | |
2179 | This helper is only available if the kernel was compiled with | |
2180 | the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to | |
2181 | "\fBy\fP". | |
2182 | .TP | |
2183 | .B Return | |
2184 | 0 | |
2185 | .UNINDENT | |
2186 | .TP | |
2187 | .B \fBu64 bpf_skb_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
2188 | .INDENT 7.0 | |
2189 | .TP | |
2190 | .B Description | |
2191 | Return the cgroup v2 id of the socket associated with the \fIskb\fP\&. | |
2192 | This is roughly similar to the \fBbpf_get_cgroup_classid\fP() | |
2193 | helper for cgroup v1: it provides a tag or identifier that | |
2194 | can be matched on or used for map lookups, e.g. to implement | |
2195 | policy. The cgroup v2 id of a given path in the hierarchy is | |
2196 | exposed in user space through the f_handle API in order to get | |
2197 | to the same 64\-bit id. | |
2198 | .sp | |
2199 | This helper can be used on TC egress path, but not on ingress, | |
2200 | and is available only if the kernel was compiled with the | |
2201 | \fBCONFIG_SOCK_CGROUP_DATA\fP configuration option. | |
2202 | .TP | |
2203 | .B Return | |
2204 | The id is returned or 0 in case the id could not be retrieved. | |
2205 | .UNINDENT | |
2206 | .TP | |
2207 | .B \fBu64 bpf_skb_ancestor_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB, int\fP \fIancestor_level\fP\fB)\fP | |
2208 | .INDENT 7.0 | |
2209 | .TP | |
2210 | .B Description | |
2211 | Return the id of the cgroup v2 that is the ancestor of the cgroup | |
2212 | associated with the \fIskb\fP at the \fIancestor_level\fP\&. The root cgroup is at | |
2213 | \fIancestor_level\fP zero and each step down the hierarchy | |
2214 | increments the level. If \fIancestor_level\fP == level of cgroup | |
2215 | associated with \fIskb\fP, then the return value is the same as that | |
2216 | of \fBbpf_skb_cgroup_id\fP(). | |
2217 | .sp | |
2218 | The helper is useful to implement policies based on cgroups | |
2219 | that are higher in the hierarchy than the immediate cgroup associated | |
2220 | with \fIskb\fP\&. | |
2221 | .sp | |
2222 | The format of the returned id and the helper limitations are the same as in | |
2223 | \fBbpf_skb_cgroup_id\fP(). | |
2224 | .TP | |
2225 | .B Return | |
2226 | The id is returned or 0 in case the id could not be retrieved. | |
2227 | .UNINDENT | |
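.sp
The level numbering used by the ancestor helper above can be illustrated with
a small user\-space model. This is a sketch only: model_cgroup and both
functions are hypothetical names, and returning 0 for a level deeper than the
cgroup itself is an assumption consistent with the Return description (0 when
the id could not be retrieved).
.sp
.nf
.ft C
```c
#include <stdint.h>

#define MODEL_MAX_DEPTH 8

/* Hypothetical model of a cgroup: ids along the path from the root. */
struct model_cgroup {
    int level;                          /* 0 for the root cgroup */
    uint64_t path_ids[MODEL_MAX_DEPTH]; /* path_ids[level] is its own id */
};

/* Model of the plain cgroup id lookup: the id of the cgroup itself. */
static uint64_t model_cgroup_id(const struct model_cgroup *cg)
{
    return cg->path_ids[cg->level];
}

/* Model of the ancestor lookup: id found at 'ancestor_level', or 0. */
static uint64_t model_ancestor_cgroup_id(const struct model_cgroup *cg,
                                         int ancestor_level)
{
    if (ancestor_level < 0 || ancestor_level > cg->level)
        return 0;                       /* assumed: no such ancestor */
    return cg->path_ids[ancestor_level];
}
```
.ft P
.fi
.sp
In this model, asking for the level of the cgroup itself yields the same id
as the plain lookup, as stated in the description above.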
2228 | .TP | |
2229 | .B \fBu64 bpf_get_current_cgroup_id(void)\fP | |
2230 | .INDENT 7.0 | |
2231 | .TP | |
2232 | .B Return | |
2233 | A 64\-bit integer containing the current cgroup id based | |
2234 | on the cgroup within which the current task is running. | |
2235 | .UNINDENT | |
2236 | .TP | |
2237 | .B \fBvoid *bpf_get_local_storage(void *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2238 | .INDENT 7.0 | |
2239 | .TP | |
2240 | .B Description | |
2241 | Get the pointer to the local storage area. | |
2242 | The type and the size of the local storage are defined | |
2243 | by the \fImap\fP argument. | |
2244 | The meaning of \fIflags\fP is specific to each map type, | |
2245 | and \fIflags\fP must be 0 for cgroup local storage. | |
2246 | .sp | |
2247 | Depending on the bpf program type, a local storage area | |
2248 | can be shared between multiple instances of the bpf program, | |
2249 | running simultaneously. | |
2250 | .sp | |
2251 | The user is responsible for synchronizing access to the shared | |
2252 | data, for example by using the BPF_STX_XADD instruction to alter | |
2253 | it. | |
2254 | .TP | |
2255 | .B Return | |
2256 | Pointer to the local storage area. | |
2257 | .UNINDENT | |
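.sp
As a rough illustration of the synchronization advice for the local storage
helper above, the following sketch updates shared counters with an atomic add.
The model_storage layout and the model_account function are hypothetical. The
use of the compiler builtin __sync_fetch_and_add is an assumption about how a
C program typically requests an atomic add; the sketch runs in user space and
only models the access pattern.
.sp
.nf
.ft C
```c
#include <stdint.h>

/* Hypothetical layout of a value kept in a local storage area. */
struct model_storage {
    uint64_t packets;
    uint64_t bytes;
};

/*
 * 'st' stands in for the pointer returned by the helper. Each field is
 * updated with an atomic add, so concurrent instances do not lose
 * increments.
 */
static void model_account(struct model_storage *st, uint64_t len)
{
    __sync_fetch_and_add(&st->packets, 1);
    __sync_fetch_and_add(&st->bytes, len);
}
```
.ft P
.fi
.sp
The two fields are updated independently, so a reader may observe one counter
updated before the other; only each individual addition is atomic.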
2258 | .TP | |
2259 | .B \fBint bpf_sk_select_reuseport(struct sk_reuseport_md *\fP\fIreuse\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2260 | .INDENT 7.0 | |
2261 | .TP | |
2262 | .B Description | |
2263 | Select a \fBSO_REUSEPORT\fP socket from a \fBBPF_MAP_TYPE_REUSEPORT_ARRAY\fP \fImap\fP\&. | |
2264 | It checks that the selected socket matches the incoming | |
2265 | request in the skb. | |
2266 | .TP | |
2267 | .B Return | |
2268 | 0 on success, or a negative error in case of failure. | |
2269 | .UNINDENT | |
2270 | .UNINDENT | |
2271 | .SH EXAMPLES | |
2272 | .sp | |
2273 | Examples of usage for most of the eBPF helpers listed in this manual page are | |
2274 | available within the Linux kernel sources, at the following locations: | |
2275 | .INDENT 0.0 | |
2276 | .IP \(bu 2 | |
2277 | \fIsamples/bpf/\fP | |
2278 | .IP \(bu 2 | |
2279 | \fItools/testing/selftests/bpf/\fP | |
2280 | .UNINDENT | |
2281 | .SH LICENSE | |
2282 | .sp | |
2283 | eBPF programs can have an associated license, passed along with the bytecode | |
2284 | instructions to the kernel when the programs are loaded. The format for that | |
2285 | string is identical to the one in use for kernel modules (dual licenses, such | |
2286 | as "Dual BSD/GPL", may be used). Some helper functions are only accessible to | |
2287 | programs that are compatible with the GNU General Public License (GPL). | |
2288 | .sp | |
2289 | In order to use such helpers, the eBPF program must be loaded with the correct | |
2290 | license string passed (via \fBattr\fP) to the \fBbpf\fP() system call, and this | |
2291 | generally translates into the C source code of the program containing a line | |
2292 | similar to the following: | |
2293 | .INDENT 0.0 | |
2294 | .INDENT 3.5 | |
2295 | .sp | |
2296 | .nf | |
2297 | .ft C | |
2298 | char ____license[] __attribute__((section("license"), used)) = "GPL"; | |
2299 | .ft P | |
2300 | .fi | |
2301 | .UNINDENT | |
2302 | .UNINDENT | |
2303 | .SH IMPLEMENTATION | |
2304 | .sp | |
2305 | This manual page is an effort to document the existing eBPF helper functions. | |
2306 | As of this writing, however, the BPF sub\-system is under heavy development: new eBPF | |
2307 | program or map types are added, along with new helper functions, and some helpers | |
2308 | are occasionally made available for additional program types. So in spite of | |
2309 | the efforts of the community, this page might not be up\-to\-date. To | |
2310 | check for yourself which helper functions exist in your kernel, or which types of | |
2311 | programs they support, the following files in the kernel tree | |
2312 | may be of interest: | |
2313 | .INDENT 0.0 | |
2314 | .IP \(bu 2 | |
2315 | \fIinclude/uapi/linux/bpf.h\fP is the main BPF header. It contains the full list | |
2316 | of all helper functions, as well as many other BPF definitions including most | |
2317 | of the flags, structs or constants used by the helpers. | |
2318 | .IP \(bu 2 | |
2319 | \fInet/core/filter.c\fP contains the definition of most network\-related helper | |
2320 | functions, and the list of program types from which they can be used. | |
2321 | .IP \(bu 2 | |
2322 | \fIkernel/trace/bpf_trace.c\fP is the equivalent for most tracing program\-related | |
2323 | helpers. | |
2324 | .IP \(bu 2 | |
2325 | \fIkernel/bpf/verifier.c\fP contains the functions used to check that valid types | |
2326 | of eBPF maps are used with a given helper function. | |
2327 | .IP \(bu 2 | |
2328 | The \fIkernel/bpf/\fP directory contains other files in which additional helpers are | |
2329 | defined (for cgroups, sockmaps, etc.). | |
2330 | .UNINDENT | |
2331 | .sp | |
2332 | Compatibility between helper functions and program types can generally be found | |
2333 | in the files where helper functions are defined. Look for the \fBstruct | |
2334 | bpf_func_proto\fP objects and for functions returning them: these functions | |
2335 | contain a list of helpers that a given program type can call. Note that the | |
2336 | \fBdefault:\fP label of the \fBswitch ... case\fP used to filter helpers can call | |
2337 | other functions, themselves allowing access to additional helpers. The | |
2338 | requirement for GPL license is also in those \fBstruct bpf_func_proto\fP\&. | |
2339 | .sp | |
2340 | Compatibility between helper functions and map types can be found in the | |
2341 | \fBcheck_map_func_compatibility\fP() function in file \fIkernel/bpf/verifier.c\fP\&. | |
2342 | .sp | |
2343 | Helper functions that invalidate the checks on \fBdata\fP and \fBdata_end\fP | |
2344 | pointers for network processing are listed in function | |
2345 | \fBbpf_helper_changes_pkt_data\fP() in file \fInet/core/filter.c\fP\&. | |
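.sp
The reason such checks are invalidated can be illustrated with a user\-space
model in which a call moves the packet buffer. Everything here is
hypothetical: model_ctx, model_push_header and model_read_byte are
illustrative names, and the reallocation merely stands in for whatever a real
helper does to the packet. The point of the sketch is that pointers must be
reloaded and bounds\-checked again after such a call.
.sp
.nf
.ft C
```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the packet context seen by a program. */
struct model_ctx {
    uint8_t *data;
    uint8_t *data_end;
};

/* Hypothetical helper: prepend 'len' zero bytes; the buffer may move. */
static int model_push_header(struct model_ctx *ctx, size_t len)
{
    size_t old = (size_t)(ctx->data_end - ctx->data);
    uint8_t *buf = malloc(old + len);

    if (buf == NULL)
        return -1;
    memset(buf, 0, len);
    memcpy(buf + len, ctx->data, old);
    free(ctx->data);              /* pointers saved earlier are now stale */
    ctx->data = buf;
    ctx->data_end = buf + old + len;
    return 0;
}

/* Reload both pointers and check bounds before every direct access. */
static int model_read_byte(const struct model_ctx *ctx, size_t off,
                           uint8_t *out)
{
    uint8_t *data = ctx->data;
    uint8_t *data_end = ctx->data_end;

    if (off >= (size_t)(data_end - data))
        return -1;                /* out of bounds: refuse the access */
    *out = data[off];
    return 0;
}
```
.ft P
.fi
.sp
A pointer saved before the call to model_push_header would refer to freed
memory afterwards, which is the situation the verifier rules out by
discarding earlier checks.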
2346 | .SH SEE ALSO | |
2347 | .sp | |
2348 | \fBbpf\fP(2), | |
2349 | \fBcgroups\fP(7), | |
2350 | \fBip\fP(8), | |
2351 | \fBperf_event_open\fP(2), | |
2352 | \fBsendmsg\fP(2), | |
2353 | \fBsocket\fP(7), | |
2354 | \fBtc\-bpf\fP(8) | |
2355 | .\" Generated by docutils manpage writer. | |
2356 | . |