Commit | Line | Data |
---|---|---|
53666f6c | 1 | .\" Man page generated from reStructuredText. |
e6107b29 | 2 | . |
e46733c4 | 3 | .TH BPF-HELPERS 7 2019-11-19 "Linux" "Linux Programmer's Manual" |
53666f6c MK |
4 | .SH NAME |
5 | BPF-HELPERS \- list of eBPF helper functions | |
e6107b29 | 6 | . |
53666f6c | 7 | .nr rst2man-indent-level 0 |
e6107b29 | 8 | . |
53666f6c MK |
9 | .de1 rstReportMargin |
10 | \\$1 \\n[an-margin] | |
11 | level \\n[rst2man-indent-level] | |
12 | level margin: \\n[rst2man-indent\\n[rst2man-indent-level]] | |
e6107b29 | 13 | - |
53666f6c MK |
14 | \\n[rst2man-indent0] |
15 | \\n[rst2man-indent1] | |
16 | \\n[rst2man-indent2] | |
17 | .. | |
18 | .de1 INDENT | |
19 | .\" .rstReportMargin pre: | |
20 | . RS \\$1 | |
21 | . nr rst2man-indent\\n[rst2man-indent-level] \\n[an-margin] | |
22 | . nr rst2man-indent-level +1 | |
23 | .\" .rstReportMargin post: | |
24 | .. | |
25 | .de UNINDENT | |
26 | . RE | |
27 | .\" indent \\n[an-margin] | |
28 | .\" old: \\n[rst2man-indent\\n[rst2man-indent-level]] | |
29 | .nr rst2man-indent-level -1 | |
30 | .\" new: \\n[rst2man-indent\\n[rst2man-indent-level]] | |
31 | .in \\n[rst2man-indent\\n[rst2man-indent-level]]u | |
32 | .. | |
e6107b29 | 33 | .\" Copyright (C) All BPF authors and contributors from 2014 to present. |
e6107b29 | 34 | .\" See git log include/uapi/linux/bpf.h in kernel tree for details. |
324f6154 | 35 | .\" |
e6107b29 | 36 | .\" %%%LICENSE_START(VERBATIM) |
e6107b29 | 37 | .\" Permission is granted to make and distribute verbatim copies of this |
e6107b29 | 38 | .\" manual provided the copyright notice and this permission notice are |
e6107b29 | 39 | .\" preserved on all copies. |
324f6154 | 40 | .\" |
e6107b29 | 41 | .\" Permission is granted to copy and distribute modified versions of this |
e6107b29 | 42 | .\" manual under the conditions for verbatim copying, provided that the |
e6107b29 | 43 | .\" entire resulting derived work is distributed under the terms of a |
e6107b29 | 44 | .\" permission notice identical to this one. |
324f6154 | 45 | .\" |
e6107b29 | 46 | .\" Since the Linux kernel and libraries are constantly changing, this |
e6107b29 | 47 | .\" manual page may be incorrect or out-of-date. The author(s) assume no |
e6107b29 | 48 | .\" responsibility for errors or omissions, or for damages resulting from |
e6107b29 | 49 | .\" the use of the information contained herein. The author(s) may not |
e6107b29 | 50 | .\" have taken the same level of care in the production of this manual, |
e6107b29 | 51 | .\" which is licensed free of charge, as they might when working |
e6107b29 | 52 | .\" professionally. |
324f6154 | 53 | .\" |
e6107b29 | 54 | .\" Formatted or processed versions of this manual, if unaccompanied by |
e6107b29 | 55 | .\" the source, must acknowledge the copyright and authors of this work. |
e6107b29 | 56 | .\" %%%LICENSE_END |
324f6154 | 57 | .\" |
e6107b29 | 58 | .\" Please do not edit this file. It was generated from the documentation |
e6107b29 | 59 | .\" located in file include/uapi/linux/bpf.h of the Linux kernel sources |
e6107b29 | 60 | .\" (helpers description), and from scripts/bpf_helpers_doc.py in the same |
e6107b29 | 61 | .\" repository (header and footer). |
53666f6c MK |
62 | .SH DESCRIPTION |
63 | .sp | |
64 | The extended Berkeley Packet Filter (eBPF) subsystem consists of programs | |
65 | written in a pseudo\-assembly language, then attached to one of the several | |
66 | kernel hooks and run in reaction to specific events. This framework differs | |
67 | from the older, "classic" BPF (or "cBPF") in several aspects, one of them being | |
68 | the ability to call special functions (or "helpers") from within a program. | |
69 | These functions are restricted to a white\-list of helpers defined in the | |
70 | kernel. | |
71 | .sp | |
72 | These helpers are used by eBPF programs to interact with the system, or with | |
73 | the context in which they work. For instance, they can be used to print | |
74 | debugging messages, to get the time since the system was booted, to interact | |
75 | with eBPF maps, or to manipulate network packets. Since there are several eBPF | |
76 | program types, and they do not run in the same context, each program type | |
77 | can only call a subset of those helpers. | |
78 | .sp | |
79 | Due to eBPF conventions, a helper cannot have more than five arguments. | |
80 | .sp | |
81 | Internally, eBPF programs call directly into the compiled helper functions | |
82 | without requiring any foreign\-function interface. As a result, calling helpers | |
83 | introduces no overhead, thus offering excellent performance. | |
84 | .sp | |
85 | This document is an attempt to list and document the helpers available to eBPF | |
86 | developers. They are sorted in chronological order (the oldest helpers in the | |
87 | kernel at the top). | |
88 | .SH HELPERS | |
89 | .INDENT 0.0 | |
90 | .TP | |
91 | .B \fBvoid *bpf_map_lookup_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP | |
92 | .INDENT 7.0 | |
93 | .TP | |
94 | .B Description | |
95 | Perform a lookup in \fImap\fP for an entry associated to \fIkey\fP\&. | |
96 | .TP | |
97 | .B Return | |
98 | Map value associated to \fIkey\fP, or \fBNULL\fP if no entry was | |
99 | found. | |
100 | .UNINDENT | |
101 | .TP | |
102 | .B \fBint bpf_map_update_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB, const void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
103 | .INDENT 7.0 | |
104 | .TP | |
105 | .B Description | |
106 | Add or update the value of the entry associated to \fIkey\fP in | |
107 | \fImap\fP with \fIvalue\fP\&. \fIflags\fP is one of: | |
108 | .INDENT 7.0 | |
109 | .TP | |
110 | .B \fBBPF_NOEXIST\fP | |
111 | The entry for \fIkey\fP must not exist in the map. | |
112 | .TP | |
113 | .B \fBBPF_EXIST\fP | |
114 | The entry for \fIkey\fP must already exist in the map. | |
115 | .TP | |
116 | .B \fBBPF_ANY\fP | |
117 | No condition on the existence of the entry for \fIkey\fP\&. | |
118 | .UNINDENT | |
119 | .sp | |
120 | Flag value \fBBPF_NOEXIST\fP cannot be used for maps of types | |
121 | \fBBPF_MAP_TYPE_ARRAY\fP or \fBBPF_MAP_TYPE_PERCPU_ARRAY\fP (all | |
122 | elements always exist); the helper would return an error. | |
123 | .TP | |
124 | .B Return | |
125 | 0 on success, or a negative error in case of failure. | |
126 | .UNINDENT | |
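The existence conditions attached to \fIflags\fP can be sketched in plain user\-space C. This is only an illustration: `struct toy_slot` and `toy_update()` are hypothetical names, not kernel APIs, and the specific error codes (`-EEXIST`, `-ENOENT`, `-EINVAL`) are an assumption modelled on what hash maps return; the text above only promises "a negative error".

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Flag values as defined in include/uapi/linux/bpf.h. */
#define BPF_ANY     0 /* create a new element or update an existing one */
#define BPF_NOEXIST 1 /* create a new element only if it does not exist */
#define BPF_EXIST   2 /* update an existing element only */

/* Hypothetical single-slot "map", for illustration only. */
struct toy_slot {
    bool     present;
    uint64_t value;
};

/* Mimics the existence checks applied to flags before an update. */
static int toy_update(struct toy_slot *slot, uint64_t value, uint64_t flags)
{
    if (flags > BPF_EXIST)
        return -EINVAL;              /* unknown flag value (assumed code) */
    if (flags == BPF_NOEXIST && slot->present)
        return -EEXIST;              /* entry must not exist, but does */
    if (flags == BPF_EXIST && !slot->present)
        return -ENOENT;              /* entry must exist, but does not */
    slot->present = true;
    slot->value = value;
    return 0;
}
```

An array\-like map corresponds to a slot whose `present` field is always true, which is why \fBBPF_NOEXIST\fP can never succeed there.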
127 | .TP | |
128 | .B \fBint bpf_map_delete_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIkey\fP\fB)\fP | |
129 | .INDENT 7.0 | |
130 | .TP | |
131 | .B Description | |
132 | Delete entry with \fIkey\fP from \fImap\fP\&. | |
133 | .TP | |
134 | .B Return | |
135 | 0 on success, or a negative error in case of failure. | |
136 | .UNINDENT | |
137 | .TP | |
138 | .B \fBint bpf_probe_read(void *\fP\fIdst\fP\fB, u32\fP \fIsize\fP\fB, const void *\fP\fIsrc\fP\fB)\fP | |
139 | .INDENT 7.0 | |
140 | .TP | |
141 | .B Description | |
142 | For tracing programs, safely attempt to read \fIsize\fP bytes from | |
143 | address \fIsrc\fP and store the data in \fIdst\fP\&. | |
144 | .TP | |
145 | .B Return | |
146 | 0 on success, or a negative error in case of failure. | |
147 | .UNINDENT | |
148 | .TP | |
149 | .B \fBu64 bpf_ktime_get_ns(void)\fP | |
150 | .INDENT 7.0 | |
151 | .TP | |
152 | .B Description | |
153 | Return the time elapsed since system boot, in nanoseconds. | |
154 | .TP | |
155 | .B Return | |
156 | Current \fIktime\fP\&. | |
157 | .UNINDENT | |
158 | .TP | |
159 | .B \fBint bpf_trace_printk(const char *\fP\fIfmt\fP\fB, u32\fP \fIfmt_size\fP\fB, ...)\fP | |
160 | .INDENT 7.0 | |
161 | .TP | |
162 | .B Description | |
163 | This helper is a "printk()\-like" facility for debugging. It | |
164 | prints a message defined by format \fIfmt\fP (of size \fIfmt_size\fP) | |
165 | to file \fI/sys/kernel/debug/tracing/trace\fP from DebugFS, if | |
166 | available. It can take up to three additional \fBu64\fP | |
167 | arguments (as an eBPF helper, the total number of arguments is | |
168 | limited to five). | |
169 | .sp | |
170 | Each time the helper is called, it appends a line to the trace. | |
e6107b29 MK |
171 | Lines are discarded while \fI/sys/kernel/debug/tracing/trace\fP is |
172 | open; use \fI/sys/kernel/debug/tracing/trace_pipe\fP to avoid this. | |
53666f6c MK |
173 | The format of the trace is customizable, and the exact output |
174 | one will get depends on the options set in | |
175 | \fI/sys/kernel/debug/tracing/trace_options\fP (see also the | |
176 | \fIREADME\fP file under the same directory). However, it usually | |
177 | defaults to something like: | |
178 | .INDENT 7.0 | |
179 | .INDENT 3.5 | |
180 | .sp | |
181 | .nf | |
182 | .ft C | |
183 | telnet\-470 [001] .N.. 419421.045894: 0x00000001: <formatted msg> | |
184 | .ft P | |
185 | .fi | |
186 | .UNINDENT | |
187 | .UNINDENT | |
188 | .sp | |
189 | In the above: | |
190 | .INDENT 7.0 | |
191 | .INDENT 3.5 | |
192 | .INDENT 0.0 | |
193 | .IP \(bu 2 | |
194 | \fBtelnet\fP is the name of the current task. | |
195 | .IP \(bu 2 | |
196 | \fB470\fP is the PID of the current task. | |
197 | .IP \(bu 2 | |
198 | \fB001\fP is the CPU number on which the task is | |
199 | running. | |
200 | .IP \(bu 2 | |
201 | In \fB\&.N..\fP, each character refers to a set of | |
202 | options (whether irqs are enabled, scheduling | |
203 | options, whether hard/softirqs are running, level of | |
204 | preempt_disabled respectively). \fBN\fP means that | |
205 | \fBTIF_NEED_RESCHED\fP and \fBPREEMPT_NEED_RESCHED\fP | |
206 | are set. | |
207 | .IP \(bu 2 | |
208 | \fB419421.045894\fP is a timestamp. | |
209 | .IP \(bu 2 | |
210 | \fB0x00000001\fP is a fake value used by BPF for the | |
211 | instruction pointer register. | |
212 | .IP \(bu 2 | |
213 | \fB<formatted msg>\fP is the message formatted with | |
214 | \fIfmt\fP\&. | |
215 | .UNINDENT | |
216 | .UNINDENT | |
217 | .UNINDENT | |
218 | .sp | |
219 | The conversion specifiers supported by \fIfmt\fP are similar to, but | |
220 | more limited than, those of printk(). They are \fB%d\fP, \fB%i\fP, | |
221 | \fB%u\fP, \fB%x\fP, \fB%ld\fP, \fB%li\fP, \fB%lu\fP, \fB%lx\fP, \fB%lld\fP, | |
222 | \fB%lli\fP, \fB%llu\fP, \fB%llx\fP, \fB%p\fP, \fB%s\fP\&. No modifier (size | |
223 | of field, padding with zeroes, etc.) is available, and the | |
224 | helper will return \fB\-EINVAL\fP (but print nothing) if it | |
225 | encounters an unknown specifier. | |
226 | .sp | |
227 | Also, note that \fBbpf_trace_printk\fP() is slow, and should | |
228 | only be used for debugging purposes. For this reason, a notice | |
229 | block (spanning several lines) is printed to kernel logs and | |
230 | states that the helper should not be used "for production use" | |
231 | the first time this helper is used (or more precisely, when | |
232 | \fBtrace_printk\fP() buffers are allocated). For passing values | |
233 | to user space, perf events should be preferred. | |
234 | .TP | |
235 | .B Return | |
236 | The number of bytes written to the buffer, or a negative error | |
237 | in case of failure. | |
238 | .UNINDENT | |
239 | .TP | |
240 | .B \fBu32 bpf_get_prandom_u32(void)\fP | |
241 | .INDENT 7.0 | |
242 | .TP | |
243 | .B Description | |
244 | Get a pseudo\-random number. | |
245 | .sp | |
246 | From a security point of view, this helper uses its own | |
247 | pseudo\-random internal state, and cannot be used to infer the | |
248 | seed of other random functions in the kernel. However, it is | |
249 | essential to note that the generator used by the helper is not | |
250 | cryptographically secure. | |
251 | .TP | |
252 | .B Return | |
253 | A random 32\-bit unsigned value. | |
254 | .UNINDENT | |
255 | .TP | |
256 | .B \fBu32 bpf_get_smp_processor_id(void)\fP | |
257 | .INDENT 7.0 | |
258 | .TP | |
259 | .B Description | |
260 | Get the SMP (symmetric multiprocessing) processor id. Note that | |
261 | all programs run with preemption disabled, which means that the | |
262 | SMP processor id is stable during all the execution of the | |
263 | program. | |
264 | .TP | |
265 | .B Return | |
266 | The SMP id of the processor running the program. | |
267 | .UNINDENT | |
268 | .TP | |
269 | .B \fBint bpf_skb_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
270 | .INDENT 7.0 | |
271 | .TP | |
272 | .B Description | |
273 | Store \fIlen\fP bytes from address \fIfrom\fP into the packet | |
274 | associated to \fIskb\fP, at \fIoffset\fP\&. \fIflags\fP are a combination of | |
275 | \fBBPF_F_RECOMPUTE_CSUM\fP (automatically recompute the | |
276 | checksum for the packet after storing the bytes) and | |
277 | \fBBPF_F_INVALIDATE_HASH\fP (set \fIskb\fP\fB\->hash\fP, \fIskb\fP\fB\->swhash\fP and \fIskb\fP\fB\->l4hash\fP to 0). | |
278 | .sp | |
e6107b29 | 279 | A call to this helper may change the underlying |
53666f6c MK |
280 | packet buffer. Therefore, at load time, all checks on pointers |
281 | previously done by the verifier are invalidated and must be | |
282 | performed again, if the helper is used in combination with | |
283 | direct packet access. | |
284 | .TP | |
285 | .B Return | |
286 | 0 on success, or a negative error in case of failure. | |
287 | .UNINDENT | |
288 | .TP | |
289 | .B \fBint bpf_l3_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIsize\fP\fB)\fP | |
290 | .INDENT 7.0 | |
291 | .TP | |
292 | .B Description | |
293 | Recompute the layer 3 (e.g. IP) checksum for the packet | |
294 | associated to \fIskb\fP\&. Computation is incremental, so the helper | |
295 | must know the former value of the header field that was | |
296 | modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the | |
297 | number of bytes (2 or 4) for this field, stored in \fIsize\fP\&. | |
298 | Alternatively, it is possible to store the difference between | |
299 | the previous and the new values of the header field in \fIto\fP, by | |
300 | setting \fIfrom\fP and \fIsize\fP to 0. For both methods, \fIoffset\fP | |
301 | indicates the location of the IP checksum within the packet. | |
302 | .sp | |
303 | This helper works in combination with \fBbpf_csum_diff\fP(), | |
304 | which does not update the checksum in\-place, but offers more | |
305 | flexibility and can handle sizes larger than 2 or 4 for the | |
306 | checksum to update. | |
307 | .sp | |
e6107b29 | 308 | A call to this helper may change the underlying |
53666f6c MK |
309 | packet buffer. Therefore, at load time, all checks on pointers |
310 | previously done by the verifier are invalidated and must be | |
311 | performed again, if the helper is used in combination with | |
312 | direct packet access. | |
313 | .TP | |
314 | .B Return | |
315 | 0 on success, or a negative error in case of failure. | |
316 | .UNINDENT | |
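The incremental computation described for \fBbpf_l3_csum_replace\fP() can be illustrated in user space. The sketch below is not the kernel implementation: it applies the standard ones' complement update from RFC 1624 to a 2\-byte field, and the function names (`csum_fold`, `csum_full`, `csum_replace2`) are hypothetical. Words are treated as plain 16\-bit integers, so byte order is deliberately ignored.

```c
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit accumulator into a 16-bit ones' complement sum. */
static uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Full checksum over n 16-bit words (the checksum word itself set to 0). */
static uint16_t csum_full(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    return (uint16_t)~csum_fold(sum);
}

/*
 * Incremental update for one 2-byte field (RFC 1624, equation 3):
 * new_check = ~(~check + ~from + to), in ones' complement arithmetic.
 */
static uint16_t csum_replace2(uint16_t check, uint16_t from, uint16_t to)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~from;
    sum += to;
    return (uint16_t)~csum_fold(sum);
}
```

Only the old checksum and the old and new field values are needed, which is why the helper asks for \fIfrom\fP, \fIto\fP and the field size rather than the whole header.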
317 | .TP | |
318 | .B \fBint bpf_l4_csum_replace(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, u64\fP \fIfrom\fP\fB, u64\fP \fIto\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
319 | .INDENT 7.0 | |
320 | .TP | |
321 | .B Description | |
322 | Recompute the layer 4 (e.g. TCP, UDP or ICMP) checksum for the | |
323 | packet associated to \fIskb\fP\&. Computation is incremental, so the | |
324 | helper must know the former value of the header field that was | |
325 | modified (\fIfrom\fP), the new value of this field (\fIto\fP), and the | |
326 | number of bytes (2 or 4) for this field, stored on the lowest | |
327 | four bits of \fIflags\fP\&. Alternatively, it is possible to store | |
328 | the difference between the previous and the new values of the | |
329 | header field in \fIto\fP, by setting \fIfrom\fP and the four lowest | |
330 | bits of \fIflags\fP to 0. For both methods, \fIoffset\fP indicates the | |
331 | location of the IP checksum within the packet. In addition to | |
332 | the size of the field, actual flags can be added to \fIflags\fP (bitwise | |
333 | OR). With \fBBPF_F_MARK_MANGLED_0\fP, a null checksum is left | |
334 | untouched (unless \fBBPF_F_MARK_ENFORCE\fP is added as well), and | |
335 | for updates resulting in a null checksum the value is set to | |
336 | \fBCSUM_MANGLED_0\fP instead. Flag \fBBPF_F_PSEUDO_HDR\fP indicates | |
337 | the checksum is to be computed against a pseudo\-header. | |
338 | .sp | |
339 | This helper works in combination with \fBbpf_csum_diff\fP(), | |
340 | which does not update the checksum in\-place, but offers more | |
341 | flexibility and can handle sizes larger than 2 or 4 for the | |
342 | checksum to update. | |
343 | .sp | |
e6107b29 | 344 | A call to this helper may change the underlying |
53666f6c MK |
345 | packet buffer. Therefore, at load time, all checks on pointers |
346 | previously done by the verifier are invalidated and must be | |
347 | performed again, if the helper is used in combination with | |
348 | direct packet access. | |
349 | .TP | |
350 | .B Return | |
351 | 0 on success, or a negative error in case of failure. | |
352 | .UNINDENT | |
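Since \fBbpf_l4_csum_replace\fP() packs the field size and the actual flags into one argument, a small user\-space sketch may help show the layout. The constants mirror the values in include/uapi/linux/bpf.h; the two helper functions are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/bpf.h. */
#define BPF_F_HDR_FIELD_MASK 0xfULL        /* lowest four bits: field size */
#define BPF_F_PSEUDO_HDR     (1ULL << 4)
#define BPF_F_MARK_MANGLED_0 (1ULL << 5)
#define BPF_F_MARK_ENFORCE   (1ULL << 6)

/* Combine a field size (0, 2 or 4) with actual flags by bitwise OR. */
static uint64_t l4_csum_flags(uint64_t field_size, uint64_t extra_flags)
{
    return (field_size & BPF_F_HDR_FIELD_MASK) |
           (extra_flags & ~BPF_F_HDR_FIELD_MASK);
}

/* Extract the field size stored in the lowest four bits. */
static uint64_t l4_csum_field_size(uint64_t flags)
{
    return flags & BPF_F_HDR_FIELD_MASK;
}
```

For instance, updating a 4\-byte address that is covered by the pseudo\-header would pass `l4_csum_flags(4, BPF_F_PSEUDO_HDR)` as \fIflags\fP.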
353 | .TP | |
354 | .B \fBint bpf_tail_call(void *\fP\fIctx\fP\fB, struct bpf_map *\fP\fIprog_array_map\fP\fB, u32\fP \fIindex\fP\fB)\fP | |
355 | .INDENT 7.0 | |
356 | .TP | |
357 | .B Description | |
358 | This special helper is used to trigger a "tail call", or in | |
359 | other words, to jump into another eBPF program. The same stack | |
360 | frame is used (but values on stack and in registers for the | |
361 | caller are not accessible to the callee). This mechanism allows | |
362 | for program chaining, either for raising the maximum number of | |
363 | available eBPF instructions, or to execute given programs in | |
364 | conditional blocks. For security reasons, there is an upper | |
365 | limit to the number of successive tail calls that can be | |
366 | performed. | |
367 | .sp | |
368 | Upon call of this helper, the program attempts to jump into a | |
369 | program referenced at index \fIindex\fP in \fIprog_array_map\fP, a | |
370 | special map of type \fBBPF_MAP_TYPE_PROG_ARRAY\fP, and passes | |
371 | \fIctx\fP, a pointer to the context. | |
372 | .sp | |
373 | If the call succeeds, the kernel immediately runs the first | |
374 | instruction of the new program. This is not a function call, | |
375 | and it never returns to the previous program. If the call | |
376 | fails, then the helper has no effect, and the caller continues | |
377 | to run its subsequent instructions. A call can fail if the | |
378 | destination program for the jump does not exist (i.e. \fIindex\fP | |
379 | is greater than or equal to the number of entries in \fIprog_array_map\fP), or | |
380 | if the maximum number of tail calls has been reached for this | |
381 | chain of programs. This limit is defined in the kernel by the | |
382 | macro \fBMAX_TAIL_CALL_CNT\fP (not accessible to user space), | |
383 | which is currently set to 32. | |
384 | .TP | |
385 | .B Return | |
386 | 0 on success, or a negative error in case of failure. | |
387 | .UNINDENT | |
388 | .TP | |
389 | .B \fBint bpf_clone_redirect(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
390 | .INDENT 7.0 | |
391 | .TP | |
392 | .B Description | |
393 | Clone and redirect the packet associated to \fIskb\fP to another | |
394 | net device of index \fIifindex\fP\&. Both ingress and egress | |
395 | interfaces can be used for redirection. The \fBBPF_F_INGRESS\fP | |
396 | value in \fIflags\fP is used to make the distinction (ingress path | |
397 | is selected if the flag is present, egress path otherwise). | |
398 | This is the only flag supported for now. | |
399 | .sp | |
400 | In comparison with \fBbpf_redirect\fP() helper, | |
401 | \fBbpf_clone_redirect\fP() has the associated cost of | |
402 | duplicating the packet buffer, but the redirection can be executed from within | |
403 | the eBPF program. Conversely, \fBbpf_redirect\fP() is more | |
404 | efficient, but it is handled through an action code where the | |
405 | redirection happens only after the eBPF program has returned. | |
406 | .sp | |
e6107b29 | 407 | A call to this helper may change the underlying |
53666f6c MK |
408 | packet buffer. Therefore, at load time, all checks on pointers |
409 | previously done by the verifier are invalidated and must be | |
410 | performed again, if the helper is used in combination with | |
411 | direct packet access. | |
412 | .TP | |
413 | .B Return | |
414 | 0 on success, or a negative error in case of failure. | |
415 | .UNINDENT | |
416 | .TP | |
417 | .B \fBu64 bpf_get_current_pid_tgid(void)\fP | |
418 | .INDENT 7.0 | |
419 | .TP | |
420 | .B Return | |
421 | A 64\-bit integer containing the current tgid and pid, and | |
422 | created as such: | |
423 | \fIcurrent_task\fP\fB\->tgid << 32 |\fP | |
424 | \fIcurrent_task\fP\fB\->pid\fP\&. | |
425 | .UNINDENT | |
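The packing described for \fBbpf_get_current_pid_tgid\fP() can be undone with a shift and a truncation. The following user\-space sketch uses hypothetical function names; `pack_pid_tgid()` only reproduces the expression given above so that the two accessors can be exercised outside the kernel.

```c
#include <stdint.h>

/* Reproduces the documented layout: tgid << 32 | pid. */
static uint64_t pack_pid_tgid(uint32_t tgid, uint32_t pid)
{
    return ((uint64_t)tgid << 32) | pid;
}

/* Upper 32 bits: thread group id (what user space calls the PID). */
static uint32_t tgid_of(uint64_t pid_tgid)
{
    return (uint32_t)(pid_tgid >> 32);
}

/* Lower 32 bits: kernel pid (what user space calls the thread id). */
static uint32_t pid_of(uint64_t pid_tgid)
{
    return (uint32_t)pid_tgid;
}
```

In an eBPF program the value returned by the helper would be passed to `tgid_of()` and `pid_of()` directly.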
426 | .TP | |
427 | .B \fBu64 bpf_get_current_uid_gid(void)\fP | |
428 | .INDENT 7.0 | |
429 | .TP | |
430 | .B Return | |
431 | A 64\-bit integer containing the current GID and UID, and | |
432 | created as such: \fIcurrent_gid\fP \fB<< 32 |\fP \fIcurrent_uid\fP\&. | |
433 | .UNINDENT | |
434 | .TP | |
435 | .B \fBint bpf_get_current_comm(char *\fP\fIbuf\fP\fB, u32\fP \fIsize_of_buf\fP\fB)\fP | |
436 | .INDENT 7.0 | |
437 | .TP | |
438 | .B Description | |
439 | Copy the \fBcomm\fP attribute of the current task into \fIbuf\fP of | |
440 | \fIsize_of_buf\fP\&. The \fBcomm\fP attribute contains the name of | |
441 | the executable (excluding the path) for the current task. The | |
442 | \fIsize_of_buf\fP must be strictly positive. On success, the | |
443 | helper makes sure that the \fIbuf\fP is NUL\-terminated. On failure, | |
444 | it is filled with zeroes. | |
445 | .TP | |
446 | .B Return | |
447 | 0 on success, or a negative error in case of failure. | |
448 | .UNINDENT | |
449 | .TP | |
450 | .B \fBu32 bpf_get_cgroup_classid(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
451 | .INDENT 7.0 | |
452 | .TP | |
453 | .B Description | |
454 | Retrieve the classid for the current task, i.e. for the net_cls | |
455 | cgroup to which \fIskb\fP belongs. | |
456 | .sp | |
457 | This helper can be used on TC egress path, but not on ingress. | |
458 | .sp | |
459 | The net_cls cgroup provides an interface to tag network packets | |
460 | based on a user\-provided identifier for all traffic coming from | |
461 | the tasks belonging to the related cgroup. See also the related | |
462 | kernel documentation, available from the Linux sources in file | |
e6107b29 | 463 | \fIDocumentation/admin\-guide/cgroup\-v1/net_cls.rst\fP\&. |
53666f6c MK |
464 | .sp |
465 | The Linux kernel has two versions for cgroups: there are | |
466 | cgroups v1 and cgroups v2. Both are available to users, who can | |
467 | use a mixture of them, but note that the net_cls cgroup is for | |
468 | cgroup v1 only. This makes it incompatible with BPF programs | |
469 | run on cgroups, which is a cgroup\-v2\-only feature (a socket can | |
470 | only hold data for one version of cgroups at a time). | |
471 | .sp | |
472 | This helper is only available if the kernel was compiled with | |
473 | the \fBCONFIG_CGROUP_NET_CLASSID\fP configuration option set to | |
474 | "\fBy\fP" or to "\fBm\fP". | |
475 | .TP | |
476 | .B Return | |
477 | The classid, or 0 for the default unconfigured classid. | |
478 | .UNINDENT | |
479 | .TP | |
480 | .B \fBint bpf_skb_vlan_push(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIvlan_proto\fP\fB, u16\fP \fIvlan_tci\fP\fB)\fP | |
481 | .INDENT 7.0 | |
482 | .TP | |
483 | .B Description | |
484 | Push a \fIvlan_tci\fP (VLAN tag control information) of protocol | |
485 | \fIvlan_proto\fP to the packet associated to \fIskb\fP, then update | |
486 | the checksum. Note that if \fIvlan_proto\fP is different from | |
487 | \fBETH_P_8021Q\fP and \fBETH_P_8021AD\fP, it is considered to | |
488 | be \fBETH_P_8021Q\fP\&. | |
489 | .sp | |
e6107b29 | 490 | A call to this helper may change the underlying |
53666f6c MK |
491 | packet buffer. Therefore, at load time, all checks on pointers |
492 | previously done by the verifier are invalidated and must be | |
493 | performed again, if the helper is used in combination with | |
494 | direct packet access. | |
495 | .TP | |
496 | .B Return | |
497 | 0 on success, or a negative error in case of failure. | |
498 | .UNINDENT | |
499 | .TP | |
500 | .B \fBint bpf_skb_vlan_pop(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
501 | .INDENT 7.0 | |
502 | .TP | |
503 | .B Description | |
504 | Pop a VLAN header from the packet associated to \fIskb\fP\&. | |
505 | .sp | |
e6107b29 | 506 | A call to this helper may change the underlying |
53666f6c MK |
507 | packet buffer. Therefore, at load time, all checks on pointers |
508 | previously done by the verifier are invalidated and must be | |
509 | performed again, if the helper is used in combination with | |
510 | direct packet access. | |
511 | .TP | |
512 | .B Return | |
513 | 0 on success, or a negative error in case of failure. | |
514 | .UNINDENT | |
515 | .TP | |
516 | .B \fBint bpf_skb_get_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
517 | .INDENT 7.0 | |
518 | .TP | |
519 | .B Description | |
520 | Get tunnel metadata. This helper takes a pointer \fIkey\fP to an | |
521 | empty \fBstruct bpf_tunnel_key\fP of \fIsize\fP, that will be | |
522 | filled with tunnel metadata for the packet associated to \fIskb\fP\&. | |
523 | The \fIflags\fP can be set to \fBBPF_F_TUNINFO_IPV6\fP, which | |
524 | indicates that the tunnel is based on IPv6 protocol instead of | |
525 | IPv4. | |
526 | .sp | |
527 | The \fBstruct bpf_tunnel_key\fP is an object that generalizes the | |
528 | principal parameters used by various tunneling protocols into a | |
529 | single struct. This way, it can be used to easily make a | |
530 | decision based on the contents of the encapsulation header, | |
531 | "summarized" in this struct. In particular, it holds the IP | |
532 | address of the remote end (IPv4 or IPv6, depending on the case) | |
533 | in \fIkey\fP\fB\->remote_ipv4\fP or \fIkey\fP\fB\->remote_ipv6\fP\&. Also, | |
534 | this struct exposes the \fIkey\fP\fB\->tunnel_id\fP, which is | |
535 | generally mapped to a VNI (Virtual Network Identifier), making | |
536 | it programmable together with the \fBbpf_skb_set_tunnel_key\fP() helper. | |
537 | .sp | |
538 | Let\(aqs imagine that the following code is part of a program | |
539 | attached to the TC ingress interface, on one end of a GRE | |
540 | tunnel, and is supposed to filter out all messages coming from | |
541 | remote ends with IPv4 address other than 10.0.0.1: | |
542 | .INDENT 7.0 | |
543 | .INDENT 3.5 | |
544 | .sp | |
545 | .nf | |
546 | .ft C | |
547 | int ret; | |
548 | struct bpf_tunnel_key key = {}; | |
549 | ||
550 | ret = bpf_skb_get_tunnel_key(skb, &key, sizeof(key), 0); | |
551 | if (ret < 0) | |
552 | return TC_ACT_SHOT; // drop packet | |
553 | ||
554 | if (key.remote_ipv4 != 0x0a000001) | |
555 | return TC_ACT_SHOT; // drop packet | |
556 | ||
557 | return TC_ACT_OK; // accept packet | |
558 | .ft P | |
559 | .fi | |
560 | .UNINDENT | |
561 | .UNINDENT | |
562 | .sp | |
563 | This interface can also be used with all encapsulation devices | |
564 | that can operate in "collect metadata" mode: instead of having | |
565 | one network device per specific configuration, the "collect | |
566 | metadata" mode only requires a single device where the | |
567 | configuration can be extracted from this helper. | |
568 | .sp | |
569 | This can be used together with various tunnels such as VXLAN, | |
570 | Geneve, GRE or IP in IP (IPIP). | |
571 | .TP | |
572 | .B Return | |
573 | 0 on success, or a negative error in case of failure. | |
574 | .UNINDENT | |
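The constant `0x0a000001` in the filtering example above is the address 10.0.0.1 written as a 32\-bit integer. A minimal sketch of how such a constant can be built from dotted\-quad octets follows; `ipv4_host_order()` is a hypothetical helper, and it assumes the field is compared as a host\-order integer, as the example's comparison suggests.

```c
#include <stdint.h>

/* Build the 32-bit integer form of the IPv4 address a.b.c.d. */
static uint32_t ipv4_host_order(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}
```

With this helper, the comparison could be written against `ipv4_host_order(10, 0, 0, 1)` instead of a bare hexadecimal literal.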
575 | .TP | |
576 | .B \fBint bpf_skb_set_tunnel_key(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_tunnel_key *\fP\fIkey\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
577 | .INDENT 7.0 | |
578 | .TP | |
579 | .B Description | |
580 | Populate tunnel metadata for packet associated to \fIskb\fP\&. The | |
581 | tunnel metadata is set to the contents of \fIkey\fP, of \fIsize\fP\&. The | |
582 | \fIflags\fP can be set to a combination of the following values: | |
583 | .INDENT 7.0 | |
584 | .TP | |
585 | .B \fBBPF_F_TUNINFO_IPV6\fP | |
586 | Indicate that the tunnel is based on IPv6 protocol | |
587 | instead of IPv4. | |
588 | .TP | |
589 | .B \fBBPF_F_ZERO_CSUM_TX\fP | |
590 | For IPv4 packets, add a flag to tunnel metadata | |
591 | indicating that checksum computation should be skipped | |
592 | and checksum set to zeroes. | |
593 | .TP | |
594 | .B \fBBPF_F_DONT_FRAGMENT\fP | |
595 | Add a flag to tunnel metadata indicating that the | |
596 | packet should not be fragmented. | |
597 | .TP | |
598 | .B \fBBPF_F_SEQ_NUMBER\fP | |
599 | Add a flag to tunnel metadata indicating that a | |
600 | sequence number should be added to tunnel header before | |
601 | sending the packet. This flag was added for GRE | |
602 | encapsulation, but might be used with other protocols | |
603 | as well in the future. | |
604 | .UNINDENT | |
605 | .sp | |
606 | Here is a typical usage on the transmit path: | |
607 | .INDENT 7.0 | |
608 | .INDENT 3.5 | |
609 | .sp | |
610 | .nf | |
611 | .ft C | |
612 | struct bpf_tunnel_key key = {}; | |
613 | /* populate key ... */ | |
614 | bpf_skb_set_tunnel_key(skb, &key, sizeof(key), 0); | |
615 | bpf_clone_redirect(skb, vxlan_dev_ifindex, 0); | |
616 | .ft P | |
617 | .fi | |
618 | .UNINDENT | |
619 | .UNINDENT | |
620 | .sp | |
621 | See also the description of the \fBbpf_skb_get_tunnel_key\fP() | |
622 | helper for additional information. | |
623 | .TP | |
624 | .B Return | |
625 | 0 on success, or a negative error in case of failure. | |
626 | .UNINDENT | |
627 | .TP | |
628 | .B \fBu64 bpf_perf_event_read(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
629 | .INDENT 7.0 | |
630 | .TP | |
631 | .B Description | |
632 | Read the value of a perf event counter. This helper relies on a | |
633 | \fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of | |
634 | the perf event counter is selected when \fImap\fP is updated with | |
635 | perf event file descriptors. The \fImap\fP is an array whose size | |
636 | is the number of available CPUs, and each cell contains a value | |
637 | relative to one CPU. The value to retrieve is indicated by | |
638 | \fIflags\fP, that contains the index of the CPU to look up, masked | |
639 | with \fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to | |
640 | \fBBPF_F_CURRENT_CPU\fP to indicate that the value for the | |
641 | current CPU should be retrieved. | |
642 | .sp | |
643 | Note that before Linux 4.13, only hardware perf events can be | |
644 | retrieved. | |
645 | .sp | |
646 | Also, be aware that the newer helper | |
647 | \fBbpf_perf_event_read_value\fP() is recommended over | |
648 | \fBbpf_perf_event_read\fP() in general. The latter has some ABI | |
649 | quirks where error and counter value are used as a return code | |
650 | (which is wrong to do since ranges may overlap). This issue is | |
651 | fixed with \fBbpf_perf_event_read_value\fP(), which at the same | |
652 | time provides more features over the \fBbpf_perf_event_read\fP() interface. Please refer to the description of | |
653 | \fBbpf_perf_event_read_value\fP() for details. | |
654 | .TP | |
655 | .B Return | |
656 | The value of the perf event counter read from the map, or a | |
657 | negative error code in case of failure. | |
658 | .UNINDENT | |
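The way \fIflags\fP selects a CPU for \fBbpf_perf_event_read\fP() can be sketched as follows. The two constants mirror include/uapi/linux/bpf.h; the helper functions are hypothetical and only illustrate the masking, they are not part of any kernel or library API.

```c
#include <stdint.h>

/* Values as defined in include/uapi/linux/bpf.h. */
#define BPF_F_INDEX_MASK  0xffffffffULL
#define BPF_F_CURRENT_CPU BPF_F_INDEX_MASK

/* Build flags that select the map cell of an explicit CPU index. */
static uint64_t perf_read_flags_for_cpu(uint32_t cpu)
{
    return (uint64_t)cpu & BPF_F_INDEX_MASK;
}

/* Tell whether flags ask for the cell of the CPU running the program. */
static int perf_read_wants_current_cpu(uint64_t flags)
{
    return (flags & BPF_F_INDEX_MASK) == BPF_F_CURRENT_CPU;
}
```

Because \fBBPF_F_CURRENT_CPU\fP is the all\-ones index, it cannot collide with a real CPU number in practice.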
659 | .TP | |
660 | .B \fBint bpf_redirect(u32\fP \fIifindex\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
661 | .INDENT 7.0 | |
662 | .TP | |
663 | .B Description | |
664 | Redirect the packet to another net device of index \fIifindex\fP\&. | |
665 | This helper is somewhat similar to \fBbpf_clone_redirect\fP(), except that the packet is not cloned, which provides | |
666 | increased performance. | |
667 | .sp | |
668 | Except for XDP, both ingress and egress interfaces can be used | |
669 | for redirection. The \fBBPF_F_INGRESS\fP value in \fIflags\fP is used | |
670 | to make the distinction (ingress path is selected if the flag | |
671 | is present, egress path otherwise). Currently, XDP only | |
672 | supports redirection to the egress interface, and accepts no | |
673 | flag at all. | |
674 | .sp | |
675 | The same effect can be attained with the more generic | |
676 | \fBbpf_redirect_map\fP(), which requires specific maps to be | |
677 | used but offers better performance. | |
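.sp
As a minimal sketch of the non\-XDP case, the program below redirects
every packet it sees to the egress path of another device. It assumes a
libbpf\-style build in which \fIbpf/bpf_helpers.h\fP provides \fBSEC\fP()
and the helper declaration, and a libbpf version that recognizes the
"tc" section name. \fBTARGET_IFINDEX\fP is a hypothetical placeholder,
not a value taken from this page:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical interface index, to be replaced by a real one. */
#define TARGET_IFINDEX 4

SEC("tc")
int redirect_to_egress(struct __sk_buff *skb)
{
        /* flags == 0 selects the egress path; passing
         * BPF_F_INGRESS instead would select the ingress path. */
        return bpf_redirect(TARGET_IFINDEX, 0);
}

char LICENSE[] SEC("license") = "GPL";
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
Returning the result of \fBbpf_redirect\fP() directly works because the
helper already yields a valid TC verdict.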
678 | .TP | |
679 | .B Return | |
680 | For XDP, the helper returns \fBXDP_REDIRECT\fP on success or | |
681 | \fBXDP_ABORTED\fP on error. For other program types, the values | |
682 | are \fBTC_ACT_REDIRECT\fP on success or \fBTC_ACT_SHOT\fP on | |
683 | error. | |
684 | .UNINDENT | |
685 | .TP | |
686 | .B \fBu32 bpf_get_route_realm(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
687 | .INDENT 7.0 | |
688 | .TP | |
689 | .B Description | |
690 | Retrieve the realm of the route, that is to say the | |
691 | \fBtclassid\fP field of the destination for the \fIskb\fP\&. The | |
692 | identifier retrieved is a user\-provided tag, similar to the | |
693 | one used with the net_cls cgroup (see description for | |
694 | \fBbpf_get_cgroup_classid\fP() helper), but here this tag is | |
695 | held by a route (a destination entry), not by a task. | |
696 | .sp | |
697 | Retrieving this identifier works with the clsact TC egress hook | |
698 | (see also \fBtc\-bpf(8)\fP), or alternatively on conventional | |
699 | classful egress qdiscs, but not on TC ingress path. In case of | |
700 | clsact TC egress hook, this has the advantage that, internally, | |
701 | the destination entry has not been dropped yet in the transmit | |
702 | path. Therefore, the destination entry does not need to be | |
703 | artificially held via \fBnetif_keep_dst\fP() for a classful | |
704 | qdisc until the \fIskb\fP is freed. | |
705 | .sp | |
706 | This helper is available only if the kernel was compiled with | |
707 | \fBCONFIG_IP_ROUTE_CLASSID\fP configuration option. | |
708 | .TP | |
709 | .B Return | |
710 | The realm of the route for the packet associated to \fIskb\fP, or 0 | |
711 | if none was found. | |
712 | .UNINDENT | |
713 | .TP | |
e6107b29 | 714 | .B \fBint bpf_perf_event_output(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, void *\fP\fIdata\fP\fB, u64\fP \fIsize\fP\fB)\fP |
53666f6c MK |
715 | .INDENT 7.0 |
716 | .TP | |
717 | .B Description | |
718 | Write raw \fIdata\fP blob into a special BPF perf event held by | |
719 | \fImap\fP of type \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. This perf | |
720 | event must have the following attributes: \fBPERF_SAMPLE_RAW\fP | |
721 | as \fBsample_type\fP, \fBPERF_TYPE_SOFTWARE\fP as \fBtype\fP, and | |
722 | \fBPERF_COUNT_SW_BPF_OUTPUT\fP as \fBconfig\fP\&. | |
723 | .sp | |
724 | The \fIflags\fP are used to indicate the index in \fImap\fP for which | |
725 | the value must be put, masked with \fBBPF_F_INDEX_MASK\fP\&. | |
726 | Alternatively, \fIflags\fP can be set to \fBBPF_F_CURRENT_CPU\fP | |
727 | to indicate that the index of the current CPU core should be | |
728 | used. | |
729 | .sp | |
730 | The value to write, of \fIsize\fP, is passed through the eBPF stack and | |
731 | pointed to by \fIdata\fP\&. | |
732 | .sp | |
733 | The context of the program \fIctx\fP must also be passed to the | |
734 | helper. | |
735 | .sp | |
736 | In user space, a program willing to read the values needs to | |
737 | call \fBperf_event_open\fP() on the perf event (either for | |
738 | one or for all CPUs) and to store the file descriptor into the | |
739 | \fImap\fP\&. This must be done before the eBPF program can send data | |
740 | into it. An example is available in file | |
741 | \fIsamples/bpf/trace_output_user.c\fP in the Linux kernel source | |
742 | tree (the eBPF program counterpart is in | |
743 | \fIsamples/bpf/trace_output_kern.c\fP). | |
744 | .sp | |
745 | \fBbpf_perf_event_output\fP() achieves better performance | |
746 | than \fBbpf_trace_printk\fP() for sharing data with user | |
747 | space, and is much better suited for streaming data from eBPF | |
748 | programs. | |
749 | .sp | |
750 | Note that this helper is not restricted to tracing use cases | |
751 | and can be used with programs attached to TC or XDP as well, | |
752 | where it allows for passing data to user space listeners. Data | |
753 | can be: | |
754 | .INDENT 7.0 | |
755 | .IP \(bu 2 | |
756 | Only custom structs, | |
757 | .IP \(bu 2 | |
758 | Only the packet payload, or | |
759 | .IP \(bu 2 | |
760 | A combination of both. | |
761 | .UNINDENT | |
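.sp
The eBPF side of the "custom struct" case could be sketched as follows.
This is an illustration only: it assumes a libbpf\-style build with
BTF\-defined maps (libbpf is expected to size the perf event array to
the number of CPUs when \fBmax_entries\fP is left unset), and the
\fBstruct event\fP layout, the map name \fBevents\fP and the
\fBdo_sys_open\fP attach point are hypothetical choices:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>

/* Hypothetical event layout shared with the user space reader. */
struct event {
        __u32 pid;
};

struct {
        __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, sizeof(__u32));
} events SEC(".maps");

SEC("kprobe/do_sys_open")
int trace_open(struct pt_regs *ctx)
{
        struct event e = {};

        e.pid = bpf_get_current_pid_tgid() >> 32;
        /* Emit the struct on the perf event of the current CPU. */
        bpf_perf_event_output(ctx, &events, BPF_F_CURRENT_CPU,
                              &e, sizeof(e));
        return 0;
}

char LICENSE[] SEC("license") = "GPL";
.ft P
.fi
.UNINDENT
.UNINDENT
.sp
The user space reader still has to open the perf events and store their
file descriptors into the map before any of this data becomes visible.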
762 | .TP | |
763 | .B Return | |
764 | 0 on success, or a negative error in case of failure. | |
765 | .UNINDENT | |
766 | .TP | |
767 | .B \fBint bpf_skb_load_bytes(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
768 | .INDENT 7.0 | |
769 | .TP | |
770 | .B Description | |
771 | This helper was provided as an easy way to load data from a | |
772 | packet. It can be used to load \fIlen\fP bytes from \fIoffset\fP from | |
773 | the packet associated to \fIskb\fP, into the buffer pointed to by | |
774 | \fIto\fP\&. | |
775 | .sp | |
776 | Since Linux 4.7, usage of this helper has mostly been replaced | |
777 | by "direct packet access", enabling packet data to be | |
778 | manipulated with \fIskb\fP\fB\->data\fP and \fIskb\fP\fB\->data_end\fP | |
779 | pointing respectively to the first byte of packet data and to | |
780 | the byte after the last byte of packet data. However, it | |
781 | remains useful if one wishes to read large quantities of data | |
782 | at once from a packet into the eBPF stack. | |
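.sp
A short sketch of the helper in a TC program follows. It copies the
Ethernet header onto the eBPF stack instead of using direct packet
access. The libbpf\-style headers and the policy applied to ARP frames
are assumptions made for the sake of the example:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("tc")
int drop_arp(struct __sk_buff *skb)
{
        struct ethhdr eth;

        /* Copy the Ethernet header from offset 0 onto the stack. */
        if (bpf_skb_load_bytes(skb, 0, &eth, sizeof(eth)) < 0)
                return TC_ACT_OK;       /* packet too short */

        /* Hypothetical policy: drop ARP, pass everything else. */
        if (eth.h_proto == bpf_htons(ETH_P_ARP))
                return TC_ACT_SHOT;
        return TC_ACT_OK;
}

char LICENSE[] SEC("license") = "GPL";
.ft P
.fi
.UNINDENT
.UNINDENT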
783 | .TP | |
784 | .B Return | |
785 | 0 on success, or a negative error in case of failure. | |
786 | .UNINDENT | |
787 | .TP | |
e6107b29 | 788 | .B \fBint bpf_get_stackid(struct pt_regs *\fP\fIctx\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP |
53666f6c MK |
789 | .INDENT 7.0 |
790 | .TP | |
791 | .B Description | |
792 | Walk a user or a kernel stack and return its id. To achieve | |
793 | this, the helper needs \fIctx\fP, which is a pointer to the context | |
794 | on which the tracing program is executed, and a pointer to a | |
795 | \fImap\fP of type \fBBPF_MAP_TYPE_STACK_TRACE\fP\&. | |
796 | .sp | |
797 | The last argument, \fIflags\fP, holds the number of stack frames to | |
798 | skip (from 0 to 255), masked with | |
799 | \fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set | |
800 | a combination of the following flags: | |
801 | .INDENT 7.0 | |
802 | .TP | |
803 | .B \fBBPF_F_USER_STACK\fP | |
804 | Collect a user space stack instead of a kernel stack. | |
805 | .TP | |
806 | .B \fBBPF_F_FAST_STACK_CMP\fP | |
807 | Compare stacks by hash only. | |
808 | .TP | |
809 | .B \fBBPF_F_REUSE_STACKID\fP | |
810 | If two different stacks hash into the same \fIstackid\fP, | |
811 | discard the old one. | |
812 | .UNINDENT | |
813 | .sp | |
814 | The stack id retrieved is a 32\-bit integer handle which | |
815 | can be further combined with other data (including other stack | |
816 | ids) and used as a key into maps. This can be useful for | |
817 | generating a variety of graphs (such as flame graphs or off\-cpu | |
818 | graphs). | |
819 | .sp | |
820 | For walking a stack, this helper is an improvement over | |
821 | \fBbpf_probe_read\fP(), which can be used with unrolled loops | |
822 | but is not efficient and consumes a lot of eBPF instructions. | |
823 | Instead, \fBbpf_get_stackid\fP() can collect up to | |
824 | \fBPERF_MAX_STACK_DEPTH\fP frames, for both kernel and user stacks. Note that | |
825 | this limit can be controlled with the \fBsysctl\fP program, and | |
826 | that it should be manually increased in order to profile long | |
827 | user stacks (such as stacks for Java programs). To do so, use: | |
828 | .INDENT 7.0 | |
829 | .INDENT 3.5 | |
830 | .sp | |
831 | .nf | |
832 | .ft C | |
833 | # sysctl kernel.perf_event_max_stack=<new value> | |
834 | .ft P | |
835 | .fi | |
836 | .UNINDENT | |
837 | .UNINDENT | |
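.sp
To illustrate how the skip count and the flags combine, the sketch
below counts hits per user space stack. It assumes a libbpf\-style
build with BTF\-defined maps. The map names, the map sizes, the
\fBdo_sys_open\fP attach point and the value 127 used for
\fBMAX_DEPTH\fP are illustrative assumptions:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <linux/bpf.h>
#include <linux/ptrace.h>
#include <bpf/bpf_helpers.h>

#define MAX_DEPTH 127   /* assumed default of PERF_MAX_STACK_DEPTH */

struct {
        __uint(type, BPF_MAP_TYPE_STACK_TRACE);
        __uint(key_size, sizeof(__u32));
        __uint(value_size, MAX_DEPTH * sizeof(__u64));
        __uint(max_entries, 1024);
} stacks SEC(".maps");

struct {
        __uint(type, BPF_MAP_TYPE_HASH);
        __type(key, __u32);     /* stack id */
        __type(value, __u64);   /* hit count */
        __uint(max_entries, 1024);
} counts SEC(".maps");

SEC("kprobe/do_sys_open")
int count_user_stacks(struct pt_regs *ctx)
{
        __u64 one = 1, *val;
        __u32 key;
        long id;

        /* Skip count 0 in the low bits, flag bits above them. */
        id = bpf_get_stackid(ctx, &stacks,
                             (0 & BPF_F_SKIP_FIELD_MASK) |
                             BPF_F_USER_STACK);
        if (id < 0)
                return 0;

        /* Use the stack id as a key into another map. */
        key = id;
        val = bpf_map_lookup_elem(&counts, &key);
        if (val)
                __sync_fetch_and_add(val, 1);
        else
                bpf_map_update_elem(&counts, &key, &one, BPF_NOEXIST);
        return 0;
}

char LICENSE[] SEC("license") = "GPL";
.ft P
.fi
.UNINDENT
.UNINDENT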
838 | .TP | |
839 | .B Return | |
840 | The positive or null stack id on success, or a negative error | |
841 | in case of failure. | |
842 | .UNINDENT | |
843 | .TP | |
844 | .B \fBs64 bpf_csum_diff(__be32 *\fP\fIfrom\fP\fB, u32\fP \fIfrom_size\fP\fB, __be32 *\fP\fIto\fP\fB, u32\fP \fIto_size\fP\fB, __wsum\fP \fIseed\fP\fB)\fP | |
845 | .INDENT 7.0 | |
846 | .TP | |
847 | .B Description | |
848 | Compute a checksum difference, from the raw buffer pointed to by | |
849 | \fIfrom\fP, of length \fIfrom_size\fP (which must be a multiple of 4), | |
850 | towards the raw buffer pointed to by \fIto\fP, of size \fIto_size\fP | |
851 | (same remark). An optional \fIseed\fP can be added to the value | |
852 | (this can be cascaded, the seed may come from a previous call | |
853 | to the helper). | |
854 | .sp | |
855 | This is flexible enough to be used in several ways: | |
856 | .INDENT 7.0 | |
857 | .IP \(bu 2 | |
858 | With \fIfrom_size\fP == 0, \fIto_size\fP > 0 and \fIseed\fP set to | |
859 | checksum, it can be used when pushing new data. | |
860 | .IP \(bu 2 | |
861 | With \fIfrom_size\fP > 0, \fIto_size\fP == 0 and \fIseed\fP set to | |
862 | checksum, it can be used when removing data from a packet. | |
863 | .IP \(bu 2 | |
864 | With \fIfrom_size\fP > 0, \fIto_size\fP > 0 and \fIseed\fP set to 0, it | |
865 | can be used to compute a diff. Note that \fIfrom_size\fP and | |
866 | \fIto_size\fP do not need to be equal. | |
867 | .UNINDENT | |
868 | .sp | |
869 | This helper can be used in combination with | |
870 | \fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(), to | |
871 | which one can feed in the difference computed with | |
872 | \fBbpf_csum_diff\fP(). | |
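.sp
The "compute a diff" case, fed into \fBbpf_l3_csum_replace\fP(), might
look like the sketch below. It is not a complete NAT implementation: it
assumes untagged Ethernet frames and a libbpf\-style build, the new
address is a hypothetical constant, and the layer 4 checksum (which
also covers the addresses through the pseudo\-header) is deliberately
left out:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <stddef.h>
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

#define IP_DADDR_OFF (ETH_HLEN + offsetof(struct iphdr, daddr))
#define IP_CSUM_OFF  (ETH_HLEN + offsetof(struct iphdr, check))

SEC("tc")
int rewrite_daddr(struct __sk_buff *skb)
{
        /* Hypothetical new destination: 10.0.0.1 */
        __be32 new_ip = bpf_htonl(0x0a000001);
        __be32 old_ip;
        __s64 diff;

        if (skb\->protocol != bpf_htons(ETH_P_IP))
                return TC_ACT_OK;
        if (bpf_skb_load_bytes(skb, IP_DADDR_OFF,
                               &old_ip, sizeof(old_ip)) < 0)
                return TC_ACT_OK;

        /* Difference between the old and new 4 bytes, seed 0. */
        diff = bpf_csum_diff(&old_ip, sizeof(old_ip),
                             &new_ip, sizeof(new_ip), 0);
        if (diff < 0)
                return TC_ACT_OK;

        bpf_skb_store_bytes(skb, IP_DADDR_OFF,
                            &new_ip, sizeof(new_ip), 0);
        /* from and size set to 0: "to" carries the difference. */
        bpf_l3_csum_replace(skb, IP_CSUM_OFF, 0, diff, 0);
        return TC_ACT_OK;
}

char LICENSE[] SEC("license") = "GPL";
.ft P
.fi
.UNINDENT
.UNINDENT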
873 | .TP | |
874 | .B Return | |
875 | The checksum result, or a negative error code in case of | |
876 | failure. | |
877 | .UNINDENT | |
878 | .TP | |
879 | .B \fBint bpf_skb_get_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP | |
880 | .INDENT 7.0 | |
881 | .TP | |
882 | .B Description | |
883 | Retrieve tunnel options metadata for the packet associated to | |
884 | \fIskb\fP, and store the raw tunnel option data to the buffer \fIopt\fP | |
885 | of \fIsize\fP\&. | |
886 | .sp | |
887 | This helper can be used with encapsulation devices that can | |
888 | operate in "collect metadata" mode (please refer to the related | |
889 | note in the description of \fBbpf_skb_get_tunnel_key\fP() for | |
890 | more details). A particular example where this can be used is | |
891 | in combination with the Geneve encapsulation protocol, where it | |
892 | allows for pushing (with \fBbpf_skb_set_tunnel_opt\fP() helper) | |
893 | and retrieving arbitrary TLVs (Type\-Length\-Value headers) from | |
894 | the eBPF program. This allows for full customization of these | |
895 | headers. | |
896 | .TP | |
897 | .B Return | |
898 | The size of the option data retrieved. | |
899 | .UNINDENT | |
900 | .TP | |
901 | .B \fBint bpf_skb_set_tunnel_opt(struct sk_buff *\fP\fIskb\fP\fB, u8 *\fP\fIopt\fP\fB, u32\fP \fIsize\fP\fB)\fP | |
902 | .INDENT 7.0 | |
903 | .TP | |
904 | .B Description | |
905 | Set tunnel options metadata for the packet associated to \fIskb\fP | |
906 | to the option data contained in the raw buffer \fIopt\fP of \fIsize\fP\&. | |
907 | .sp | |
908 | See also the description of the \fBbpf_skb_get_tunnel_opt\fP() | |
909 | helper for additional information. | |
910 | .TP | |
911 | .B Return | |
912 | 0 on success, or a negative error in case of failure. | |
913 | .UNINDENT | |
914 | .TP | |
915 | .B \fBint bpf_skb_change_proto(struct sk_buff *\fP\fIskb\fP\fB, __be16\fP \fIproto\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
916 | .INDENT 7.0 | |
917 | .TP | |
918 | .B Description | |
919 | Change the protocol of the \fIskb\fP to \fIproto\fP\&. Currently | |
920 | supported are transition from IPv4 to IPv6, and from IPv6 to | |
921 | IPv4. The helper takes care of the groundwork for the | |
922 | transition, including resizing the socket buffer. The eBPF | |
923 | program is expected to fill the new headers, if any, via | |
924 | \fBbpf_skb_store_bytes\fP() and to recompute the checksums with | |
925 | \fBbpf_l3_csum_replace\fP() and \fBbpf_l4_csum_replace\fP(). The main case for this helper is to perform NAT64 | |
926 | operations out of an eBPF program. | |
927 | .sp | |
928 | Internally, the GSO type is marked as dodgy so that headers are | |
929 | checked and segments are recalculated by the GSO/GRO engine. | |
930 | The size for GSO target is adapted as well. | |
931 | .sp | |
932 | All values for \fIflags\fP are reserved for future usage, and must | |
933 | be left at zero. | |
934 | .sp | |
e6107b29 | 935 | A call to this helper may change the underlying |
53666f6c MK |
936 | packet buffer. Therefore, at load time, all checks on pointers |
937 | previously done by the verifier are invalidated and must be | |
938 | performed again, if the helper is used in combination with | |
939 | direct packet access. | |
940 | .TP | |
941 | .B Return | |
942 | 0 on success, or a negative error in case of failure. | |
943 | .UNINDENT | |
944 | .TP | |
945 | .B \fBint bpf_skb_change_type(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB)\fP | |
946 | .INDENT 7.0 | |
947 | .TP | |
948 | .B Description | |
949 | Change the packet type for the packet associated to \fIskb\fP\&. This | |
950 | comes down to setting \fIskb\fP\fB\->pkt_type\fP to \fItype\fP, except | |
951 | the eBPF program has no write access to \fIskb\fP\fB\->pkt_type\fP other than through this helper. Using a helper here allows | |
952 | for graceful handling of errors. | |
953 | .sp | |
954 | The major use case is to change incoming \fIskb\fPs to | |
955 | \fBPACKET_HOST\fP in a programmatic way instead of having to | |
956 | recirculate via \fBbpf_redirect\fP(..., \fBBPF_F_INGRESS\fP), for | |
957 | example. | |
958 | .sp | |
959 | Note that \fItype\fP only allows certain values. At this time, they | |
960 | are: | |
961 | .INDENT 7.0 | |
962 | .TP | |
963 | .B \fBPACKET_HOST\fP | |
964 | Packet is for us. | |
965 | .TP | |
966 | .B \fBPACKET_BROADCAST\fP | |
967 | Send packet to all. | |
968 | .TP | |
969 | .B \fBPACKET_MULTICAST\fP | |
970 | Send packet to group. | |
971 | .TP | |
972 | .B \fBPACKET_OTHERHOST\fP | |
973 | Send packet to someone else. | |
974 | .UNINDENT | |
975 | .TP | |
976 | .B Return | |
977 | 0 on success, or a negative error in case of failure. | |
978 | .UNINDENT | |
979 | .TP | |
980 | .B \fBint bpf_skb_under_cgroup(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP | |
981 | .INDENT 7.0 | |
982 | .TP | |
983 | .B Description | |
984 | Check whether \fIskb\fP is a descendant of the cgroup2 held by | |
985 | \fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&. | |
986 | .TP | |
987 | .B Return | |
988 | The return value depends on the result of the test, and can be: | |
989 | .INDENT 7.0 | |
990 | .IP \(bu 2 | |
991 | 0, if the \fIskb\fP failed the cgroup2 descendant test. | |
992 | .IP \(bu 2 | |
993 | 1, if the \fIskb\fP passed the cgroup2 descendant test. | |
994 | .IP \(bu 2 | |
995 | A negative error code, if an error occurred. | |
996 | .UNINDENT | |
997 | .UNINDENT | |
998 | .TP | |
999 | .B \fBu32 bpf_get_hash_recalc(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
1000 | .INDENT 7.0 | |
1001 | .TP | |
1002 | .B Description | |
1003 | Retrieve the hash of the packet, \fIskb\fP\fB\->hash\fP\&. If it is | |
1004 | not set, in particular if the hash was cleared due to mangling, | |
1005 | recompute this hash. Later accesses to the hash can be done | |
1006 | directly with \fIskb\fP\fB\->hash\fP\&. | |
1007 | .sp | |
1008 | Calling \fBbpf_set_hash_invalid\fP(), changing a packet | |
1009 | protocol with \fBbpf_skb_change_proto\fP(), or calling | |
1010 | \fBbpf_skb_store_bytes\fP() with the | |
1011 | \fBBPF_F_INVALIDATE_HASH\fP flag are actions that may clear | |
1012 | the hash and trigger a new computation for the next call to | |
1013 | \fBbpf_get_hash_recalc\fP(). | |
1014 | .TP | |
1015 | .B Return | |
1016 | The 32\-bit hash. | |
1017 | .UNINDENT | |
1018 | .TP | |
1019 | .B \fBu64 bpf_get_current_task(void)\fP | |
1020 | .INDENT 7.0 | |
1021 | .TP | |
1022 | .B Return | |
1023 | A pointer to the current task struct. | |
1024 | .UNINDENT | |
1025 | .TP | |
1026 | .B \fBint bpf_probe_write_user(void *\fP\fIdst\fP\fB, const void *\fP\fIsrc\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
1027 | .INDENT 7.0 | |
1028 | .TP | |
1029 | .B Description | |
1030 | Attempt in a safe way to write \fIlen\fP bytes from the buffer | |
1031 | \fIsrc\fP to \fIdst\fP in memory. It only works for threads that are in | |
1032 | user context, and \fIdst\fP must be a valid user space address. | |
1033 | .sp | |
1034 | This helper should not be used to implement any kind of | |
1035 | security mechanism because of TOC\-TOU attacks, but rather to | |
1036 | debug, divert, and manipulate execution of semi\-cooperative | |
1037 | processes. | |
1038 | .sp | |
1039 | Keep in mind that this feature is meant for experiments, and it | |
1040 | has a risk of crashing the system and running programs. | |
1041 | Therefore, when an eBPF program using this helper is attached, | |
1042 | a warning including PID and process name is printed to kernel | |
1043 | logs. | |
1044 | .TP | |
1045 | .B Return | |
1046 | 0 on success, or a negative error in case of failure. | |
1047 | .UNINDENT | |
1048 | .TP | |
1049 | .B \fBint bpf_current_task_under_cgroup(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIindex\fP\fB)\fP | |
1050 | .INDENT 7.0 | |
1051 | .TP | |
1052 | .B Description | |
1053 | Check whether the probe is being run in the context of a given | |
1054 | subset of the cgroup2 hierarchy. The cgroup2 to test is held by | |
1055 | \fImap\fP of type \fBBPF_MAP_TYPE_CGROUP_ARRAY\fP, at \fIindex\fP\&. | |
1056 | .TP | |
1057 | .B Return | |
1058 | The return value depends on the result of the test, and can be: | |
1059 | .INDENT 7.0 | |
1060 | .IP \(bu 2 | |
1061 | 1, if the current task belongs to the cgroup2. | |
1062 | .IP \(bu 2 | |
1063 | 0, if the current task does not belong to the cgroup2. | |
1064 | .IP \(bu 2 | |
1065 | A negative error code, if an error occurred. | |
1066 | .UNINDENT | |
1067 | .UNINDENT | |
1068 | .TP | |
1069 | .B \fBint bpf_skb_change_tail(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1070 | .INDENT 7.0 | |
1071 | .TP | |
1072 | .B Description | |
1073 | Resize (trim or grow) the packet associated to \fIskb\fP to the | |
1074 | new \fIlen\fP\&. The \fIflags\fP are reserved for future usage, and must | |
1075 | be left at zero. | |
1076 | .sp | |
1077 | The basic idea is that the helper performs the needed work to | |
1078 | change the size of the packet, then the eBPF program rewrites | |
1079 | the rest via helpers like \fBbpf_skb_store_bytes\fP(), | |
1080 | \fBbpf_l3_csum_replace\fP(), \fBbpf_l4_csum_replace\fP() | |
1081 | and others. This helper is a slow path utility intended for | |
1082 | replies with control messages. And because it is targeted for | |
1083 | slow path, the helper itself can afford to be slow: it | |
1084 | implicitly linearizes, unclones and drops offloads from the | |
1085 | \fIskb\fP\&. | |
1086 | .sp | |
e6107b29 | 1087 | A call to this helper may change the underlying |
53666f6c MK |
1088 | packet buffer. Therefore, at load time, all checks on pointers |
1089 | previously done by the verifier are invalidated and must be | |
1090 | performed again, if the helper is used in combination with | |
1091 | direct packet access. | |
1092 | .TP | |
1093 | .B Return | |
1094 | 0 on success, or a negative error in case of failure. | |
1095 | .UNINDENT | |
1096 | .TP | |
1097 | .B \fBint bpf_skb_pull_data(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
1098 | .INDENT 7.0 | |
1099 | .TP | |
1100 | .B Description | |
1101 | Pull in non\-linear data in case the \fIskb\fP is non\-linear and not | |
1102 | all of \fIlen\fP are part of the linear section. Make \fIlen\fP bytes | |
1103 | from \fIskb\fP readable and writable. If a zero value is passed for | |
1104 | \fIlen\fP, then the whole length of the \fIskb\fP is pulled. | |
1105 | .sp | |
1106 | This helper is only needed for reading and writing with direct | |
1107 | packet access. | |
1108 | .sp | |
1109 | For direct packet access, testing that offsets to access | |
1110 | are within packet boundaries (test on \fIskb\fP\fB\->data_end\fP) | |
1111 | may fail if offsets are invalid, or if the requested | |
1112 | data is in non\-linear parts of the \fIskb\fP\&. On failure the | |
1113 | program can just bail out, or in the case of a non\-linear | |
1114 | buffer, use a helper to make the data available. The | |
1115 | \fBbpf_skb_load_bytes\fP() helper is a first solution to access | |
1116 | the data. Another one consists in using \fBbpf_skb_pull_data\fP() | |
1117 | to pull in the non\-linear parts once, then retesting and | |
1118 | finally accessing the data. | |
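.sp
The pull\-then\-retest approach described above might be sketched as
follows for a TC program. The libbpf\-style headers and the policy
applied to ARP frames are assumptions, and the same pattern applies to
headers located deeper in the packet:
.INDENT 7.0
.INDENT 3.5
.sp
.nf
.ft C
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("tc")
int parse_eth(struct __sk_buff *skb)
{
        void *data = (void *)(long)skb\->data;
        void *data_end = (void *)(long)skb\->data_end;
        struct ethhdr *eth;

        if (data + sizeof(*eth) > data_end) {
                /* The header may sit in a non\-linear part. */
                if (bpf_skb_pull_data(skb, sizeof(*eth)) < 0)
                        return TC_ACT_OK;

                /* Old pointers are invalid: reload and retest. */
                data = (void *)(long)skb\->data;
                data_end = (void *)(long)skb\->data_end;
                if (data + sizeof(*eth) > data_end)
                        return TC_ACT_OK;
        }

        eth = data;
        /* Hypothetical policy: drop ARP, pass everything else. */
        if (eth\->h_proto == bpf_htons(ETH_P_ARP))
                return TC_ACT_SHOT;
        return TC_ACT_OK;
}

char LICENSE[] SEC("license") = "GPL";
.ft P
.fi
.UNINDENT
.UNINDENT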
1119 | .sp | |
1120 | At the same time, this also makes sure the \fIskb\fP is uncloned, | |
1121 | which is a necessary condition for direct write. As this needs | |
1122 | to be an invariant for the write part only, the verifier | |
1123 | detects writes and adds a prologue that calls | |
1124 | \fBbpf_skb_pull_data()\fP to effectively unclone the \fIskb\fP from | |
1125 | the very beginning in case it is indeed cloned. | |
1126 | .sp | |
e6107b29 | 1127 | A call to this helper may change the underlying |
53666f6c MK |
1128 | packet buffer. Therefore, at load time, all checks on pointers |
1129 | previously done by the verifier are invalidated and must be | |
1130 | performed again, if the helper is used in combination with | |
1131 | direct packet access. | |
1132 | .TP | |
1133 | .B Return | |
1134 | 0 on success, or a negative error in case of failure. | |
1135 | .UNINDENT | |
1136 | .TP | |
1137 | .B \fBs64 bpf_csum_update(struct sk_buff *\fP\fIskb\fP\fB, __wsum\fP \fIcsum\fP\fB)\fP | |
1138 | .INDENT 7.0 | |
1139 | .TP | |
1140 | .B Description | |
1141 | Add the checksum \fIcsum\fP into \fIskb\fP\fB\->csum\fP in case the | |
1142 | driver has supplied a checksum for the entire packet into that | |
1143 | field. Return an error otherwise. This helper is intended to be | |
1144 | used in combination with \fBbpf_csum_diff\fP(), in particular | |
1145 | when the checksum needs to be updated after data has been | |
1146 | written into the packet through direct packet access. | |
1147 | .TP | |
1148 | .B Return | |
1149 | The checksum on success, or a negative error code in case of | |
1150 | failure. | |
1151 | .UNINDENT | |
1152 | .TP | |
1153 | .B \fBvoid bpf_set_hash_invalid(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
1154 | .INDENT 7.0 | |
1155 | .TP | |
1156 | .B Description | |
1157 | Invalidate the current \fIskb\fP\fB\->hash\fP\&. It can be used after | |
1158 | mangling on headers through direct packet access, in order to | |
1159 | indicate that the hash is outdated and to trigger a | |
1160 | recalculation the next time the kernel tries to access this | |
1161 | hash or when the \fBbpf_get_hash_recalc\fP() helper is called. | |
1162 | .UNINDENT | |
1163 | .TP | |
1164 | .B \fBint bpf_get_numa_node_id(void)\fP | |
1165 | .INDENT 7.0 | |
1166 | .TP | |
1167 | .B Description | |
1168 | Return the id of the current NUMA node. The primary use case | |
1169 | for this helper is the selection of sockets for the local NUMA | |
1170 | node, when the program is attached to sockets using the | |
1171 | \fBSO_ATTACH_REUSEPORT_EBPF\fP option (see also \fBsocket(7)\fP), | |
1172 | but the helper is also available to other eBPF program types, | |
1173 | similarly to \fBbpf_get_smp_processor_id\fP(). | |
1174 | .TP | |
1175 | .B Return | |
1176 | The id of current NUMA node. | |
1177 | .UNINDENT | |
1178 | .TP | |
1179 | .B \fBint bpf_skb_change_head(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1180 | .INDENT 7.0 | |
1181 | .TP | |
1182 | .B Description | |
1183 | Grow the headroom of the packet associated to \fIskb\fP and adjust the | |
1184 | offset of the MAC header accordingly, adding \fIlen\fP bytes of | |
1185 | space. It automatically extends and reallocates memory as | |
1186 | required. | |
1187 | .sp | |
1188 | This helper can be used on a layer 3 \fIskb\fP to push a MAC header | |
1189 | for redirection into a layer 2 device. | |
1190 | .sp | |
1191 | All values for \fIflags\fP are reserved for future usage, and must | |
1192 | be left at zero. | |
1193 | .sp | |
e6107b29 | 1194 | A call to this helper may change the underlying |
53666f6c MK |
1195 | packet buffer. Therefore, at load time, all checks on pointers |
1196 | previously done by the verifier are invalidated and must be | |
1197 | performed again, if the helper is used in combination with | |
1198 | direct packet access. | |
1199 | .TP | |
1200 | .B Return | |
1201 | 0 on success, or a negative error in case of failure. | |
1202 | .UNINDENT | |
1203 | .TP | |
1204 | .B \fBint bpf_xdp_adjust_head(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP | |
1205 | .INDENT 7.0 | |
1206 | .TP | |
1207 | .B Description | |
1208 | Adjust (move) \fIxdp_md\fP\fB\->data\fP by \fIdelta\fP bytes. Note that | |
1209 | it is possible to use a negative value for \fIdelta\fP\&. This helper | |
1210 | can be used to prepare the packet for pushing or popping | |
1211 | headers. | |
1212 | .sp | |
e6107b29 | 1213 | A call to this helper may change the underlying |
53666f6c MK |
1214 | packet buffer. Therefore, at load time, all checks on pointers |
1215 | previously done by the verifier are invalidated and must be | |
1216 | performed again, if the helper is used in combination with | |
1217 | direct packet access. | |
1218 | .TP | |
1219 | .B Return | |
1220 | 0 on success, or a negative error in case of failure. | |
1221 | .UNINDENT | |
1222 | .TP | |
1223 | .B \fBint bpf_probe_read_str(void *\fP\fIdst\fP\fB, int\fP \fIsize\fP\fB, const void *\fP\fIunsafe_ptr\fP\fB)\fP | |
1224 | .INDENT 7.0 | |
1225 | .TP | |
1226 | .B Description | |
1227 | Copy a NUL terminated string from an unsafe address | |
1228 | \fIunsafe_ptr\fP to \fIdst\fP\&. The \fIsize\fP should include the | |
1229 | terminating NUL byte. In case the string length is smaller than | |
1230 | \fIsize\fP, the target is not padded with further NUL bytes. If the | |
1231 | string length is larger than \fIsize\fP, just \fIsize\fP\-1 bytes are | |
1232 | copied and the last byte is set to NUL. | |
1233 | .sp | |
1234 | On success, the length of the copied string is returned. This | |
1235 | makes this helper useful in tracing programs for reading | |
1236 | strings, and more importantly to get their length at runtime. See | |
1237 | the following snippet: | |
1238 | .INDENT 7.0 | |
1239 | .INDENT 3.5 | |
1240 | .sp | |
1241 | .nf | |
1242 | .ft C | |
1243 | SEC("kprobe/sys_open") | |
1244 | int bpf_sys_open(struct pt_regs *ctx) | |
e6107b29 | 1245 | { |
53666f6c MK |
1246 | char buf[PATHLEN]; // PATHLEN is defined to 256 | |
1247 | int res = bpf_probe_read_str(buf, sizeof(buf), | |
1248 | (const void *)ctx\->di); | |
1249 | ||
1250 | // Consume buf, for example push it to userspace via | |
1251 | // bpf_perf_event_output(); res (the string length) can | |
1252 | // be used as event size, after checking its boundaries. | |
1253 | return 0; | |
e6107b29 | 1254 | } |
53666f6c MK |
1255 | .ft P |
1256 | .fi | |
1257 | .UNINDENT | |
1258 | .UNINDENT | |
1259 | .sp | |
1260 | In comparison, using the \fBbpf_probe_read()\fP helper here instead | |
1261 | to read the string would require estimating the length at | |
1262 | compile time, and would often result in copying more memory | |
1263 | than necessary. | |
1264 | .sp | |
1265 | Another useful use case is when parsing individual process | |
1266 | arguments or individual environment variables navigating | |
1267 | \fIcurrent\fP\fB\->mm\->arg_start\fP and \fIcurrent\fP\fB\->mm\->env_start\fP: using this helper and the return value, | |
1268 | one can quickly iterate at the right offset of the memory area. | |
1269 | .TP | |
1270 | .B Return | |
1271 | On success, the strictly positive length of the string, | |
1272 | including the trailing NUL character. On error, a negative | |
1273 | value. | |
1274 | .UNINDENT | |
1275 | .TP | |
1276 | .B \fBu64 bpf_get_socket_cookie(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
1277 | .INDENT 7.0 | |
1278 | .TP | |
1279 | .B Description | |
1280 | If the \fBstruct sk_buff\fP pointed to by \fIskb\fP has a known socket, | |
1281 | retrieve the cookie (generated by the kernel) of this socket. | |
1282 | If no cookie has been set yet, generate a new cookie. Once | |
1283 | generated, the socket cookie remains stable for the life of the | |
1284 | socket. This helper can be useful for monitoring per socket | |
e6107b29 MK |
1285 | networking traffic statistics as it provides a global socket |
1286 | identifier that can be assumed unique. | |
53666f6c MK |
1287 | .TP |
1288 | .B Return | |
1289 | An 8\-byte long non\-decreasing number on success, or 0 if the | |
1290 | socket field is missing inside \fIskb\fP\&. | |
1291 | .UNINDENT | |
1292 | .TP | |
1293 | .B \fBu64 bpf_get_socket_cookie(struct bpf_sock_addr *\fP\fIctx\fP\fB)\fP | |
1294 | .INDENT 7.0 | |
1295 | .TP | |
1296 | .B Description | |
1297 | Equivalent to the \fBbpf_get_socket_cookie\fP() helper that accepts | |
e6107b29 | 1298 | \fIskb\fP, but gets socket from \fBstruct bpf_sock_addr\fP context. |
53666f6c MK |
1299 | .TP |
1300 | .B Return | |
1301 | An 8\-byte long non\-decreasing number. | |
1302 | .UNINDENT | |
1303 | .TP | |
1304 | .B \fBu64 bpf_get_socket_cookie(struct bpf_sock_ops *\fP\fIctx\fP\fB)\fP | |
1305 | .INDENT 7.0 | |
1306 | .TP | |
1307 | .B Description | |
1308 | Equivalent to the \fBbpf_get_socket_cookie\fP() helper that accepts | |
e6107b29 | 1309 | \fIskb\fP, but gets socket from \fBstruct bpf_sock_ops\fP context. |
53666f6c MK |
1310 | .TP |
1311 | .B Return | |
1312 | An 8\-byte long non\-decreasing number. | |
1313 | .UNINDENT | |
1314 | .TP | |
1315 | .B \fBu32 bpf_get_socket_uid(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
1316 | .INDENT 7.0 | |
1317 | .TP | |
1318 | .B Return | |
1319 | The owner UID of the socket associated to \fIskb\fP\&. If the socket | |
1320 | is \fBNULL\fP, or if it is not a full socket (i.e. if it is a | |
1321 | time\-wait or a request socket instead), \fBoverflowuid\fP value | |
1322 | is returned (note that \fBoverflowuid\fP might also be the actual | |
1323 | UID value for the socket). | |
1324 | .UNINDENT | |
1325 | .TP | |
1326 | .B \fBu32 bpf_set_hash(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIhash\fP\fB)\fP | |
1327 | .INDENT 7.0 | |
1328 | .TP | |
1329 | .B Description | |
1330 | Set the full hash for \fIskb\fP (set the field \fIskb\fP\fB\->hash\fP) | |
1331 | to value \fIhash\fP\&. | |
1332 | .TP | |
1333 | .B Return | |
e6107b29 | 1334 | 0 |
53666f6c MK |
1335 | .UNINDENT |
1336 | .TP | |
1337 | .B \fBint bpf_setsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP | |
1338 | .INDENT 7.0 | |
1339 | .TP | |
1340 | .B Description | |
1341 | Emulate a call to \fBsetsockopt()\fP on the socket associated to | |
1342 | \fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at | |
1343 | which the option resides and the name \fIoptname\fP of the option | |
1344 | must be specified, see \fBsetsockopt(2)\fP for more information. | |
1345 | The option value of length \fIoptlen\fP is pointed to by \fIoptval\fP\&. | |
1346 | .sp | |
1347 | This helper actually implements a subset of \fBsetsockopt()\fP\&. | |
1348 | It supports the following \fIlevel\fPs: | |
1349 | .INDENT 7.0 | |
1350 | .IP \(bu 2 | |
1351 | \fBSOL_SOCKET\fP, which supports the following \fIoptname\fPs: | |
1352 | \fBSO_RCVBUF\fP, \fBSO_SNDBUF\fP, \fBSO_MAX_PACING_RATE\fP, | |
1353 | \fBSO_PRIORITY\fP, \fBSO_RCVLOWAT\fP, \fBSO_MARK\fP\&. | |
1354 | .IP \(bu 2 | |
1355 | \fBIPPROTO_TCP\fP, which supports the following \fIoptname\fPs: | |
1356 | \fBTCP_CONGESTION\fP, \fBTCP_BPF_IW\fP, | |
1357 | \fBTCP_BPF_SNDCWND_CLAMP\fP\&. | |
1358 | .IP \(bu 2 | |
1359 | \fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&. | |
1360 | .IP \(bu 2 | |
1361 | \fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&. | |
1362 | .UNINDENT | |
1363 | .TP | |
1364 | .B Return | |
1365 | 0 on success, or a negative error in case of failure. | |
1366 | .UNINDENT | |
1367 | .TP | |
2223d7df | 1368 | .B \fBint bpf_skb_adjust_room(struct sk_buff *\fP\fIskb\fP\fB, s32\fP \fIlen_diff\fP\fB, u32\fP \fImode\fP\fB, u64\fP \fIflags\fP\fB)\fP |
53666f6c MK |
1369 | .INDENT 7.0 |
1370 | .TP | |
1371 | .B Description | |
1372 | Grow or shrink the room for data in the packet associated to | |
1373 | \fIskb\fP by \fIlen_diff\fP, and according to the selected \fImode\fP\&. | |
1374 | .sp | |
e6107b29 | 1375 | There are two supported modes at this time: |
53666f6c MK |
1376 | .INDENT 7.0 |
1377 | .IP \(bu 2 | |
e6107b29 MK |
1378 | \fBBPF_ADJ_ROOM_MAC\fP: Adjust room at the mac layer |
1379 | (room space is added or removed below the layer 2 header). | |
1380 | .IP \(bu 2 | |
53666f6c MK |
1381 | \fBBPF_ADJ_ROOM_NET\fP: Adjust room at the network layer |
1382 | (room space is added or removed below the layer 3 header). | |
1383 | .UNINDENT | |
1384 | .sp | |
e6107b29 MK |
1385 | The following flags are supported at this time: |
1386 | .INDENT 7.0 | |
1387 | .IP \(bu 2 | |
1388 | \fBBPF_F_ADJ_ROOM_FIXED_GSO\fP: Do not adjust gso_size. | |
1389 | Adjusting mss in this way is not allowed for datagrams. | |
1390 | .IP \(bu 2 | |
1391 | \fBBPF_F_ADJ_ROOM_ENCAP_L3_IPV4\fP, | |
1392 | \fBBPF_F_ADJ_ROOM_ENCAP_L3_IPV6\fP: | |
1393 | Any new space is reserved to hold a tunnel header. | |
1394 | Configure skb offsets and other fields accordingly. | |
1395 | .IP \(bu 2 | |
1396 | \fBBPF_F_ADJ_ROOM_ENCAP_L4_GRE\fP, | |
1397 | \fBBPF_F_ADJ_ROOM_ENCAP_L4_UDP\fP: | |
1398 | Use with ENCAP_L3 flags to further specify the tunnel type. | |
1399 | .IP \(bu 2 | |
1400 | \fBBPF_F_ADJ_ROOM_ENCAP_L2\fP(\fIlen\fP): | |
1401 | Use with ENCAP_L3/L4 flags to further specify the tunnel | |
1402 | type; \fIlen\fP is the length of the inner MAC header. | |
1403 | .UNINDENT | |
53666f6c | 1404 | .sp |
e6107b29 | 1405 | A call to this helper is susceptible to change the underlying |
53666f6c MK |
1406 | packet buffer. Therefore, at load time, all checks on pointers |
1407 | previously done by the verifier are invalidated and must be | |
1408 | performed again, if the helper is used in combination with | |
1409 | direct packet access. | |
1410 | .TP | |
1411 | .B Return | |
1412 | 0 on success, or a negative error in case of failure. | |
1413 | .UNINDENT | |
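The encapsulation flags listed for bpf_skb_adjust_room() are meant to be OR-ed together. The following is a minimal sketch of how such a flags word could be composed for a UDP-over-IPv4 tunnel. The numeric values are assumed to match include/uapi/linux/bpf.h, and the function name udp_ipv4_encap_flags is hypothetical.

```c
#include <stdint.h>

/* Flag values copied from include/uapi/linux/bpf.h as an assumption;
 * a real program would include that header rather than redefine them. */
#define ADJ_ROOM_ENCAP_L3_IPV4  (1ULL << 1)
#define ADJ_ROOM_ENCAP_L4_UDP   (1ULL << 4)
#define ADJ_ROOM_ENCAP_L2_MASK  0xffULL
#define ADJ_ROOM_ENCAP_L2_SHIFT 56

/* Hypothetical helper: flags for a UDP-over-IPv4 tunnel whose inner
 * MAC header is inner_mac_len bytes long (only the low 8 bits are kept). */
uint64_t udp_ipv4_encap_flags(uint64_t inner_mac_len)
{
    uint64_t l2 = (inner_mac_len & ADJ_ROOM_ENCAP_L2_MASK)
                  << ADJ_ROOM_ENCAP_L2_SHIFT;

    return ADJ_ROOM_ENCAP_L3_IPV4 | ADJ_ROOM_ENCAP_L4_UDP | l2;
}
```

In an eBPF program, the resulting value would be passed as the flags argument, with the mode and len_diff chosen to match the size of the headers being inserted.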
1414 | .TP | |
1415 | .B \fBint bpf_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1416 | .INDENT 7.0 | |
1417 | .TP | |
1418 | .B Description | |
1419 | Redirect the packet to the endpoint referenced by \fImap\fP at | |
1420 | index \fIkey\fP\&. Depending on its type, this \fImap\fP can contain | |
1421 | references to net devices (for forwarding packets through other | |
1422 | ports), or to CPUs (for redirecting XDP frames to another CPU; | |
1423 | but this is only implemented for native XDP (with driver | |
1424 | support) as of this writing). | |
1425 | .sp | |
e6107b29 MK |
1426 | The lower two bits of \fIflags\fP are used as the return code if |
1427 | the map lookup fails. This is so that the return value can be | |
1428 | one of the XDP program return codes up to XDP_TX, as chosen by | |
1429 | the caller. Any higher bits in the \fIflags\fP argument must be | |
1430 | unset. | |
53666f6c MK |
1431 | .sp |
1432 | When used to redirect packets to net devices, this helper | |
1433 | provides a high performance increase over \fBbpf_redirect\fP(). | |
1434 | This is due to various implementation details of the underlying | |
1435 | mechanisms, one of which is the fact that \fBbpf_redirect_map\fP() tries to send packet as a "bulk" to the device. | |
1436 | .TP | |
1437 | .B Return | |
1438 | \fBXDP_REDIRECT\fP on success, the value of the two lower bits of \fIflags\fP if the map lookup fails, or \fBXDP_ABORTED\fP on other errors. | |
1439 | .UNINDENT | |
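The way the lower two bits of flags select the result of a failed lookup in bpf_redirect_map() can be modelled in plain C. This is only a user-space sketch of the behaviour described above, not the kernel implementation. It assumes that a flags value with higher bits set is rejected with XDP_ABORTED, and the function name redirect_map_result is hypothetical.

```c
#include <stdint.h>

/* XDP return codes, mirrored from include/uapi/linux/bpf.h. */
enum { XDP_ABORTED = 0, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT };

/* Hypothetical model of the return value. lookup_ok says whether the
 * map holds a valid target at the requested key. */
int redirect_map_result(int lookup_ok, uint64_t flags)
{
    if (flags > XDP_TX)          /* bits above the lower two are set */
        return XDP_ABORTED;
    if (!lookup_ok)
        return (int)flags;       /* fallback code chosen by the caller */
    return XDP_REDIRECT;
}
```

A caller that prefers to let the packet continue up the stack when the map entry is missing would pass XDP_PASS as flags.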
1440 | .TP | |
1441 | .B \fBint bpf_sk_redirect_map(struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1442 | .INDENT 7.0 | |
1443 | .TP | |
1444 | .B Description | |
1445 | Redirect the packet to the socket referenced by \fImap\fP (of type | |
1446 | \fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and | |
1447 | egress interfaces can be used for redirection. The | |
1448 | \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the | |
1449 | distinction (ingress path is selected if the flag is present, | |
1450 | egress path otherwise). This is the only flag supported for now. | |
1451 | .TP | |
1452 | .B Return | |
1453 | \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. | |
1454 | .UNINDENT | |
1455 | .TP | |
1456 | .B \fBint bpf_sock_map_update(struct bpf_sock_ops *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1457 | .INDENT 7.0 | |
1458 | .TP | |
1459 | .B Description | |
1460 | Add an entry to, or update a \fImap\fP referencing sockets. The | |
1461 | \fIskops\fP is used as a new value for the entry associated to | |
1462 | \fIkey\fP\&. \fIflags\fP is one of: | |
1463 | .INDENT 7.0 | |
1464 | .TP | |
1465 | .B \fBBPF_NOEXIST\fP | |
1466 | The entry for \fIkey\fP must not exist in the map. | |
1467 | .TP | |
1468 | .B \fBBPF_EXIST\fP | |
1469 | The entry for \fIkey\fP must already exist in the map. | |
1470 | .TP | |
1471 | .B \fBBPF_ANY\fP | |
1472 | No condition on the existence of the entry for \fIkey\fP\&. | |
1473 | .UNINDENT | |
1474 | .sp | |
1475 | If the \fImap\fP has eBPF programs (parser and verdict), those will | |
1476 | be inherited by the socket being added. If the socket is | |
1477 | already attached to eBPF programs, this results in an error. | |
1478 | .TP | |
1479 | .B Return | |
1480 | 0 on success, or a negative error in case of failure. | |
1481 | .UNINDENT | |
1482 | .TP | |
1483 | .B \fBint bpf_xdp_adjust_meta(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP | |
1484 | .INDENT 7.0 | |
1485 | .TP | |
1486 | .B Description | |
1487 | Adjust the address pointed by \fIxdp_md\fP\fB\->data_meta\fP by | |
1488 | \fIdelta\fP (which can be positive or negative). Note that this | |
1489 | operation modifies the address stored in \fIxdp_md\fP\fB\->data\fP, | |
1490 | so the latter must be loaded only after the helper has been | |
1491 | called. | |
1492 | .sp | |
1493 | The use of \fIxdp_md\fP\fB\->data_meta\fP is optional and programs | |
1494 | are not required to use it. The rationale is that when the | |
1495 | packet is processed with XDP (e.g. as DoS filter), it is | |
1496 | possible to push further meta data along with it before passing | |
1497 | to the stack, and to give the guarantee that an ingress eBPF | |
1498 | program attached as a TC classifier on the same device can pick | |
1499 | this up for further post\-processing. Since TC works with socket | |
1500 | buffers, it remains possible to set from XDP the \fBmark\fP or | |
1501 | \fBpriority\fP pointers, or other pointers for the socket buffer. | |
1502 | Having this scratch space generic and programmable allows for | |
1503 | more flexibility as the user is free to store whatever meta | |
1504 | data they need. | |
1505 | .sp | |
e6107b29 | 1506 | A call to this helper is susceptible to change the underlying |
53666f6c MK |
1507 | packet buffer. Therefore, at load time, all checks on pointers |
1508 | previously done by the verifier are invalidated and must be | |
1509 | performed again, if the helper is used in combination with | |
1510 | direct packet access. | |
1511 | .TP | |
1512 | .B Return | |
1513 | 0 on success, or a negative error in case of failure. | |
1514 | .UNINDENT | |
1515 | .TP | |
1516 | .B \fBint bpf_perf_event_read_value(struct bpf_map *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP | |
1517 | .INDENT 7.0 | |
1518 | .TP | |
1519 | .B Description | |
1520 | Read the value of a perf event counter, and store it into \fIbuf\fP | |
1521 | of size \fIbuf_size\fP\&. This helper relies on a \fImap\fP of type | |
1522 | \fBBPF_MAP_TYPE_PERF_EVENT_ARRAY\fP\&. The nature of the perf event | |
1523 | counter is selected when \fImap\fP is updated with perf event file | |
1524 | descriptors. The \fImap\fP is an array whose size is the number of | |
1525 | available CPUs, and each cell contains a value relative to one | |
1526 | CPU. The value to retrieve is indicated by \fIflags\fP, that | |
1527 | contains the index of the CPU to look up, masked with | |
1528 | \fBBPF_F_INDEX_MASK\fP\&. Alternatively, \fIflags\fP can be set to | |
1529 | \fBBPF_F_CURRENT_CPU\fP to indicate that the value for the | |
1530 | current CPU should be retrieved. | |
1531 | .sp | |
1532 | This helper behaves in a way close to | |
1533 | \fBbpf_perf_event_read\fP() helper, save that instead of | |
1534 | just returning the value observed, it fills the \fIbuf\fP | |
1535 | structure. This allows for additional data to be retrieved: in | |
1536 | particular, the enabled and running times (in \fIbuf\fP\fB\->enabled\fP and \fIbuf\fP\fB\->running\fP, respectively) are | |
1537 | copied. In general, \fBbpf_perf_event_read_value\fP() is | |
1538 | recommended over \fBbpf_perf_event_read\fP(), which has some | |
1539 | ABI issues and provides fewer functionalities. | |
1540 | .sp | |
1541 | These values are interesting, because hardware PMU (Performance | |
1542 | Monitoring Unit) counters are limited resources. When there are | |
1543 | more PMU based perf events opened than available counters, | |
1544 | the kernel multiplexes these events so that each event gets a | |
1545 | certain percentage (but not all) of the PMU time. When | |
1546 | multiplexing happens, the number of samples or the counter value | |
1547 | does not match what would be observed without multiplexing. | |
1548 | This makes comparison between different runs difficult. | |
1549 | Typically, the counter value should be normalized before | |
1550 | comparing to other experiments. The usual normalization is done | |
1551 | as follows. | |
1552 | .INDENT 7.0 | |
1553 | .INDENT 3.5 | |
1554 | .sp | |
1555 | .nf | |
1556 | .ft C | |
1557 | normalized_counter = counter * t_enabled / t_running | |
1558 | .ft P | |
1559 | .fi | |
1560 | .UNINDENT | |
1561 | .UNINDENT | |
1562 | .sp | |
1563 | Where t_enabled is the time enabled for event and t_running is | |
1564 | the time running for event since last normalization. The | |
1565 | enabled and running times are accumulated since the perf event | |
1566 | open. To obtain the scaling factor between two invocations of an | |
1567 | eBPF program, users can use the CPU id as the key (which is | |
1568 | typical for perf array usage model) to remember the previous | |
1569 | value and do the calculation inside the eBPF program. | |
1570 | .TP | |
1571 | .B Return | |
1572 | 0 on success, or a negative error in case of failure. | |
1573 | .UNINDENT | |
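The normalization described for bpf_perf_event_read_value() can be sketched as follows, scaling the counter change between two readings by the ratio of enabled time to running time over the same interval. The struct below only mirrors the three fields mentioned in the text. The names perf_value and normalized_delta are hypothetical, and integer arithmetic is used because floating point is not available to eBPF programs.

```c
#include <stdint.h>

/* Same three fields as struct bpf_perf_event_value, redeclared here
 * so that the sketch is self-contained. */
struct perf_value {
    uint64_t counter;
    uint64_t enabled;
    uint64_t running;
};

/* Scale the counter delta between two readings by
 * (enabled delta) / (running delta). */
uint64_t normalized_delta(const struct perf_value *prev,
                          const struct perf_value *cur)
{
    uint64_t counter = cur->counter - prev->counter;
    uint64_t enabled = cur->enabled - prev->enabled;
    uint64_t running = cur->running - prev->running;

    if (running == 0)   /* event was never scheduled in this interval */
        return 0;
    /* The product can overflow 64 bits for very large deltas. */
    return counter * enabled / running;
}
```

The previous reading would typically be kept in a map indexed by CPU id, as suggested above.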
1574 | .TP | |
1575 | .B \fBint bpf_perf_prog_read_value(struct bpf_perf_event_data *\fP\fIctx\fP\fB, struct bpf_perf_event_value *\fP\fIbuf\fP\fB, u32\fP \fIbuf_size\fP\fB)\fP | |
1576 | .INDENT 7.0 | |
1577 | .TP | |
1578 | .B Description | |
1579 | For an eBPF program attached to a perf event, retrieve the | |
1580 | value of the event counter associated to \fIctx\fP and store it in | |
1581 | the structure pointed by \fIbuf\fP and of size \fIbuf_size\fP\&. Enabled | |
1582 | and running times are also stored in the structure (see | |
1583 | description of helper \fBbpf_perf_event_read_value\fP() for | |
1584 | more details). | |
1585 | .TP | |
1586 | .B Return | |
1587 | 0 on success, or a negative error in case of failure. | |
1588 | .UNINDENT | |
1589 | .TP | |
1590 | .B \fBint bpf_getsockopt(struct bpf_sock_ops *\fP\fIbpf_socket\fP\fB, int\fP \fIlevel\fP\fB, int\fP \fIoptname\fP\fB, char *\fP\fIoptval\fP\fB, int\fP \fIoptlen\fP\fB)\fP | |
1591 | .INDENT 7.0 | |
1592 | .TP | |
1593 | .B Description | |
1594 | Emulate a call to \fBgetsockopt()\fP on the socket associated to | |
1595 | \fIbpf_socket\fP, which must be a full socket. The \fIlevel\fP at | |
1596 | which the option resides and the name \fIoptname\fP of the option | |
1597 | must be specified, see \fBgetsockopt(2)\fP for more information. | |
1598 | The retrieved value is stored in the structure pointed to by | |
1599 | \fIoptval\fP and of length \fIoptlen\fP\&. | |
1600 | .sp | |
1601 | This helper actually implements a subset of \fBgetsockopt()\fP\&. | |
1602 | It supports the following \fIlevel\fPs: | |
1603 | .INDENT 7.0 | |
1604 | .IP \(bu 2 | |
1605 | \fBIPPROTO_TCP\fP, which supports \fIoptname\fP | |
1606 | \fBTCP_CONGESTION\fP\&. | |
1607 | .IP \(bu 2 | |
1608 | \fBIPPROTO_IP\fP, which supports \fIoptname\fP \fBIP_TOS\fP\&. | |
1609 | .IP \(bu 2 | |
1610 | \fBIPPROTO_IPV6\fP, which supports \fIoptname\fP \fBIPV6_TCLASS\fP\&. | |
1611 | .UNINDENT | |
1612 | .TP | |
1613 | .B Return | |
1614 | 0 on success, or a negative error in case of failure. | |
1615 | .UNINDENT | |
1616 | .TP | |
e6107b29 | 1617 | .B \fBint bpf_override_return(struct pt_regs *\fP\fIregs\fP\fB, u64\fP \fIrc\fP\fB)\fP |
53666f6c MK |
1618 | .INDENT 7.0 |
1619 | .TP | |
1620 | .B Description | |
1621 | Used for error injection, this helper uses kprobes to override | |
1622 | the return value of the probed function, and to set it to \fIrc\fP\&. | |
1623 | The first argument is the context \fIregs\fP on which the kprobe | |
1624 | works. | |
1625 | .sp | |
1626 | This helper works by setting the PC (program counter) | |
1627 | to an override function which is run in place of the original | |
1628 | probed function. This means the probed function is not run at | |
1629 | all. The replacement function just returns with the required | |
1630 | value. | |
1631 | .sp | |
1632 | This helper has security implications, and thus is subject to | |
1633 | restrictions. It is only available if the kernel was compiled | |
1634 | with the \fBCONFIG_BPF_KPROBE_OVERRIDE\fP configuration | |
1635 | option, and in this case it only works on functions tagged with | |
1636 | \fBALLOW_ERROR_INJECTION\fP in the kernel code. | |
1637 | .sp | |
1638 | Also, the helper is only available for the architectures having | |
1639 | the CONFIG_FUNCTION_ERROR_INJECTION option. As of this writing, | |
1640 | x86 architecture is the only one to support this feature. | |
1641 | .TP | |
1642 | .B Return | |
e6107b29 | 1643 | 0 |
53666f6c MK |
1644 | .UNINDENT |
1645 | .TP | |
1646 | .B \fBint bpf_sock_ops_cb_flags_set(struct bpf_sock_ops *\fP\fIbpf_sock\fP\fB, int\fP \fIargval\fP\fB)\fP | |
1647 | .INDENT 7.0 | |
1648 | .TP | |
1649 | .B Description | |
1650 | Attempt to set the value of the \fBbpf_sock_ops_cb_flags\fP field | |
1651 | for the full TCP socket associated to \fIbpf_sock\fP to | |
1652 | \fIargval\fP\&. | |
1653 | .sp | |
1654 | The primary use of this field is to determine if there should | |
1655 | be calls to eBPF programs of type | |
1656 | \fBBPF_PROG_TYPE_SOCK_OPS\fP at various points in the TCP | |
1657 | code. A program of the same type can change its value, per | |
1658 | connection and as necessary, when the connection is | |
1659 | established. This field is directly accessible for reading, but | |
1660 | this helper must be used for updates in order to return an | |
1661 | error if an eBPF program tries to set a callback that is not | |
1662 | supported in the current kernel. | |
1663 | .sp | |
e6107b29 | 1664 | \fIargval\fP is a flag array which can combine these flags: |
53666f6c MK |
1665 | .INDENT 7.0 |
1666 | .IP \(bu 2 | |
1667 | \fBBPF_SOCK_OPS_RTO_CB_FLAG\fP (retransmission time out) | |
1668 | .IP \(bu 2 | |
1669 | \fBBPF_SOCK_OPS_RETRANS_CB_FLAG\fP (retransmission) | |
1670 | .IP \(bu 2 | |
1671 | \fBBPF_SOCK_OPS_STATE_CB_FLAG\fP (TCP state change) | |
e6107b29 MK |
1672 | .IP \(bu 2 |
1673 | \fBBPF_SOCK_OPS_RTT_CB_FLAG\fP (every RTT) | |
1674 | .UNINDENT | |
1675 | .sp | |
1676 | Therefore, this function can be used to clear a callback flag by | |
1677 | setting the appropriate bit to zero. For example, to disable the RTO | |
1678 | callback: | |
1679 | .INDENT 7.0 | |
1680 | .TP | |
1681 | .B \fBbpf_sock_ops_cb_flags_set(bpf_sock,\fP | |
1682 | \fBbpf_sock\->bpf_sock_ops_cb_flags & ~BPF_SOCK_OPS_RTO_CB_FLAG)\fP | |
53666f6c MK |
1683 | .UNINDENT |
1684 | .sp | |
1685 | Here are some examples of where one could call such an eBPF | |
1686 | program: | |
1687 | .INDENT 7.0 | |
1688 | .IP \(bu 2 | |
1689 | When RTO fires. | |
1690 | .IP \(bu 2 | |
1691 | When a packet is retransmitted. | |
1692 | .IP \(bu 2 | |
1693 | When the connection terminates. | |
1694 | .IP \(bu 2 | |
1695 | When a packet is sent. | |
1696 | .IP \(bu 2 | |
1697 | When a packet is received. | |
1698 | .UNINDENT | |
1699 | .TP | |
1700 | .B Return | |
1701 | Code \fB\-EINVAL\fP if the socket is not a full TCP socket; | |
1702 | otherwise, a positive number containing the bits that could not | |
1703 | be set is returned (which comes down to 0 if all bits were set | |
1704 | as required). | |
1705 | .UNINDENT | |
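The return value of bpf_sock_ops_cb_flags_set() can be illustrated with a small user-space model. It is only a sketch: the flag values are assumed to match include/uapi/linux/bpf.h, the set of supported flags is assumed to be the four listed above, and the names sock_model and cb_flags_set are hypothetical.

```c
#include <errno.h>

/* Callback flags, values assumed from include/uapi/linux/bpf.h. */
#define RTO_CB_FLAG     (1 << 0)
#define RETRANS_CB_FLAG (1 << 1)
#define STATE_CB_FLAG   (1 << 2)
#define RTT_CB_FLAG     (1 << 3)
#define ALL_CB_FLAGS    0xF

/* Hypothetical stand-in for the socket state the helper works on. */
struct sock_model {
    int is_full_tcp;
    int cb_flags;
};

/* Store the supported bits and report the ones that were not set. */
int cb_flags_set(struct sock_model *sk, int argval)
{
    if (!sk->is_full_tcp)
        return -EINVAL;
    sk->cb_flags = argval & ALL_CB_FLAGS;
    return argval & ~ALL_CB_FLAGS;
}
```

A return value of 0 means every requested callback is supported. A positive value lets the program detect that it asked for a callback the running kernel does not know about.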
1706 | .TP | |
1707 | .B \fBint bpf_msg_redirect_map(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, u32\fP \fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1708 | .INDENT 7.0 | |
1709 | .TP | |
1710 | .B Description | |
1711 | This helper is used in programs implementing policies at the | |
1712 | socket level. If the message \fImsg\fP is allowed to pass (i.e. if | |
1713 | the verdict eBPF program returns \fBSK_PASS\fP), redirect it to | |
1714 | the socket referenced by \fImap\fP (of type | |
1715 | \fBBPF_MAP_TYPE_SOCKMAP\fP) at index \fIkey\fP\&. Both ingress and | |
1716 | egress interfaces can be used for redirection. The | |
1717 | \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the | |
1718 | distinction (ingress path is selected if the flag is present, | |
1719 | egress path otherwise). This is the only flag supported for now. | |
1720 | .TP | |
1721 | .B Return | |
1722 | \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. | |
1723 | .UNINDENT | |
1724 | .TP | |
1725 | .B \fBint bpf_msg_apply_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP | |
1726 | .INDENT 7.0 | |
1727 | .TP | |
1728 | .B Description | |
1729 | For socket policies, apply the verdict of the eBPF program to | |
1730 | the next \fIbytes\fP (number of bytes) of message \fImsg\fP\&. | |
1731 | .sp | |
1732 | For example, this helper can be used in the following cases: | |
1733 | .INDENT 7.0 | |
1734 | .IP \(bu 2 | |
1735 | A single \fBsendmsg\fP() or \fBsendfile\fP() system call | |
1736 | contains multiple logical messages that the eBPF program is | |
1737 | supposed to read and for which it should apply a verdict. | |
1738 | .IP \(bu 2 | |
1739 | An eBPF program only cares to read the first \fIbytes\fP of a | |
1740 | \fImsg\fP\&. If the message has a large payload, then setting up | |
1741 | and calling the eBPF program repeatedly for all bytes, even | |
1742 | though the verdict is already known, would create unnecessary | |
1743 | overhead. | |
1744 | .UNINDENT | |
1745 | .sp | |
1746 | When called from within an eBPF program, the helper sets a | |
1747 | counter internal to the BPF infrastructure, that is used to | |
1748 | apply the last verdict to the next \fIbytes\fP\&. If \fIbytes\fP is | |
1749 | smaller than the current data being processed from a | |
1750 | \fBsendmsg\fP() or \fBsendfile\fP() system call, the first | |
1751 | \fIbytes\fP will be sent and the eBPF program will be re\-run with | |
1752 | the pointer for start of data pointing to byte number \fIbytes\fP | |
1753 | \fB+ 1\fP\&. If \fIbytes\fP is larger than the current data being | |
1754 | processed, then the eBPF verdict will be applied to multiple | |
1755 | \fBsendmsg\fP() or \fBsendfile\fP() calls until \fIbytes\fP are | |
1756 | consumed. | |
1757 | .sp | |
1758 | Note that if a socket closes with the internal counter holding | |
1759 | a non\-zero value, this is not a problem because data is not | |
1760 | being buffered for \fIbytes\fP and is sent as it is received. | |
1761 | .TP | |
1762 | .B Return | |
e6107b29 | 1763 | 0 |
53666f6c MK |
1764 | .UNINDENT |
1765 | .TP | |
1766 | .B \fBint bpf_msg_cork_bytes(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIbytes\fP\fB)\fP | |
1767 | .INDENT 7.0 | |
1768 | .TP | |
1769 | .B Description | |
1770 | For socket policies, prevent the execution of the verdict eBPF | |
1771 | program for message \fImsg\fP until \fIbytes\fP (byte number) have been | |
1772 | accumulated. | |
1773 | .sp | |
1774 | This can be used when one needs a specific number of bytes | |
1775 | before a verdict can be assigned, even if the data spans | |
1776 | multiple \fBsendmsg\fP() or \fBsendfile\fP() calls. The extreme | |
1777 | case would be a user calling \fBsendmsg\fP() repeatedly with | |
1778 | 1\-byte long message segments. Obviously, this is bad for | |
1779 | performance, but it is still valid. If the eBPF program needs | |
1780 | \fIbytes\fP bytes to validate a header, this helper can be used to | |
1781 | prevent the eBPF program from being called again until \fIbytes\fP have | |
1782 | been accumulated. | |
1783 | .TP | |
1784 | .B Return | |
e6107b29 | 1785 | 0 |
53666f6c MK |
1786 | .UNINDENT |
1787 | .TP | |
1788 | .B \fBint bpf_msg_pull_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIend\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1789 | .INDENT 7.0 | |
1790 | .TP | |
1791 | .B Description | |
1792 | For socket policies, pull in non\-linear data from user space | |
1793 | for \fImsg\fP and set pointers \fImsg\fP\fB\->data\fP and \fImsg\fP\fB\->data_end\fP to \fIstart\fP and \fIend\fP bytes offsets into \fImsg\fP, | |
1794 | respectively. | |
1795 | .sp | |
1796 | If a program of type \fBBPF_PROG_TYPE_SK_MSG\fP is run on a | |
1797 | \fImsg\fP it can only parse data that the (\fBdata\fP, \fBdata_end\fP) | |
1798 | pointers have already consumed. For \fBsendmsg\fP() hooks this | |
1799 | is likely the first scatterlist element. But for calls relying | |
1800 | on the \fBsendpage\fP handler (e.g. \fBsendfile\fP()) this will | |
1801 | be the range (\fB0\fP, \fB0\fP) because the data is shared with | |
1802 | user space and by default the objective is to avoid allowing | |
1803 | user space to modify data while (or after) eBPF verdict is | |
1804 | being decided. This helper can be used to pull in data and to | |
1805 | set the start and end pointer to given values. Data will be | |
1806 | copied if necessary (i.e. if data was not linear and if start | |
1807 | and end pointers do not point to the same chunk). | |
1808 | .sp | |
e6107b29 | 1809 | A call to this helper is susceptible to change the underlying |
53666f6c MK |
1810 | packet buffer. Therefore, at load time, all checks on pointers |
1811 | previously done by the verifier are invalidated and must be | |
1812 | performed again, if the helper is used in combination with | |
1813 | direct packet access. | |
1814 | .sp | |
1815 | All values for \fIflags\fP are reserved for future usage, and must | |
1816 | be left at zero. | |
1817 | .TP | |
1818 | .B Return | |
1819 | 0 on success, or a negative error in case of failure. | |
1820 | .UNINDENT | |
1821 | .TP | |
1822 | .B \fBint bpf_bind(struct bpf_sock_addr *\fP\fIctx\fP\fB, struct sockaddr *\fP\fIaddr\fP\fB, int\fP \fIaddr_len\fP\fB)\fP | |
1823 | .INDENT 7.0 | |
1824 | .TP | |
1825 | .B Description | |
1826 | Bind the socket associated to \fIctx\fP to the address pointed by | |
1827 | \fIaddr\fP, of length \fIaddr_len\fP\&. This allows for making outgoing | |
1828 | connections from the desired IP address, which can be useful for | |
1829 | example when all processes inside a cgroup should use one | |
1830 | single IP address on a host that has multiple IP addresses configured. | |
1831 | .sp | |
1832 | This helper works for IPv4 and IPv6, TCP and UDP sockets. The | |
1833 | domain (\fIaddr\fP\fB\->sa_family\fP) must be \fBAF_INET\fP (or | |
1834 | \fBAF_INET6\fP). Looking for a free port to bind to can be | |
1835 | expensive, therefore binding to port is not permitted by the | |
1836 | helper: \fIaddr\fP\fB\->sin_port\fP (or \fBsin6_port\fP, respectively) | |
1837 | must be set to zero. | |
1838 | .TP | |
1839 | .B Return | |
1840 | 0 on success, or a negative error in case of failure. | |
1841 | .UNINDENT | |
1842 | .TP | |
1843 | .B \fBint bpf_xdp_adjust_tail(struct xdp_buff *\fP\fIxdp_md\fP\fB, int\fP \fIdelta\fP\fB)\fP | |
1844 | .INDENT 7.0 | |
1845 | .TP | |
1846 | .B Description | |
1847 | Adjust (move) \fIxdp_md\fP\fB\->data_end\fP by \fIdelta\fP bytes. It is | |
1848 | only possible to shrink the packet as of this writing, | |
1849 | therefore \fIdelta\fP must be a negative integer. | |
1850 | .sp | |
e6107b29 | 1851 | A call to this helper is susceptible to change the underlying |
53666f6c MK |
1852 | packet buffer. Therefore, at load time, all checks on pointers |
1853 | previously done by the verifier are invalidated and must be | |
1854 | performed again, if the helper is used in combination with | |
1855 | direct packet access. | |
1856 | .TP | |
1857 | .B Return | |
1858 | 0 on success, or a negative error in case of failure. | |
1859 | .UNINDENT | |
1860 | .TP | |
1861 | .B \fBint bpf_skb_get_xfrm_state(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIindex\fP\fB, struct bpf_xfrm_state *\fP\fIxfrm_state\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1862 | .INDENT 7.0 | |
1863 | .TP | |
1864 | .B Description | |
1865 | Retrieve the XFRM state (IP transform framework, see also | |
1866 | \fBip\-xfrm(8)\fP) at \fIindex\fP in XFRM "security path" for \fIskb\fP\&. | |
1867 | .sp | |
1868 | The retrieved value is stored in the \fBstruct bpf_xfrm_state\fP | |
1869 | pointed by \fIxfrm_state\fP and of length \fIsize\fP\&. | |
1870 | .sp | |
1871 | All values for \fIflags\fP are reserved for future usage, and must | |
1872 | be left at zero. | |
1873 | .sp | |
1874 | This helper is available only if the kernel was compiled with | |
1875 | \fBCONFIG_XFRM\fP configuration option. | |
1876 | .TP | |
1877 | .B Return | |
1878 | 0 on success, or a negative error in case of failure. | |
1879 | .UNINDENT | |
1880 | .TP | |
1881 | .B \fBint bpf_get_stack(struct pt_regs *\fP\fIregs\fP\fB, void *\fP\fIbuf\fP\fB, u32\fP \fIsize\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
1882 | .INDENT 7.0 | |
1883 | .TP | |
1884 | .B Description | |
1885 | Return a user or a kernel stack in bpf program provided buffer. | |
1886 | To achieve this, the helper needs \fIregs\fP, which is a pointer | |
1887 | to the context on which the tracing program is executed. | |
1888 | To store the stacktrace, the bpf program provides \fIbuf\fP with | |
1889 | a nonnegative \fIsize\fP\&. | |
1890 | .sp | |
1891 | The last argument, \fIflags\fP, holds the number of stack frames to | |
1892 | skip (from 0 to 255), masked with | |
1893 | \fBBPF_F_SKIP_FIELD_MASK\fP\&. The next bits can be used to set | |
1894 | the following flags: | |
1895 | .INDENT 7.0 | |
1896 | .TP | |
1897 | .B \fBBPF_F_USER_STACK\fP | |
1898 | Collect a user space stack instead of a kernel stack. | |
1899 | .TP | |
1900 | .B \fBBPF_F_USER_BUILD_ID\fP | |
1901 | Collect buildid+offset instead of ips for user stack, | |
1902 | only valid if \fBBPF_F_USER_STACK\fP is also specified. | |
1903 | .UNINDENT | |
1904 | .sp | |
1905 | \fBbpf_get_stack\fP() can collect up to | |
1906 | \fBPERF_MAX_STACK_DEPTH\fP both kernel and user frames, subject | |
1907 | to a sufficiently large buffer size. Note that | |
1908 | this limit can be controlled with the \fBsysctl\fP program, and | |
1909 | that it should be manually increased in order to profile long | |
1910 | user stacks (such as stacks for Java programs). To do so, use: | |
1911 | .INDENT 7.0 | |
1912 | .INDENT 3.5 | |
1913 | .sp | |
1914 | .nf | |
1915 | .ft C | |
1916 | # sysctl kernel.perf_event_max_stack=<new value> | |
1917 | .ft P | |
1918 | .fi | |
1919 | .UNINDENT | |
1920 | .UNINDENT | |
1921 | .TP | |
1922 | .B Return | |
1923 | A non\-negative value equal to or less than \fIsize\fP on success, | |
1924 | or a negative error in case of failure. | |
1925 | .UNINDENT | |
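The layout of the flags argument of bpf_get_stack() (a skip count in the low bits, option flags above it) can be sketched as below. The numeric values are assumed to match include/uapi/linux/bpf.h, and the function name stack_flags as well as its -1 error value are hypothetical.

```c
#include <stdint.h>

/* Values assumed from include/uapi/linux/bpf.h. */
#define SKIP_FIELD_MASK 0xffULL
#define USER_STACK      (1ULL << 8)
#define USER_BUILD_ID   (1ULL << 11)

/* Hypothetical helper: returns 0 and stores the flags word in *out,
 * or -1 for the combination the text describes as invalid
 * (build ID requested without a user stack). */
int stack_flags(uint64_t skip, int user, int build_id, uint64_t *out)
{
    if (build_id && !user)
        return -1;
    *out = skip & SKIP_FIELD_MASK;   /* frames to skip, 0 to 255 */
    if (user)
        *out |= USER_STACK;
    if (build_id)
        *out |= USER_BUILD_ID;
    return 0;
}
```

For instance, skipping two frames of a user space stack gives a flags word of 0x102 under these assumptions.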
1926 | .TP | |
1927 | .B \fBint bpf_skb_load_bytes_relative(const struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, void *\fP\fIto\fP\fB, u32\fP \fIlen\fP\fB, u32\fP \fIstart_header\fP\fB)\fP | |
1928 | .INDENT 7.0 | |
1929 | .TP | |
1930 | .B Description | |
1931 | This helper is similar to \fBbpf_skb_load_bytes\fP() in that | |
1932 | it provides an easy way to load \fIlen\fP bytes from \fIoffset\fP | |
1933 | from the packet associated to \fIskb\fP, into the buffer pointed | |
1934 | by \fIto\fP\&. The difference to \fBbpf_skb_load_bytes\fP() is that | |
1935 | a fifth argument \fIstart_header\fP exists in order to select a | |
1936 | base offset to start from. \fIstart_header\fP can be one of: | |
1937 | .INDENT 7.0 | |
1938 | .TP | |
1939 | .B \fBBPF_HDR_START_MAC\fP | |
1940 | Base offset to load data from is \fIskb\fP\(aqs mac header. | |
1941 | .TP | |
1942 | .B \fBBPF_HDR_START_NET\fP | |
1943 | Base offset to load data from is \fIskb\fP\(aqs network header. | |
1944 | .UNINDENT | |
1945 | .sp | |
1946 | In general, "direct packet access" is the preferred method to | |
1947 | access packet data, however, this helper is in particular useful | |
1948 | in socket filters where \fIskb\fP\fB\->data\fP does not always point | |
1949 | to the start of the mac header and where "direct packet access" | |
1950 | is not available. | |
1951 | .TP | |
1952 | .B Return | |
1953 | 0 on success, or a negative error in case of failure. | |
1954 | .UNINDENT | |
1955 | .TP | |
1956 | .B \fBint bpf_fib_lookup(void *\fP\fIctx\fP\fB, struct bpf_fib_lookup *\fP\fIparams\fP\fB, int\fP \fIplen\fP\fB, u32\fP \fIflags\fP\fB)\fP | |
1957 | .INDENT 7.0 | |
1958 | .TP | |
1959 | .B Description | |
1960 | Do FIB lookup in kernel tables using parameters in \fIparams\fP\&. | |
1961 | If lookup is successful and result shows packet is to be | |
1962 | forwarded, the neighbor tables are searched for the nexthop. | |
1963 | If successful (i.e., FIB lookup shows forwarding and nexthop | |
1964 | is resolved), the nexthop address is returned in ipv4_dst | |
1965 | or ipv6_dst based on family, smac is set to mac address of | |
1966 | egress device, dmac is set to nexthop mac address, rt_metric | |
1967 | is set to metric from route (IPv4/IPv6 only), and ifindex | |
1968 | is set to the device index of the nexthop from the FIB lookup. | |
1969 | .sp | |
1970 | \fIplen\fP argument is the size of the passed in struct. | |
1971 | \fIflags\fP argument can be a combination of one or more of the | |
1972 | following values: | |
1973 | .INDENT 7.0 | |
1974 | .TP | |
1975 | .B \fBBPF_FIB_LOOKUP_DIRECT\fP | |
1976 | Do a direct table lookup vs full lookup using FIB | |
1977 | rules. | |
1978 | .TP | |
1979 | .B \fBBPF_FIB_LOOKUP_OUTPUT\fP | |
1980 | Perform lookup from an egress perspective (default is | |
1981 | ingress). | |
1982 | .UNINDENT | |
1983 | .sp | |
1984 | \fIctx\fP is either \fBstruct xdp_md\fP for XDP programs or | |
1985 | \fBstruct sk_buff\fP for tc cls_act programs. | |
1986 | .TP | |
1987 | .B Return | |
1988 | .INDENT 7.0 | |
1989 | .IP \(bu 2 | |
1990 | < 0 if any input argument is invalid | |
1991 | .IP \(bu 2 | |
1992 | 0 on success (packet is forwarded, nexthop neighbor exists) | |
1993 | .IP \(bu 2 | |
1994 | > 0 one of \fBBPF_FIB_LKUP_RET_\fP codes explaining why the | |
1995 | packet is not forwarded or needs assist from full stack | |
1996 | .UNINDENT | |
1997 | .UNINDENT | |
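The three return ranges listed above can be handled with a simple three-way branch. The following is a minimal user-space sketch, not kernel or BPF code; the enum and function names are hypothetical and only illustrate how a caller might classify the value returned by bpf_fib_lookup().

```c
/* Hypothetical user-space model of the documented return ranges. */
enum fib_action {
    FIB_ACT_INVALID_ARGS,   /* rc < 0: an input argument was invalid */
    FIB_ACT_FORWARD,        /* rc == 0: forward, nexthop neighbor exists */
    FIB_ACT_PASS_TO_STACK   /* rc > 0: a BPF_FIB_LKUP_RET_ code was returned */
};

/* Map the helper's return value onto the action a program would take. */
static enum fib_action classify_fib_result(int rc)
{
    if (rc < 0)
        return FIB_ACT_INVALID_ARGS;
    if (rc == 0)
        return FIB_ACT_FORWARD;
    return FIB_ACT_PASS_TO_STACK;
}
```

In an XDP or tc program, the forward case is where smac, dmac and ifindex from the params struct would be used; the other two cases would typically hand the packet to the regular stack or drop it.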
1998 | .TP | |
1999 | .B \fBint bpf_sock_hash_update(struct bpf_sock_ops_kern *\fP\fIskops\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2000 | .INDENT 7.0 | |
2001 | .TP | |
2002 | .B Description | |
2003 | Add an entry to, or update a sockhash \fImap\fP referencing sockets. | |
2004 | The \fIskops\fP is used as a new value for the entry associated to | |
2005 | \fIkey\fP\&. \fIflags\fP is one of: | |
2006 | .INDENT 7.0 | |
2007 | .TP | |
2008 | .B \fBBPF_NOEXIST\fP | |
2009 | The entry for \fIkey\fP must not exist in the map. | |
2010 | .TP | |
2011 | .B \fBBPF_EXIST\fP | |
2012 | The entry for \fIkey\fP must already exist in the map. | |
2013 | .TP | |
2014 | .B \fBBPF_ANY\fP | |
2015 | No condition on the existence of the entry for \fIkey\fP\&. | |
2016 | .UNINDENT | |
2017 | .sp | |
2018 | If the \fImap\fP has eBPF programs (parser and verdict), those will | |
2019 | be inherited by the socket being added. If the socket is | |
2020 | already attached to eBPF programs, this results in an error. | |
2021 | .TP | |
2022 | .B Return | |
2023 | 0 on success, or a negative error in case of failure. | |
2024 | .UNINDENT | |
2025 | .TP | |
2026 | .B \fBint bpf_msg_redirect_hash(struct sk_msg_buff *\fP\fImsg\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2027 | .INDENT 7.0 | |
2028 | .TP | |
2029 | .B Description | |
2030 | This helper is used in programs implementing policies at the | |
2031 | socket level. If the message \fImsg\fP is allowed to pass (i.e. if | |
2032 | the verdict eBPF program returns \fBSK_PASS\fP), redirect it to | |
2033 | the socket referenced by \fImap\fP (of type | |
2034 | \fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and | |
2035 | egress interfaces can be used for redirection. The | |
2036 | \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the | |
2037 | distinction (ingress path is selected if the flag is present, | |
2038 | egress path otherwise). This is the only flag supported for now. | |
2039 | .TP | |
2040 | .B Return | |
2041 | \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. | |
2042 | .UNINDENT | |
2043 | .TP | |
2044 | .B \fBint bpf_sk_redirect_hash(struct sk_buff *\fP\fIskb\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2045 | .INDENT 7.0 | |
2046 | .TP | |
2047 | .B Description | |
2048 | This helper is used in programs implementing policies at the | |
2049 | skb socket level. If the sk_buff \fIskb\fP is allowed to pass (i.e. | |
2050 | if the verdict eBPF program returns \fBSK_PASS\fP), redirect it | |
2051 | to the socket referenced by \fImap\fP (of type | |
2052 | \fBBPF_MAP_TYPE_SOCKHASH\fP) using hash \fIkey\fP\&. Both ingress and | |
2053 | egress interfaces can be used for redirection. The | |
2054 | \fBBPF_F_INGRESS\fP value in \fIflags\fP is used to make the | |
2055 | distinction (ingress path is selected if the flag is present, | |
2056 | egress otherwise). This is the only flag supported for now. | |
2057 | .TP | |
2058 | .B Return | |
2059 | \fBSK_PASS\fP on success, or \fBSK_DROP\fP on error. | |
2060 | .UNINDENT | |
2061 | .TP | |
2062 | .B \fBint bpf_lwt_push_encap(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fItype\fP\fB, void *\fP\fIhdr\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
2063 | .INDENT 7.0 | |
2064 | .TP | |
2065 | .B Description | |
2066 | Encapsulate the packet associated to \fIskb\fP within a Layer 3 | |
2067 | protocol header. This header is provided in the buffer at | |
2068 | address \fIhdr\fP, with \fIlen\fP its size in bytes. \fItype\fP indicates | |
2069 | the protocol of the header and can be one of: | |
2070 | .INDENT 7.0 | |
2071 | .TP | |
2072 | .B \fBBPF_LWT_ENCAP_SEG6\fP | |
2073 | IPv6 encapsulation with Segment Routing Header | |
2074 | (\fBstruct ipv6_sr_hdr\fP). \fIhdr\fP only contains the SRH, | |
2075 | the IPv6 header is computed by the kernel. | |
2076 | .TP | |
2077 | .B \fBBPF_LWT_ENCAP_SEG6_INLINE\fP | |
2078 | Only works if \fIskb\fP contains an IPv6 packet. Insert a | |
2079 | Segment Routing Header (\fBstruct ipv6_sr_hdr\fP) inside | |
2080 | the IPv6 header. | |
e6107b29 MK |
2081 | .TP |
2082 | .B \fBBPF_LWT_ENCAP_IP\fP | |
2083 | IP encapsulation (GRE/GUE/IPIP/etc). The outer header | |
2084 | must be IPv4 or IPv6, followed by zero or more | |
2085 | additional headers, up to \fBLWT_BPF_MAX_HEADROOM\fP | |
2086 | total bytes in all prepended headers. Please note that | |
2087 | if \fBskb_is_gso\fP(\fIskb\fP) is true, no more than two | |
2088 | headers can be prepended, and the inner header, if | |
2089 | present, should be either GRE or UDP/GUE. | |
53666f6c MK |
2090 | .UNINDENT |
2091 | .sp | |
e6107b29 MK |
2092 | \fBBPF_LWT_ENCAP_SEG6\fP* types can be called by BPF programs |
2093 | of type \fBBPF_PROG_TYPE_LWT_IN\fP; \fBBPF_LWT_ENCAP_IP\fP type can | |
2094 | be called by BPF programs of types \fBBPF_PROG_TYPE_LWT_IN\fP and | |
2095 | \fBBPF_PROG_TYPE_LWT_XMIT\fP\&. | |
2096 | .sp | |
2097 | A call to this helper is susceptible to change the underlying | |
53666f6c MK |
2098 | packet buffer. Therefore, at load time, all checks on pointers |
2099 | previously done by the verifier are invalidated and must be | |
2100 | performed again, if the helper is used in combination with | |
2101 | direct packet access. | |
2102 | .TP | |
2103 | .B Return | |
2104 | 0 on success, or a negative error in case of failure. | |
2105 | .UNINDENT | |
2106 | .TP | |
2107 | .B \fBint bpf_lwt_seg6_store_bytes(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, const void *\fP\fIfrom\fP\fB, u32\fP \fIlen\fP\fB)\fP | |
2108 | .INDENT 7.0 | |
2109 | .TP | |
2110 | .B Description | |
2111 | Store \fIlen\fP bytes from address \fIfrom\fP into the packet | |
2112 | associated to \fIskb\fP, at \fIoffset\fP\&. Only the flags, tag and TLVs | |
2113 | inside the outermost IPv6 Segment Routing Header can be | |
2114 | modified through this helper. | |
2115 | .sp | |
e6107b29 | 2116 | A call to this helper is susceptible to change the underlying |
53666f6c MK |
2117 | packet buffer. Therefore, at load time, all checks on pointers |
2118 | previously done by the verifier are invalidated and must be | |
2119 | performed again, if the helper is used in combination with | |
2120 | direct packet access. | |
2121 | .TP | |
2122 | .B Return | |
2123 | 0 on success, or a negative error in case of failure. | |
2124 | .UNINDENT | |
2125 | .TP | |
2126 | .B \fBint bpf_lwt_seg6_adjust_srh(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIoffset\fP\fB, s32\fP \fIdelta\fP\fB)\fP | |
2127 | .INDENT 7.0 | |
2128 | .TP | |
2129 | .B Description | |
2130 | Adjust the size allocated to TLVs in the outermost IPv6 | |
2131 | Segment Routing Header contained in the packet associated to | |
2132 | \fIskb\fP, at position \fIoffset\fP by \fIdelta\fP bytes. Only offsets | |
2133 | after the segments are accepted. \fIdelta\fP can be either | |
2134 | positive (growing) or negative (shrinking). | |
2135 | .sp | |
e6107b29 | 2136 | A call to this helper is susceptible to change the underlying |
53666f6c MK |
2137 | packet buffer. Therefore, at load time, all checks on pointers |
2138 | previously done by the verifier are invalidated and must be | |
2139 | performed again, if the helper is used in combination with | |
2140 | direct packet access. | |
2141 | .TP | |
2142 | .B Return | |
2143 | 0 on success, or a negative error in case of failure. | |
2144 | .UNINDENT | |
2145 | .TP | |
2146 | .B \fBint bpf_lwt_seg6_action(struct sk_buff *\fP\fIskb\fP\fB, u32\fP \fIaction\fP\fB, void *\fP\fIparam\fP\fB, u32\fP \fIparam_len\fP\fB)\fP | |
2147 | .INDENT 7.0 | |
2148 | .TP | |
2149 | .B Description | |
2150 | Apply an IPv6 Segment Routing action of type \fIaction\fP to the | |
2151 | packet associated to \fIskb\fP\&. Each action takes a parameter | |
2152 | contained at address \fIparam\fP, and of length \fIparam_len\fP bytes. | |
2153 | \fIaction\fP can be one of: | |
2154 | .INDENT 7.0 | |
2155 | .TP | |
2156 | .B \fBSEG6_LOCAL_ACTION_END_X\fP | |
2157 | End.X action: Endpoint with Layer\-3 cross\-connect. | |
2158 | Type of \fIparam\fP: \fBstruct in6_addr\fP\&. | |
2159 | .TP | |
2160 | .B \fBSEG6_LOCAL_ACTION_END_T\fP | |
2161 | End.T action: Endpoint with specific IPv6 table lookup. | |
2162 | Type of \fIparam\fP: \fBint\fP\&. | |
2163 | .TP | |
2164 | .B \fBSEG6_LOCAL_ACTION_END_B6\fP | |
2165 | End.B6 action: Endpoint bound to an SRv6 policy. | |
e6107b29 | 2166 | Type of \fIparam\fP: \fBstruct ipv6_sr_hdr\fP\&. |
53666f6c MK |
2167 | .TP |
2168 | .B \fBSEG6_LOCAL_ACTION_END_B6_ENCAP\fP | |
2169 | End.B6.Encap action: Endpoint bound to an SRv6 | |
2170 | encapsulation policy. | |
e6107b29 | 2171 | Type of \fIparam\fP: \fBstruct ipv6_sr_hdr\fP\&. |
53666f6c MK |
2172 | .UNINDENT |
2173 | .sp | |
e6107b29 | 2174 | A call to this helper is susceptible to change the underlying |
53666f6c MK |
2175 | packet buffer. Therefore, at load time, all checks on pointers |
2176 | previously done by the verifier are invalidated and must be | |
2177 | performed again, if the helper is used in combination with | |
2178 | direct packet access. | |
2179 | .TP | |
2180 | .B Return | |
2181 | 0 on success, or a negative error in case of failure. | |
2182 | .UNINDENT | |
2183 | .TP | |
e6107b29 | 2184 | .B \fBint bpf_rc_repeat(void *\fP\fIctx\fP\fB)\fP |
53666f6c MK |
2185 | .INDENT 7.0 |
2186 | .TP | |
2187 | .B Description | |
2188 | This helper is used in programs implementing IR decoding, to | |
e6107b29 MK |
2189 | report a successfully decoded repeat key message. This delays |
2190 | the generation of a key up event for the previously generated | |
2191 | key down event. | |
53666f6c | 2192 | .sp |
e6107b29 MK |
2193 | Some IR protocols like NEC have a special IR message for |
2194 | repeating the last button, for when a button is held down. | |
53666f6c MK |
2195 | .sp |
2196 | The \fIctx\fP should point to the lirc sample as passed into | |
2197 | the program. | |
2198 | .sp | |
53666f6c MK |
2199 | This helper is only available if the kernel was compiled with |
2200 | the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to | |
2201 | "\fBy\fP". | |
2202 | .TP | |
2203 | .B Return | |
e6107b29 | 2204 | 0 |
53666f6c MK |
2205 | .UNINDENT |
2206 | .TP | |
e6107b29 | 2207 | .B \fBint bpf_rc_keydown(void *\fP\fIctx\fP\fB, u32\fP \fIprotocol\fP\fB, u64\fP \fIscancode\fP\fB, u32\fP \fItoggle\fP\fB)\fP |
53666f6c MK |
2208 | .INDENT 7.0 |
2209 | .TP | |
2210 | .B Description | |
2211 | This helper is used in programs implementing IR decoding, to | |
e6107b29 MK |
2212 | report a successfully decoded key press with \fIscancode\fP, |
2213 | \fItoggle\fP value in the given \fIprotocol\fP\&. The scancode will be | |
2214 | translated to a keycode using the rc keymap, and reported as | |
2215 | an input key down event. After a period a key up event is | |
2216 | generated. This period can be extended by calling either | |
2217 | \fBbpf_rc_keydown\fP() again with the same values, or calling | |
2218 | \fBbpf_rc_repeat\fP(). | |
53666f6c | 2219 | .sp |
e6107b29 MK |
2220 | Some protocols include a toggle bit, in case the button was |
2221 | released and pressed again between consecutive scancodes. | |
53666f6c MK |
2222 | .sp |
2223 | The \fIctx\fP should point to the lirc sample as passed into | |
2224 | the program. | |
2225 | .sp | |
e6107b29 MK |
2226 | The \fIprotocol\fP is the decoded protocol number (see |
2227 | \fBenum rc_proto\fP for some predefined values). | |
2228 | .sp | |
53666f6c MK |
2229 | This helper is only available if the kernel was compiled with |
2230 | the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to | |
2231 | "\fBy\fP". | |
2232 | .TP | |
2233 | .B Return | |
e6107b29 | 2234 | 0 |
53666f6c MK |
2235 | .UNINDENT |
2236 | .TP | |
e6107b29 | 2237 | .B \fBu64 bpf_skb_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB)\fP |
53666f6c MK |
2238 | .INDENT 7.0 |
2239 | .TP | |
2240 | .B Description | |
2241 | Return the cgroup v2 id of the socket associated with the \fIskb\fP\&. | |
2242 | This is roughly similar to the \fBbpf_get_cgroup_classid\fP() | |
2243 | helper for cgroup v1 by providing a tag or identifier that | |
2244 | can be matched on or used for map lookups e.g. to implement | |
2245 | policy. The cgroup v2 id of a given path in the hierarchy is | |
2246 | exposed in user space through the f_handle API in order to get | |
2247 | to the same 64\-bit id. | |
2248 | .sp | |
2249 | This helper can be used on TC egress path, but not on ingress, | |
2250 | and is available only if the kernel was compiled with the | |
2251 | \fBCONFIG_SOCK_CGROUP_DATA\fP configuration option. | |
2252 | .TP | |
2253 | .B Return | |
2254 | The id is returned or 0 in case the id could not be retrieved. | |
2255 | .UNINDENT | |
2256 | .TP | |
53666f6c MK |
2257 | .B \fBu64 bpf_get_current_cgroup_id(void)\fP |
2258 | .INDENT 7.0 | |
2259 | .TP | |
2260 | .B Return | |
2261 | A 64\-bit integer containing the current cgroup id based | |
2262 | on the cgroup within which the current task is running. | |
2263 | .UNINDENT | |
2264 | .TP | |
e6107b29 | 2265 | .B \fBvoid *bpf_get_local_storage(void *\fP\fImap\fP\fB, u64\fP \fIflags\fP\fB)\fP |
53666f6c MK |
2266 | .INDENT 7.0 |
2267 | .TP | |
2268 | .B Description | |
2269 | Get the pointer to the local storage area. | |
2270 | The type and the size of the local storage is defined | |
2271 | by the \fImap\fP argument. | |
2272 | The \fIflags\fP meaning is specific for each map type, | |
2273 | and has to be 0 for cgroup local storage. | |
2274 | .sp | |
2223d7df MK |
2275 | Depending on the BPF program type, a local storage area |
2276 | can be shared between multiple instances of the BPF program, | |
53666f6c MK |
2277 | running simultaneously. |
2278 | .sp | |
e6107b29 | 2279 | The user is responsible for synchronization. |
2223d7df | 2280 | For example, by using the \fBBPF_STX_XADD\fP instruction to alter |
53666f6c MK |
2281 | the shared data. |
2282 | .TP | |
2283 | .B Return | |
2223d7df | 2284 | A pointer to the local storage area. |
53666f6c MK |
2285 | .UNINDENT |
2286 | .TP | |
2287 | .B \fBint bpf_sk_select_reuseport(struct sk_reuseport_md *\fP\fIreuse\fP\fB, struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIkey\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2288 | .INDENT 7.0 | |
2289 | .TP | |
2290 | .B Description | |
2223d7df MK |
2291 | Select a \fBSO_REUSEPORT\fP socket from a |
2292 | \fBBPF_MAP_TYPE_REUSEPORT_ARRAY\fP \fImap\fP\&. | |
2293 | It checks that the selected socket matches the incoming | |
2294 | request in the socket buffer. | |
53666f6c MK |
2295 | .TP |
2296 | .B Return | |
2297 | 0 on success, or a negative error in case of failure. | |
2298 | .UNINDENT | |
2223d7df | 2299 | .TP |
e6107b29 MK |
2300 | .B \fBu64 bpf_skb_ancestor_cgroup_id(struct sk_buff *\fP\fIskb\fP\fB, int\fP \fIancestor_level\fP\fB)\fP |
2301 | .INDENT 7.0 | |
2302 | .TP | |
2303 | .B Description | |
2304 | Return id of cgroup v2 that is ancestor of cgroup associated | |
2305 | with the \fIskb\fP at the \fIancestor_level\fP\&. The root cgroup is at | |
2306 | \fIancestor_level\fP zero and each step down the hierarchy | |
2307 | increments the level. If \fIancestor_level\fP == level of cgroup | |
2308 | associated with \fIskb\fP, then return value will be same as that | |
2309 | of \fBbpf_skb_cgroup_id\fP(). | |
2310 | .sp | |
2311 | The helper is useful to implement policies based on cgroups | |
2312 | that are upper in hierarchy than immediate cgroup associated | |
2313 | with \fIskb\fP\&. | |
2314 | .sp | |
2315 | The format of returned id and helper limitations are same as in | |
2316 | \fBbpf_skb_cgroup_id\fP(). | |
2317 | .TP | |
2318 | .B Return | |
2319 | The id is returned or 0 in case the id could not be retrieved. | |
2320 | .UNINDENT | |
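The level numbering described above (root at level zero, one increment per step down) can be illustrated with a small user-space model. This is only a sketch: the struct, its fixed-size array and the function name are hypothetical, and treating a negative level as "not found" is an assumption of the model.

```c
#include <stdint.h>

/* Hypothetical model of a cgroup v2 node and the ids of its ancestors. */
struct model_cgroup {
    int level;                 /* 0 for the root cgroup; must be below 8 here */
    uint64_t ancestor_ids[8];  /* [i] is the id at level i; [level] is the own id */
};

/* Return the id of the ancestor at ancestor_level, or 0 if there is none. */
static uint64_t model_ancestor_cgroup_id(const struct model_cgroup *cg,
                                         int ancestor_level)
{
    if (ancestor_level < 0 || ancestor_level > cg->level)
        return 0;
    return cg->ancestor_ids[ancestor_level];
}
```

With a cgroup at level 2, asking for level 2 yields the same id that bpf_skb_cgroup_id() would report in this model, while asking for level 3 yields 0.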
2321 | .TP | |
2223d7df MK |
2322 | .B \fBstruct bpf_sock *bpf_sk_lookup_tcp(void *\fP\fIctx\fP\fB, struct bpf_sock_tuple *\fP\fItuple\fP\fB, u32\fP \fItuple_size\fP\fB, u64\fP \fInetns\fP\fB, u64\fP \fIflags\fP\fB)\fP |
2323 | .INDENT 7.0 | |
2324 | .TP | |
2325 | .B Description | |
2326 | Look for TCP socket matching \fItuple\fP, optionally in a child | |
2327 | network namespace \fInetns\fP\&. The return value must be checked, | |
2328 | and if non\-\fBNULL\fP, released via \fBbpf_sk_release\fP(). | |
2329 | .sp | |
2330 | The \fIctx\fP should point to the context of the program, such as | |
2331 | the skb or socket (depending on the hook in use). This is used | |
2332 | to determine the base network namespace for the lookup. | |
2333 | .sp | |
2334 | \fItuple_size\fP must be one of: | |
2335 | .INDENT 7.0 | |
2336 | .TP | |
2337 | .B \fBsizeof\fP(\fItuple\fP\fB\->ipv4\fP) | |
2338 | Look for an IPv4 socket. | |
2339 | .TP | |
2340 | .B \fBsizeof\fP(\fItuple\fP\fB\->ipv6\fP) | |
2341 | Look for an IPv6 socket. | |
2342 | .UNINDENT | |
2343 | .sp | |
2344 | If the \fInetns\fP is a negative signed 32\-bit integer, then the | |
2345 | socket lookup table in the netns associated with the \fIctx\fP will | |
2346 | be used. For the TC hooks, this is the netns of the device | |
2347 | in the skb. For socket hooks, this is the netns of the socket. | |
2348 | If \fInetns\fP is any other signed 32\-bit value greater than or | |
2349 | equal to zero then it specifies the ID of the netns relative to | |
2350 | the netns associated with the \fIctx\fP\&. \fInetns\fP values beyond the | |
2351 | range of 32\-bit integers are reserved for future use. | |
2352 | .sp | |
2353 | All values for \fIflags\fP are reserved for future usage, and must | |
2354 | be left at zero. | |
2355 | .sp | |
2356 | This helper is available only if the kernel was compiled with | |
2357 | \fBCONFIG_NET\fP configuration option. | |
2358 | .TP | |
2359 | .B Return | |
2360 | Pointer to \fBstruct bpf_sock\fP, or \fBNULL\fP in case of failure. | |
2361 | For sockets with reuseport option, the \fBstruct bpf_sock\fP | |
e6107b29 MK |
2362 | result is from \fIreuse\fP\fB\->socks\fP[] using the hash of the |
2363 | tuple. | |
2223d7df MK |
2364 | .UNINDENT |
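The three ranges of the netns argument described above can be modeled in user space as follows. This is a sketch of the documented text only, assuming the caller passes a signed 32-bit value sign-extended to 64 bits; the kernel's actual validation may differ in corner cases, and the enum and function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical classification of the u64 netns argument. */
enum netns_kind {
    NETNS_CURRENT,   /* negative s32: use the netns associated with ctx */
    NETNS_BY_ID,     /* 0..INT32_MAX: netns ID relative to the ctx netns */
    NETNS_RESERVED   /* anything else: reserved for future use */
};

static enum netns_kind classify_netns(uint64_t netns)
{
    /* A negative s32 sign-extended to 64 bits lies in
     * 0xFFFFFFFF80000000..0xFFFFFFFFFFFFFFFF. */
    if (netns >= UINT64_C(0xFFFFFFFF80000000))
        return NETNS_CURRENT;
    if (netns <= (uint64_t)INT32_MAX)
        return NETNS_BY_ID;
    return NETNS_RESERVED;
}
```

Passing -1 cast to a 64-bit unsigned value therefore selects the current namespace in this model, which is the common case in practice.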
2365 | .TP | |
2366 | .B \fBstruct bpf_sock *bpf_sk_lookup_udp(void *\fP\fIctx\fP\fB, struct bpf_sock_tuple *\fP\fItuple\fP\fB, u32\fP \fItuple_size\fP\fB, u64\fP \fInetns\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2367 | .INDENT 7.0 | |
2368 | .TP | |
2369 | .B Description | |
2370 | Look for UDP socket matching \fItuple\fP, optionally in a child | |
2371 | network namespace \fInetns\fP\&. The return value must be checked, | |
2372 | and if non\-\fBNULL\fP, released via \fBbpf_sk_release\fP(). | |
2373 | .sp | |
2374 | The \fIctx\fP should point to the context of the program, such as | |
2375 | the skb or socket (depending on the hook in use). This is used | |
2376 | to determine the base network namespace for the lookup. | |
2377 | .sp | |
2378 | \fItuple_size\fP must be one of: | |
2379 | .INDENT 7.0 | |
2380 | .TP | |
2381 | .B \fBsizeof\fP(\fItuple\fP\fB\->ipv4\fP) | |
2382 | Look for an IPv4 socket. | |
2383 | .TP | |
2384 | .B \fBsizeof\fP(\fItuple\fP\fB\->ipv6\fP) | |
2385 | Look for an IPv6 socket. | |
2386 | .UNINDENT | |
2387 | .sp | |
2388 | If the \fInetns\fP is a negative signed 32\-bit integer, then the | |
2389 | socket lookup table in the netns associated with the \fIctx\fP will | |
2390 | be used. For the TC hooks, this is the netns of the device | |
2391 | in the skb. For socket hooks, this is the netns of the socket. | |
2392 | If \fInetns\fP is any other signed 32\-bit value greater than or | |
2393 | equal to zero then it specifies the ID of the netns relative to | |
2394 | the netns associated with the \fIctx\fP\&. \fInetns\fP values beyond the | |
2395 | range of 32\-bit integers are reserved for future use. | |
2396 | .sp | |
2397 | All values for \fIflags\fP are reserved for future usage, and must | |
2398 | be left at zero. | |
2399 | .sp | |
2400 | This helper is available only if the kernel was compiled with | |
2401 | \fBCONFIG_NET\fP configuration option. | |
2402 | .TP | |
2403 | .B Return | |
2404 | Pointer to \fBstruct bpf_sock\fP, or \fBNULL\fP in case of failure. | |
2405 | For sockets with reuseport option, the \fBstruct bpf_sock\fP | |
e6107b29 MK |
2406 | result is from \fIreuse\fP\fB\->socks\fP[] using the hash of the |
2407 | tuple. | |
2223d7df MK |
2408 | .UNINDENT |
2409 | .TP | |
2410 | .B \fBint bpf_sk_release(struct bpf_sock *\fP\fIsock\fP\fB)\fP | |
2411 | .INDENT 7.0 | |
2412 | .TP | |
2413 | .B Description | |
2414 | Release the reference held by \fIsock\fP\&. \fIsock\fP must be a | |
2415 | non\-\fBNULL\fP pointer that was returned from | |
2416 | \fBbpf_sk_lookup_xxx\fP(). | |
2417 | .TP | |
2418 | .B Return | |
2419 | 0 on success, or a negative error in case of failure. | |
2420 | .UNINDENT | |
2421 | .TP | |
e6107b29 MK |
2422 | .B \fBint bpf_map_push_elem(struct bpf_map *\fP\fImap\fP\fB, const void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP |
2423 | .INDENT 7.0 | |
2424 | .TP | |
2425 | .B Description | |
2426 | Push an element \fIvalue\fP in \fImap\fP\&. \fIflags\fP is one of: | |
2427 | .INDENT 7.0 | |
2428 | .TP | |
2429 | .B \fBBPF_EXIST\fP | |
2430 | If the queue/stack is full, the oldest element is | |
2431 | removed to make room for this. | |
2432 | .UNINDENT | |
2433 | .TP | |
2434 | .B Return | |
2435 | 0 on success, or a negative error in case of failure. | |
2436 | .UNINDENT | |
2437 | .TP | |
2223d7df MK |
2438 | .B \fBint bpf_map_pop_elem(struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIvalue\fP\fB)\fP |
2439 | .INDENT 7.0 | |
2440 | .TP | |
2441 | .B Description | |
2442 | Pop an element from \fImap\fP\&. | |
2443 | .TP | |
2444 | .B Return | |
2445 | 0 on success, or a negative error in case of failure. | |
2446 | .UNINDENT | |
2447 | .TP | |
2448 | .B \fBint bpf_map_peek_elem(struct bpf_map *\fP\fImap\fP\fB, void *\fP\fIvalue\fP\fB)\fP | |
2449 | .INDENT 7.0 | |
2450 | .TP | |
2451 | .B Description | |
2452 | Get an element from \fImap\fP without removing it. | |
2453 | .TP | |
2454 | .B Return | |
2455 | 0 on success, or a negative error in case of failure. | |
2456 | .UNINDENT | |
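The push, pop and peek semantics above, including the BPF_EXIST overwrite behavior, can be sketched with a small user-space FIFO. This is a model and not the kernel implementation: the capacity, the MODEL_ names and the specific error codes (-E2BIG when full, -ENOENT when empty) are assumptions made for illustration.

```c
#include <errno.h>
#include <stdint.h>

#define MODEL_CAPACITY  4   /* hypothetical max_entries of the map */
#define MODEL_BPF_EXIST 2   /* stands in for the BPF_EXIST flag */

struct model_queue {
    uint32_t slots[MODEL_CAPACITY];
    unsigned int head;    /* index of the oldest element */
    unsigned int count;   /* number of stored elements */
};

/* Push value; when full, drop the oldest element only if BPF_EXIST is set. */
static int model_push(struct model_queue *q, uint32_t value, uint64_t flags)
{
    if (q->count == MODEL_CAPACITY) {
        if (!(flags & MODEL_BPF_EXIST))
            return -E2BIG;
        q->head = (q->head + 1) % MODEL_CAPACITY;
        q->count--;
    }
    q->slots[(q->head + q->count) % MODEL_CAPACITY] = value;
    q->count++;
    return 0;
}

/* Read the oldest element without removing it. */
static int model_peek(const struct model_queue *q, uint32_t *value)
{
    if (q->count == 0)
        return -ENOENT;
    *value = q->slots[q->head];
    return 0;
}

/* Read and remove the oldest element. */
static int model_pop(struct model_queue *q, uint32_t *value)
{
    int err = model_peek(q, value);

    if (err)
        return err;
    q->head = (q->head + 1) % MODEL_CAPACITY;
    q->count--;
    return 0;
}
```

The model follows queue (FIFO) ordering; a stack map would return the newest element from pop and peek instead.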
2457 | .TP | |
2458 | .B \fBint bpf_msg_push_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIlen\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2459 | .INDENT 7.0 | |
2460 | .TP | |
2461 | .B Description | |
2462 | For socket policies, insert \fIlen\fP bytes into \fImsg\fP at offset | |
2463 | \fIstart\fP\&. | |
2464 | .sp | |
2465 | If a program of type \fBBPF_PROG_TYPE_SK_MSG\fP is run on a | |
2466 | \fImsg\fP it may want to insert metadata or options into the \fImsg\fP\&. | |
2467 | This can later be read and used by any of the lower layer BPF | |
2468 | hooks. | |
2469 | .sp | |
2470 | This helper may fail under memory pressure (if a malloc | |
2471 | fails); in these cases BPF programs will get an appropriate | |
2472 | error and will need to handle it. | |
2473 | .TP | |
2474 | .B Return | |
2475 | 0 on success, or a negative error in case of failure. | |
2476 | .UNINDENT | |
2477 | .TP | |
2478 | .B \fBint bpf_msg_pop_data(struct sk_msg_buff *\fP\fImsg\fP\fB, u32\fP \fIstart\fP\fB, u32\fP \fIpop\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2479 | .INDENT 7.0 | |
2480 | .TP | |
2481 | .B Description | |
2482 | Will remove \fIpop\fP bytes from a \fImsg\fP starting at byte \fIstart\fP\&. | |
2483 | This may result in \fBENOMEM\fP errors under certain situations if | |
2484 | an allocation and copy are required due to a full ring buffer. | |
2485 | However, the helper will try to avoid doing the allocation | |
2486 | if possible. Other errors can occur if input parameters are | |
2487 | invalid, either due to the \fIstart\fP byte not being a valid part of \fImsg\fP | |
2488 | payload and/or the \fIpop\fP value being too large. | |
2489 | .TP | |
2490 | .B Return | |
2491 | 0 on success, or a negative error in case of failure. | |
2492 | .UNINDENT | |
2493 | .TP | |
2494 | .B \fBint bpf_rc_pointer_rel(void *\fP\fIctx\fP\fB, s32\fP \fIrel_x\fP\fB, s32\fP \fIrel_y\fP\fB)\fP | |
2495 | .INDENT 7.0 | |
2496 | .TP | |
2497 | .B Description | |
2498 | This helper is used in programs implementing IR decoding, to | |
2499 | report a successfully decoded pointer movement. | |
2500 | .sp | |
2501 | The \fIctx\fP should point to the lirc sample as passed into | |
2502 | the program. | |
2503 | .sp | |
2504 | This helper is only available if the kernel was compiled with | |
2505 | the \fBCONFIG_BPF_LIRC_MODE2\fP configuration option set to | |
2506 | "\fBy\fP". | |
2507 | .TP | |
2508 | .B Return | |
e6107b29 MK |
2509 | 0 |
2510 | .UNINDENT | |
2511 | .TP | |
2512 | .B \fBint bpf_spin_lock(struct bpf_spin_lock *\fP\fIlock\fP\fB)\fP | |
2513 | .INDENT 7.0 | |
2514 | .TP | |
2515 | .B Description | |
2516 | Acquire a spinlock represented by the pointer \fIlock\fP, which is | |
2517 | stored as part of a value of a map. Taking the lock makes it | |
2518 | safe to update the rest of the fields in that value. The | |
2519 | spinlock can (and must) later be released with a call to | |
2520 | \fBbpf_spin_unlock\fP(\fIlock\fP). | |
2521 | .sp | |
2522 | Spinlocks in BPF programs come with a number of restrictions | |
2523 | and constraints: | |
2524 | .INDENT 7.0 | |
2525 | .IP \(bu 2 | |
2526 | \fBbpf_spin_lock\fP objects are only allowed inside maps of | |
2527 | types \fBBPF_MAP_TYPE_HASH\fP and \fBBPF_MAP_TYPE_ARRAY\fP (this | |
2528 | list could be extended in the future). | |
2529 | .IP \(bu 2 | |
2530 | BTF description of the map is mandatory. | |
2531 | .IP \(bu 2 | |
2532 | The BPF program can take ONE lock at a time, since taking two | |
2533 | or more could cause deadlocks. | |
2534 | .IP \(bu 2 | |
2535 | Only one \fBstruct bpf_spin_lock\fP is allowed per map element. | |
2536 | .IP \(bu 2 | |
2537 | When the lock is taken, calls (either BPF to BPF or helpers) | |
2538 | are not allowed. | |
2539 | .IP \(bu 2 | |
2540 | The \fBBPF_LD_ABS\fP and \fBBPF_LD_IND\fP instructions are not | |
2541 | allowed inside a spinlock\-ed region. | |
2542 | .IP \(bu 2 | |
2543 | The BPF program MUST call \fBbpf_spin_unlock\fP() to release | |
2544 | the lock, on all execution paths, before it returns. | |
2545 | .IP \(bu 2 | |
2546 | The BPF program can access \fBstruct bpf_spin_lock\fP only via | |
2547 | the \fBbpf_spin_lock\fP() and \fBbpf_spin_unlock\fP() | |
2548 | helpers. Loading or storing data into the \fBstruct | |
2549 | bpf_spin_lock\fP \fIlock\fP\fB;\fP field of a map is not allowed. | |
2550 | .IP \(bu 2 | |
2551 | To use the \fBbpf_spin_lock\fP() helper, the BTF description | |
2552 | of the map value must be a struct and have \fBstruct | |
2553 | bpf_spin_lock\fP \fIanyname\fP\fB;\fP field at the top level. | |
2554 | Nested lock inside another struct is not allowed. | |
2555 | .IP \(bu 2 | |
2556 | The \fBstruct bpf_spin_lock\fP \fIlock\fP field in a map value must | |
2557 | be aligned on a multiple of 4 bytes in that value. | |
2558 | .IP \(bu 2 | |
2559 | Syscall with command \fBBPF_MAP_LOOKUP_ELEM\fP does not copy | |
2560 | the \fBbpf_spin_lock\fP field to user space. | |
2561 | .IP \(bu 2 | |
2562 | Syscall with command \fBBPF_MAP_UPDATE_ELEM\fP, or update from | |
2563 | a BPF program, do not update the \fBbpf_spin_lock\fP field. | |
2564 | .IP \(bu 2 | |
2565 | \fBbpf_spin_lock\fP cannot be on the stack or inside a | |
2566 | networking packet (it can only be inside map values). | |
2567 | .IP \(bu 2 | |
2568 | \fBbpf_spin_lock\fP is available to root only. | |
2569 | .IP \(bu 2 | |
2570 | Tracing programs and socket filter programs cannot use | |
2571 | \fBbpf_spin_lock\fP() due to insufficient preemption checks | |
2572 | (but this may change in the future). | |
2573 | .IP \(bu 2 | |
2574 | \fBbpf_spin_lock\fP is not allowed in inner maps of map\-in\-map. | |
2575 | .UNINDENT | |
2576 | .TP | |
2577 | .B Return | |
2578 | 0 | |
2579 | .UNINDENT | |
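Several of the layout restrictions above (one lock per element, at the top level of the value struct, aligned on a multiple of 4 bytes) can be illustrated with a user-space sketch. The stand-in lock type and the counter_value struct are hypothetical and exist only so the example compiles outside the kernel; the lock and unlock calls are shown as comments because the real helpers are only available inside a BPF program.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct bpf_spin_lock so the sketch builds in user space. */
struct model_spin_lock {
    uint32_t val;
};

/* Hypothetical map value: exactly one lock, as a top-level field. */
struct counter_value {
    struct model_spin_lock lock;
    uint64_t packets;
    uint64_t bytes;
};

/* Check the documented 4-byte alignment of the lock field. */
static int lock_offset_is_aligned(void)
{
    return offsetof(struct counter_value, lock) % 4 == 0;
}

/* Shape of a locked update: no calls between lock and unlock. */
static void update_locked(struct counter_value *v, uint64_t len)
{
    /* bpf_spin_lock(&v->lock); */
    v->packets += 1;
    v->bytes += len;
    /* bpf_spin_unlock(&v->lock); */
}
```

Keeping the critical section to plain loads and stores, as here, matches the restriction that neither helper calls nor BPF to BPF calls are allowed while the lock is held.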
2580 | .TP | |
2581 | .B \fBint bpf_spin_unlock(struct bpf_spin_lock *\fP\fIlock\fP\fB)\fP | |
2582 | .INDENT 7.0 | |
2583 | .TP | |
2584 | .B Description | |
2585 | Release the \fIlock\fP previously locked by a call to | |
2586 | \fBbpf_spin_lock\fP(\fIlock\fP). | |
2587 | .TP | |
2588 | .B Return | |
2589 | 0 | |
2590 | .UNINDENT | |
2591 | .TP | |
2592 | .B \fBstruct bpf_sock *bpf_sk_fullsock(struct bpf_sock *\fP\fIsk\fP\fB)\fP | |
2593 | .INDENT 7.0 | |
2594 | .TP | |
2595 | .B Description | |
2596 | This helper gets a \fBstruct bpf_sock\fP pointer such | |
2597 | that all the fields in this \fBbpf_sock\fP can be accessed. | |
2598 | .TP | |
2599 | .B Return | |
2600 | A \fBstruct bpf_sock\fP pointer on success, or \fBNULL\fP in | |
2601 | case of failure. | |
2602 | .UNINDENT | |
2603 | .TP | |
2604 | .B \fBstruct bpf_tcp_sock *bpf_tcp_sock(struct bpf_sock *\fP\fIsk\fP\fB)\fP | |
2605 | .INDENT 7.0 | |
2606 | .TP | |
2607 | .B Description | |
2608 | This helper gets a \fBstruct bpf_tcp_sock\fP pointer from a | |
2609 | \fBstruct bpf_sock\fP pointer. | |
2610 | .TP | |
2611 | .B Return | |
2612 | A \fBstruct bpf_tcp_sock\fP pointer on success, or \fBNULL\fP in | |
2613 | case of failure. | |
2614 | .UNINDENT | |
2615 | .TP | |
2616 | .B \fBint bpf_skb_ecn_set_ce(struct sk_buff *\fP\fIskb\fP\fB)\fP | |
2617 | .INDENT 7.0 | |
2618 | .TP | |
2619 | .B Description | |
2620 | Set ECN (Explicit Congestion Notification) field of IP header | |
2621 | to \fBCE\fP (Congestion Encountered) if current value is \fBECT\fP | |
2622 | (ECN Capable Transport). Otherwise, do nothing. Works with IPv6 | |
2623 | and IPv4. | |
2624 | .TP | |
2625 | .B Return | |
2626 | 1 if the \fBCE\fP flag is set (either by the current helper call | |
2627 | or because it was already present), 0 if it is not set. | |
2628 | .UNINDENT | |
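The ECN rule above can be modeled on the two low-order bits of the IPv4 TOS or IPv6 traffic class byte. The following user-space sketch uses the standard ECN codepoints (00 Not-ECT, 01 and 10 ECT, 11 CE); the function name is hypothetical, and the model does not update the IPv4 header checksum, which a real implementation has to do.

```c
#include <stdint.h>

#define ECN_MASK    0x03   /* two low-order bits of the TOS byte */
#define ECN_NOT_ECT 0x00   /* transport is not ECN capable */
#define ECN_CE      0x03   /* Congestion Encountered */

/* Set CE if the field is ECT or already CE; return 1 if CE is set after. */
static int model_ecn_set_ce(uint8_t *tos)
{
    uint8_t ecn = *tos & ECN_MASK;

    if (ecn == ECN_NOT_ECT)
        return 0;          /* leave Not-ECT packets untouched */
    *tos |= ECN_CE;        /* ECT(0), ECT(1) and CE all end up as CE */
    return 1;
}
```

The upper six bits (the DSCP value) are preserved, since only the two ECN bits are modified.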
2629 | .TP | |
2630 | .B \fBstruct bpf_sock *bpf_get_listener_sock(struct bpf_sock *\fP\fIsk\fP\fB)\fP | |
2631 | .INDENT 7.0 | |
2632 | .TP | |
2633 | .B Description | |
2634 | Return a \fBstruct bpf_sock\fP pointer in \fBTCP_LISTEN\fP state. | |
2635 | \fBbpf_sk_release\fP() is unnecessary and not allowed. | |
2636 | .TP | |
2637 | .B Return | |
2638 | A \fBstruct bpf_sock\fP pointer on success, or \fBNULL\fP in | |
2639 | case of failure. | |
2640 | .UNINDENT | |
2641 | .TP | |
2642 | .B \fBstruct bpf_sock *bpf_skc_lookup_tcp(void *\fP\fIctx\fP\fB, struct bpf_sock_tuple *\fP\fItuple\fP\fB, u32\fP \fItuple_size\fP\fB, u64\fP \fInetns\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2643 | .INDENT 7.0 | |
2644 | .TP | |
2645 | .B Description | |
2646 | Look for TCP socket matching \fItuple\fP, optionally in a child | |
2647 | network namespace \fInetns\fP\&. The return value must be checked, | |
2648 | and if non\-\fBNULL\fP, released via \fBbpf_sk_release\fP(). | |
2649 | .sp | |
2650 | This function is identical to \fBbpf_sk_lookup_tcp\fP(), except | |
2651 | that it also returns timewait or request sockets. Use | |
2652 | \fBbpf_sk_fullsock\fP() or \fBbpf_tcp_sock\fP() to access the | |
2653 | full structure. | |
2654 | .sp | |
2655 | This helper is available only if the kernel was compiled with | |
2656 | \fBCONFIG_NET\fP configuration option. | |
2657 | .TP | |
2658 | .B Return | |
2659 | Pointer to \fBstruct bpf_sock\fP, or \fBNULL\fP in case of failure. | |
2660 | For sockets with reuseport option, the \fBstruct bpf_sock\fP | |
2661 | result is from \fIreuse\fP\fB\->socks\fP[] using the hash of the | |
2662 | tuple. | |
2663 | .UNINDENT | |
2664 | .TP | |
2665 | .B \fBint bpf_tcp_check_syncookie(struct bpf_sock *\fP\fIsk\fP\fB, void *\fP\fIiph\fP\fB, u32\fP \fIiph_len\fP\fB, struct tcphdr *\fP\fIth\fP\fB, u32\fP \fIth_len\fP\fB)\fP | |
2666 | .INDENT 7.0 | |
2667 | .TP | |
2668 | .B Description | |
2669 | Check whether \fIiph\fP and \fIth\fP contain a valid SYN cookie ACK for | |
2670 | the listening socket in \fIsk\fP\&. | |
2671 | .sp | |
2672 | \fIiph\fP points to the start of the IPv4 or IPv6 header, while | |
2673 | \fIiph_len\fP contains \fBsizeof\fP(\fBstruct iphdr\fP) or | |
2674 | \fBsizeof\fP(\fBstruct ipv6hdr\fP). | |
2675 | .sp | |
2676 | \fIth\fP points to the start of the TCP header, while \fIth_len\fP | |
2677 | contains \fBsizeof\fP(\fBstruct tcphdr\fP). | |
2678 | .TP | |
2679 | .B Return | |
2680 | 0 if \fIiph\fP and \fIth\fP are a valid SYN cookie ACK, or a negative | |
2681 | error otherwise. | |
2682 | .UNINDENT | |
2683 | .TP | |
2684 | .B \fBint bpf_sysctl_get_name(struct bpf_sysctl *\fP\fIctx\fP\fB, char *\fP\fIbuf\fP\fB, size_t\fP \fIbuf_len\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2685 | .INDENT 7.0 | |
2686 | .TP | |
2687 | .B Description | |
2688 | Get the name of the sysctl in /proc/sys/ and copy it into the | |
2689 | program\-provided buffer \fIbuf\fP of size \fIbuf_len\fP\&. | |
2690 | .sp | |
2691 | The buffer is always NUL terminated, unless it\(aqs zero\-sized. | |
2692 | .sp | |
2693 | If \fIflags\fP is zero, the full name (e.g. "net/ipv4/tcp_mem") is | |
2694 | copied. Use the \fBBPF_F_SYSCTL_BASE_NAME\fP flag to copy the base | |
2695 | name only (e.g. "tcp_mem"). | |
2696 | .TP | |
2697 | .B Return | |
2698 | Number of characters copied (not including the trailing NUL). | |
2699 | .sp | |
2700 | \fB\-E2BIG\fP if the buffer wasn\(aqt big enough (\fIbuf\fP will contain | |
2701 | truncated name in this case). | |
2702 | .UNINDENT | |
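The copy semantics described for \fBbpf_sysctl_get_name\fP() can be illustrated with a small user-space model. This is only a sketch, not the kernel implementation: the function name model_sysctl_copy_name and its base_only parameter are hypothetical, and the choice to return \-E2BIG for a zero-sized buffer is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * User-space model (hypothetical, not kernel code) of the documented
 * behaviour: copy the full or base name into buf, always NUL-terminate
 * unless buf_len is zero, and return the number of characters copied
 * or -E2BIG when the name had to be truncated.
 */
static int model_sysctl_copy_name(const char *name, int base_only,
				  char *buf, size_t buf_len)
{
	const char *src = name;
	size_t len, n;

	if (base_only) {
		/* The base name is whatever follows the last '/'. */
		const char *slash = strrchr(name, '/');

		if (slash)
			src = slash + 1;
	}

	/* Assumption: a zero-sized buffer is reported as too small. */
	if (buf_len == 0)
		return -E2BIG;

	len = strlen(src);
	n = len < buf_len ? len : buf_len - 1;
	memcpy(buf, src, n);
	buf[n] = 0;

	return len < buf_len ? (int)n : -E2BIG;
}
```

With a 32-byte buffer and base_only set, "net/ipv4/tcp_mem" yields "tcp_mem" and a return value of 7. With a 4-byte buffer the same call leaves the truncated string "tcp" in the buffer and returns \-E2BIG.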
2703 | .TP | |
2704 | .B \fBint bpf_sysctl_get_current_value(struct bpf_sysctl *\fP\fIctx\fP\fB, char *\fP\fIbuf\fP\fB, size_t\fP \fIbuf_len\fP\fB)\fP | |
2705 | .INDENT 7.0 | |
2706 | .TP | |
2707 | .B Description | |
2708 | Get the current value of the sysctl as it is presented in /proc/sys | |
2709 | (incl. newline, etc), and copy it as a string into the | |
2710 | program\-provided buffer \fIbuf\fP of size \fIbuf_len\fP\&. | |
2711 | .sp | |
2712 | The whole value is copied, regardless of the file position at | |
2713 | which user space issued e.g. sys_read. | |
2714 | .sp | |
2715 | The buffer is always NUL terminated, unless it\(aqs zero\-sized. | |
2716 | .TP | |
2717 | .B Return | |
2718 | Number of characters copied (not including the trailing NUL). | |
2719 | .sp | |
2720 | \fB\-E2BIG\fP if the buffer wasn\(aqt big enough (\fIbuf\fP will contain | |
2721 | truncated value in this case). | |
2722 | .sp | |
2723 | \fB\-EINVAL\fP if current value was unavailable, e.g. because | |
2724 | sysctl is uninitialized and read returns \-EIO for it. | |
2725 | .UNINDENT | |
2726 | .TP | |
2727 | .B \fBint bpf_sysctl_get_new_value(struct bpf_sysctl *\fP\fIctx\fP\fB, char *\fP\fIbuf\fP\fB, size_t\fP \fIbuf_len\fP\fB)\fP | |
2728 | .INDENT 7.0 | |
2729 | .TP | |
2730 | .B Description | |
2731 | Get the new value being written by user space to the sysctl (before | |
2732 | the actual write happens) and copy it as a string into the | |
2733 | program\-provided buffer \fIbuf\fP of size \fIbuf_len\fP\&. | |
2734 | .sp | |
2735 | User space may write new value at file position > 0. | |
2736 | .sp | |
2737 | The buffer is always NUL terminated, unless it\(aqs zero\-sized. | |
2738 | .TP | |
2739 | .B Return | |
2740 | Number of characters copied (not including the trailing NUL). | |
2741 | .sp | |
2742 | \fB\-E2BIG\fP if the buffer wasn\(aqt big enough (\fIbuf\fP will contain | |
2743 | truncated value in this case). | |
2744 | .sp | |
2745 | \fB\-EINVAL\fP if sysctl is being read. | |
2746 | .UNINDENT | |
2747 | .TP | |
2748 | .B \fBint bpf_sysctl_set_new_value(struct bpf_sysctl *\fP\fIctx\fP\fB, const char *\fP\fIbuf\fP\fB, size_t\fP \fIbuf_len\fP\fB)\fP | |
2749 | .INDENT 7.0 | |
2750 | .TP | |
2751 | .B Description | |
2752 | Override the new value being written by user space to the sysctl with | |
2753 | the value provided by the program in buffer \fIbuf\fP of size \fIbuf_len\fP\&. | |
2754 | .sp | |
2755 | \fIbuf\fP should contain a string in the same form as provided by user | |
2756 | space on sysctl write. | |
2757 | .sp | |
2758 | User space may write the new value at file position > 0. To override | |
2759 | the whole sysctl value, the file position should be set to zero. | |
2760 | .TP | |
2761 | .B Return | |
2762 | 0 on success. | |
2763 | .sp | |
2764 | \fB\-E2BIG\fP if the \fIbuf_len\fP is too big. | |
2765 | .sp | |
2766 | \fB\-EINVAL\fP if sysctl is being read. | |
2767 | .UNINDENT | |
2768 | .TP | |
2769 | .B \fBint bpf_strtol(const char *\fP\fIbuf\fP\fB, size_t\fP \fIbuf_len\fP\fB, u64\fP \fIflags\fP\fB, long *\fP\fIres\fP\fB)\fP | |
2770 | .INDENT 7.0 | |
2771 | .TP | |
2772 | .B Description | |
2773 | Convert the initial part of the string from buffer \fIbuf\fP of | |
2774 | size \fIbuf_len\fP to a long integer according to the given base | |
2775 | and save the result in \fIres\fP\&. | |
2776 | .sp | |
2777 | The string may begin with an arbitrary amount of white space | |
2778 | (as determined by \fBisspace\fP(3)) followed by a single | |
2779 | optional \(aq\fB\-\fP\(aq sign. | |
2780 | .sp | |
2781 | Five least significant bits of \fIflags\fP encode base, other bits | |
2782 | are currently unused. | |
2783 | .sp | |
2784 | Base must be either 8, 10, 16 or 0 to detect it automatically | |
2785 | similar to user space \fBstrtol\fP(3). | |
2786 | .TP | |
2787 | .B Return | |
2788 | Number of characters consumed on success. Must be positive but | |
2789 | no more than \fIbuf_len\fP\&. | |
2790 | .sp | |
2791 | \fB\-EINVAL\fP if no valid digits were found or unsupported base | |
2792 | was provided. | |
2793 | .sp | |
2794 | \fB\-ERANGE\fP if resulting value was out of range. | |
2795 | .UNINDENT | |
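The base encoding used by \fBbpf_strtol\fP() and \fBbpf_strtoul\fP() can be sketched in plain user-space C. This is an illustrative model under stated assumptions, not the kernel code: the macro MODEL_STRTOX_BASE_MASK and the function model_base_from_flags are hypothetical names, and the sketch deliberately does not model how the kernel treats the unused upper bits (callers should leave them zero).

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical name: mask for the five least significant bits of flags. */
#define MODEL_STRTOX_BASE_MASK 0x1fULL

/*
 * Model of the documented base check: extract the base from the low
 * five bits of flags and accept only 0 (auto-detect), 8, 10 or 16.
 * Returns the base, or -EINVAL for an unsupported one.
 */
static int model_base_from_flags(uint64_t flags)
{
	unsigned int base = (unsigned int)(flags & MODEL_STRTOX_BASE_MASK);

	switch (base) {
	case 0:
	case 8:
	case 10:
	case 16:
		return (int)base;
	default:
		return -EINVAL;
	}
}
```

Under this model, passing 16 as \fIflags\fP selects hexadecimal parsing, while a value such as 7 is rejected with \-EINVAL, matching the "unsupported base" case in the return values above.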
2796 | .TP | |
2797 | .B \fBint bpf_strtoul(const char *\fP\fIbuf\fP\fB, size_t\fP \fIbuf_len\fP\fB, u64\fP \fIflags\fP\fB, unsigned long *\fP\fIres\fP\fB)\fP | |
2798 | .INDENT 7.0 | |
2799 | .TP | |
2800 | .B Description | |
2801 | Convert the initial part of the string from buffer \fIbuf\fP of | |
2802 | size \fIbuf_len\fP to an unsigned long integer according to the | |
2803 | given base and save the result in \fIres\fP\&. | |
2804 | .sp | |
2805 | The string may begin with an arbitrary amount of white space | |
2806 | (as determined by \fBisspace\fP(3)). | |
2807 | .sp | |
2808 | Five least significant bits of \fIflags\fP encode base, other bits | |
2809 | are currently unused. | |
2810 | .sp | |
2811 | Base must be either 8, 10, 16 or 0 to detect it automatically | |
2812 | similar to user space \fBstrtoul\fP(3). | |
2813 | .TP | |
2814 | .B Return | |
2815 | Number of characters consumed on success. Must be positive but | |
2816 | no more than \fIbuf_len\fP\&. | |
2817 | .sp | |
2818 | \fB\-EINVAL\fP if no valid digits were found or unsupported base | |
2819 | was provided. | |
2820 | .sp | |
2821 | \fB\-ERANGE\fP if resulting value was out of range. | |
2822 | .UNINDENT | |
2823 | .TP | |
2824 | .B \fBvoid *bpf_sk_storage_get(struct bpf_map *\fP\fImap\fP\fB, struct bpf_sock *\fP\fIsk\fP\fB, void *\fP\fIvalue\fP\fB, u64\fP \fIflags\fP\fB)\fP | |
2825 | .INDENT 7.0 | |
2826 | .TP | |
2827 | .B Description | |
2828 | Get a bpf\-local\-storage from a \fIsk\fP\&. | |
2829 | .sp | |
2830 | Logically, it can be thought of as getting the value from | |
2831 | a \fImap\fP with \fIsk\fP as the \fBkey\fP\&. From this | |
2832 | perspective, the usage is not much different from | |
2833 | \fBbpf_map_lookup_elem\fP(\fImap\fP, \fB&\fP\fIsk\fP) except that this | |
2834 | helper enforces that the key must be a full socket and that the | |
2835 | map must be of type \fBBPF_MAP_TYPE_SK_STORAGE\fP\&. | |
2836 | .sp | |
2837 | Underneath, the value is stored locally at \fIsk\fP instead of | |
2838 | the \fImap\fP\&. The \fImap\fP is used as the bpf\-local\-storage | |
2839 | "type". The bpf\-local\-storage "type" (i.e. the \fImap\fP) is | |
2840 | searched against all bpf\-local\-storages residing at \fIsk\fP\&. | |
2841 | .sp | |
2842 | The optional flag \fBBPF_SK_STORAGE_GET_F_CREATE\fP can be set in | |
2843 | \fIflags\fP so that a new bpf\-local\-storage is | |
2844 | created if one does not exist. \fIvalue\fP can be used | |
2845 | together with \fBBPF_SK_STORAGE_GET_F_CREATE\fP to specify | |
2846 | the initial value of a bpf\-local\-storage. If \fIvalue\fP is | |
2847 | \fBNULL\fP, the new bpf\-local\-storage will be zero initialized. | |
2848 | .TP | |
2849 | .B Return | |
2850 | A bpf\-local\-storage pointer is returned on success. | |
2851 | .sp | |
2852 | \fBNULL\fP if not found or there was an error in adding | |
2853 | a new bpf\-local\-storage. | |
2854 | .UNINDENT | |
2855 | .TP | |
2856 | .B \fBint bpf_sk_storage_delete(struct bpf_map *\fP\fImap\fP\fB, struct bpf_sock *\fP\fIsk\fP\fB)\fP | |
2857 | .INDENT 7.0 | |
2858 | .TP | |
2859 | .B Description | |
2860 | Delete a bpf\-local\-storage from a \fIsk\fP\&. | |
2861 | .TP | |
2862 | .B Return | |
2863 | 0 on success. | |
2864 | .sp | |
2865 | \fB\-ENOENT\fP if the bpf\-local\-storage cannot be found. | |
2866 | .UNINDENT | |
2867 | .TP | |
2868 | .B \fBint bpf_send_signal(u32\fP \fIsig\fP\fB)\fP | |
2869 | .INDENT 7.0 | |
2870 | .TP | |
2871 | .B Description | |
2872 | Send signal \fIsig\fP to the current task. | |
2873 | .TP | |
2874 | .B Return | |
2875 | 0 on success or successfully queued. | |
2876 | .sp | |
2877 | \fB\-EBUSY\fP if the work queue under NMI is full. | |
2878 | .sp | |
2879 | \fB\-EINVAL\fP if \fIsig\fP is invalid. | |
2880 | .sp | |
2881 | \fB\-EPERM\fP if no permission to send the \fIsig\fP\&. | |
2882 | .sp | |
2883 | \fB\-EAGAIN\fP if the BPF program can try again. | |
2884 | .UNINDENT | |
2885 | .TP | |
2886 | .B \fBs64 bpf_tcp_gen_syncookie(struct bpf_sock *\fP\fIsk\fP\fB, void *\fP\fIiph\fP\fB, u32\fP \fIiph_len\fP\fB, struct tcphdr *\fP\fIth\fP\fB, u32\fP \fIth_len\fP\fB)\fP | |
2887 | .INDENT 7.0 | |
2888 | .TP | |
2889 | .B Description | |
2890 | Try to issue a SYN cookie for the packet with corresponding | |
2891 | IP/TCP headers, \fIiph\fP and \fIth\fP, on the listening socket in \fIsk\fP\&. | |
2892 | .sp | |
2893 | \fIiph\fP points to the start of the IPv4 or IPv6 header, while | |
2894 | \fIiph_len\fP contains \fBsizeof\fP(\fBstruct iphdr\fP) or | |
2895 | \fBsizeof\fP(\fBstruct ipv6hdr\fP). | |
2896 | .sp | |
2897 | \fIth\fP points to the start of the TCP header, while \fIth_len\fP | |
2898 | contains the length of the TCP header. | |
2899 | .TP | |
2900 | .B Return | |
2901 | On success, the lower 32 bits hold the generated SYN cookie, | |
2902 | followed by 16 bits which hold the MSS value for that cookie, | |
2903 | and the top 16 bits are unused. | |
2904 | .sp | |
2905 | On failure, the returned value is one of the following: | |
2906 | .sp | |
2907 | \fB\-EINVAL\fP SYN cookie cannot be issued due to error | |
2908 | .sp | |
2909 | \fB\-ENOENT\fP SYN cookie should not be issued (no SYN flood) | |
2910 | .sp | |
2911 | \fB\-EOPNOTSUPP\fP kernel configuration does not enable SYN cookies | |
2912 | .sp | |
2913 | \fB\-EPROTONOSUPPORT\fP IP packet version is not 4 or 6 | |
2223d7df | 2914 | .UNINDENT |
53666f6c MK |
2915 | .UNINDENT |
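The return value layout described for \fBbpf_tcp_gen_syncookie\fP() (cookie in the lower 32 bits, MSS in the next 16 bits, negative value on failure) can be unpacked as in the following sketch. The struct and function names are hypothetical and exist only for illustration; the code runs in user space and merely demonstrates the bit layout.

```c
#include <stdint.h>

/* Hypothetical container for the two fields packed in the return value. */
struct model_syncookie {
	uint32_t cookie;	/* lower 32 bits */
	uint16_t mss;		/* next 16 bits */
};

/*
 * Split a bpf_tcp_gen_syncookie()-style s64 return value.
 * Returns 0 and fills *out on success, or passes the negative
 * error code through unchanged on failure.
 */
static int model_decode_syncookie(int64_t ret, struct model_syncookie *out)
{
	uint64_t bits;

	if (ret < 0)
		return (int)ret;

	bits = (uint64_t)ret;
	out->cookie = (uint32_t)(bits & 0xffffffffULL);
	out->mss = (uint16_t)((bits >> 32) & 0xffffULL);
	return 0;
}
```

A caller would check for a negative value first, and only then read the cookie and MSS fields, since the top 16 bits of a successful result carry no information.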
2916 | .SH EXAMPLES | |
2917 | .sp | |
2918 | Examples of usage for most of the eBPF helpers listed in this manual page are | |
2919 | available within the Linux kernel sources, at the following locations: | |
2920 | .INDENT 0.0 | |
2921 | .IP \(bu 2 | |
2922 | \fIsamples/bpf/\fP | |
2923 | .IP \(bu 2 | |
2924 | \fItools/testing/selftests/bpf/\fP | |
2925 | .UNINDENT | |
2926 | .SH LICENSE | |
2927 | .sp | |
2928 | eBPF programs can have an associated license, passed along with the bytecode | |
2929 | instructions to the kernel when the programs are loaded. The format for that | |
2930 | string is identical to the one in use for kernel modules (Dual licenses, such | |
2931 | as "Dual BSD/GPL", may be used). Some helper functions are only accessible to | |
2932 | programs that are compatible with the GNU General Public License (GPL). | |
2933 | .sp | |
2934 | In order to use such helpers, the eBPF program must be loaded with the correct | |
2935 | license string passed (via \fBattr\fP) to the \fBbpf\fP() system call, and this | |
2936 | generally translates into the C source code of the program containing a line | |
2937 | similar to the following: | |
2938 | .INDENT 0.0 | |
2939 | .INDENT 3.5 | |
2940 | .sp | |
2941 | .nf | |
2942 | .ft C | |
2943 | char ____license[] __attribute__((section("license"), used)) = "GPL"; | |
2944 | .ft P | |
2945 | .fi | |
2946 | .UNINDENT | |
2947 | .UNINDENT | |
2948 | .SH IMPLEMENTATION | |
2949 | .sp | |
2950 | This manual page is an effort to document the existing eBPF helper functions. | |
2951 | But as of this writing, the BPF sub\-system is under heavy development. New eBPF | |
2952 | program or map types are added, along with new helper functions. Some helpers | |
2953 | are occasionally made available for additional program types. So in spite of | |
2954 | the efforts of the community, this page might not be up\-to\-date. If you want to | |
2955 | check for yourself what helper functions exist in your kernel, or what types of | |
2956 | programs they can support, here are some files in the kernel tree that you | |
2957 | may be interested in: | |
2958 | .INDENT 0.0 | |
2959 | .IP \(bu 2 | |
2960 | \fIinclude/uapi/linux/bpf.h\fP is the main BPF header. It contains the full list | |
2961 | of all helper functions, as well as many other BPF definitions including most | |
2962 | of the flags, structs or constants used by the helpers. | |
2963 | .IP \(bu 2 | |
2964 | \fInet/core/filter.c\fP contains the definition of most network\-related helper | |
2965 | functions, and the list of program types from which they can be used. | |
2966 | .IP \(bu 2 | |
2967 | \fIkernel/trace/bpf_trace.c\fP is the equivalent for most tracing program\-related | |
2968 | helpers. | |
2969 | .IP \(bu 2 | |
2970 | \fIkernel/bpf/verifier.c\fP contains the functions used to check that valid types | |
2971 | of eBPF maps are used with a given helper function. | |
2972 | .IP \(bu 2 | |
2973 | \fIkernel/bpf/\fP directory contains other files in which additional helpers are | |
2974 | defined (for cgroups, sockmaps, etc.). | |
2975 | .UNINDENT | |
2976 | .sp | |
2977 | Compatibility between helper functions and program types can generally be found | |
2978 | in the files where helper functions are defined. Look for the \fBstruct | |
2979 | bpf_func_proto\fP objects and for functions returning them: these functions | |
2980 | contain a list of helpers that a given program type can call. Note that the | |
2981 | \fBdefault:\fP label of the \fBswitch ... case\fP used to filter helpers can call | |
2982 | other functions, themselves allowing access to additional helpers. The | |
2983 | requirement for GPL license is also in those \fBstruct bpf_func_proto\fP\&. | |
2984 | .sp | |
2985 | Compatibility between helper functions and map types can be found in the | |
2986 | \fBcheck_map_func_compatibility\fP() function in file \fIkernel/bpf/verifier.c\fP\&. | |
2987 | .sp | |
2988 | Helper functions that invalidate the checks on \fBdata\fP and \fBdata_end\fP | |
2989 | pointers for network processing are listed in function | |
2990 | \fBbpf_helper_changes_pkt_data\fP() in file \fInet/core/filter.c\fP\&. | |
2991 | .SH SEE ALSO | |
2992 | .sp | |
2993 | \fBbpf\fP(2), | |
2994 | \fBcgroups\fP(7), | |
2995 | \fBip\fP(8), | |
2996 | \fBperf_event_open\fP(2), | |
2997 | \fBsendmsg\fP(2), | |
2998 | \fBsocket\fP(7), | |
2999 | \fBtc\-bpf\fP(8) | |
3000 | .\" Generated by docutils manpage writer. | |
e6107b29 | 3001 | . |