git.ipfire.org Git - thirdparty/kernel/linux.git/log

]> git.ipfire.org Git - thirdparty/kernel/linux.git/log

projects / thirdparty / kernel / linux.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Daniel Xu [Sun, 4 Feb 2024 21:06:34 +0000 (14:06 -0700)]

bpf: Have bpf_rdonly_cast() take a const pointer

Since 20d59ee55172 ("libbpf: add bpf_core_cast() macro"), libbpf is now
exporting a const arg version of bpf_rdonly_cast(). This causes the
following conflicting type error when generating kfunc prototypes from
BTF:

In file included from skeleton/pid_iter.bpf.c:5:
/home/dxu/dev/linux/tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_core_read.h:297:14: error: conflicting types for 'bpf_rdonly_cast'
extern void *bpf_rdonly_cast(const void *obj__ign, __u32 btf_id__k) __ksym __weak;
^
./vmlinux.h:135625:14: note: previous declaration is here
extern void *bpf_rdonly_cast(void *obj__ign, u32 btf_id__k) __weak __ksym;

This is b/c the kernel defines bpf_rdonly_cast() with non-const arg.
Since const arg is more permissive and thus backwards compatible, we
change the kernel definition as well to avoid conflicting type errors.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/dfd3823f11ffd2d4c838e961d61ec9ae8a646773.1707080349.git.dxu@dxuuu.xyz

commit | commitdiff | tree

Marco Elver [Wed, 7 Feb 2024 12:26:17 +0000 (13:26 +0100)]

bpf: Allow compiler to inline most of bpf_local_storage_lookup()

In various performance profiles of kernels with BPF programs attached,
bpf_local_storage_lookup() appears as a significant portion of CPU
cycles spent. To enable the compiler generate more optimal code, turn
bpf_local_storage_lookup() into a static inline function, where only the
cache insertion code path is outlined

Notably, outlining cache insertion helps avoid bloating callers by
duplicating setting up calls to raw_spin_{lock,unlock}_irqsave() (on
architectures which do not inline spin_lock/unlock, such as x86), which
would cause the compiler produce worse code by deciding to outline
otherwise inlinable functions. The call overhead is neutral, because we
make 2 calls either way: either calling raw_spin_lock_irqsave() and
raw_spin_unlock_irqsave(); or call __bpf_local_storage_insert_cache(),
which calls raw_spin_lock_irqsave(), followed by a tail-call to
raw_spin_unlock_irqsave() where the compiler can perform TCO and (in
optimized uninstrumented builds) turns it into a plain jump. The call to
__bpf_local_storage_insert_cache() can be elided entirely if
cacheit_lockit is a false constant expression.

Based on results from './benchs/run_bench_local_storage.sh' (21 trials,
reboot between each trial; x86 defconfig + BPF, clang 16) this produces
improvements in throughput and latency in the majority of cases, with an
average (geomean) improvement of 8%:

+---- Hashmap Control --------------------
|
| + num keys: 10
| :                                         <before>             | <after>
| +-+ hashmap (control) sequential get    +----------------------+----------------------
|   +- hits throughput                    | 14.789 M ops/s       | 14.745 M ops/s (  ~  )
|   +- hits latency                       | 67.679 ns/op         | 67.879 ns/op   (  ~  )
|   +- important_hits throughput          | 14.789 M ops/s       | 14.745 M ops/s (  ~  )
|
| + num keys: 1000
| :                                         <before>             | <after>
| +-+ hashmap (control) sequential get    +----------------------+----------------------
|   +- hits throughput                    | 12.233 M ops/s       | 12.170 M ops/s (  ~  )
|   +- hits latency                       | 81.754 ns/op         | 82.185 ns/op   (  ~  )
|   +- important_hits throughput          | 12.233 M ops/s       | 12.170 M ops/s (  ~  )
|
| + num keys: 10000
| :                                         <before>             | <after>
| +-+ hashmap (control) sequential get    +----------------------+----------------------
|   +- hits throughput                    | 7.220 M ops/s        | 7.204 M ops/s  (  ~  )
|   +- hits latency                       | 138.522 ns/op        | 138.842 ns/op  (  ~  )
|   +- important_hits throughput          | 7.220 M ops/s        | 7.204 M ops/s  (  ~  )
|
| + num keys: 100000
| :                                         <before>             | <after>
| +-+ hashmap (control) sequential get    +----------------------+----------------------
|   +- hits throughput                    | 5.061 M ops/s        | 5.165 M ops/s  (+2.1%)
|   +- hits latency                       | 198.483 ns/op        | 194.270 ns/op  (-2.1%)
|   +- important_hits throughput          | 5.061 M ops/s        | 5.165 M ops/s  (+2.1%)
|
| + num keys: 4194304
| :                                         <before>             | <after>
| +-+ hashmap (control) sequential get    +----------------------+----------------------
|   +- hits throughput                    | 2.864 M ops/s        | 2.882 M ops/s  (  ~  )
|   +- hits latency                       | 365.220 ns/op        | 361.418 ns/op  (-1.0%)
|   +- important_hits throughput          | 2.864 M ops/s        | 2.882 M ops/s  (  ~  )
|
+---- Local Storage ----------------------
|
| + num_maps: 1
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 33.005 M ops/s       | 39.068 M ops/s (+18.4%)
|   +- hits latency                       | 30.300 ns/op         | 25.598 ns/op   (-15.5%)
|   +- important_hits throughput          | 33.005 M ops/s       | 39.068 M ops/s (+18.4%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 37.151 M ops/s       | 44.926 M ops/s (+20.9%)
|   +- hits latency                       | 26.919 ns/op         | 22.259 ns/op   (-17.3%)
|   +- important_hits throughput          | 37.151 M ops/s       | 44.926 M ops/s (+20.9%)
|
| + num_maps: 10
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 32.288 M ops/s       | 38.099 M ops/s (+18.0%)
|   +- hits latency                       | 30.972 ns/op         | 26.248 ns/op   (-15.3%)
|   +- important_hits throughput          | 3.229 M ops/s        | 3.810 M ops/s  (+18.0%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 34.473 M ops/s       | 41.145 M ops/s (+19.4%)
|   +- hits latency                       | 29.010 ns/op         | 24.307 ns/op   (-16.2%)
|   +- important_hits throughput          | 12.312 M ops/s       | 14.695 M ops/s (+19.4%)
|
| + num_maps: 16
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 32.524 M ops/s       | 38.341 M ops/s (+17.9%)
|   +- hits latency                       | 30.748 ns/op         | 26.083 ns/op   (-15.2%)
|   +- important_hits throughput          | 2.033 M ops/s        | 2.396 M ops/s  (+17.9%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 34.575 M ops/s       | 41.338 M ops/s (+19.6%)
|   +- hits latency                       | 28.925 ns/op         | 24.193 ns/op   (-16.4%)
|   +- important_hits throughput          | 11.001 M ops/s       | 13.153 M ops/s (+19.6%)
|
| + num_maps: 17
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 28.861 M ops/s       | 32.756 M ops/s (+13.5%)
|   +- hits latency                       | 34.649 ns/op         | 30.530 ns/op   (-11.9%)
|   +- important_hits throughput          | 1.700 M ops/s        | 1.929 M ops/s  (+13.5%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 31.529 M ops/s       | 36.110 M ops/s (+14.5%)
|   +- hits latency                       | 31.719 ns/op         | 27.697 ns/op   (-12.7%)
|   +- important_hits throughput          | 9.598 M ops/s        | 10.993 M ops/s (+14.5%)
|
| + num_maps: 24
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 18.602 M ops/s       | 19.937 M ops/s (+7.2%)
|   +- hits latency                       | 53.767 ns/op         | 50.166 ns/op   (-6.7%)
|   +- important_hits throughput          | 0.776 M ops/s        | 0.831 M ops/s  (+7.2%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 21.718 M ops/s       | 23.332 M ops/s (+7.4%)
|   +- hits latency                       | 46.047 ns/op         | 42.865 ns/op   (-6.9%)
|   +- important_hits throughput          | 6.110 M ops/s        | 6.564 M ops/s  (+7.4%)
|
| + num_maps: 32
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 14.118 M ops/s       | 14.626 M ops/s (+3.6%)
|   +- hits latency                       | 70.856 ns/op         | 68.381 ns/op   (-3.5%)
|   +- important_hits throughput          | 0.442 M ops/s        | 0.458 M ops/s  (+3.6%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 17.111 M ops/s       | 17.906 M ops/s (+4.6%)
|   +- hits latency                       | 58.451 ns/op         | 55.865 ns/op   (-4.4%)
|   +- important_hits throughput          | 4.776 M ops/s        | 4.998 M ops/s  (+4.6%)
|
| + num_maps: 100
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 5.281 M ops/s        | 5.528 M ops/s  (+4.7%)
|   +- hits latency                       | 192.398 ns/op        | 183.059 ns/op  (-4.9%)
|   +- important_hits throughput          | 0.053 M ops/s        | 0.055 M ops/s  (+4.9%)
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 6.265 M ops/s        | 6.498 M ops/s  (+3.7%)
|   +- hits latency                       | 161.436 ns/op        | 152.877 ns/op  (-5.3%)
|   +- important_hits throughput          | 1.636 M ops/s        | 1.697 M ops/s  (+3.7%)
|
| + num_maps: 1000
| :                                         <before>             | <after>
| +-+ local_storage cache sequential get  +----------------------+----------------------
|   +- hits throughput                    | 0.355 M ops/s        | 0.354 M ops/s  (  ~  )
|   +- hits latency                       | 2826.538 ns/op       | 2827.139 ns/op (  ~  )
|   +- important_hits throughput          | 0.000 M ops/s        | 0.000 M ops/s  (  ~  )
| :
| :                                         <before>             | <after>
| +-+ local_storage cache interleaved get +----------------------+----------------------
|   +- hits throughput                    | 0.404 M ops/s        | 0.403 M ops/s  (  ~  )
|   +- hits latency                       | 2481.190 ns/op       | 2487.555 ns/op (  ~  )
|   +- important_hits throughput          | 0.102 M ops/s        | 0.101 M ops/s  (  ~  )

The on_lookup test in {cgrp,task}_ls_recursion.c is removed
because the bpf_local_storage_lookup is no longer traceable
and adding tracepoint will make the compiler generate worse
code: https://lore.kernel.org/bpf/ZcJmok64Xqv6l4ZS@elver.google.com/

Signed-off-by: Marco Elver <elver@google.com>
Cc: Martin KaFai Lau <martin.lau@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240207122626.3508658-1-elver@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Martin KaFai Lau [Thu, 8 Feb 2024 19:05:09 +0000 (11:05 -0800)]

Merge branch 'bpf, btf: Add DEBUG_INFO_BTF checks for __register_bpf_struct_ops'

Geliang Tang says:

====================
bpf: Add DEBUG_INFO_BTF checks for __register_bpf_struct_ops

This patch set avoids module loading failure when the module
trying to register a struct_ops and the module has its btf section
stripped. This will then work similarly as module kfunc registration in
commit 3de4d22cc9ac ("bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()")

v5:
- drop CONFIG_MODULE_ALLOW_BTF_MISMATCH check as Martin suggested.

v4:
- add a new patch to fix error checks for btf_get_module_btf.
- rename the helper to check_btf_kconfigs.

v3:
- fix this build error:
kernel/bpf/btf.c:7750:11: error: incomplete definition of type 'struct module'

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202402040934.Fph0XeEo-lkp@intel.com/
v2:
- add register_check_missing_btf helper as Jiri suggested.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Geliang Tang [Thu, 8 Feb 2024 06:24:23 +0000 (14:24 +0800)]

bpf, btf: Check btf for register_bpf_struct_ops

Similar to the handling in the functions __register_btf_kfunc_id_set()
and register_btf_id_dtor_kfuncs(), this patch uses the newly added
helper check_btf_kconfigs() to handle module with its btf section
stripped.

While at it, the patch also adds the missed IS_ERR() check to fix the
commit f6be98d19985 ("bpf, net: switch to dynamic registration")

Fixes: f6be98d19985 ("bpf, net: switch to dynamic registration")
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/69082b9835463fe36f9e354bddf2d0a97df39c2b.1707373307.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Geliang Tang [Thu, 8 Feb 2024 06:24:22 +0000 (14:24 +0800)]

bpf, btf: Add check_btf_kconfigs helper

This patch extracts duplicate code on error path when btf_get_module_btf()
returns NULL from the functions __register_btf_kfunc_id_set() and
register_btf_id_dtor_kfuncs() into a new helper named check_btf_kconfigs()
to check CONFIG_DEBUG_INFO_BTF and CONFIG_DEBUG_INFO_BTF_MODULES in it.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/fa5537fc55f1e4d0bfd686598c81b7ab9dbd82b7.1707373307.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Geliang Tang [Thu, 8 Feb 2024 06:24:21 +0000 (14:24 +0800)]

bpf, btf: Fix return value of register_btf_id_dtor_kfuncs

The same as __register_btf_kfunc_id_set(), to let the modules with
stripped btf section loaded, this patch changes the return value of
register_btf_id_dtor_kfuncs() too from -ENOENT to 0 when btf is NULL.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Link: https://lore.kernel.org/r/eab65586d7fb0e72f2707d3747c7d4a5d60c823f.1707373307.git.tanggeliang@kylinos.cn
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Masahiro Yamada [Sun, 4 Feb 2024 07:56:34 +0000 (16:56 +0900)]

bpf: Merge two CONFIG_BPF entries

'config BPF' exists in both init/Kconfig and kernel/bpf/Kconfig.

Commit b24abcff918a ("bpf, kconfig: Add consolidated menu entry for bpf
with core options") added the second one to kernel/bpf/Kconfig instead
of moving the existing one.

Merge them together.

Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240204075634.32969-1-masahiroy@kernel.org

commit | commitdiff | tree

Yafang Shao [Tue, 6 Feb 2024 08:14:15 +0000 (16:14 +0800)]

selftests/bpf: Mark cpumask kfunc declarations as __weak

After the series "Annotate kfuncs in .BTF_ids section"[0], kfuncs can be
generated from bpftool. Let's mark the existing cpumask kfunc declarations
__weak so they don't conflict with definitions that will eventually come
from vmlinux.h.

[0]. https://lore.kernel.org/all/cover.1706491398.git.dxu@dxuuu.xyz

Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/20240206081416.26242-5-laoar.shao@gmail.com

commit | commitdiff | tree

Yafang Shao [Tue, 6 Feb 2024 08:14:14 +0000 (16:14 +0800)]

selftests/bpf: Fix error checking for cpumask_success__load()

We should verify the return value of cpumask_success__load().

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240206081416.26242-4-laoar.shao@gmail.com

commit | commitdiff | tree

Andrii Nakryiko [Wed, 7 Feb 2024 23:56:18 +0000 (15:56 -0800)]

Merge branch 'tools-resolve_btfids-fix-cross-compilation-to-non-host-endianness'

Viktor Malik says:

====================
tools/resolve_btfids: fix cross-compilation to non-host endianness

The .BTF_ids section is pre-filled with zeroed BTF ID entries during the
build and afterwards patched by resolve_btfids with correct values.
Since resolve_btfids always writes in host-native endianness, it relies
on libelf to do the translation when the target ELF is cross-compiled to
a different endianness (this was introduced in commit 61e8aeda9398
("bpf: Fix libelf endian handling in resolv_btfids")).

Unfortunately, the translation will corrupt the flags fields of SET8
entries because these were written during vmlinux compilation and are in
the correct endianness already. This will lead to numerous selftests
failures such as:

    $ sudo ./test_verifier 502 502
    #502/p sleepable fentry accept FAIL
    Failed to load prog 'Invalid argument'!
    bpf_fentry_test1 is not sleepable
    verification time 34 usec
    stack depth 0
    processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
    Summary: 0 PASSED, 0 SKIPPED, 1 FAILED

Since it's not possible to instruct libelf to translate just certain
values, let's manually bswap the flags (both global and entry flags) in
resolve_btfids when needed, so that libelf then translates everything
correctly.

The first patch of the series refactors resolve_btfids by using types
from btf_ids.h instead of accessing the BTF ID data using magic offsets.
Acked-by: Jiri Olsa <jolsa@kernel.org>
---
Changes in v4:
- remove unnecessary vars and pointer casts (suggested by Daniel Xu)

Changes in v3:
- add byte swap of global 'flags' field in btf_id_set8 (suggested by
  Jiri Olsa)
- cleaner refactoring of sets_patch (suggested by Jiri Olsa)
- add compile-time assertion that IDs are at the beginning of pairs
  struct in btf_id_set8 (suggested by Daniel Borkmann)

Changes in v2:
- use type defs from btf_ids.h (suggested by Andrii Nakryiko)
====================

Link: https://lore.kernel.org/r/cover.1707223196.git.vmalik@redhat.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

commit | commitdiff | tree

Viktor Malik [Tue, 6 Feb 2024 12:46:10 +0000 (13:46 +0100)]

tools/resolve_btfids: Fix cross-compilation to non-host endianness

The .BTF_ids section is pre-filled with zeroed BTF ID entries during the
build and afterwards patched by resolve_btfids with correct values.
Since resolve_btfids always writes in host-native endianness, it relies
on libelf to do the translation when the target ELF is cross-compiled to
a different endianness (this was introduced in commit 61e8aeda9398
("bpf: Fix libelf endian handling in resolv_btfids")).

Unfortunately, the translation will corrupt the flags fields of SET8
entries because these were written during vmlinux compilation and are in
the correct endianness already. This will lead to numerous selftests
failures such as:

    $ sudo ./test_verifier 502 502
    #502/p sleepable fentry accept FAIL
    Failed to load prog 'Invalid argument'!
    bpf_fentry_test1 is not sleepable
    verification time 34 usec
    stack depth 0
    processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0
    Summary: 0 PASSED, 0 SKIPPED, 1 FAILED

Since it's not possible to instruct libelf to translate just certain
values, let's manually bswap the flags (both global and entry flags) in
resolve_btfids when needed, so that libelf then translates everything
correctly.

Fixes: ef2c6f370a63 ("tools/resolve_btfids: Add support for 8-byte BTF sets")
Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/7b6bff690919555574ce0f13d2a5996cacf7bf69.1707223196.git.vmalik@redhat.com

commit | commitdiff | tree

Viktor Malik [Tue, 6 Feb 2024 12:46:09 +0000 (13:46 +0100)]

tools/resolve_btfids: Refactor set sorting with types from btf_ids.h

Instead of using magic offsets to access BTF ID set data, leverage types
from btf_ids.h (btf_id_set and btf_id_set8) which define the actual
layout of the data. Thanks to this change, set sorting should also
continue working if the layout changes.

This requires to sync the definition of 'struct btf_id_set8' from
include/linux/btf_ids.h to tools/include/linux/btf_ids.h. We don't sync
the rest of the file at the moment, b/c that would require to also sync
multiple dependent headers and we don't need any other defs from
btf_ids.h.

Signed-off-by: Viktor Malik <vmalik@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/bpf/ff7f062ddf6a00815fda3087957c4ce667f50532.1707223196.git.vmalik@redhat.com

commit | commitdiff | tree

Toke Høiland-Jørgensen [Tue, 6 Feb 2024 12:59:22 +0000 (13:59 +0100)]

libbpf: Use OPTS_SET() macro in bpf_xdp_query()

When the feature_flags and xdp_zc_max_segs fields were added to the libbpf
bpf_xdp_query_opts, the code writing them did not use the OPTS_SET() macro.
This causes libbpf to write to those fields unconditionally, which means
that programs compiled against an older version of libbpf (with a smaller
size of the bpf_xdp_query_opts struct) will have its stack corrupted by
libbpf writing out of bounds.

The patch adding the feature_flags field has an early bail out if the
feature_flags field is not part of the opts struct (via the OPTS_HAS)
macro, but the patch adding xdp_zc_max_segs does not. For consistency, this
fix just changes the assignments to both fields to use the OPTS_SET()
macro.

Fixes: 13ce2daa259a ("xsk: add new netlink attribute dedicated for ZC max frags")
Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240206125922.1992815-1-toke@redhat.com

commit | commitdiff | tree

Jose E. Marchesi [Tue, 6 Feb 2024 10:23:30 +0000 (11:23 +0100)]

bpf: Use -Wno-address-of-packed-member in some selftests

[Differences from V2:
- Remove conditionals in the source files pragmas, as the
  pragma is supported by both GCC and clang.]

Both GCC and clang implement the -Wno-address-of-packed-member
warning, which is enabled by -Wall, that warns about taking the
address of a packed struct field when it can lead to an "unaligned"
address.

This triggers the following errors (-Werror) when building three
particular BPF selftests with GCC:

  progs/test_cls_redirect.c
  986 |         if (ipv4_is_fragment((void *)&encap->ip)) {
  progs/test_cls_redirect_dynptr.c
  410 |         pkt_ipv4_checksum((void *)&encap_gre->ip);
  progs/test_cls_redirect.c
  521 |         pkt_ipv4_checksum((void *)&encap_gre->ip);
  progs/test_tc_tunnel.c
   232 |         set_ipv4_csum((void *)&h_outer.ip);

These warnings do not signal any real problem in the tests as far as I
can see.

This patch adds pragmas to these test files that inhibit the
-Waddress-of-packed-member warning.

Tested in bpf-next master.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240206102330.7113-1-jose.marchesi@oracle.com

commit | commitdiff | tree

Dave Thaler [Tue, 6 Feb 2024 04:51:46 +0000 (20:51 -0800)]

bpf, docs: Fix typos in instructions-set.rst

* "imm32" should just be "imm"
* Add blank line to fix formatting error reported by Stephen Rothwell [0]

[0]: https://lore.kernel.org/bpf/20240206153301.4ead0bad@canb.auug.org.au/T/#u

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240206045146.4965-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Tue, 6 Feb 2024 00:40:08 +0000 (16:40 -0800)]

selftests/bpf: mark dynptr kfuncs __weak to make them optional on old kernels

Mark dynptr kfuncs as __weak to allow
verifier_global_subprogs/arg_ctx_{perf,kprobe,raw_tp} subtests to be
loadable on old kernels. Because bpf_dynptr_from_xdp() kfunc is used
from arg_tag_dynptr BPF program in progs/verifier_global_subprogs.c
*and* is not marked as __weak, loading any subtest from
verifier_global_subprogs fails on old kernels that don't have
bpf_dynptr_from_xdp() kfunc defined. Even if arg_tag_dynptr program
itself is not loaded, libbpf bails out on non-weak reference to
bpf_dynptr_from_xdp (that can't be resolved), which shared across all
programs in progs/verifier_global_subprogs.c.

So mark all dynptr-related kfuncs as __weak to unblock libbpf CI ([0]).
In the upcoming "kfunc in vmlinux.h" work we should make sure that
kfuncs are always declared __weak as well.

[0] https://github.com/libbpf/libbpf/actions/runs/7792673215/job/21251250831?pr=776#step:4:7961

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240206004008.1541513-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Tue, 6 Feb 2024 00:22:43 +0000 (16:22 -0800)]

libbpf: fix return value for PERF_EVENT __arg_ctx type fix up check

If PERF_EVENT program has __arg_ctx argument with matching
architecture-specific pt_regs/user_pt_regs/user_regs_struct pointer
type, libbpf should still perform type rewrite for old kernels, but not
emit the warning. Fix copy/paste from kernel code where 0 is meant to
signify "no error" condition. For libbpf we need to return "true" to
proceed with type rewrite (which for PERF_EVENT program will be
a canonical `struct bpf_perf_event_data *` type).

Fixes: 9eea8fafe33e ("libbpf: fix __arg_ctx type enforcement for perf_event programs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240206002243.1439450-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Tue, 6 Feb 2024 04:01:16 +0000 (20:01 -0800)]

Merge branch 'xsk-support-redirect-to-any-socket-bound-to-the-same-umem'

Magnus Karlsson says:

====================
xsk: support redirect to any socket bound to the same umem

This patch set adds support for directing a packet to any socket bound
to the same umem. This makes it possible to use the XDP program to
select what socket the packet should be received on. The user can
populate the XSKMAP with various sockets and as long as they share the
same umem, the XDP program can pick any one of them.

The implementation is straight-forward. Instead of testing that the
incoming packet is targeting the same device and queue id as the
socket is bound to, just check that the umem the packet was received
on is the same as the socket we want it to be received on. This
guarantees that the redirect is legal as it is already in the correct
umem.

Patch #1 implements the feature and patch #2 adds documentation.

Thanks: Magnus
====================

Link: https://lore.kernel.org/r/20240205123553.22180-1-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Magnus Karlsson [Mon, 5 Feb 2024 12:35:51 +0000 (13:35 +0100)]

xsk: document ability to redirect to any socket bound to the same umem

Document the ability to redirect to any socket bound to the same umem.

Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20240205123553.22180-3-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Magnus Karlsson [Mon, 5 Feb 2024 12:35:50 +0000 (13:35 +0100)]

xsk: support redirect to any socket bound to the same umem

Add support for directing a packet to any socket bound to the same
umem. This makes it possible to use the XDP program to select what
socket the packet should be received on. The user can populate the
XSKMAP with various sockets and as long as they share the same umem,
the XDP program can pick any one of them.

Suggested-by: Yuval El-Hanany <yuvale@radware.com>
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240205123553.22180-2-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Tue, 6 Feb 2024 04:00:14 +0000 (20:00 -0800)]

Merge branch 'transfer-rcu-lock-state-across-subprog-calls'

Kumar Kartikeya Dwivedi says:

====================
Transfer RCU lock state across subprog calls

David suggested during the discussion in [0] that we should handle RCU
locks in a similar fashion to spin locks where the verifier understands
when a lock held in a caller is released in callee, or lock taken in
callee is released in a caller, or the callee is called within a lock
critical section. This set extends the same semantics to RCU read locks
and adds a few selftests to verify correct behavior. This issue has also
come up for sched-ext programs.

This would now allow static subprog calls to be made without errors
within RCU read sections, for subprogs to release RCU locks of callers
and return to them, or for subprogs to take RCU lock which is later
released in the caller.

[0]: https://lore.kernel.org/bpf/20240204120206.796412-1-memxor@gmail.com

Changelog:
----------
v1 -> v2:
v1: https://lore.kernel.org/bpf/20240204230231.1013964-1-memxor@gmail.com

* Add tests for global subprog behaviour (Yafang)
* Add Acks, Tested-by (Yonghong, Yafang)
====================

Link: https://lore.kernel.org/r/20240205055646.1112186-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Kumar Kartikeya Dwivedi [Mon, 5 Feb 2024 05:56:46 +0000 (05:56 +0000)]

selftests/bpf: Add tests for RCU lock transfer between subprogs

Add selftests covering the following cases:
- A static or global subprog called from within a RCU read section works
- A static subprog taking an RCU read lock which is released in caller works
- A static subprog releasing the caller's RCU read lock works

Global subprogs that leave the lock in an imbalanced state will not
work, as they are verified separately, so ensure those cases fail as
well.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240205055646.1112186-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Kumar Kartikeya Dwivedi [Mon, 5 Feb 2024 05:56:45 +0000 (05:56 +0000)]

bpf: Transfer RCU lock state between subprog calls

Allow transferring an imbalanced RCU lock state between subprog calls
during verification. This allows patterns where a subprog call returns
with an RCU lock held, or a subprog call releases an RCU lock held by
the caller. Currently, the verifier would end up complaining if the RCU
lock is not released when processing an exit from a subprog, which is
non-ideal if its execution is supposed to be enclosed in an RCU read
section of the caller.

Instead, simply only check whether we are processing exit for frame#0
and do not complain on an active RCU lock otherwise. We only need to
update the check when processing BPF_EXIT insn, as copy_verifier_state
is already set up to do the right thing.

Suggested-by: David Vernet <void@manifault.com>
Tested-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240205055646.1112186-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Tue, 6 Feb 2024 03:58:47 +0000 (19:58 -0800)]

Merge branch 'enable-static-subprog-calls-in-spin-lock-critical-sections'

Kumar Kartikeya Dwivedi says:

====================
Enable static subprog calls in spin lock critical sections

This set allows a BPF program to make a call to a static subprog within
a bpf_spin_lock critical section. This problem has been hit in sched-ext
and ghOSt [0] as well, and is mostly an annoyance which is worked around
by inling the static subprog into the critical section.

In case of sched-ext, there are a lot of other helper/kfunc calls that
need to be allow listed for the support to be complete, but a separate
follow up will deal with that.

Unlike static subprogs, global subprogs cannot be allowed yet as the
verifier will not explore their body when encountering a call
instruction for them. Therefore, we would need an alternative approach
(some sort of function summarization to ensure a lock is never taken
from a global subprog and all its callees).

[0]: https://lore.kernel.org/bpf/bd173bf2-dea6-3e0e-4176-4a9256a9a056@google.com

Changelog:
----------
v1 -> v2
v1: https://lore.kernel.org/bpf/20240204120206.796412-1-memxor@gmail.com

* Indicate global function call in verifier error string (Yonghong, David)
* Add Acks from Yonghong, David
====================

Link: https://lore.kernel.org/r/20240204222349.938118-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Kumar Kartikeya Dwivedi [Sun, 4 Feb 2024 22:23:49 +0000 (22:23 +0000)]

selftests/bpf: Add test for static subprog call in lock cs

Add selftests for static subprog calls within bpf_spin_lock critical
section, and ensure we still reject global subprog calls. Also test the
case where a subprog call will unlock the caller's held lock, or the
caller will unlock a lock taken by a subprog call, ensuring correct
transfer of lock state across frames on exit.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240204222349.938118-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Kumar Kartikeya Dwivedi [Sun, 4 Feb 2024 22:23:48 +0000 (22:23 +0000)]

bpf: Allow calling static subprogs while holding a bpf_spin_lock

Currently, calling any helpers, kfuncs, or subprogs except the graph
data structure (lists, rbtrees) API kfuncs while holding a bpf_spin_lock
is not allowed. One of the original motivations of this decision was to
force the BPF programmer's hand into keeping the bpf_spin_lock critical
section small, and to ensure the execution time of the program does not
increase due to lock waiting times. In addition to this, some of the
helpers and kfuncs may be unsafe to call while holding a bpf_spin_lock.

However, when it comes to subprog calls, atleast for static subprogs,
the verifier is able to explore their instructions during verification.
Therefore, it is similar in effect to having the same code inlined into
the critical section. Hence, not allowing static subprog calls in the
bpf_spin_lock critical section is mostly an annoyance that needs to be
worked around, without providing any tangible benefit.

Unlike static subprog calls, global subprog calls are not safe to permit
within the critical section, as the verifier does not explore them
during verification, therefore whether the same lock will be taken
again, or unlocked, cannot be ascertained.

Therefore, allow calling static subprogs within a bpf_spin_lock critical
section, and only reject it in case the subprog linkage is global.

Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20240204222349.938118-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Dave Thaler [Fri, 2 Feb 2024 22:11:10 +0000 (14:11 -0800)]

bpf, docs: Expand set of initial conformance groups

This patch attempts to update the ISA specification according
to the latest mailing list discussion about conformance groups,
in a way that is intended to be consistent with IANA registry
processes and IETF 118 WG meeting discussion.

It does the following:
* Split basic into base32 and base64 for 32-bit vs 64-bit base
instructions
* Split division/multiplication/modulo instructions out of base groups
* Split atomic instructions out of base groups

There may be additional changes as discussion continues,
but there seems to be consensus on the principles above.

v1->v2: fixed typo pointed out by David Vernet

v2->v3: Moved multiplication to same groups as division/modulo

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20240202221110.3872-1-dthaler1968@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Yonghong Song [Mon, 5 Feb 2024 05:29:14 +0000 (21:29 -0800)]

selftests/bpf: Fix flaky selftest lwt_redirect/lwt_reroute

Recently, when running './test_progs -j', I occasionally hit the
following errors:

  test_lwt_redirect:PASS:pthread_create 0 nsec
  test_lwt_redirect_run:FAIL:netns_create unexpected error: 256 (errno 0)
  #142/2   lwt_redirect/lwt_redirect_normal_nomac:FAIL
  #142     lwt_redirect:FAIL
  test_lwt_reroute:PASS:pthread_create 0 nsec
  test_lwt_reroute_run:FAIL:netns_create unexpected error: 256 (errno 0)
  test_lwt_reroute:PASS:pthread_join 0 nsec
  #143/2   lwt_reroute/lwt_reroute_qdisc_dropped:FAIL
  #143     lwt_reroute:FAIL

The netns_create() definition looks like below:

  #define NETNS "ns_lwt"
  static inline int netns_create(void)
  {
        return system("ip netns add " NETNS);
  }

One possibility is that both lwt_redirect and lwt_reroute create
netns with the same name "ns_lwt" which may cause conflict. I tried
the following example:
  $ sudo ip netns add abc
  $ echo $?
  0
  $ sudo ip netns add abc
  Cannot create namespace file "/var/run/netns/abc": File exists
  $ echo $?
  1
  $

The return code for above netns_create() is 256. The internet search
suggests that the return value for 'ip netns add ns_lwt' is 1, which
matches the above 'sudo ip netns add abc' example.

This patch tried to use different netns names for two tests to avoid
'ip netns add <name>' failure.

I ran './test_progs -j' 10 times and all succeeded with
lwt_redirect/lwt_reroute tests.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240205052914.1742687-1-yonghong.song@linux.dev

commit | commitdiff | tree

Kui-Feng Lee [Sun, 4 Feb 2024 06:12:04 +0000 (22:12 -0800)]

selftests/bpf: Suppress warning message of an unused variable.

"r" is used to receive the return value of test_2 in bpf_testmod.c, but it
is not actually used. So, we remove "r" and change the return type to
"void".

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401300557.z5vzn8FM-lkp@intel.com/
Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240204061204.1864529-1-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Yonghong Song [Sun, 4 Feb 2024 19:44:52 +0000 (11:44 -0800)]

selftests/bpf: Fix flaky test ptr_untrusted

Somehow recently I frequently hit the following test failure
with either ./test_progs or ./test_progs-cpuv4:
  serial_test_ptr_untrusted:PASS:skel_open 0 nsec
  serial_test_ptr_untrusted:PASS:lsm_attach 0 nsec
  serial_test_ptr_untrusted:PASS:raw_tp_attach 0 nsec
  serial_test_ptr_untrusted:FAIL:cmp_tp_name unexpected cmp_tp_name: actual -115 != expected 0
  #182     ptr_untrusted:FAIL

Further investigation found the failure is due to
  bpf_probe_read_user_str()
where reading user-level string attr->raw_tracepoint.name
is not successfully, most likely due to the
string itself still in disk and not populated into memory yet.

One solution is do a printf() call of the string before doing bpf
syscall which will force the raw_tracepoint.name into memory.
But I think a more robust solution is to use bpf_copy_from_user()
which is used in sleepable program and can tolerate page fault,
and the fix here used the latter approach.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240204194452.2785936-1-yonghong.song@linux.dev

commit | commitdiff | tree

Kui-Feng Lee [Sat, 3 Feb 2024 05:51:19 +0000 (21:51 -0800)]

bpf: Remove an unnecessary check.

The "i" here is always equal to "btf_type_vlen(t)" since
the "for_each_member()" loop never breaks.

Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240203055119.2235598-1-thinker.li@gmail.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Sat, 3 Feb 2024 02:08:59 +0000 (18:08 -0800)]

Merge branch 'two-small-fixes-for-global-subprog-tagging'

Andrii Nakryiko says:

====================
Two small fixes for global subprog tagging

Fix a bug with passing trusted PTR_TO_BTF_ID_OR_NULL register into global
subprog that expects `__arg_trusted __arg_nullable` arguments, which was
discovered when adopting production BPF application.

Also fix annoying warnings that are irrelevant for static subprogs, which are
just an artifact of using btf_prepare_func_args() for both static and global
subprogs.
====================

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20240202190529.2374377-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Fri, 2 Feb 2024 19:05:29 +0000 (11:05 -0800)]

bpf: don't emit warnings intended for global subprogs for static subprogs

When btf_prepare_func_args() was generalized to handle both static and
global subprogs, a few warnings/errors that are meant only for global
subprog cases started to be emitted for static subprogs, where they are
sort of expected and irrelavant.

Stop polutting verifier logs with irrelevant scary-looking messages.

Fixes: e26080d0da87 ("bpf: prepare btf_prepare_func_args() for handling static subprogs")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240202190529.2374377-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Fri, 2 Feb 2024 19:05:28 +0000 (11:05 -0800)]

selftests/bpf: add more cases for __arg_trusted __arg_nullable args

Add extra layer of global functions to ensure that passing around
(trusted) PTR_TO_BTF_ID_OR_NULL registers works as expected. We also
extend trusted_task_arg_nullable subtest to check three possible valid
argumements: known NULL, known non-NULL, and maybe NULL cases.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240202190529.2374377-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Fri, 2 Feb 2024 19:05:27 +0000 (11:05 -0800)]

bpf: handle trusted PTR_TO_BTF_ID_OR_NULL in argument check logic

Add PTR_TRUSTED | PTR_MAYBE_NULL modifiers for PTR_TO_BTF_ID to
check_reg_type() to support passing trusted nullable PTR_TO_BTF_ID
registers into global functions accepting `__arg_trusted __arg_nullable`
arguments. This hasn't been caught earlier because tests were either
passing known non-NULL PTR_TO_BTF_ID registers or known NULL (SCALAR)
registers.

When utilizing this functionality in complicated real-world BPF
application that passes around PTR_TO_BTF_ID_OR_NULL, it became apparent
that verifier rejects valid case because check_reg_type() doesn't handle
this case explicitly. Existing check_reg_type() logic is already
anticipating this combination, so we just need to explicitly list this
combo in the switch statement.

Fixes: e2b3c4ff5d18 ("bpf: add __arg_trusted global func arg tag")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240202190529.2374377-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Shung-Hsi Yu [Fri, 2 Feb 2024 09:55:58 +0000 (17:55 +0800)]

selftests/bpf: trace_helpers.c: do not use poisoned type

After commit c698eaebdf47 ("selftests/bpf: trace_helpers.c: Optimize
kallsyms cache") trace_helpers.c now includes libbpf_internal.h, and
thus can no longer use the u32 type (among others) since they are poison
in libbpf_internal.h. Replace u32 with __u32 to fix the following error
when building trace_helpers.c on powerpc:

error: attempt to use poisoned "u32"

Fixes: c698eaebdf47 ("selftests/bpf: trace_helpers.c: Optimize kallsyms cache")
Signed-off-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20240202095559.12900-1-shung-hsi.yu@suse.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Fri, 2 Feb 2024 21:22:15 +0000 (13:22 -0800)]

Merge branch 'improvements-for-tracking-scalars-in-the-bpf-verifier'

Maxim Mikityanskiy says:

====================
Improvements for tracking scalars in the BPF verifier

From: Maxim Mikityanskiy <maxim@isovalent.com>

The goal of this series is to extend the verifier's capabilities of
tracking scalars when they are spilled to stack, especially when the
spill or fill is narrowing. It also contains a fix by Eduard for
infinite loop detection and a state pruning optimization by Eduard that
compensates for a verification complexity regression introduced by
tracking unbounded scalars. These improvements reduce the surface of
false rejections that I saw while working on Cilium codebase.

Patches 1-9 of the original series were previously applied in v2.

Patches 1-2 (Maxim): Support the case when boundary checks are first
performed after the register was spilled to the stack.

Patches 3-4 (Maxim): Support narrowing fills.

Patches 5-6 (Eduard): Optimization for state pruning in stacksafe() to
mitigate the verification complexity regression.

veristat -e file,prog,states -f '!states_diff<50' -f '!states_pct<10' -f '!states_a<10' -f '!states_b<10' -C ...

* Without patch 5:

File                  Program   States (A)  States (B)  States    (DIFF)
--------------------  --------  ----------  ----------  ----------------
pyperf100.bpf.o       on_event        4878        6528   +1650 (+33.83%)
pyperf180.bpf.o       on_event        6936       11032   +4096 (+59.05%)
pyperf600.bpf.o       on_event       22271       39455  +17184 (+77.16%)
pyperf600_iter.bpf.o  on_event         400         490     +90 (+22.50%)
strobemeta.bpf.o      on_event        4895       14028  +9133 (+186.58%)

* With patch 5:

File                     Program        States (A)  States (B)  States   (DIFF)
-----------------------  -------------  ----------  ----------  ---------------
bpf_xdp.o                tail_lb_ipv4         2770        2224   -546 (-19.71%)
pyperf100.bpf.o          on_event             4878        5848   +970 (+19.89%)
pyperf180.bpf.o          on_event             6936        8868  +1932 (+27.85%)
pyperf600.bpf.o          on_event            22271       29656  +7385 (+33.16%)
pyperf600_iter.bpf.o     on_event              400         450    +50 (+12.50%)
xdp_synproxy_kern.bpf.o  syncookie_tc          280         226    -54 (-19.29%)
xdp_synproxy_kern.bpf.o  syncookie_xdp         302         228    -74 (-24.50%)

v2 changes:

Fixed comments in patch 1, moved endianness checks to header files in
patch 12 where possible, added Eduard's ACKs.

v3 changes:

Maxim: Removed __is_scalar_unbounded altogether, addressed Andrii's
comments.

Eduard: Patch #5 (#14 in v2) changed significantly:
- Logical changes:
  - Handling of STACK_{MISC,ZERO} mix turned out to be incorrect:
    a mix of MISC and ZERO in old state is not equivalent to e.g.
    just MISC is current state, because verifier could have deduced
    zero scalars from ZERO slots in old state for some loads.
  - There is no reason to limit the change only to cases when
    old or current stack is a spill of unbounded scalar,
    it is valid to compare any 64-bit scalar spill with fake
    register impersonating MISC.
  - STACK_ZERO vs spilled zero case was dropped,
    after recent changes for zero handling by Andrii and Yonghong
    it is hard (impossible?) to conjure all ZERO slots for an spi.
    => the case does not make any difference in veristat results.
- Use global static variable for unbound_reg (Andrii)
- Code shuffling to remove duplication in stacksafe() (Andrii)
====================

Link: https://lore.kernel.org/r/20240127175237.526726-1-maxtram95@gmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

commit | commitdiff | tree

Eduard Zingerman [Sat, 27 Jan 2024 17:52:37 +0000 (19:52 +0200)]

selftests/bpf: States pruning checks for scalar vs STACK_MISC

Check that stacksafe() compares spilled scalars with STACK_MISC.
The following combinations are explored:
- old spill of imprecise scalar is equivalent to cur STACK_{MISC,INVALID}
(plus error in unpriv mode);
- old spill of precise scalar is not equivalent to cur STACK_MISC;
- old STACK_MISC is equivalent to cur scalar;
- old STACK_MISC is not equivalent to cur non-scalar.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127175237.526726-7-maxtram95@gmail.com

commit | commitdiff | tree

Eduard Zingerman [Sat, 27 Jan 2024 17:52:36 +0000 (19:52 +0200)]

bpf: Handle scalar spill vs all MISC in stacksafe()

When check_stack_read_fixed_off() reads value from an spi
all stack slots of which are set to STACK_{MISC,INVALID},
the destination register is set to unbound SCALAR_VALUE.

Exploit this fact by allowing stacksafe() to use a fake
unbound scalar register to compare 'mmmm mmmm' stack value
in old state vs spilled 64-bit scalar in current state
and vice versa.

Veristat results after this patch show some gains:

./veristat -C -e file,prog,states -f 'states_pct>10'  not-opt after
File                     Program                States   (DIFF)
-----------------------  ---------------------  ---------------
bpf_overlay.o            tail_rev_nodeport_lb4    -45 (-15.85%)
bpf_xdp.o                tail_lb_ipv4            -541 (-19.57%)
pyperf100.bpf.o          on_event                -680 (-10.42%)
pyperf180.bpf.o          on_event               -2164 (-19.62%)
pyperf600.bpf.o          on_event               -9799 (-24.84%)
strobemeta.bpf.o         on_event               -9157 (-65.28%)
xdp_synproxy_kern.bpf.o  syncookie_tc             -54 (-19.29%)
xdp_synproxy_kern.bpf.o  syncookie_xdp            -74 (-24.50%)

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127175237.526726-6-maxtram95@gmail.com

commit | commitdiff | tree

Maxim Mikityanskiy [Sat, 27 Jan 2024 17:52:35 +0000 (19:52 +0200)]

selftests/bpf: Add test cases for narrowing fill

The previous commit allowed to preserve boundaries and track IDs of
scalars on narrowing fills. Add test cases for that pattern.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240127175237.526726-5-maxtram95@gmail.com

commit | commitdiff | tree

Maxim Mikityanskiy [Sat, 27 Jan 2024 17:52:34 +0000 (19:52 +0200)]

bpf: Preserve boundaries and track scalars on narrowing fill

When the width of a fill is smaller than the width of the preceding
spill, the information about scalar boundaries can still be preserved,
as long as it's coerced to the right width (done by coerce_reg_to_size).
Even further, if the actual value fits into the fill width, the ID can
be preserved as well for further tracking of equal scalars.

Implement the above improvements, which makes narrowing fills behave the
same as narrowing spills and MOVs between registers.

Two tests are adjusted to accommodate for endianness differences and to
take into account that it's now allowed to do a narrowing fill from the
least significant bits.

reg_bounds_sync is added to coerce_reg_to_size to correctly adjust
umin/umax boundaries after the var_off truncation, for example, a 64-bit
value 0xXXXXXXXX00000000, when read as a 32-bit, gets umin = 0, umax =
0xFFFFFFFF, var_off = (0x0; 0xffffffff00000000), which needs to be
synced down to umax = 0, otherwise reg_bounds_sanity_check doesn't pass.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127175237.526726-4-maxtram95@gmail.com

commit | commitdiff | tree

Maxim Mikityanskiy [Sat, 27 Jan 2024 17:52:33 +0000 (19:52 +0200)]

selftests/bpf: Test tracking spilled unbounded scalars

The previous commit added tracking for unbounded scalars on spill. Add
the test case to check the new functionality.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240127175237.526726-3-maxtram95@gmail.com

commit | commitdiff | tree

Maxim Mikityanskiy [Sat, 27 Jan 2024 17:52:32 +0000 (19:52 +0200)]

bpf: Track spilled unbounded scalars

Support the pattern where an unbounded scalar is spilled to the stack,
then boundary checks are performed on the src register, after which the
stack frame slot is refilled into a register.

Before this commit, the verifier didn't treat the src register and the
stack slot as related if the src register was an unbounded scalar. The
register state wasn't copied, the id wasn't preserved, and the stack
slot was marked as STACK_MISC. Subsequent boundary checks on the src
register wouldn't result in updating the boundaries of the spilled
variable on the stack.

After this commit, the verifier will preserve the bond between src and
dst even if src is unbounded, which permits to do boundary checks on src
and refill dst later, still remembering its boundaries. Such a pattern
is sometimes generated by clang when compiling complex long functions.

One test is adjusted to reflect that now unbounded scalars are tracked.

Signed-off-by: Maxim Mikityanskiy <maxim@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240127175237.526726-2-maxtram95@gmail.com

commit | commitdiff | tree

Andrii Nakryiko [Thu, 1 Feb 2024 17:20:27 +0000 (09:20 -0800)]

selftests/bpf: Fix bench runner SIGSEGV

Some benchmarks don't have either "consumer" or "producer" sides. For
example, trig-tp and other BPF triggering benchmarks don't have
consumers, as they only do "producing" by calling into syscall or
predefined uproes. As such it's valid for some benchmarks to have zero
consumers or producers. So allows to specify `-c0` explicitly.

This triggers another problem. If benchmark doesn't support either
consumer or producer side, consumer_thread/producer_thread callback will
be NULL, but benchmark runner will attempt to use those NULL callback to
create threads anyways. So instead of crashing with SIGSEGV in case of
misconfigured benchmark, detect the condition and report error.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-6-andrii@kernel.org

commit | commitdiff | tree

Andrii Nakryiko [Thu, 1 Feb 2024 17:20:26 +0000 (09:20 -0800)]

libbpf: Add missed btf_ext__raw_data() API

Another API that was declared in libbpf.map but actual implementation
was missing. btf_ext__get_raw_data() was intended as a discouraged alias
to consistently-named btf_ext__raw_data(), so make this an actuality.

Fixes: 20eccf29e297 ("libbpf: hide and discourage inconsistently named getters")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-5-andrii@kernel.org

commit | commitdiff | tree

Andrii Nakryiko [Thu, 1 Feb 2024 17:20:25 +0000 (09:20 -0800)]

libbpf: Add btf__new_split() API that was declared but not implemented

Seems like original commit adding split BTF support intended to add
btf__new_split() API, and even declared it in libbpf.map, but never
added (trivial) implementation. Fix this.

Fixes: ba451366bf44 ("libbpf: Implement basic split BTF support")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-4-andrii@kernel.org

commit | commitdiff | tree

Andrii Nakryiko [Thu, 1 Feb 2024 17:20:24 +0000 (09:20 -0800)]

libbpf: Add missing LIBBPF_API annotation to libbpf_set_memlock_rlim API

LIBBPF_API annotation seems missing on libbpf_set_memlock_rlim API, so
add it to make this API callable from libbpf's shared library version.

Fixes: e542f2c4cd16 ("libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF")
Fixes: ab9a5a05dc48 ("libbpf: fix up few libbpf.map problems")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-3-andrii@kernel.org

commit | commitdiff | tree

Andrii Nakryiko [Thu, 1 Feb 2024 17:20:23 +0000 (09:20 -0800)]

libbpf: Call memfd_create() syscall directly

Some versions of Android do not implement memfd_create() wrapper in
their libc implementation, leading to build failures ([0]). On the other
hand, memfd_create() is available as a syscall on quite old kernels
(3.17+, while bpf() syscall itself is available since 3.18+), so it is
ok to assume that syscall availability and call into it with syscall()
helper to avoid Android-specific workarounds.

Validated in libbpf-bootstrap's CI ([1]).

[0] https://github.com/libbpf/libbpf-bootstrap/actions/runs/7701003207/job/20986080319#step:5:83
[1] https://github.com/libbpf/libbpf-bootstrap/actions/runs/7715988887/job/21031767212?pr=253

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20240201172027.604869-2-andrii@kernel.org

commit | commitdiff | tree

Matt Bobrowski [Thu, 1 Feb 2024 10:43:40 +0000 (10:43 +0000)]

bpf: Minor clean-up to sleepable_lsm_hooks BTF set

There's already one main CONFIG_SECURITY_NETWORK ifdef block within
the sleepable_lsm_hooks BTF set. Consolidate this duplicated ifdef
block as there's no need for it and all things guarded by it should
remain in one place in this specific context.

Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/Zbt1smz43GDMbVU3@google.com

commit | commitdiff | tree

Pu Lehui [Tue, 30 Jan 2024 12:46:59 +0000 (12:46 +0000)]

selftests/bpf: Enable inline bpf_kptr_xchg() test for RV64

Enable inline bpf_kptr_xchg() test for RV64, and the test have passed as
show below:

Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240130124659.670321-3-pulehui@huaweicloud.com

commit | commitdiff | tree

Pu Lehui [Tue, 30 Jan 2024 12:46:58 +0000 (12:46 +0000)]

riscv, bpf: Enable inline bpf_kptr_xchg() for RV64

RV64 JIT supports 64-bit BPF_XCHG atomic instructions. At the same time,
the underlying implementation of xchg() and atomic64_xchg() in RV64 both
are raw_xchg() that supported 64-bit. Therefore inline bpf_kptr_xchg()
will have equivalent semantics. Let's inline it for better performance.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240130124659.670321-2-pulehui@huaweicloud.com

commit | commitdiff | tree

Dave Thaler [Wed, 31 Jan 2024 03:37:59 +0000 (19:37 -0800)]

bpf, docs: Clarify which legacy packet instructions existed

As discussed on the BPF IETF mailing list (see link), this patch updates
the "Legacy BPF Packet access instructions" section to clarify which
instructions are deprecated (vs which were never defined and so are not
deprecated).

Signed-off-by: Dave Thaler <dthaler1968@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: David Vernet <void@manifault.com>
Link: https://mailarchive.ietf.org/arch/msg/bpf/5LnnKm093cGpOmDI9TnLQLBXyys
Link: https://lore.kernel.org/bpf/20240131033759.3634-1-dthaler1968@gmail.com

commit | commitdiff | tree

Eduard Zingerman [Wed, 31 Jan 2024 21:26:15 +0000 (23:26 +0200)]

libbpf: Remove unnecessary null check in kernel_supports()

After recent changes, Coverity complained about inconsistent null checks
in kernel_supports() function:

    kernel_supports(const struct bpf_object *obj, ...)
    [...]
    // var_compare_op: Comparing obj to null implies that obj might be null
    if (obj && obj->gen_loader)
        return true;

    // var_deref_op: Dereferencing null pointer obj
    if (obj->token_fd)
        return feat_supported(obj->feat_cache, feat_id);
    [...]

- The original null check was introduced by commit [0], which introduced
  a call `kernel_supports(NULL, ...)` in function bump_rlimit_memlock();
- This call was refactored to use `feat_supported(NULL, ...)` in commit [1].

Looking at all places where kernel_supports() is called:

- There is either `obj->...` access before the call;
- Or `obj` comes from `prog->obj` expression, where `prog` comes from
  enumeration of programs in `obj`;
- Or `obj` comes from `prog->obj`, where `prog` is a parameter to one
  of the API functions:
  - bpf_program__attach_kprobe_opts;
  - bpf_program__attach_kprobe;
  - bpf_program__attach_ksyscall.

Assuming correct API usage, it appears that `obj` can never be null when
passed to kernel_supports(). Silence the Coverity warning by removing
redundant null check.

  [0] e542f2c4cd16 ("libbpf: Auto-bump RLIMIT_MEMLOCK if kernel needs it for BPF")
  [1] d6dd1d49367a ("libbpf: Further decouple feature checking logic from bpf_object")

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240131212615.20112-1-eddyz87@gmail.com

commit | commitdiff | tree

Alexei Starovoitov [Wed, 31 Jan 2024 20:05:24 +0000 (12:05 -0800)]

Merge branch 'annotate-kfuncs-in-btf_ids-section'

Daniel Xu says:

====================
Annotate kfuncs in .BTF_ids section

=== Description ===

This is a bpf-treewide change that annotates all kfuncs as such inside
.BTF_ids. This annotation eventually allows us to automatically generate
kfunc prototypes from bpftool.

We store this metadata inside a yet-unused flags field inside struct
btf_id_set8 (thanks Kumar!). pahole will be taught where to look.

More details about the full chain of events are available in commit 3's
description.

The accompanying pahole and bpftool changes can be viewed
here on these "frozen" branches [0][1].

[0]: https://github.com/danobi/pahole/tree/kfunc_btf-v3-mailed
[1]: https://github.com/danobi/linux/tree/kfunc_bpftool-mailed

=== Changelog ===

Changes from v3:
* Rebase to bpf-next and add missing annotation on new kfunc

Changes from v2:
* Only WARN() for vmlinux kfuncs

Changes from v1:
* Move WARN_ON() up a call level
* Also return error when kfunc set is not properly tagged
* Use BTF_KFUNCS_START/END instead of flags
* Rename BTF_SET8_KFUNC to BTF_SET8_KFUNCS
====================

Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/cover.1706491398.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Xu [Mon, 29 Jan 2024 01:24:08 +0000 (18:24 -0700)]

bpf: treewide: Annotate BPF kfuncs in BTF

This commit marks kfuncs as such inside the .BTF_ids section. The upshot
of these annotations is that we'll be able to automatically generate
kfunc prototypes for downstream users. The process is as follows:

1. In source, use BTF_KFUNCS_START/END macro pair to mark kfuncs
2. During build, pahole injects into BTF a "bpf_kfunc" BTF_DECL_TAG for
each function inside BTF_KFUNCS sets
3. At runtime, vmlinux or module BTF is made available in sysfs
4. At runtime, bpftool (or similar) can look at provided BTF and
generate appropriate prototypes for functions with "bpf_kfunc" tag

To ensure future kfunc are similarly tagged, we now also return error
inside kfunc registration for untagged kfuncs. For vmlinux kfuncs,
we also WARN(), as initcall machinery does not handle errors.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Acked-by: Benjamin Tissoires <bentiss@kernel.org>
Link: https://lore.kernel.org/r/e55150ceecbf0a5d961e608941165c0bee7bc943.1706491398.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Xu [Mon, 29 Jan 2024 01:24:07 +0000 (18:24 -0700)]

bpf: btf: Add BTF_KFUNCS_START/END macro pair

This macro pair is functionally equivalent to BTF_SET8_START/END, except
with BTF_SET8_KFUNCS flag set in the btf_id_set8 flags field. The next
commit will codemod all kfunc set8s to this new variant such that all
kfuncs are tagged as such in .BTF_ids section.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/d536c57c7c2af428686853cc7396b7a44faa53b7.1706491398.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Xu [Mon, 29 Jan 2024 01:24:06 +0000 (18:24 -0700)]

bpf: btf: Support flags for BTF_SET8 sets

This commit adds support for flags on BTF_SET8s. struct btf_id_set8
already supported 32 bits worth of flags, but was only used for
alignment purposes before.

We now use these bits to encode flags. The first use case is tagging
kfunc sets with a flag so that pahole can recognize which
BTF_ID_FLAGS(func, ..) are actual kfuncs.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/7bb152ec76d6c2c930daec88e995bf18484a5ebb.1706491398.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Manu Bretelle [Wed, 31 Jan 2024 05:32:12 +0000 (21:32 -0800)]

selftests/bpf: Disable IPv6 for lwt_redirect test

After a recent change in the vmtest runner, this test started failing
sporadically.

Investigation showed that this test was subject to race condition which
got exacerbated after the vm runner change. The symptoms being that the
logic that waited for an ICMPv4 packet is naive and will break if 5 or
more non-ICMPv4 packets make it to tap0.
When ICMPv6 is enabled, the kernel will generate traffic such as ICMPv6
router solicitation...
On a system with good performance, the expected ICMPv4 packet would very
likely make it to the network interface promptly, but on a system with
poor performance, those "guarantees" do not hold true anymore.

Given that the test is IPv4 only, this change disable IPv6 in the test
netns by setting `net.ipv6.conf.all.disable_ipv6` to 1.
This essentially leaves "ping" as the sole generator of traffic in the
network namespace.
If this test was to be made IPv6 compatible, the logic in
`wait_for_packet` would need to be modified.

In more details...

At a high level, the test does:
- create a new namespace
- in `setup_redirect_target` set up lo, tap0, and link_err interfaces as
  well as add 2 routes that attaches ingress/egress sections of
  `test_lwt_redirect.bpf.o` to the xmit path.
- in `send_and_capture_test_packets` send an ICMP packet and read off
  the tap interface (using `wait_for_packet`) to check that a ICMP packet
  with the right size is read.

`wait_for_packet` will try to read `max_retry` (5) times from the tap0
fd looking for an ICMPv4 packet matching some criteria.

The problem is that when we set up the `tap0` interface, because IPv6 is
enabled by default, traffic such as Router solicitation is sent through
tap0, as in:

  # tcpdump -r /tmp/lwt_redirect.pc
  reading from file /tmp/lwt_redirect.pcap, link-type EN10MB (Ethernet)
  04:46:23.578352 IP6 :: > ff02::1:ffc0:4427: ICMP6, neighbor solicitation, who has fe80::fcba:dff:fec0:4427, length 32
  04:46:23.659522 IP6 :: > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
  04:46:24.389169 IP 10.0.0.1 > 20.0.0.9: ICMP echo request, id 122, seq 1, length 108
  04:46:24.618599 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
  04:46:24.619985 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16
  04:46:24.767326 IP6 fe80::fcba:dff:fec0:4427 > ff02::16: HBH ICMP6, multicast listener report v2, 1 group record(s), length 28
  04:46:28.936402 IP6 fe80::fcba:dff:fec0:4427 > ff02::2: ICMP6, router solicitation, length 16

If `wait_for_packet` sees 5 non-ICMPv4 packets, it will return 0, which is what we see in:

  2024-01-31T03:51:25.0336992Z test_lwt_redirect_run:PASS:netns_create 0 nsec
  2024-01-31T03:51:25.0341309Z open_netns:PASS:malloc token 0 nsec
  2024-01-31T03:51:25.0344844Z open_netns:PASS:open /proc/self/ns/net 0 nsec
  2024-01-31T03:51:25.0350071Z open_netns:PASS:open netns fd 0 nsec
  2024-01-31T03:51:25.0353516Z open_netns:PASS:setns 0 nsec
  2024-01-31T03:51:25.0356560Z test_lwt_redirect_run:PASS:setns 0 nsec
  2024-01-31T03:51:25.0360140Z open_tuntap:PASS:open(/dev/net/tun) 0 nsec
  2024-01-31T03:51:25.0363822Z open_tuntap:PASS:ioctl(TUNSETIFF) 0 nsec
  2024-01-31T03:51:25.0367402Z open_tuntap:PASS:fcntl(O_NONBLOCK) 0 nsec
  2024-01-31T03:51:25.0371167Z setup_redirect_target:PASS:open_tuntap 0 nsec
  2024-01-31T03:51:25.0375180Z setup_redirect_target:PASS:if_nametoindex 0 nsec
  2024-01-31T03:51:25.0379929Z setup_redirect_target:PASS:ip link add link_err type dummy 0 nsec
  2024-01-31T03:51:25.0384874Z setup_redirect_target:PASS:ip link set lo up 0 nsec
  2024-01-31T03:51:25.0389678Z setup_redirect_target:PASS:ip addr add dev lo 10.0.0.1/32 0 nsec
  2024-01-31T03:51:25.0394814Z setup_redirect_target:PASS:ip link set link_err up 0 nsec
  2024-01-31T03:51:25.0399874Z setup_redirect_target:PASS:ip link set tap0 up 0 nsec
  2024-01-31T03:51:25.0407731Z setup_redirect_target:PASS:ip route add 10.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_ingress 0 nsec
  2024-01-31T03:51:25.0419105Z setup_redirect_target:PASS:ip route add 20.0.0.0/24 dev link_err encap bpf xmit obj test_lwt_redirect.bpf.o sec redir_egress 0 nsec
  2024-01-31T03:51:25.0427209Z test_lwt_redirect_normal:PASS:setup_redirect_target 0 nsec
  2024-01-31T03:51:25.0431424Z ping_dev:PASS:if_nametoindex 0 nsec
  2024-01-31T03:51:25.0437222Z send_and_capture_test_packets:FAIL:wait_for_epacket unexpected wait_for_epacket: actual 0 != expected 1
  2024-01-31T03:51:25.0448298Z (/tmp/work/bpf/bpf/tools/testing/selftests/bpf/prog_tests/lwt_redirect.c:175: errno: Success) test_lwt_redirect_normal egress test fails
  2024-01-31T03:51:25.0457124Z close_netns:PASS:setns 0 nsec

When running in a VM which potential resource contrains, the odds that calling
`ping` is not scheduled very soon after bringing `tap0` up increases,
and with this the chances to get our ICMP packet pushed to position 6+
in the network trace.

To confirm this indeed solves the issue, I ran the test 100 times in a
row with:

  errors=0
  successes=0
  for i in `seq 1 100`
  do
    ./test_progs -t lwt_redirect/lwt_redirect_normal
    if [ $? -eq 0 ]; then
      successes=$((successes+1))
    else
      errors=$((errors+1))
    fi
  done
  echo "successes: $successes/errors: $errors"

While this test would at least fail a couple of time every 10 runs, here
it ran 100 times with no error.

Fixes: 43a7c3ef8a15 ("selftests/bpf: Add lwt_xmit tests for BPF_REDIRECT")
Signed-off-by: Manu Bretelle <chantr4@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240131053212.2247527-1-chantr4@gmail.com

commit | commitdiff | tree

Martin KaFai Lau [Tue, 30 Jan 2024 23:55:50 +0000 (15:55 -0800)]

Merge branch 'libbpf: add bpf_core_cast() helper'

Andrii Nakryiko says:

====================
Add bpf_core_cast(<ptr>, <type>) macro wrapper around bpf_rdonly_cast() kfunc
to make it easier to use this functionality in BPF code. See patch #2 for
BPF selftests conversions demonstrating improvements in code succinctness.
====================

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Tue, 30 Jan 2024 21:20:23 +0000 (13:20 -0800)]

selftests/bpf: convert bpf_rdonly_cast() uses to bpf_core_cast() macro

Use more ergonomic bpf_core_cast() macro instead of bpf_rdonly_cast() in
selftests code.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130212023.183765-3-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Tue, 30 Jan 2024 21:20:22 +0000 (13:20 -0800)]

libbpf: add bpf_core_cast() macro

Add bpf_core_cast() macro that wraps bpf_rdonly_cast() kfunc. It's more
ergonomic than kfunc, as it automatically extracts btf_id with
bpf_core_type_id_kernel(), and works with type names. It also casts result
to (T *) pointer. See the definition of the macro, it's self-explanatory.

libbpf declares bpf_rdonly_cast() extern as __weak __ksym and should be
safe to not conflict with other possible declarations in user code.

But we do have a conflict with current BPF selftests that declare their
externs with first argument as `void *obj`, while libbpf opts into more
permissive `const void *obj`. This causes conflict, so we fix up BPF
selftests uses in the same patch.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130212023.183765-2-andrii@kernel.org
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Tue, 30 Jan 2024 17:41:51 +0000 (09:41 -0800)]

Merge branch 'trusted-ptr_to_btf_id-arg-support-in-global-subprogs'

Andrii Nakryiko says:

====================
Trusted PTR_TO_BTF_ID arg support in global subprogs

This patch set follows recent changes that added btf_decl_tag-based argument
annotation support for global subprogs. This time we add ability to pass
PTR_TO_BTF_ID (BTF-aware kernel pointers) arguments into global subprograms.
We support explicitly trusted arguments only, for now.

Patch #1 adds logic for arg:trusted tag support on the verifier side. Default
semantic of such arguments is non-NULL, enforced on caller side. But patch #2
adds arg:nullable tag that can be combined with arg:trusted to make callee
explicitly do the NULL check, which helps implement "optional" PTR_TO_BTF_ID
arguments.

Patch #3 adds libbpf-side __arg_trusted and __arg_nullable macros.

Patch #4 adds a bunch of tests validating __arg_trusted in combination with
__arg_nullable.

v2->v3:
  - went back to arg:nullable and __arg_nullable naming;
  - rebased on latest bpf-next after prepartory patches landed;
v1->v2:
  - added fix up to type enforcement changes, landed earlier;
  - dropped bpf_core_cast() changes, will post them separately, as they now
    are not used in added tests;
  - dropped arg:untrusted support (Alexei);
  - renamed arg:nullable to arg:maybe_null (Alexei);
  - and also added task_struct___local flavor tests (Alexei).
====================

Link: https://lore.kernel.org/r/20240130000648.2144827-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Tue, 30 Jan 2024 00:06:48 +0000 (16:06 -0800)]

selftests/bpf: add trusted global subprog arg tests

Add a bunch of test cases validating behavior of __arg_trusted and its
combination with __arg_nullable tag. We also validate CO-RE flavor
support by kernel for __arg_trusted args.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130000648.2144827-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Tue, 30 Jan 2024 00:06:47 +0000 (16:06 -0800)]

libbpf: add __arg_trusted and __arg_nullable tag macros

Add __arg_trusted to annotate global func args that accept trusted
PTR_TO_BTF_ID arguments.

Also add __arg_nullable to combine with __arg_trusted (and maybe other
tags in the future) to force global subprog itself (i.e., callee) to do
NULL checks, as opposed to default non-NULL semantics (and thus caller's
responsibility to ensure non-NULL values).

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130000648.2144827-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Tue, 30 Jan 2024 00:06:46 +0000 (16:06 -0800)]

bpf: add arg:nullable tag to be combined with trusted pointers

Add ability to mark arg:trusted arguments with optional arg:nullable
tag to mark it as PTR_TO_BTF_ID_OR_NULL variant, which will allow
callers to pass NULL, and subsequently will force global subprog's code
to do NULL check. This allows to have "optional" PTR_TO_BTF_ID values
passed into global subprogs.

For now arg:nullable cannot be combined with anything else.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130000648.2144827-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Tue, 30 Jan 2024 00:06:45 +0000 (16:06 -0800)]

bpf: add __arg_trusted global func arg tag

Add support for passing PTR_TO_BTF_ID registers to global subprogs.
Currently only PTR_TRUSTED flavor of PTR_TO_BTF_ID is supported.
Non-NULL semantics is assumed, so caller will be forced to prove
PTR_TO_BTF_ID can't be NULL.

Note, we disallow global subprogs to destroy passed in PTR_TO_BTF_ID
arguments, even the trusted one. We achieve that by not setting
ref_obj_id when validating subprog code. This basically enforces (in
Rust terms) borrowing semantics vs move semantics. Borrowing semantics
seems to be a better fit for isolated global subprog validation
approach.

Implementation-wise, we utilize existing logic for matching
user-provided BTF type to kernel-side BTF type, used by BPF CO-RE logic
and following same matching rules. We enforce a unique match for types.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240130000648.2144827-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Jose E. Marchesi [Tue, 30 Jan 2024 11:36:24 +0000 (12:36 +0100)]

bpf: Move -Wno-compare-distinct-pointer-types to BPF_CFLAGS

Clang supports enabling/disabling certain conversion diagnostics via
the -W[no-]compare-distinct-pointer-types command line options.
Disabling this warning is required by some BPF selftests due to
-Werror. Until very recently GCC would emit these warnings
unconditionally, which was a problem for gcc-bpf, but we added support
for the command-line options to GCC upstream [1].

This patch moves the -Wno-cmopare-distinct-pointer-types from
CLANG_CFLAGS to BPF_CFLAGS in selftests/bpf/Makefile so the option
is also used in gcc-bpf builds, not just in clang builds.

Tested in bpf-next master.
No regressions.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2023-August/627769.html

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240130113624.24940-1-jose.marchesi@oracle.com

commit | commitdiff | tree

Jose E. Marchesi [Tue, 30 Jan 2024 11:03:43 +0000 (12:03 +0100)]

bpf: Build type-punning BPF selftests with -fno-strict-aliasing

A few BPF selftests perform type punning and they may break strict
aliasing rules, which are exploited by both GCC and clang by default
while optimizing. This can lead to broken compiled programs.

This patch disables strict aliasing for these particular tests, by
mean of the -fno-strict-aliasing command line option. This will make
sure these tests are optimized properly even if some strict aliasing
rule gets violated.

After this patch, GCC is able to build all the selftests without
warning about potential strict aliasing issue.

bpf@vger discussion on strict aliasing and BPF selftests:
https://lore.kernel.org/bpf/bae1205a-b6e5-4e46-8e20-520d7c327f7a@linux.dev/T/#t

Tested in bpf-next master.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/bae1205a-b6e5-4e46-8e20-520d7c327f7a@linux.dev
Link: https://lore.kernel.org/bpf/20240130110343.11217-1-jose.marchesi@oracle.com

commit | commitdiff | tree

Haiyue Wang [Sat, 27 Jan 2024 13:48:56 +0000 (21:48 +0800)]

bpf,token: Use BIT_ULL() to convert the bit mask

Replace the '(1ULL << *)' with the macro BIT_ULL(nr).

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240127134901.3698613-1-haiyue.wang@intel.com

commit | commitdiff | tree

Jose E. Marchesi [Sat, 27 Jan 2024 18:50:31 +0000 (19:50 +0100)]

bpf: Generate const static pointers for kernel helpers

The generated bpf_helper_defs.h file currently contains definitions
like this for the kernel helpers, which are static objects:

  static void *(*bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1;

These work well in both clang and GCC because both compilers do
constant propagation with -O1 and higher optimization, resulting in
`call 1' BPF instructions being generated, which are calls to kernel
helpers.

However, there is a discrepancy on how the -Wunused-variable
warning (activated by -Wall) is handled in these compilers:

- clang will not emit -Wunused-variable warnings for static variables
  defined in C header files, be them constant or not constant.

- GCC will not emit -Wunused-variable warnings for _constant_ static
  variables defined in header files, but it will emit warnings for
  non-constant static variables defined in header files.

There is no reason for these bpf_helpers_def.h pointers to not be
declared constant, and it is actually desirable to do so, since their
values are not to be changed.  So this patch modifies bpf_doc.py to
generate prototypes like:

  static void *(* const bpf_map_lookup_elem)(void *map, const void *key) = (void *) 1;

This allows GCC to not error while compiling BPF programs with `-Wall
-Werror', while still being able to detect and error on legitimate
unused variables in the program themselves.

This change doesn't impact the desired constant propagation in neither
Clang nor GCC with -O1 and higher.  On the contrary, being declared as
constant may increase the odds they get constant folded when
used/referred to in certain circumstances.

Tested in bpf-next master.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20240127185031.29854-1-jose.marchesi@oracle.com

commit | commitdiff | tree

Ian Rogers [Thu, 25 Jan 2024 23:18:40 +0000 (15:18 -0800)]

libbpf: Add some details for BTF parsing failures

As CONFIG_DEBUG_INFO_BTF is default off the existing "failed to find
valid kernel BTF" message makes diagnosing the kernel build issue somewhat
cryptic. Add a little more detail with the hope of helping users.

Before:
```
libbpf: failed to find valid kernel BTF
libbpf: Error loading vmlinux BTF: -3
```

After not accessible:
```
libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled?
libbpf: failed to find valid kernel BTF
libbpf: Error loading vmlinux BTF: -3
```

After not readable:
```
libbpf: failed to read kernel BTF from (/sys/kernel/btf/vmlinux): -1
```

Closes: https://lore.kernel.org/bpf/CAP-5=fU+DN_+Y=Y4gtELUsJxKNDDCOvJzPHvjUVaUoeFAzNnig@mail.gmail.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20240125231840.1647951-1-irogers@google.com

commit | commitdiff | tree

Florian Lehner [Sat, 20 Jan 2024 15:09:20 +0000 (16:09 +0100)]

perf/bpf: Fix duplicate type check

Remove the duplicate check on type and unify result.

Signed-off-by: Florian Lehner <dev@der-flo.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20240120150920.3370-1-dev@der-flo.net

commit | commitdiff | tree

Jose E. Marchesi [Sat, 27 Jan 2024 10:07:02 +0000 (11:07 +0100)]

bpf: Use -Wno-error in certain tests when building with GCC

Certain BPF selftests contain code that, albeit being legal C, trigger
warnings in GCC that cannot be disabled.  This is the case for example
for the tests

  progs/btf_dump_test_case_bitfields.c
  progs/btf_dump_test_case_namespacing.c
  progs/btf_dump_test_case_packing.c
  progs/btf_dump_test_case_padding.c
  progs/btf_dump_test_case_syntax.c

which contain struct type declarations inside function parameter
lists.  This is problematic, because:

- The BPF selftests are built with -Werror.

- The Clang and GCC compilers sometimes differ when it comes to handle
  warnings.  in the handling of warnings.  One compiler may emit
  warnings for code that the other compiles compiles silently, and one
  compiler may offer the possibility to disable certain warnings, while
  the other doesn't.

In order to overcome this problem, this patch modifies the
tools/testing/selftests/bpf/Makefile in order to:

1. Enable the possibility of specifing per-source-file extra CFLAGS.
   This is done by defining a make variable like:

   <source-filename>-CFLAGS := <whateverflags>

   And then modifying the proper Make rule in order to use these flags
   when compiling <source-filename>.

2. Use the mechanism above to add -Wno-error to CFLAGS for the
   following selftests:

   progs/btf_dump_test_case_bitfields.c
   progs/btf_dump_test_case_namespacing.c
   progs/btf_dump_test_case_packing.c
   progs/btf_dump_test_case_padding.c
   progs/btf_dump_test_case_syntax.c

   Note the corresponding -CFLAGS variables for these files are
   defined only if the selftests are being built with GCC.

Note that, while compiler pragmas can generally be used to disable
particular warnings per file, this 1) is only possible for warning
that actually can be disabled in the command line, i.e. that have
-Wno-FOO options, and 2) doesn't apply to -Wno-error.

Tested in bpf-next master branch.
No regressions.

Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240127100702.21549-1-jose.marchesi@oracle.com

commit | commitdiff | tree

Martin KaFai Lau [Sat, 27 Jan 2024 02:50:17 +0000 (18:50 -0800)]

selftests/bpf: Remove "&>" usage in the selftests

In s390, CI reported that the sock_iter_batch selftest
hits this error very often:

2024-01-26T16:56:49.3091804Z Bind /proc/self/ns/net -> /run/netns/sock_iter_batch_netns failed: No such file or directory
2024-01-26T16:56:49.3149524Z Cannot remove namespace file "/run/netns/sock_iter_batch_netns": No such file or directory
2024-01-26T16:56:49.3772213Z test_sock_iter_batch:FAIL:ip netns add sock_iter_batch_netns unexpected error: 256 (errno 0)

It happens very often in s390 but Manu also noticed it happens very
sparsely in other arch also.

It turns out the default dash shell does not recognize "&>"
as a redirection operator, so the command went to the background.
In the sock_iter_batch selftest, the "ip netns delete" went
into background and then race with the following "ip netns add"
command.

This patch replaces the "&> /dev/null" usage with ">/dev/null 2>&1"
and does this redirection in the SYS_NOFAIL macro instead of doing
it individually by its caller. The SYS_NOFAIL callers do not care
about failure, so it is no harm to do this redirection even if
some of the existing callers do not redirect to /dev/null now.

It touches different test files, so I skipped the Fixes tags
in this patch. Some of the changed tests do not use "&>"
but they use the SYS_NOFAIL, so these tests are also
changed to avoid doing its own redirection because
SYS_NOFAIL does it internally now.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240127025017.950825-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Thu, 25 Jan 2024 20:55:06 +0000 (12:55 -0800)]

bpf: move arg:ctx type enforcement check inside the main logic loop

Now that bpf and bpf-next trees converged and we don't run the risk of
merge conflicts, move btf_validate_prog_ctx_type() into its most logical
place inside the main logic loop.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240125205510.3642094-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Thu, 25 Jan 2024 20:55:05 +0000 (12:55 -0800)]

libbpf: fix __arg_ctx type enforcement for perf_event programs

Adjust PERF_EVENT type enforcement around __arg_ctx to match exactly
what kernel is doing.

Fixes: 76ec90a996e3 ("libbpf: warn on unexpected __arg_ctx type when rewriting BTF")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240125205510.3642094-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Thu, 25 Jan 2024 20:55:04 +0000 (12:55 -0800)]

libbpf: integrate __arg_ctx feature detector into kernel_supports()

Now that feature detection code is in bpf-next tree, integrate __arg_ctx
kernel-side support into kernel_supports() framework.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240125205510.3642094-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Yonghong Song [Sat, 27 Jan 2024 19:46:29 +0000 (11:46 -0800)]

docs/bpf: Improve documentation of 64-bit immediate instructions

For 64-bit immediate instruction, 'BPF_IMM | BPF_DW | BPF_LD' and
src_reg=[0-6], the current documentation describes the 64-bit
immediate is constructed by:

imm64 = (next_imm << 32) | imm

But actually imm64 is only used when src_reg=0. For all other
variants (src_reg != 0), 'imm' and 'next_imm' have separate special
encoding requirement and imm64 cannot be easily used to describe
instruction semantics.

This patch clarifies that 64-bit immediate instructions use
two 32-bit immediate values instead of a 64-bit immediate value,
so later describing individual 64-bit immediate instructions
becomes less confusing.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Dave Thaler <dthaler1968@gmail.com>
Link: https://lore.kernel.org/bpf/20240127194629.737589-1-yonghong.song@linux.dev

commit | commitdiff | tree

Menglong Dong [Sun, 28 Jan 2024 05:54:43 +0000 (13:54 +0800)]

bpf: Remove unused field "mod" in struct bpf_trampoline

It seems that the field "mod" in struct bpf_trampoline is not used
anywhere after the commit 31bf1dbccfb0 ("bpf: Fix attaching
fentry/fexit/fmod_ret/lsm to modules"). So we can just remove it now.

Fixes: 31bf1dbccfb0 ("bpf: Fix attaching fentry/fexit/fmod_ret/lsm to modules")
Signed-off-by: Menglong Dong <dongmenglong.8@bytedance.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240128055443.413291-1-dongmenglong.8@bytedance.com

commit | commitdiff | tree

Geliang Tang [Sun, 28 Jan 2024 11:43:57 +0000 (19:43 +0800)]

selftests/bpf: Drop return in bpf_testmod_exit

bpf_testmod_exit() does not need to have a return value (given the void),
so this patch drops this useless 'return' in it.

Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/5765b287ea088f0c820f2a834faf9b20fb2f8215.1706442113.git.tanggeliang@kylinos.cn

commit | commitdiff | tree

Pu Lehui [Mon, 15 Jan 2024 13:12:35 +0000 (13:12 +0000)]

riscv, bpf: Optimize bswap insns with Zbb support

Optimize bswap instructions by rev8 Zbb instruction conbined with srli
instruction. And Optimize 16-bit zero-extension with Zbb support.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240115131235.2914289-7-pulehui@huaweicloud.com

commit | commitdiff | tree

Pu Lehui [Mon, 15 Jan 2024 13:12:34 +0000 (13:12 +0000)]

riscv, bpf: Optimize sign-extention mov insns with Zbb support

Add 8-bit and 16-bit sign-extention wraper with Zbb support to optimize
sign-extension mov instructions.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240115131235.2914289-6-pulehui@huaweicloud.com

commit | commitdiff | tree

Pu Lehui [Mon, 15 Jan 2024 13:12:33 +0000 (13:12 +0000)]

riscv, bpf: Add necessary Zbb instructions

Add necessary Zbb instructions introduced by [0] to reduce code size and
improve performance of RV64 JIT. Meanwhile, a runtime deteted helper is
added to check whether the CPU supports Zbb instructions.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-38-g865e7a7.pdf
Link: https://lore.kernel.org/bpf/20240115131235.2914289-5-pulehui@huaweicloud.com

commit | commitdiff | tree

Pu Lehui [Mon, 15 Jan 2024 13:12:32 +0000 (13:12 +0000)]

riscv, bpf: Simplify sext and zext logics in branch instructions

There are many extension helpers in the current branch instructions, and
the implementation is a bit complicated. We simplify this logic through
two simple extension helpers with alternate register.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240115131235.2914289-4-pulehui@huaweicloud.com

commit | commitdiff | tree

Pu Lehui [Mon, 15 Jan 2024 13:12:31 +0000 (13:12 +0000)]

riscv, bpf: Unify 32-bit zero-extension to emit_zextw

For code unification, add emit_zextw wrapper to unify all the 32-bit
zero-extension operations.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240115131235.2914289-3-pulehui@huaweicloud.com

commit | commitdiff | tree

Pu Lehui [Mon, 15 Jan 2024 13:12:30 +0000 (13:12 +0000)]

riscv, bpf: Unify 32-bit sign-extension to emit_sextw

For code unification, add emit_sextw wrapper to unify all the 32-bit
sign-extension operations.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20240115131235.2914289-2-pulehui@huaweicloud.com

commit | commitdiff | tree

Andrii Nakryiko [Fri, 26 Jan 2024 22:09:44 +0000 (14:09 -0800)]

libbpf: Fix faccessat() usage on Android

Android implementation of libc errors out with -EINVAL in faccessat() if
passed AT_EACCESS ([0]), this leads to ridiculous issue with libbpf
refusing to load /sys/kernel/btf/vmlinux on Androids ([1]). Fix by
detecting Android and redefining AT_EACCESS to 0, it's equivalent on
Android.

[0] https://android.googlesource.com/platform/bionic/+/refs/heads/android13-release/libc/bionic/faccessat.cpp#50
[1] https://github.com/libbpf/libbpf-bootstrap/issues/250#issuecomment-1911324250

Fixes: 6a4ab8869d0b ("libbpf: Fix the case of running as non-root with capabilities")
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20240126220944.2497665-1-andrii@kernel.org

commit | commitdiff | tree

Arnaldo Carvalho de Melo [Mon, 29 Jan 2024 14:33:26 +0000 (11:33 -0300)]

bpftool: Be more portable by using POSIX's basename()

musl libc had the basename() prototype in string.h, but this is a
glibc-ism, now they removed the _GNU_SOURCE bits in their devel distro,
Alpine Linux edge:

https://git.musl-libc.org/cgit/musl/commit/?id=725e17ed6dff4d0cd22487bb64470881e86a92e7

So lets use the POSIX version, the whole rationale is spelled out at:

https://gitlab.alpinelinux.org/alpine/aports/-/issues/15643

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <olsajiri@gmail.com>
Acked-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/lkml/ZZhsPs00TI75RdAr@kernel.org
Link: https://lore.kernel.org/bpf/Zbe3NuOgaupvUcpF@kernel.org

commit | commitdiff | tree

Jakub Kicinski [Fri, 26 Jan 2024 20:14:49 +0000 (12:14 -0800)]

net: free altname using an RCU callback

We had to add another synchronize_rcu() in recent fix.
Bite the bullet and add an rcu_head to netdev_name_node,
free from RCU.

Note that name_node does not hold any reference on dev
to which it points, but there must be a synchronize_rcu()
on device removal path, so we should be fine.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Tobias Schramm [Thu, 25 Jan 2024 20:15:05 +0000 (21:15 +0100)]

dt-bindings: nfc: ti,trf7970a: fix usage example

The TRF7970A is a SPI device, not I2C.

Signed-off-by: Tobias Schramm <t.schramm@manjaro.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Min Li [Wed, 24 Jan 2024 18:49:47 +0000 (13:49 -0500)]

ptp: add FemtoClock3 Wireless as ptp hardware clock

The RENESAS FemtoClock3 Wireless is a high-performance jitter attenuator,
frequency translator, and clock synthesizer. The device is comprised of 3
digital PLLs (DPLL) to track CLKIN inputs and three independent low phase
noise fractional output dividers (FOD) that output low phase noise clocks.

FemtoClock3 supports one Time Synchronization (Time Sync) channel to enable
an external processor to control the phase and frequency of the Time Sync
channel and to take phase measurements using the TDC. Intended applications
are synchronization using the precision time protocol (PTP) and
synchronization with 0.5 Hz and 1 Hz signals from GNSS.

Signed-off-by: Min Li <min.li.xe@renesas.com>
Acked-by: Lee Jones <lee@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Min Li [Wed, 24 Jan 2024 18:49:46 +0000 (13:49 -0500)]

ptp: introduce PTP_CLOCK_EXTOFF event for the measured external offset

This change is for the PHC devices that can measure the phase offset
between PHC signal and the external signal, such as the 1PPS signal of
GNSS. Reporting PTP_CLOCK_EXTOFF to user space will be piggy-backed to
the existing ptp_extts_event so that application such as ts2phc can
poll the external offset the same way as extts. Hence, ts2phc can use
the offset to achieve the alignment between PHC and the external signal
by the help of either SW or HW filters.

Signed-off-by: Min Li <min.li.xe@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

David S. Miller [Mon, 29 Jan 2024 12:12:51 +0000 (12:12 +0000)]

Merge branch 'net-module-description'

Breno Leitao says:

====================
Fix MODULE_DESCRIPTION() for net (p3)

There are hundreds of network modules that misses MODULE_DESCRIPTION(),
causing a warning when compiling with W=1. Example:

        WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com90io.o
        WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/arc-rimi.o
        WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/net/arcnet/com20020.o

This part3 of the patchset focus on the missing ethernet drivers, which
is now warning free. This also fixes net/pcs and ieee802154.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 25 Jan 2024 19:34:20 +0000 (11:34 -0800)]

net: fill in MODULE_DESCRIPTION()s for arcnet

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to arcnet module.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 25 Jan 2024 19:34:19 +0000 (11:34 -0800)]

net: fill in MODULE_DESCRIPTION()s for ieee802154

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to ieee802154 modules.

Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Miquel Raynal <miquel.raynal@bootlin.com>
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 25 Jan 2024 19:34:18 +0000 (11:34 -0800)]

net: fill in MODULE_DESCRIPTION()s for PCS drivers

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Lynx, XPCS and LynxI PCS drivers.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 25 Jan 2024 19:34:17 +0000 (11:34 -0800)]

net: fill in MODULE_DESCRIPTION()s for ec_bhf

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Beckhoff CX5020 EtherCAT Ethernet driver.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 25 Jan 2024 19:34:16 +0000 (11:34 -0800)]

net: fill in MODULE_DESCRIPTION()s for cpsw-common

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the TI CPSW switch module.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 25 Jan 2024 19:34:15 +0000 (11:34 -0800)]

net: fill in MODULE_DESCRIPTION()s for dwmac-socfpga

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the STMicro DWMAC for Altera SOCs.

Signed-off-by: Breno Leitao <leitao@debian.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

commit | commitdiff | tree

Breno Leitao [Thu, 25 Jan 2024 19:34:14 +0000 (11:34 -0800)]

net: fill in MODULE_DESCRIPTION()s for Qualcom drivers

W=1 builds now warn if module is built without a MODULE_DESCRIPTION().
Add descriptions to the Qualcom rmnet and emac drivers.

Signed-off-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Subash Abhinov Kasiviswanathan <quic_subashab@quicinc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

A mirror of Linus' kernel repository