git.ipfire.org Git - thirdparty/linux.git/log

bpf: verifier: Refactor helper access type tracking

Previously, the verifier was treating all PTR_TO_STACK registers passed
to a helper call as potentially written to by the helper. However, all
calls to check_stack_range_initialized() already have precise access type
information available.

Rather than treat ACCESS_HELPER as a proxy for BPF_WRITE, pass
enum bpf_access_type to check_stack_range_initialized() to more
precisely track helper arguments.

One benefit from this precision is that registers tracked as valid
spills and passed as a read-only helper argument remain tracked after
the call. Rather than being marked STACK_MISC afterwards.

An additional benefit is the verifier logs are also more precise. For
this particular error, users will enjoy a slightly clearer message. See
included selftest updates for examples.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/ff885c0e5859e0cd12077c3148ff0754cad4f7ed.1736886479.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: tcp: Mark bpf_load_hdr_opt() arg2 as read-write

MEM_WRITE attribute is defined as: "Non-presence of MEM_WRITE means that
MEM is only being read". bpf_load_hdr_opt() both reads and writes from
its arg2 - void *search_res.

This matters a lot for the next commit where we more precisely track
stack accesses. Without this annotation, the verifier will make false
assumptions about the contents of memory written to by helpers and
possibly prune valid branches.

Fixes: 6fad274f06f0 ("bpf: Add MEM_WRITE attribute")
Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/730e45f8c39be2a5f3d8c4406cceca9d574cbf14.1736886479.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: verifier: Add missing newline on verbose() call

The print was missing a newline.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/59cbe18367b159cd470dc6d5c652524c1dc2b984.1736886479.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add distilled BTF test about marking BTF_IS_EMBEDDED

When redirecting the split BTF to the vmlinux base BTF, we need to mark
the distilled base struct/union members of split BTF structs/unions in
id_map with BTF_IS_EMBEDDED. This indicates that these types must match
both name and size later. So if a needed composite type, which is the
member of composite type in the split BTF, has a different size in the
base BTF we wish to relocate with, btf__relocate() should error out.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250115100241.4171581-4-pulehui@huaweicloud.com

libbpf: Fix incorrect traversal end type ID when marking BTF_IS_EMBEDDED

When redirecting the split BTF to the vmlinux base BTF, we need to mark
the distilled base struct/union members of split BTF structs/unions in
id_map with BTF_IS_EMBEDDED. This indicates that these types must match
both name and size later. Therefore, we need to traverse the entire
split BTF, which involves traversing type IDs from nr_dist_base_types to
nr_types. However, the current implementation uses an incorrect
traversal end type ID, so let's correct it.

Fixes: 19e00c897d50 ("libbpf: Split BTF relocation")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250115100241.4171581-3-pulehui@huaweicloud.com

libbpf: Fix return zero when elf_begin failed

The error number of elf_begin is omitted when encapsulating the
btf_find_elf_sections function.

Fixes: c86f180ffc99 ("libbpf: Make btf_parse_elf process .BTF.base transparently")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250115100241.4171581-2-pulehui@huaweicloud.com

selftests/bpf: Fix btf leak on new btf alloc failure in btf_distill test

Fix btf leak on new btf alloc failure in btf_distill test.

Fixes: affdeb50616b ("selftests/bpf: Extend distilled BTF tests to cover BTF relocation")
Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250115100241.4171581-1-pulehui@huaweicloud.com

veristat: Load struct_ops programs only once

libbpf automatically adjusts autoload for struct_ops programs,
see libbpf.c:bpf_object_adjust_struct_ops_autoload.

For example, if there is a map:

    SEC(".struct_ops.link")
    struct sched_ext_ops ops = {
     .enqueue = foo,
        .tick = bar,
    };

Both 'foo' and 'bar' would be loaded if 'ops' autocreate is true,
both 'foo' and 'bar' would be skipped if 'ops' autocreate is false.

This means that when veristat processes object file with 'ops',
it would load 4 programs in total: two programs per each
'process_prog' call.

The adjustment occurs at object load time, and libbpf remembers
association between 'ops' and 'foo'/'bar' at object open time.
The only way to persuade libbpf to load one of two is to adjust map
initial value, such that only one program is referenced.
This patch does exactly that, significantly reducing time to process
object files with big number of struct_ops programs.

Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250115223835.919989-1-eddyz87@gmail.com

selftests/bpf: Fix undefined UINT_MAX in veristat.c

Include <limits.h> in 'veristat.c' to provide a UINT_MAX definition and
avoid multiple compile errors against mips64el/musl-libc:

veristat.c: In function 'max_verifier_log_size':
veristat.c:1135:36: error: 'UINT_MAX' undeclared (first use in this function)
1135 |         const int SMALL_LOG_SIZE = UINT_MAX >> 8;
      |                                    ^~~~~~~~
veristat.c:24:1: note: 'UINT_MAX' is defined in header '<limits.h>'; did you forget to '#include <limits.h>'?
   23 | #include <math.h>
  +++ |+#include <limits.h>
   24 |

Fixes: 1f7c33630724 ("selftests/bpf: Increase verifier log limit in veristat")
Signed-off-by: Tony Ambardar <tony.ambardar@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250116075036.3459898-1-tony.ambardar@gmail.com

bpf: Send signals asynchronously if !preemptible

BPF programs can execute in all kinds of contexts and when a program
running in a non-preemptible context uses the bpf_send_signal() kfunc,
it will cause issues because this kfunc can sleep.
Change `irqs_disabled()` to `!preemptible()`.

Reported-by: syzbot+97da3d7e0112d59971de@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/67486b09.050a0220.253251.0084.GAE@google.com/
Fixes: 1bc7896e9ef4 ("bpf: Fix deadlock with rq_lock in bpf_send_signal()")
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20250115103647.38487-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Fix test_xdp_adjust_tail_grow2 selftest on powerpc

On powerpc cache line size is 128 bytes, so skb_shared_info must be
aligned accordingly.

Signed-off-by: Saket Kumar Bhaskar <skb99@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250110103109.3670793-1-skb99@linux.ibm.com

Merge branch 'selftests-bpf-migrate-test_xdp_redirect-sh-to-test_progs'

Bastien Curutchet says:

====================
This patch series continues the work to migrate the *.sh tests into
prog_tests.

test_xdp_redirect.sh tests the XDP redirections done through
bpf_redirect().

These XDP redirections are already tested by prog_tests/xdp_do_redirect.c
but IMO it doesn't cover the exact same code path because
xdp_do_redirect.c uses bpf_prog_test_run_opts() to trigger redirections
of 'fake packets' while test_xdp_redirect.sh redirects packets coming
from the network. Also, the test_xdp_redirect.sh script tests the
redirections with both SKB and DRV modes while xdp_do_redirect.c only
tests the DRV mode.

The patch series adds two new test cases in prog_tests/xdp_do_redirect.c
to replace the test_xdp_redirect.sh script.
====================

Link: https://patch.msgid.link/20250110-xdp_redirect-v2-0-b8f3ae53e894@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

selftests/bpf: Migrate test_xdp_redirect.c to test_xdp_do_redirect.c

prog_tests/xdp_do_redirect.c is the only user of the BPF programs
located in progs/test_xdp_do_redirect.c and progs/test_xdp_redirect.c.
There is no need to keep both files with such close names.

Move test_xdp_redirect.c contents to test_xdp_do_redirect.c and remove
progs/test_xdp_redirect.c

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250110-xdp_redirect-v2-3-b8f3ae53e894@bootlin.com

selftests/bpf: Migrate test_xdp_redirect.sh to xdp_do_redirect.c

test_xdp_redirect.sh can't be used by the BPF CI.

Migrate test_xdp_redirect.sh into a new test case in xdp_do_redirect.c.
It uses the same network topology and the same BPF programs located in
progs/test_xdp_redirect.c and progs/xdp_dummy.c.
Remove test_xdp_redirect.sh and its Makefile entry.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250110-xdp_redirect-v2-2-b8f3ae53e894@bootlin.com

selftests/bpf: test_xdp_redirect: Rename BPF sections

SEC("redirect_to_111") and SEC("redirect_to_222") can't be loaded by the
__load() helper.

Rename both sections SEC("xdp") so it can be interpreted by the __load()
helper in upcoming patch.
Update the test_xdp_redirect.sh to use the program name instead of the
section name to load the BPF program.

Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Link: https://patch.msgid.link/20250110-xdp_redirect-v2-1-b8f3ae53e894@bootlin.com

veristat: Document verifier log dumping capability

`-vl2` is a useful combination of flags to dump the entire
verification log. This is helpful when making changes to the verifier,
as you can see what it thinks program one instruction at a time.

This was more or less a hidden feature before. Document it so others can
discover it.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/d57bbcca81e06ae8dcdadaedb99a48dced67e422.1736466129.git.dxu@dxuuu.xyz

bpftool: Fix control flow graph segfault during edge creation

If the last instruction of a control flow graph building block is a
BPF_CALL, an incorrect edge with e->dst set to NULL is created and
results in a segfault during graph output.

Ensure that BPF_CALL as last instruction of a building block is handled
correctly and only generates a single edge unlike actual BPF_JUMP*
instructions.

Signed-off-by: Christoph Werle <christoph.werle@longjmp.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Quentin Monnet <qmo@kernel.org>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20250108220937.1470029-1-christoph.werle@longjmp.de

selftests/bpf: Add a test for kprobe multi with unique_match

Add a kprobe multi subtest to test kprobe multi unique_match option.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250109174028.3368967-1-yonghong.song@linux.dev

libbpf: Add unique_match option for multi kprobe

Jordan reported an issue in Meta production environment where func
try_to_wake_up() is renamed to try_to_wake_up.llvm.<hash>() by clang
compiler at lto mode. The original 'kprobe/try_to_wake_up' does not
work any more since try_to_wake_up() does not match the actual func
name in /proc/kallsyms.

There are a couple of ways to resolve this issue. For example, in
attach_kprobe(), we could do lookup in /proc/kallsyms so try_to_wake_up()
can be replaced by try_to_wake_up.llvm.<hach>(). Or we can force users
to use bpf_program__attach_kprobe() where they need to lookup
/proc/kallsyms to find out try_to_wake_up.llvm.<hach>(). But these two
approaches requires extra work by either libbpf or user.

Luckily, suggested by Andrii, multi kprobe already supports wildcard ('*')
for symbol matching. In the above example, 'try_to_wake_up*' can match
to try_to_wake_up() or try_to_wake_up.llvm.<hash>() and this allows
bpf prog works for different kernels as some kernels may have
try_to_wake_up() and some others may have try_to_wake_up.llvm.<hash>().

The original intention is to kprobe try_to_wake_up() only, so an optional
field unique_match is added to struct bpf_kprobe_multi_opts. If the
field is set to true, the number of matched functions must be one.
Otherwise, the attachment will fail. In the above case, multi kprobe
with 'try_to_wake_up*' and unique_match preserves user functionality.

Reported-by: Jordan Rome <linux@jordanrome.com>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250109174023.3368432-1-yonghong.song@linux.dev

Merge branch 'bpf-reduce-the-use-of-migrate_-disable-enable'

Hou Tao says:

====================
The use of migrate_{disable|enable} pair in BPF is mainly due to the
introduction of bpf memory allocator and the use of per-CPU data struct
in its internal implementation. The caller needs to disable migration
before invoking the alloc or free APIs of bpf memory allocator, and
enable migration after the invocation.

The main users of bpf memory allocator are various kind of bpf maps in
which the map values or the special fields in the map values are
allocated by using bpf memory allocator.

At present, the running context for bpf program has already disabled
migration explictly or implictly, therefore, when these maps are
manipulated in bpf program, it is OK to not invoke migrate_disable()
and migrate_enable() pair. Howevers, it is not always the case when
these maps are manipulated through bpf syscall, therefore many
migrate_{disable|enable} pairs are added when the map can either be
manipulated by BPF program or BPF syscall.

The initial idea of reducing the use of migrate_{disable|enable} comes
from Alexei [1]. I turned it into a patch set that archives the goals
through the following three methods:

1. remove unnecessary migrate_{disable|enable} pair
when the BPF syscall path also disables migration, it is OK to remove
the pair. Patch #1~#3 fall into this category, while patch #4~#5 are
partially included.

2. move the migrate_{disable|enable} pair from inner callee to outer
   caller
Instead of invoking migrate_disable() in the inner callee, invoking
migrate_disable() in the outer caller to simplify reasoning about when
migrate_disable() is needed. Patch #4~#5 and patch #6~#19 belongs to
this category.

3. add cant_migrate() check in the inner callee
Add cant_migrate() check in the inner callee to ensure the guarantee
that migration is disabled is not broken. Patch #1~#5, #13, #16~#19 also
belong to this category.

Please check the individual patches for more details. Comments are
always welcome.

Change Log:
v2:
  * sqaush the ->map_free related patches (#10~#12, #15) into one patch
  * remove unnecessary cant_migrate() checks.

v1: https://lore.kernel.org/bpf/20250106081900.1665573-1-houtao@huaweicloud.com
====================

Link: https://patch.msgid.link/20250108010728.207536-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove migrate_{disable|enable} from bpf_selem_free()

bpf_selem_free() has the following three callers:

(1) bpf_local_storage_update
It will be invoked through ->map_update_elem syscall or helpers for
storage map. Migration has already been disabled in these running
contexts.

(2) bpf_sk_storage_clone
It has already disabled migration before invoking bpf_selem_free().

(3) bpf_selem_free_list
bpf_selem_free_list() has three callers: bpf_selem_unlink_storage(),
bpf_local_storage_update() and bpf_local_storage_destroy().

The callers of bpf_selem_unlink_storage() includes: storage map
->map_delete_elem syscall, storage map delete helpers and
bpf_local_storage_map_free(). These contexts have already disabled
migration when invoking bpf_selem_unlink() which invokes
bpf_selem_unlink_storage() and bpf_selem_free_list() correspondingly.

bpf_local_storage_update() has been analyzed as the first caller above.
bpf_local_storage_destroy() is invoked when freeing the local storage
for the kernel object. Now cgroup, task, inode and sock storage have
already disabled migration before invoking bpf_local_storage_destroy().

After the analyses above, it is safe to remove migrate_{disable|enable}
from bpf_selem_free().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-17-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove migrate_{disable|enable} from bpf_local_storage_free()

bpf_local_storage_free() has three callers:

1) bpf_local_storage_alloc()
Its caller must have disabled migration.

2) bpf_local_storage_destroy()
Its four callers (bpf_{cgrp|inode|task|sk}_storage_free()) have already
invoked migrate_disable() before invoking bpf_local_storage_destroy().

3) bpf_selem_unlink()
Its callers include: cgrp/inode/task/sk storage ->map_delete_elem
callbacks, bpf_{cgrp|inode|task|sk}_storage_delete() helpers and
bpf_local_storage_map_free(). All of these callers have already disabled
migration before invoking bpf_selem_unlink().

Therefore, it is OK to remove migrate_{disable|enable} pair from
bpf_local_storage_free().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-16-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove migrate_{disable|enable} from bpf_local_storage_alloc()

These two callers of bpf_local_storage_alloc() are the same as
bpf_selem_alloc(): bpf_sk_storage_clone() and
bpf_local_storage_update(). The running contexts of these two callers
have already disabled migration, therefore, there is no need to add
extra migrate_{disable|enable} pair in bpf_local_storage_alloc().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-15-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove migrate_{disable|enable} from bpf_selem_alloc()

bpf_selem_alloc() has two callers:
(1) bpf_sk_storage_clone_elem()
bpf_sk_storage_clone() has already disabled migration before invoking
bpf_sk_storage_clone_elem().

(2) bpf_local_storage_update()
Its callers include: cgrp/task/inode/sock storage ->map_update_elem()
callbacks and bpf_{cgrp|task|inode|sk}_storage_get() helpers. These
running contexts have already disabled migration

Therefore, there is no need to add extra migrate_{disable|enable} pair
in bpf_selem_alloc().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-14-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove migrate_{disable,enable} in bpf_cpumask_release()

When BPF program invokes bpf_cpumask_release(), the migration must have
been disabled. When bpf_cpumask_release_dtor() invokes
bpf_cpumask_release(), the caller bpf_obj_free_fields() also has
disabled migration, therefore, it is OK to remove the unnecessary
migrate_{disable|enable} pair in bpf_cpumask_release().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-13-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove migrate_{disable|enable} in bpf_obj_free_fields()

The callers of bpf_obj_free_fields() have already guaranteed that the
migration is disabled, therefore, there is no need to invoke
migrate_{disable,enable} pair in bpf_obj_free_fields()'s underly
implementation.

This patch removes unnecessary migrate_{disable|enable} pairs from
bpf_obj_free_fields() and its callees: bpf_list_head_free() and
bpf_rb_root_free().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-12-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Disable migration before calling ops->map_free()

The freeing of all map elements may invoke bpf_obj_free_fields() to free
the special fields in the map value. Since these special fields may be
allocated from bpf memory allocator, migrate_{disable|enable} pairs are
necessary for the freeing of these special fields.

To simplify reasoning about when migrate_disable() is needed for the
freeing of these special fields, let the caller to guarantee migration
is disabled before invoking bpf_obj_free_fields(). Therefore, disabling
migration before calling ops->map_free() to simplify the freeing of map
values or special fields allocated from bpf memory allocator.

After disabling migration in bpf_map_free(), there is no need for
additional migration_{disable|enable} pairs in these ->map_free()
callbacks. Remove these redundant invocations.

The migrate_{disable|enable} pairs in the underlying implementation of
bpf_obj_free_fields() will be removed by the following patch.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-11-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Disable migration in bpf_selem_free_rcu

bpf_selem_free_rcu() calls bpf_obj_free_fields() to free the special
fields in map value (e.g., kptr). Since kptrs may be allocated from bpf
memory allocator, migrate_{disable|enable} pairs are necessary for the
freeing of these kptrs.

To simplify reasoning about when migrate_disable() is needed for the
freeing of these dynamically-allocated kptrs, let the caller to
guarantee migration is disabled before invoking bpf_obj_free_fields().

Therefore, the patch adds migrate_{disable|enable} pair in
bpf_selem_free_rcu(). The migrate_{disable|enable} pairs in the
underlying implementation of bpf_obj_free_fields() will be removed by
the following patch.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-10-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Disable migration when cloning sock storage

bpf_sk_storage_clone() will call bpf_selem_free() to free the clone
element when the allocation of new sock storage fails. bpf_selem_free()
will call check_and_free_fields() to free the special fields in the
element. Since the allocated element is not visible to bpf syscall or
bpf program when bpf_local_storage_alloc() fails, these special fields
in the element must be all zero when invoking bpf_selem_free().

To be uniform with other callers of bpf_selem_free(), disabling
migration when cloning sock storage. Adding migrate_{disable|enable}
pair also benefits the potential switching from kzalloc to bpf memory
allocator for sock storage.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-9-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Disable migration when destroying sock storage

When destroying sock storage, it invokes bpf_local_storage_destroy() to
remove all storage elements saved in the sock storage. The destroy
procedure will call bpf_selem_free() to free the element, and
bpf_selem_free() calls bpf_obj_free_fields() to free the special fields
in map value (e.g., kptr). Since kptrs may be allocated from bpf memory
allocator, migrate_{disable|enable} pairs are necessary for the freeing
of these kptrs.

To simplify reasoning about when migrate_disable() is needed for the
freeing of these dynamically-allocated kptrs, let the caller to
guarantee migration is disabled before invoking bpf_obj_free_fields().
Therefore, the patch adds migrate_{disable|enable} pair in
bpf_sock_storage_free(). The migrate_{disable|enable} pairs in the
underlying implementation of bpf_obj_free_fields() will be removed by
The following patch.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-8-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Disable migration when destroying inode storage

When destroying inode storage, it invokes bpf_local_storage_destroy() to
remove all storage elements saved in the inode storage. The destroy
procedure will call bpf_selem_free() to free the element, and
bpf_selem_free() calls bpf_obj_free_fields() to free the special fields
in map value (e.g., kptr). Since kptrs may be allocated from bpf memory
allocator, migrate_{disable|enable} pairs are necessary for the freeing
of these kptrs.

To simplify reasoning about when migrate_disable() is needed for the
freeing of these dynamically-allocated kptrs, let the caller to
guarantee migration is disabled before invoking bpf_obj_free_fields().
Therefore, the patch adds migrate_{disable|enable} pair in
bpf_inode_storage_free(). The migrate_{disable|enable} pairs in the
underlying implementation of bpf_obj_free_fields() will be removed by
the following patch.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-7-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove migrate_{disable|enable} from bpf_task_storage_lock helpers

Three callers of bpf_task_storage_lock() are ->map_lookup_elem,
->map_update_elem, ->map_delete_elem from bpf syscall. BPF syscall for
these three operations of task storage has already disabled migration.
Another two callers are bpf_task_storage_get() and
bpf_task_storage_delete() helpers which will be used by BPF program.

Two callers of bpf_task_storage_trylock() are bpf_task_storage_get() and
bpf_task_storage_delete() helpers. The running contexts of these helpers
have already disabled migration.

Therefore, it is safe to remove migrate_{disable|enable} from task
storage lock helpers for these call sites. However,
bpf_task_storage_free() also invokes bpf_task_storage_lock() and its
running context doesn't disable migration, therefore, add the missed
migrate_{disable|enable} in bpf_task_storage_free().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-6-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove migrate_{disable|enable} from bpf_cgrp_storage_lock helpers

Three callers of bpf_cgrp_storage_lock() are ->map_lookup_elem,
->map_update_elem, ->map_delete_elem from bpf syscall. BPF syscall for
these three operations of cgrp storage has already disabled migration.

Two call sites of bpf_cgrp_storage_trylock() are bpf_cgrp_storage_get(),
and bpf_cgrp_storage_delete() helpers. The running contexts of these
helpers have already disabled migration.

Therefore, it is safe to remove migrate_disable() for these callers.
However, bpf_cgrp_storage_free() also invokes bpf_cgrp_storage_lock()
and its running context doesn't disable migration. Therefore, also add
the missed migrate_{disabled|enable} in bpf_cgrp_storage_free().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-5-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove migrate_{disable|enable} in htab_elem_free

htab_elem_free() has two call-sites: delete_all_elements() has already
disabled migration, free_htab_elem() is invoked by other 4 functions:
__htab_map_lookup_and_delete_elem, __htab_map_lookup_and_delete_batch,
htab_map_update_elem and htab_map_delete_elem.

BPF syscall has already disabled migration before invoking
->map_update_elem, ->map_delete_elem, and ->map_lookup_and_delete_elem
callbacks for hash map. __htab_map_lookup_and_delete_batch() also
disables migration before invoking free_htab_elem(). ->map_update_elem()
and ->map_delete_elem() of hash map may be invoked by BPF program and
the running context of BPF program has already disabled migration.
Therefore, it is safe to remove the migration_{disable|enable} pair in
htab_elem_free()

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-4-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove migrate_{disable|enable} in ->map_for_each_callback

BPF program may call bpf_for_each_map_elem(), and it will call
the ->map_for_each_callback callback of related bpf map. Considering the
running context of bpf program has already disabled migration, remove
the unnecessary migrate_{disable|enable} pair in the implementations of
->map_for_each_callback. To ensure the guarantee will not be voilated
later, also add cant_migrate() check in the implementations.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove migrate_{disable|enable} from LPM trie

Both bpf program and bpf syscall may invoke ->update or ->delete
operation for LPM trie. For bpf program, its running context has already
disabled migration explicitly through (migrate_disable()) or implicitly
through (preempt_disable() or disable irq). For bpf syscall, the
migration is disabled through the use of bpf_disable_instrumentation()
before invoking the corresponding map operation callback.

Therefore, it is safe to remove the migrate_{disable|enable){} pair from
LPM trie.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250108010728.207536-2-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add kprobe session recursion check test

Adding kprobe.session probe to bpf_kfunc_common_test that misses bpf
program execution due to recursion check and making sure it increases
the program missed count properly.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250106175048.1443905-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Return error for missed kprobe multi bpf program execution

When kprobe multi bpf program can't be executed due to recursion check,
we currently return 0 (success) to fprobe layer where it's ignored for
standard kprobe multi probes.

For kprobe session the success return value will make fprobe layer to
install return probe and try to execute it as well.

But the return session probe should not get executed, because the entry
part did not run. FWIW the return probe bpf program most likely won't get
executed, because its recursion check will likely fail as well, but we
don't need to run it in the first place.. also we can make this clear
and obvious.

It also affects missed counts for kprobe session program execution, which
are now doubled (extra count for not executed return probe).

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Link: https://lore.kernel.org/r/20250106175048.1443905-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Move out synchronize_rcu_tasks_trace from mutex CS

Commit ef1b808e3b7c ("bpf: Fix UAF via mismatching bpf_prog/attachment
RCU flavors") resolved a possible UAF issue in uprobes that attach
non-sleepable bpf prog by explicitly waiting for a tasks-trace-RCU grace
period. But, in the current implementation, synchronize_rcu_tasks_trace
is included within the mutex critical section, which increases the
length of the critical section and may affect performance. So let's move
out synchronize_rcu_tasks_trace from mutex CS.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20250104013946.1111785-1-pulehui@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Fix range_tree_set() error handling

range_tree_set() might fail and return -ENOMEM,
causing subsequent `bpf_arena_alloc_pages` to fail.
Add the error handling.

Signed-off-by: Soma Nakata <soma.nakata@somane.sakura.ne.jp>
Acked-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20250106231536.52856-1-soma.nakata@somane.sakura.ne.jp
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: add -std=gnu11 to BPF_CFLAGS and CFLAGS

Latest versions of GCC BPF use C23 standard by default. This causes
compilation errors in vmlinux.h due to bool types declarations.

Add -std=gnu11 to BPF_CFLAGS and CFLAGS. This aligns with the version
of the standard used when building the kernel currently [1].

For more details see the discussions at [2] and [3].

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Makefile#n465
[2] https://lore.kernel.org/bpf/EYcXjcKDCJY7Yb0GGtAAb7nLKPEvrgWdvWpuNzXm2qi6rYMZDixKv5KwfVVMBq17V55xyC-A1wIjrqG3aw-Imqudo9q9X7D7nLU2gWgbN0w=@pm.me/
[3] https://lore.kernel.org/bpf/20250106202715.1232864-1-ihor.solodrai@pm.me/

CC: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
Link: https://lore.kernel.org/r/20250107235813.2964472-1-ihor.solodrai@pm.me
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Handle prog/attach type comparison in veristat

Implemented handling of prog type and attach type stats comparison in
veristat.
To test this change:
```
./veristat pyperf600.bpf.o -o csv > base1.csv
./veristat pyperf600.bpf.o -o csv > base2.csv
./veristat -C base2.csv base1.csv -o csv
...,raw_tracepoint,raw_tracepoint,MATCH,
...,cgroup_inet_ingress,cgroup_inet_ingress,MATCH
```

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250106144321.32337-1-mykyta.yatsenko5@gmail.com

selftests/bpf: add -fno-strict-aliasing to BPF_CFLAGS

Following the discussion at [1], set -fno-strict-aliasing flag for all
BPF object build rules. Remove now unnecessary <test>-CFLAGS variables.

[1] https://lore.kernel.org/bpf/20250106185447.951609-1-ihor.solodrai@pm.me/

CC: Jose E. Marchesi <jose.marchesi@oracle.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@pm.me>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250106201728.1219791-1-ihor.solodrai@pm.me
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf-allow-bpf_for-bpf_repeat-while-holding-spin'

Emil Tsalapatis says:

====================
In BPF programs, kfunc calls while holding a lock are not allowed
because kfuncs may sleep by default. The exception to this rule are the
functions in special_kfunc_list, which are guaranteed to not sleep. The
bpf_iter_num_* functions used by the bpf_for and bpf_repeat macros make
no function calls themselves, and as such are guaranteed to not sleep.
Add them to special_kfunc_list to allow them within BPF spinlock
critical sections.

Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
====================

Link: https://patch.msgid.link/20250104202528.882482-1-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: test bpf_for within spin lock section

Add a selftest to ensure BPF for loops within critical sections are
accepted by the verifier.

Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250104202528.882482-3-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Allow bpf_for/bpf_repeat calls while holding a spinlock

Add the bpf_iter_num_* kfuncs called by bpf_for in special_kfunc_list,
and allow the calls even while holding a spin lock.

Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20250104202528.882482-2-emil@etsalapatis.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf/tests: Add 32 bits only long conditional jump tests

Commit f1517eb790f9 ("bpf/tests: Expand branch conversion JIT test")
introduced "Long conditional jump tests" but due to those tests making
use of 64 bits DIV and MOD, they don't get jited on powerpc/32,
leading to the long conditional jump test being skiped for unrelated
reason.

Add 4 new tests that are restricted to 32 bits ALU so that the jump
tests can also be performed on platforms that do no support 64 bits
operations.

Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/609f87a2d84e032c8d9ccb9ba7aebef893698f1e.1736154762.git.christophe.leroy@csgroup.eu

bpf, arm64: Emit A64_{ADD,SUB}_I when possible in emit_{lse,ll_sc}_atomic()

Currently in emit_{lse,ll_sc}_atomic(), if there is an offset, we add it
to the base address by doing e.g.:

  if (off) {
          emit_a64_mov_i(1, tmp, off, ctx);
          emit(A64_ADD(1, tmp, tmp, dst), ctx);
  [...]

As pointed out by Xu, we can use emit_a64_add_i() (added in the previous
patch) instead, which tries to combine the above into a single A64_ADD_I
or A64_SUB_I when possible.

Suggested-by: Xu Kuohai <xukuohai@huaweicloud.com>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/bpf/9ad3034a62361d91a99af24efa03f48c4c9e13ea.1735868489.git.yepeilin@google.com

bpf, arm64: Factor out emit_a64_add_i()

As suggested by Xu, factor out emit_a64_add_i() for later use. No
functional change.

Suggested-by: Xu Kuohai <xukuohai@huaweicloud.com>
Signed-off-by: Peilin Ye <yepeilin@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/bpf/fedbaca80e6d8bd5bcba1ac5320dfbbdab14472e.1735868489.git.yepeilin@google.com

bpf, arm64: Simplify if logic in emit_lse_atomic()

Delete that unnecessary outer if clause. No functional change.

Signed-off-by: Peilin Ye <yepeilin@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/bpf/e8520e5503a489e2dea8526077976ae5a0ab1849.1735868489.git.yepeilin@google.com

selftests/bpf: Avoid generating untracked files when running bpf selftests

Currently, when we run the BPF selftests with the following command:

  make -C tools/testing/selftests TARGETS=bpf SKIP_TARGETS=""

The command generates untracked files and directories with make version
less than 4.4:

'''
Untracked files:
  (use "git add <file>..." to include in what will be committed)
tools/testing/selftests/bpfFEATURE-DUMP.selftests
tools/testing/selftests/bpffeature/
'''

We lost slash after word "bpf". The reason is slash appending code is as
follow:

'''
OUTPUT := $(OUTPUT)/
$(eval include ../../../build/Makefile.feature)
OUTPUT := $(patsubst %/,%,$(OUTPUT))
'''

This way of assigning values to OUTPUT will never be effective for the
variable OUTPUT provided via the command argument [1] and BPF makefile
is called from parent Makfile(tools/testing/selftests/Makefile) like:

'''
all:
  ...
$(MAKE) OUTPUT=$$BUILD_TARGET -C $$TARGET
'''

According to GNU make, we can use override Directive to fix this issue [2].

  [1] https://www.gnu.org/software/make/manual/make.html#Overriding
  [2] https://www.gnu.org/software/make/manual/make.html#Override-Directive

Fixes: dc3a8804d790 ("selftests/bpf: Adapt OUTPUT appending logic to lower versions of Make")
Signed-off-by: Jiayuan Chen <mrpre@163.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20241224075957.288018-1-mrpre@163.com

bpf: Reject struct_ops registration that uses module ptr and the module btf_id is missing

There is a UAF report in the bpf_struct_ops when CONFIG_MODULES=n.
In particular, the report is on tcp_congestion_ops that has
a "struct module *owner" member.

For struct_ops that has a "struct module *owner" member,
it can be extended either by the regular kernel module or
by the bpf_struct_ops. bpf_try_module_get() will be used
to do the refcounting and different refcount is done
based on the owner pointer. When CONFIG_MODULES=n,
the btf_id of the "struct module" is missing:

WARN: resolve_btfids: unresolved symbol module

Thus, the bpf_try_module_get() cannot do the correct refcounting.

Not all subsystem's struct_ops requires the "struct module *owner" member.
e.g. the recent sched_ext_ops.

This patch is to disable bpf_struct_ops registration if
the struct_ops has the "struct module *" member and the
"struct module" btf_id is missing. The btf_type_is_fwd() helper
is moved to the btf.h header file for this test.

This has happened since the beginning of bpf_struct_ops which has gone
through many changes. The Fixes tag is set to a recent commit that this
patch can apply cleanly. Considering CONFIG_MODULES=n is not
common and the age of the issue, targeting for bpf-next also.

Fixes: 1611603537a4 ("bpf: Create argument information for nullable arguments.")
Reported-by: Robert Morris <rtm@csail.mit.edu>
Closes: https://lore.kernel.org/bpf/74665.1733669976@localhost/
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20241220201818.127152-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Use refcount_t instead of atomic_t for mmap_count

Use an API that resembles more the actual use of mmap_count.

Found by cocci:
kernel/bpf/arena.c:245:6-25: WARNING: atomic_dec_and_test variation before object free at line 249.

Fixes: b90d77e5fd78 ("bpf: Fix remap of arena.")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202412292037.LXlYSHKl-lkp@intel.com/
Signed-off-by: Pei Xiao <xiaopei01@kylinos.cn>
Link: https://lore.kernel.org/r/6ecce439a6bc81adb85d5080908ea8959b792a50.1735542814.git.xiaopei01@kylinos.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove unused MT_ENTRY define

The range tree introduction removed the need for maple tree usage
but missed removing the MT_ENTRY defined value that was used to
mark maple tree allocated entries.
Remove the MT_ENTRY define.

Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Link: https://lore.kernel.org/r/20241223115901.14207-1-lpieralisi@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: fix veristat comp mode with new stats

Commit 82c1f13de315 ("selftests/bpf: Add more stats into veristat")
introduced new stats, added by default in the CSV output, that were not
added to parse_stat_value, used in parse_stats_csv which is used in
comparison mode. Thus it broke comparison mode altogether making it fail
with "Unrecognized stat #7" and EINVAL.

One quirk is that PROG_TYPE and ATTACH_TYPE have been transformed to
strings using libbpf_bpf_prog_type_str and libbpf_bpf_attach_type_str
respectively. Since we might not want to compare those string values, we
just skip the parsing in this patch. We might want to translate it back
to the enum value or compare the string value directly.

Fixes: 82c1f13de315 ("selftests/bpf: Add more stats into veristat")
Signed-off-by: Mahe Tardy <mahe.tardy@gmail.com>
Tested-by: Mykyta Yatsenko<yatsenko@meta.com>
Link: https://lore.kernel.org/r/20241220152218.28405-1-mahe.tardy@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Fix holes in special_kfunc_list if !CONFIG_NET

If the function is not available its entry has to be replaced with
BTF_ID_UNUSED instead of skipped.
Otherwise the list doesn't work correctly.

Reported-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Closes: https://lore.kernel.org/lkml/CAADnVQJQpVziHzrPCCpGE5=8uzw2OkxP8gqe1FkJ6_XVVyVbNw@mail.gmail.com/
Fixes: 00a5acdbf398 ("bpf: Fix configuration-dependent BTF function references")
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20241219-bpf-fix-special_kfunc_list-v1-1-d9d50dd61505@weissschuh.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf-verifier-improve-precision-of-bpf_mul'

Matan Shachnai says:

====================
This patch-set aims to improve precision of BPF_MUL and add testcases
to illustrate precision gains using signed and unsigned bounds.

Changes from v1:
- Fixed typo made in patch.

Changes from v2:
- Added signed multiplication to BPF_MUL.
- Added test cases to exercise BPF_MUL.
- Reordered patches in the series.

Changes from v3:
- Coding style fixes.
====================

Link: https://patch.msgid.link/20241218032337.12214-1-m.shachnai@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add testcases for BPF_MUL

The previous commit improves precision of BPF_MUL.
Add tests to exercise updated BPF_MUL.

Signed-off-by: Matan Shachnai <m.shachnai@gmail.com>
Link: https://lore.kernel.org/r/20241218032337.12214-3-m.shachnai@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, verifier: Improve precision of BPF_MUL

This patch improves (or maintains) the precision of register value tracking
in BPF_MUL across all possible inputs. It also simplifies
scalar32_min_max_mul() and scalar_min_max_mul().

As it stands, BPF_MUL is composed of three functions:

case BPF_MUL:
  tnum_mul();
  scalar32_min_max_mul();
  scalar_min_max_mul();

The current implementation of scalar_min_max_mul() restricts the u64 input
ranges of dst_reg and src_reg to be within [0, U32_MAX]:

    /* Both values are positive, so we can work with unsigned and
     * copy the result to signed (unless it exceeds S64_MAX).
     */
    if (umax_val > U32_MAX || dst_reg->umax_value > U32_MAX) {
        /* Potential overflow, we know nothing */
        __mark_reg64_unbounded(dst_reg);
        return;
    }

This restriction is done to avoid unsigned overflow, which could otherwise
wrap the result around 0, and leave an unsound output where umin > umax. We
also observe that limiting these u64 input ranges to [0, U32_MAX] leads to
a loss of precision. Consider the case where the u64 bounds of dst_reg are
[0, 2^34] and the u64 bounds of src_reg are [0, 2^2]. While the
multiplication of these two bounds doesn't overflow and is sound [0, 2^36],
the current scalar_min_max_mul() would set the entire register state to
unbounded.

Importantly, we update BPF_MUL to allow signed bound multiplication
(i.e. multiplying negative bounds) as well as allow u64 inputs to take on
values from [0, U64_MAX]. We perform signed multiplication on two bounds
[a,b] and [c,d] by multiplying every combination of the bounds
(i.e. a*c, a*d, b*c, and b*d) and checking for overflow of each product. If
there is an overflow, we mark the signed bounds unbounded [S64_MIN, S64_MAX].
In the case of no overflow, we take the minimum of these products to
be the resulting smin, and the maximum to be the resulting smax.

The key idea here is that if there’s no possibility of overflow, either
when multiplying signed bounds or unsigned bounds, we can safely multiply the
respective bounds; otherwise, we set the bounds that exhibit overflow
(during multiplication) to unbounded.

if (check_mul_overflow(*dst_umax, src_reg->umax_value, dst_umax) ||
       (check_mul_overflow(*dst_umin, src_reg->umin_value, dst_umin))) {
        /* Overflow possible, we know nothing */
        *dst_umin = 0;
        *dst_umax = U64_MAX;
    }
  ...

Below, we provide an example BPF program (below) that exhibits the
imprecision in the current BPF_MUL, where the outputs are all unbounded. In
contrast, the updated BPF_MUL produces a bounded register state:

BPF_LD_IMM64(BPF_REG_1, 11),
BPF_LD_IMM64(BPF_REG_2, 4503599627370624),
BPF_ALU64_IMM(BPF_NEG, BPF_REG_2, 0),
BPF_ALU64_IMM(BPF_NEG, BPF_REG_2, 0),
BPF_ALU64_REG(BPF_AND, BPF_REG_1, BPF_REG_2),
BPF_LD_IMM64(BPF_REG_3, 809591906117232263),
BPF_ALU64_REG(BPF_MUL, BPF_REG_3, BPF_REG_1),
BPF_MOV64_IMM(BPF_REG_0, 1),
BPF_EXIT_INSN(),

Verifier log using the old BPF_MUL:

func#0 @0
0: R1=ctx() R10=fp0
0: (18) r1 = 0xb                      ; R1_w=11
2: (18) r2 = 0x10000000000080         ; R2_w=0x10000000000080
4: (87) r2 = -r2                      ; R2_w=scalar()
5: (87) r2 = -r2                      ; R2_w=scalar()
6: (5f) r1 &= r2                      ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=11,var_off=(0x0; 0xb)) R2_w=scalar()
7: (18) r3 = 0xb3c3f8c99262687        ; R3_w=0xb3c3f8c99262687
9: (2f) r3 *= r1                      ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=11,var_off=(0x0; 0xb)) R3_w=scalar()
...

Verifier using the new updated BPF_MUL (more precise bounds at label 9)

func#0 @0
0: R1=ctx() R10=fp0
0: (18) r1 = 0xb                      ; R1_w=11
2: (18) r2 = 0x10000000000080         ; R2_w=0x10000000000080
4: (87) r2 = -r2                      ; R2_w=scalar()
5: (87) r2 = -r2                      ; R2_w=scalar()
6: (5f) r1 &= r2                      ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=11,var_off=(0x0; 0xb)) R2_w=scalar()
7: (18) r3 = 0xb3c3f8c99262687        ; R3_w=0xb3c3f8c99262687
9: (2f) r3 *= r1                      ; R1_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=11,var_off=(0x0; 0xb)) R3_w=scalar(smin=0,smax=umax=0x7b96bb0a94a3a7cd,var_off=(0x0; 0x7fffffffffffffff))
...

Finally, we proved the soundness of the new scalar_min_max_mul() and
scalar32_min_max_mul() functions. Typically, multiplication operations are
expensive to check with bitvector-based solvers. We were able to prove the
soundness of these functions using Non-Linear Integer Arithmetic (NIA)
theory. Additionally, using Agni [2,3], we obtained the encodings for
scalar32_min_max_mul() and scalar_min_max_mul() in bitvector theory, and
were able to prove their soundness using 8-bit bitvectors (instead of
64-bit bitvectors that the functions actually use).

In conclusion, with this patch,

1. We were able to show that we can improve the overall precision of
   BPF_MUL. We proved (using an SMT solver) that this new version of
   BPF_MUL is at least as precise as the current version for all inputs
   and more precise for some inputs.

2. We are able to prove the soundness of the new scalar_min_max_mul() and
   scalar32_min_max_mul(). By leveraging the existing proof of tnum_mul
   [1], we can say that the composition of these three functions within
   BPF_MUL is sound.

[1] https://ieeexplore.ieee.org/abstract/document/9741267
[2] https://link.springer.com/chapter/10.1007/978-3-031-37709-9_12
[3] https://people.cs.rutgers.edu/~sn349/papers/sas24-preprint.pdf

Co-developed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Matan Shachnai <m.shachnai@gmail.com>
Link: https://lore.kernel.org/r/20241218032337.12214-2-m.shachnai@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Set MFD_NOEXEC_SEAL when creating memfd

Starting from 105ff5339f49 ("mm/memfd: add MFD_NOEXEC_SEAL and
MFD_EXEC") and until 1717449b4417 ("memfd: drop warning for missing
exec-related flags"), the kernel would print a warning if neither
MFD_NOEXEC_SEAL nor MFD_EXEC is set in memfd_create().

If libbpf runs on on a kernel between these two commits (eg. on an
improperly backported system), it'll trigger this warning.

To avoid this warning (and also be more secure), explicitly set
MFD_NOEXEC_SEAL. But since libbpf can be run on potentially very old
kernels, leave a fallback for kernels without MFD_NOEXEC_SEAL support.

Signed-off-by: Daniel Xu <dxu@dxuuu.xyz>
Link: https://lore.kernel.org/r/6e62c2421ad7eb1da49cbf16da95aaaa7f94d394.1735594195.git.dxu@dxuuu.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Clear out Python syntax warnings

Invalid escape sequences are used, and produced syntax warnings:

  $ test_bpftool_synctypes.py
  test_bpftool_synctypes.py:69: SyntaxWarning: invalid escape sequence '\['
    self.start_marker = re.compile(f'(static )?const bool {self.array_name}\[.*\] = {{\n')
  test_bpftool_synctypes.py:83: SyntaxWarning: invalid escape sequence '\['
    pattern = re.compile('\[(BPF_\w*)\]\s*= (true|false),?$')
  test_bpftool_synctypes.py:181: SyntaxWarning: invalid escape sequence '\s'
    pattern = re.compile('^\s*(BPF_\w+),?(\s+/\*.*\*/)?$')
  test_bpftool_synctypes.py:229: SyntaxWarning: invalid escape sequence '\*'
    start_marker = re.compile(f'\*{block_name}\* := {{')
  test_bpftool_synctypes.py:229: SyntaxWarning: invalid escape sequence '\*'
    start_marker = re.compile(f'\*{block_name}\* := {{')
  test_bpftool_synctypes.py:230: SyntaxWarning: invalid escape sequence '\*'
    pattern = re.compile('\*\*([\w/-]+)\*\*')
  test_bpftool_synctypes.py:248: SyntaxWarning: invalid escape sequence '\s'
    start_marker = re.compile(f'"\s*{block_name} := {{')
  test_bpftool_synctypes.py:249: SyntaxWarning: invalid escape sequence '\w'
    pattern = re.compile('([\w/]+) [|}]')
  test_bpftool_synctypes.py:267: SyntaxWarning: invalid escape sequence '\s'
    start_marker = re.compile(f'"\s*{macro}\s*" [|}}]')
  test_bpftool_synctypes.py:267: SyntaxWarning: invalid escape sequence '\s'
    start_marker = re.compile(f'"\s*{macro}\s*" [|}}]')
  test_bpftool_synctypes.py:268: SyntaxWarning: invalid escape sequence '\w'
    pattern = re.compile('([\w-]+) ?(?:\||}[ }\]])')
  test_bpftool_synctypes.py:287: SyntaxWarning: invalid escape sequence '\w'
    pattern = re.compile('(?:.*=\')?([\w/]+)')
  test_bpftool_synctypes.py:319: SyntaxWarning: invalid escape sequence '\w'
    pattern = re.compile('([\w-]+) ?(?:\||}[ }\]"])')
  test_bpftool_synctypes.py:341: SyntaxWarning: invalid escape sequence '\|'
    start_marker = re.compile('\|COMMON_OPTIONS\| replace:: {')
  test_bpftool_synctypes.py:342: SyntaxWarning: invalid escape sequence '\*'
    pattern = re.compile('\*\*([\w/-]+)\*\*')

Escaping them clears out the warnings.

  $ tools/testing/selftests/bpf/test_bpftool_synctypes.py; echo $?
  0

Signed-off-by: Ariel Otilibili <ariel.otilibili-anieli@eurecom.fr>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Quentin Monnet <qmo@kernel.org>
Reviewed-by: Quentin Monnet <qmo@kernel.org>
Link: https://docs.python.org/3/library/re.html
Link: https://lore.kernel.org/bpf/20241211220012.714055-2-ariel.otilibili-anieli@eurecom.fr

bpf: bpf_local_storage: Always use bpf_mem_alloc in PREEMPT_RT

In PREEMPT_RT, kmalloc(GFP_ATOMIC) is still not safe in non preemptible
context. bpf_mem_alloc must be used in PREEMPT_RT. This patch is
to enforce bpf_mem_alloc in the bpf_local_storage when CONFIG_PREEMPT_RT
is enabled.

[   35.118559] BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
[   35.118566] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 1832, name: test_progs
[   35.118569] preempt_count: 1, expected: 0
[   35.118571] RCU nest depth: 1, expected: 1
[   35.118577] INFO: lockdep is turned off.
    ...
[   35.118647]  __might_resched+0x433/0x5b0
[   35.118677]  rt_spin_lock+0xc3/0x290
[   35.118700]  ___slab_alloc+0x72/0xc40
[   35.118723]  __kmalloc_noprof+0x13f/0x4e0
[   35.118732]  bpf_map_kzalloc+0xe5/0x220
[   35.118740]  bpf_selem_alloc+0x1d2/0x7b0
[   35.118755]  bpf_local_storage_update+0x2fa/0x8b0
[   35.118784]  bpf_sk_storage_get_tracing+0x15a/0x1d0
[   35.118791]  bpf_prog_9a118d86fca78ebb_trace_inet_sock_set_state+0x44/0x66
[   35.118795]  bpf_trace_run3+0x222/0x400
[   35.118820]  __bpf_trace_inet_sock_set_state+0x11/0x20
[   35.118824]  trace_inet_sock_set_state+0x112/0x130
[   35.118830]  inet_sk_state_store+0x41/0x90
[   35.118836]  tcp_set_state+0x3b3/0x640

There is no need to adjust the gfp_flags passing to the
bpf_mem_cache_alloc_flags() which only honors the GFP_KERNEL.
The verifier has ensured GFP_KERNEL is passed only in sleepable context.

It has been an old issue since the first introduction of the
bpf_local_storage ~5 years ago, so this patch targets the bpf-next.

bpf_mem_alloc is needed to solve it, so the Fixes tag is set
to the commit when bpf_mem_alloc was first used in the bpf_local_storage.

Fixes: 08a7ce384e33 ("bpf: Use bpf_mem_cache_alloc/free in bpf_local_storage_elem")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20241218193000.2084281-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

veristat: Fix top source line stat collection

Fix comparator implementation to return most popular source code
lines instead of least.
Introduce min/max macro for building veristat outside of Linux
repository.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241217181113.364651-1-mykyta.yatsenko5@gmail.com

bpf: lsm: Remove hook to bpf_task_storage_free

free_task() already calls bpf_task_storage_free(). It is not necessary
to call it again on security_task_free(). Remove the hook.

Signed-off-by: Song Liu <song@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Matt Bobrowski <mattbobrowski@google.com>
Link: https://patch.msgid.link/20241212075956.2614894-1-song@kernel.org

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Cross-merge bpf fixes after downstream PR.

No conflicts.

Adjacent changes in:
Auto-merging include/linux/bpf.h
Auto-merging include/linux/bpf_verifier.h
Auto-merging kernel/bpf/btf.c
Auto-merging kernel/bpf/verifier.c
Auto-merging kernel/trace/bpf_trace.c
Auto-merging tools/testing/selftests/bpf/progs/test_tp_btf_nullable.c

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Linux 6.13-rc3

Merge tag 'arc-6.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc

Pull ARC fixes from Vineet Gupta:

- Sundry build and misc fixes

* tag 'arc-6.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc:
  ARC: build: Try to guess GCC variant of cross compiler
  ARC: bpf: Correct conditional check in 'check_jmp_32'
  ARC: dts: Replace deprecated snps,nr-gpios property for snps,dw-apb-gpio-port devices
  ARC: build: Use __force to suppress per-CPU cmpxchg warnings
  ARC: fix reference of dependency for PAE40 config
  ARC: build: disallow invalid PAE40 + 4K page config
  arc: rename aux.h to arc_aux.h

Merge tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi

Pull EFI fixes from Ard Biesheuvel:

- Limit EFI zboot to GZIP and ZSTD before it comes in wider use

- Fix inconsistent error when looking up a non-existent file in
   efivarfs with a name that does not adhere to the NAME-GUID format

- Drop some unused code

* tag 'efi-fixes-for-v6.13-1' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi:
  efi/esrt: remove esre_attribute::store()
  efivarfs: Fix error on non-existent file
  efi/zboot: Limit compression options to GZIP and ZSTD

Merge tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"i2c host fixes: PNX used the wrong unit for timeouts, Nomadik was
  missing a sentinel, and RIIC was missing rounding up"

* tag 'i2c-for-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: riic: Always round-up when calculating bus period
  i2c: nomadik: Add missing sentinel to match table
  i2c: pnx: Fix timeout in wait functions

Merge tag 'edac_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC fix from Borislav Petkov:

- Make sure amd64_edac loads successfully on certain Zen4 memory
configurations

* tag 'edac_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/amd64: Simplify ECC check on unified memory controllers

Merge tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

- Disable the secure programming interface of the GIC500 chip in the
   RK3399 SoC to fix interrupt priority assignment and even make a dead
   machine boot again when the gic-v3 driver enables pseudo NMIs

- Correct the declaration of a percpu variable to fix several sparse
   warnings

* tag 'irq_urgent_for_v6.13_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/gic-v3: Work around insecure GIC integrations
  irqchip/gic: Correct declaration of *percpu_base pointer in union gic_base

Merge tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Borislav Petkov:

- Prevent incorrect dequeueing of the deadline dlserver helper task and
   fix its time accounting

- Properly track the CFS runqueue runnable stats

- Check the total number of all queued tasks in a sched fair's runqueue
   hierarchy before deciding to stop the tick

- Fix the scheduling of the task that got woken last (NEXT_BUDDY) by
   preventing those from being delayed

* tag 'sched_urgent_for_v6.13_rc3-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/dlserver: Fix dlserver time accounting
  sched/dlserver: Fix dlserver double enqueue
  sched/eevdf: More PELT vs DELAYED_DEQUEUE
  sched/fair: Fix sched_can_stop_tick() for fair tasks
  sched/fair: Fix NEXT_BUDDY

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM64:

   - Fix confusion with implicitly-shifted MDCR_EL2 masks breaking
     SPE/TRBE initialization

   - Align nested page table walker with the intended memory attribute
     combining rules of the architecture

   - Prevent userspace from constraining the advertised ASID width,
     avoiding horrors of guest TLBIs not matching the intended context
     in hardware

   - Don't leak references on LPIs when insertion into the translation
     cache fails

  RISC-V:

   - Replace csr_write() with csr_set() for HVIEN PMU overflow bit

  x86:

   - Cache CPUID.0xD XSTATE offsets+sizes during module init

     On Intel's Emerald Rapids CPUID costs hundreds of cycles and there
     are a lot of leaves under 0xD. Getting rid of the CPUIDs during
     nested VM-Enter and VM-Exit is planned for the next release, for
     now just cache them: even on Skylake that is 40% faster"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: x86: Cache CPUID.0xD XSTATE offsets+sizes during module init
  RISC-V: KVM: Fix csr_write -> csr_set for HVIEN PMU overflow bit
  KVM: arm64: vgic-its: Add error handling in vgic_its_cache_translation
  KVM: arm64: Do not allow ID_AA64MMFR0_EL1.ASIDbits to be overridden
  KVM: arm64: Fix S1/S2 combination when FWB==1 and S2 has Device memory type
  arm64: Fix usage of new shifted MDCR_EL2 values

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fix from James Bottomley:
"Single one-line fix in the ufs driver"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Update compl_time_stamp_local_clock after completing a cqe

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Daniel Borkmann:

- Fix a bug in the BPF verifier to track changes to packet data
   property for global functions (Eduard Zingerman)

- Fix a theoretical BPF prog_array use-after-free in RCU handling of
   __uprobe_perf_func (Jann Horn)

- Fix BPF tracing to have an explicit list of tracepoints and their
   arguments which need to be annotated as PTR_MAYBE_NULL (Kumar
   Kartikeya Dwivedi)

- Fix a logic bug in the bpf_remove_insns code where a potential error
   would have been wrongly propagated (Anton Protopopov)

- Avoid deadlock scenarios caused by nested kprobe and fentry BPF
   programs (Priya Bala Govindasamy)

- Fix a bug in BPF verifier which was missing a size check for
   BTF-based context access (Kumar Kartikeya Dwivedi)

- Fix a crash found by syzbot through an invalid BPF prog_array access
   in perf_event_detach_bpf_prog (Jiri Olsa)

- Fix several BPF sockmap bugs including a race causing a refcount
   imbalance upon element replace (Michal Luczaj)

- Fix a use-after-free from mismatching BPF program/attachment RCU
   flavors (Jann Horn)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: (23 commits)
  bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs
  selftests/bpf: Add tests for raw_tp NULL args
  bpf: Augment raw_tp arguments with PTR_MAYBE_NULL
  bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"
  selftests/bpf: Add test for narrow ctx load for pointer args
  bpf: Check size for BTF-based ctx access of pointer members
  selftests/bpf: extend changes_pkt_data with cases w/o subprograms
  bpf: fix null dereference when computing changes_pkt_data of prog w/o subprogs
  bpf: Fix theoretical prog_array UAF in __uprobe_perf_func()
  bpf: fix potential error return
  selftests/bpf: validate that tail call invalidates packet pointers
  bpf: consider that tail calls invalidate packet pointers
  selftests/bpf: freplace tests for tracking of changes_packet_data
  bpf: check changes_pkt_data property for extension programs
  selftests/bpf: test for changing packet data from global functions
  bpf: track changes_pkt_data property for global functions
  bpf: refactor bpf_helper_changes_pkt_data to use helper number
  bpf: add find_containing_subprog() utility function
  bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog
  bpf: Fix UAF via mismatching bpf_prog/attachment RCU flavors
  ...

bpf: Avoid deadlock caused by nested kprobe and fentry bpf programs

BPF program types like kprobe and fentry can cause deadlocks in certain
situations. If a function takes a lock and one of these bpf programs is
hooked to some point in the function's critical section, and if the
bpf program tries to call the same function and take the same lock it will
lead to deadlock. These situations have been reported in the following
bug reports.

In percpu_freelist -
Link: https://lore.kernel.org/bpf/CAADnVQLAHwsa+2C6j9+UC6ScrDaN9Fjqv1WjB1pP9AzJLhKuLQ@mail.gmail.com/T/
Link: https://lore.kernel.org/bpf/CAPPBnEYm+9zduStsZaDnq93q1jPLqO-PiKX9jy0MuL8LCXmCrQ@mail.gmail.com/T/
In bpf_lru_list -
Link: https://lore.kernel.org/bpf/CAPPBnEajj+DMfiR_WRWU5=6A7KKULdB5Rob_NJopFLWF+i9gCA@mail.gmail.com/T/
Link: https://lore.kernel.org/bpf/CAPPBnEZQDVN6VqnQXvVqGoB+ukOtHGZ9b9U0OLJJYvRoSsMY_g@mail.gmail.com/T/
Link: https://lore.kernel.org/bpf/CAPPBnEaCB1rFAYU7Wf8UxqcqOWKmRPU1Nuzk3_oLk6qXR7LBOA@mail.gmail.com/T/
Similar bugs have been reported by syzbot.
In queue_stack_maps -
Link: https://lore.kernel.org/lkml/0000000000004c3fc90615f37756@google.com/
Link: https://lore.kernel.org/all/20240418230932.2689-1-hdanton@sina.com/T/
In lpm_trie -
Link: https://lore.kernel.org/linux-kernel/00000000000035168a061a47fa38@google.com/T/
In ringbuf -
Link: https://lore.kernel.org/bpf/20240313121345.2292-1-hdanton@sina.com/T/
Prevent kprobe and fentry bpf programs from attaching to these critical
sections by removing CC_FLAGS_FTRACE for percpu_freelist.o,
bpf_lru_list.o, queue_stack_maps.o, lpm_trie.o, ringbuf.o files.

The bugs reported by syzbot are due to tracepoint bpf programs being
called in the critical sections. This patch does not aim to fix deadlocks
caused by tracepoint programs. However, it does prevent deadlocks from
occurring in similar situations due to kprobe and fentry programs.

Signed-off-by: Priya Bala Govindasamy <pgovind2@uci.edu>
Link: https://lore.kernel.org/r/CAPPBnEZpjGnsuA26Mf9kYibSaGLm=oF6=12L21X1GEQdqjLnzQ@mail.gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge tag 'usb-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes for some reported issues.
  Included in here are:

   - typec driver bugfixes

   - u_serial gadget driver bugfix for much reported and discussed issue

   - dwc2 bugfixes

   - midi gadget driver bugfix

   - ehci-hcd driver bugfix

   - other small bugfixes

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'usb-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  usb: typec: ucsi: Fix connector status writing past buffer size
  usb: typec: ucsi: Fix completion notifications
  usb: dwc2: Fix HCD port connection race
  usb: dwc2: hcd: Fix GetPortStatus & SetPortFeature
  usb: dwc2: Fix HCD resume
  usb: gadget: u_serial: Fix the issue that gs_start_io crashed due to accessing null pointer
  usb: misc: onboard_usb_dev: skip suspend/resume sequence for USB5744 SMBus support
  usb: dwc3: xilinx: make sure pipe clock is deselected in usb2 only mode
  usb: core: hcd: only check primary hcd skip_phy_initialization
  usb: gadget: midi2: Fix interpretation of is_midi1 bits
  usb: dwc3: imx8mp: fix software node kernel dump
  usb: typec: anx7411: fix OF node reference leaks in anx7411_typec_switch_probe()
  usb: typec: anx7411: fix fwnode_handle reference leak
  usb: host: max3421-hcd: Correctly abort a USB request.
  dt-bindings: phy: imx8mq-usb: correct reference to usb-switch.yaml
  usb: ehci-hcd: fix call balance of clocks handling routines

Merge tag 'tty-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty

Pull serial driver fixes from Greg KH:
"Here are two small serial driver fixes for 6.13-rc3.  They are:

   - ioport build fallout fix for the 8250 port driver that should
     resolve Guenter's runtime problems

   - sh-sci driver bugfix for a reported problem

  Both of these have been in linux-next for a while with no reported
  issues"

* tag 'tty-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  tty: serial: Work around warning backtrace in serial8250_set_defaults
  serial: sh-sci: Check if TX data was written to device in .tx_empty()

Merge tag 'staging-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are some small staging gpib driver build and bugfixes for issues
  that have been much-reported (should finally fix Guenter's build
  issues). There are more of these coming in later -rc releases, but for
  now this should fix the majority of the reported problems.

  All of these have been in linux-next for a while with no reported
  issues"

* tag 'staging-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: gpib: Fix i386 build issue
  staging: gpib: Fix faulty workaround for assignment in if
  staging: gpib: Workaround for ppc build failure
  staging: gpib: Make GPIB_NI_PCI_ISA depend on HAS_IOPORT

Merge tag 'v6.13-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto fixes from Herbert Xu:
"Fix a regression in rsassa-pkcs1 as well as a buffer overrun in
  hisilicon/debugfs"

* tag 'v6.13-p2' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
  crypto: hisilicon/debugfs - fix the struct pointer incorrectly offset problem
  crypto: rsassa-pkcs1 - Copy source data for SG list

Merge tag 'rust-fixes-6.13' of https://github.com/Rust-for-Linux/linux

Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:

   - Set bindgen's Rust target version to prevent issues when
     pairing older rustc releases with newer bindgen releases,
     such as bindgen >= 0.71.0 and rustc < 1.82 due to
     unsafe_extern_blocks.

  drm/panic:

   - Remove spurious empty line detected by a new Clippy warning"

* tag 'rust-fixes-6.13' of https://github.com/Rust-for-Linux/linux:
  rust: kbuild: set `bindgen`'s Rust target version
  drm/panic: remove spurious empty line to clean warning

Merge tag 'iommu-fixes-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:

- Per-domain device-list locking fixes for the AMD IOMMU driver

- Fix incorrect use of smp_processor_id() in the NVidia-specific part
   of the ARM-SMMU-v3 driver

- Intel IOMMU driver fixes:
     - Remove cache tags before disabling ATS
     - Avoid draining PRQ in sva mm release path
     - Fix qi_batch NULL pointer with nested parent domain

* tag 'iommu-fixes-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/vt-d: Avoid draining PRQ in sva mm release path
  iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
  iommu/vt-d: Remove cache tags before disabling ATS
  iommu/amd: Add lockdep asserts for domain->dev_list
  iommu/amd: Put list_add/del(dev_data) back under the domain->lock
  iommu/tegra241-cmdqv: do not use smp_processor_id in preemptible context

Merge tag 'ata-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fix from Damien Le Moal:

- Fix an OF node reference leak in the sata_highbank driver

* tag 'ata-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
ata: sata_highbank: fix OF node reference leak in highbank_initialize_phys()

Merge tag 'i2c-host-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/andi.shyti/linux into i2c/for-current

i2c-host-fixes for v6.13-rc3

- Replaced jiffies with msec for timeout calculations.
- Added a sentinel to the 'of_device_id' array in Nomadik.
- Rounded up bus period calculation in RIIC.

Merge tag '6.13-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- fix rmmod leak

- two minor cleanups

- fix for unlink/rename with pending i/o

* tag '6.13-rc2-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: destroy cfid_put_wq on module exit
  cifs: Use str_yes_no() helper in cifs_ses_add_channel()
  cifs: Fix rmdir failure due to ongoing I/O on deleted file
  smb3: fix compiler warning in reparse code

Merge tag 'spi-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"A few fairly small fixes for v6.13, the most substatial one being
  disabling STIG mode for Cadence QSPI controllers on Altera SoCFPGA
  platforms since it doesn't work"

* tag 'spi-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spi-cadence-qspi: Disable STIG mode for Altera SoCFPGA.
  spi: rockchip: Fix PM runtime count on no-op cs
  spi: aspeed: Fix an error handling path in aspeed_spi_[read|write]_user()

Merge tag 'regulator-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A couple of additional changes, one ensuring we give AXP717 enough
  time to stabilise after changing voltages which fixes serious
  stability issues on some platforms and another documenting the DT
  support required for the Qualcomm WCN6750"

* tag 'regulator-fix-v6.13-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: axp20x: AXP717: set ramp_delay
  regulator: dt-bindings: qcom,qca6390-pmu: document wcn6750-pmu

Merge tag 'drm-fixes-2024-12-14' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"This is the weekly fixes pull for drm. Just has i915, xe and amdgpu
  changes in it. Nothing too major in here:

  i915:
   - Don't use indexed register writes needlessly [dsb]
   - Stop using non-posted DSB writes for legacy LUT [color]
   - Fix NULL pointer dereference in capture_engine
   - Fix memory leak by correcting cache object name in error handler

  xe:
   - Fix a KUNIT test error message (Mirsad Todorovac)
   - Fix an invalidation fence PM ref leak (Daniele)
   - Fix a register pool UAF (Lucas)

  amdgpu:
   - ISP hw init fix
   - SR-IOV fixes
   - Fix contiguous VRAM mapping for UVD on older GPUs
   - Fix some regressions due to drm scheduler changes
   - Workload profile fixes
   - Cleaner shader fix

  amdkfd:
   - Fix DMA map direction for migration
   - Fix a potential null pointer dereference
   - Cacheline size fixes
   - Runtime PM fix"

* tag 'drm-fixes-2024-12-14' of https://gitlab.freedesktop.org/drm/kernel:
  drm/xe/reg_sr: Remove register pool
  drm/xe: Call invalidation_fence_fini for PT inval fences in error state
  drm/xe: fix the ERR_PTR() returned on failure to allocate tiny pt
  drm/amdkfd: pause autosuspend when creating pdd
  drm/amdgpu: fix when the cleaner shader is emitted
  drm/amdgpu: Fix ISP HW init issue
  drm/amdkfd: hard-code MALL cacheline size for gfx11, gfx12
  drm/amdkfd: hard-code cacheline size for gfx11
  drm/amdkfd: Dereference null return value
  drm/i915: Fix memory leak by correcting cache object name in error handler
  drm/i915: Fix NULL pointer dereference in capture_engine
  drm/i915/color: Stop using non-posted DSB writes for legacy LUT
  drm/i915/dsb: Don't use indexed register writes needlessly
  drm/amdkfd: Correct the migration DMA map direction
  drm/amd/pm: Set SMU v13.0.7 default workload type
  drm/amd/pm: Initialize power profile mode
  amdgpu/uvd: get ring reference from rq scheduler
  drm/amdgpu: fix UVD contiguous CS mapping problem
  drm/amdgpu: use sjt mec fw on gfx943 for sriov
  Revert "drm/amdgpu: Fix ISP hw init issue"

Merge tag 'pm-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management documentation fix from Rafael Wysocki:
"Fix a runtime PM documentation mistake that may mislead someone into
making a coding mistake (Paul Barker)"

* tag 'pm-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
Documentation: PM: Clarify pm_runtime_resume_and_get() return value

Merge tag 'acpi-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fixes from Rafael Wysocki:
"These fix two coding mistakes, one in the ACPI resources handling code
  and one in ACPICA:

   - Relocate the addr->info.mem.caching check in acpi_decode_space() to
     only execute it if the resource is of the correct type (Ilpo
     Järvinen)

   - Don't release a context_mutex that was never acquired in
     acpi_remove_address_space_handler() (Daniil Tatianin)"

* tag 'acpi-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPICA: events/evxfregn: don't release the ContextMutex that was never acquired
  ACPI: resource: Fix memory resource type union access

Merge branch 'explicit-raw_tp-null-arguments'

Kumar Kartikeya Dwivedi says:

====================
Explicit raw_tp NULL arguments

This set reverts the raw_tp masking changes introduced in commit
cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL") and
replaces it wwith an explicit list of tracepoints and their arguments
which need to be annotated as PTR_MAYBE_NULL. More context on the
fallout caused by the masking fix and subsequent discussions can be
found in [0].

To remedy this, we implement a solution of explicitly defined tracepoint
and define which args need to be marked NULL or scalar (for IS_ERR
case). The commit logs describes the details of this approach in detail.

We will follow up this solution an approach Eduard is working on to
perform automated analysis of NULL-ness of tracepoint arguments. The
current PoC is available here:

- LLVM branch with the analysis:
  https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params
- Python script for merging of analysis results:
  https://gist.github.com/eddyz87/e47c164466a60e8d49e6911cff146f47

The idea is to infer a tri-state verdict for each tracepoint parameter:
definitely not null, can be null, unknown (in which case no assumptions
should be made).

Using this information, the verifier in most cases will be able to
precisely determine the state of the tracepoint parameter without any
human effort. At that point, the table maintained manually in this set
can be dropped and replace with this automated analysis tool's result.
This will be kept up to date with each kernel release.

  [0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com

Changelog:
----------
v2 -> v3:
v2: https://lore.kernel.org/bpf/20241213175127.2084759-1-memxor@gmail.com

* Address Eduard's nits, add Reviewed-by

v1 -> v2:
v1: https://lore.kernel.org/bpf/20241211020156.18966-1-memxor@gmail.com

* Address comments from Jiri
   * Mark module tracepoints args NULL by default
   * Add more sunrpc tracepoints
   * Unify scalar or null handling
* Address comments from Alexei
   * Use bitmask approach suggested in review
   * Unify scalar or null handling
   * Drop most tests that rely on CONFIG options
   * Drop scripts to generate tests
====================

Link: https://patch.msgid.link/20241213221929.3495062-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tests for raw_tp NULL args

Add tests to ensure that arguments are correctly marked based on their
specified positions, and whether they get marked correctly as maybe
null. For modules, all tracepoint parameters should be marked
PTR_MAYBE_NULL by default.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Augment raw_tp arguments with PTR_MAYBE_NULL

Arguments to a raw tracepoint are tagged as trusted, which carries the
semantics that the pointer will be non-NULL.  However, in certain cases,
a raw tracepoint argument may end up being NULL. More context about this
issue is available in [0].

Thus, there is a discrepancy between the reality, that raw_tp arguments can
actually be NULL, and the verifier's knowledge, that they are never NULL,
causing explicit NULL check branch to be dead code eliminated.

A previous attempt [1], i.e. the second fixed commit, was made to
simulate symbolic execution as if in most accesses, the argument is a
non-NULL raw_tp, except for conditional jumps.  This tried to suppress
branch prediction while preserving compatibility, but surfaced issues
with production programs that were difficult to solve without increasing
verifier complexity. A more complete discussion of issues and fixes is
available at [2].

Fix this by maintaining an explicit list of tracepoints where the
arguments are known to be NULL, and mark the positional arguments as
PTR_MAYBE_NULL. Additionally, capture the tracepoints where arguments
are known to be ERR_PTR, and mark these arguments as scalar values to
prevent potential dereference.

Each hex digit is used to encode NULL-ness (0x1) or ERR_PTR-ness (0x2),
shifted by the zero-indexed argument number x 4. This can be represented
as follows:
1st arg: 0x1
2nd arg: 0x10
3rd arg: 0x100
... and so on (likewise for ERR_PTR case).

In the future, an automated pass will be used to produce such a list, or
insert __nullable annotations automatically for tracepoints. Each
compilation unit will be analyzed and results will be collated to find
whether a tracepoint pointer is definitely not null, maybe null, or an
unknown state where verifier conservatively marks it PTR_MAYBE_NULL.
A proof of concept of this tool from Eduard is available at [3].

Note that in case we don't find a specification in the raw_tp_null_args
array and the tracepoint belongs to a kernel module, we will
conservatively mark the arguments as PTR_MAYBE_NULL. This is because
unlike for in-tree modules, out-of-tree module tracepoints may pass NULL
freely to the tracepoint. We don't protect against such tracepoints
passing ERR_PTR (which is uncommon anyway), lest we mark all such
arguments as SCALAR_VALUE.

While we are it, let's adjust the test raw_tp_null to not perform
dereference of the skb->mark, as that won't be allowed anymore, and make
it more robust by using inline assembly to test the dead code
elimination behavior, which should still stay the same.

  [0]: https://lore.kernel.org/bpf/ZrCZS6nisraEqehw@jlelli-thinkpadt14gen4.remote.csb
  [1]: https://lore.kernel.org/all/20241104171959.2938862-1-memxor@gmail.com
  [2]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com
  [3]: https://github.com/eddyz87/llvm-project/tree/nullness-for-tracepoint-params

Reported-by: Juri Lelli <juri.lelli@redhat.com> # original bug
Reported-by: Manu Bretelle <chantra@meta.com> # bugs in masking fix
Fixes: 3f00c5239344 ("bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs")
Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Revert "bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"

This patch reverts commit
cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL"). The
patch was well-intended and meant to be as a stop-gap fixing branch
prediction when the pointer may actually be NULL at runtime. Eventually,
it was supposed to be replaced by an automated script or compiler pass
detecting possibly NULL arguments and marking them accordingly.

However, it caused two main issues observed for production programs and
failed to preserve backwards compatibility. First, programs relied on
the verifier not exploring == NULL branch when pointer is not NULL, thus
they started failing with a 'dereference of scalar' error. Next,
allowing raw_tp arguments to be modified surfaced the warning in the
verifier that warns against reg->off when PTR_MAYBE_NULL is set.

More information, context, and discusson on both problems is available
in [0]. Overall, this approach had several shortcomings, and the fixes
would further complicate the verifier's logic, and the entire masking
scheme would have to be removed eventually anyway.

Hence, revert the patch in preparation of a better fix avoiding these
issues to replace this commit.

[0]: https://lore.kernel.org/bpf/20241206161053.809580-1-memxor@gmail.com

Reported-by: Manu Bretelle <chantra@meta.com>
Fixes: cb4158ce8ec8 ("bpf: Mark raw_tp arguments with PTR_MAYBE_NULL")
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20241213221929.3495062-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge tag 'block-6.13-20241213' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

- Series from Damien fixing issues with the zoned write plugging

- Fix for a potential UAF in block cgroups

- Fix deadlock around queue freezing and the sysfs lock

- Various little cleanups and fixes

* tag 'block-6.13-20241213' of git://git.kernel.dk/linux:
  block: Fix potential deadlock while freezing queue and acquiring sysfs_lock
  block: Fix queue_iostats_passthrough_show()
  blk-mq: Clean up blk_mq_requeue_work()
  mq-deadline: Remove a local variable
  blk-iocost: Avoid using clamp() on inuse in __propagate_weights()
  block: Make bio_iov_bvec_set() accept pointer to const iov_iter
  block: get wp_offset by bdev_offset_from_zone_start
  blk-cgroup: Fix UAF in blkcg_unpin_online()
  MAINTAINERS: update Coly Li's email address
  block: Prevent potential deadlocks in zone write plug error recovery
  dm: Fix dm-zoned-reclaim zone write pointer alignment
  block: Ignore REQ_NOWAIT for zone reset and zone finish operations
  block: Use a zone write plug BIO work for REQ_NOWAIT BIOs

Merge tag 'io_uring-6.13-20241213' of git://git.kernel.dk/linux

Pull io_uring fix from Jens Axboe:
"A single fix for a regression introduced in the 6.13 merge window"

* tag 'io_uring-6.13-20241213' of git://git.kernel.dk/linux:
io_uring/rsrc: don't put/free empty buffers

Merge tag 'libnvdimm-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm

Pull libnvdimm fix from Ira Weiny:

- sysbot fix for out of bounds access

* tag 'libnvdimm-fixes-6.13-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
acpi: nfit: vmalloc-out-of-bounds Read in acpi_nfit_ctl

bpf: Fix configuration-dependent BTF function references

These BTF functions are not available unconditionally,
only reference them when they are available.

Avoid the following build warnings:

  BTF     .tmp_vmlinux1.btf.o
btf_encoder__tag_kfunc: failed to find kfunc 'bpf_send_signal_task' in BTF
btf_encoder__tag_kfuncs: failed to tag kfunc 'bpf_send_signal_task'
  NM      .tmp_vmlinux1.syms
  KSYMS   .tmp_vmlinux1.kallsyms.S
  AS      .tmp_vmlinux1.kallsyms.o
  LD      .tmp_vmlinux2
  NM      .tmp_vmlinux2.syms
  KSYMS   .tmp_vmlinux2.kallsyms.S
  AS      .tmp_vmlinux2.kallsyms.o
  LD      vmlinux
  BTFIDS  vmlinux
WARN: resolve_btfids: unresolved symbol prog_test_ref_kfunc
WARN: resolve_btfids: unresolved symbol bpf_crypto_ctx
WARN: resolve_btfids: unresolved symbol bpf_send_signal_task
WARN: resolve_btfids: unresolved symbol bpf_modify_return_test_tp
WARN: resolve_btfids: unresolved symbol bpf_dynptr_from_xdp
WARN: resolve_btfids: unresolved symbol bpf_dynptr_from_skb

Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241213-bpf-cond-ids-v1-1-881849997219@weissschuh.net

Merge branch 'add-fd_array_cnt-attribute-for-bpf_prog_load'

Anton Protopopov says:

====================
Add fd_array_cnt attribute for BPF_PROG_LOAD

Add a new attribute to the bpf(BPF_PROG_LOAD) system call. If this
new attribute is non-zero, then the fd_array is considered to be a
continuous array of the fd_array_cnt length and to contain only
proper map file descriptors or btf file descriptors.

This change allows maps (and btfs), which aren't referenced directly
by a BPF program, to be bound to the program _and_ also to be present
during the program verification (so BPF_PROG_BIND_MAP is not enough
for this use case).

The primary reason for this change is that it is a prerequisite for
adding "instruction set" maps, which are both non-referenced by the
program and must be present during the program verification.

The first five commits add the new functionality, the sixth adds
corresponding self-tests, and the last one is a small additional fix.

v1 -> v2:
  * rewrite the add_fd_from_fd_array() function (Eduard)
  * a few cleanups in selftests (Eduard)

v2 -> v3:
  * various renamings (Alexei)
  * "0 is not special" (Alexei, Andrii)
  * do not alloc memory on fd_array init (Alexei)
  * fix leaking maps for error path (Hou Tao)
  * use libbpf helpers vs. raw syscalls (Andrii)
  * add comments on __btf_get_by_fd/__bpf_map_get (Alexei)
  * remove extra code (Alexei)

v3 -> v4:
  * simplify error path when parsing fd_array
  * libbpf: pass fd_array_cnt only in prog_load (Alexei)
  * selftests patch contained extra code (Alexei)
  * renames, fix comments (Alexei)

v4 -> v5:
  * Add btfs to env->used_btfs (Andrii)
  * Fix an integer overflow (Andrii)
  * A set of cleanups for selftests (Andrii)
====================

Link: https://patch.msgid.link/20241213130934.1087929-1-aspsk@isovalent.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

selftest/bpf: Replace magic constants by macros

Replace magic constants in a BTF structure initialization code by
proper macros, as is done in other similar selftests.

Suggested-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20241213130934.1087929-8-aspsk@isovalent.com