Currently, for rqspinlock usage, the implementation of
smp_cond_load_acquire (and thus atomic_cond_read_acquire) is susceptible
to stalls on arm64, because it does not guarantee that the conditional
expression will be repeatedly invoked if the address being loaded from
is not written to by other CPUs. When support for event streams (which
unblock stuck WFE-based loops every ~100us) is absent, we may end up
being stuck forever.
This causes a problem for us, as we need to repeatedly invoke the
RES_CHECK_TIMEOUT in the spin loop to break out when the timeout
expires.
Let us import the smp_cond_load_acquire_timewait implementation Ankur is
proposing in [0], and then fall back to it once it is merged.
While we rely on the implementation to amortize the cost of sampling
check_timeout for us, this amortization will not happen when event
stream support is unavailable. This is not the common case, and it would
be difficult to fit our logic into the time_expr_ns >= time_limit_ns
comparison, hence just let it be.
Introduce policy macro RES_CHECK_TIMEOUT which can be used to detect
when the timeout has expired for the slow path to return an error. It
depends on being passed two variables initialized to 0: ts, ret. The
'ts' parameter is of type rqspinlock_timeout.
This macro resolves to the (ret) expression so that it can be used in
statements like smp_cond_load_acquire to break the waiting loop
condition.
The 'spin' member is used to amortize the cost of checking time by
dispatching to the implementation every 64k iterations. The
'timeout_end' member is used to keep track of the timestamp that denotes
the end of the waiting period. The 'ret' parameter denotes the status of
the timeout, and can be checked in the slow path to detect timeouts
after waiting loops.
The 'duration' member is used to store the timeout duration for each
waiting loop. The default timeout value defined in the header
(RES_DEF_TIMEOUT) is 0.25 seconds.
This macro will be used as a condition for waiting loops in the slow
path. Since each waiting loop applies a fresh timeout using the same
rqspinlock_timeout, we add a new RES_RESET_TIMEOUT as well to ensure the
values can be easily reinitialized to the default state.
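A minimal sketch of the shape these macros could take, assuming a
check_timeout() helper that lazily computes ts.timeout_end from
ts.duration and returns -ETIMEDOUT once the deadline passes (names and
details are illustrative, not the exact implementation):

  struct rqspinlock_timeout {
          u64 timeout_end;
          u64 duration;
          u16 spin;
  };

  #define RES_DEF_TIMEOUT (NSEC_PER_SEC / 4)      /* 0.25 seconds */

  /* ts and ret must be zero-initialized by the caller; the u16 'spin'
   * counter wraps every 64k iterations, which is when we actually
   * sample the clock via check_timeout().
   */
  #define RES_CHECK_TIMEOUT(ts, ret)                            \
          ({                                                    \
                  if (!((ts).spin++))                           \
                          (ret) = check_timeout(&(ts));         \
                  (ret);                                        \
          })

  #define RES_RESET_TIMEOUT(ts)                                 \
          ({                                                    \
                  (ts).timeout_end = 0;                         \
                  (ts).duration = RES_DEF_TIMEOUT;              \
          })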
Changes to rqspinlock in subsequent commits will be algorithmic
modifications that will no longer match the implementations of paravirt
spin lock and virt_spin_lock support. These future changes include
measures for terminating waiting loops in the slow path after a certain
point. While using a fair lock like qspinlock directly inside virtual
machines leads to suboptimal performance under certain conditions, we
cannot use the existing virtualization support before we make it
resilient as well. Therefore, drop it for now.
Note that we need to drop qspinlock_stat.h, as it's only relevant in
case of CONFIG_PARAVIRT_SPINLOCKS=y, but we need to keep lock_events.h
in the includes, which was indirectly pulled in before.
This header contains the public declarations usable in the rest of the
kernel for rqspinlock.
Let's also add a type alias, rqspinlock_t, for qspinlock to ensure
consistent use of the new lock type. We want to remove the dependence on
the qspinlock type in later patches as we need to provide a test-and-set
fallback, hence begin abstracting it away from now on.
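A minimal sketch of what the alias amounts to (illustrative, not
necessarily the exact declaration):

  typedef struct qspinlock rqspinlock_t;

Later patches can then swap the underlying definition (e.g. for the
test-and-set fallback) without touching users of rqspinlock_t.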
locking: Copy out qspinlock.c to kernel/bpf/rqspinlock.c
In preparation for introducing a new lock implementation, Resilient
Queued Spin Lock, or rqspinlock, we first begin our modifications by
using the existing qspinlock.c code as the base. Simply copy the code to
a new file and rename functions and variables from 'queued' to
'resilient_queued'.
Since we place the file in kernel/bpf, the includes need to be relative.
This helps each subsequent commit clearly show how and where the code is
being changed. The only change after a literal copy in this commit is
renaming the functions where necessary, and renaming qnodes to rqnodes.
Let's also use EXPORT_SYMBOL_GPL for the rqspinlock slowpath.
locking: Allow obtaining result of arch_mcs_spin_lock_contended
To support upcoming changes that require inspecting the return value
once the conditional waiting loop in arch_mcs_spin_lock_contended
terminates, modify the macro to preserve the result of
smp_cond_load_acquire. This enables checking the return value as needed,
which will help disambiguate the MCS node’s locked state in future
patches.
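Roughly, this amounts to turning the statement-style macro into an
expression so the acquired value propagates to the caller (a sketch based
on the existing smp_cond_load_acquire-based definition):

  /* Before: the result of the load is discarded */
  #define arch_mcs_spin_lock_contended(l)                                \
  do {                                                                   \
          smp_cond_load_acquire(l, VAL);                                 \
  } while (0)

  /* After: the macro evaluates to the value returned by the load */
  #define arch_mcs_spin_lock_contended(l)                                \
          smp_cond_load_acquire(l, VAL)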
locking: Move common qspinlock helpers to a private header
Move qspinlock helper functions that encode, decode tail word, set and
clear the pending and locked bits, and other miscellaneous definitions
and macros to a private header. To this end, create a qspinlock.h header
file in kernel/locking. Subsequent commits will introduce a modified
qspinlock slow path function, thus moving shared code to a private
header will help minimize unnecessary code duplication.
locking: Move MCS struct definition to public header
Move the definition of the struct mcs_spinlock from the private
mcs_spinlock.h header in kernel/locking to the mcs_spinlock.h
asm-generic header, since we will need to reference it from the
qspinlock.h header in subsequent commits.
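For reference, the structure being moved looks roughly like this (as
found in kernel/locking/mcs_spinlock.h, shown here only for context):

  struct mcs_spinlock {
          struct mcs_spinlock *next;
          int locked;     /* 1 if lock acquired */
          int count;      /* nesting count, see qspinlock.c */
  };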
Emil Tsalapatis [Tue, 18 Mar 2025 03:07:53 +0000 (23:07 -0400)]
bpf: Make perf_event_read_output accessible in all program types.
The perf_event_read_event_output helper is currently only available to
tracing programs, but is useful for other BPF programs like sched_ext
schedulers. When the helper is available, provide its bpf_func_proto
directly from the bpf base_proto.
====================
bpftool: Using the right format specifiers
This patch adds the -Wformat-signedness compiler flag to detect and
prevent format string errors, where signed or unsigned types are
mismatched with format specifiers. Additionally, it fixes some format
string errors that were not fully addressed by the previous patch [1].
Jiayuan Chen [Tue, 11 Mar 2025 11:28:09 +0000 (19:28 +0800)]
bpftool: Using the right format specifiers
Fixed some format specifier errors, such as using %d for int and %u for
unsigned int, as well as for other byte-length types.
Type casts are performed using the type derived from the data itself;
for example, if a value is originally an int, it is cast to unsigned int
when an unsigned format is required.
Jiayuan Chen [Tue, 11 Mar 2025 11:28:08 +0000 (19:28 +0800)]
bpftool: Add -Wformat-signedness flag to detect format errors
This commit adds the -Wformat-signedness compiler flag to detect and
prevent printf format errors, where signed or unsigned types are
mismatched with format specifiers. This helps to catch potential issues at
compile-time, ensuring that our code is more robust and reliable. With
this flag, the compiler will now warn about incorrect format strings, such
as using %d with unsigned types or %u with signed types.
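As an illustration of the class of bug the flag catches (standalone
example, not code taken from bpftool):

  #include <stdio.h>

  int main(void)
  {
          unsigned int nr_maps = 42;
          int err = -22;

          printf("%d maps\n", nr_maps);   /* warns: %d with unsigned int */
          printf("%u maps\n", nr_maps);   /* OK */
          printf("%u\n", err);            /* warns: %u with signed int */
          printf("%d\n", err);            /* OK */
          return 0;
  }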
====================
Support freplace prog from user namespace
From: Mykyta Yatsenko <yatsenko@meta.com>
Freplace programs can't be loaded from a user namespace, as
bpf_program__set_attach_target() requires searching for the target
prog's BTF, which is gated behind CAP_SYS_ADMIN.
This patch set enables this use case by:
1. Relaxing the capability check in bpf's BPF_BTF_GET_FD_BY_ID: check
for CAP_BPF instead of CAP_SYS_ADMIN, and support a BPF token in the
attr argument.
2. Passing the BPF token around libbpf from
bpf_program__set_attach_target() to the bpf syscall, where the
capability check is done.
3. Validating positive/negative scenarios in selftests.
This patch set is enabled by the recent libbpf change[1], that
introduced bpf_object__prepare() API. Calling bpf_object__prepare() for
freplace program before bpf_program__set_attach_target() initializes BPF
token, which is then passed to bpf syscall by libbpf.
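A rough usage sketch of the resulting flow (error handling trimmed; the
object file, program names, and token path are made up for
illustration):

  #include <bpf/libbpf.h>

  static int load_freplace_from_userns(int target_prog_fd)
  {
          LIBBPF_OPTS(bpf_object_open_opts, opts,
                      .bpf_token_path = "/sys/fs/bpf/token");
          struct bpf_object *obj;
          struct bpf_program *prog;

          obj = bpf_object__open_file("freplace_prog.bpf.o", &opts);
          if (!obj)
                  return -1;
          /* prepare creates the BPF token (and maps) without loading programs */
          if (bpf_object__prepare(obj))
                  return -1;
          prog = bpf_object__find_program_by_name(obj, "freplace_func");
          /* The target BTF lookup can now go through BPF_BTF_GET_FD_BY_ID
           * with the token, so CAP_SYS_ADMIN is no longer required.
           */
          if (bpf_program__set_attach_target(prog, target_prog_fd, "target_func"))
                  return -1;
          return bpf_object__load(obj);
  }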
Mykyta Yatsenko [Mon, 17 Mar 2025 17:40:39 +0000 (17:40 +0000)]
selftests/bpf: Test freplace from user namespace
Add selftests to verify that it is possible to load an freplace program
from a user namespace if the BPF token is initialized by
bpf_object__prepare before calling bpf_program__set_attach_target.
A negative test is added as well.
The type of the priv_prog is changed to xdp, as kprobe did not work on
aarch64 and s390x.
Mykyta Yatsenko [Mon, 17 Mar 2025 17:40:38 +0000 (17:40 +0000)]
libbpf: Pass BPF token from find_prog_btf_id to BPF_BTF_GET_FD_BY_ID
Pass BPF token from bpf_program__set_attach_target to
BPF_BTF_GET_FD_BY_ID bpf command.
When an freplace program attaches to a target program, it needs to look
up the BTF of the target; this may require a BPF token if, for example,
it is running from a user namespace.
Mykyta Yatsenko [Mon, 17 Mar 2025 17:40:37 +0000 (17:40 +0000)]
bpf: Return prog btf_id without capable check
Return the prog's btf_id from bpf_prog_get_info_by_fd regardless of the
capable check. This patch enables the scenario where an freplace program
running from a user namespace needs to query the target prog's BTF.
Mykyta Yatsenko [Mon, 17 Mar 2025 17:40:36 +0000 (17:40 +0000)]
bpf: BPF token support for BPF_BTF_GET_FD_BY_ID
Currently BPF_BTF_GET_FD_BY_ID requires CAP_SYS_ADMIN, which does not
allow running it from a user namespace. This creates a problem when an
freplace program running from a user namespace needs to query the target
program's BTF.
This patch relaxes the capability check from CAP_SYS_ADMIN to CAP_BPF
and adds support for a BPF token that can be passed in the attributes to
the syscall.
The kernel test robot reported a "call without frame pointer save/setup"
warning in objtool. This makes stack traces unreliable with
CONFIG_UNWINDER_FRAME_POINTER=y, though they still work with
CONFIG_UNWINDER_ORC=y. Fix this by creating a stack frame for the
function.
Fixes: 2fb761823ead ("bpf, x86: Add x86 JIT support for timed may_goto") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202503071350.QOhsHVaW-lkp@intel.com/ Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250315013039.1625048-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Hou Tao [Sat, 15 Mar 2025 15:09:30 +0000 (23:09 +0800)]
bpf: Check map->record at the beginning of check_and_free_fields()
When there are no special fields in the map value, there is no need to
invoke bpf_obj_free_fields(). Therefore, check the validity of
map->record in advance.
After the change, the benchmark result of the per-cpu update case in
map_perf_test increased by 40% under a 16-CPU VM.
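The shape of the change is roughly the following early return (a sketch;
the exact guard in the hashtab code may differ):

  static void check_and_free_fields(struct bpf_htab *htab,
                                    struct htab_elem *elem)
  {
          /* No special fields (kptrs, timers, ...) in the map value:
           * nothing for bpf_obj_free_fields() to do, so skip the
           * per-element work entirely.
           */
          if (IS_ERR_OR_NULL(htab->map.record))
                  return;

          /* ... existing per-cpu / per-element freeing logic ... */
  }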
selftests/bpf: Fix sockopt selftest failure on powerpc
The SO_RCVLOWAT option is defined as 18 in the selftest header,
which matches the generic definition. However, on powerpc,
SO_RCVLOWAT is defined as 16. This discrepancy causes
sol_socket_sockopt() to fail with the default switch case on powerpc.
This commit fixes the issue by defining SO_RCVLOWAT as 16 for powerpc.
Viktor Malik [Thu, 13 Mar 2025 12:28:52 +0000 (13:28 +0100)]
selftests/bpf: Fix string read in strncmp benchmark
The strncmp benchmark uses the bpf_strncmp helper and a hand-written
loop to compare two strings. The values of the strings are filled from
userspace. One of the strings is non-const (in .bss) while the other is
const (in .rodata) since that is the requirement of bpf_strncmp.
The problem is that in the hand-written loop, Clang optimizes the reads
from the const string to always return 0 which breaks the benchmark.
Use barrier_var to prevent the optimization.
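One way the fix can look in the BPF program (an illustrative sketch, not
the exact selftest diff; 'str', 'target', and 'sz' are placeholder
names): making the .rodata pointer opaque with barrier_var prevents
clang from folding the per-byte loads into a constant 0.

  static __always_inline int cmp_strings(const char *str, const char *target,
                                         unsigned int sz)
  {
          const char *rodata_str = target;   /* 'target' lives in .rodata */
          unsigned int i;

          barrier_var(rodata_str);           /* hide the pointer from clang */
          for (i = 0; i < sz; i++) {
                  if (str[i] != rodata_str[i])
                          return 1;
          }
          return 0;
  }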
The effect can be seen on the strncmp-no-helper variant.
Before this change:
# ./bench strncmp-no-helper
Setting up benchmark 'strncmp-no-helper'...
Benchmark 'strncmp-no-helper' started.
Iter 0 (112.309us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 1 (-23.238us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 2 ( 58.994us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 3 (-30.466us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 4 ( 29.996us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 5 ( 16.949us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Iter 6 (-60.035us): hits 0.000M/s ( 0.000M/prod), drops 0.000M/s, total operations 0.000M/s
Summary: hits 0.000 ± 0.000M/s ( 0.000M/prod), drops 0.000 ± 0.000M/s, total operations 0.000 ± 0.000M/s
After this change:
# ./bench strncmp-no-helper
Setting up benchmark 'strncmp-no-helper'...
Benchmark 'strncmp-no-helper' started.
Iter 0 ( 77.711us): hits 5.534M/s ( 5.534M/prod), drops 0.000M/s, total operations 5.534M/s
Iter 1 ( 11.215us): hits 6.006M/s ( 6.006M/prod), drops 0.000M/s, total operations 6.006M/s
Iter 2 (-14.253us): hits 5.931M/s ( 5.931M/prod), drops 0.000M/s, total operations 5.931M/s
Iter 3 ( 59.087us): hits 6.005M/s ( 6.005M/prod), drops 0.000M/s, total operations 6.005M/s
Iter 4 (-21.379us): hits 6.010M/s ( 6.010M/prod), drops 0.000M/s, total operations 6.010M/s
Iter 5 (-20.310us): hits 5.861M/s ( 5.861M/prod), drops 0.000M/s, total operations 5.861M/s
Iter 6 ( 53.937us): hits 6.004M/s ( 6.004M/prod), drops 0.000M/s, total operations 6.004M/s
Summary: hits 5.969 ± 0.061M/s ( 5.969M/prod), drops 0.000 ± 0.000M/s, total operations 5.969 ± 0.061M/s
Fixes: 9c42652f8be3 ("selftests/bpf: Add benchmark for bpf_strncmp() helper") Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Viktor Malik <vmalik@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Hou Tao <houtao1@huawei.com> Link: https://lore.kernel.org/bpf/20250313122852.1365202-1-vmalik@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
selftests/bpf: Fix arena_spin_lock compilation on PowerPC
Venkat reported a compilation error for BPF selftests on PowerPC [0].
The crux of the error is the following message:
In file included from progs/arena_spin_lock.c:7:
/root/bpf-next/tools/testing/selftests/bpf/bpf_arena_spin_lock.h:122:8:
error: member reference base type '__attribute__((address_space(1)))
u32' (aka '__attribute__((address_space(1))) unsigned int') is not a
structure or union
122 | old = atomic_read(&lock->val);
This is because PowerPC overrides the qspinlock type changing the
lock->val member's type from atomic_t to u32.
To remedy this, import the asm-generic version in the arena spin lock
header, name it __qspinlock (since it's aliased to arena_spinlock_t, the
actual name hardly matters), and adjust the selftest to not depend on
the type in vmlinux.h.
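The imported definition is essentially the asm-generic layout, abridged
here to the little-endian variant:

  struct __qspinlock {
          union {
                  atomic_t val;
                  struct {
                          u8 locked;
                          u8 pending;
                  };
                  struct {
                          u16 locked_pending;
                          u16 tail;
                  };
          };
  };

  typedef struct __qspinlock arena_spinlock_t;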
Arnd Bergmann [Mon, 10 Mar 2025 13:49:16 +0000 (14:49 +0100)]
bpf: preload: Add MODULE_DESCRIPTION
Modpost complains when extra warnings are enabled:
WARNING: modpost: missing MODULE_DESCRIPTION() in kernel/bpf/preload/bpf_preload.o
Add a description from the Kconfig help text.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250310134920.4123633-1-arnd@kernel.org
----
Not sure if that description actually fits what the module does. If not,
please add a different description instead.
Sewon Nam [Tue, 11 Mar 2025 03:12:37 +0000 (12:12 +0900)]
bpf: bpftool: Setting error code in do_loader()
The error code is not being set in do_loader() when
bpf_object__open_file() fails. This means the command's exit status code
will be successful, even though the operation failed. So make sure to
return the correct error code. To maintain consistency with other
locations where bpf_object__open_file() is called, return -1.
====================
While trying to implement an eBPF gatekeeper program, we ran into an
issue where the LSM hooks are missing some relevant data.
Certain subcommands passed to the bpf() syscall can be invoked from
either the kernel or userspace. Additionally, some fields in the
bpf_attr struct contain pointers, and depending on where the
subcommand was invoked, they could point to either user or kernel
memory. One example of this is the bpf_prog_load subcommand and its
fd_array. This data is made available and used by the verifier but not
made available to the LSM subsystem. This patchset simply exposes that
information to applicable LSM hooks.
Change list:
- v6 -> v7
- use gettid/pid in lieu of getpid/tgid in test condition
- v5 -> v6
- fix regression caused by is_kernel renaming
- simplify test logic
- v4 -> v5
- merge v4 selftest breakout patch back into a single patch
- change "is_kernel" to "kernel"
- add selftest using new kernel flag
- v3 -> v4
- split out selftest changes into a separate patch
- v2 -> v3
- reorder params so that the new boolean flag is the last param
- fixup function signatures in bpf selftests
- v1 -> v2
- Pass a boolean flag in lieu of bpfptr_t
Chen Ni [Mon, 10 Mar 2025 03:20:45 +0000 (11:20 +0800)]
selftests/bpf: Convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Blaise Boscaccy [Mon, 10 Mar 2025 22:17:12 +0000 (15:17 -0700)]
selftests/bpf: Add a kernel flag test for LSM bpf hook
This test exercises the kernel flag added to security_bpf by
effectively blocking light-skeletons from loading while allowing
normal skeletons to function as-is. Since this should work with any
arbitrary BPF program, an existing program from LSKELS_EXTRA was
used as a test payload.
Anton Protopopov [Mon, 10 Mar 2025 14:51:12 +0000 (14:51 +0000)]
selftests/bpf: Fix selection of static vs. dynamic LLVM
The Makefile uses the exit code of the `llvm-config --link-static --libs`
command to choose between statically-linked and dynamically-linked LLVMs.
The stdout and stderr of that command are redirected to /dev/null.
To redirect the output the "&>" construction is used, which might not be
supported by /bin/sh, which is executed by make for $(shell ...) commands.
On such systems the test will fail even if static LLVM is actually
supported. Replace "&>" by ">/dev/null 2>&1" to fix this.
Fixes: 2a9d30fac818 ("selftests/bpf: Support dynamically linking LLVM if static is not available") Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Daniel Xu <dxu@dxuuu.xyz> Link: https://lore.kernel.org/bpf/20250310145112.1261241-1-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Blaise Boscaccy [Mon, 10 Mar 2025 22:17:11 +0000 (15:17 -0700)]
security: Propagate caller information in bpf hooks
Certain bpf syscall subcommands are available for usage from both
userspace and the kernel. LSM modules or eBPF gatekeeper programs may
need to take a different course of action depending on whether or not
a BPF syscall originated from the kernel or userspace.
Additionally, some of the bpf_attr struct fields contain pointers to
arbitrary memory. Currently the functionality to determine whether or
not a pointer refers to kernel memory or userspace memory is exposed
to the bpf verifier, but that information is missing from various LSM
hooks.
Here we augment the LSM hooks to provide this data, by simply passing
a boolean flag indicating whether or not the call originated in the
kernel, in any hook that contains a bpf_attr struct that corresponds
to a subcommand that may be called from the kernel.
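Concretely, the affected hook prototypes gain a trailing boolean, along
the lines of the following sketch (the shape only, not the complete list
of hooks):

  /* kernel == true when the subcommand (and thus any bpf_attr pointers)
   * originates from the kernel rather than from userspace
   */
  int security_bpf(int cmd, union bpf_attr *attr, unsigned int size,
                   bool kernel);
  int security_bpf_prog_load(struct bpf_prog *prog, union bpf_attr *attr,
                             struct bpf_token *token, bool kernel);
  int security_bpf_map_create(struct bpf_map *map, union bpf_attr *attr,
                              struct bpf_token *token, bool kernel);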
====================
bpf: introduce helper for populating bpf_cpumask
Some BPF programs like scx schedulers have their own internal CPU mask
types, which they must transform into struct bpf_cpumask instances
before passing them to scheduling-related kfuncs. There is currently no
way to efficiently populate the bitfield of a bpf_cpumask from BPF
memory, and programs must use multiple bpf_cpumask_[set, clear] calls to
do so.
Introduce a kfunc helper to populate the bitfield of a bpf_cpumask from valid
BPF memory with a single call.
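A hedged usage sketch from the BPF program side (the kfunc name comes
from this cover letter; its exact signature and the surrounding
declarations are assumptions):

  /* Fragment of a BPF program; includes and kfunc declarations omitted. */
  static int populate_example(void)
  {
          u64 raw_mask[2] = { 0x5, 0x0 };  /* bitmap in BPF memory, long-aligned */
          struct bpf_cpumask *mask;

          mask = bpf_cpumask_create();
          if (!mask)
                  return 0;
          /* one call instead of a loop of bpf_cpumask_set_cpu() */
          if (bpf_cpumask_populate((struct cpumask *)mask, raw_mask,
                                   sizeof(raw_mask)))
                  bpf_printk("populate failed (bad size or alignment)");
          bpf_cpumask_release(mask);
          return 0;
  }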
Addressed feedback by Hou Tao:
* Removed RUN_TESTS invocation causing tests to run twice
* Added is_test_task guard to new selftests
* Removed extraneous __success attribute from existing selftests
Addressed feedback by Hou Tao:
* Readded the tests in tools/selftests/bpf/prog_tests/cpumask.c; it
turns out the selftest entries were not duplicates.
* Removed stray whitespace in selftest.
* Add a patch adding the missing selftest to prog_tests/cpumask.c
* Explicitly annotate all cpumask selftests with __success
The last patch could very well be its own cleanup patch, but I rolled it into
this series because it came up in the discussion. If the last patch in the
series has any issues I'd be fine with applying the first 3 patches and dealing
with it separately.
* Removed new tests from tools/selftests/bpf/prog_tests/cpumask.c because
they were being run twice.
Addressed feedback by Alexei Starovoitov:
* Added missing return value in function kdoc
* Added an additional patch fixing some missing kdoc fields in
kernel/bpf/cpumask.c
Addressed feedback by Tejun Heo:
* Renamed the kfunc to bpf_cpumask_populate to avoid confusion
w/ bitmap_fill()
Addressed feedback by Alexei Starovoitov:
* Added back patch descriptions dropped from v1->v2
* Elide the alignment check for archs with efficient
unaligned accesses
Addressed feedback by Hou Tao:
* Add check that the input buffer is aligned to sizeof(long)
* Adjust input buffer size check to use bitmap_size()
* Add selftest for checking the bit pattern of the bpf_cpumask
* Moved all selftests into existing files
Signed-off-by: Emil Tsalapatis (Meta) <emil@etsalapatis.com>
====================
Emil Tsalapatis [Sun, 9 Mar 2025 23:04:27 +0000 (19:04 -0400)]
selftests: bpf: fix duplicate selftests in cpumask_success.
The BPF cpumask selftests are currently run twice in
test_progs/cpumask.c, once by traversing cpumask_success_testcases, and
once by invoking RUN_TESTS(cpumask_success). Remove the invocation of
RUN_TESTS to properly run the selftests only once.
Now that the tests are run only through cpumask_success_testcases, add
to it the missing test_refcount_null_tracking testcase. Also remove the
__success annotation from it, since it is now loaded and invoked by the
runner.
====================
This patch series continues the work to migrate the script tests into
prog_tests.
test_lwt_seg6local.sh tests some bpf_lwt_* helpers. It contains only one
test that uses a network topology quite different from the ones found in
the other prog_tests/lwt_*.c files, so I add a new
prog_tests/lwt_seg6local.c file.
While working on the migration I noticed that some routes present in the
script weren't needed so PATCH 1 deletes them and then PATCH 2 migrates
the test into the test_progs framework.
====================
selftests/bpf: lwt_seg6local: Move test to test_progs
test_lwt_seg6local.sh isn't used by the BPF CI.
Add a new file in the test_progs framework to migrate the tests done by
test_lwt_seg6local.sh. It uses the same network topology and the same BPF
programs located in progs/test_lwt_seg6local.c.
Use the network helpers instead of `nc` to exchange the final packet.
Remove test_lwt_seg6local.sh and its Makefile entry.
Amery Hung [Wed, 5 Mar 2025 18:20:57 +0000 (10:20 -0800)]
selftests/bpf: Fix dangling stdout seen by traffic monitor thread
Traffic monitor thread may see dangling stdout as the main thread closes
and reassigns stdout without protection. This happens when the main thread
finishes one subtest and moves to another one in the same netns_new()
scope.
The issue can be reproduced by running test_progs repeatedly with traffic
monitor enabled:
for ((i=1;i<=100;i++)); do
./test_progs -a flow_dissector_skb* -m '*'
done
For restoring stdout in crash_handler(), since it does not really care
about closing stdout, simply flush stdout and restore it to the original
one.
Then, fix the issue by consolidating stdio_restore_cleanup() and
stdio_restore(), and protecting the use/close/assignment of stdout with
a lock. The locking in the main thread is always performed regardless of
whether traffic monitor is running or not, for simplicity. It won't have
any side effect.
Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://patch.msgid.link/20250305182057.2802606-3-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Some routes in fb00:: are initialized during setup, even though they
aren't needed by the test as the UDP packets will travel through the
lightweight tunnels.
Amery Hung [Wed, 5 Mar 2025 18:20:55 +0000 (10:20 -0800)]
selftests/bpf: Clean up call sites of stdio_restore()
reset_affinity() and save_ns() are only called in run_one_test(). There is
no need to call stdio_restore() in reset_affinity() and save_ns() if
stdio_restore() is moved right after a test finishes in run_one_test().
Also remove an unnecessary check of env.stdout_saved in crash_handler()
by moving env.stdout_saved assignment to the beginning of main().
Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://patch.msgid.link/20250305182057.2802606-1-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
selftests/bpf: Move test_lwt_ip_encap to test_progs
test_lwt_ip_encap.sh isn't used by the BPF CI.
Add a new file in the test_progs framework to migrate the tests done by
test_lwt_ip_encap.sh. It uses the same network topology and the same BPF
programs located in progs/test_lwt_ip_encap.c.
Rework the GSO part to avoid using nc and dd.
Remove test_lwt_ip_encap.sh and its Makefile entry.
This set provides an implementation of queued spin lock for arena.
There is no support for resiliency and recovering from deadlocks yet.
We will wait for the rqspinlock patch set to land before incorporating
support.
One minor change compared to the qspinlock algorithm in the kernel is
that we don't have the trylock fallback when nesting count exceeds 4.
The maximum number of supported CPUs is 1024, but this can be increased
in the future if necessary.
The API supports returning an error, so resiliency support can be added
in the future. Callers are still expected to check for and handle any
potential errors.
Errors are returned when the spin loops time out, when the number of
CPUs is greater than 1024, or in the unsupported extreme edge case of an
NMI interrupting an NMI interrupting a HardIRQ interrupting a SoftIRQ
interrupting a task, with all of them in the slow path simultaneously.
* Add better comment and document LLVM bug for __unqual_typeof.
* Switch to precise counting in the selftest and simplify test.
* Add comment about return value handling.
* Reduce size for 100k to 50k to cap test runtime.
* Drop extra corruption handling case in decode_tail.
* Stick to 1, 1k, 100k critical section sizes.
* Fix unqual_typeof to not cast away arena tag for pointers.
* Remove hack to skip first qnode.
* Choose 100 as repeat count, 1000 is too much for 100k size.
* Use pthread_barrier in test.
* Rename to arena_spin_lock
* Introduce cond_break_label macro to jump to label from cond_break.
* Drop trylock fallback when nesting count exceeds 4.
* Fix bug in try_cmpxchg implementation.
* Add tests with critical sections of varying lengths.
* Add comments for _Generic trick to drop __arena tag.
* Fix bug due to qnodes being placed on first page, leading to CPU 0's
node being indistinguishable from NULL.
Implement queued spin lock algorithm as BPF program for lock words
living in BPF arena.
The algorithm is copied from kernel/locking/qspinlock.c and adapted for
BPF use.
We first implement abstract helpers for portable atomics and
acquire/release load instructions, relying on the presence of X86_64 and
implementation details of its JIT to elide expensive barriers, and
falling back to slow but correct implementations elsewhere. When support
for acquire/release load/stores lands, we can improve this state.
Then, the qspinlock algorithm is adapted to remove dependence on
multi-word atomics due to lack of support in BPF ISA. For instance,
xchg_tail cannot use a 16-bit xchg and needs to be implemented as a
32-bit try_cmpxchg loop.
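For illustration, the replacement follows the kernel's generic 32-bit
fallback pattern, roughly as below (a sketch; the BPF version
additionally needs the cond_break/cond_break_label annotation discussed
just below to satisfy the verifier):

  static __always_inline u32 xchg_tail(arena_spinlock_t __arena *lock, u32 tail)
  {
          u32 old, new;

          old = atomic_read(&lock->val);
          do {
                  /* keep the locked+pending bytes, replace only the tail */
                  new = (old & _Q_LOCKED_PENDING_MASK) | tail;
          } while (!atomic_try_cmpxchg_relaxed(&lock->val, &old, new));

          return old;
  }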
Loops which are seemingly infinite from verifier PoV are annotated with
cond_break_label macro to return an error. Only 1024 NR_CPUs are
supported.
Note that the slow path is a global function, hence the verifier doesn't
know the return value's precision. The recommended way of usage is to
always test against zero for success, and not ret < 0 for error, as the
verifier would assume ret > 0 has not been accounted for. Add comments
in the function documentation about this quirk.
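A usage sketch reflecting that recommendation (helper names follow this
series, but treat the exact API and signatures as assumptions):

  /* 'lock' points at an arena_spinlock_t living in the BPF arena */
  int ret;

  ret = arena_spin_lock(lock);
  if (ret)                        /* test against zero, not ret < 0 */
          return ret;
  /* ... critical section ... */
  arena_spin_unlock(lock);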
Add a new cond_break_label macro that jumps to the specified label when
the cond_break termination check fires, and allows us to better handle
the uncontrolled termination of the loop.
The may_goto instruction does not use any registers, but in
compute_insn_live_regs() it was treated as a regular conditional jump of
kind BPF_K with r0 as the source register, thus unnecessarily marking r0
as used.
====================
bpf: simple DFA-based live registers analysis
This patch-set introduces a simple live registers DFA analysis.
Analysis is done as a separate step before main verification pass.
Results are stored in the env->insn_aux_data for each instruction.
The change helps with iterator/callback based loops handling,
as regular register liveness marks are not finalized while
loops are processed. See veristat results in patch #2.
Note: for regular subprogram calls, the analysis conservatively assumes
that r1-r5 are used, and r0 is used at each 'exit' instruction.
Experiments show that adding logic handling these cases precisely has
no impact on verification performance.
The patch set was tested by disabling the current register parentage
chain liveness computation, using DFA-based liveness for registers
while assuming all stack slots as live. See discussion in [1].
Changes v2 -> v3:
- added support for BPF_LOAD_ACQ, BPF_STORE_REL atomics (Alexei);
- correct use marks for r0 for BPF_CMPXCHG.
Changes v1 -> v2:
- added a refactoring commit extracting utility functions:
jmp_offset(), verbose_insn() (Alexei);
- added a refactoring commit extracting utility function
get_call_summary() in order to share helper/kfunc related code with
mark_fastcall_pattern_for_call() (Alexei);
- comment in the compute_insn_live_regs() extended (Alexei).
Changes RFC -> v1:
- parameter count for helpers and kfuncs is taken into account;
- copy_verifier_state() bugfix had been merged as a separate
patch-set and is no longer a part of this patch set.
====================
Introduce load-acquire and store-release BPF instructions
This patchset adds kernel support for BPF load-acquire and store-release
instructions (for background, please see [1]), including core/verifier
and arm64/x86-64 JIT compiler changes, as well as selftests. riscv64 is
also planned to be supported. The corresponding LLVM changes can be
found at:
https://github.com/llvm/llvm-project/pull/108636
The first 3 patches from v4 have already been applied:
- [bpf-next,v4,01/10] bpf/verifier: Factor out atomic_ptr_type_ok()
https://git.kernel.org/bpf/bpf-next/c/b2d9ef71d4c9
- [bpf-next,v4,02/10] bpf/verifier: Factor out check_atomic_rmw()
https://git.kernel.org/bpf/bpf-next/c/d430c46c7580
- [bpf-next,v4,03/10] bpf/verifier: Factor out check_load_mem() and check_store_reg()
https://git.kernel.org/bpf/bpf-next/c/d38ad248fb7a
Please refer to the LLVM PR and individual kernel patches for details.
Thanks!
o (kernel test robot) for 32-bit arches: make the verifier reject
64-bit load-acquires/store-releases, and fix
build error in interpreter changes
* tested ARCH=arc build following instructions from kernel test
robot
o (Alexei) drop Documentation/ patch (v4 10/10) for now
o (Alexei) change encoding to BPF_LOAD_ACQ=0x100, BPF_STORE_REL=0x110
o add Acked-by: tags from Ilya and Eduard
o make new selftests depend on:
* __clang_major__ >= 18, and
* ENABLE_ATOMICS_TESTS is defined (currently this means -mcpu=v3 or
v4), and
* JIT supports load_acq/store_rel (currently only arm64)
o work around llvm-17 CI job failure by conditionally defining
__arena_global variables as 64-bit if __clang_major__ < 18, to make
sure .addr_space.1 has no holes
o add Google copyright notice in new files
o (Eduard) for x86 and s390, make
bpf_jit_supports_insn(..., /*in_arena=*/true) return false
for load_acq/store_rel
o add Eduard's Acked-by: tag
o (Eduard) extract LDX and non-ATOMIC STX handling into helpers, see
PATCH v2 3/9
o allow unpriv programs to store-release pointers to stack
o (Alexei) make it clearer in the interpreter code (PATCH v2 4/9) that
only W and DW are supported for atomic RMW
o test misaligned load_acq/store_rel
o (Eduard) other selftests/ changes:
* test load_acq/store_rel with !atomic_ptr_type_ok() pointers:
- PTR_TO_CTX, for is_ctx_reg()
- PTR_TO_PACKET, for is_pkt_reg()
- PTR_TO_FLOW_KEYS, for is_flow_key_reg()
- PTR_TO_SOCKET, for is_sk_reg()
* drop atomics/ tests
* delete unnecessary 'pid' checks from arena_atomics/ tests
* avoid depending on __BPF_FEATURE_LOAD_ACQ_STORE_REL, use
__imm_insn() and inline asm macros instead
o 1-2/8: minor verifier.c refactoring patches
o 3/8: core/verifier changes
* (Eduard) handle load-acquire properly in backtrack_insn()
* (Eduard) avoid skipping checks (e.g.,
bpf_jit_supports_insn()) for load-acquires
* track the value stored by store-releases, just like how
non-atomic STX instructions are handled
* (Eduard) add missing link in commit message
* (Eduard) always print 'r' for disasm.c changes
o 4/8: arm64/insn: avoid treating load_acq/store_rel as
load_ex/store_ex
o 5/8: arm64/insn: add load_acq/store_rel
* (Xu) include Should-Be-One (SBO) bits in "mask" and "value",
to avoid setting fixed bits during runtime (JIT-compile
time)
o 6/8: arm64 JIT compiler changes
* (Xu) use emit_a64_add_i() for "pointer + offset" to optimize
code emission
o 7/8: selftests
* (Eduard) avoid adding new tests to the 'test_verifier' runner
* add more tests, e.g., checking mark_precise logic
o 8/8: instruction-set.rst changes
bpf: use register liveness information for func_states_equal
Liveness analysis DFA computes a set of registers live before each
instruction. Leverage this information to skip comparison of dead
registers in func_states_equal(). This helps with convergance of
iterator processing loops, as bpf_reg_state->live marks can't be used
when loops are processed.
This has a certain performance impact for selftests; here is a veristat
listing using `-f "insns_pct>5" -f "!insns<200"`
The last two tests are added to check if backtrack_insn() handles the
new instructions correctly.
Additionally, the last test also makes sure that the verifier
"remembers" the value (in src_reg) we store-release into e.g. a stack
slot. For example, if we take a look at the test program:
At #1, if the verifier doesn't remember that we wrote 8 to the stack,
then later at #4 we would be adding an unbounded scalar value to the
stack pointer, which would cause the program to be rejected:
VERIFIER LOG:
=============
...
math between fp pointer and register with unbounded min value is not allowed
For easier CI integration, instead of using built-ins like
__atomic_{load,store}_n() which depend on the new
__BPF_FEATURE_LOAD_ACQ_STORE_REL pre-defined macro, manually craft
load-acquire/store-release instructions using __imm_insn(), as suggested
by Eduard.
All new tests depend on:
(1) Clang major version >= 18, and
(2) ENABLE_ATOMICS_TESTS is defined (currently implies -mcpu=v3 or
v4), and
(3) JIT supports load-acquire/store-release (currently arm64 and
x86-64)
That 1-byte hole in the .addr_space.1 ELF section caused clang-17 to
crash:
fatal error: error in backend: unable to write nop sequence of 1 bytes
To work around such llvm-17 CI job failures, conditionally define
__arena_global variables as 64-bit if __clang_major__ < 18, to make sure
.addr_space.1 has no holes. Ideally we should avoid compiling this file
using clang-17 at all (arena tests depend on
__BPF_FEATURE_ADDR_SPACE_CAST, and are skipped for llvm-17 anyway), but
that is a separate topic.
Compute may-live registers before each instruction in the program.
The register is live before the instruction I if it is read by I or
some instruction S following I during program execution and is not
overwritten between I and S.
This information would be used in the next patch as a hint in
func_states_equal().
Use a simple algorithm described in [1] to compute this information:
- define the following:
- I.use : a set of all registers read by instruction I;
- I.def : a set of all registers written by instruction I;
- I.in : a set of all registers that may be alive before I execution;
- I.out : a set of all registers that may be alive after I execution;
- I.successors : a set of instructions S that might immediately
follow I for some program execution;
- associate separate empty sets 'I.in' and 'I.out' with each instruction;
- visit each instruction in a postorder and update corresponding
'I.in' and 'I.out' sets as follows:
I.out = U [S.in for S in I.successors]
I.in = (I.out / I.def) U I.use
(where U stands for set union, / stands for set difference)
- repeat the computation while I.{in,out} changes for any instruction.
On implementation side keep things as simple, as possible:
- check_cfg() already marks instructions EXPLORED in post-order,
modify it to save the index of each EXPLORED instruction in a vector;
- represent I.{in,out,use,def} as bitmasks;
- don't split the program into basic blocks and don't maintain the
work queue, instead:
- do fixed-point computation by visiting each instruction;
- maintain a simple 'changed' flag if I.{in,out} for any instruction
change;
Measurements show that even such a simplistic implementation does not
add measurable verification time overhead (for selftests, at least).
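A compact sketch of this fixed-point computation over register bitmasks
(illustrative only; the in-kernel version operates on
env->insn_aux_data and env->cfg.insn_postorder):

  #include <stdbool.h>

  typedef unsigned short regmask_t;   /* one bit per register r0-r10 */

  struct insn_live {
          regmask_t use, def, in, out;
  };

  static void compute_live_regs(struct insn_live *live, const int *postorder,
                                int insn_cnt,
                                int (*successors)(int insn, int *succ))
  {
          bool changed = true;

          while (changed) {
                  changed = false;
                  /* visiting in postorder makes the fixed point converge fast */
                  for (int i = 0; i < insn_cnt; i++) {
                          int idx = postorder[i], succ[2];
                          int n = successors(idx, succ);
                          regmask_t out = 0, in;

                          for (int j = 0; j < n; j++)
                                  out |= live[succ[j]].in;
                          in = (out & ~live[idx].def) | live[idx].use;
                          if (in != live[idx].in || out != live[idx].out) {
                                  live[idx].in = in;
                                  live[idx].out = out;
                                  changed = true;
                          }
                  }
          }
  }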
Note on check_cfg() ex_insn_beg/ex_done change:
To avoid out-of-bounds access to the env->cfg.insn_postorder array, it
should be guaranteed that an instruction transitions to the EXPLORED
state only once. Previously this was not the case for incorrect programs
with direct calls to exception callbacks.
The 'align' selftest needs adjustment to skip computed insn/live
registers printout. Otherwise it matches lines from the live registers
printout.
Peilin Ye [Tue, 4 Mar 2025 01:06:40 +0000 (01:06 +0000)]
bpf, x86: Support load-acquire and store-release instructions
Recently we introduced BPF load-acquire (BPF_LOAD_ACQ) and store-release
(BPF_STORE_REL) instructions. For x86-64, simply implement them as
regular BPF_LDX/BPF_STX loads and stores. The verifier always rejects
misaligned load-acquires/store-releases (even if BPF_F_ANY_ALIGNMENT is
set), so emitted MOV* instructions are guaranteed to be atomic.
Arena accesses are supported. 8- and 16-bit load-acquires are
zero-extending (i.e., MOVZBQ, MOVZWQ).
Rename emit_atomic{,_index}() to emit_atomic_rmw{,_index}() to make it
clear that they only handle read-modify-write atomics, and extend their
@atomic_op parameter from u8 to u32, since we are starting to use more
than the lowest 8 bits of the 'imm' field.
Refactor mark_fastcall_pattern_for_call() to extract a utility
function get_call_summary(). For a helper or kfunc call this function
fills the following information: {num_params, is_void, fastcall}.
This function will be used in the next patch in order to get the number
of parameters of a helper or kfunc call.
Peilin Ye [Tue, 4 Mar 2025 01:06:33 +0000 (01:06 +0000)]
bpf, arm64: Support load-acquire and store-release instructions
Support BPF load-acquire (BPF_LOAD_ACQ) and store-release
(BPF_STORE_REL) instructions in the arm64 JIT compiler. For example
(assuming little-endian):
Arena accesses are supported.
bpf_jit_supports_insn(..., /*in_arena=*/true) always returns true for
BPF_LOAD_ACQ and BPF_STORE_REL instructions, as they don't depend on
ARM64_HAS_LSE_ATOMICS.
bpf: jmp_offset() and verbose_insn() utility functions
Extract two utility functions:
- One BPF jump instruction uses .imm field to encode jump offset,
while the rest use .off. Encapsulate this detail as jmp_offset()
function.
- Avoid duplicating instruction printing callback definitions by
defining a verbose_insn() function, which disassembles an
instruction into the verifier log while hiding this detail.
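For instance, jmp_offset() can hide the one special case in a few lines
(a sketch consistent with the description above):

  static int jmp_offset(const struct bpf_insn *insn)
  {
          /* gotol (BPF_JMP32 | BPF_JA) keeps its jump offset in .imm;
           * every other jump uses .off
           */
          if (insn->code == (BPF_JMP32 | BPF_JA))
                  return insn->imm;
          return insn->off;
  }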
Peilin Ye [Tue, 4 Mar 2025 01:06:19 +0000 (01:06 +0000)]
arm64: insn: Add BIT(23) to {load,store}_ex's mask
We are planning to add load-acquire (LDAR{,B,H}) and store-release
(STLR{,B,H}) instructions to insn.{c,h}; add BIT(23) to mask of load_ex
and store_ex to prevent aarch64_insn_is_{load,store}_ex() from returning
false-positives for load-acquire and store-release instructions.
Reference: Arm Architecture Reference Manual (ARM DDI 0487K.a,
ID032224),
This series replaces the current implementation of cond_break, which
uses the may_goto instruction and counts 8 million iterations per stack
frame, with an implementation based on sampling time locally on the CPU.
This is done to permit a longer time for a given loop per program
invocation. The accounting is still done per stack frame, but the count
is instead used to amortize the cost of the logic that samples and
checks the time spent since the start.
This is needed for expressing more complicated algorithms (spin locks,
waiting loops, etc.) in BPF programs without false positive expiration
of the loop. For instance, the plan is to make use of this for
implementing spin locks for BPF arena [0].
For the loop as follows:
for (int i = 0;; i++) {}
Testing on a bare-metal Sapphire Rapids Intel server yields the following
table (taking an average of 25 runs).
Here, count is used to amortize the time sampling and checking logic.
Obviously, this is the limit of an empty loop. Given the complexity of
the loop body, the time spent in the loop can be longer. Cancellations
will address the task of imposing an upper bound on program runtime.
* Address comments from Alexei
* Use kernel comment style for new code.
* Remove p->count == 0 check in bpf_check_timed_may_goto.
* Add comments on AX as argument/retval calling convention.
* Add comments describing how the counting logic works.
* Use BPF_EMIT_CALL instead of open-coding instruction encoding.
* Change if ax != 1 goto pc+X condition to if ax != 0 goto pc+X.
====================
A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x100).
Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_STORE_REL (0x110).
Unlike existing atomic read-modify-write operations that only support
BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and
store-releases also support BPF_B (8-bit) and BPF_H (16-bit). As an
exception, however, 64-bit load-acquires/store-releases are not
supported on 32-bit architectures (to fix a build error reported by the
kernel test robot).
An 8- or 16-bit load-acquire zero-extends the value before writing it to
a 32-bit register, just like ARM64 instruction LDARH and friends.
Similar to existing atomic read-modify-write operations, misaligned
load-acquires/store-releases are not allowed (even if
BPF_F_ANY_ALIGNMENT is set).
As an example, consider the following 64-bit load-acquire BPF
instruction (assuming little-endian):
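One encoding consistent with these rules, written as a struct bpf_insn
initializer (the register choices r0/r1 and offset 0 are only for
illustration):

  /* r0 = load_acquire((u64 *)(r1 + 0)) */
  struct bpf_insn insn = {
          .code    = BPF_STX | BPF_ATOMIC | BPF_DW,   /* 0xdb */
          .dst_reg = BPF_REG_0,                       /* value lands here */
          .src_reg = BPF_REG_1,                       /* address operand  */
          .off     = 0,
          .imm     = BPF_LOAD_ACQ,                    /* 0x100 */
  };
  /* little-endian byte sequence: db 10 00 00 00 01 00 00 */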
In arch/{arm64,s390,x86}/net/bpf_jit_comp.c, have
bpf_jit_supports_insn(..., /*in_arena=*/true) return false for the new
instructions, until the corresponding JIT compiler supports them in
arena.
We are introducing a new function in the libbpf API, bpf_object__prepare,
which provides more granular control over the process of loading a
bpf_object. bpf_object__prepare performs ELF processing, relocations,
prepares final state of BPF program instructions (accessible with
bpf_program__insns()), creates and potentially pins maps, and stops short
of loading BPF programs.
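For instance, a tool that only wants the finalized instructions could do
roughly the following (error handling trimmed; the file and output
format are illustrative):

  #include <stdio.h>
  #include <bpf/libbpf.h>

  static void dump_insn_counts(const char *path)
  {
          struct bpf_object *obj = bpf_object__open_file(path, NULL);
          struct bpf_program *prog;

          if (!obj)
                  return;
          /* ELF processing, relocations, map creation, but no program load */
          if (bpf_object__prepare(obj))
                  goto out;
          bpf_object__for_each_program(prog, obj)
                  printf("%s: %zu insns\n", bpf_program__name(prog),
                         bpf_program__insn_cnt(prog));
  out:
          bpf_object__close(obj);
  }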
There are a couple of anticipated use cases for this API:
* Use a BPF token for freplace programs that might need to look up the
BTF of other programs (BPF token creation can't be moved to the open
step, as the open step is a "no privilege assumption" step so that tools
like bpftool can generate skeletons, discover the structure of a BPF
object, etc).
* Stopping at prepare gives users finalized BPF program
instructions (with subprogs appended, everything relocated and
finalized, etc). And that property can be taken advantage of by
veristat (and similar tools) that might want to process one program at
a time, but would like to avoid relatively slow ELF parsing and
processing; and even BPF selftests itself (RUN_TESTS part of it at
least) would benefit from this by eliminating waste of re-processing
ELF many times.
====================
Implement the arch_bpf_timed_may_goto function using inline assembly to
have control over which registers are spilled, and use our special
protocol of using BPF_REG_AX as an argument into the function, and as
the return value when going back.
Emit call depth accounting for the call made from this stub, and ensure
we don't have naked returns (when rethunk mitigations are enabled) by
falling back to the RET macro (instead of retq). After popping all saved
registers, the return address into the BPF program should be on top of
the stack.
Since the JIT support is now enabled, ensure selftests which are
checking the produced may_goto sequences do not break by adjusting them.
Make sure we still test the old may_goto sequence on other
architectures, while testing the new sequence on x86_64.
Implement support in the verifier for replacing may_goto implementation
from a counter-based approach to one which samples time on the local CPU
to have a bigger loop bound.
We implement it by maintaining 16-bytes per-stack frame, and using 8
bytes for maintaining the count for amortizing time sampling, and 8
bytes for the starting timestamp. To minimize overhead, we need to avoid
spilling and filling of registers around this sequence, so we push this
cost into the time sampling function 'arch_bpf_timed_may_goto'. This is
a JIT-specific wrapper around bpf_check_timed_may_goto which returns us
the count to store into the stack through BPF_REG_AX. All caller-saved
registers (r0-r5) are guaranteed to remain untouched.
The loop can be broken by returning a count of 0; otherwise, we dispatch
into the function when the count drops to 0, and the runtime chooses
either to refresh it (by returning BPF_MAX_TIMED_LOOPS as the count) or
to return 0 and abort the loop on the next iteration.
Since the check for 0 is done right after loading the count from the
stack, all subsequent cond_break sequences, whether in the same loop or
in later loops in the program, should immediately break as well.
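A sketch of the runtime helper's logic as described above (the structure
layout and exact constants are assumptions):

  struct bpf_timed_may_goto {
          u64 count;
          u64 timestamp;
  };

  u64 bpf_check_timed_may_goto(struct bpf_timed_may_goto *p)
  {
          u64 time = ktime_get_mono_fast_ns();

          /* First invocation for this stack frame: record the start time
           * and hand back a fresh amortization count.
           */
          if (!p->timestamp) {
                  p->timestamp = time;
                  return BPF_MAX_TIMED_LOOPS;
          }
          /* Time budget (currently 250 ms) exhausted: returning 0 makes this
           * and all later cond_break checks in the frame break the loop.
           */
          if (time - p->timestamp >= 250 * NSEC_PER_MSEC)
                  return 0;
          /* Otherwise keep looping with a refreshed count. */
          return BPF_MAX_TIMED_LOOPS;
  }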
We pass in the stack_depth of the count (and thus the timestamp, by
adding 8 to it) to the arch_bpf_timed_may_goto call so that it can be
passed in to bpf_check_timed_may_goto as an argument after r1 is saved,
by adding the offset to r10/fp. This adjustment will be arch specific,
and the next patch will introduce support for x86.
Note that depending on loop complexity, time spent in the loop can be
more than the current limit (250 ms), but imposing an upper bound on
program runtime is an orthogonal problem which will be addressed when
program cancellations are supported.
The current time afforded by cond_break may not be enough for cases
where BPF programs want to implement locking algorithms inline, and use
cond_break as a promise to the verifier that they will eventually
terminate.
Below are some benchmarking numbers on the time taken per-iteration for
an empty loop that counts the number of iterations until cond_break
fires. For comparison, we compare it against bpf_for/bpf_repeat which is
another way to achieve the same number of spins (BPF_MAX_LOOPS). The
hardware used for benchmarking was a Sapphire Rapids Intel server with
performance governor enabled, mitigations were enabled.
Mykyta Yatsenko [Mon, 3 Mar 2025 13:57:51 +0000 (13:57 +0000)]
libbpf: Split bpf object load into prepare/load
Introduce bpf_object__prepare API: additional intermediate preparation
step that performs ELF processing, relocations, prepares final state of
BPF program instructions (accessible with bpf_program__insns()), creates
and (potentially) pins maps, and stops short of loading BPF programs.
We anticipate a few use cases for this API, such as:
* Use prepare to initialize bpf_token, without loading freplace
programs, unlocking possibility to lookup BTF of other programs.
* Execute prepare to obtain finalized BPF program instructions without
loading programs, enabling tools like veristat to process one program at
a time, without incurring cost of ELF parsing and processing.
Mykyta Yatsenko [Mon, 3 Mar 2025 13:57:50 +0000 (13:57 +0000)]
libbpf: Introduce more granular state for bpf_object
We are going to split bpf_object loading into 2 stages: preparation and
loading. This will increase flexibility when working with bpf_object
and unlock some optimizations and use cases.
This patch replaces the boolean 'loaded' flag with a more finely-grained
state for bpf_object.
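An illustrative shape for that state (the exact enumerator names are an
assumption):

  enum bpf_object_state {
          OBJ_OPEN,       /* after bpf_object__open*()   */
          OBJ_PREPARED,   /* after bpf_object__prepare() */
          OBJ_LOADED,     /* after bpf_object__load()    */
  };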
Breno Leitao [Fri, 28 Feb 2025 18:43:34 +0000 (10:43 -0800)]
net: filter: Avoid shadowing variable in bpf_convert_ctx_access()
Rename the local variable 'off' to 'offset' to avoid shadowing the existing
'off' variable that is declared as an `int` in the outer scope of
bpf_convert_ctx_access().
This fixes a compiler warning:
net/core/filter.c:9679:8: warning: declaration shadows a local variable [-Wshadow]
Mykyta Yatsenko [Mon, 3 Mar 2025 13:57:49 +0000 (13:57 +0000)]
libbpf: Use map_is_created helper in map setters
Refactoring: use map_is_created helper in map setters that need to check
the state of the map. This helps to reduce the number of places that
depend explicitly on the loaded flag, simplifying refactoring in the
next patch of this set.
====================
selftests/bpf: Migrate test_tunnel.sh to test_progs
Hi all,
This patch series continues the work to migrate the *.sh tests into
prog_tests framework.
The test_tunnel.sh script has already been partly migrated to
test_progs in prog_tests/test_tunnel.c so I add my work to it.
PATCH 1 & 2 create some helpers to avoid code duplication and ease the
migration in the following patches.
PATCH 3 to 9 migrate the tests of gre, ip6gre, erspan, ip6erspan,
geneve, ip6geneve and ip6tnl tunnels.
PATCH 10 removes test_tunnel.sh
====================
All tests from test_tunnel.sh have been migrated into test_progs.
The last test remaining in the script is test_ipip(), which is already
covered in the test_progs framework by the NONE case of
test_ipip_tunnel().
Remove the test_tunnel.sh script and its Makefile entry.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-10-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
selftests/bpf: test_tunnel: Move ip6tnl tunnel tests to test_progs
ip6tnl tunnels are tested in the test_tunnel.sh but not in the test_progs
framework.
Add a new test in test_progs to test ip6tnl tunnels. It uses the same
network topology and the same BPF programs as the script.
Remove test_ipip6() and test_ip6ip6() from the script.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-9-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
selftests/bpf: test_tunnel: Move ip6geneve tunnel test to test_progs
ip6geneve tunnels are tested in the test_tunnel.sh but not in the
test_progs framework.
Add a new test in test_progs to test ip6geneve tunnels. It uses the same
network topology and the same BPF programs as the script.
Remove test_ip6geneve() from the script.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-8-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
selftests/bpf: test_tunnel: Move geneve tunnel test to test_progs
geneve tunnels are tested in the test_tunnel.sh but not in the test_progs
framework.
Add a new test in test_progs to test geneve tunnels. It uses the same
network topology and the same BPF programs as the script.
Remove test_geneve() from the script.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-7-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
selftests/bpf: test_tunnel: Move ip6erspan tunnel test to test_progs
ip6erspan tunnels are tested in the test_tunnel.sh but not in the
test_progs framework.
Add a new test in test_progs to test ip6erspan tunnels. It uses the same
network topology and the same BPF programs as the script.
Remove test_ip6erspan() from the script.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-6-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
selftests/bpf: test_tunnel: Move erspan tunnel tests to test_progs
erspan tunnels are tested in the test_tunnel.sh but not in the test_progs
framework.
Add a new test in test_progs to test erspan tunnels. It uses the same
network topology and the same BPF programs as the script.
Remove test_erspan() from the script.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-5-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
selftests/bpf: test_tunnel: Move ip6gre tunnel test to test_progs
ip6gre tunnels are tested in the test_tunnel.sh but not in the test_progs
framework.
Add a new test in test_progs to test ip6gre tunnels. It uses the same
network topology and the same BPF programs as the script. Disable the
IPv6 DAD feature because it can take a lot of time and cause some tests
to fail depending on the environment they're run on.
Remove test_ip6gre() and test_ip6gretap() from the script.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-4-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
selftests/bpf: test_tunnel: Move gre tunnel test to test_progs
gre tunnels are tested in the test_tunnel.sh but not in the test_progs
framework.
Add a new test in test_progs to test gre tunnels. It uses the same
network topology and the same BPF programs as the script.
Remove test_gre() and test_gre_no_tunnel_key() from the script.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-3-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
====================
veristat: @files-list.txt notation for object files list
A few small veristat improvements:
- It is possible to hit the command line parameter count limit,
e.g. when running veristat for all object files generated for
test_progs. This patch-set adds an option to read the object files list
from a file.
- Correct usage of strerror() function.
- Avoid printing log lines to CSV output.
Changelog:
- v1 -> v2:
- replace strerror(errno) with strerror(-err) in patch #2 (Andrii)
All tests use more or less the same ping commands as final validation.
Also test_ping()'s return value is checked with ASSERT_OK() while this
check is already done by the SYS() macro inside test_ping().
Create helpers around test_ping() and use them in the tests to avoid code
duplication.
Remove the unnecessary ASSERT_OK() from the tests.
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20250303-tunnels-v2-2-8329f38f0678@bootlin.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Peilin Ye [Mon, 3 Mar 2025 05:37:25 +0000 (05:37 +0000)]
bpf: Factor out check_load_mem() and check_store_reg()
Extract BPF_LDX and most non-ATOMIC BPF_STX instruction handling logic
in do_check() into helper functions to be used later. While we are
here, make that comment about "reserved fields" more specific.
veristat: Report program type guess results to sdterr
In order not to pollute CSV output, e.g.:
$ ./veristat -o csv exceptions_ext.bpf.o > test.csv
Using guessed program type 'sched_cls' for exceptions_ext.bpf.o/extension...
Using guessed program type 'sched_cls' for exceptions_ext.bpf.o/throwing_extension...
A fair amount of code duplication is present among tests to attach BPF
programs.
Create generic_attach* helpers that attach BPF programs to a given
interface.
Use ASSERT_OK_FD() instead of ASSERT_GE() to check the validity of the fds.
Use these helpers in all the available tests.
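One possible shape for such a helper, using libbpf's tc hook API and the
selftests ASSERT_OK_FD() macro (a sketch; the helper names and the attach
mechanism used in the actual patch may differ):

  #include <errno.h>
  #include <bpf/libbpf.h>

  /* Attach 'prog' to the ingress hook of the interface 'ifindex'. */
  static int generic_attach_ingress(struct bpf_program *prog, int ifindex)
  {
          LIBBPF_OPTS(bpf_tc_hook, hook,
                      .ifindex = ifindex,
                      .attach_point = BPF_TC_INGRESS);
          LIBBPF_OPTS(bpf_tc_opts, opts);
          int err, prog_fd;

          prog_fd = bpf_program__fd(prog);
          if (!ASSERT_OK_FD(prog_fd, "bpf_program__fd"))
                  return -1;
          opts.prog_fd = prog_fd;

          /* create the clsact qdisc if it is not there yet */
          err = bpf_tc_hook_create(&hook);
          if (err && err != -EEXIST)
                  return err;
          return bpf_tc_attach(&hook, &opts);
  }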
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250303-tunnels-v2-1-8329f38f0678@bootlin.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Peilin Ye [Mon, 3 Mar 2025 05:37:19 +0000 (05:37 +0000)]
bpf: Factor out check_atomic_rmw()
Currently, check_atomic() only handles atomic read-modify-write (RMW)
instructions. Since we are planning to introduce other types of atomic
instructions (i.e., atomic load/store), extract the existing RMW
handling logic into its own function named check_atomic_rmw().
Remove the @insn_idx parameter as it is not really necessary. Use
'env->insn_idx' instead, as in other places in verifier.c.
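After the extraction, the dispatch in check_atomic() boils down to roughly
the following shape (a sketch, not the verbatim verifier code):

  static int check_atomic(struct bpf_verifier_env *env, struct bpf_insn *insn)
  {
          switch (insn->imm) {
          case BPF_ADD:
          case BPF_ADD | BPF_FETCH:
          case BPF_AND:
          case BPF_AND | BPF_FETCH:
          case BPF_OR:
          case BPF_OR | BPF_FETCH:
          case BPF_XOR:
          case BPF_XOR | BPF_FETCH:
          case BPF_XCHG:
          case BPF_CMPXCHG:
                  /* former body of check_atomic() lives here now */
                  return check_atomic_rmw(env, insn);
          default:
                  verbose(env, "BPF_ATOMIC uses invalid atomic opcode %02x\n",
                          insn->imm);
                  return -EINVAL;
          }
  }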
Eric Dumazet [Sat, 1 Mar 2025 19:13:15 +0000 (19:13 +0000)]
bpf: no longer acquire map_idr_lock in bpf_map_inc_not_zero()
bpf_sk_storage_clone() is the only caller of bpf_map_inc_not_zero()
and is holding rcu_read_lock().
map_idr_lock does not add any protection here; dropping it simply removes
the cost for passive TCP flows.
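With the lock gone, the function reduces to roughly this shape (a sketch of
the result, not the verbatim kernel code):

  /* Callers hold rcu_read_lock(), which keeps 'map' alive while we try to
   * grab a reference, so an inc-not-zero on the refcount is sufficient and
   * map_idr_lock adds nothing.
   */
  struct bpf_map *bpf_map_inc_not_zero(struct bpf_map *map)
  {
          if (!atomic64_inc_not_zero(&map->refcnt))
                  return ERR_PTR(-ENOENT);
          return map;
  }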
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Kui-Feng Lee <kuifeng@meta.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://lore.kernel.org/r/20250301191315.1532629-1-edumazet@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
====================
Global subprogs in RCU/{preempt,irq}-disabled sections
Small change to allow non-sleepable global subprogs in
RCU, preempt-disabled, and irq-disabled sections. For
now, we don't lift the limitation for locks as it requires
more analysis, and will do this once resilient spin locks
land.
This surfaced a bug where sleepable global subprogs were
allowed in RCU read sections, which has been fixed. Tests
have been added to cover various cases.
* Rename subprog_info[i].sleepable to might_sleep, which more
accurately reflects the nature of the bit: 'sleepable' describes whether
a given context is allowed to sleep, while might_sleep captures whether
the subprog may actually do so.
* Disallow extensions that might sleep from attaching to targets that don't
sleep, since they'd then be permitted to be called in atomic contexts. (Eduard)
* Add tests for mixing non-sleepable and sleepable global function
calls, and extensions attaching to non-sleepable global functions. (Eduard)
* Rename changes_pkt_data -> summarization
====================
selftests/bpf: Add tests for extending sleepable global subprogs
Add tests for freplace behavior with the combination of sleepable
and non-sleepable global subprogs. The changes_pkt_data selftest
did all the hard work, so simply rename it and include new support
for more summarization tests covering the might_sleep bit.
selftests/bpf: Test sleepable global subprogs in atomic contexts
Add tests for rejecting sleepable and accepting non-sleepable global
function calls in atomic contexts. For spin locks, we still reject
all global function calls. Once resilient spin locks land, we will
carefully lift this restriction in cases where we deem it safe.
Yonghong Song [Mon, 24 Feb 2025 23:01:16 +0000 (15:01 -0800)]
bpf: Allow pre-ordering for bpf cgroup progs
Currently, for bpf progs in a cgroup hierarchy, the effective prog array
is computed from bottom cgroup to upper cgroups (post-ordering). For
example, the following cgroup hierarchy
root cgroup: p1, p2
subcgroup: p3, p4
with BPF_F_ALLOW_MULTI set at both cgroup levels.
The effective cgroup array ordering looks like
p3 p4 p1 p2
and at run time, progs will execute based on that order.
But in some cases, it is desirable to have the root prog execute earlier than
the children's progs (pre-ordering). For example,
- prog p1 intends to collect original pkt dest addresses.
- prog p3 will modify original pkt dest addresses to a proxy address for
security reasons.
The end result is that prog p1 gets the proxy address, which is not what it
wants. Putting p1 in every child cgroup is not desirable either, as it
will duplicate itself in many child cgroups. And this is exactly a use case
we are encountering at Meta.
To fix this issue, let us introduce a flag BPF_F_PREORDER. If the flag
is specified at attachment time, the prog has higher priority and the
ordering with that flag will be from top to bottom (pre-ordering).
Going back to the above example,
root cgroup: p1, p2
subcgroup: p3, p4
Let us say p2 and p4 are marked with BPF_F_PREORDER. The final
effective array ordering will be
p2 p4 p3 p1
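From user space, the flag is simply OR-ed into the attach flags; a minimal
sketch using libbpf (the cgroup fd, attach type, and error handling are
illustrative):

  #include <bpf/bpf.h>

  /* Attach 'prog_fd' to 'cgroup_fd' as a pre-ordered multi prog: it runs
   * before post-ordered progs, and before pre-ordered progs attached to
   * child cgroups.
   */
  static int attach_preorder(int cgroup_fd, int prog_fd)
  {
          LIBBPF_OPTS(bpf_prog_attach_opts, opts,
                      .flags = BPF_F_ALLOW_MULTI | BPF_F_PREORDER);

          return bpf_prog_attach_opts(prog_fd, cgroup_fd,
                                      BPF_CGROUP_INET_EGRESS, &opts);
  }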
The verifier currently does not permit global subprog calls when a lock
is held, preemption is disabled, or when IRQs are disabled. This is
because we don't know whether the global subprog calls sleepable
functions or not.
In case of locks, there's an additional reason: functions called by the
global subprog may hold additional locks etc. The verifier won't know
while verifying the global subprog whether it was called in a context
where a spin lock is already held by the program.
Summarize the sleepable nature of a global subprog, just like
changes_pkt_data, and then allow calls to non-sleepable global subprogs
from atomic context.
While making this change, I noticed that RCU read sections had no
protection against sleepable global subprog calls; include them in the
checks and fix this while we're at it.
Care needs to be taken to not allow global subprog calls when a regular
bpf_spin_lock is held. Once resilient spin locks land, we may want to
relax this check, but not for now.
Also make sure extensions freplacing global functions cannot do so
when the target is non-sleepable but the extension might sleep. The other
combination is ok.
Tests are included in the next patch to handle all special conditions.
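As an illustration (a hand-written sketch, not one of the selftests added in
the next patch), a program like the following is now accepted, while swapping
the subprog body for a sleepable helper or kfunc call would make the verifier
reject the call inside the preempt-disabled region:

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  void bpf_preempt_disable(void) __ksym;
  void bpf_preempt_enable(void) __ksym;

  /* Global (non-static, noinline) subprog without sleepable helpers or
   * kfuncs in its body, so it is summarized as !might_sleep.
   */
  __noinline int global_accumulate(int x)
  {
          return x * 2 + 1;
  }

  SEC("tc")
  int prog(struct __sk_buff *ctx)
  {
          int v;

          bpf_preempt_disable();
          /* Previously rejected outright; allowed now that the callee is
           * known not to sleep.
           */
          v = global_accumulate(ctx->len);
          bpf_preempt_enable();
          return v & 1;
  }

  char _license[] SEC("license") = "GPL";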
====================
Optimize bpf selftest to increase CI success rate
1. Optimized some selftests that bind to static ports to avoid port
conflicts when running test_progs -j.
2. Optimized the retry logic for test_maps.
Some Failed CI:
https://github.com/kernel-patches/bpf/actions/runs/13275542359/job/37064974076
https://github.com/kernel-patches/bpf/actions/runs/13549227497/job/37868926343
https://github.com/kernel-patches/bpf/actions/runs/13548089029/job/37865812030
https://github.com/kernel-patches/bpf/actions/runs/13553536268/job/37883329296
(Perhaps it's due to the large number of pull requests requiring CI runs?)
====================
Jiayuan Chen [Thu, 27 Feb 2025 14:26:46 +0000 (22:26 +0800)]
selftests/bpf: Fixes for test_maps test
BPF CI has failed 3 times in the last 24 hours. Add a retry for ENOMEM,
following the same approach as:
commit 2f553b032cad ("selftsets/bpf: Retry map update for non-preallocated per-cpu map")
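The retry is along these lines (an illustrative sketch, not the verbatim
test_maps change; helper name and retry parameters are assumptions):

  #include <errno.h>
  #include <unistd.h>
  #include <bpf/bpf.h>

  /* Memory pressure during a parallel test run can make updates to a
   * non-preallocated map fail transiently with ENOMEM; retry a bounded
   * number of times before failing the test.
   */
  static int map_update_retriable(int map_fd, const void *key,
                                  const void *value, __u64 flags,
                                  int attempts)
  {
          int err = -EINVAL;

          while (attempts--) {
                  err = bpf_map_update_elem(map_fd, key, value, flags);
                  if (!err || errno != ENOMEM)
                          break;
                  usleep(1);
          }
          return err;
  }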
Introduce a new kfunc, bpf_dynptr_copy, which enables copying of
data from one dynptr to another. This functionality may be useful in
scenarios such as capturing XDP data to a ring buffer.
The patch set is split into 3 patches:
1. Refactor bpf_dynptr_read and bpf_dynptr_write by extracting code into
static functions, which allows calling them without compiler warnings
2. Introduce bpf_dynptr_copy
3. Add tests for bpf_dynptr_copy
v2->v3:
* Implemented bpf_memcmp in dynptr_success.c test, as __builtin_memcmp
was not inlined on GCC-BPF.
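For reference, a usage sketch of bpf_dynptr_copy from an XDP program that
captures the start of a packet into a ring buffer (the kfunc declarations,
map size, and copy length are illustrative; the signature follows the one
proposed in this series):

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  extern int bpf_dynptr_from_xdp(struct xdp_md *x, __u64 flags,
                                 struct bpf_dynptr *ptr) __ksym;
  extern int bpf_dynptr_copy(struct bpf_dynptr *dst_ptr, __u32 dst_off,
                             struct bpf_dynptr *src_ptr, __u32 src_off,
                             __u32 size) __ksym;

  struct {
          __uint(type, BPF_MAP_TYPE_RINGBUF);
          __uint(max_entries, 4096);
  } ringbuf SEC(".maps");

  SEC("xdp")
  int capture(struct xdp_md *ctx)
  {
          struct bpf_dynptr src, dst;

          /* Wrap the packet and a ringbuf reservation as dynptrs, then copy
           * the first 64 bytes of packet data into the reserved sample.
           */
          bpf_dynptr_from_xdp(ctx, 0, &src);
          if (bpf_ringbuf_reserve_dynptr(&ringbuf, 64, 0, &dst)) {
                  bpf_ringbuf_discard_dynptr(&dst, 0);
                  return XDP_PASS;
          }
          if (bpf_dynptr_copy(&dst, 0, &src, 0, 64))
                  bpf_ringbuf_discard_dynptr(&dst, 0);
          else
                  bpf_ringbuf_submit_dynptr(&dst, 0);
          return XDP_PASS;
  }

  char _license[] SEC("license") = "GPL";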
====================