git.ipfire.org Git - thirdparty/kernel/stable.git/log
7 weeks ago  bpf: replace use of system_unbound_wq with system_dfl_wq
Marco Crivellari [Fri, 5 Sep 2025 08:53:08 +0000 (10:53 +0200)] 
bpf: replace use of system_unbound_wq with system_dfl_wq

Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

system_unbound_wq should be the default workqueue so as not to enforce
locality constraints for random work whenever it's not required.

Add system_dfl_wq to encourage its use when unbound work should be used.

queue_work() / queue_delayed_work() / mod_delayed_work() will now use the
new unbound wq: if the user still uses the old wq, a warning will be
printed along with a redirect to the new wq.

The old system_unbound_wq will be kept for a few release cycles.
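
A minimal sketch of the resulting call-site pattern, assuming an illustrative
caller (the actual BPF call sites live in the kernel sources and may differ):

  #include <linux/workqueue.h>

  /* Illustrative caller only: shows the rename from system_unbound_wq to
   * system_dfl_wq for work that has no locality requirement. */
  static void queue_background_work(struct work_struct *work)
  {
          /* Old: queue_work(system_unbound_wq, work); kept for a few cycles. */
          queue_work(system_dfl_wq, work);
  }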

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://lore.kernel.org/r/20250905085309.94596-3-marco.crivellari@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks ago  bpf: replace use of system_wq with system_percpu_wq
Marco Crivellari [Fri, 5 Sep 2025 08:53:07 +0000 (10:53 +0200)] 
bpf: replace use of system_wq with system_percpu_wq

Currently, if a user enqueues a work item using schedule_delayed_work(), the
wq used is "system_wq" (a per-cpu wq), while queue_delayed_work() uses
WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes
use of WORK_CPU_UNBOUND.

This lack of consistency cannot be addressed without refactoring the API.

system_wq is a per-CPU workqueue, yet nothing in its name conveys that
CPU affinity constraint, which is very often not required by users. Make
this clear by adding a system_percpu_wq.

queue_work() / queue_delayed_work() / mod_delayed_work() will now use the
new per-cpu wq: if the user still sticks with the old name, a warning will
be printed along with a redirect to the new wq.

This patch adds the new system_percpu_wq everywhere except in the mm, fs and
net subsystems, which are handled in separate patches.

The old wq will be kept for a few release cycles.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://lore.kernel.org/r/20250905085309.94596-2-marco.crivellari@suse.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks ago  selftests/bpf: Fix the issue where the error code is 0
Feng Yang [Mon, 8 Sep 2025 06:08:10 +0000 (14:08 +0800)] 
selftests/bpf: Fix the issue where the error code is 0

The error message printed here only uses the previous err value,
which results in it being printed as 0.
When bpf_map__attach_struct_ops() encounters an error,
it uses libbpf_err_ptr(err) to set errno = -err and returns NULL.
Therefore, using -errno fixes this issue.

Before the fix:
run_subtest:FAIL:1019 bpf_map__attach_struct_ops failed for map pro_epilogue: err=0

After the fix:
run_subtest:FAIL:1019 bpf_map__attach_struct_ops failed for map pro_epilogue: err=-9
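
A hedged sketch of the reporting pattern this implies, using a simplified
caller (the real selftest uses its own ASSERT helpers):

  #include <errno.h>
  #include <stdio.h>
  #include <bpf/libbpf.h>

  /* Simplified caller: when the attach API returns NULL, the error code is in
   * errno (set via libbpf_err_ptr()), so report -errno, not a stale err. */
  static int attach_and_report(struct bpf_map *map)
  {
          struct bpf_link *link = bpf_map__attach_struct_ops(map);

          if (!link) {
                  fprintf(stderr, "attach_struct_ops failed: err=%d\n", -errno);
                  return -errno;
          }
          bpf_link__destroy(link);
          return 0;
  }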

Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250908060810.1054341-1-yangfeng59949@163.com
7 weeks ago  selftests/bpf: Add BPF program dump in veristat
Mykyta Yatsenko [Fri, 5 Sep 2025 14:08:35 +0000 (15:08 +0100)] 
selftests/bpf: Add BPF program dump in veristat

Add the ability to dump BPF program instructions directly from veristat.
Previously, inspecting a program required separate bpftool invocations:
one to load and another to dump it, which meant running multiple
commands.
During active development, it's common for developers to use veristat
for testing verification. Integrating instruction dumping into veristat
reduces the need to switch tools and simplifies the workflow.
By making this information more readily accessible, this change aims
to streamline the BPF development cycle and improve usability for
developers.
This implementation leverages bpftool by running it directly via popen()
to avoid code duplication and keep veristat simple.
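
A rough sketch of the popen()-based approach, with a hypothetical helper and
command line (the exact bpftool invocation used by veristat may differ):

  #include <stdio.h>

  /* Hypothetical helper: shell out to bpftool and stream its dump output. */
  static int dump_prog_via_bpftool(const char *prog_name)
  {
          char cmd[256], line[4096];
          FILE *f;

          snprintf(cmd, sizeof(cmd), "bpftool prog dump xlated name %s", prog_name);
          f = popen(cmd, "r");
          if (!f)
                  return -1;
          while (fgets(line, sizeof(line), f))
                  fputs(line, stdout);
          return pclose(f);
  }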

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250905140835.1416179-1-mykyta.yatsenko5@gmail.com
7 weeks ago  Merge branch 'bpf-next/skb-meta-dynptr' into 'bpf-next/master'
Martin KaFai Lau [Fri, 5 Sep 2025 04:10:13 +0000 (21:10 -0700)] 
Merge branch 'bpf-next/skb-meta-dynptr' into 'bpf-next/master'

Merge skb-meta-dynptr branch into master branch after fixing a compiler
warning. No conflict.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
7 weeks ago  bpf: Return an error pointer for skb metadata when CONFIG_NET=n
Jakub Sitnicki [Mon, 1 Sep 2025 13:27:43 +0000 (15:27 +0200)] 
bpf: Return an error pointer for skb metadata when CONFIG_NET=n

Kernel Test Robot reported a compiler warning - a null pointer may be
passed to memmove in __bpf_dynptr_{read,write} when building without
networking support.

The warning is correct from a static analysis standpoint, but not actually
reachable. Without CONFIG_NET, creating dynptrs to skb metadata is
impossible since the constructor kfunc is missing.

Silence the false-positive diagnostic message by returning an error pointer
from the bpf_skb_meta_pointer() stub when CONFIG_NET=n.
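
A hedged sketch of what such a stub looks like; the function name follows the
commit text, while the exact signature and error code are illustrative:

  #include <linux/err.h>
  #include <linux/skbuff.h>

  #ifndef CONFIG_NET
  /* Return an error pointer instead of NULL so static analysis never sees a
   * NULL flowing into memmove() in the dynptr read/write paths. */
  static void *bpf_skb_meta_pointer(struct sk_buff *skb, u32 offset)
  {
          return ERR_PTR(-EOPNOTSUPP);
  }
  #endif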

Fixes: 6877cd392bae ("bpf: Enable read/write access to skb metadata through a dynptr")
Closes: https://lore.kernel.org/oe-kbuild-all/202508212031.ir9b3B6Q-lkp@intel.com/
Reported-by: kernel test robot <lkp@intel.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250901-dynptr-skb-meta-no-net-v2-1-ce607fcb6091@cloudflare.com
7 weeks ago  libbpf: Remove unused args in parse_usdt_note
Jiawei Zhao [Thu, 4 Sep 2025 03:05:23 +0000 (03:05 +0000)] 
libbpf: Remove unused args in parse_usdt_note

Remove unused 'elf' and 'path' parameters from parse_usdt_note function
signature. These parameters are not referenced within the function body
and only add unnecessary complexity.

The function only requires the note header, data buffer, offsets, and
output structure to perform USDT note parsing.

Update function declaration, definition, and the single call site in
collect_usdt_targets() to match the simplified signature.

This is a safe internal cleanup as parse_usdt_note is a static function.
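
A hedged sketch of the signature change, with parameter names taken as
illustrative rather than verbatim from libbpf's usdt.c:

  /* Before: 'elf' and 'path' are accepted but never used. */
  static int parse_usdt_note(Elf *elf, const char *path, GElf_Nhdr *nhdr,
                             const char *data, size_t name_off, size_t desc_off,
                             struct usdt_note *note);

  /* After: only what the parser actually needs. */
  static int parse_usdt_note(GElf_Nhdr *nhdr, const char *data,
                             size_t name_off, size_t desc_off,
                             struct usdt_note *note);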

Signed-off-by: Jiawei Zhao <phoenix500526@163.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20250904030525.1932293-1-phoenix500526@163.com
7 weeks ago  Merge branch 'selftests-bpf-introduce-experimental-bpf_in_interrupt'
Alexei Starovoitov [Thu, 4 Sep 2025 16:05:58 +0000 (09:05 -0700)] 
Merge branch 'selftests-bpf-introduce-experimental-bpf_in_interrupt'

Leon Hwang says:

====================
selftests/bpf: Introduce experimental bpf_in_interrupt()

Filtering pid_tgid is meaningless when the current task is preempted by
an interrupt.

To address this, introduce 'bpf_in_interrupt()' helper function, which
allows BPF programs to determine whether they are executing in interrupt
context.

'get_preempt_count()':

* On x86, '*(int *) bpf_this_cpu_ptr(&__preempt_count)'.
* On arm64, 'bpf_get_current_task_btf()->thread_info.preempt.count'.

Then 'bpf_in_interrupt()' will be:

* If !PREEMPT_RT, 'get_preempt_count() & (NMI_MASK | HARDIRQ_MASK
  | SOFTIRQ_MASK)'.
* If PREEMPT_RT, '(get_preempt_count() & (NMI_MASK | HARDIRQ_MASK))
  | (bpf_get_current_task_btf()->softirq_disable_cnt & SOFTIRQ_MASK)'.

'bpf_in_interrupt()' runs well when PREEMPT_RT is enabled, but it's
difficult for me to test it thoroughly because I'm not familiar with
PREEMPT_RT.

Changes:
v2 -> v3:
* Address comments from Alexei:
  * Move bpf_in_interrupt() to bpf_experimental.h.
  * Add support for arm64.
v2: https://lore.kernel.org/bpf/20250825131502.54269-1-leon.hwang@linux.dev/

v1 -> v2:
* Fix a build error reported by test bot.
====================

Link: https://patch.msgid.link/20250903140438.59517-1-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks ago  selftests/bpf: Add case to test bpf_in_interrupt()
Leon Hwang [Wed, 3 Sep 2025 14:04:38 +0000 (22:04 +0800)] 
selftests/bpf: Add case to test bpf_in_interrupt()

Add a timer test case to test 'bpf_in_interrupt()'.

cd tools/testing/selftests/bpf
./test_progs -t timer_interrupt
462     timer_interrupt:OK
Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250903140438.59517-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks ago  selftests/bpf: Introduce experimental bpf_in_interrupt()
Leon Hwang [Wed, 3 Sep 2025 14:04:37 +0000 (22:04 +0800)] 
selftests/bpf: Introduce experimental bpf_in_interrupt()

Filtering pid_tgid is meaningless when the current task is preempted by
an interrupt.

To address this, introduce 'bpf_in_interrupt()' helper function, which
allows BPF programs to determine whether they are executing in interrupt
context.

'get_preempt_count()':

* On x86, '*(int *) bpf_this_cpu_ptr(&__preempt_count)'.
* On arm64, 'bpf_get_current_task_btf()->thread_info.preempt.count'.

Then 'bpf_in_interrupt()' will be:

* If !PREEMPT_RT, 'get_preempt_count() & (NMI_MASK | HARDIRQ_MASK
  | SOFTIRQ_MASK)'.
* If PREEMPT_RT, '(get_preempt_count() & (NMI_MASK | HARDIRQ_MASK))
  | (bpf_get_current_task_btf()->softirq_disable_cnt & SOFTIRQ_MASK)'.

Support for other archs can be added by updating 'get_preempt_count()'.
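
A minimal BPF-side sketch of the !PREEMPT_RT case on arm64, assuming an
arm64-generated vmlinux.h and locally defined mask constants mirroring the
kernel's preempt.h layout (the actual helper lives in bpf_experimental.h and
also covers x86 and PREEMPT_RT):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  /* Assumed values, mirroring include/linux/preempt.h. */
  #define SOFTIRQ_MASK 0x0000ff00
  #define HARDIRQ_MASK 0x000f0000
  #define NMI_MASK     0x00f00000

  static __always_inline bool bpf_in_interrupt(void)
  {
          /* arm64 keeps the preempt count in thread_info. */
          int pc = bpf_get_current_task_btf()->thread_info.preempt.count;

          return pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_MASK);
  }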

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20250903140438.59517-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks ago  bpf, arm64: Remove duplicated bpf_flush_icache()
Hengqi Chen [Thu, 4 Sep 2025 07:57:03 +0000 (07:57 +0000)] 
bpf, arm64: Remove duplicated bpf_flush_icache()

The bpf_flush_icache() is done by bpf_arch_text_copy() already.
Remove the duplicated one in arch_prepare_bpf_trampoline().

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Acked-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20250904075703.49404-1-hengqi.chen@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks ago  selftests/bpf: Test kfunc bpf_strcasecmp
Rong Tao [Tue, 2 Sep 2025 23:48:36 +0000 (07:48 +0800)] 
selftests/bpf: Test kfunc bpf_strcasecmp

Add testsuites for kfunc bpf_strcasecmp.

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Link: https://lore.kernel.org/r/tencent_81A1A0ACC04B68158C57C4D151C46A832B07@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks ago  bpf: add bpf_strcasecmp kfunc
Rong Tao [Tue, 2 Sep 2025 23:47:10 +0000 (07:47 +0800)] 
bpf: add bpf_strcasecmp kfunc

The bpf_strcasecmp() function performs the same comparison as bpf_strcmp(),
except that it ignores the case of the characters.
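
A hedged usage sketch from the BPF side; the kfunc declaration is assumed to
mirror bpf_strcmp() and to be available to the chosen program type (check the
kernel's kfunc prototypes for the authoritative signature):

  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>

  /* Assumed declaration, modeled on bpf_strcmp(). */
  extern int bpf_strcasecmp(const char *s1__ign, const char *s2__ign) __weak __ksym;

  char LICENSE[] SEC("license") = "GPL";

  SEC("syscall")
  int check_name(void *ctx)
  {
          char got[] = "Eth0";
          char want[] = "eth0";

          /* Returns 0 when the strings match, ignoring case. */
          return bpf_strcasecmp(got, want);
  }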

Signed-off-by: Rong Tao <rongtao@cestc.cn>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Viktor Malik <vmalik@redhat.com>
Link: https://lore.kernel.org/r/tencent_292BD3682A628581AA904996D8E59F4ACD06@qq.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks ago  Merge branch 'selftests-bpf-benchmark-all-symbols-for-kprobe-multi'
Alexei Starovoitov [Thu, 4 Sep 2025 16:00:26 +0000 (09:00 -0700)] 
Merge branch 'selftests-bpf-benchmark-all-symbols-for-kprobe-multi'

Menglong Dong says:

====================
selftests/bpf: benchmark all symbols for kprobe-multi

Add the benchmark testcase "kprobe-multi-all", which will hook all the
kernel functions during the testing.

This series is separated out from [1].

Changes since V2:
* add a comment to attach_ksyms_all noting that the testing should not be
  run on a debug kernel

Changes since V1:
* introduce trace_blacklist instead of copy-pasting strcmp in the 2nd
  patch
* use fprintf() instead of printf() in 3rd patch

Link: https://lore.kernel.org/bpf/20250817024607.296117-1-dongml2@chinatelecom.cn/
====================

Link: https://patch.msgid.link/20250904021011.14069-1-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks ago  selftests/bpf: add benchmark testing for kprobe-multi-all
Menglong Dong [Thu, 4 Sep 2025 02:10:11 +0000 (10:10 +0800)] 
selftests/bpf: add benchmark testing for kprobe-multi-all

For now, the benchmark for kprobe-multi is single, which means only one
function is hooked during testing. Add the test "kprobe-multi-all", which
will hook all kernel functions during the benchmark. The
"kretprobe-multi-all" test is added too.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250904021011.14069-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks ago  selftests/bpf: skip recursive functions for kprobe_multi
Menglong Dong [Thu, 4 Sep 2025 02:10:10 +0000 (10:10 +0800)] 
selftests/bpf: skip recursive functions for kprobe_multi

Some functions recurse when hooked by kprobe_multi and impact the benchmark
results, so just skip them.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250904021011.14069-3-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks ago  selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c
Menglong Dong [Thu, 4 Sep 2025 02:10:09 +0000 (10:10 +0800)] 
selftests/bpf: move get_ksyms and get_addrs to trace_helpers.c

We sometimes need to get all the kernel functions that can be traced, so
move get_syms() and get_addrs() from kprobe_multi_test.c to trace_helpers.c
and rename them to bpf_get_ksyms() and bpf_get_addrs().

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250904021011.14069-2-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
7 weeks ago  selftests/bpf: Fix count write in testapp_xdp_metadata_copy()
Ricardo B. Marlière [Fri, 29 Aug 2025 19:33:49 +0000 (16:33 -0300)] 
selftests/bpf: Fix count write in testapp_xdp_metadata_copy()

Commit 4b302092553c ("selftests/xsk: Add tail adjustment tests and support
check") added a new global to xsk_xdp_progs.c, but did not update the access
in the testapp_xdp_metadata_copy() function. Since bpf_map_update_elem()
writes to the whole .bss section, it gets truncated. Fix by writing to
skel_rx->bss->count directly.

Fixes: 4b302092553c ("selftests/xsk: Add tail adjustment tests and support check")
Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250829-selftests-bpf-xsk_regression_fix-v1-1-5f5acdb9fe6b@suse.com
7 weeks ago  selftests/bpf: Upon failures, exit with code 1 in test_xsk.sh
Ricardo B. Marlière [Thu, 28 Aug 2025 18:48:30 +0000 (15:48 -0300)] 
selftests/bpf: Upon failures, exit with code 1 in test_xsk.sh

Currently, even if some subtests fail, the end result will still yield
"ok 1 selftests: bpf: test_xsk.sh". Fix it by exiting with 1 if there are
any failures.

Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/bpf/20250828-selftests-bpf-test_xsk_ret-v1-1-e6656c01f397@suse.com
7 weeks ago  bpf: Replace kvfree with kfree for kzalloc memory
Feng Yang [Wed, 27 Aug 2025 03:28:12 +0000 (11:28 +0800)] 
bpf: Replace kvfree with kfree for kzalloc memory

These pointers are allocated by kzalloc. Therefore, replace kvfree() with
kfree() to avoid the unnecessary is_vmalloc_addr() check in kvfree(). This is
the remaining unmodified part from [1].
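
A minimal illustration of the pairing rule, with a hypothetical structure and
function name:

  #include <linux/errno.h>
  #include <linux/slab.h>

  struct example { int v; };

  static int example_alloc_free(void)
  {
          struct example *p = kzalloc(sizeof(*p), GFP_KERNEL);

          if (!p)
                  return -ENOMEM;
          /* ... use p ... */
          kfree(p); /* was kvfree(p); kvfree() is only needed for kvmalloc() memory */
          return 0;
  }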

Signed-off-by: Feng Yang <yangfeng@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250811123949.552885-1-rongqianfeng@vivo.com
Link: https://lore.kernel.org/bpf/20250827032812.498216-1-yangfeng59949@163.com
7 weeks ago  bpftool: Add CET-aware symbol matching for x86_64 architectures
Yuan Chen [Fri, 29 Aug 2025 06:11:07 +0000 (07:11 +0100)] 
bpftool: Add CET-aware symbol matching for x86_64 architectures

Adjust symbol matching logic to account for Control-flow Enforcement
Technology (CET) on x86_64 systems. CET prefixes functions with
a 4-byte 'endbr' instruction, shifting the actual hook entry point to
symbol + 4.
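
A hedged sketch of the idea, with hypothetical variable names rather than
bpftool's actual code:

  #include <stdbool.h>
  #include <stdint.h>

  /* On x86_64 with CET, functions start with a 4-byte endbr64 instruction, so
   * a hooked address may sit at symbol + 4 rather than at the symbol itself. */
  static bool addr_matches_symbol(uint64_t addr, uint64_t sym_addr, bool cet_enabled)
  {
          if (addr == sym_addr)
                  return true;
          return cet_enabled && addr == sym_addr + 4;
  }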

Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <qmo@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250829061107.23905-3-chenyuan_fl@163.com
7 weeks ago  bpftool: Refactor kernel config reading into common helper
Yuan Chen [Fri, 29 Aug 2025 06:11:06 +0000 (07:11 +0100)] 
bpftool: Refactor kernel config reading into common helper

Extract the kernel configuration file parsing logic from feature.c into
a new read_kernel_config() function in common.c. This includes:

1. Moving the config file handling and option parsing code
2. Adding required headers and struct definition
3. Keeping all existing functionality

The refactoring enables sharing this logic with other components while
maintaining current behavior. This will be used by subsequent patches
that need to check kernel config options.

Signed-off-by: Yuan Chen <chenyuan@kylinos.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Quentin Monnet <qmo@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250829061107.23905-2-chenyuan_fl@163.com
8 weeks ago  selftests/bpf: Fix bpf_prog_detach2 usage in test_lirc_mode2
Ricardo B. Marlière [Thu, 28 Aug 2025 13:12:33 +0000 (10:12 -0300)] 
selftests/bpf: Fix bpf_prog_detach2 usage in test_lirc_mode2

Commit e9fc3ce99b34 ("libbpf: Streamline error reporting for high-level
APIs") redefined the way that bpf_prog_detach2() returns. Therefore, adapt
the usage in test_lirc_mode2_user.c.

Signed-off-by: Ricardo B. Marlière <rbm@suse.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250828-selftests-bpf-v1-1-c7811cd8b98c@suse.com
8 weeks ago  selftests/bpf: Add LPM trie microbenchmarks
Matt Fleming [Wed, 27 Aug 2025 14:01:49 +0000 (15:01 +0100)] 
selftests/bpf: Add LPM trie microbenchmarks

Add benchmarks for the standard set of operations: LOOKUP, INSERT,
UPDATE, DELETE. Also include benchmarks to measure the overhead of the
bench framework itself (NOOP) as well as the overhead of generating keys
(BASELINE). Lastly, this includes a benchmark for FREE (trie_free())
which is known to have terrible performance for maps with many entries.

Benchmarks operate on tries without gaps in the key range, i.e. each
test begins or ends with a trie with valid keys in the range [0,
nr_entries). This is intended to cause maximum branching when traversing
the trie.

LOOKUP, UPDATE, DELETE, and FREE fill a BPF LPM trie from userspace
using bpf_map_update_batch() and run the corresponding benchmark
operation via bpf_loop(). INSERT starts with an empty map and fills it
kernel-side from bpf_loop(). FREE records the time to free a filled LPM
trie by attaching and destroying a BPF prog. NOOP measures the overhead
of the test harness by running an empty function with bpf_loop().
BASELINE is similar to NOOP except that the function generates a key.
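
A hedged sketch of the userspace fill step mentioned above, with an
illustrative key layout (the benchmark's actual types and options may differ):

  #include <bpf/bpf.h>
  #include <stdlib.h>

  /* Illustrative LPM key: prefix length followed by 32 bits of data. */
  struct lpm_key { __u32 prefixlen; __u32 data; };

  static int fill_trie(int map_fd, __u32 nr_entries)
  {
          struct lpm_key *keys = calloc(nr_entries, sizeof(*keys));
          __u64 *vals = calloc(nr_entries, sizeof(*vals));
          __u32 count = nr_entries;
          int err = -1;

          if (!keys || !vals)
                  goto out;
          for (__u32 i = 0; i < nr_entries; i++) {
                  keys[i].prefixlen = 32;
                  keys[i].data = i;
                  vals[i] = i;
          }
          /* One batched syscall instead of nr_entries individual updates. */
          err = bpf_map_update_batch(map_fd, keys, vals, &count, NULL);
  out:
          free(keys);
          free(vals);
          return err;
  }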

Each operation runs 10,000 times using bpf_loop(). Note that this value
is intentionally independent of the number of entries in the LPM trie so
that the stability of the results isn't affected by the number of
entries.

For those benchmarks that need to reset the LPM trie once it's full
(INSERT) or empty (DELETE), throughput and latency results are scaled by
the fraction of a second the operation actually ran to ignore any time
spent reinitialising the trie.

By default, benchmarks run using sequential keys in the range [0,
nr_entries). BASELINE, LOOKUP, and UPDATE can use random keys via the
--random parameter but beware there is a runtime cost involved in
generating random keys. Other benchmarks are prohibited from using
random keys because it can skew the results, e.g. when inserting an
existing key or deleting a missing one.

All measurements are recorded from within the kernel to eliminate
syscall overhead. Most benchmarks run an XDP program to generate stats
but FREE needs to collect latencies using fentry/fexit on
map_free_deferred() because it's not possible to use fentry directly on
lpm_trie.c since commit c83508da5620 ("bpf: Avoid deadlock caused by
nested kprobe and fentry bpf programs") and there's no way to
create/destroy a map from within an XDP program.

Here is example output from an AMD EPYC 9684X 96-Core machine for each
of the benchmarks using a trie with 10K entries and a 32-bit prefix
length, e.g.

  $ ./bench lpm-trie-$op \
        --prefix_len=32 \
        --producers=1 \
        --nr_entries=10000

     noop: throughput   74.417 ± 0.032 M ops/s ( 74.417M ops/prod), latency   13.438 ns/op
 baseline: throughput   70.107 ± 0.171 M ops/s ( 70.107M ops/prod), latency   14.264 ns/op
   lookup: throughput    8.467 ± 0.047 M ops/s (  8.467M ops/prod), latency  118.109 ns/op
   insert: throughput    2.440 ± 0.015 M ops/s (  2.440M ops/prod), latency  409.290 ns/op
   update: throughput    2.806 ± 0.042 M ops/s (  2.806M ops/prod), latency  356.322 ns/op
   delete: throughput    4.625 ± 0.011 M ops/s (  4.625M ops/prod), latency  215.613 ns/op
     free: throughput    0.578 ± 0.006 K ops/s (  0.578K ops/prod), latency    1.730 ms/op

And the same benchmarks using random keys:

  $ ./bench lpm-trie-$op \
        --prefix_len=32 \
        --producers=1 \
        --nr_entries=10000 \
        --random

     noop: throughput   74.259 ± 0.335 M ops/s ( 74.259M ops/prod), latency   13.466 ns/op
 baseline: throughput   35.150 ± 0.144 M ops/s ( 35.150M ops/prod), latency   28.450 ns/op
   lookup: throughput    7.119 ± 0.048 M ops/s (  7.119M ops/prod), latency  140.469 ns/op
   insert: N/A
   update: throughput    2.736 ± 0.012 M ops/s (  2.736M ops/prod), latency  365.523 ns/op
   delete: N/A
     free: N/A

Signed-off-by: Matt Fleming <mfleming@cloudflare.com>
Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
Link: https://lore.kernel.org/r/20250827140149.1001557-1-matt@readmodwrite.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 weeks ago  Merge branch 'bpf-arm64-support-for-timed-may_goto'
Alexei Starovoitov [Thu, 28 Aug 2025 00:16:22 +0000 (17:16 -0700)] 
Merge branch 'bpf-arm64-support-for-timed-may_goto'

Puranjay Mohan says:

====================
bpf, arm64: support for timed may_goto

Changes in v2->v3:
v2: https://lore.kernel.org/all/20250809204833.44803-1-puranjay@kernel.org/
- Rebased on bpf-next/master
- Added Acked-by: tags from Xu and Kumar

Changes in v1->v2:
v1: https://lore.kernel.org/bpf/20250724125443.26182-1-puranjay@kernel.org/
- Added comment in arch_bpf_timed_may_goto() about BPF_REG_FP setup (Xu
  Kuohai)

This set adds support for the timed may_goto instruction for the arm64.
The timed may_goto instruction is implemented by the verifier by
reserving two 8-byte slots in the program stack and then calling
arch_bpf_timed_may_goto() in a loop with the stack offset of these two
slots in BPF_REG_AX. It expects the function to put a timestamp in the
first slot and the returned count in BPF_REG_AX is put into the second
slot by a store instruction emitted by the verifier.

arch_bpf_timed_may_goto() is special as it receives the parameter in
BPF_REG_AX and is expected to return the result in BPF_REG_AX as well.
It can't clobber any caller-saved registers because the verifier doesn't
save anything before emitting the call.

So, arch_bpf_timed_may_goto() is implemented in assembly so the exact
registers that are stored/restored can be controlled (BPF caller saved
registers here) and it also needs to take care of moving arguments and
return values to and from BPF_REG_AX <-> arm64 R0.

So, arch_bpf_timed_may_goto() acts as a trampoline to call
bpf_check_timed_may_goto() which does the main logic of placing the
timestamp and returning the count.

All tests that use the may_goto instruction pass after changing some of
them in patch 2:

 #404     stream_errors:OK
 [...]
 #406/2   stream_success/stream_cond_break:OK
 [...]
 #494/23  verifier_bpf_fastcall/may_goto_interaction_x86_64:SKIP
 #494/24  verifier_bpf_fastcall/may_goto_interaction_arm64:OK
 [...]
 #539/1   verifier_may_goto_1/may_goto 0:OK
 #539/2   verifier_may_goto_1/batch 2 of may_goto 0:OK
 #539/3   verifier_may_goto_1/may_goto batch with offsets 2/1/0:OK
 #539/4   verifier_may_goto_1/may_goto batch with offsets 2/0:OK
 #539     verifier_may_goto_1:OK
 #540/1   verifier_may_goto_2/C code with may_goto 0:OK
 #540     verifier_may_goto_2:OK
 Summary: 7/16 PASSED, 25 SKIPPED, 0 FAILED
====================

Link: https://patch.msgid.link/20250827113245.52629-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 weeks ago  selftests/bpf: Enable timed may_goto tests for arm64
Puranjay Mohan [Wed, 27 Aug 2025 11:32:44 +0000 (11:32 +0000)] 
selftests/bpf: Enable timed may_goto tests for arm64

As the arm64 JIT now supports the timed may_goto instruction, make sure all
relevant tests run on this architecture. Some tests were enabled and
others required modifications to work properly on arm64.

 $ ./test_progs -a "stream*","*may_goto*",verifier_bpf_fastcall

 #404     stream_errors:OK
 [...]
 #406/2   stream_success/stream_cond_break:OK
 [...]
 #494/23  verifier_bpf_fastcall/may_goto_interaction_x86_64:SKIP
 #494/24  verifier_bpf_fastcall/may_goto_interaction_arm64:OK
 [...]
 #539/1   verifier_may_goto_1/may_goto 0:OK
 #539/2   verifier_may_goto_1/batch 2 of may_goto 0:OK
 #539/3   verifier_may_goto_1/may_goto batch with offsets 2/1/0:OK
 #539/4   verifier_may_goto_1/may_goto batch with offsets 2/0:OK
 #539     verifier_may_goto_1:OK
 #540/1   verifier_may_goto_2/C code with may_goto 0:OK
 #540     verifier_may_goto_2:OK
 Summary: 7/16 PASSED, 25 SKIPPED, 0 FAILED

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20250827113245.52629-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
8 weeks ago  bpf, arm64: Add JIT support for timed may_goto
Puranjay Mohan [Wed, 27 Aug 2025 11:32:43 +0000 (11:32 +0000)] 
bpf, arm64: Add JIT support for timed may_goto

When the verifier sees a timed may_goto instruction, it emits a call to
arch_bpf_timed_may_goto() with a stack offset in BPF_REG_AX (arm64 r9)
and expects a count value to be returned in the same register. The
verifier doesn't save or restore any registers before emitting this
call.

arch_bpf_timed_may_goto() should act as a trampoline to call
bpf_check_timed_may_goto() with AAPCS64 calling convention.

To support this custom calling convention, implement
arch_bpf_timed_may_goto() in assembly and make sure the BPF caller-saved
registers are saved and restored, call bpf_check_timed_may_goto() with the
arm64 calling convention where the first argument and return value are both
in x0, then put the result back into BPF_REG_AX before returning.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Xu Kuohai <xukuohai@huawei.com>
Link: https://lore.kernel.org/r/20250827113245.52629-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  Merge branch 'libbpf-fix-usdt-sib-argument-handling-causing-unrecognized-register...'
Andrii Nakryiko [Wed, 27 Aug 2025 22:41:05 +0000 (15:41 -0700)] 
Merge branch 'libbpf-fix-usdt-sib-argument-handling-causing-unrecognized-register-error'

Jiawei Zhao says:

====================
libbpf: fix USDT SIB argument handling causing unrecognized register error

When using GCC on x86-64 to compile a USDT prog with -O1 or higher
optimization, the compiler will generate SIB addressing mode for global
arrays, e.g. "1@-96(%rbp,%rax,8)".

The current USDT implementation in libbpf cannot parse this format,
causing `bpf_program__attach_usdt()` to fail with -ENOENT
(unrecognized register).

This patch series adds support for SIB addressing mode in USDT probes.
The main changes include:
- add correct handling logic for SIB-addressed arguments in
  `parse_usdt_arg`.
- add an usdt_o2 test case to cover SIB addressing mode.

Testing shows that the SIB probe correctly generates 8@(%rcx,%rax,8)
argument spec and passes all validation checks.

The modification history of this patch series:
Change since v1:
- refactor the code to make it more readable
- modify the commit message to explain why and how

Change since v2:
- fix the `scale` uninitialized error

Change since v3:
- force -O2 optimization for usdt.test.o to generate SIB addressing usdt
  and pass all test cases.

Change since v4:
- split the patch into two parts, one for the fix and the other for the
  test

Change since v5:
- Only enable optimization for x86 architecture to generate SIB addressing
  usdt argument spec.

Change since v6:
- Add an usdt_o2 test case to cover SIB addressing mode.
- Reinstate the usdt.c test case.

Change since v7:
- Refactor modifications to __bpf_usdt_arg_spec to avoid increasing its size,
  achieving better compatibility
- Fix some minor code style issues
- Refactor the usdt_o2 test case, removing semaphore and adding GCC attribute
  to force -O2 optimization

Change since v8:
- Refactor the usdt_o2 test case, using assembly to force SIB addressing mode.

Change since v9:
- Only enable the usdt_o2 test case on x86_64 and i386 architectures since the
  SIB addressing mode is only supported on x86_64 and i386.

Change since v10:
- Replace `__attribute__((optimize("O2")))` with `#pragma GCC optimize("O1")`
  to fix the issue where the optimized compilation condition works improperly.
- Renamed test case usdt_o2 and relevant files name to usdt_o1 in that O1
  level optimization is enough to generate SIB addressing usdt argument spec.

Change since v11:
- Replace `STAP_PROBE1` with `STAP_PROBE_ASM`
- Use bit fields instead of bit shifting operations
- Merge the usdt_o1 test case into the usdt test case

Change since v12:
- This patch is same with the v12 but with a new version number.

Change since v13(resolve some review comments):
- https://lore.kernel.org/bpf/CAEf4BzZWd2zUC=U6uGJFF3EMZ7zWGLweQAG3CJWTeHy-5yFEPw@mail.gmail.com/
- https://lore.kernel.org/bpf/CAEf4Bzbs3hV_Q47+d93tTX13WkrpkpOb4=U04mZCjHyZg4aVdw@mail.gmail.com/

Change since v14:
- fix a typo in __bpf_usdt_arg_spec

Change since v15(resolve some review comments):
- https://lore.kernel.org/bpf/CAEf4BzaxuYijEfQMDFZ+CQdjxLuDZiesUXNA-SiopS+5+VxRaA@mail.gmail.com/
- https://lore.kernel.org/bpf/CAEf4BzaHi5kpuJ6OVvDU62LT5g0qHbWYMfb_FBQ3iuvvUF9fag@mail.gmail.com/
- https://lore.kernel.org/bpf/d438bf3a-a9c9-4d34-b814-63f2e9bb3a85@linux.dev/
====================

Link: https://patch.msgid.link/20250827053128.1301287-1-phoenix500526@163.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2 months ago  selftests/bpf: Enrich subtest_basic_usdt case in selftests to cover SIB handling...
Jiawei Zhao [Wed, 27 Aug 2025 05:31:28 +0000 (05:31 +0000)] 
selftests/bpf: Enrich subtest_basic_usdt case in selftests to cover SIB handling logic

When using GCC on x86-64 to compile a USDT prog with -O1 or higher
optimization, the compiler will generate SIB addressing mode for global
arrays, e.g. "1@-96(%rbp,%rax,8)".

In this patch:
- enrich the subtest_basic_usdt test case to cover the SIB-addressed USDT
  argument spec handling logic

Signed-off-by: Jiawei Zhao <phoenix500526@163.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250827053128.1301287-3-phoenix500526@163.com
2 months ago  libbpf: Fix USDT SIB argument handling causing unrecognized register error
Jiawei Zhao [Wed, 27 Aug 2025 05:31:27 +0000 (05:31 +0000)] 
libbpf: Fix USDT SIB argument handling causing unrecognized register error

On x86-64, USDT arguments can be specified using Scale-Index-Base (SIB)
addressing, e.g. "1@-96(%rbp,%rax,8)". The current USDT implementation
in libbpf cannot parse this format, causing `bpf_program__attach_usdt()`
to fail with -ENOENT (unrecognized register).

This patch fixes this by implementing the necessary changes:
- add correct handling for SIB-addressed arguments in `bpf_usdt_arg`.
- add adaptive support to `__bpf_usdt_arg_type` and
  `__bpf_usdt_arg_spec` to represent SIB addressing parameters.
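
For reference, a hedged sketch of what a SIB-addressed argument spec denotes,
using illustrative parameter names rather than libbpf's internal types:

  #include <stdint.h>

  /* "1@-96(%rbp,%rax,8)" means: read a 1-byte value at rbp + rax * 8 - 96,
   * i.e. base + index * scale + offset. */
  static uint64_t usdt_sib_addr(uint64_t base_reg, uint64_t index_reg,
                                uint8_t scale, int64_t offset)
  {
          return base_reg + index_reg * scale + offset;
  }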

Signed-off-by: Jiawei Zhao <phoenix500526@163.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250827053128.1301287-2-phoenix500526@163.com
2 months ago  selftests/bpf: Fix typos and grammar in test sources
Shubham Sharma [Tue, 26 Aug 2025 12:57:46 +0000 (18:27 +0530)] 
selftests/bpf: Fix typos and grammar in test sources

Fix spelling typos and grammar errors in BPF selftests source code.

Signed-off-by: Shubham Sharma <slopixelz@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250826125746.17983-1-slopixelz@gmail.com
2 months ago  bpf: Add selftest to check the verifier's abstract multiplication
Nandakumar Edamana [Tue, 26 Aug 2025 03:45:24 +0000 (09:15 +0530)] 
bpf: Add selftest to check the verifier's abstract multiplication

Add new selftest to test the abstract multiplication technique(s) used
by the verifier, following the recent improvement in tnum
multiplication (tnum_mul). One of the newly added programs,
verifier_mul/mul_precise, results in a false positive with the old
tnum_mul, while the program passes with the latest one.

Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250826034524.2159515-2-nandakumar@nandakumar.co.in
2 months ago  bpf: Improve the general precision of tnum_mul
Nandakumar Edamana [Tue, 26 Aug 2025 03:45:23 +0000 (09:15 +0530)] 
bpf: Improve the general precision of tnum_mul

Drop the value-mask decomposition technique and adopt straightforward
long-multiplication with a twist: when LSB(a) is uncertain, find the
two partial products (for LSB(a) = known 0 and LSB(a) = known 1) and
take a union.

Experiments show that applying this technique in long multiplication
improves the precision in a significant number of cases (at the cost
of losing precision in a relatively small number of cases).

Signed-off-by: Nandakumar Edamana <nandakumar@nandakumar.co.in>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Reviewed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250826034524.2159515-1-nandakumar@nandakumar.co.in
2 months ago  Merge branch 's390-bpf-add-s390-jit-support-for-timed-may_goto'
Alexei Starovoitov [Tue, 26 Aug 2025 23:51:52 +0000 (16:51 -0700)] 
Merge branch 's390-bpf-add-s390-jit-support-for-timed-may_goto'

Ilya Leoshkevich says:

====================
s390/bpf: Add s390 JIT support for timed may_goto

v1: https://lore.kernel.org/bpf/20250821103256.291412-1-iii@linux.ibm.com/
v1 -> v2: Fix test_stream_errors (caught by CI).

This series adds timed may_goto implementation to the s390x JIT.
Patch 1 is the implementation itself, patches 2-5 are the associated
test changes.
====================

Link: https://patch.msgid.link/20250821113339.292434-1-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  selftests/bpf: Remove may_goto tests from DENYLIST.s390x
Ilya Leoshkevich [Thu, 21 Aug 2025 11:25:59 +0000 (13:25 +0200)] 
selftests/bpf: Remove may_goto tests from DENYLIST.s390x

The may_goto instruction is now fully supported on s390x, including the
timed implementation, so remove the respective test from the denylist.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-6-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  selftests/bpf: Enable timed may_goto verifier tests on s390x
Ilya Leoshkevich [Thu, 21 Aug 2025 11:25:58 +0000 (13:25 +0200)] 
selftests/bpf: Enable timed may_goto verifier tests on s390x

Now that the timed may_goto implementation is available on s390x,
enable the respective verifier tests.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-5-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  selftests/bpf: Add __arch_s390x macro
Ilya Leoshkevich [Thu, 21 Aug 2025 11:25:57 +0000 (13:25 +0200)] 
selftests/bpf: Add __arch_s390x macro

Make it possible to limit certain tests to s390x, just like it's
already done for x86_64, arm64, and riscv64.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-4-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  selftests/bpf: Add a missing newline to the "bad arch spec" message
Ilya Leoshkevich [Thu, 21 Aug 2025 11:25:56 +0000 (13:25 +0200)] 
selftests/bpf: Add a missing newline to the "bad arch spec" message

Fix error messages like this one:

  parse_test_spec:FAIL:569 bad arch spec: 's390x'process_subtest:FAIL:1153 Can't parse test spec for program 'may_goto_simple'

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-3-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  s390/bpf: Add s390 JIT support for timed may_goto
Ilya Leoshkevich [Thu, 21 Aug 2025 11:25:55 +0000 (13:25 +0200)] 
s390/bpf: Add s390 JIT support for timed may_goto

The verifier provides an architecture-independent implementation of the
may_goto instruction, which is currently used on s390x, but it has a
downside: there is no way to prevent progs using it from running for a
very long time.

The solution to this problem is an alternative timed implementation,
which requires architecture-specific bits. Its availability is signaled
to the verifier by bpf_jit_supports_timed_may_goto() returning true.

The verifier then emits a call to arch_bpf_timed_may_goto() using a
non-standard calling convention. This function must act as a trampoline
for bpf_check_timed_may_goto().

Implement bpf_jit_supports_timed_may_goto(), account for the special
calling convention in the BPF_CALL implementation, and implement
arch_bpf_timed_may_goto().

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Link: https://lore.kernel.org/r/20250821113339.292434-2-iii@linux.ibm.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  selftests/bpf: Remove entries from config.{arch} already present in config
Tiezhu Yang [Tue, 26 Aug 2025 06:50:57 +0000 (14:50 +0800)] 
selftests/bpf: Remove entries from config.{arch} already present in config

`config.{arch}` had entries already present in `config`.

When generating the config used by vmtest, concatenate the `config` file
with the `config.{arch}` one, making those entries duplicated, so remove
those duplications.

Use the following command to get the differences:

  $ comm -1 -2  <(sort tools/testing/selftests/bpf/config.x86_64) <(sort tools/testing/selftests/bpf/config)
  $ comm -1 -2  <(sort tools/testing/selftests/bpf/config.aarch64) <(sort tools/testing/selftests/bpf/config)
  $ comm -1 -2  <(sort tools/testing/selftests/bpf/config.riscv64) <(sort tools/testing/selftests/bpf/config)
  $ comm -1 -2  <(sort tools/testing/selftests/bpf/config.ppc64el) <(sort tools/testing/selftests/bpf/config)
  $ comm -1 -2  <(sort tools/testing/selftests/bpf/config.s390x) <(sort tools/testing/selftests/bpf/config)

This is similar to commit 7a42af4b94f1 ("selftests/bpf: Remove entries
from config.s390x already present in config").

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250826065057.11415-1-yangtiezhu@loongson.cn
2 months ago  Merge branch 'bpf-introduce-and-use-rcu_read_lock_dont_migrate'
Alexei Starovoitov [Tue, 26 Aug 2025 01:52:17 +0000 (18:52 -0700)] 
Merge branch 'bpf-introduce-and-use-rcu_read_lock_dont_migrate'

Menglong Dong says:

====================
bpf: introduce and use rcu_read_lock_dont_migrate

migrate_disable() and rcu_read_lock() are used together in many cases in
bpf. However, when PREEMPT_RCU is not enabled, rcu_read_lock() will
disable preemption, which implies migrate_disable(), so we don't need to
call it in this case.

In this series, we introduce rcu_read_lock_dont_migrate() and
rcu_read_unlock_migrate(), which call migrate_disable() and
migrate_enable() only when PREEMPT_RCU is enabled, and use
rcu_read_lock_dont_migrate() in the bpf subsystem.

Changes since V2:
* make rcu_read_lock_dont_migrate() more compatible by using IS_ENABLED()

Changes since V1:
* introduce rcu_read_lock_dont_migrate() instead of
  rcu_migrate_disable() + rcu_read_lock()
====================

Link: https://patch.msgid.link/20250821090609.42508-1-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  bpf: use rcu_read_lock_dont_migrate() for trampoline.c
Menglong Dong [Thu, 21 Aug 2025 09:06:09 +0000 (17:06 +0800)] 
bpf: use rcu_read_lock_dont_migrate() for trampoline.c

Use rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() in
trampoline.c to obtain better performance when PREEMPT_RCU is not enabled.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250821090609.42508-8-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  bpf: use rcu_read_lock_dont_migrate() for bpf_prog_run_array_cg()
Menglong Dong [Thu, 21 Aug 2025 09:06:08 +0000 (17:06 +0800)] 
bpf: use rcu_read_lock_dont_migrate() for bpf_prog_run_array_cg()

Use rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() in
bpf_prog_run_array_cg to obtain better performance when PREEMPT_RCU is
not enabled.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250821090609.42508-7-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  bpf: use rcu_read_lock_dont_migrate() for bpf_task_storage_free()
Menglong Dong [Thu, 21 Aug 2025 09:06:07 +0000 (17:06 +0800)] 
bpf: use rcu_read_lock_dont_migrate() for bpf_task_storage_free()

Use rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() in
bpf_task_storage_free to obtain better performance when PREEMPT_RCU is
not enabled.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250821090609.42508-6-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  bpf: use rcu_read_lock_dont_migrate() for bpf_iter_run_prog()
Menglong Dong [Thu, 21 Aug 2025 09:06:06 +0000 (17:06 +0800)] 
bpf: use rcu_read_lock_dont_migrate() for bpf_iter_run_prog()

Use rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() in
bpf_iter_run_prog to obtain better performance when PREEMPT_RCU is
not enabled.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250821090609.42508-5-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  bpf: use rcu_read_lock_dont_migrate() for bpf_inode_storage_free()
Menglong Dong [Thu, 21 Aug 2025 09:06:05 +0000 (17:06 +0800)] 
bpf: use rcu_read_lock_dont_migrate() for bpf_inode_storage_free()

Use rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() in
bpf_inode_storage_free to obtain better performance when PREEMPT_RCU is
not enabled.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250821090609.42508-4-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  bpf: use rcu_read_lock_dont_migrate() for bpf_cgrp_storage_free()
Menglong Dong [Thu, 21 Aug 2025 09:06:04 +0000 (17:06 +0800)] 
bpf: use rcu_read_lock_dont_migrate() for bpf_cgrp_storage_free()

Use rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate() in
bpf_cgrp_storage_free to obtain better performance when PREEMPT_RCU is
not enabled.

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Link: https://lore.kernel.org/r/20250821090609.42508-3-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  rcu: add rcu_read_lock_dont_migrate()
Menglong Dong [Thu, 21 Aug 2025 09:06:03 +0000 (17:06 +0800)] 
rcu: add rcu_read_lock_dont_migrate()

migrate_disable() is called to disable migration in the kernel, and it is
often used together with rcu_read_lock().

However, with PREEMPT_RCU disabled, it's unnecessary, as rcu_read_lock()
will always disable preemption, which will also disable migration.

Introduce rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate(),
which disable and re-enable migration only when PREEMPT_RCU is enabled.
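
A hedged sketch of the helpers as described above (the actual definitions live
in the RCU headers and may differ in detail):

  #include <linux/preempt.h>
  #include <linux/rcupdate.h>

  /* Only pay the migrate_disable() cost when PREEMPT_RCU is enabled; otherwise
   * rcu_read_lock() already disables preemption, which disables migration. */
  static inline void rcu_read_lock_dont_migrate(void)
  {
          if (IS_ENABLED(CONFIG_PREEMPT_RCU))
                  migrate_disable();
          rcu_read_lock();
  }

  static inline void rcu_read_unlock_migrate(void)
  {
          rcu_read_unlock();
          if (IS_ENABLED(CONFIG_PREEMPT_RCU))
                  migrate_enable();
  }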

Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn>
Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Link: https://lore.kernel.org/r/20250821090609.42508-2-dongml2@chinatelecom.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2 months ago  bpf: Remove preempt_disable in bpf_try_get_buffers
Tao Chen [Tue, 19 Aug 2025 12:56:38 +0000 (20:56 +0800)] 
bpf: Remove preempt_disable in bpf_try_get_buffers

Now BPF programs run with migration disabled, so it is safe
to access this_cpu_inc_return(bpf_bprintf_nest_level).

Fixes: d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250819125638.2544715-1-chen.dylane@linux.dev
2 months ago  bpf: Use sha1() instead of sha1_transform() in bpf_prog_calc_tag()
Eric Biggers [Mon, 11 Aug 2025 20:16:15 +0000 (13:16 -0700)] 
bpf: Use sha1() instead of sha1_transform() in bpf_prog_calc_tag()

Now that there's a proper SHA-1 library API, just use that instead of
the low-level SHA-1 compression function.  This eliminates the need for
bpf_prog_calc_tag() to implement the SHA-1 padding itself.  No
functional change; the computed tags remain the same.
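
A hedged sketch of the simplification, assuming the one-shot
sha1(data, len, out) library entry point and an illustrative, already-prepared
buffer (the real code hashes the canonicalized instruction image):

  #include <crypto/sha1.h>
  #include <linux/filter.h>
  #include <linux/string.h>

  /* Illustrative only: hash a prepared buffer and take the first BPF_TAG_SIZE
   * bytes as the program tag, with no hand-rolled padding. */
  static void example_calc_tag(struct bpf_prog *fp, const u8 *buf, size_t len)
  {
          u8 digest[SHA1_DIGEST_SIZE];

          sha1(buf, len, digest);
          memcpy(fp->tag, digest, BPF_TAG_SIZE);
  }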

Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250811201615.564461-1-ebiggers@kernel.org
2 months ago  selftests/bpf: Tests for is_scalar_branch_taken tnum logic
Paul Chaignon [Wed, 20 Aug 2025 13:19:13 +0000 (15:19 +0200)] 
selftests/bpf: Tests for is_scalar_branch_taken tnum logic

This patch adds tests for the new jeq and jne logic in
is_scalar_branch_taken. The following shows the first test failing
before the previous patch is applied. Once the previous patch is
applied, the verifier can use the tnum values to deduce that instruction
7 is dead code.

  0: call bpf_get_prandom_u32#7  ; R0_w=scalar()
  1: w0 = w0                     ; R0_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff))
  2: r0 >>= 30                   ; R0_w=scalar(smin=smin32=0,smax=umax=smax32=umax32=3,var_off=(0x0; 0x3))
  3: r0 <<= 30                   ; R0_w=scalar(smin=0,smax=umax=umax32=0xc0000000,smax32=0x40000000,var_off=(0x0; 0xc0000000))
  4: r1 = r0                     ; R0_w=scalar(id=1,smin=0,smax=umax=umax32=0xc0000000,smax32=0x40000000,var_off=(0x0; 0xc0000000)) R1_w=scalar(id=1,smin=0,smax=umax=umax32=0xc0000000,smax32=0x40000000,var_off=(0x0; 0xc0000000))
  5: r1 += 1024                  ; R1_w=scalar(smin=umin=umin32=1024,smax=umax=umax32=0xc0000400,smin32=0x80000400,smax32=0x40000400,var_off=(0x400; 0xc0000000))
  6: if r1 != r0 goto pc+1       ; R0_w=scalar(id=1,smin=umin=umin32=1024,smax=umax=umax32=0xc0000000,smin32=0x80000400,smax32=0x40000000,var_off=(0x400; 0xc0000000)) R1_w=scalar(smin=umin=umin32=1024,smax=umax=umax32=0xc0000000,smin32=0x80000400,smax32=0x40000400,var_off=(0x400; 0xc0000000))
  7: r10 = 0
  frame pointer is read only

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/550004f935e2553bdb2fb1f09cbde7d0452112d0.1755694148.git.paul.chaignon@gmail.com
2 months ago  bpf: Use tnums for JEQ/JNE is_branch_taken logic
Paul Chaignon [Wed, 20 Aug 2025 13:18:06 +0000 (15:18 +0200)] 
bpf: Use tnums for JEQ/JNE is_branch_taken logic

In the following toy program (reg states minimized for readability), R0
and R1 always have different values at instruction 6. This is obvious
when reading the program but cannot be guessed from ranges alone as
they overlap (R0 in [0; 0xc0000000], R1 in [1024; 0xc0000400]).

  0: call bpf_get_prandom_u32#7  ; R0_w=scalar()
  1: w0 = w0                     ; R0_w=scalar(var_off=(0x0; 0xffffffff))
  2: r0 >>= 30                   ; R0_w=scalar(var_off=(0x0; 0x3))
  3: r0 <<= 30                   ; R0_w=scalar(var_off=(0x0; 0xc0000000))
  4: r1 = r0                     ; R1_w=scalar(var_off=(0x0; 0xc0000000))
  5: r1 += 1024                  ; R1_w=scalar(var_off=(0x400; 0xc0000000))
  6: if r1 != r0 goto pc+1

Looking at tnums however, we can deduce that R1 is always different from
R0 because their tnums don't agree on known bits. This patch uses this
logic to improve is_scalar_branch_taken in case of BPF_JEQ and BPF_JNE.
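
A hedged sketch of the underlying check, using the kernel's struct tnum fields
but an illustrative helper name:

  #include <linux/tnum.h>

  /* Two scalars can never be equal if some bit is known in both tnums and the
   * known values of that bit disagree. */
  static bool tnums_known_unequal(struct tnum a, struct tnum b)
  {
          u64 known_in_both = ~a.mask & ~b.mask; /* bits known in both tnums */

          return (a.value ^ b.value) & known_in_both;
  }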

This change has a tiny impact on complexity, which was measured with
the Cilium complexity CI test. That test covers 72 programs with
various build and load time configurations for a total of 970 test
cases. For 80% of test cases, the patch has no impact. On the other
test cases, the patch decreases complexity by only 0.08% on average. In
the best case, the verifier needs to walk 3% less instructions and, in
the worst case, 1.5% more. Overall, the patch has a small positive
impact, especially for our largest programs.

Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Shung-Hsi Yu <shung-hsi.yu@suse.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/be3ee70b6e489c49881cb1646114b1d861b5c334.1755694147.git.paul.chaignon@gmail.com
2 months ago  selftests/bpf: Use vmlinux.h for BPF programs
Hengqi Chen [Thu, 21 Aug 2025 03:02:54 +0000 (03:02 +0000)] 
selftests/bpf: Use vmlinux.h for BPF programs

Some of the bpf test progs still use linux/libc headers.
Let's use vmlinux.h instead, like the rest of the test progs.
This will also ease cross-compiling.
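
A minimal sketch of the header pattern the test progs converge on (the program
section and body are illustrative):

  #include "vmlinux.h"          /* generated kernel types, replaces linux/libc headers */
  #include <bpf/bpf_helpers.h>

  char LICENSE[] SEC("license") = "GPL";

  SEC("tracepoint/syscalls/sys_enter_getpid")
  int handle_enter(void *ctx)
  {
          return 0;
  }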

Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250821030254.398826-1-hengqi.chen@gmail.com
2 months ago  libbpf: Add documentation to version and error API functions
Cryolitia PukNgae [Wed, 20 Aug 2025 09:22:42 +0000 (17:22 +0800)] 
libbpf: Add documentation to version and error API functions

Add documentation for the following API functions:

- libbpf_major_version()
- libbpf_minor_version()
- libbpf_version_string()
- libbpf_strerror()

Signed-off-by: Cryolitia PukNgae <cryolitia@uniontech.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250820-libbpf-doc-1-v1-1-13841f25a134@uniontech.com
2 months ago  libbpf: Export bpf_object__prepare symbol
Mykyta Yatsenko [Tue, 19 Aug 2025 21:51:19 +0000 (22:51 +0100)] 
libbpf: Export bpf_object__prepare symbol

Add the missing LIBBPF_API macro to the bpf_object__prepare() declaration to
enable its export. libbpf.map already had bpf_object__prepare listed.
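
A hedged sketch of the one-line declaration change in libbpf.h (surrounding
context is illustrative):

  /* Before: missing LIBBPF_API, so the symbol is not exported. */
  int bpf_object__prepare(struct bpf_object *obj);

  /* After: exported like the other public bpf_object APIs. */
  LIBBPF_API int bpf_object__prepare(struct bpf_object *obj);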

Fixes: 1315c28ed809 ("libbpf: Split bpf object load into prepare/load")
Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250819215119.37795-1-mykyta.yatsenko5@gmail.com
2 months ago  s390/bpf: Use direct calls and jumps where possible
Ilya Leoshkevich [Tue, 19 Aug 2025 10:20:38 +0000 (12:20 +0200)] 
s390/bpf: Use direct calls and jumps where possible

After the V!=R rework (commit c98d2ecae08f ("s390/mm: Uncouple physical
vs virtual address spaces")), all kernel code and related data are
allocated within a 4G region, making it possible to use relative
addressing in BPF code more extensively.

Convert as many indirect calls and jumps to direct calls as possible,
namely:

* BPF_CALL
* __bpf_tramp_enter()
* __bpf_tramp_exit()
* __bpf_prog_enter()
* __bpf_prog_exit()
* fentry
* fmod_ret
* fexit
* BPF_TRAMP_F_CALL_ORIG without BPF_TRAMP_F_ORIG_STACK
* Trampoline returns without BPF_TRAMP_F_SKIP_FRAME and
  BPF_TRAMP_F_ORIG_STACK

The following indirect calls and jumps remain:

* Prog returns
* Trampoline returns with BPF_TRAMP_F_SKIP_FRAME or
  BPF_TRAMP_F_ORIG_STACK
* BPF_TAIL_CALL
* BPF_TRAMP_F_CALL_ORIG with BPF_TRAMP_F_ORIG_STACK

As a result, only one usage of call_r1() remains, so inline it.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250819102116.252203-1-iii@linux.ibm.com
2 months ago  bpftool: Add kernel.kptr_restrict hint for no instructions
Vincent Li [Mon, 18 Aug 2025 16:51:13 +0000 (09:51 -0700)] 
bpftool: Add kernel.kptr_restrict hint for no instructions

From bpftool's GitHub repository issue [0]: When a Linux distribution
has kernel.kptr_restrict set to 2, bpftool prog dump jited returns
"no instructions returned". This message can be puzzling to bpftool
users who are not familiar with kernel BPF internals, so add a small
hint for bpftool users to check the kernel.kptr_restrict setting,
similar to the DUMP_XLATED case. Besides kernel.kptr_restrict, no
instructions could also be returned if the JIT is disabled.

Signed-off-by: Vincent Li <vincent.mc.li@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://github.com/libbpf/bpftool/issues/184
Link: https://lore.kernel.org/bpf/20250818165113.15982-1-vincent.mc.li@gmail.com
2 months ago  Merge branch 'bpf-next/skb-meta-dynptr' into 'bpf-next/master'
Martin KaFai Lau [Tue, 19 Aug 2025 00:52:13 +0000 (17:52 -0700)] 
Merge branch 'bpf-next/skb-meta-dynptr' into 'bpf-next/master'

Merge 'skb-meta-dynptr' branch into 'master' branch. No conflict.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2 months ago  Merge branch 'add-a-dynptr-type-for-skb-metadata-for-tc-bpf'
Martin KaFai Lau [Mon, 18 Aug 2025 17:29:43 +0000 (10:29 -0700)] 
Merge branch 'add-a-dynptr-type-for-skb-metadata-for-tc-bpf'

Jakub Sitnicki says:

====================
Add a dynptr type for skb metadata for TC BPF

TL;DR
-----

This is the first step in an effort which aims to enable skb metadata
access for all BPF programs which operate on an skb context.

By skb metadata we mean the custom metadata area which can be allocated
from an XDP program with the bpf_xdp_adjust_meta helper [1]. Network stack
code accesses it using the skb_metadata_* helpers.

Changelog
---------
Changes in v7:
- Make dynptr read-only for cloned skbs for now. (Martin)
- Extend tests for skb clones to cover writes to metadata.
- Drop Jesse's review stamp for patch 2 due to an update.
- Link to v6: https://lore.kernel.org/r/20250804-skb-metadata-thru-dynptr-v6-0-05da400bfa4b@cloudflare.com

Changes in v6:
- Enable CONFIG_NET_ACT_MIRRED for bpf selftests to fix CI failure
- Switch from u32 to matchall classifier, which bpf selftests already use
- Link to v5: https://lore.kernel.org/r/20250731-skb-metadata-thru-dynptr-v5-0-f02f6b5688dc@cloudflare.com

Changes in v5:
- Invalidate skb payload and metadata slices on write to metadata. (Martin)
- Drop redundant bounds check in bpf_skb_meta_*(). (Martin)
- Check for unexpected flags in __bpf_dynptr_write(). (Martin)
- Fold bpf_skb_meta_{load,store}_bytes() into callers.
- Add a test for metadata access when an skb clone has been modified.
- Drop Eduard's Ack for patch 3. Patch updated.
- Keep Eduard's Ack for patches 4-8.
- Add Jesse's stamp from an internal review.
- Link to v4: https://lore.kernel.org/r/20250723-skb-metadata-thru-dynptr-v4-0-a0fed48bcd37@cloudflare.com

Changes in v4:
- Kill bpf_dynptr_from_skb_meta_rdonly. Not needed for now. (Martin)
- Add a test to cover passing OOB offsets to dynptr ops. (Eduard)
- Factor out bounds checks from bpf_dynptr_{read,write,slice}. (Eduard)
- Squash patches:
      bpf: Enable read access to skb metadata with bpf_dynptr_read
      bpf: Enable write access to skb metadata with bpf_dynptr_write
      bpf: Enable read-write access to skb metadata with dynptr slice
- Kept Eduard's Acks for v3 on unchanged patches.
- Link to v3: https://lore.kernel.org/r/20250721-skb-metadata-thru-dynptr-v3-0-e92be5534174@cloudflare.com

Changes in v3:
- Add a kfunc set for skb metadata access. Limited to TC BPF. (Martin)
- Drop patches related to skb metadata access outside of TC BPF:
      net: Clear skb metadata on handover from device to protocol
      selftests/bpf: Cover lack of access to skb metadata at ip layer
      selftests/bpf: Count successful bpf program runs
- Link to v2: https://lore.kernel.org/r/20250716-skb-metadata-thru-dynptr-v2-0-5f580447e1df@cloudflare.com

Changes in v2:
- Switch to a dedicated dynptr type for skb metadata (Andrii)
- Add verifier test coverage since we now touch its code
- Add missing test coverage for bpf_dynptr_adjust and access at an offset
- Link to v1: https://lore.kernel.org/r/20250630-skb-metadata-thru-dynptr-v1-0-f17da13625d8@cloudflare.com

Overview
--------

Today, the skb metadata is accessible only by the BPF TC ingress programs
through the __sk_buff->data_meta pointer. We propose a three step plan to
make skb metadata available to all other BPF programs which operate on skb
objects:

 1) Add a dynptr type for skb metadata (this patch set)

    This is a preparatory step, but it also stands on its own. Here we
    enable access to the skb metadata through a bpf_dynptr, the same way we
    can already access the skb payload today.

    As the next step (2), we want to relocate the metadata as skb travels
    through the network stack in order to persist it. That will require a
    safe way to access the metadata area irrespective of its location.

    This is where the dynptr [2] comes into play. It solves exactly that
    problem. A dynptr to skb metadata can be backed by a memory area that
    resides in a different location depending on the code path.

 2) Persist skb metadata past the TC hook (future)

    Having the metadata in front of the packet headers as the skb travels
    through the network stack is problematic - see the discussion of
    alternative approaches below. Hence, we plan to relocate it as
    necessary past the TC hook.

    Where to relocate it? We don't know yet. There are a couple of
    options: (i) move it to the top of skb headroom, or (ii) allocate
    dedicated memory for it.  They are not mutually exclusive. The right
    solution might be a mix.

    When to relocate it? That is also an open question. It could be done
    during device to protocol handover or lazily when headers get pushed or
    headroom gets resized.

 3) skb dynptr for sockops, sk_lookup, etc. (future)

    There are BPF program types that don't operate on an __sk_buff
    context, but either have, or could have, access to the skb itself.
    As a final touch, we want to provide a way to create an skb metadata
    dynptr for these program types.

TIMTOWDI
--------

Alternative approaches which we considered:

* Keep the metadata always in front of skb->data

We think it is a bad idea for two reasons, outlined below. Nevertheless we
are open to it, if necessary.

 1) Performance concerns

    It would require the network stack to move the metadata on each header
    pull/push - see skb_reorder_vlan_header() [3] for an example. While
    doable, there is an expected performance overhead.

 2) Potential for bugs

    In addition to updating skb_push/pull and pskb_expand_head, we would
    need to audit any code paths which operate on skb->data pointer
    directly without going through the helpers. This creates a "known
    unknown" risk.

* Design a new custom metadata area from scratch

We have tried that in Arthur's patch set [4]. One of the outcomes of the
discussion there was that we don't want to have two places to store custom
metadata. Hence the change of approach to make the existing custom metadata
area work.

-jkbs

[1] https://docs.ebpf.io/linux/helper-function/bpf_xdp_adjust_meta/
[2] https://docs.ebpf.io/linux/concepts/dynptrs/
[3] https://elixir.bootlin.com/linux/v6.16-rc6/source/net/core/skbuff.c#L6211
[4] https://lore.kernel.org/all/20250422-afabre-traits-010-rfc2-v2-0-92bcc6b146c9@arthurfabre.com/
====================

Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-0-8a39e636e0fb@cloudflare.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2 months agoselftests/bpf: Cover metadata access from a modified skb clone
Jakub Sitnicki [Thu, 14 Aug 2025 09:59:35 +0000 (11:59 +0200)] 
selftests/bpf: Cover metadata access from a modified skb clone

Demonstrate that, when processing an skb clone, the metadata gets truncated
if the program contains a direct write to either the payload or the
metadata, due to an implicit unclone in the prologue, and otherwise the
dynptr to the metadata is limited to being read-only.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-9-8a39e636e0fb@cloudflare.com
2 months agoselftests/bpf: Cover read/write to skb metadata at an offset
Jakub Sitnicki [Thu, 14 Aug 2025 09:59:34 +0000 (11:59 +0200)] 
selftests/bpf: Cover read/write to skb metadata at an offset

Exercise r/w access to skb metadata through an offset-adjusted dynptr,
read/write helper with an offset argument, and a slice starting at an
offset.

Also check for the expected errors when the offset is out of bounds.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-8-8a39e636e0fb@cloudflare.com
2 months agoselftests/bpf: Cover write access to skb metadata via dynptr
Jakub Sitnicki [Thu, 14 Aug 2025 09:59:33 +0000 (11:59 +0200)] 
selftests/bpf: Cover write access to skb metadata via dynptr

Add tests that exercise writes to skb metadata in two ways:
1. indirectly, using bpf_dynptr_write helper,
2. directly, using a read-write dynptr slice.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-7-8a39e636e0fb@cloudflare.com
2 months agoselftests/bpf: Cover read access to skb metadata via dynptr
Jakub Sitnicki [Thu, 14 Aug 2025 09:59:32 +0000 (11:59 +0200)] 
selftests/bpf: Cover read access to skb metadata via dynptr

Exercise reading from SKB metadata area in two new ways:
1. indirectly, with bpf_dynptr_read(), and
2. directly, with bpf_dynptr_slice().

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-6-8a39e636e0fb@cloudflare.com
2 months agoselftests/bpf: Parametrize test_xdp_context_tuntap
Jakub Sitnicki [Thu, 14 Aug 2025 09:59:31 +0000 (11:59 +0200)] 
selftests/bpf: Parametrize test_xdp_context_tuntap

We want to add more test cases to cover different ways to access the
metadata area. Prepare for it. Pull up the skeleton management.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-5-8a39e636e0fb@cloudflare.com
2 months agoselftests/bpf: Pass just bpf_map to xdp_context_test helper
Jakub Sitnicki [Thu, 14 Aug 2025 09:59:30 +0000 (11:59 +0200)] 
selftests/bpf: Pass just bpf_map to xdp_context_test helper

Prepare for parametrizing the xdp_context tests. The assert_test_result
helper doesn't need the whole skeleton. Pass just what it needs.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-4-8a39e636e0fb@cloudflare.com
2 months agoselftests/bpf: Cover verifier checks for skb_meta dynptr type
Jakub Sitnicki [Thu, 14 Aug 2025 09:59:29 +0000 (11:59 +0200)] 
selftests/bpf: Cover verifier checks for skb_meta dynptr type

dynptr for skb metadata behaves the same way as the dynptr for skb data
with one exception - writes to skb_meta dynptr don't invalidate existing
skb and skb_meta slices.

Duplicate those skb dynptr tests which we can, since the
bpf_dynptr_from_skb_meta kfunc can be called only from TC BPF, to cover
the skb_meta dynptr verifier checks.

Also add a couple of new tests (skb_data_valid_*), specific to the
skb_meta dynptr, to ensure we don't invalidate the slices in the case
mentioned above.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-3-8a39e636e0fb@cloudflare.com
2 months agobpf: Enable read/write access to skb metadata through a dynptr
Jakub Sitnicki [Thu, 14 Aug 2025 09:59:28 +0000 (11:59 +0200)] 
bpf: Enable read/write access to skb metadata through a dynptr

Now that we can create a dynptr to skb metadata, make reads to the metadata
area possible with bpf_dynptr_read() or through a bpf_dynptr_slice(), and
make writes to the metadata area possible with bpf_dynptr_write() or
through a bpf_dynptr_slice_rdwr().

Note that for cloned skbs which share data with the original, we limit the
skb metadata dynptr to be read-only since we don't unclone on a
bpf_dynptr_write to metadata.
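
For a feel of the resulting interface, here is a minimal TC sketch; the
bpf_dynptr_from_skb_meta prototype below is an assumption modeled on
bpf_dynptr_from_skb, and the program name is invented for the example:

  #include <vmlinux.h>
  #include <bpf/bpf_helpers.h>

  /* Assumed kfunc prototype, declared as a kernel symbol. */
  extern int bpf_dynptr_from_skb_meta(struct __sk_buff *skb, __u64 flags,
                                      struct bpf_dynptr *ptr__uninit) __ksym;

  SEC("tc")
  int tc_bump_meta(struct __sk_buff *skb)
  {
          struct bpf_dynptr meta;
          __u32 val = 0;

          if (bpf_dynptr_from_skb_meta(skb, 0, &meta))
                  return 0; /* TC_ACT_OK */

          /* Indirect read and write through the dynptr helpers. */
          if (bpf_dynptr_read(&val, sizeof(val), &meta, 0, 0))
                  return 0;
          val++;
          /* Fails if the dynptr is read-only, e.g. for a cloned skb. */
          bpf_dynptr_write(&meta, 0, &val, sizeof(val), 0);

          return 0;
  }

  char _license[] SEC("license") = "GPL";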

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-2-8a39e636e0fb@cloudflare.com
2 months agobpf: Add dynptr type for skb metadata
Jakub Sitnicki [Thu, 14 Aug 2025 09:59:27 +0000 (11:59 +0200)] 
bpf: Add dynptr type for skb metadata

Add a dynptr type, similar to the skb dynptr, but for skb metadata access.

The dynptr provides an alternative to __sk_buff->data_meta for accessing
the custom metadata area allocated using the bpf_xdp_adjust_meta() helper.

More importantly, it abstracts away where the storage for the custom
metadata lives, which opens up the way to persist the metadata by
relocating it as the skb travels through the network stack layers.

Writes to skb metadata invalidate any existing skb payload and metadata
slices. While this is more restrictive than needed at the moment, it leaves
the door open to reallocating the metadata on writes, and should be only a
minor inconvenience to the users.

Only the program types which can access __sk_buff->data_meta today are
allowed to create a dynptr for skb metadata at the moment. We need to
modify the network stack to persist the metadata across layers before
opening up access to other BPF hooks.

Once more BPF hooks gain access to skb_meta dynptr, we will also need to
add a read-only variant of the helper similar to
bpf_dynptr_from_skb_rdonly.

skb_meta dynptr ops are stubbed out and implemented by subsequent changes.

Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jesse Brandeburg <jbrandeburg@cloudflare.com>
Link: https://patch.msgid.link/20250814-skb-metadata-thru-dynptr-v7-1-8a39e636e0fb@cloudflare.com
2 months agobpf: Add a verbose message when the BTF limit is reached
Anton Protopopov [Sat, 16 Aug 2025 15:15:54 +0000 (15:15 +0000)] 
bpf: Add a verbose message when the BTF limit is reached

When a BPF program which is being loaded reaches the map limit
(MAX_USED_MAPS) or the BTF limit (MAX_USED_BTFS) the -E2BIG is
returned. However, in the former case there is an accompanying
verifier verbose message, and in the latter case there is not.
Add a verbose message to make the behaviour symmetrical.
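
Roughly, the intended shape of the check (simplified sketch, not the
exact hunk; the message text is made up here):

  if (env->used_btf_cnt >= MAX_USED_BTFS) {
          verbose(env, "The total number of BTF objects per program has reached the limit of %u\n",
                  MAX_USED_BTFS);
          return -E2BIG;
  }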

Reported-by: Kevin Sheldrake <kevin.sheldrake@isovalent.com>
Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250816151554.902995-1-a.s.protopopov@gmail.com
2 months agobpf: Replace get_next_cpu() with cpumask_next_wrap()
Fushuai Wang [Mon, 18 Aug 2025 03:23:44 +0000 (11:23 +0800)] 
bpf: Replace get_next_cpu() with cpumask_next_wrap()

The get_next_cpu() function was only used in one place to find
the next possible CPU, which can be replaced by cpumask_next_wrap().

Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250818032344.23229-1-wangfushuai@baidu.com
2 months agoselftests/bpf: Clobber a lot of registers in tailcall_bpf2bpf_hierarchy tests
Ilya Leoshkevich [Wed, 13 Aug 2025 12:06:31 +0000 (14:06 +0200)] 
selftests/bpf: Clobber a lot of registers in tailcall_bpf2bpf_hierarchy tests

Clobbering a lot of registers and stack slots helps exposing tail call
counter overwrite bugs in JITs.

Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250813121016.163375-5-iii@linux.ibm.com
2 months agos390/bpf: Write back tail call counter for BPF_TRAMP_F_CALL_ORIG
Ilya Leoshkevich [Wed, 13 Aug 2025 12:06:30 +0000 (14:06 +0200)] 
s390/bpf: Write back tail call counter for BPF_TRAMP_F_CALL_ORIG

The tailcall_bpf2bpf_hierarchy_fentry test hangs on s390. Its call
graph is as follows:

  entry()
    subprog_tail()
      trampoline()
        fentry()
        the rest of subprog_tail()  # via BPF_TRAMP_F_CALL_ORIG
        return to entry()

The problem is that the rest of subprog_tail() increments the tail call
counter, but the trampoline discards the incremented value. This
results in an astronomically large number of tail calls.

Fix by making the trampoline write the incremented tail call counter
back.

Fixes: 528eb2cb87bc ("s390/bpf: Implement arch_prepare_bpf_trampoline()")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250813121016.163375-4-iii@linux.ibm.com
2 months agos390/bpf: Write back tail call counter for BPF_PSEUDO_CALL
Ilya Leoshkevich [Wed, 13 Aug 2025 12:06:29 +0000 (14:06 +0200)] 
s390/bpf: Write back tail call counter for BPF_PSEUDO_CALL

The tailcall_bpf2bpf_hierarchy_1 test hangs on s390. Its call graph is
as follows:

  entry()
    subprog_tail()
      bpf_tail_call_static(0) -> entry + tail_call_start
    subprog_tail()
      bpf_tail_call_static(0) -> entry + tail_call_start

entry() copies its tail call counter to the subprog_tail()'s frame,
which then increments it. However, the incremented result is discarded,
leading to an astronomically large number of tail calls.

Fix by writing the incremented counter back to the entry()'s frame.
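
For reference, the shape of the test program is roughly as follows
(simplified sketch, not the selftest source):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
          __uint(type, BPF_MAP_TYPE_PROG_ARRAY);
          __uint(max_entries, 1);
          __uint(key_size, sizeof(__u32));
          __uint(value_size, sizeof(__u32));
  } jmp_table SEC(".maps");

  int count = 0;

  static __noinline int subprog_tail(struct __sk_buff *skb)
  {
          /* The tail call below increments the counter copied into this
           * frame; the incremented value must be written back to the
           * entry() frame. */
          bpf_tail_call_static(skb, &jmp_table, 0);
          return 0;
  }

  SEC("tc")
  int entry(struct __sk_buff *skb)
  {
          count++;
          subprog_tail(skb);
          subprog_tail(skb);
          return 0;
  }

  char _license[] SEC("license") = "GPL";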

Fixes: dd691e847d28 ("s390/bpf: Implement bpf_jit_supports_subprog_tailcalls()")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250813121016.163375-3-iii@linux.ibm.com
2 months agos390/bpf: Do not write tail call counter into helper and kfunc frames
Ilya Leoshkevich [Wed, 13 Aug 2025 12:06:28 +0000 (14:06 +0200)] 
s390/bpf: Do not write tail call counter into helper and kfunc frames

Only BPF functions make use of the tail call counter; helpers and
kfuncs ignore and most likely also clobber it. Writing it into these
functions' frames is pointless and misleading, so do not do it.

Fixes: dd691e847d28 ("s390/bpf: Implement bpf_jit_supports_subprog_tailcalls()")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20250813121016.163375-2-iii@linux.ibm.com
2 months agoMerge branch 'libbpf-fix-reuse-of-devmap'
Andrii Nakryiko [Fri, 15 Aug 2025 23:50:15 +0000 (16:50 -0700)] 
Merge branch 'libbpf-fix-reuse-of-devmap'

Yureka Lilian says:

====================
libbpf: fix reuse of DEVMAP

changes in v3:

- instead of setting BPF_F_RDONLY_PROG on both sides, just
  clear BPF_F_RDONLY_PROG in map_info.map_flags as suggested
  by Andrii Nakryiko

- in the test, use ASSERT_* instead of CHECK
- shorten the test by using open_and_load from the skel
- in the test, drop NULL check before unloading/destroying bpf objs

- start the commit messages with "libbpf" and "selftests/bpf"
  respectively instead of just "bpf"

changes in v2:

- preserve compatibility with older kernels
- add a basic selftest covering the re-use of DEVMAP maps
====================

Link: https://patch.msgid.link/20250814180113.1245565-2-yuka@yuka.dev
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2 months agoselftests/bpf: Add test for DEVMAP reuse
Yureka Lilian [Thu, 14 Aug 2025 18:01:13 +0000 (20:01 +0200)] 
selftests/bpf: Add test for DEVMAP reuse

The test covers basic re-use of a pinned DEVMAP map,
with both matching and mismatching parameters.

Signed-off-by: Yureka Lilian <yuka@yuka.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250814180113.1245565-4-yuka@yuka.dev
2 months agolibbpf: Fix reuse of DEVMAP
Yureka Lilian [Thu, 14 Aug 2025 18:01:12 +0000 (20:01 +0200)] 
libbpf: Fix reuse of DEVMAP

Previously, re-using pinned DEVMAP maps would always fail, because
get_map_info on a DEVMAP always returns flags with BPF_F_RDONLY_PROG set,
but BPF_F_RDONLY_PROG being set on a map during creation is invalid.

Thus, ignore the BPF_F_RDONLY_PROG flag in the flags returned from
get_map_info when checking for compatibility with an existing DEVMAP.
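
A sketch of the resulting compatibility check (illustrative only; the
helper name is made up, not the libbpf internal):

  #include <stdbool.h>
  #include <linux/bpf.h>

  static bool map_flags_compatible(__u32 kernel_flags, __u32 wanted_flags)
  {
          /* The kernel reports BPF_F_RDONLY_PROG on DEVMAPs even though it
           * rejects that flag at map creation time, so mask it out. */
          kernel_flags &= ~(__u32)BPF_F_RDONLY_PROG;
          return kernel_flags == wanted_flags;
  }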

The same problem is handled in a third-party ebpf library:
- https://github.com/cilium/ebpf/issues/925
- https://github.com/cilium/ebpf/pull/930

Fixes: 0cdbb4b09a06 ("devmap: Allow map lookups from eBPF")
Signed-off-by: Yureka Lilian <yuka@yuka.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250814180113.1245565-3-yuka@yuka.dev
2 months agobpf: Remove migrate_disable in kprobe_multi_link_prog_run
Tao Chen [Thu, 14 Aug 2025 12:14:29 +0000 (20:14 +0800)] 
bpf: Remove migrate_disable in kprobe_multi_link_prog_run

The graph tracer framework ensures we won't migrate:
kprobe_multi_link_prog_run is called all the way from the graph tracer,
which disables preemption in function_graph_enter_regs. As Jiri and
Yonghong suggested, there is no need to use migrate_disable, so some
overhead will be reduced. Also add a cant_sleep check for
__this_cpu_inc_return.
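
The resulting pattern looks roughly like this (simplified sketch, not
the exact hunk):

  /* Preemption is already disabled by the graph tracer on this path, so
   * the migrate_disable()/migrate_enable() pair can go away; cant_sleep()
   * documents that assumption before bumping the per-CPU recursion
   * counter. */
  cant_sleep();
  if (unlikely(__this_cpu_inc_return(bpf_prog_active) != 1))
          goto out;       /* recursion detected, skip the program run */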

Fixes: 0dcac2725406 ("bpf: Add multi kprobe link")
Signed-off-by: Tao Chen <chen.dylane@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250814121430.2347454-1-chen.dylane@linux.dev
2 months agobpf/selftests: Fix test_tcpnotify_user
Matt Bobrowski [Fri, 15 Aug 2025 12:12:14 +0000 (12:12 +0000)] 
bpf/selftests: Fix test_tcpnotify_user

Based on a bisect, it appears that commit 7ee988770326 ("timers:
Implement the hierarchical pull model") has somehow inadvertently
broken BPF selftest test_tcpnotify_user. The error that is being
generated by this test is as follows:

FAILED: Wrong stats Expected 10 calls, got 8

It looks like the change allows timer functions to be run on CPUs
different from the one they are armed on. The test had pinned itself
to CPU 0, and in the past the retransmit attempts also occurred on CPU
0. The test had set the max_entries attribute for
BPF_MAP_TYPE_PERF_EVENT_ARRAY to 2 and was calling
bpf_perf_event_output() with BPF_F_CURRENT_CPU, so the entry was
likely to be in range. With the change to allow timers to run on other
CPUs, the current CPU tasked with performing the retransmit might be
bumped and in turn fall out of range, as the event will be filtered
out via __bpf_perf_event_output() using:

    if (unlikely(index >= array->map.max_entries))
            return -E2BIG;

A possible change would be to explicitly set the max_entries attribute
for perf_event_map in test_tcpnotify_kern.c to a value that's at least
as large as the number of CPUs. As it turns out, however, if the field
is left unset, libbpf will determine the number of CPUs available
on the underlying system and update the max_entries attribute accordingly
in map_set_def_max_entries().
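
In BPF C this simply means declaring the map without max_entries, e.g.
(a sketch, not the test source):

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>

  struct {
          __uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
          __uint(key_size, sizeof(int));
          __uint(value_size, sizeof(__u32));
          /* max_entries intentionally omitted: libbpf sizes the map to the
           * number of CPUs in map_set_def_max_entries() at load time. */
  } perf_event_map SEC(".maps");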

A further problem with the test is that it has a thread that continues
running up until the program exits. The main thread cleans up some
LIBBPF data structures, while the other thread continues to use them,
which inevitably will trigger a SIGSEGV. This can be dealt with by
telling the thread to run for as long as necessary and doing a
pthread_join on it before exiting the program.

Finally, I don't think binding the process to CPU 0 is meaningful for
this test any more, so get rid of that.

Fixes: 435f90a338ae ("selftests/bpf: add a test case for sock_ops perf-event notification")
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/aJ8kHhwgATmA3rLf@google.com
2 months agoselftests/bpf: Enable arena atomics tests for RV64
Pu Lehui [Sat, 19 Jul 2025 09:17:30 +0000 (09:17 +0000)] 
selftests/bpf: Enable arena atomics tests for RV64

Enable arena atomics tests for RV64.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20250719091730.2660197-11-pulehui@huaweicloud.com
2 months agoriscv, bpf: Add arena atomics support for RV64
Pu Lehui [Sat, 19 Jul 2025 09:17:29 +0000 (09:17 +0000)] 
riscv, bpf: Add arena atomics support for RV64

Add arena atomics support for RMW atomics and load-acquire and
store-release instructions. Non-Zacas cmpxchg would need to be
implemented via a loop, which is not currently supported because it
requires more complex extable and loop logic.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20250719091730.2660197-10-pulehui@huaweicloud.com
2 months agoriscv, bpf: Add ex_insn_off and ex_jmp_off for exception table handling
Pu Lehui [Sat, 19 Jul 2025 09:17:28 +0000 (09:17 +0000)] 
riscv, bpf: Add ex_insn_off and ex_jmp_off for exception table handling

Add ex_insn_off and ex_jmp_off fields to struct rv_jit_context so that
add_exception_handler() does not need to be immediately followed by the
instruction to add the exception table. ex_insn_off indicates the offset
of the instruction to add the exception table, and ex_jmp_off indicates
the offset to jump over the faulting instruction. This is to prepare for
adding the exception table to atomic instructions later, because some
atomic instructions need to perform zext or other operations.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20250719091730.2660197-9-pulehui@huaweicloud.com
2 months agoriscv, bpf: Optimize cmpxchg insn with Zacas support
Pu Lehui [Sat, 19 Jul 2025 09:17:27 +0000 (09:17 +0000)] 
riscv, bpf: Optimize cmpxchg insn with Zacas support

Optimize cmpxchg instruction with amocas.w and amocas.d
Zacas instructions.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20250719091730.2660197-8-pulehui@huaweicloud.com
2 months agoriscv, bpf: Add Zacas instructions
Pu Lehui [Sat, 19 Jul 2025 09:17:26 +0000 (09:17 +0000)] 
riscv, bpf: Add Zacas instructions

Add Zacas instructions introduced by [0] to reduce code size and
improve performance of RV64 JIT.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://github.com/riscvarchive/riscv-zacas/releases/download/v1.0/riscv-zacas.pdf
Link: https://lore.kernel.org/bpf/20250719091730.2660197-7-pulehui@huaweicloud.com
2 months agoriscv, bpf: Add rv_ext_enabled macro for runtime detection extension
Pu Lehui [Sat, 19 Jul 2025 09:17:25 +0000 (09:17 +0000)] 
riscv, bpf: Add rv_ext_enabled macro for runtime detection extension

Add rv_ext_enabled macro to check whether the runtime detection
extension is enabled.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20250719091730.2660197-6-pulehui@huaweicloud.com
2 months agoriscv: Separate toolchain support dependency from RISCV_ISA_ZACAS
Pu Lehui [Sat, 19 Jul 2025 09:17:24 +0000 (09:17 +0000)] 
riscv: Separate toolchain support dependency from RISCV_ISA_ZACAS

RV64 bpf is going to support ZACAS instructions. Let's separate
toolchain support dependency from RISCV_ISA_ZACAS.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20250719091730.2660197-5-pulehui@huaweicloud.com
2 months agoriscv, bpf: Extract emit_ldx() helper
Pu Lehui [Sat, 19 Jul 2025 09:17:23 +0000 (09:17 +0000)] 
riscv, bpf: Extract emit_ldx() helper

There's a lot of redundant code related to load-into-register
operations; let's extract emit_ldx() to make the code more compact.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20250719091730.2660197-4-pulehui@huaweicloud.com
2 months agoriscv, bpf: Extract emit_st() helper
Pu Lehui [Sat, 19 Jul 2025 09:17:22 +0000 (09:17 +0000)] 
riscv, bpf: Extract emit_st() helper

There's a lot of redundant code related to store-from-immediate
operations; let's extract emit_st() to make the code more compact.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20250719091730.2660197-3-pulehui@huaweicloud.com
2 months agoriscv, bpf: Extract emit_stx() helper
Pu Lehui [Sat, 19 Jul 2025 09:17:21 +0000 (09:17 +0000)] 
riscv, bpf: Extract emit_stx() helper

There's a lot of redundant code related to store-from-register
operations; let's extract emit_stx() to make the code more compact.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Björn Töpel <bjorn@rivosinc.com>
Reviewed-by: Björn Töpel <bjorn@rivosinc.com>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/bpf/20250719091730.2660197-2-pulehui@huaweicloud.com
2 months agoselftests/bpf: Copy test_kmods when installing selftest
Amery Hung [Tue, 12 Aug 2025 17:50:39 +0000 (10:50 -0700)] 
selftests/bpf: Copy test_kmods when installing selftest

Commit d6212d82bf26 ("selftests/bpf: Consolidate kernel modules into
common directory") consolidated the Makefile of test_kmods. However,
since it removed test_kmods from TEST_GEN_PROGS_EXTENDED, the kernel
modules required by bpf selftests are now missing from kselftest_install
when running "make install". Fix it by adding test_kmods to TEST_GEN_FILES.

Fixes: d6212d82bf26 ("selftests/bpf: Consolidate kernel modules into common directory")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250812175039.2323570-1-ameryhung@gmail.com
2 months agobpf: Don't use %pK through printk
Thomas Weißschuh [Mon, 11 Aug 2025 12:08:04 +0000 (14:08 +0200)] 
bpf: Don't use %pK through printk

In the past %pK was preferable to %p as it would not leak raw pointer
values into the kernel log.
Since commit ad67b74d2469 ("printk: hash addresses printed with %p")
the regular %p has been improved to avoid this issue.
Furthermore, restricted pointers ("%pK") were never meant to be used
through printk(). They can still unintentionally leak raw pointers or
acquire sleeping locks in atomic contexts.

Switch to the regular pointer formatting which is safer and
easier to reason about.
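
The change boils down to dropping the K from the format specifier, e.g.
(illustrative, not the actual hunk):

  pr_info("bpf prog %p\n", prog); /* formerly "%pK"; plain %p has been hashed since ad67b74d2469 */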

Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20250811-restricted-pointers-bpf-v1-1-a1d7cc3cb9e7@linutronix.de
2 months agobpf: Replace kvfree with kfree for kzalloc memory
Qianfeng Rong [Mon, 11 Aug 2025 12:39:49 +0000 (20:39 +0800)] 
bpf: Replace kvfree with kfree for kzalloc memory

The 'backedge' pointer is allocated with kzalloc(), which returns
physically contiguous memory. Using kvfree() to deallocate such
memory is functionally safe but semantically incorrect.

Replace kvfree() with kfree() to avoid unnecessary is_vmalloc_addr()
check in kvfree().

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/bpf/20250811123949.552885-1-rongqianfeng@vivo.com
2 months agobpf: Tidy verifier bug message
Paul Chaignon [Mon, 11 Aug 2025 18:58:20 +0000 (20:58 +0200)] 
bpf: Tidy verifier bug message

Yonghong noticed that error messages for potential verifier bugs often
have a '(1)' at the end. This is happening because verifier_bug_if(cond,
env, fmt, args...) prints "(" #cond ")\n" as part of the message and
verifier_bug() is defined as:

  #define verifier_bug(env, fmt, args...) verifier_bug_if(1, env, fmt, ##args)

Hence, verifier_bug() always ends up displaying '(1)'. This small patch
fixes it by having verifier_bug_if conditionally call verifier_bug
instead of the other way around.
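
After the fix the relationship between the two macros is roughly the
following (simplified sketch, not the exact kernel definitions):

  #define verifier_bug(env, fmt, args...)                                \
          ({ WARN_ONCE(1, "verifier bug: " fmt "\n", ##args); })

  #define verifier_bug_if(cond, env, fmt, args...)                       \
          ({                                                             \
                  bool __cond = (cond);                                  \
                  if (unlikely(__cond))                                  \
                          verifier_bug(env, fmt " (" #cond ")", ##args); \
                  __cond;                                                \
          })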

Fixes: 1cb0f56d9618 ("bpf: WARN_ONCE on verifier bugs")
Reported-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/aJo9THBrzo8jFXsh@mail.gmail.com
2 months agobpf: Remove redundant __GFP_NOWARN
Qianfeng Rong [Mon, 4 Aug 2025 12:27:30 +0000 (20:27 +0800)] 
bpf: Remove redundant __GFP_NOWARN

Commit 16f5dfbc851b ("gfp: include __GFP_NOWARN in GFP_NOWAIT")
made GFP_NOWAIT implicitly include __GFP_NOWARN.

Therefore, explicit __GFP_NOWARN combined with GFP_NOWAIT
(e.g., `GFP_NOWAIT | __GFP_NOWARN`) is now redundant. Let's clean
up these redundant flags across subsystems.

No functional changes.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/bpf/20250804122731.460158-1-rongqianfeng@vivo.com
2 months agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Martin KaFai Lau [Mon, 11 Aug 2025 23:15:41 +0000 (16:15 -0700)] 
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Cross merge bpf/master after 6.17-rc1.

No conflict.

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2 months agoLinux 6.17-rc1 v6.17-rc1
Linus Torvalds [Sun, 10 Aug 2025 16:41:16 +0000 (19:41 +0300)] 
Linux 6.17-rc1

2 months agoMerge tag 'turbostat-2025.09.09' of git://git.kernel.org/pub/scm/linux/kernel/git...
Linus Torvalds [Sun, 10 Aug 2025 06:02:36 +0000 (09:02 +0300)] 
Merge tag 'turbostat-2025.09.09' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:
 "tools/power turbostat: version 2025.09.09

   - Probe and display L3 Cache topology

   - Add ability to average an added counter (useful for pre-integrated
     "counters", such as Watts)

   - Break the limit of 64 built-in counters

   - Assorted bug fixes and minor feature tweaks"

* tag 'turbostat-2025.09.09' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2025.09.09
  tools/power turbostat: Handle non-root legacy-uncore sysfs permissions
  tools/power turbostat: standardize PER_THREAD_PARAMS
  tools/power turbostat: Fix DMR support
  tools/power turbostat: add format "average" for external attributes
  tools/power turbostat: delete GET_PKG()
  tools/power turbostat: probe and display L3 cache topology
  tools/power turbostat: Support more than 64 built-in-counters
  tools/power turbostat.8: Document Totl%C0, Any%C0, GFX%C0, CPUGFX% columns
  tools/power turbostat: Fix bogus SysWatt for forked program
  tools/power turbostat: Handle cap_get_proc() ENOSYS
  tools/power turbostat: Fix build with musl
  tools/power turbostat: verify arguments to params --show and --hide
  tools/power turbostat: regression fix: --show C1E%

2 months agoMerge tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 10 Aug 2025 05:51:37 +0000 (08:51 +0300)] 
Merge tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull smp fixes from Borislav Petkov:

 - Remove an obsolete comment and fix spelling

* tag 'smp_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  cpu: Remove obsolete comment from takedown_cpu()
  smp: Fix spelling in on_each_cpu_cond_mask()'s doc-comment

2 months agoMerge tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 10 Aug 2025 05:46:47 +0000 (08:46 +0300)] 
Merge tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irq fixes from Borislav Petkov:

 - Fix a wrong ioremap size in mvebu-gicp

 - Remove yet another compile-test case for a driver which needs an
   additional dependency

 - Fix a lock inversion scenario in the IRQ unit test suite

 - Remove an impossible flag situation in gic-v5

 - Do not iounmap resources in gic-v5 which are managed by devm

 - Make sure stale, left-over interrupts in mvebu-gicp are cleared on
   driver init

 - Fix a reference counting mishap in msi-lib

 - Fix a dereference-before-null-ptr-check case in the riscv-imsic
   irqchip driver

* tag 'irq_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/mvebu-gicp: Use resource_size() for ioremap()
  irqchip: Build IMX_MU_MSI only on ARM
  genirq/test: Resolve irq lock inversion warnings
  irqchip/gic-v5: Remove IRQD_RESEND_WHEN_IN_PROGRESS for ITS IRQs
  irqchip/gic-v5: iwb: Fix iounmap probe failure path
  irqchip/mvebu-gicp: Clear pending interrupts on init
  irqchip/msi-lib: Fix fwnode refcount in msi_lib_irq_domain_select()
  irqchip/riscv-imsic: Don't dereference before NULL pointer check

2 months agoMerge tag 'x86_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel...
Linus Torvalds [Sun, 10 Aug 2025 05:15:32 +0000 (08:15 +0300)] 
Merge tag 'x86_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:

 - Fix an interrupt vector setup race which leads to a non-functioning
   device

 - Add new Intel CPU models *and* a family: 0x12. Finally. Yippie! :-)

* tag 'x86_urgent_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/irq: Plug vector setup race
  x86/cpu: Add new Intel CPU model numbers for Wildcatlake and Novalake