As BPF doesn't include any barrier instructions, smp_mb() is implemented
by doing a dummy value returning atomic operation. Such an operation
acts a full barrier as enforced by LKMM and also by the work in progress
BPF memory model.
If the returned value is not used, clang[1] can optimize the value
returning atomic instruction in to a normal atomic instruction which
provides no ordering guarantees.
Mark the variable as volatile so the above optimization is never
performed and smp_mb() works as expected.
Tao Chen [Wed, 16 Jul 2025 13:46:53 +0000 (21:46 +0800)]
bpf: Add struct bpf_token_info
The 'commit 35f96de04127 ("bpf: Introduce BPF token object")' added
BPF token as a new kind of BPF kernel object. And BPF_OBJ_GET_INFO_BY_FD
already used to get BPF object info, so we can also get token info with
this cmd.
One usage scenario, when program runs failed with token, because of
the permission failure, we can report what BPF token is allowing with
this API for debugging.
The last iterators update (commit 515ee52b2224 ("bpf: make preloaded
map iterators to display map elements count")) missed the big-endian
skeleton. Update it by running "make big" with Debian clang version
21.0.0 (++20250706105601+01c97b4953e8-1~exp1~20250706225612.1558).
====================
this series follows up on the one introducing 9+ args for tracing
programs [1]. It has been observed with this series that there are cases
for which we can not identify accurately the location of the target
function arguments to prepare correctly the corresponding BPF
trampoline. This is the case for example if:
- the function consumes a struct variable _by value_
- it is passed on the stack (no more register available for it)
- it has some __packed__ or __aligned(X)__ attribute
As a consequence, a small restrictive check has been added to the ARM64
side, highlighting that other arch supporting 9+ args in BPF trampolines
are already suffering from the same issue. After a bit of discussions
and attempts, the chosen solution is, rather than applying the same
constraint to all JIT compilers, to prevent such function from being
encoded at all in BTF info([2]). As the pahole side is closed to be
integrated, we can now remove the restrictive check from kernel side.
selftests/bpf: enable tracing_struct tests for arm64
Now that the constraint preventing attachment to functions consuming
struct on stack has been removed from the kernel (and moved to pahole,
with a slightly smarter detection, to prevent only those that are
packed), re-enable the tracing_struct tests for arm64.
While introducing support for 9+ arguments for tracing programs on
ARM64, commit 9014cf56f13d ("bpf, arm64: Support up to 12 function
arguments") has also introduced a constraint preventing BPF trampolines
from being generated if the target function consumes a struct argument
passed on stack, because of uncertainties around the exact struct
location: if the struct has been marked as packed or with a custom
alignment, this info is not reflected in BTF data, and so generated
tracing trampolines could read the target function arguments at wrong
offsets.
This issue is not specific to ARM64: there has been an attempt (see [1])
to bring the same constraint to other architectures JIT compilers. But
discussions following this attempt led to the move of this constraint
out of the kernel (see [2]): instead of preventing the kernel from
generating trampolines for those functions consuming structs on stack,
it is simpler to just make sure that those functions with uncertain
struct arguments location are not encoded in BTF information, and so
that one can not even attempt to attach a tracing program to such
function. The task is then deferred to pahole (see [3]).
Now that the constraint is handled by pahole, remove it from the arm64
JIT compiler to keep it simple.
Yonghong Song [Tue, 15 Jul 2025 18:59:10 +0000 (11:59 -0700)]
selftests/bpf: Fix build error due to certain uninitialized variables
With the latest llvm21 compiler, I hit several errors when building bpf
selftests. Some of errors look like below:
test_maps.c:565:40: error: variable 'val' is uninitialized when passed as a
const pointer argument here [-Werror,-Wuninitialized-const-pointer]
565 | assert(bpf_map_update_elem(fd, NULL, &val, 0) < 0 &&
| ^~~
prog_tests/bpf_iter.c:400:25: error: variable 'c' is uninitialized when passed
as a const pointer argument here [-Werror,-Wuninitialized-const-pointer]
400 | write(finish_pipe[1], &c, 1);
| ^
Some other errors have similar the pattern as the above.
These errors are fixed by initializing those variables properly.
====================
Move attach_type into bpf_link
Andrii suggested moving the attach_type into bpf_link, the previous discussion
is as follows:
https://lore.kernel.org/bpf/CAEf4BzY7TZRjxpCJM-+LYgEqe23YFj5Uv3isb7gat2-HU4OSng@mail.gmail.com
patch1 add attach_type in bpf_link, and pass it to bpf_link_init, which
will init the attach_type field.
patch2-7 remove the attach_type in struct bpf_xx_link, update the info
with bpf_link attach_type.
There are some functions finally call bpf_link_init but do not have bpf_attr
from user or do not need to init attach_type from user like bpf_raw_tracepoint_open,
now use prog->expected_attach_type to init attach_type.
Tao Chen [Thu, 10 Jul 2025 03:20:32 +0000 (11:20 +0800)]
bpf: Add attach_type field to bpf_link
Attach_type will be set when a link is created by user. It is better to
record attach_type in bpf_link generically and have it available
universally for all link types. So add the attach_type field in bpf_link
and move the sleepable field to avoid unnecessary gap padding.
Paul Chaignon [Thu, 10 Jul 2025 18:20:53 +0000 (20:20 +0200)]
bpf: Forget ranges when refining tnum after JSET
Syzbot reported a kernel warning due to a range invariant violation on
the following BPF program.
0: call bpf_get_netns_cookie
1: if r0 == 0 goto <exit>
2: if r0 & Oxffffffff goto <exit>
The issue is on the path where we fall through both jumps.
That path is unreachable at runtime: after insn 1, we know r0 != 0, but
with the sign extension on the jset, we would only fallthrough insn 2
if r0 == 0. Unfortunately, is_branch_taken() isn't currently able to
figure this out, so the verifier walks all branches. The verifier then
refines the register bounds using the second condition and we end
up with inconsistent bounds on this unreachable path:
1: if r0 == 0 goto <exit>
r0: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0xffffffffffffffff)
2: if r0 & 0xffffffff goto <exit>
r0 before reg_bounds_sync: u64=[0x1, 0xffffffffffffffff] var_off=(0, 0)
r0 after reg_bounds_sync: u64=[0x1, 0] var_off=(0, 0)
Improving the range refinement for JSET to cover all cases is tricky. We
also don't expect many users to rely on JSET given LLVM doesn't generate
those instructions. So instead of improving the range refinement for
JSETs, Eduard suggested we forget the ranges whenever we're narrowing
tnums after a JSET. This patch implements that approach.
Reported-by: syzbot+c711ce17dd78e5d4fdcf@syzkaller.appspotmail.com Suggested-by: Eduard Zingerman <eddyz87@gmail.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/9d4fd6432a095d281f815770608fdcd16028ce0b.1752171365.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
====================
bpf/arena: Add kfunc for reserving arena memory
Add a new kfunc for BPF arenas that reserves a region of the mapping
to prevent it from being mapped. These regions serve as guards against
out-of-bounds accesses and are useful for debugging arena-related code.
- Added Acked-by tags by Yonghong.
- Replace hardcoded error numbers in selftests (Yonghong).
- Fixed selftest for partially freeing a reserved region (Yonghong).
- Removed the additional guard range tree. Adjusted tests accordingly.
Reserved regions now behave like allocated regions, and can be
unreserved using bpf_arena_free_pages(). They can also be allocated
from userspace through minor faults. It is up to the user to prevent
erroneous frees and/or use the BPF_F_SEGV_ON_FAULT flag to catch
stray userspace accesses (Alexei).
- Changed terminology from guard pages to reserved pages (Alexei,
Kartikeya).
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
====================
Emil Tsalapatis [Wed, 9 Jul 2025 19:13:11 +0000 (15:13 -0400)]
bpf/arena: add bpf_arena_reserve_pages kfunc
Add a new BPF arena kfunc for reserving a range of arena virtual
addresses without backing them with pages. This prevents the range from
being populated using bpf_arena_alloc_pages().
Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250709191312.29840-2-emil@etsalapatis.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
selftests/bpf: Remove enum64 case from __arg_untrusted test suite
The enum64 type used by verifier_global_ptr_args test case requires
CONFIG_SCHED_CLASS_EXT. At the moment selftets do not depend on this
option. There are just a few enum64 types in the kernel. Instead of
tying selftests to implementation details of unrelated sub-systems,
just remove enum64 test case. Simple enums are covered and that should
be sufficient.
Fixes: 68cca81fd57f ("selftests/bpf: tests for __arg_untrusted void * global func params") Reported-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Tested-by: Amery Hung <ameryhung@gmail.com> Link: https://lore.kernel.org/r/20250708220856.3059578-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
====================
bpf: Fix and test aux usage after do_check_insn()
Fix cur_aux()->nospec_result test after do_check_insn() referring to the
to-be-analyzed (potentially unsafe) instruction, not the
already-analyzed (safe) instruction. This might allow a unsafe insn to
slip through on a speculative path. Create some tests from the
reproducer [1].
Commit d6f1c85f2253 ("bpf: Fall back to nospec for Spectre v1") should
not be in any stable kernel yet, therefore bpf-next should suffice.
Changes since v2:
- Use insn_aux variable instead of introducing prev_aux() as suggested
by Eduard (and therefore also drop patch 1)
- v2: https://lore.kernel.org/bpf/20250628145016.784256-1-luis.gerhorst@fau.de/
Changes since v1:
- Fix compiler error due to missed rename of prev_insn_idx in first
patch
- v1: https://lore.kernel.org/bpf/20250628125927.763088-1-luis.gerhorst@fau.de/
Changes since RFC:
- Introduce prev_aux() as suggested by Alexei. For this, we must move
the env->prev_insn_idx assignment to happen directly after
do_check_insn(), for which I have created a separate commit. This
patch could be simplified by using a local prev_aux variable as
sugested by Eduard, but I figured one might find the new
assignment-strategy easier to understand (before, prev_insn_idx and
env->prev_insn_idx were out-of-sync for the latter part of the loop).
Also, like this we do not have an additional prev_* variable that must
be kept in-sync and the local variable's usage (old prev_insn_idx, new
tmp) is much more local. If you think it would be better to not take
the risk and keep the fix simple by just introducing the prev_aux
variable, let me know.
- Change WARN_ON_ONCE() to verifier_bug_if() as suggested by Alexei
- Change assertion to check instruction is BPF_JMP[32] as suggested by
Eduard
- RFC: https://lore.kernel.org/bpf/8734bmoemx.fsf@fau.de/
====================
Luis Gerhorst [Sat, 5 Jul 2025 19:09:08 +0000 (21:09 +0200)]
selftests/bpf: Add Spectre v4 tests
Add the following tests:
1. A test with an (unimportant) ldimm64 (16 byte insn) and a
Spectre-v4--induced nospec that clarifies and serves as a basic
Spectre v4 test.
2. Make sure a Spectre v4 nospec_result does not prevent a Spectre v1
nospec from being added before the dangerous instruction (tests that
[1] is fixed).
3. Combine the two, which is the combination that triggers the warning
in [2]. This is because the unanalyzed stack write has nospec_result
set, but the ldimm64 (which was just analyzed) had incremented
insn_idx by 2. That violates the assertion that nospec_result is only
used after insns that increment insn_idx by 1 (i.e., stack writes).
Luis Gerhorst [Sat, 5 Jul 2025 19:09:07 +0000 (21:09 +0200)]
bpf: Fix aux usage after do_check_insn()
We must terminate the speculative analysis if the just-analyzed insn had
nospec_result set. Using cur_aux() here is wrong because insn_idx might
have been incremented by do_check_insn(). Therefore, introduce and use
insn_aux variable.
Also change cur_aux(env)->nospec in case do_check_insn() ever manages to
increment insn_idx but still fail.
Change the warning to check the insn class (which prevents it from
triggering for ldimm64, for which nospec_result would not be
problematic) and use verifier_bug_if().
In line with Eduard's suggestion, do not introduce prev_aux() because
that requires one to understand that after do_check_insn() call what was
current became previous. This would at-least require a comment.
bpf: Fix improper int-to-ptr cast in dump_stack_cb
On 32-bit platforms, we'll try to convert a u64 directly to a pointer
type which is 32-bit, which causes the compiler to complain about cast
from an integer of a different size to a pointer type. Cast to long
before casting to the pointer type to match the pointer width.
Reported-by: kernelci.org bot <bot@kernelci.org> Reported-by: Randy Dunlap <rdunlap@infradead.org> Fixes: d7c431cafcb4 ("bpf: Add dump_stack() analogue to print to BPF stderr") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Tested-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20250705053035.3020320-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf: Fix bounds for bpf_prog_get_file_line linfo loop
We may overrun the bounds because linfo and jited_linfo are already
advanced to prog->aux->linfo_idx, hence we must only iterate the
remaining elements until we reach prog->aux->nr_linfo. Adjust the
nr_linfo calculation to fix this. Reported in [0].
====================
bpf: additional use-cases for untrusted PTR_TO_MEM
This patch set introduces two usability enhancements leveraging
untrusted pointers to mem:
- When reading a pointer field from a PTR_TO_BTF_ID, the resulting
value is now assumed to be PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED
instead of SCALAR_VALUE, provided the pointer points to a primitive
type.
- __arg_untrusted attribute for global function parameters,
allowed for pointer arguments of both structural and primitive
types:
- For structural types, the attribute produces
PTR_TO_BTF_ID|PTR_UNTRUSTED.
- For primitive types, it yields
PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED.
Here are examples enabled by the series:
struct foo {
int *arr;
};
...
p = bpf_core_cast(..., struct foo);
bpf_for(i, 0, ...) {
... p->arr[i] ... // load at any offset is allowed
}
int memcmp(void *a __arg_untrusted, void *b __arg_untrusted, size_t n) {
bpf_for(i, 0, n)
if (a[i] - b[i]) // load at any offset is allowed
return ...;
return 0;
}
The patch-set was inspired by Anrii's series [1]. The goal of that
series was to define a generic global glob_match function, capable to
accept any pointer type:
__weak int glob_match(const char *pat, const char *str);
Changelog:
v1: https://lore.kernel.org/bpf/20250702224209.3300396-1-eddyz87@gmail.com/
v1 -> v2:
- Added safety check in btf_prepare_func_args() to ensure that only
struct or primitive types could be combined with __arg_untrusted
(Alexei).
- Removed unnecessary 'continue' btf_check_func_arg_match() (Alexei).
- Added test cases for __arg_untrusted pointers to enum and
__arg_untrusted combined with non-kernel type (Kumar).
- Added acks from Kumar.
====================
selftests/bpf: tests for __arg_untrusted void * global func params
Check usage of __arg_untrusted parameters of primitive type:
- passing of {trusted, untrusted, map value, scalar value, values with
variable offset} to untrusted `void *`, `char *` or enum is ok;
- varifier represents such parameters as rdonly_untrusted_mem(sz=0).
bpf: support for void/primitive __arg_untrusted global func params
Allow specifying __arg_untrusted for void */char */int */long *
parameters. Treat such parameters as
PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED of size zero.
Intended usage is as follows:
int memcmp(char *a __arg_untrusted, char *b __arg_untrusted, size_t n) {
bpf_for(i, 0, n) {
if (a[i] - b[i]) // load at any offset is allowed
return a[i] - b[i];
}
return 0;
}
Allocate register id for ARG_PTR_TO_MEM parameters only when
PTR_MAYBE_NULL is set. Register id for PTR_TO_MEM is used only to
propagate non-null status after conditionals.
Check usage of __arg_untrusted parameters with PTR_TO_BTF_ID:
- combining __arg_untrusted with other tags is forbidden;
- non-kernel (program local) types for __arg_untrusted are forbidden;
- passing of {trusted, untrusted, map value, scalar value, values with
variable offset} to untrusted is ok;
- passing of PTR_TO_BTF_ID with a different type to untrusted is ok;
- passing of untrusted to trusted is forbidden.
bpf: attribute __arg_untrusted for global function parameters
Add support for PTR_TO_BTF_ID | PTR_UNTRUSTED global function
parameters. Anything is allowed to pass to such parameters, as these
are read-only and probe read instructions would protect against
invalid memory access.
bpf: rdonly_untrusted_mem for btf id walk pointer leafs
When processing a load from a PTR_TO_BTF_ID, the verifier calculates
the type of the loaded structure field based on the load offset.
For example, given the following types:
struct foo {
struct foo *a;
int *b;
} *p;
The verifier would calculate the type of `p->a` as a pointer to
`struct foo`. However, the type of `p->b` is currently calculated as a
SCALAR_VALUE.
This commit updates the logic for processing PTR_TO_BTF_ID to instead
calculate the type of p->b as PTR_TO_MEM|MEM_RDONLY|PTR_UNTRUSTED.
This change allows further dereferencing of such pointers (using probe
memory instructions).
bpf: make makr_btf_ld_reg return error for unexpected reg types
Non-functional change:
mark_btf_ld_reg() expects 'reg_type' parameter to be either
SCALAR_VALUE or PTR_TO_BTF_ID. Next commit expands this set, so update
this function to fail if unexpected type is passed. Also update
callers to propagate the error.
Arnd Bergmann reported an issue ([1]) where clang compiler (less than
llvm18) may trigger an error where the stack frame size exceeds the limit.
I can reproduce the error like below:
kernel/bpf/verifier.c:24491:5: error: stack frame size (2552) exceeds limit (1280) in 'bpf_check'
[-Werror,-Wframe-larger-than]
kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check'
[-Werror,-Wframe-larger-than]
This patch series fixed the above two errors by reducing stack size.
See each individual patches for details.
Changelogs:
v2 -> v3:
- v2: https://lore.kernel.org/bpf/20250702171134.2370432-1-yonghong.song@linux.dev/
- Rename env->callchain to env->callchain_buf so it is clear that
- env->callchain_buf is used for a temp buf.
v1 -> v2:
- v1: https://lore.kernel.org/bpf/20250702053332.1991516-1-yonghong.song@linux.dev/
- Simplify assignment to struct bpf_insn pointer in do_misc_fixups().
- Restore original implementation in opt_hard_wire_dead_code_branches()
as only one insn on the stack.
- Avoid unnecessary insns for 64bit modulo (mod 0/-1) operations.
====================
Yonghong Song [Thu, 3 Jul 2025 14:11:17 +0000 (07:11 -0700)]
bpf: Avoid putting struct bpf_scc_callchain variables on the stack
Add a 'struct bpf_scc_callchain callchain_buf' field in bpf_verifier_env.
This way, the previous bpf_scc_callchain local variables can be
replaced by taking address of env->callchain_buf. This can reduce stack
usage and fix the following error:
kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check'
[-Werror,-Wframe-larger-than]
Reported-by: Arnd Bergmann <arnd@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20250703141117.1485108-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yonghong Song [Thu, 3 Jul 2025 14:11:11 +0000 (07:11 -0700)]
bpf: Reduce stack frame size by using env->insn_buf for bpf insns
Arnd Bergmann reported an issue ([1]) where clang compiler (less than
llvm18) may trigger an error where the stack frame size exceeds the limit.
I can reproduce the error like below:
kernel/bpf/verifier.c:24491:5: error: stack frame size (2552) exceeds limit (1280) in 'bpf_check'
[-Werror,-Wframe-larger-than]
kernel/bpf/verifier.c:19921:12: error: stack frame size (1368) exceeds limit (1280) in 'do_check'
[-Werror,-Wframe-larger-than]
Use env->insn_buf for bpf insns instead of putting these insns on the
stack. This can resolve the above 'bpf_check' error. The 'do_check' error
will be resolved in the next patch.
Yonghong Song [Thu, 3 Jul 2025 14:11:06 +0000 (07:11 -0700)]
bpf: Simplify assignment to struct bpf_insn pointer in do_misc_fixups()
In verifier.c, the following code patterns (in two places)
struct bpf_insn *patch = &insn_buf[0];
can be simplified to
struct bpf_insn *patch = insn_buf;
which is easier to understand.
Paul Chaignon [Thu, 3 Jul 2025 21:35:09 +0000 (23:35 +0200)]
bpf: Avoid warning on unexpected map for tail call
Before handling the tail call in record_func_key(), we check that the
map is of the expected type and log a verifier error if it isn't. Such
an error however doesn't indicate anything wrong with the verifier. The
check for map<>func compatibility is done after record_func_key(), by
check_map_func_compatibility().
Therefore, this patch logs the error as a typical reject instead of a
verifier error.
Fixes: d2e4c1e6c294 ("bpf: Constant map key tracking for prog array pokes") Fixes: 0df1a55afa83 ("bpf: Warn on internal verifier errors") Reported-by: syzbot+efb099d5833bca355e51@syzkaller.appspotmail.com Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/1f395b74e73022e47e04a31735f258babf305420.1751578055.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
This set introduces a standard output interface with two streams, namely
stdout and stderr, for BPF programs. The idea is that these streams will
be written to by BPF programs and the kernel, and serve as standard
interfaces for informing user space of any BPF runtime violations. Users
can also utilize them for printing normal messages for debugging usage,
as is the case with bpf_printk() and trace pipe interface.
BPF programs and the kernel can use these streams to output messages.
User space can dump these messages using bpftool.
The stream interface itself is implemented using a lockless list, so
that we can queue messages from any context. Every printk statement into
the stream leads to memory allocation. Allocation itself relies on
try_alloc_pages() to construct a bespoke bump allocator to carve out
elements. If this fails, we finally give up and drop the message.
See commit logs for more details.
Two scenarios are covered:
- Deadlocks and timeouts in rqspinlock.
- Timeouts for may_goto.
In each we provide the stack trace and source information for the
offending BPF programs. Both the C source line and the file and line
numbers are printed. The output format is as follows:
ERROR: AA or ABBA deadlock detected for bpf_res_spin_lock
Attempted lock = 0xff11000108f3a5e0
Total held locks = 1
Held lock[ 0] = 0xff11000108f3a5e0
CPU: 48 UID: 0 PID: 786 Comm: test_progs
Call trace:
bpf_stream_stage_dump_stack+0xb0/0xd0
bpf_prog_report_rqspinlock_violation+0x10b/0x130
bpf_res_spin_lock+0x8c/0xa0
bpf_prog_3699ea119d1f6ed8_foo+0xe5/0x140
if (!bpf_res_spin_lock(&v2->lock)) @ stream_bpftool.c:62
bpf_prog_9b324ec4a1b2a5c0_stream_bpftool_dump_prog_stream+0x7e/0x2d0
foo(stream); @ stream_bpftool.c:93
bpf_prog_test_run_syscall+0x102/0x240
__sys_bpf+0xd68/0x2bf0
__x64_sys_bpf+0x1e/0x30
do_syscall_64+0x68/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
* Switch to alloc_pages_nolock(), avoid incorrect memcg accounting. (Alexei)
* We will figure out proper accounting later.
* Drop error limit logic, restrict stream capacity to 100,000 bytes. (Alexei)
* Remove extra invocation of is_bpf_text_address(). (Jiri)
* Avoid emitting NULL byte into the stream text, adjust regex in selftests. (Alexei)
* Add comment around rcu_read_lock() for bpf_prog_ksym_find. (Alexei)
* Tighten stream capacity check selftest.
* Add acks from Andrii.
* Fix bug when handling single element stream stage. (Eduard)
* Move to mutex for protection of stream read and copy_to_user(). (Alexei)
* Split bprintf refactor into its own patch. (Alexei)
* Move kfunc definition to common_btf_ids to avoid initcall proliferation. (Alexei)
* Return line number by reference in bpf_prog_get_file_line. (Alexei)
* Remove NULL checks for BTF name pointer. (Alexei)
* Add WARN_ON_ONCE(!rcu_read_lock_held()) in bpf_prog_ksym_find. (Eduard)
* Remove hardcoded stream stage from macros. (Alexei, Eduard)
* Move refactoring hunks to their own patch. (Alexei)
* Add empty opts parameter for future extensibility to libbpf API. (Andrii, Eduard)
* Add BPF_STREAM_{STDOUT,STDERR} to UAPI. (Andrii)
* Add code to match on backtrace output. (Eduard)
* Fix misc nits.
* Add acks.
* Drop arena page fault prints, will be done as follow up. (Alexei)
* Defer Andrii's request to reuse code and Alan's suggestion of error
counts to follow up.
* Drop bpf_dynptr_from_mem_slice patch.
* Drop some acks due to heavy reworking.
* Fix KASAN splat in bpf_prog_get_file_line. (Eduard)
* Collapse bpf_prog_ksym_find and is_bpf_text_address into single
call. (Eduard)
* Add missing RCU read lock in bpf_prog_ksym_find.
* Fix incorrect error handling in dump_stack_cb.
* Simplify libbpf macro. (Eduard, Andrii)
* Introduce bpf_prog_stream_read() libbpf API. (Eduard, Alexei, Andrii)
* Drop BPF prog from the bpftool, use libbpf API.
* Rework selftests.
* Rebase on bpf-next/master.
* Change output in dump_stack to also print source line. (Alexei)
* Simplify API to single pop() operation. (Eduard, Alexei)
* Add kdoc for bpf_dynptr_from_mem_slice.
* Fix -EINVAL returned from prog_dump_stream. (Eduard)
* Split dump_stack() patch into multiple commits.
* Add macro wrapping stream staging API.
* Change bpftool command from dump to tracelog. (Quentin)
* Add bpftool documentation and bash completion. (Quentin)
* Change license of bpftool to Dual BSD/GPL.
* Simplify memory allocator. (Alexei)
* No overflow into second page.
* Remove bpf_mem_alloc() fallback.
* Symlink bpftool BPF program and exercise as selftest. (Eduard)
* Verify output after dumping from ringbuf. (Eduard)
* More failure cases to check API invariants.
* Remove patches for dynptr lifetime fixes (split into separate set).
* Limit maximum error messages, and add stream capacity. (Eduard)
====================
Add support for printing the BPF stream contents of a program in
bpftool. The new bpftool prog tracelog command is extended to take
stdout and stderr arguments, and then the prog specification.
The bpf_prog_stream_read() API added in previous patch is simply reused
to grab data and then it is dumped to the respective file. The stdout
data is sent to stdout, and stderr is printed to stderr.
Introduce a libbpf API so that users can read data from a given BPF
stream for a BPF prog fd. For now, only the low-level syscall wrapper
is provided, we can add a bpf_program__* accessor as a follow up if
needed.
Add a convenience macro to print data to the BPF streams. BPF_STDOUT and
BPF_STDERR stream IDs in the vmlinux.h can be passed to the macro to
print to the respective streams.
bpf: Add dump_stack() analogue to print to BPF stderr
Introduce a kernel function which is the analogue of dump_stack()
printing some useful information and the stack trace. This is not
exposed to BPF programs yet, but can be made available in the future.
When we have a program counter for a BPF program in the stack trace,
also additionally output the filename and line number to make the trace
helpful. The rest of the trace can be passed into ./decode_stacktrace.sh
to obtain the line numbers for kernel symbols.
bpf: Add function to find program from stack trace
In preparation of figuring out the closest program that led to the
current point in the kernel, implement a function that scans through the
stack trace and finds out the closest BPF program when walking down the
stack trace.
Special care needs to be taken to skip over kernel and BPF subprog
frames. We basically scan until we find a BPF main prog frame. The
assumption is that if a program calls into us transitively, we'll
hit it along the way. If not, we end up returning NULL.
Contextually the function will be used in places where we know the
program may have called into us.
Due to reliance on arch_bpf_stack_walk(), this function only works on
x86 with CONFIG_UNWINDER_ORC, arm64, and s390. Remove the warning from
arch_bpf_stack_walk as well since we call it outside bpf_throw()
context.
Acked-by: Eduard Zingerman <eddyz87@gmail.com> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20250703204818.925464-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
bpf: Ensure RCU lock is held around bpf_prog_ksym_find
Add a warning to ensure RCU lock is held around tree lookup, and then
fix one of the invocations in bpf_stack_walker. The program has an
active stack frame and won't disappear. Use the opportunity to remove
unneeded invocation of is_bpf_text_address.
Prepare a function for use in future patches that can extract the file
info, line info, and the source line number for a given BPF program
provided it's program counter.
Only the basename of the file path is provided, given it can be
excessively long in some cases.
This will be used in later patches to print source info to the BPF
stream.
Add support for a stream API to the kernel and expose related kfuncs to
BPF programs. Two streams are exposed, BPF_STDOUT and BPF_STDERR. These
can be used for printing messages that can be consumed from user space,
thus it's similar in spirit to existing trace_pipe interface.
The kernel will use the BPF_STDERR stream to notify the program of any
errors encountered at runtime. BPF programs themselves may use both
streams for writing debug messages. BPF library-like code may use
BPF_STDERR to print warnings or errors on misuse at runtime.
The implementation of a stream is as follows. Everytime a message is
emitted from the kernel (directly, or through a BPF program), a record
is allocated by bump allocating from per-cpu region backed by a page
obtained using alloc_pages_nolock(). This ensures that we can allocate
memory from any context. The eventual plan is to discard this scheme in
favor of Alexei's kmalloc_nolock() [0].
This record is then locklessly inserted into a list (llist_add()) so
that the printing side doesn't require holding any locks, and works in
any context. Each stream has a maximum capacity of 4MB of text, and each
printed message is accounted against this limit.
Messages from a program are emitted using the bpf_stream_vprintk kfunc,
which takes a stream_id argument in addition to working otherwise
similar to bpf_trace_vprintk.
The bprintf buffer helpers are extracted out to be reused for printing
the string into them before copying it into the stream, so that we can
(with the defined max limit) format a string and know its true length
before performing allocations of the stream element.
For consuming elements from a stream, we expose a bpf(2) syscall command
named BPF_PROG_STREAM_READ_BY_FD, which allows reading data from the
stream of a given prog_fd into a user space buffer. The main logic is
implemented in bpf_stream_read(). The log messages are queued in
bpf_stream::log by the bpf_stream_vprintk kfunc, and then pulled and
ordered correctly in the stream backlog.
For this purpose, we hold a lock around bpf_stream_backlog_peek(), as
llist_del_first() (if we maintained a second lockless list for the
backlog) wouldn't be safe from multiple threads anyway. Then, if we
fail to find something in the backlog log, we splice out everything from
the lockless log, and place it in the backlog log, and then return the
head of the backlog. Once the full length of the element is consumed, we
will pop it and free it.
The lockless list bpf_stream::log is a LIFO stack. Elements obtained
using a llist_del_all() operation are in LIFO order, thus would break
the chronological ordering if printed directly. Hence, this batch of
messages is first reversed. Then, it is stashed into a separate list in
the stream, i.e. the backlog_log. The head of this list is the actual
message that should always be returned to the caller. All of this is
done in bpf_stream_backlog_fill().
From the kernel side, the writing into the stream will be a bit more
involved than the typical printk. First, the kernel typically may print
a collection of messages into the stream, and parallel writers into the
stream may suffer from interleaving of messages. To ensure each group of
messages is visible atomically, we can lift the advantage of using a
lockless list for pushing in messages.
To enable this, we add a bpf_stream_stage() macro, and require kernel
users to use bpf_stream_printk statements for the passed expression to
write into the stream. Underneath the macro, we have a message staging
API, where a bpf_stream_stage object on the stack accumulates the
messages being printed into a local llist_head, and then a commit
operation splices the whole batch into the stream's lockless log list.
This is especially pertinent for rqspinlock deadlock messages printed to
program streams. After this change, we see each deadlock invocation as a
non-interleaving contiguous message without any confusion on the
reader's part, improving their user experience in debugging the fault.
While programs cannot benefit from this staged stream writing API, they
could just as well hold an rqspinlock around their print statements to
serialize messages, hence this is kept kernel-internal for now.
Overall, this infrastructure provides NMI-safe any context printing of
messages to two dedicated streams.
Later patches will add support for printing splats in case of BPF arena
page faults, rqspinlock deadlocks, and cond_break timeouts, and
integration of this facility into bpftool for dumping messages to user
space.
Refactor code to be able to get and put bprintf buffers and use
bpf_printf_prepare independently. This will be used in the next patch to
implement BPF streams support, particularly as a staging buffer for
strings that need to be formatted and then allocated and pushed into a
stream.
Tao Chen [Wed, 2 Jul 2025 15:39:56 +0000 (23:39 +0800)]
bpf: Show precise link_type for {uprobe,kprobe}_multi fdinfo
Alexei suggested, 'link_type' can be more precise and differentiate
for human in fdinfo. In fact BPF_LINK_TYPE_KPROBE_MULTI includes
kretprobe_multi type, the same as BPF_LINK_TYPE_UPROBE_MULTI, so we
can show it more concretely.
Implement bpf_dynptr_memset() kfunc and add tests for it.
v3->v4:
* do error checks after slice, nits
v2->v3:
* nits and slow-path loop rewrite (Andrii)
* simplify xdp chunks test (Mykyta)
v1->v2:
* handle non-linear buffers with bpf_dynptr_write()
* change function signature to include offset arg
* add more test cases
selftests/bpf: Add test cases for bpf_dynptr_memset()
Add tests to verify the behavior of bpf_dynptr_memset():
* normal memset 0
* normal memset non-0
* memset with an offset
* memset in dynptr that was adjusted
* error: size overflow
* error: offset+size overflow
* error: readonly dynptr
* memset into non-linear xdp dynptr
Currently there is no straightforward way to fill dynptr memory with a
value (most commonly zero). One can do it with bpf_dynptr_write(), but
a temporary buffer is necessary for that.
Implement bpf_dynptr_memset() - an analogue of memset() from libc.
Veristat is synced into the standalone repo, where it compiles without
kernel private dependencies. This patch fixes compilation errors in
standalone veristat.
Paul Chaignon [Wed, 2 Jul 2025 13:54:54 +0000 (15:54 +0200)]
bpf: Avoid warning on multiple referenced args in call
The description of full helper calls in syzkaller [1] and the addition of
kernel warnings in commit 0df1a55afa83 ("bpf: Warn on internal verifier
errors") allowed syzbot to reach a verifier state that was thought to
indicate a verifier bug [2]:
12: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204
verifier bug: more than one arg with ref_obj_id R2 2 2
This error can be reproduced with the program from the previous commit:
bpf_tcp_raw_gen_syncookie_ipv4 expects R1 and R2 to be
ARG_PTR_TO_FIXED_SIZE_MEM (with a size of at least sizeof(struct iphdr)
for R1). R0 is a ring buffer payload of 20B and therefore matches this
requirement.
The verifier reaches the check on ref_obj_id while verifying R2 and
rejects the program because the helper isn't supposed to take two
referenced arguments.
This case is a legitimate rejection and doesn't indicate a kernel bug,
so we shouldn't log it as such and shouldn't emit a kernel warning.
bpf: avoid jump misprediction for PTR_TO_MEM | PTR_UNTRUSTED
Commit f2362a57aeff ("bpf: allow void* cast using bpf_rdonly_cast()")
added a notion of PTR_TO_MEM | MEM_RDONLY | PTR_UNTRUSTED type.
This simultaneously introduced a bug in jump prediction logic for
situations like below:
p = bpf_rdonly_cast(..., 0);
if (p) a(); else b();
Here verifier would wrongly predict that else branch cannot be taken.
This happens because:
- Function reg_not_null() assumes that PTR_TO_MEM w/o PTR_MAYBE_NULL
flag cannot be zero.
- The call to bpf_rdonly_cast() returns a rdonly_untrusted_mem value
w/o PTR_MAYBE_NULL flag.
Tracking of PTR_MAYBE_NULL flag for untrusted PTR_TO_MEM does not make
sense, as the main purpose of the flag is to catch null pointer access
errors. Such errors are not possible on load of PTR_UNTRUSTED values
and verifier makes sure that PTR_UNTRUSTED can't be passed to helpers
or kfuncs.
Hence, modify reg_not_null() to assume that nullness of untrusted
PTR_TO_MEM is not known.
selftests/bpf: Don't call fsopen() as privileged user
In the BPF token example, the fsopen() syscall is called as privileged
user. This is unneeded because fsopen() can be called also as
unprivileged user from the user namespace.
As the `fs_fd` file descriptor which was sent back and forth is still the
same, keep it open instead of cloning and closing it twice via SCM_RIGHTS.
Paul Chaignon [Tue, 1 Jul 2025 18:36:15 +0000 (20:36 +0200)]
bpf: Warn on internal verifier errors
This patch is a follow up to commit 1cb0f56d9618 ("bpf: WARN_ONCE on
verifier bugs"). It generalizes the use of verifier_error throughout
the verifier, in particular for logs previously marked "verifier
internal error". As a consequence, all of those verifier bugs will now
come with a kernel warning (under CONFIG_DBEUG_KERNEL) detectable by
fuzzers.
While at it, some error messages that were too generic (ex., "bpf
verifier is misconfigured") have been reworded.
Ilya Leoshkevich [Tue, 24 Jun 2025 12:04:28 +0000 (14:04 +0200)]
s390/bpf: Describe the frame using a struct instead of constants
Currently the caller-allocated portion of the stack frame is described
using constants, hardcoded values, and an ASCII drawing, making it
harder than necessary to ensure that everything is in sync.
Declare a struct and use offsetof() and offsetofend() macros to refer
to various values stored within the frame.
Eduard Zingerman [Fri, 27 Jun 2025 17:53:09 +0000 (10:53 -0700)]
bpf: guard BTF_ID_FLAGS(bpf_cgroup_read_xattr) with CONFIG_BPF_LSM
Function bpf_cgroup_read_xattr is defined in fs/bpf_fs_kfuncs.c,
which is compiled only when CONFIG_BPF_LSM is set. Add CONFIG_BPF_LSM
check to bpf_cgroup_read_xattr spec in common_btf_ids in
kernel/bpf/helpers.c to avoid build failures for configs w/o
CONFIG_BPF_LSM.
Build failure example:
BTF .tmp_vmlinux1.btf.o
btf_encoder__tag_kfunc: failed to find kfunc 'bpf_cgroup_read_xattr' in BTF
...
WARN: resolve_btfids: unresolved symbol bpf_cgroup_read_xattr
make[2]: *** [scripts/Makefile.vmlinux:91: vmlinux.unstripped] Error 255
Fixes: 535b070f4a80 ("bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node") Reported-by: Jake Hillion <jakehillion@meta.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250627175309.2710973-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Andrii Nakryiko [Thu, 26 Jun 2025 17:28:51 +0000 (10:28 -0700)]
Merge branch 'support-array-presets-in-veristat'
Mykyta Yatsenko says:
====================
Support array presets in veristat
From: Mykyta Yatsenko <yatsenko@meta.com>
This patch series implements support for array variable presets in
veristat. Currently users can set values to global variables before
loading BPF program, but not for arrays. With this change array
elements are supported as well, for example:
```
sudo ./veristat set_global_vars.bpf.o -G "arr[0] = 1"
```
v4 -> v5
* Simplify array elements offset calculation using btf__resolve_size
* Support white spaces in array indexes, e.g. arr [ 0 ]
* Rename btf_find_member into adjust_var_secinfo_member
v3 -> v4
* Rework implementation to support multi dimensional arrays
* Rework data structures for storing parsed presets
v2 -> v3
* Added more negative tests
* Fix mem leak
* Other small fixes
v1 -> v2
* Support enums as indexes
* Separating parsing logic from preset processing
* Add more tests
====================
Mykyta Yatsenko [Wed, 25 Jun 2025 16:59:04 +0000 (17:59 +0100)]
selftests/bpf: Test array presets in veristat
Modify existing veristat tests to verify that array presets are applied
as expected.
Introduce few negative tests as well to check that common error modes
are handled.
Mykyta Yatsenko [Wed, 25 Jun 2025 16:59:03 +0000 (17:59 +0100)]
selftests/bpf: Support array presets in veristat
Implement support for presetting values for array elements in veristat.
For example:
```
sudo ./veristat set_global_vars.bpf.o -G "arr[3] = 1"
```
Arrays of structures and structure of arrays work, but each individual
scalar value has to be set separately: `foo[1].bar[2] = value`.
Mykyta Yatsenko [Wed, 25 Jun 2025 16:59:02 +0000 (17:59 +0100)]
selftests/bpf: Separate var preset parsing in veristat
Refactor var preset parsing in veristat to simplify implementation.
Prepare parsed variable beforehand so that parsing logic is separated
from functionality of calculating offsets and searching fields.
Introduce rvalue struct, storing either int or enum (string value),
will be reused in the next patch, extract parsing rvalue into a
separate function.
Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf after rc3
Cross-merge BPF, perf and other fixes after downstream PRs.
It restores BPF CI to green after critical fix
commit bc4394e5e79c ("perf: Fix the throttle error of some clock events")
====================
bpf: Add kfuncs for read-only string operations
String operations are commonly used in programming and BPF programs are
no exception. Since it is cumbersome to reimplement them over and over,
this series introduce kfuncs which provide the most common operations.
For now, we only limit ourselves to functions which do not copy memory
since these usually introduce undefined behaviour in case the
source/destination buffers overlap which would have to be prevented by
the verifier.
The kernel already contains implementations for all of these, however,
it is not possible to use them from BPF context. The main reason is that
the verifier is not able to check that it is safe to access the entire
string and that the string is null-terminated and the function won't
loop forever. Therefore, the operations are open-coded using
__get_kernel_nofault instead of plain dereference and bounded to at most
XATTR_SIZE_MAX characters to make them safe. That allows to skip all the
verfier checks for the passed-in strings as safety is ensured
dynamically.
All of the proposed functions return integers, even those that normally
(in the kernel or libc) return pointers into the strings. The reason is
that since the strings are generally treated as unsafe, the pointers
couldn't be dereferenced anyways. So, instead, we return an index to the
string and let user decide what to do with it. The integer APIs also
allow to return various error codes when unexpected situations happen
while processing the strings.
The series include both positive and negative tests using the kfuncs.
Changelog
---------
Changes in v8:
- Return -ENOENT (instead of -1) when "item not found" for relevant
functions (Alexei).
- Small adjustments of the string algorithms (Andrii).
- Adapt comments to kernel style (Alexei).
Changes in v7:
- Disable negative tests passing NULL and 0x1 to kfuncs on s390 as they
aren't relevant (see comment in string_kfuncs_failure1.c for details).
Changes in v6:
- Improve the third patch which allows to use macros in __retval in
selftests. The previous solution broke several tests.
Changes in v5:
- Make all kfuncs return integers (Andrii).
- Return -ERANGE when passing non-kernel pointers on arches with
non-overlapping address spaces (Alexei).
- Implement "unbounded" variants using the bounded ones (Andrii).
- Add more negative test cases.
Changes in v4 (all suggested by Andrii):
- Open-code all the kfuncs, not just the unbounded variants.
- Introduce `pagefault` lock guard to simplify the implementation
- Return appropriate error codes (-E2BIG and -EFAULT) on failures
- Const-ify all arguments and return values
- Add negative test-cases
Changes in v3:
- Open-code unbounded variants with __get_kernel_nofault instead of
dereference (suggested by Alexei).
- Use the __sz suffix for size parameters in bounded variants (suggested
by Eduard and Alexei).
- Make tests more compact (suggested by Eduard).
- Add benchmark.
====================
Viktor Malik [Thu, 26 Jun 2025 06:08:31 +0000 (08:08 +0200)]
selftests/bpf: Add tests for string kfuncs
Add both positive and negative tests cases using string kfuncs added in
the previous patches.
Positive tests check that the functions work as expected.
Negative tests pass various incorrect strings to the kfuncs and check
for the expected error codes:
-E2BIG when passing too long strings
-EFAULT when trying to read inaccessible kernel memory
-ERANGE when passing userspace pointers on arches with non-overlapping
address spaces
A majority of the tests use the RUN_TESTS helper which executes BPF
programs with BPF_PROG_TEST_RUN and check for the expected return value.
An exception to this are tests for long strings as we need to memset the
long string from userspace (at least I haven't found an ergonomic way to
memset it from a BPF program), which cannot be done using the RUN_TESTS
infrastructure.
Viktor Malik [Thu, 26 Jun 2025 06:08:30 +0000 (08:08 +0200)]
selftests/bpf: Allow macros in __retval
Allow macro expansion for values passed to the `__retval` and
`__retval_unpriv` attributes. This is especially useful for testing
programs which return various error codes.
With this change, the code for parsing special literals can be made
simpler, as the literals are defined via macros. The only exception is
INT_MIN which expands to (-INT_MAX -1), which is not single number and
cannot be parsed by strtol. So, we instead use a prefixed literal
_INT_MIN in __retval and handle it separately (assign the expected
return to INT_MIN). Also, strtol cannot handle the "ll" suffix so change
the value of POINTER_VALUE from 0xcafe4all to 0xbadcafe.
Viktor Malik [Thu, 26 Jun 2025 06:08:29 +0000 (08:08 +0200)]
bpf: Add kfuncs for read-only string operations
String operations are commonly used so this exposes the most common ones
to BPF programs. For now, we limit ourselves to operations which do not
copy memory around.
Unfortunately, most in-kernel implementations assume that strings are
%NUL-terminated, which is not necessarily true, and therefore we cannot
use them directly in the BPF context. Instead, we open-code them using
__get_kernel_nofault instead of plain dereference to make them safe and
limit the strings length to XATTR_SIZE_MAX to make sure the functions
terminate. When __get_kernel_nofault fails, functions return -EFAULT.
Similarly, when the size bound is reached, the functions return -E2BIG.
In addition, we return -ERANGE when the passed strings are outside of
the kernel address space.
Note that thanks to these dynamic safety checks, no other constraints
are put on the kfunc args (they are marked with the "__ign" suffix to
skip any verifier checks for them).
All of the functions return integers, including functions which normally
(in kernel or libc) return pointers to the strings. The reason is that
since the strings are generally treated as unsafe, the pointers couldn't
be dereferenced anyways. So, instead, we return an index to the string
and let user decide what to do with it. This also nicely fits with
returning various error codes when necessary (see above).
Jiawen Wu [Wed, 25 Jun 2025 02:39:24 +0000 (10:39 +0800)]
net: libwx: fix the creation of page_pool
'rx_ring->size' means the count of ring descriptors multiplied by the
size of one descriptor. When increasing the count of ring descriptors,
it may exceed the limit of pool size.
[ 864.209610] page_pool_create_percpu() gave up with errno -7
[ 864.209613] txgbe 0000:11:00.0: Page pool creation failed: -7
Fix to set the pool_size to the count of ring descriptors.
Fixes: 850b971110b2 ("net: libwx: Allocate Rx and Tx resources") Cc: stable@vger.kernel.org Signed-off-by: Jiawen Wu <jiawenwu@trustnetic.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Mina Almasry <almasrymina@google.com> Link: https://patch.msgid.link/434C72BFB40E350A+20250625023924.21821-1-jiawenwu@trustnetic.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Jakub Kicinski [Tue, 24 Jun 2025 18:32:58 +0000 (11:32 -0700)]
net: selftests: fix TCP packet checksum
The length in the pseudo header should be the length of the L3 payload
AKA the L4 header+payload. The selftest code builds the packet from
the lower layers up, so all the headers are pushed already when it
constructs L4. We need to subtract the lower layer headers from skb->len.
Linus Torvalds [Thu, 26 Jun 2025 04:09:02 +0000 (21:09 -0700)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix use-after-free in libbpf when map is resized (Adin Scannell)
- Fix verifier assumptions about 2nd argument of bpf_sysctl_get_name
(Jerome Marchand)
- Fix verifier assumption of nullness of d_inode in dentry (Song Liu)
- Fix global starvation of LRU map (Willem de Bruijn)
- Fix potential NULL dereference in btf_dump__free (Yuan Chen)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: adapt one more case in test_lru_map to the new target_free
libbpf: Fix possible use-after-free for externs
selftests/bpf: Convert test_sysctl to prog_tests
bpf: Specify access type of bpf_sysctl_get_name args
libbpf: Fix null pointer dereference in btf_dump__free on allocation failure
bpf: Adjust free target to avoid global starvation of LRU map
bpf: Mark dentry->d_inode as trusted_or_null
Linus Torvalds [Thu, 26 Jun 2025 03:48:48 +0000 (20:48 -0700)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull mount fixes from Al Viro:
"Several mount-related fixes"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
userns and mnt_idmap leak in open_tree_attr(2)
attach_recursive_mnt(): do not lock the covering tree when sliding something under it
replace collect_mounts()/drop_collected_mounts() with a safer variant
atm: Release atm_dev_mutex after removing procfs in atm_dev_deregister().
syzbot reported a warning below during atm_dev_register(). [0]
Before creating a new device and procfs/sysfs for it, atm_dev_register()
looks up a duplicated device by __atm_dev_lookup(). These operations are
done under atm_dev_mutex.
However, when removing a device in atm_dev_deregister(), it releases the
mutex just after removing the device from the list that __atm_dev_lookup()
iterates over.
So, there will be a small race window where the device does not exist on
the device list but procfs/sysfs are still not removed, triggering the
splat.
Let's hold the mutex until procfs/sysfs are removed in
atm_dev_deregister().