Leon Hwang [Wed, 7 Jan 2026 02:20:17 +0000 (10:20 +0800)]
bpf: Add BPF_F_CPU and BPF_F_ALL_CPUS flags support for percpu_array maps
Introduce support for the BPF_F_ALL_CPUS flag in percpu_array maps to
allow updating values for all CPUs with a single value for both
update_elem and update_batch APIs.
Introduce support for the BPF_F_CPU flag in percpu_array maps to allow:
* updating the value for a specified CPU for both the update_elem and
update_batch APIs.
* looking up the value for a specified CPU for both the lookup_elem and
lookup_batch APIs.
The BPF_F_CPU flag is passed via:
* map_flags of lookup_elem and update_elem APIs along with embedded cpu
info.
* elem_flags of lookup_batch and update_batch APIs along with embedded
cpu info.
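As a rough illustration, a user-space update targeting a single CPU could look like the sketch below. It assumes the cpu index is what gets embedded in the upper 32 bits of the flags value, per the description above; the function and variable names are made up for the example:
    #include <linux/bpf.h>
    #include <bpf/bpf.h>

    /* Sketch: assumes the target cpu index is embedded in the upper 32 bits
     * of the flags value passed alongside BPF_F_CPU.
     */
    static int update_one_cpu(int map_fd, __u32 key, __u32 value, __u32 target_cpu)
    {
            __u64 flags = BPF_F_CPU | ((__u64)target_cpu << 32);

            return bpf_map_update_elem(map_fd, &key, &value, flags);
    }

    /* BPF_F_ALL_CPUS: one value replicated to every CPU. */
    static int update_all_cpus(int map_fd, __u32 key, __u32 value)
    {
            return bpf_map_update_elem(map_fd, &key, &value, BPF_F_ALL_CPUS);
    }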
====================
bpf/verifier: Allow calling arena functions when holding BPF lock
BPF arena-related kfuncs now cannot sleep, so they are safe to call
while holding a spinlock. However, the verifier still rejects
programs that do so. Update the verifier to allow arena kfunc
calls while holding a lock.
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
Changes v1->v2: (https://lore.kernel.org/r/20260106-arena-under-lock-v1-0-6ca9c121d826@etsalapatis.com)
- Added patch to account for active locks in in_sleepable_context() (AI)
====================
Emil Tsalapatis [Tue, 6 Jan 2026 23:36:44 +0000 (18:36 -0500)]
bpf: Allow calls to arena functions while holding spinlocks
The bpf_arena_*_pages() kfuncs can be called from sleepable contexts,
but the verifier still prevents BPF programs from calling them while
holding a spinlock. Amend the verifier to allow for BPF programs
calling arena page management functions while holding a lock.
Emil Tsalapatis [Tue, 6 Jan 2026 23:36:43 +0000 (18:36 -0500)]
bpf: Check active lock count in in_sleepable_context()
The in_sleepable_context() function is used to specialize the BPF code
in do_misc_fixups(). With the addition of nonsleepable arena kfuncs,
there are kfuncs whose specialization depends on whether we are
holding a lock. We should use the nonsleepable version while
holding a lock and the sleepable one when not.
Add a check for active_locks to account for locking when specializing
arena kfuncs.
Puranjay Mohan [Fri, 2 Jan 2026 22:15:12 +0000 (14:15 -0800)]
bpf: Replace __opt annotation with __nullable for kfuncs
The __opt annotation was originally introduced specifically for
buffer/size argument pairs in bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr(), allowing the buffer pointer to be NULL while
still validating the size as a constant. The __nullable annotation
serves the same purpose but is more general and is already used
throughout the BPF subsystem for raw tracepoints, struct_ops, and other
kfuncs.
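For reference, both annotations are suffixes on the kfunc parameter name. A hedged sketch of what the unification looks like for bpf_dynptr_slice(); treat the exact prototype as illustrative:
    /* Before: __opt marks the buffer of a buffer/size pair as optional. */
    __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr *p, u32 offset,
                                       void *buffer__opt, u32 buffer__szk);

    /* After: the same intent is expressed with the generic __nullable suffix. */
    __bpf_kfunc void *bpf_dynptr_slice(const struct bpf_dynptr *p, u32 offset,
                                       void *buffer__nullable, u32 buffer__szk);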
This patch unifies the two annotations by replacing __opt with
__nullable. The key change is in the verifier's
get_kfunc_ptr_arg_type() function, where mem/size pair detection is now
performed before the nullable check. This ensures that buffer/size
pairs are correctly classified as KF_ARG_PTR_TO_MEM_SIZE even when the
buffer is nullable, while adding an !arg_mem_size condition to the
nullable check prevents interference with mem/size pair handling.
When processing KF_ARG_PTR_TO_MEM_SIZE arguments, the verifier now uses
is_kfunc_arg_nullable() instead of the removed is_kfunc_arg_optional()
to determine whether to skip size validation for NULL buffers.
This is the first documentation added for the __nullable annotation,
which has been in use since it was introduced but was previously
undocumented.
No functional changes to verifier behavior - nullable buffer/size pairs
continue to work exactly as before.
====================
memcg accounting for BPF arena
v4: https://lore.kernel.org/all/20260102181333.3033679-1-puranjay@kernel.org/
Changes in v4->v5:
- Remove unused variables from bpf_map_alloc_pages() (CI)
v3: https://lore.kernel.org/all/20260102151852.570285-1-puranjay@kernel.org/
Changes in v3->v4:
- Do memcg set/recover in arena_reserve_pages() rather than
bpf_arena_reserve_pages() for symmetry with other kfuncs (Alexei)
v2: https://lore.kernel.org/all/20251231141434.3416822-1-puranjay@kernel.org/
Changes in v2->v3:
- Remove memcg accounting from bpf_map_alloc_pages() as the caller does
it already. (Alexei)
- Do memcg set/recover in arena_alloc/free_pages() rather than
bpf_arena_alloc/free_pages(), it reduces copy pasting in
sleepable/non_sleepable functions.
v1: https://lore.kernel.org/all/20251230153006.1347742-1-puranjay@kernel.org/
Changes in v1->v2:
- Return both pointers through arguments from bpf_map_memcg_enter and
make it return void. (Alexei)
- Add memcg accounting in arena_free_worker (AI)
This set adds memcg accounting logic into arena kfuncs and other places
that do allocations in arena.c.
====================
Puranjay Mohan [Fri, 2 Jan 2026 20:02:28 +0000 (12:02 -0800)]
bpf: arena: Reintroduce memcg accounting
When arena allocations were converted from bpf_map_alloc_pages() to
kmalloc_nolock() to support non-sleepable contexts, memcg accounting was
inadvertently lost. This commit restores proper memory accounting for
all arena-related allocations.
All arena-related allocations are accounted to the memcg of the process
that created the bpf_arena.
====================
bpf: Make KF_TRUSTED_ARGS default
v2: https://lore.kernel.org/all/20251231171118.1174007-1-puranjay@kernel.org/
Changes in v2->v3:
- Fix documentation: add a new section for kfunc parameters (Eduard)
- Remove all occurrences of KF_TRUSTED from comments, etc. (Eduard)
- Fix the netfilter kfuncs to drop dead NULL checks.
- Fix the selftest for netfilter kfuncs to check for verification failures
and remove the runtime failure cases that are no longer possible after
this change
v1: https://lore.kernel.org/all/20251224192448.3176531-1-puranjay@kernel.org/
Changes in v1->v2:
- Update kfunc_dynptr_param selftest to use a real pointer that is not
ptr_to_stack and not CONST_PTR_TO_DYNPTR rather than casting 1
(Alexei)
- Thoroughly review all kfuncs in the tree to find regressions or missing
annotations. (Eduard)
- Fix kfuncs found from the above step.
This series makes trusted arguments the default requirement for all BPF
kfuncs, inverting the current opt-in model. Instead of requiring
explicit KF_TRUSTED_ARGS flags, kfuncs now require trusted arguments by
default and must explicitly opt out using __nullable/__opt annotations
or the KF_RCU flag.
This improves security and type safety by preventing BPF programs from
passing untrusted or NULL pointers to kernel functions at verification
time, while maintaining flexibility for the small number of kfuncs that
legitimately need to accept NULL or RCU pointers.
MOTIVATION
The current opt-in model is error-prone and inconsistent. Most kfuncs already
require trusted pointers from sources like KF_ACQUIRE, struct_ops callbacks, or
tracepoints. Making trusted arguments the default:
- Prevents NULL pointer dereferences at verification time
- Reduces defensive NULL checks in kernel code
- Provides better error messages for invalid BPF programs
- Aligns with existing patterns (context pointers, struct_ops already trusted)
IMPACT ANALYSIS
Comprehensive analysis of all 304+ kfuncs across 37 kernel files found:
- Most kfuncs (299/304) are already safe and require no changes
- Only 4 kfuncs required fixes (all included in this series)
- 0 regressions found in independent verification
All bpf selftests are passing. The hid_bpf tests are also passing:
# PASSED: 20 / 20 tests passed.
# Totals: pass:20 fail:0 xfail:0 xpass:0 skip:0 error:0
bpf programs in drivers/hid/bpf/progs/ show no regression as shown by
veristat:
The verifier now validates kfunc arguments in this order:
1. NULL check (runs first): Rejects NULL unless parameter has __nullable/__opt
2. Trusted check: Rejects untrusted pointers unless kfunc has KF_RCU
Special cases that bypass trusted checking:
- Context pointers (xdp_md, __sk_buff): Handled via KF_ARG_PTR_TO_CTX
- Struct_ops callbacks: Pre-marked as PTR_TRUSTED during initialization
- KF_RCU kfuncs: Have separate validation path for RCU pointers
BACKWARD COMPATIBILITY
This affects BPF program verification, not runtime:
- Valid programs passing trusted pointers: Continue to work
- Programs with bugs: May now fail verification (preventing runtime crashes)
This series introduces two intentional breaking changes to the BPF
verifier's kfunc handling:
1. NULL pointer rejection timing: Kfuncs that previously accepted NULL
pointers without KF_TRUSTED_ARGS will now reject NULL at verification
time instead of returning runtime errors. This affects netfilter
connection tracking functions (bpf_xdp_ct_lookup, bpf_skb_ct_lookup,
bpf_xdp_ct_alloc, bpf_skb_ct_alloc), which now enforce their documented
"Cannot be NULL" requirements at load time rather than returning -EINVAL
at runtime.
2. Fentry/fexit program restrictions: BPF programs using fentry/fexit
attachment points can no longer pass their callback arguments directly
to kfuncs, as these arguments are not marked as trusted by default.
Programs requiring trusted argument semantics should migrate to tp_btf
(tracepoint with BTF) attachment points where arguments are guaranteed
trusted by the verifier.
Both changes strengthen the verifier's safety guarantees by catching
errors earlier in the development cycle and are accompanied by
comprehensive test updates demonstrating the new expected behaviors.
====================
Puranjay Mohan [Fri, 2 Jan 2026 18:00:36 +0000 (10:00 -0800)]
selftests: bpf: Fix test_bpf_nf for trusted args becoming default
With trusted args now being the default, passing NULL to kfunc
parameters that are pointers causes verifier rejection rather than a
runtime error. The test_bpf_nf test was failing because it attempted to
pass NULL to bpf_xdp_ct_lookup() to verify runtime error handling.
Since the NULL check now happens at verification time, remove the
runtime test case that passed NULL to the bpf_tuple parameter and
instead add verification-time tests to ensure the verifier correctly
rejects programs that pass NULL to trusted arguments.
Puranjay Mohan [Fri, 2 Jan 2026 18:00:35 +0000 (10:00 -0800)]
selftests: bpf: fix cgroup_hierarchical_stats
The cgroup_hierarchical_stats selftest uses an fentry program attached
to cgroup_attach_task and then passes the received &dst_cgrp->self to
the css_rstat_updated() kfunc. The verifier now assumes that all kfuncs
take only trusted pointer arguments, and pointers received by fentry
programs are not marked trusted by default.
Use a tp_btf program in place of fentry for this test; pointers
received by tp_btf programs are marked trusted by the verifier.
Puranjay Mohan [Fri, 2 Jan 2026 18:00:34 +0000 (10:00 -0800)]
selftests: bpf: fix test_kfunc_dynptr_param
As the verifier now assumes that all kfuncs take only trusted pointer
arguments, passing 0 (NULL) to a kfunc that doesn't mark the argument as
__nullable or __opt will be rejected with a failure message of: Possibly
NULL pointer passed to trusted arg<n>
Pass a non-null value to the kfunc to test the expected failure mode.
Puranjay Mohan [Fri, 2 Jan 2026 18:00:33 +0000 (10:00 -0800)]
selftests: bpf: Update failure message for rbtree_fail
The rbtree_api_use_unchecked_remove_retval() selftest passes a pointer
received from bpf_rbtree_remove() to bpf_rbtree_add() without checking
for NULL; this was earlier caught by __check_ptr_off_reg() in the
verifier. Now the verifier assumes every kfunc only takes trusted pointer
arguments, so it catches this NULL pointer earlier in the path and
provides a more accurate failure message.
Puranjay Mohan [Fri, 2 Jan 2026 18:00:32 +0000 (10:00 -0800)]
selftests: bpf: Update kfunc_param_nullable test for new error message
With trusted args now being the default, the NULL pointer check runs
before type-specific validation. Update test3 to expect the new error
message "Possibly NULL pointer passed to trusted arg0" instead of the
old dynptr-specific error message.
Puranjay Mohan [Fri, 2 Jan 2026 18:00:31 +0000 (10:00 -0800)]
HID: bpf: drop dead NULL checks in kfuncs
As KF_TRUSTED_ARGS is now considered the default for all kfuncs, the verifier
will not allow passing NULL pointers to these kfuncs. These checks for
NULL pointers can therefore be removed.
Puranjay Mohan [Fri, 2 Jan 2026 18:00:30 +0000 (10:00 -0800)]
bpf: xfrm: drop dead NULL check in bpf_xdp_get_xfrm_state()
As KF_TRUSTED_ARGS is now considered the default for all kfuncs, the
opts parameter in bpf_xdp_get_xfrm_state() can never be NULL. The verifier
will detect this at load time and will not allow passing NULL to this
function. This matches the documentation above the kfunc, which says this
parameter (opts) "Cannot be NULL".
Puranjay Mohan [Fri, 2 Jan 2026 18:00:29 +0000 (10:00 -0800)]
bpf: net: netfilter: drop dead NULL checks
bpf_xdp_ct_lookup() and bpf_skb_ct_lookup() receive bpf_tuple and opts
parameters that are expected to be non-NULL for real usage (see the doc
strings above the functions). They return an error if NULL is passed for
opts or tuple.
The verifier will now reject programs that pass NULL to these
parameters, so the kfuncs can assume that these are always valid pointers
and the NULL checks for these parameters can be dropped.
Puranjay Mohan [Fri, 2 Jan 2026 18:00:28 +0000 (10:00 -0800)]
bpf: Remove redundant KF_TRUSTED_ARGS flag from all kfuncs
Now that KF_TRUSTED_ARGS is the default for all kfuncs, remove the
explicit KF_TRUSTED_ARGS flag from all kfunc definitions and remove the
flag itself.
Puranjay Mohan [Fri, 2 Jan 2026 18:00:27 +0000 (10:00 -0800)]
bpf: Make KF_TRUSTED_ARGS the default for all kfuncs
Change the verifier to make trusted args the default requirement for
all kfuncs by removing is_kfunc_trusted_args() and assuming it to always
return true.
This works because:
1. Context pointers (xdp_md, __sk_buff, etc.) are handled through their
own KF_ARG_PTR_TO_CTX case label and bypass the trusted check
2. Struct_ops callback arguments are already marked as PTR_TRUSTED during
initialization and pass is_trusted_reg()
3. KF_RCU kfuncs are handled separately via is_kfunc_rcu() checks at
call sites (always checked with || alongside is_kfunc_trusted_args)
This simple change makes all kfuncs require trusted args by default
while maintaining correct behavior for all existing special cases.
Note: This change means kfuncs that previously accepted NULL pointers
without KF_TRUSTED_ARGS will now reject NULL at verification time.
Several netfilter kfuncs are affected: bpf_xdp_ct_lookup(),
bpf_skb_ct_lookup(), bpf_xdp_ct_alloc(), and bpf_skb_ct_alloc() all
accept NULL for their bpf_tuple and opts parameters internally (checked
in __bpf_nf_ct_lookup), but after this change the verifier rejects NULL
before the kfunc is even called. This is acceptable because these kfuncs
don't work with NULL parameters in their proper usage. Such programs will
now be rejected at load time rather than receiving a runtime error, which
shouldn't make a difference to BPF programs that were using these kfuncs
properly.
Ihor Solodrai [Wed, 31 Dec 2025 18:39:29 +0000 (10:39 -0800)]
scripts/gen-btf.sh: Reduce log verbosity
Remove info messages from gen-btf.sh, as they are unnecessarily
detailed and sometimes inaccurate [1]. A verbose log can be produced by
passing V=1 to make, which sets -x for the shell.
Ihor Solodrai [Wed, 31 Dec 2025 01:25:57 +0000 (17:25 -0800)]
resolve_btfids: Implement --patch_btfids
Recent changes in BTF generation [1] rely on the ${OBJCOPY} command to
update .BTF_ids section data in target ELF files.
This exposed a bug in the llvm-objcopy --update-section code path that
may lead to corruption of the target ELF file. Specifically, because of
the bug, st_shndx of some symbols may be (incorrectly) set to 0xffff
(SHN_XINDEX) [2][3].
While there is a pending fix for LLVM, it'll take some time before it
lands (likely in 22.x). And the kernel build must keep working with
older LLVM toolchains in the foreseeable future.
Using GNU objcopy for .BTF_ids update would work, but it would require
changes to LLVM-based build process, likely breaking existing build
environments as discussed in [2].
To work around the llvm-objcopy bug, implement a --patch_btfids code path in
resolve_btfids as a drop-in replacement for:
====================
bpf: unify state pruning handling of invalid/misc stack slots
This change unifies the state pruning handling of NOT_INIT registers and
STACK_INVALID/STACK_MISC stack slots between the regular and
iterator/callback-based loop cases.
The change results in a modest verifier performance improvement:
========= selftests: master vs loop-stack-misc-pruning =========
Total progs: 305
Old success: 292
New success: 292
total_insns diff min: -49.55%
total_insns diff max: 0.00%
0 -> value: 0
value -> 0: 0
total_insns abs max old: 257,545
total_insns abs max new: 243,678
-50 .. -45 %: 7
-30 .. -20 %: 5
-20 .. -10 %: 14
-10 .. 0 %: 18
0 .. 5 %: 261
There is also a significant verifier performance improvement for some
bpf_loop() heavy Meta internal programs (~ -40% processed instructions).
====================
Eduard Zingerman [Wed, 31 Dec 2025 05:36:04 +0000 (21:36 -0800)]
selftests/bpf: iterator based loop and STACK_MISC states pruning
The test case first initializes 9 stack slots as STACK_MISC,
then conditionally updates each of them to SCALAR spill inside an
iterator based loop. This leads to 2**9 combinations of MISC/SPILL
marks for these slots at the iterator next call.
The loop converges only if the verifier treats such states as
equivalent, otherwise visited states are evicted from the states cache
too quickly.
Eduard Zingerman [Wed, 31 Dec 2025 05:36:03 +0000 (21:36 -0800)]
bpf: allow states pruning for misc/invalid slots in iterator loops
Within an iterator or callback based loop, it should be safe to prune
the current state if the old state stack slot is marked as
STACK_INVALID or STACK_MISC:
- either all branches of the old state lead to a program exit;
- or some branch of the old state leads to the current state.
This is the same logic as applied in non-loop cases when
states_equal() is called in NOT_EXACT mode.
The test case that exercises stacksafe() and demonstrates the
difference in verification performance is included in the next patch.
I'm not sure if it is possible to prepare a test case that exercises
regsafe(); it appears that the compute_live_registers() pass makes
this impossible.
Nevertheless, for code readability reasons, I think that stacksafe()
and regsafe() should handle STACK_INVALID / NOT_INIT symmetrically.
Hence, this commit changes both functions.
====================
bpf: calls to bpf_loop() should have an SCC and accumulate backedges
This is a correctness fix for the verification of BPF programs that
work with callback-calling functions. The problem is the same as the
issue fixed by series [1] for iterator-based loops: some of the states
created while processing the callback function body might have
incomplete read or precision marks.
An example of an unsafe program that is accepted without this fix can
be found in patch #2.
Eduard Zingerman [Tue, 30 Dec 2025 07:13:08 +0000 (23:13 -0800)]
selftests/bpf: test cases for bpf_loop SCC and state graph backedges
Test for state graph backedges accumulation for SCCs formed by
bpf_loop(). Equivalent to the following C program:
int main(void) {
1:      fp[-8] = bpf_get_prandom_u32();
2:      fp[-16] = -32; // used in a memory access below
3:      bpf_loop(7, loop_cb4, fp, 0);
4:      return 0;
}

int loop_cb4(int i, void *ctx) {
5:      if (unlikely(ctx[-8] > bpf_get_prandom_u32()))
6:              *(u64 *)(fp + ctx[-16]) = 42; // aligned access expected
7:      if (unlikely(fp[-8] > bpf_get_prandom_u32()))
8:              ctx[-16] = -31; // makes said access unaligned
9:      return 0;
}
If state graph backedges are not accumulated properly at the SCC
formed by loop_cb4() call from bpf_loop(), the state {ctx[-16]=-32}
injected at instruction 9 on verification path 1,2,3,5,7,9,4 would be
considered fully verified and would lack precision mark for ctx[-16].
This would lead to early pruning of verification path 1,2,3,5,7,8,9 in
state {ctx[-16]=-31}, which in turn leads to the incorrect assumption
that the above program is safe.
Eduard Zingerman [Tue, 30 Dec 2025 07:13:07 +0000 (23:13 -0800)]
bpf: bpf_scc_visit instance and backedges accumulation for bpf_loop()
Calls like bpf_loop() or bpf_for_each_map_elem() introduce loops that
are not explicitly present in the control-flow graph. The verifier
processes such calls by repeatedly interpreting the callback function
body within the same verification path (until the current state
converges with a previous state).
Such loops require a bpf_scc_visit instance in order to allow the
accumulation of the state graph backedges. Otherwise, certain
checkpoint states created within the bodies of such loops will have
incomplete precision marks.
See the next patch for an example of an unsafe program that the
verifier would accept without this change.
Fixes: 96c6aa4c63af ("bpf: compute SCCs in program control flow graph")
Fixes: c9e31900b54c ("bpf: propagate read/precision marks over state graph backedges")
Reported-by: Breno Leitao <leitao@debian.org>
Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
Tested-by: Breno Leitao <leitao@debian.org>
Link: https://lore.kernel.org/r/20251229-scc-for-callbacks-v1-1-ceadfe679900@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Puranjay Mohan [Tue, 30 Dec 2025 19:51:32 +0000 (11:51 -0800)]
selftests/bpf: Fix verifier_arena_large/big_alloc3 test
The big_alloc3() test tries to allocate 2051 pages at once in
non-sleepable context and this can fail sporadically on resource
constrained systems, so skip this test in case of such failures.
Ihor Solodrai [Mon, 29 Dec 2025 20:28:23 +0000 (12:28 -0800)]
scripts/gen-btf.sh: Fix .btf.o generation when compiling for RISCV
gen-btf.sh emits a .btf.o file with BTF sections to be linked into
vmlinux in link-vmlinux.sh
This .btf.o file is created by compiling an empty string with ${CC},
and then adding BTF sections into it with ${OBJCOPY}.
To ensure the .btf.o is linkable when cross-compiling with LLVM, we
have to also pass ${KBUILD_FLAGS}, which in particular control the
target word size.
====================
Remove KF_SLEEPABLE from arena kfuncs
V7: https://lore.kernel.org/all/20251222190815.4112944-1-puranjay@kernel.org/
Changes in V7->v8:
- Use clear_lo32(arena->user_vm_start) in place of user_vm_start in patch 3
V6: https://lore.kernel.org/all/20251217184438.3557859-1-puranjay@kernel.org/
Changes in v6->v7:
- Fix a deadlock in patch 1, that was being fixed in patch 2. Move the fix to patch 1.
- Call flush_cache_vmap() after setting up the mappings as it is
required by some architectures.
V5: https://lore.kernel.org/all/20251212044516.37513-1-puranjay@kernel.org/
Changes in v5->v6:
Patch 1:
- Add a missing ; to make sure this patch builds individually. (AI)
V4: https://lore.kernel.org/all/20251212004350.6520-1-puranjay@kernel.org/
Changes in v4->v5:
Patch 1:
- Fix a memory leak in arena_alloc_pages(); it was being fixed in
Patch 3, but every patch should be complete in itself. (AI)
Patch 3:
- Don't do useless addition in arena_alloc_pages() (Alexei)
- Add a comment about kmalloc_nolock() failure and expectations.
v3: https://lore.kernel.org/all/20251117160150.62183-1-puranjay@kernel.org/
Changes in v3->v4:
- Coding style changes related to comments in Patch 2/3 (Alexei)
v2: https://lore.kernel.org/all/20251114111700.43292-1-puranjay@kernel.org/
Changes in v2->v3:
Patch 1:
- Call range_tree_destroy() in error path of
populate_pgtable_except_pte() in arena_map_alloc() (AI)
Patch 2:
- Fix double mutex_unlock() in the error path of
arena_alloc_pages() (AI)
- Fix coding style issues (Alexei)
Patch 3:
- Unlock spinlock before returning from arena_vm_fault() in case
BPF_F_SEGV_ON_FAULT is set by user. (AI)
- Use __llist_del_all() in place of llist_del_all for on-stack
llist (free_pages) (Alexei)
- Fix build issues on 32-bit systems where arena.c is not compiled.
(kernel test robot)
- Make bpf_arena_alloc_pages() polymorphic so it knows if it has
been called in sleepable or non-sleepable context. This
information is passed to arena_free_pages() in the error path.
Patch 4:
- Add a better comment for the big_alloc3() test that triggers
kmalloc_nolock()'s limit and if bpf_arena_alloc_pages() works
correctly above this limit.
v1: https://lore.kernel.org/all/20251111163424.16471-1-puranjay@kernel.org/
Changes in v1->v2:
Patch 1:
- Import tlbflush.h to fix build issue in loongarch. (kernel
test robot)
- Fix unused variable error in apply_range_clear_cb() (kernel
test robot)
- Call bpf_map_area_free() on error path of
populate_pgtable_except_pte() (AI)
- Use PAGE_SIZE in apply_to_existing_page_range() (AI)
Patch 2:
- Cap allocation made by kmalloc_nolock() for pages array to
KMALLOC_MAX_CACHE_SIZE and reuse the array in an explicit loop
to overcome this limit. (AI)
Patch 3:
- Do page_ref_add(page, 1); under the spinlock to mitigate a
race (AI)
Patch 4:
- Add a new testcase big_alloc3() verifier_arena_large.c that
tries to allocate a large number of pages at once, this is to
trigger the kmalloc_nolock() limit in Patch 2 and see if the
loop logic works correctly.
This set allows arena kfuncs to be called from non-sleepable contexts.
It is achieved by the following changes:
The range_tree is now protected with a rqspinlock and not a mutex;
this change alone is enough to make bpf_arena_reserve_pages() any-context
safe.
bpf_arena_alloc_pages() had four points where it could sleep:
1. Mutex to protect range_tree: now replaced with rqspinlock
2. kvcalloc() for allocations: now replaced with kmalloc_nolock()
3. Allocating pages with bpf_map_alloc_pages(): this already calls
alloc_pages_nolock() in non-sleepable contexts and therefore is safe.
4. Setting up kernel page tables with vm_area_map_pages():
vm_area_map_pages() may allocate memory while inserting pages into the
bpf arena's vm_area. Now, at arena creation time, all page table levels
except the last are populated; when new pages need to be inserted,
apply_to_page_range() is called again, which only does set_pte_at() for
those pages and does not allocate memory.
The above four changes make bpf_arena_alloc_pages() any context safe.
bpf_arena_free_pages() has to do the following steps:
1. Update the range_tree
2. vm_area_unmap_pages(): to unmap pages from kernel vm_area
3. flush the tlb: done in step 2, already.
4. zap_pages(): to unmap pages from user page tables
5. free pages.
The third patch in this set makes bpf_arena_free_pages() polymorphic using
the specialize_kfunc() mechanism. When called from a sleepable context,
arena_free_pages() remains mostly unchanged except the following:
1. rqspinlock is taken now instead of the mutex for the range tree
2. Instead of using vm_area_unmap_pages() that can free intermediate page
table levels, apply_to_existing_page_range() with a callback is used
that only does pte_clear() on the last level and leaves the intermediate
page table levels intact. This is needed to make sure that
bpf_arena_alloc_pages() can safely do set_pte_at() without allocating
intermediate page tables.
When arena_free_pages() is called from a non-sleepable context, or it fails to
acquire the rqspinlock in the sleepable case, a lock-less list of struct
arena_free_span is used to queue the uaddr and page cnt. kmalloc_nolock()
is used to allocate the arena_free_span; this can fail, but we need to make
this trade-off for frees done from non-sleepable contexts.
arena_free_pages() then raises an irq_work whose handler in turn schedules
work that iterates this list and clears PTEs, flushes TLBs, zaps pages, and
frees pages for the queued uaddrs and page cnts.
apply_range_clear_cb() with apply_to_existing_page_range() is used to
clear PTEs and collect the pages to be freed; the struct llist_node pcp_llist
member in struct page is used to do this.
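The deferral itself follows a common kernel pattern (lock-less list plus irq_work plus workqueue). A heavily simplified sketch of that pattern is shown below; the names and the kmalloc_nolock() signature are assumptions for illustration, not the actual arena.c code:
    struct arena_free_span {
            struct llist_node node;
            unsigned long uaddr;
            u32 page_cnt;
    };

    static LLIST_HEAD(free_spans);
    static void arena_free_worker(struct work_struct *w);
    static DECLARE_WORK(free_work, arena_free_worker);

    static void free_irq_workfn(struct irq_work *w)
    {
            schedule_work(&free_work);      /* safe from hard-irq context */
    }
    static DEFINE_IRQ_WORK(free_irq_work, free_irq_workfn);

    /* Non-sleepable (or lock-contended) free path: queue and defer. */
    static void defer_free(unsigned long uaddr, u32 page_cnt)
    {
            struct arena_free_span *s;

            s = kmalloc_nolock(sizeof(*s), 0, NUMA_NO_NODE);  /* may fail */
            if (!s)
                    return;
            s->uaddr = uaddr;
            s->page_cnt = page_cnt;
            llist_add(&s->node, &free_spans);
            irq_work_queue(&free_irq_work);
    }

    static void arena_free_worker(struct work_struct *w)
    {
            struct arena_free_span *s, *tmp;

            llist_for_each_entry_safe(s, tmp, llist_del_all(&free_spans), node) {
                    /* clear PTEs, flush TLB, zap user mappings, free pages */
                    kfree(s);
            }
    }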
====================
Puranjay Mohan [Mon, 22 Dec 2025 19:50:19 +0000 (11:50 -0800)]
selftests: bpf: test non-sleepable arena allocations
As arena kfuncs can now be called from non-sleepable contexts, test this
by adding non-sleepable copies of the tests in verifier_arena; this is done
by using a socket program instead of a syscall program.
Add a new test case in verifier_arena_large to check that the
bpf_arena_alloc_pages() works for more than 1024 pages.
1024 * sizeof(struct page *) is the upper limit of kmalloc_nolock() but
bpf_arena_alloc_pages() should still succeed because it re-uses this
array in a loop.
Augment the arena_list selftest to also run in non-sleepable context by
taking rcu_read_lock.
Puranjay Mohan [Mon, 22 Dec 2025 19:50:18 +0000 (11:50 -0800)]
bpf: arena: make arena kfuncs any context safe
Make arena related kfuncs any context safe by the following changes:
bpf_arena_alloc_pages() and bpf_arena_reserve_pages():
Replace the usage of the mutex with a rqspinlock for range tree and use
kmalloc_nolock() wherever needed. Use free_pages_nolock() to free pages
from any context.
apply_range_set/clear_cb() with apply_to_page_range() has already made
populating the vm_area in bpf_arena_alloc_pages() any context safe.
bpf_arena_free_pages(): defer the main logic to a workqueue if it is
called from a non-sleepable context.
specialize_kfunc() is used to replace the sleepable arena_free_pages()
with bpf_arena_free_pages_non_sleepable() when the verifier detects the
call is from a non-sleepable context.
In the non-sleepable case, arena_free_pages() queues the address and the
page count to be freed to a lock-less list of struct arena_free_span
and raises an irq_work. The irq_work handler calls schedule_work() as
it is safe to be called from irq context. arena_free_worker() (the work
queue handler) iterates these spans and clears PTEs, flushes the TLB, zaps
pages, and calls __free_page().
Puranjay Mohan [Mon, 22 Dec 2025 19:50:17 +0000 (11:50 -0800)]
bpf: arena: use kmalloc_nolock() in place of kvcalloc()
To make arena_alloc_pages() safe to be called from any context, replace
kvcalloc() with kmalloc_nolock() so that it doesn't sleep or take any
locks. kmalloc_nolock() returns NULL for allocations larger than
KMALLOC_MAX_CACHE_SIZE, which is (PAGE_SIZE * 2) = 8KB on systems with
4KB pages. So, round the allocation done by kmalloc_nolock() down to
1024 * 8 bytes and reuse the array in a loop.
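A minimal sketch of the chunking idea follows; it is not the actual arena.c code, the helper name is made up, and the kmalloc_nolock() signature used here is an assumption:
    /* Keep the pointer array within what kmalloc_nolock() can serve and
     * reuse it across iterations instead of allocating one huge array.
     */
    static long alloc_in_chunks(long page_cnt)
    {
            long chunk = min_t(long, page_cnt,
                               KMALLOC_MAX_CACHE_SIZE / sizeof(struct page *));
            struct page **pages;
            long done = 0;

            pages = kmalloc_nolock(chunk * sizeof(*pages), 0, NUMA_NO_NODE);
            if (!pages)
                    return -ENOMEM;
            while (done < page_cnt) {
                    long nr = min(chunk, page_cnt - done);

                    /* allocate and map 'nr' pages, reusing the same array */
                    done += nr;
            }
            /* the pages array is freed with the matching nolock-safe free */
            return done;
    }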
Puranjay Mohan [Mon, 22 Dec 2025 19:50:16 +0000 (11:50 -0800)]
bpf: arena: populate vm_area without allocating memory
vm_area_map_pages() may allocate memory while inserting pages into the bpf
arena's vm_area. In order to make the bpf_arena_alloc_pages() kfunc safe to
call from non-sleepable contexts, change bpf arena to populate pages without
allocating memory:
- at arena creation time populate all page table levels except
the last level
- when new pages need to be inserted call apply_to_page_range() again
with apply_range_set_cb() which will only set_pte_at() those pages and
will not allocate memory.
- when freeing pages call apply_to_existing_page_range() with
apply_range_clear_cb() to clear the pte for the page to be removed. This
doesn't free intermediate page table levels.
Daniel Gomez [Sat, 20 Dec 2025 03:48:09 +0000 (04:48 +0100)]
bpf: crypto: replace -EEXIST with -EBUSY
The -EEXIST error code is reserved by the module loading infrastructure
to indicate that a module is already loaded. When a module's init
function returns -EEXIST, userspace tools like kmod interpret this as
"module already loaded" and treat the operation as successful, returning
0 to the user even though the module initialization actually failed.
This follows the precedent set by commit 54416fd76770 ("netfilter:
conntrack: helper: Replace -EEXIST by -EBUSY") which fixed the same
issue in nf_conntrack_helper_register().
This affects bpf_crypto_skcipher module. While the configuration
required to build it as a module is unlikely in practice, it is
technically possible, so fix it for correctness.
====================
mm: bpf kfuncs to access memcg data
Introduce kfuncs to simplify the access to the memcg data.
These kfuncs can be used to accelerate monitoring use cases and
for implementing custom OOM policies once BPF OOM is landed.
This patchset was separated out from the BPF OOM patchset to simplify
the logistics and accelerate the landing of the part which is useful
by itself. No functional changes since BPF OOM v2.
v4:
- refactored memcg vm event and stat item idx checks (by Alexei)
v3:
- dropped redundant kfuncs flags (by Alexei)
- fixed kdocs warnings (by Alexei)
- merged memcg stats access patches into one (by Alexei)
- restored root memcg usage reporting, added a comment
- added checks for enum boundaries
- added Shakeel and JP as co-maintainers (by Shakeel)
v2:
- added mem_cgroup_disabled() checks (by Shakeel B.)
- added special handling of the root memcg in bpf_mem_cgroup_usage()
(by Shakeel B.)
- minor fixes in the kselftest (by Shakeel B.)
- added a MAINTAINERS entry (by Shakeel B.)
JP Kobryn [Tue, 23 Dec 2025 04:41:55 +0000 (20:41 -0800)]
bpf: selftests: selftests for memcg stat kfuncs
Add test coverage for the kfuncs that fetch memcg stats. Using some common
stats, test scenarios ensuring that the given stat increases by some
arbitrary amount. The stats selected cover the three categories represented
by the enums: node_stat_item, memcg_stat_item, vm_event_item.
Since only a subset of all stats are queried, use a static struct made up
of fields for each stat. Write to the struct with the fetched values when
the bpf program is invoked and read the fields in the user mode program for
verification.
These functions are useful for implementing BPF OOM policies, but
can also be used to accelerate access to the memcg data. Reading
it through cgroupfs is much more expensive, roughly 5x, mostly
because of the need to convert the data to text and back.
JP Kobryn:
An experiment was setup to compare the performance of a program that
uses the traditional method of reading memory.stat vs a program using
the new kfuncs. The control program opens up the root memory.stat file
and for 1M iterations reads, converts the string values to numeric data,
then seeks back to the beginning. The experimental program sets up the
requisite libbpf objects and for 1M iterations invokes a bpf program
which uses the kfuncs to fetch all available stats for node_stat_item,
memcg_stat_item, and vm_event_item types.
The results showed a significant perf benefit on the experimental side,
outperforming the control side by a margin of 93%. In kernel mode,
elapsed time was reduced by 80%, while in user mode, over 99% of time
was saved.
control: elapsed time
real 0m38.318s
user 0m25.131s
sys 0m13.070s
experiment: elapsed time
real 0m2.789s
user 0m0.187s
sys 0m2.512s
Roman Gushchin [Tue, 23 Dec 2025 04:41:53 +0000 (20:41 -0800)]
mm: introduce bpf_get_root_mem_cgroup() BPF kfunc
Introduce a BPF kfunc to get a trusted pointer to the root memory
cgroup. It's very handy to traverse the full memcg tree, e.g.
for handling a system-wide OOM.
It's possible to obtain this pointer by traversing the memcg tree
up from any known memcg, but it's sub-optimal and makes BPF programs
more complex and less efficient.
bpf_get_root_mem_cgroup() has KF_ACQUIRE | KF_RET_NULL semantics;
however, in reality it's not necessary to bump the corresponding
reference counter - the root memory cgroup is immortal and reference
counting is skipped, see css_get(). Once set, root_mem_cgroup is always a
valid memcg pointer. It's safe to call bpf_put_mem_cgroup() for the pointer
obtained with bpf_get_root_mem_cgroup(); it's effectively a no-op.
Roman Gushchin [Tue, 23 Dec 2025 04:41:52 +0000 (20:41 -0800)]
mm: introduce BPF kfuncs to deal with memcg pointers
To effectively operate with memory cgroups in BPF there is a need
to convert css pointers to memcg pointers. A simple container_of()
cast, which is used in kernel code, can't be used in BPF because
from the verifier's point of view that's an out-of-bounds memory access.
Introduce helper get/put kfuncs which can be used to get
a refcounted memcg pointer from the css pointer:
- bpf_get_mem_cgroup,
- bpf_put_mem_cgroup.
bpf_get_mem_cgroup() can take both a memcg's css and the corresponding
cgroup's "self" css. This allows it to be used with the existing cgroup
iterator, which iterates over the cgroup tree, not the memcg tree.
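A hedged sketch of how a BPF program could pair the two kfuncs with a cgroup iterator is shown below; the __ksym prototypes, program section, and field accesses are written from the description above and should be treated as assumptions:
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>

    /* Assumed kfunc prototypes, declared as kernel symbols in the program. */
    extern struct mem_cgroup *
    bpf_get_mem_cgroup(struct cgroup_subsys_state *css) __ksym;
    extern void bpf_put_mem_cgroup(struct mem_cgroup *memcg) __ksym;

    SEC("iter/cgroup")
    int dump_memcg(struct bpf_iter__cgroup *ctx)
    {
            struct cgroup *cgrp = ctx->cgroup;
            struct mem_cgroup *memcg;

            if (!cgrp)
                    return 0;
            /* the cgroup's "self" css is accepted, per the description above */
            memcg = bpf_get_mem_cgroup(&cgrp->self);
            if (!memcg)
                    return 0;
            /* ... read memcg data here ... */
            bpf_put_mem_cgroup(memcg);
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";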
Puranjay Mohan [Fri, 19 Dec 2025 19:13:08 +0000 (11:13 -0800)]
bpf: arm64: Fix sparse warnings
ctx->image is declared as __le32 because arm64 instructions are LE
regardless of the CPU's runtime endianness. emit_u32_data() emits raw data
and not instructions, so cast the value to __le32 to fix the sparse
warning.
Cast the function pointer to void * before doing arithmetic.
Matt Bobrowski [Tue, 16 Dec 2025 13:30:00 +0000 (13:30 +0000)]
selftests/bpf: add test case for BPF LSM hook bpf_lsm_mmap_file
Add a trivial test case asserting that the BPF verifier enforces
PTR_MAYBE_NULL semantics on the struct file pointer argument of BPF
LSM hook bpf_lsm_mmap_file().
Dereferencing the struct file pointer passed into bpf_lsm_mmap_file()
without explicitly performing a NULL check first should not be
permitted by the BPF verifier as it can lead to NULL pointer
dereferences and a kernel crash.
Matt Bobrowski [Tue, 16 Dec 2025 13:29:59 +0000 (13:29 +0000)]
bpf: annotate file argument as __nullable in bpf_lsm_mmap_file
As reported in [0], anonymous memory mappings are not backed by a
struct file instance. Consequently, the struct file pointer passed to
the security_mmap_file() LSM hook is NULL in such cases.
The BPF verifier is currently unaware of this, allowing BPF LSM
programs to dereference this struct file pointer without needing to
perform an explicit NULL check. This leads to potential NULL pointer
dereference and a kernel crash.
Add a strong override for bpf_lsm_mmap_file() which annotates the
struct file pointer parameter with the __nullable suffix. This
explicitly informs the BPF verifier that this pointer (PTR_MAYBE_NULL)
can be NULL, forcing BPF LSM programs to perform a check on it before
dereferencing it.
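On the program side, the practical effect is that BPF LSM programs attached to this hook must now check the pointer before use; a minimal sketch (using the usual BPF_PROG macro from bpf_tracing.h; the program name is made up):
    #include "vmlinux.h"
    #include <bpf/bpf_helpers.h>
    #include <bpf/bpf_tracing.h>

    SEC("lsm/mmap_file")
    int BPF_PROG(check_mmap_file, struct file *file, unsigned long reqprot,
                 unsigned long prot, unsigned long flags)
    {
            if (!file)      /* anonymous mapping: no backing struct file */
                    return 0;
            /* file is known to be non-NULL here and may be dereferenced */
            return 0;
    }

    char LICENSE[] SEC("license") = "GPL";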
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn>
Closes: https://lore.kernel.org/bpf/5e460d3c.4c3e9.19adde547d8.Coremail.kaiyanm@hust.edu.cn/
Signed-off-by: Matt Bobrowski <mattbobrowski@google.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20251216133000.3690723-1-mattbobrowski@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Uros Bizjak [Mon, 8 Dec 2025 16:33:34 +0000 (17:33 +0100)]
x86/bpf: Avoid emitting LOCK prefix for XCHG atomic ops
The x86 XCHG instruction is implicitly locked when one of the
operands is a memory location, making an explicit LOCK prefix
unnecessary.
Stop emitting the LOCK prefix for BPF_XCHG in the JIT atomic
read-modify-write helpers. This avoids redundant instruction
prefixes while preserving correct atomic semantics.
====================
bpf: Optimize recursion detection on arm64
V2: https://lore.kernel.org/all/20251217233608.2374187-1-puranjay@kernel.org/
Changes in v2->v3:
- Added acked by Yonghong
- Patch 2:
- Change alignment of active from 8 to 4
- Use le32_to_cpu in place of get_unaligned_le32()
V1: https://lore.kernel.org/all/20251217162830.2597286-1-puranjay@kernel.org/
Changes in V1->V2:
- Patch 2:
- Put preempt_enable()/disable() around RMW accesses to mitigate
race conditions, because with CONFIG_PREEMPT_RCU and sleepable
bpf programs, preemption can cause no bpf prog to execute in
case of recursion.
BPF programs detect recursion using a per-CPU 'active' flag in struct
bpf_prog. The trampoline currently sets/clears this flag with atomic
operations.
On some arm64 platforms (e.g., Neoverse V2 with LSE), per-CPU atomic
operations are relatively slow. Unlike x86_64, where per-CPU updates
can avoid cross-core atomicity, arm64 LSE atomics are always atomic
across all cores, which is unnecessary overhead for strictly per-CPU
state.
This patch removes atomics from the recursion detection path on arm64.
It was discovered in [1] that per-CPU atomics that don't return a value
were extremely slow on some arm64 platforms; Catalin added a fix in
commit 535fdfc5a228 ("arm64: Use load LSE atomics for the non-return
per-CPU atomic operations") to solve this issue, but it seems to have
caused a regression on the fentry benchmark.
Using the fentry benchmark from the bpf selftests shows the following:
./tools/testing/selftests/bpf/bench trig-fentry
+---------------------------------------------+------------------------+
| Configuration | Total Operations (M/s) |
+---------------------------------------------+------------------------+
| bpf-next/master with Catalin’s fix reverted | 51.770 |
|---------------------------------------------|------------------------|
| bpf-next/master | 43.271 |
| bpf-next/master with this change | 43.271 |
+---------------------------------------------+------------------------+
All benchmarks were run on a KVM based vm with Neoverse-V2 and 8 cpus.
This patch yields a 25% improvement in this benchmark compared to
bpf-next. Notably, reverting Catalin's fix also results in a performance
gain for this benchmark, which is interesting but expected.
For completeness, this benchmark was also run with the change enabled on
x86-64, which resulted in a 30% regression in the fentry benchmark. So,
it is only enabled on arm64.
P.S. - Here is more data with other program types:
Puranjay Mohan [Fri, 19 Dec 2025 18:44:18 +0000 (10:44 -0800)]
bpf: arm64: Optimize recursion detection by not using atomics
BPF programs detect recursion using a per-CPU 'active' flag in struct
bpf_prog. The trampoline currently sets/clears this flag with atomic
operations.
On some arm64 platforms (e.g., Neoverse V2 with LSE), per-CPU atomic
operations are relatively slow. Unlike x86_64, where per-CPU updates
can avoid cross-core atomicity, arm64 LSE atomics are always atomic
across all cores, which is unnecessary overhead for strictly per-CPU
state.
This patch removes atomics from the recursion detection path on arm64 by
changing 'active' to a per-CPU array of four u8 counters, one per
context: {NMI, hard-irq, soft-irq, normal}. The running context uses a
non-atomic increment/decrement on its element. After increment,
recursion is detected by reading the array as a u32 and verifying that
only the expected element changed; any change in another element
indicates inter-context recursion, and a value > 1 in the same element
indicates same-context recursion.
For example, starting from {0,0,0,0}, a normal-context trigger changes
the array to {0,0,0,1}. If an NMI arrives on the same CPU and triggers
the program, the array becomes {1,0,0,1}. When the NMI context checks
the u32 against the expected mask for normal (0x00000001), it observes
0x01000001 and correctly reports recursion. Same-context recursion is
detected analogously.
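A hedged sketch of the scheme described above; the field names, the context-to-index mapping, and the endianness handling are illustrative only, not the actual kernel code:
    struct bpf_active {
            u8 cnt[4];      /* one counter per context, e.g. [3] = normal */
    };

    /* Non-atomic: only the current context on this CPU touches its slot. */
    static bool bpf_prog_enter_recursion_check(struct bpf_active *a, int ctx_idx)
    {
            u32 expected = 1u << (ctx_idx * 8);    /* only our byte == 1 */
            u32 snapshot;

            a->cnt[ctx_idx]++;
            memcpy(&snapshot, a->cnt, sizeof(snapshot));
            if (snapshot != expected) {
                    /* another context's byte is non-zero, or our byte > 1 */
                    a->cnt[ctx_idx]--;
                    return false;   /* recursion detected */
            }
            return true;
    }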
Puranjay Mohan [Fri, 19 Dec 2025 18:44:17 +0000 (10:44 -0800)]
bpf: move recursion detection logic to helpers
BPF programs detect recursion by doing atomic inc/dec on a per-cpu
active counter from the trampoline. Create two helpers for operations on
this active counter; this makes it easy to change the recursion
detection logic in the future.
====================
resolve_btfids: Support for BTF modifications
This series changes resolve_btfids and kernel build scripts to enable
BTF transformations in resolve_btfids. The main motivation for enhancing
resolve_btfids is to reduce the dependency of the kernel build on pahole
capabilities [1] and enable BTF features and optimizations [2][3]
particular to the kernel.
Patches #1-#4 in the series are non-functional changes in
resolve_btfids.
Patch #5 makes kernel build notice pahole version changes between
builds.
Patch #6 changes minimum version of pahole required for
CONFIG_DEBUG_INFO_BTF to v1.22
Patch #7 makes a small prep change in selftests/bpf build.
The last patch (#8) makes significant changes in resolve_btfids and
introduces scripts/gen-btf.sh. See implementation details in the patch
description.
Successful BPF CI run: https://github.com/kernel-patches/bpf/actions/runs/20378061470
v2->v3:
- add patch #4 bumping minimum pahole version (Andrii, Alan)
- add patch #5 pre-fixing resolve_btfids test (Donglin)
- add GEN_BTF var and assemble RESOLVE_BTFIDS_FLAGS in Makefile.btf (Alan)
- implement --distill_base flag in resolve_btfids, set it depending
on KBUILD_EXTMOD in Makefile.btf (Eduard)
- various implementation nits, see the v2 thread for details (Andrii, Eduard)
v1->v2:
- gen-btf.sh and other shell script fixes (Donglin)
- update selftests build (Donglin)
- generate .BTF.base only when KBUILD_EXTMOD is set (Alan)
- proper endianness handling for cross-compilation
- change elf_begin mode from ELF_C_RDWR_MMAP to ELF_C_READ_MMAP_PRIVATE
- remove compressed_section_fix()
- nit NULL check in patch #3 (suggested by AI)
Ihor Solodrai [Fri, 19 Dec 2025 18:18:25 +0000 (10:18 -0800)]
resolve_btfids: Change in-place update with raw binary output
Currently resolve_btfids updates .BTF_ids section of an ELF file
in-place, based on the contents of provided BTF, usually within the
same input file, and optionally a BTF base.
Change resolve_btfids behavior to enable BTF transformations as part
of its main operation. To achieve this, in-place ELF write in
resolve_btfids is replaced with generation of the following binaries:
* ${1}.BTF with .BTF section data
* ${1}.BTF_ids with .BTF_ids section data if it existed in ${1}
* ${1}.BTF.base with .BTF.base section data for out-of-tree modules
The execution of resolve_btfids and consumption of its output is
orchestrated by scripts/gen-btf.sh introduced in this patch.
The motivation for emitting binary data is that it allows simplifying
resolve_btfids implementation by delegating ELF update to the $OBJCOPY
tool [1], which is already widely used across the codebase.
There are two distinct paths for BTF generation and resolve_btfids
application in the kernel build: for vmlinux and for kernel modules.
For the vmlinux binary a .BTF section is added in a roundabout way to
ensure correct linking. The patch doesn't change this approach, only
the implementation is a little different.
Before this patch it worked as follows:
* pahole consumed .tmp_vmlinux1 [2] and added .BTF section with
llvm-objcopy [3] to it
* then everything except the .BTF section was stripped from .tmp_vmlinux1
into a .tmp_vmlinux1.bpf.o object [2], later linked into vmlinux
* resolve_btfids was executed later on vmlinux.unstripped [4],
updating it in-place
After this patch gen-btf.sh implements the following:
* pahole consumes .tmp_vmlinux1 and produces a *detached* file with
raw BTF data
* resolve_btfids consumes .tmp_vmlinux1 and detached BTF to produce
(potentially modified) .BTF, and .BTF_ids sections data
* a .tmp_vmlinux1.bpf.o object is then produced with objcopy copying
BTF output of resolve_btfids
* .BTF_ids data gets embedded into vmlinux.unstripped in
link-vmlinux.sh by objcopy --update-section
For kernel modules, creating a special .bpf.o file is not necessary,
and so embedding of sections data produced by resolve_btfids is
straightforward with objcopy.
With this patch an ELF file becomes effectively read-only within
resolve_btfids, which allows deleting elf_update() call and satellite
code (like compressed_section_fix [5]).
Endianness handling of .BTF_ids data is also changed. Previously the
"flags" part of the section was bswapped in sets_patch() [6], and then
Elf_Type was modified before elf_update() to signal to libelf that
bswap may be necessary. With this patch we explicitly bswap entire
data buffer on load and on dump.
Ihor Solodrai [Fri, 19 Dec 2025 18:18:24 +0000 (10:18 -0800)]
selftests/bpf: Run resolve_btfids only for relevant .test.o objects
A selftest targeting resolve_btfids functionality relies on a resolved
.BTF_ids section to be available in the TRUNNER_BINARY. The underlying
BTF data is taken from a special BPF program (btf_data.c), and so
resolve_btfids is executed as a part of a TRUNNER_BINARY build recipe
on the final binary.
Subsequent patches in this series allow resolve_btfids to modify BTF
before resolving the symbols, which means that the test needs access
to that modified BTF [1]. Currently the test simply reads in
btf_data.bpf.o on the assumption that BTF hasn't changed.
Implement resolve_btfids call only for particular test objects (just
resolve_btfids.test.o for now). The test objects are linked into the
TRUNNER_BINARY, and so .BTF_ids section will be available there.
This will make it trivial for the resolve_btfids test to access BTF
modified by resolve_btfids.
Ihor Solodrai [Fri, 19 Dec 2025 18:18:23 +0000 (10:18 -0800)]
lib/Kconfig.debug: Set the minimum required pahole version to v1.22
Subsequent patches in the series change vmlinux linking scripts to
unconditionally pass --btf_encode_detached to pahole, which was
introduced in v1.22 [1][2].
This change allows removing the PAHOLE_HAS_SPLIT_BTF Kconfig option and
other checks for older pahole versions.
Ihor Solodrai [Fri, 19 Dec 2025 18:13:18 +0000 (10:13 -0800)]
kbuild: Sync kconfig when PAHOLE_VERSION changes
This patch implements kconfig re-sync when the pahole version changes
between builds, similar to how it happens for compiler version change
via CC_VERSION_TEXT.
Define PAHOLE_VERSION in the top-level Makefile and export it for
config builds. Set CONFIG_PAHOLE_VERSION default to the exported
variable.
Kconfig records the PAHOLE_VERSION value in
include/config/auto.conf.cmd [1].
The Makefile includes auto.conf.cmd, so if PAHOLE_VERSION changes
between builds, make detects a dependency change and triggers
syncconfig to update the kconfig [2].
For external module builds, add a warning message in the prepare
target, similar to the existing compiler version mismatch warning.
Note that if pahole is not installed or available, PAHOLE_VERSION is
set to 0 by pahole-version.sh, so the (un)installation of pahole is
treated as a version change.
Ihor Solodrai [Fri, 19 Dec 2025 18:13:15 +0000 (10:13 -0800)]
resolve_btfids: Factor out load_btf()
Increase the lifetime of parsed BTF in resolve_btfids by factoring
load_btf() routine out of symbols_resolve() and storing the base_btf
and btf pointers in the struct object.
- Fix verifier assumptions of bpf_d_path's output buffer (Shuran Liu)
- Fix warnings in libbpf when built with -Wdiscarded-qualifiers under
C23 (Mikhail Gavrilov)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: add regression test for bpf_d_path()
bpf: Fix verifier assumptions of bpf_d_path's output buffer
selftests/bpf: Add test for truncated dmabuf_iter reads
bpf: Fix truncated dmabuf iterator reads
x86/unwind/orc: Support reliable unwinding through BPF stack frames
bpf: Add bpf_has_frame_pointer()
bpf, arm64: Do not audit capability check in do_jit()
libbpf: Fix -Wdiscarded-qualifiers under C23
bpftool: Fix build warnings due to MS extensions
net: smc: SMC_HS_CTRL_BPF should depend on BPF_JIT
selftests/bpf: Add -fms-extensions to bpf build flags
Linus Torvalds [Wed, 17 Dec 2025 03:48:30 +0000 (15:48 +1200)]
Merge tag 's390-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- clear 'Search boot program' flag when 'bootprog' sysfs file is
written to override a value set from Hardware Management Console
- fix cyclic dead-lock in zpci_zdev_put() and zpci_scan_devices()
functions when triggering PCI device recovery using sysfs
- annotate the expected lock context imbalance in zpci_release_device()
function to fix a sparse complaint
- fix the logic to fallback to the return address register value in the
topmost frame when stack tracing uses a back chain
* tag 's390-6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/stacktrace: Do not fallback to RA register
s390/pci: Annotate lock context imbalance in zpci_release_device()
s390/pci: Fix cyclic dead-lock in zpci_zdev_put() and zpci_scan_devices()
s390/ipl: Clear SBP flag when bootprog is set
====================
libbpf: move arena variables out of the zero page
Modify libbpf to place arena globals at the end of the arena mapping
instead of the very beginning. This allows programs to leave the
"zero page" of the arena unmapped, so that NULL arena pointer
dereferences trigger a page fault and associated backtrace in BPF streams.
In contrast, the current policy of placing global data in the zero page
means that NULL dereferences silently corrupt global data, e.g., arena
qspinlock state. This makes arena bugs more difficult to debug.
The patchset adds code to libbpf to move global arena data to the end of
the arena. At load time, libbpf adjusts each symbol's location within
the arena to point to the right location in the arena. The patchset
also adjusts the arena skeleton pointer to point to the arena globals,
now that they are not in the beginning of the arena region.
- Added Acks by Eduard
- Changed jumptable sym_off to unsigned int for consistency (AI)
- Adjusted selftests to ensure arena globals are actually mapped in (Eduard)
- (Patch 2) Adjusted selftests that were failing because they were expecting the
now removed "direct map offset" error message
- Remove unnecessary kernel bounds check in resolve_pseudo_ldimm64
(Andrii)
- Added patch to turn sym_off unsigned to prevent overflow (AI)
- Remove obsolete references to offsets from test patch description
(Andrii)
- Use size_t for arena_data_off (Andrii)
- Remove extra mutable variable from offset calculations (Andrii)
- Moved globals to the end of the mapping: (Andrii)
- Removed extra parameter for offset and parameter picking logic
- Removed padding in the skeleton
- Removed additional libbpf call
- Added Reviewed-by from Eduard on patch 1
Signed-off-by: Emil Tsalapatis <emil@etsalapatis.com>
====================
Emil Tsalapatis [Tue, 16 Dec 2025 17:33:25 +0000 (12:33 -0500)]
selftests/bpf: Add tests for the arena offset of globals
Add tests for the new libbpf globals arena offset logic. The
tests cover the cases of the globals being as large as the arena
itself and of them being smaller than the arena. In the latter case,
the data is placed at the end of the arena, and the beginning
of the arena is free.
Emil Tsalapatis [Tue, 16 Dec 2025 17:33:24 +0000 (12:33 -0500)]
libbpf: Move arena globals to the end of the arena
Arena globals are currently placed at the beginning of the arena
by libbpf. This is convenient, but prevents users from reserving
guard pages in the beginning of the arena to identify NULL pointer
dereferences. Adjust the load logic to place the globals at the
end of the arena instead.
Also modify bpftool to set the arena pointer in the program's BPF
skeleton to point to the globals. Users now call bpf_map__initial_value()
to find the beginning of the arena mapping and use the arena pointer
in the skeleton to determine which part of the mapping holds the
arena globals and which part is free.
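In user space, the resulting split between the start of the arena mapping and the start of the globals could then be queried roughly as in the fragment below; the map and skeleton field names ("arena") are assumptions that depend on how the map is named in the program:
    size_t arena_sz;
    void *arena_base = bpf_map__initial_value(skel->maps.arena, &arena_sz);
    void *globals = skel->arena;    /* set by the generated skeleton */

    /* [arena_base, globals) is free for the program's own use;
     * [globals, arena_base + arena_sz) holds the arena global variables.
     */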
Emil Tsalapatis [Tue, 16 Dec 2025 17:33:23 +0000 (12:33 -0500)]
libbpf: Turn relo_core->sym_off unsigned
The symbols' relocation offsets in BPF are stored in an int field,
but cannot actually be negative. When, in the next patch, libbpf relocates
globals to the end of the arena, it is also possible to have valid
offsets > 2GiB that are used to calculate the final relo offsets.
Avoid accidentally interpreting large offsets as negative by turning
the sym_off field unsigned.
Emil Tsalapatis [Tue, 16 Dec 2025 17:33:22 +0000 (12:33 -0500)]
bpf/verifier: Do not limit maximum direct offset into arena map
The verifier currently limits direct offsets into a map to 512MiB
to avoid overflow during pointer arithmetic. However, this prevents
arena maps from using direct addressing instructions to access data
at the end of > 512MiB arena maps. This is necessary when moving
arena globals to the end of the arena instead of the front.
Refactor the verifier code to remove the offset calculation during
direct value access calculations. This is possible because the only
two map types that implement .map_direct_value_addr() are arrays and
arenas, and they both do their own internal checks to ensure the
offset is within bounds.
Adjust selftests that expect the old error. These tests still fail
because the verifier identifies the access as out of bounds for the
map, so change them to expect an "invalid access to map value pointer"
error instead.
Emil Tsalapatis [Tue, 16 Dec 2025 17:33:21 +0000 (12:33 -0500)]
selftests/bpf: Explicitly account for globals in verifier_arena_large
The big_alloc1 test in verifier_arena_large assumes that the arena base
and the first page allocated by bpf_arena_alloc_pages are identical.
This is not the case, because the first page in the arena is populated
by global arena data. The test still passes because the code makes the
tacit assumption that the first page is at offset PAGE_SIZE instead of
0.
Make this distinction explicit in the code, and adjust the page offsets
requested during the test to count from the beginning of the arena
instead of using the address of the first allocated page.
Linus Torvalds [Tue, 16 Dec 2025 07:44:36 +0000 (19:44 +1200)]
Merge tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull shmem rename fixes from Al Viro:
"A couple of shmem rename fixes - recent regression from tree-in-dcache
series and older breakage from stable directory offsets stuff"
* tag 'pull-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
shmem: fix recovery on rename failures
shmem_whiteout(): fix regression from tree-in-dcache series
Linus Torvalds [Tue, 16 Dec 2025 07:34:09 +0000 (19:34 +1200)]
Merge tag 'v6.19-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbd
Pull smb server fixes from Steve French:
- Fix set xattr name validation
- Fix session refcount leak
- Minor cleanup
- smbdirect (RDMA) fixes: improve receive completion handling and connection setup
* tag 'v6.19-rc1-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
ksmbd: fix buffer validation by including null terminator size in EA length
ksmbd: Fix refcount leak when invalid session is found on session lookup
ksmbd: remove redundant DACL check in smb_check_perm_dacl
ksmbd: convert comma to semicolon
smb: server: defer the initial recv completion logic to smb_direct_negotiate_recv_work()
smb: server: initialize recv_io->cqe.done = recv_done just once
smb: smbdirect: introduce smbdirect_socket.connect.{lock,work}
Linus Torvalds [Tue, 16 Dec 2025 07:28:20 +0000 (19:28 +1200)]
Merge tag 'for-6.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix missing btrfs_path release after printing a relocation error
message
- fix extent changeset leak on mmap write after failure to reserve
metadata
- fix fs devices list structure freeing; it could potentially be leaked
under some circumstances
- tree log fixes:
- fix incremental directory logging where inodes for new dentries
were incorrectly skipped
- don't log conflicting inode if it's a directory moved in the
current transaction
- regression fixes:
- fix incorrect btrfs_path freeing when it's auto-cleaned
- revert commit simplifying preallocation of temporary structures
in qgroup functions, some cases were not handled properly
* tag 'for-6.19-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix changeset leak on mmap write after failure to reserve metadata
btrfs: fix memory leak of fs_devices in degraded seed device path
btrfs: fix a potential path leak in print_data_reloc_error()
Revert "btrfs: add ASSERTs on prealloc in qgroup functions"
btrfs: do not skip logging new dentries when logging a new name
btrfs: don't log conflicting inode if it's a dir moved in the current transaction
btrfs: tests: fix double btrfs_path free in remove_extent_ref()
Linus Torvalds [Tue, 16 Dec 2025 07:24:35 +0000 (19:24 +1200)]
Merge tag 'sched_ext-for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix memory leak when destroying helper kthread workers during
scheduler disable
- Fix bypass depth accounting on scx_enable() failure which could leave
the system permanently in bypass mode
- Fix missing preemption handling when moving tasks to local DSQs via
scx_bpf_dsq_move()
- Misc fixes including NULL check for put_prev_task(), flushing stdout
in selftests, and removing unused code
* tag 'sched_ext-for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
sched_ext: Remove unused code in the do_pick_task_scx()
selftests/sched_ext: flush stdout before test to avoid log spam
sched_ext: Fix missing post-enqueue handling in move_local_task_to_local_dsq()
sched_ext: Factor out local_dsq_post_enq() from dispatch_enqueue()
sched_ext: Fix bypass depth leak on scx_enable() failure
sched/ext: Avoid null ptr traversal when ->put_prev_task() is called with NULL next
sched_ext: Fix the memleak for sch->helper objects
Linus Torvalds [Tue, 16 Dec 2025 07:21:17 +0000 (19:21 +1200)]
Merge tag 'cgroup-for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fix from Tejun Heo:
- Fix a race condition in css_rstat_updated() where CMPXCHG without
LOCK prefix could cause lnode corruption when the flusher runs
concurrently on another CPU. The issue was introduced in 6.17 and
causes memcg stats to become corrupted in production.
* tag 'cgroup-for-6.19-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: rstat: use LOCK CMPXCHG in css_rstat_updated
Al Viro [Sat, 13 Dec 2025 22:50:23 +0000 (17:50 -0500)]
shmem: fix recovery on rename failures
maple_tree insertions can fail if we are seriously short on memory;
simple_offset_rename() does not recover well if it runs into that.
The same goes for simple_offset_rename_exchange().
Moreover, shmem_whiteout() expects that if it succeeds, the caller will
progress to d_move(), i.e. that shmem_rename2() won't fail past the
successful call of shmem_whiteout().
Not hard to fix, fortunately - mtree_store() can't fail if the index we
are trying to store into is already present in the tree as a singleton.
For simple_offset_rename_exchange() that's enough - we just need to be
careful about the order of operations.
For simple_offset_rename() the solution is to preinsert the target into the
tree for new_dir; the rest can be done without any potentially failing
operations.
That preinsertion has to be done in shmem_rename2() rather than in
simple_offset_rename() itself - otherwise we'd need to deal with the
possibility of failure after successful shmem_whiteout().
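A short sketch of the property the fix relies on (illustrative only, not the
shmem code): once an index already holds a singleton entry, overwriting it
with mtree_store() needs no allocation and therefore cannot fail.

	int example(struct maple_tree *mt, unsigned long index,
		    void *placeholder, void *final)
	{
		/* Preinsertion: this store may still fail under memory pressure... */
		int err = mtree_store(mt, index, placeholder, GFP_KERNEL);

		if (err)
			return err;

		/*
		 * ...but replacing the existing singleton cannot, so the rename
		 * can be committed past the point of no return without having
		 * to undo a successful whiteout.
		 */
		mtree_store(mt, index, final, GFP_KERNEL);
		return 0;
	}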
Fixes: a2e459555c5f ("shmem: stable directory offsets")
Reviewed-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Namjae Jeon [Sun, 14 Dec 2025 06:06:34 +0000 (15:06 +0900)]
ksmbd: fix buffer validation by including null terminator size in EA length
The smb2_set_ea function, which handles Extended Attributes (EA),
was performing buffer validation checks that incorrectly omitted the size
of the null terminating character (+1 byte) for EA Name.
This patch fixes the issue by explicitly adding '+ 1' to EaNameLength where
the null terminator is expected to be present in the buffer, ensuring
the validation accurately reflects the total required buffer size.
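As a hedged sketch of the corrected bound (struct and field names follow the
description above and may not match the ksmbd code exactly):

	/* Bytes this EA entry needs inside the request buffer. */
	size_t needed = offsetof(struct smb2_ea_info, name) +
			eabuf->EaNameLength + 1 /* NUL terminator */ +
			le16_to_cpu(eabuf->EaValueLength);

	if (needed > buf_len)	/* reject entries that would overrun the buffer */
		return -EINVAL;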
Cc: stable@vger.kernel.org
Reported-by: Roger <roger.andersen@protonmail.com>
Reported-by: Stanislas Polu <spolu@dust.tt>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Namjae Jeon [Sun, 14 Dec 2025 06:05:56 +0000 (15:05 +0900)]
ksmbd: Fix refcount leak when invalid session is found on session lookup
When a session is found but its state is not SMB2_SESSION_VALID, it is
treated as if no valid session was found, but the reference count acquired
by the session lookup is never decremented, which results in a reference
count leak. This patch fixes the issue by explicitly calling
ksmbd_user_session_put() to release the reference to the session.
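A minimal sketch of the fixed lookup path (simplified; the surrounding error
handling is omitted and variable names are illustrative):

	sess = ksmbd_session_lookup_all(conn, sess_id);
	if (sess && sess->state != SMB2_SESSION_VALID) {
		ksmbd_user_session_put(sess);	/* drop the lookup reference */
		sess = NULL;			/* treat as "no valid session" */
	}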
Cc: stable@vger.kernel.org
Reported-by: Alexandre <roger.andersen@protonmail.com>
Reported-by: Stanislas Polu <spolu@dust.tt>
Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Chen Ni [Tue, 18 Nov 2025 01:32:29 +0000 (09:32 +0800)]
ksmbd: convert comma to semicolon
Replace comma between expressions with semicolons.
Using a ',' in place of a ';' can have unintended side effects.
Although that is not the case here, it seems best to use ';'
unless ',' is intended.
Found by inspection.
No functional change intended.
Compile tested only.
Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
smb: server: defer the initial recv completion logic to smb_direct_negotiate_recv_work()
The previous change to relax WARN_ON_ONCE(SMBDIRECT_SOCKET_*) checks in
recv_done() and smb_direct_cm_handler() seems to work around the
problem that the order of initial recv completion and
RDMA_CM_EVENT_ESTABLISHED is random, but it's still
a bit ugly.
This implements a better solution by deferring the recv completion
processing to smb_direct_negotiate_recv_work(), which is queued only
once both events have arrived.
To avoid more invasive changes to the main recv_done callback, I
introduced smb_direct_negotiate_recv_done(), which is only used for
the first PDU; this will allow further cleanup and simplification of
recv_done() in a future patch.
smb_direct_negotiate_recv_work() is also kept minimal: it only does
basic error checking and the transition from
SMBDIRECT_SOCKET_NEGOTIATE_NEEDED to
SMBDIRECT_SOCKET_NEGOTIATE_RUNNING, which allows smb_direct_prepare()
to continue as before.
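The queuing condition can be sketched roughly as follows (the counter and
field names are illustrative assumptions, not the actual smbdirect code):

	/*
	 * Called from both the CM handler (RDMA_CM_EVENT_ESTABLISHED) and the
	 * first recv completion; whichever event arrives second queues the
	 * work, so the ordering of the two events no longer matters.
	 */
	static void smb_direct_negotiate_event(struct smbdirect_socket *sc)
	{
		if (atomic_inc_return(&sc->negotiate_events) == 2)
			queue_work(sc->workqueue, &sc->negotiate_recv_work);
	}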
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
smb: server: initialize recv_io->cqe.done = recv_done just once
smbdirect_recv_io structures are pre-allocated so we can set the
callback function just once.
This will make it easy to move smb_direct_post_recv to common code
soon.
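Roughly (illustrative; the pool and list names are assumptions), the callback
is assigned when the receive buffers are set up rather than before every
repost:

	for (i = 0; i < recv_max; i++) {
		struct smbdirect_recv_io *recv_io = mempool_alloc(pool, GFP_KERNEL);

		if (!recv_io)
			goto fail;
		recv_io->cqe.done = recv_done;	/* set once; reused on every ib_post_recv() */
		list_add_tail(&recv_io->list, &free_list);
	}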
Cc: Tom Talpey <tom@talpey.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
smb: smbdirect: introduce smbdirect_socket.connect.{lock,work}
This will first be used by the server in order to defer the
processing of the initial recv of the negotiation request.
But in the future it will also be used by the client to implement an
async connect.
Cc: Tom Talpey <tom@talpey.com>
Cc: Long Li <longli@microsoft.com>
Cc: linux-cifs@vger.kernel.org
Cc: samba-technical@lists.samba.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
Jens Remus [Thu, 11 Dec 2025 11:24:50 +0000 (12:24 +0100)]
s390/stacktrace: Do not fallback to RA register
The logic to fall back to the return address (RA) register value in
the topmost frame when stack tracing using back chain is broken in
multiple ways:
When assuming the RA register 14 has not been saved yet one must assume
that a new user stack frame has not been allocated either. Therefore
the back chain would not contain the stack pointer (SP) at entry, but
the caller's SP at its entry instead.
Therefore, when falling back to the RA register 14 value it would also be
necessary to fall back to the SP register 15 value. Otherwise an invalid
combination of RA register 14 and caller's SP at its entry (from the
back chain) is used.
In the topmost frame the back chain contains either the caller's SP at
its entry (before having allocated a new stack frame in the prologue),
the SP at entry (after having allocated a new stack frame), or an
uninitialized value (during static/dynamic stack allocation). In both
cases where the back chain is valid either the caller or prologue must
have saved its respective RA to the respective frame. Therefore, if the
RA obtained from the frame pointed to by the back chain is invalid, this
does not indicate that the IP in the topmost frame is still early in the
prologue and the RA has not been saved.
Benjamin Block [Fri, 5 Dec 2025 15:47:17 +0000 (16:47 +0100)]
s390/pci: Fix cyclic dead-lock in zpci_zdev_put() and zpci_scan_devices()
When triggering PCI device recovery by writing into the SysFS attribute
`recover` of a Physical Function with existing child SR-IOV Virtual
Functions, lockdep is reporting a possible deadlock between three
threads:
In zpci_bus_scan_busses() the `zbus_list_lock` is taken for the whole
duration of the function, which also includes taking
`pci_rescan_remove_lock`, among other things. But `zbus_list_lock` only
really needs to protect modifications of the global registration list
`zbus_list`; it can be dropped while the functions within the list
iteration run, and this breaks the cycle above.
Break up zpci_bus_scan_busses() into an "iterator" zpci_bus_get_next()
that iterates over `zbus_list` element by element, and acquires and
releases `zbus_list_lock` as necessary, but never keeps holding it.
References to `zpci_bus` objects are acquired and released accordingly
as the iteration advances.
The reference counting on `zpci_bus` objects is also changed so that all
put() and get() operations are done under the protection of
`zbus_list_lock`, and if the operation results in a modification of
`zpci_bus_list`, this modification is done in the same critical section
(apart from the very first initialization). This way objects are never seen
on the list that are about to be released and/or half-initialized.
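The iterator pattern described above can be sketched as follows (simplified;
structure, field and function names are assumptions rather than the exact
s390 code):

	static struct zpci_bus *zpci_bus_get_next(struct zpci_bus *prev)
	{
		struct zpci_bus *next = NULL;

		mutex_lock(&zbus_list_lock);
		if (!prev)
			next = list_first_entry_or_null(&zbus_list, struct zpci_bus, bus_next);
		else if (!list_is_last(&prev->bus_next, &zbus_list))
			next = list_next_entry(prev, bus_next);
		if (next)
			kref_get(&next->kref);			/* pin before the lock is dropped */
		if (prev)
			kref_put(&prev->kref, zpci_bus_release);	/* ref ops stay under the list lock */
		mutex_unlock(&zbus_list_lock);

		return next;
	}

	/*
	 * Caller pattern replacing "hold zbus_list_lock for the whole scan":
	 *
	 *	for (zbus = zpci_bus_get_next(NULL); zbus; zbus = zpci_bus_get_next(zbus))
	 *		zpci_bus_scan_bus(zbus);	// may take pci_rescan_remove_lock safely
	 */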
Fixes: 14c87ba8123a ("s390/pci: separate zbus registration from scanning")
Suggested-by: Niklas Schnelle <schnelle@linux.ibm.com>
Signed-off-by: Benjamin Block <bblock@linux.ibm.com>
Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Sven Schnelle [Fri, 5 Dec 2025 09:58:57 +0000 (10:58 +0100)]
s390/ipl: Clear SBP flag when bootprog is set
With z16 a new flag 'search boot program' was introduced for
list-directed IPL (SCSI, NVMe, ECKD DASD). If this flag is set,
e.g. via selecting the "Automatic" value for the "Boot program
selector" control on an HMC load panel, it is copied to the reipl
structure from the initial ipl structure. When a user now sets a
boot prog via sysfs, the flag is not cleared and the bootloader
will again automatically select the boot program, ignoring user
configuration.
To avoid that, clear the SBP flag when a bootprog sysfs file is
written.
Cc: stable@vger.kernel.org
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Linus Torvalds [Sun, 14 Dec 2025 03:35:35 +0000 (15:35 +1200)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"The only core fix is in doc; all the others are in drivers, with the
biggest impacts in libsas being the rollback on error handling and in
ufs coming from a couple of error handling fixes, one causing a crash
if it's activated before scanning and the other fixing W-LUN
resumption"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: qcom: Fix confusing cleanup.h syntax
scsi: libsas: Add rollback handling when an error occurs
scsi: device_handler: Return error pointer in scsi_dh_attached_handler_name()
scsi: ufs: core: Fix a deadlock in the frequency scaling code
scsi: ufs: core: Fix an error handler crash
scsi: Revert "scsi: libsas: Fix exp-attached device scan after probe failure scanned in again after probe failed"
scsi: ufs: core: Fix RPMB link error by reversing Kconfig dependencies
scsi: qla4xxx: Use time conversion macros
scsi: qla2xxx: Enable/disable IRQD_NO_BALANCING during reset
scsi: ipr: Enable/disable IRQD_NO_BALANCING during reset
scsi: imm: Fix use-after-free bug caused by unfinished delayed work
scsi: target: sbp: Remove KMSG_COMPONENT macro
scsi: core: Correct documentation for scsi_device_quiesce()
scsi: mpi3mr: Prevent duplicate SAS/SATA device entries in channel 1
scsi: target: Reset t_task_cdb pointer in error case
scsi: ufs: core: Fix EH failure after W-LUN resume error
Linus Torvalds [Sun, 14 Dec 2025 03:24:10 +0000 (15:24 +1200)]
Merge tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client
Pull ceph updates from Ilya Dryomov:
"We have a patch that adds an initial set of tracepoints to the MDS
client from Max, a fix that hardens osdmap parsing code from myself
(marked for stable) and a few assorted fixups"
* tag 'ceph-for-6.19-rc1' of https://github.com/ceph/ceph-client:
rbd: stop selecting CRC32, CRYPTO, and CRYPTO_AES
ceph: stop selecting CRC32, CRYPTO, and CRYPTO_AES
libceph: make decode_pool() more resilient against corrupted osdmaps
libceph: Amend checking to fix `make W=1` build breakage
ceph: Amend checking to fix `make W=1` build breakage
ceph: add trace points to the MDS client
libceph: fix log output race condition in OSD client
T.J. Mercier [Sun, 7 Dec 2025 09:10:04 +0000 (01:10 -0800)]
bpf: Fix bpf_seq_read docs for increased buffer size
Commit af65320948b8 ("bpf: Bump iter seq size to support BTF
representation of large data structures") increased the fixed buffer
size from PAGE_SIZE to PAGE_SIZE << 3, but the docs for the function
didn't get updated at the same time. Update them.
Linus Torvalds [Sat, 13 Dec 2025 18:12:46 +0000 (06:12 +1200)]
Merge tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull CPU hotplug fix from Ingo Molnar:
- Fix CPU hotplug callbacks to disable interrupts on UP kernels
* tag 'smp-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
cpu: Make atomic hotplug callbacks run with interrupts disabled on UP