====================
bpf: Support stack arguments for BPF functions and kfuncs
Currently, bpf function calls and kfunc's are limited by 5 reg-level
parameters. For function calls with more than 5 parameters,
developers can use always inlining or pass a struct pointer
after packing more parameters in that struct although it may have
some inconvenience. But there is no workaround for kfunc if more
than 5 parameters is needed.
This patch set lifts the 5-argument limit by introducing stack-based
argument passing for BPF functions and kfunc's, coordinated with
compiler support in LLVM [1]. The compiler emits loads/stores through
a new bpf register r11 (BPF_REG_PARAMS), to pass arguments beyond
the 5th, keeping the stack arg area separate from the r10-based program
stack. The current maximum number of arguments is capped at
MAX_BPF_FUNC_ARGS (12), which is sufficient for the vast majority of
use cases.
All kfunc/bpf-function arguments are caller saved, including stack
arguments. For register arguments (r1-r5), the verifier already marks
them as clobbered after each call. For stack arguments, the verifier
invalidates all outgoing stack arg slots immediately after a call,
requiring the compiler to re-store them before any subsequent call.
This follows the native calling convention where all function
parameters are caller saved.
The x86_64 JIT translates r11-relative accesses to RBP-relative
native instructions. Each function's stack allocation is extended
by 'max_outgoing' bytes to hold the outgoing arg area below the
callee-saved registers. This makes implementation easier as the r10
can be reused for stack argument access. At both BPF-to-BPF and kfunc
calls, outgoing args are pushed onto the expected calling convention
locations directly. The incoming parameters can directly get the value
from caller.
Global subprogs and freplace progs with >5 args are not yet supported.
Only x86_64 and arm64 are supported for now. Same selftests are tested
by both x86_64 and arm64. Please see each individual patch for details.
Changelogs:
v3 -> v4:
- v3: https://lore.kernel.org/bpf/20260511053301.1878610-1-yonghong.song@linux.dev/
- Added no_stack_arg_load comparison in func_states_equal() to ensure
correctness of pruning.
- Shrink bpf_jmp_history_entry.flags to 4bit to match the number of flags.
- Instead of passing bpf_subprog_info to JIT, use prog->aux->func_idx to
find corresponding bpf_subprog_info from 'env'.
- For patch 'bpf: Reject stack arguments if tail call reachable', use stack_arg_cnt
instead of just incoming stack arg cnt.
- Tighten invalidate_outgoing_stack_args() for kfunc/helper/bpf-to-bpf calls.
- Disable private stack in verifier for x86_64 instead of in JIT.
v2 -> v3:
- v2: https://lore.kernel.org/bpf/20260507212942.1122000-1-yonghong.song@linux.dev/
- In do_check_common() and for main prog, if btf does not match with actual
parameter, the verification will continue and will ignore arg_cnt. Make
arg_cnt=1 explictly to prevent any incoming stack arguments.
- Remove the loop which clear current frame stack slot and set the upper level frame
stack slot. This is not needed unless there is a bug. Add a verifier_bug
if the bug happens.
- For liveness, avoid r11 based load/stores mixing with r10 based stack tracking.
Also, print out stack arguments properly.
- Pass bpf_subprog_info the JIT so we can avoid copy bpf_subprog_info fields to
bpf_prog_aux.
- Fix the missed allocation free for test infra BTF fixup.
- Remove selftest result for precision backtracking test since the result would
be change (two possible output).
v1 -> v2:
- v1: https://lore.kernel.org/bpf/20260424171433.2034470-1-yonghong.song@linux.dev/
- Several refactoring (convert bpf_get_spilled_reg macro to static inline func,
Remove copy_register_state(), Refactor jmp history, Refactor record_call_access(), etc),
suggested by Eduard.
- Use incoming_stack_arg_cnt/stack_arg_cnt instead of incoming_stack_arg_depth/stack_arg_depth,
suggested by Eduard.
- Fix a stack arg pruning bug, from Eduard.
- Fix a bug for precision marking and backtracking, basically callee needs to get the
stack arg value from callers, helped from Eduard.
- Set sub->arg_cnt earlier in btf_prepare_func_args(), this will avoid having
incoming_stack_arg_cnt in bpf_subprog_info.
- Do stack-arg liveness analysis together with r10 based liveness analysis,
suggested by Eduard.
- Fix a few tests to ensure that r11-based loads cannot be ahead of r11-based stores,
and r11-based loads cannot be after kfunc/helper/bpf-function.
====================
Puranjay Mohan [Wed, 13 May 2026 04:51:58 +0000 (21:51 -0700)]
bpf, arm64: Add JIT support for stack arguments
Implement stack argument passing for BPF-to-BPF and kfunc calls with
more than 5 parameters on arm64, following the AAPCS64 calling
convention.
BPF R1-R5 already map to x0-x4. With BPF_REG_0 moved to x8 by the
previous commit, x5-x7 are free for arguments 6-8. Arguments 9-12
spill onto the stack at [SP+0], [SP+8], ... and the callee reads
them from [FP+16], [FP+24], ... (above the saved FP/LR pair).
BPF convention uses fixed offsets from BPF_REG_PARAMS (r11): off=-8 is
always arg 6, off=-16 arg 7, etc. The verifier invalidates all outgoing
stack arg slots after each call, so the compiler must re-store before
every call. This means x5-x7 don't need to be saved on stack.
Puranjay Mohan [Wed, 13 May 2026 04:51:53 +0000 (21:51 -0700)]
bpf, arm64: Map BPF_REG_0 to x8 instead of x7
Move the BPF return value register from x7 to x8, freeing x7 for use
as an argument register. AAPCS64 designates x8 as the indirect result
location register; it is caller-saved and not used for argument
passing, making it a suitable home for BPF_REG_0.
This is a prerequisite for stack argument support, which needs x5-x7
to pass arguments 6-8 to native kfuncs following the AAPCS64 calling
convention.
Yonghong Song [Wed, 13 May 2026 04:51:48 +0000 (21:51 -0700)]
selftests/bpf: Add precision backtracking test for stack arguments
Add a test that verifies precision backtracking works correctly
across BPF-to-BPF calls when stack arguments are involved.
The test passes a size value as incoming stack arg (arg6) to a
subprog, which forwards it as the mem__sz parameter (outgoing arg7)
to bpf_kfunc_call_stack_arg_mem. The expected __msg annotations
verify that precision propagates from the kfunc's mem__sz argument
back through the subprog frame to the caller's outgoing stack arg
store.
A companion BTF file (btf__stack_arg_precision.c) provides named
parameter BTF for the __naked subprog via __btf_func_path.
Yonghong Song [Wed, 13 May 2026 04:51:43 +0000 (21:51 -0700)]
selftests/bpf: Add verifier tests for stack argument validation
Add inline-asm based verifier tests that exercise stack argument
validation logic directly.
Positive tests:
- subprog call with 6 arg's
- Two sequential calls to different subprogs (6-arg and 7-arg)
- Share a r11 store for both branches
Negative tests — verifier rejection:
- Read from uninitialized incoming stack arg slot
- Gap in outgoing slots: only r11-16 written, r11-8 missing
- Write at r11-80, exceeding max 7 stack args
- Missing store on one branch with a shared store
- First call has proper stack arguments and the second
call intends to inherit stack arguments but not working
- r11 load ordering issue
Negative tests — pointer/ref tracking:
- Pruning type mismatch: one branch stores PTR_TO_STACK, the
other stores a scalar, callee dereferences — must not prune
- Release invalidation: bpf_sk_release invalidates a socket
pointer stored in a stack arg slot
- Packet pointer invalidation: bpf_skb_pull_data invalidates
a packet pointer stored in a stack arg slot
- Null propagation: PTR_TO_MAP_VALUE_OR_NULL stored in stack
arg slot, null branch attempts dereference via callee
Yonghong Song [Wed, 13 May 2026 04:51:38 +0000 (21:51 -0700)]
selftests/bpf: Add BTF fixup for __naked subprog parameter names
When __naked subprogs are used in verifier tests, clang drops
parameter names from their BTF FUNC_PROTO entries. This prevents
the verifier from resolving stack argument slots by name.
Add a __btf_func_path(path) annotation that points to a separate
BTF file containing properly-named FUNC entries. The test_loader
matches FUNC entries by name, detects anonymous parameters, and
replaces the FUNC_PROTO with a new one that carries parameter
names from the custom file while preserving the original type IDs.
The custom BTF file also serves as btf_custom_path for kfunc
resolution when no separate btf_custom_path is specified.
Yonghong Song [Wed, 13 May 2026 04:51:32 +0000 (21:51 -0700)]
selftests/bpf: Add tests for stack argument validation
Add negative tests that verify the kfunc (rejecting kfunc call
with >8 byte struct as stack argument) and the verifier
(rejecting invalid uses of r11 for stack arguments).
Yonghong Song [Wed, 13 May 2026 04:51:27 +0000 (21:51 -0700)]
selftests/bpf: Add tests for BPF function stack arguments
Add selftests covering stack argument passing for both BPF-to-BPF
subprog calls and kfunc calls with more than 5 arguments. All tests
are guarded by __BPF_FEATURE_STACK_ARGUMENT and __TARGET_ARCH_x86.
Yonghong Song [Wed, 13 May 2026 04:51:19 +0000 (21:51 -0700)]
bpf,x86: Implement JIT support for stack arguments
Add x86_64 JIT support for BPF functions and kfuncs with more than
5 arguments. The extra arguments are passed through a stack area
addressed by register r11 (BPF_REG_PARAMS) in BPF bytecode,
which the JIT translates to native code.
The JIT follows the x86-64 calling convention for both BPF-to-BPF
and kfunc calls:
- Arg 6 is passed in the R9 register
- Args 7+ are passed on the stack
Incoming arg 6 (BPF r11+8) is translated to a MOV from R9 rather
than a memory load. Incoming args 7+ (BPF r11+16, r11+24, ...) map
directly to [rbp + 16], [rbp + 24], ..., matching the x86-64 stack
layout after CALL + PUSH RBP, so no offset adjustment is needed.
tail_call_reachable is rejected by the verifier and priv_stack is
disabled by the JIT when stack args exist, so R9 is always
available. When BPF bytecode writes to the arg-6 stack slot
(offset -8), the JIT emits a MOV into R9 instead of a memory store.
Outgoing args 7+ are placed at [rsp] in a pre-allocated area below
callee-saved registers, using:
native_off = outgoing_arg_base - outgoing_rsp - bpf_off - 16
The native x86_64 stack layout with stack arguments:
Yonghong Song [Wed, 13 May 2026 04:51:09 +0000 (21:51 -0700)]
bpf: Reject stack arguments if tail call reachable
Tail calls are deprecated and will be replaced by indirect calls
in the future. Reject programs that combine tail calls with stack
arguments rather than adding complexity for a deprecated feature.
Yonghong Song [Wed, 13 May 2026 04:51:04 +0000 (21:51 -0700)]
bpf: Support stack arguments for kfunc calls
Extend the stack argument mechanism to kfunc calls, allowing kfuncs
with more than 5 parameters to receive additional arguments via the
r11-based stack arg area.
For kfuncs, the caller is a BPF program and the callee is a kernel
function. The BPF program writes outgoing args at negative r11
offsets, following the same convention as BPF-to-BPF calls:
Later JIT will marshal outgoing arguments to the native calling
convention for kfunc1() and kfunc2().
For kfunc calls where stack args are used as constant or size
parameters, a mark_stack_arg_precision() helper is used to propagate
precision and do proper backtracking.
There are two places where meta->release_regno needs to keep
regno for later releasing the reference. Also, 'cur_aux(env)->arg_prog = regno'
is also keeping regno for later fixup. Since stack arguments don't have a valid
register number (regno is negative), these three cases are rejected for now
if the argument is on the stack.
Yonghong Song [Wed, 13 May 2026 04:50:59 +0000 (21:50 -0700)]
bpf: Enable r11 based insns
BPF_REG_PARAMS (r11) is used for stack argument accesses and
the following are only insns with r11 presence:
- load incoming stack arg
- store register to outgoing stack arg
- store immediate to outgoing stack arg
The detailed insn format can be found in is_stack_arg_ldx/st/stx()
helpers. After this patch, stack arg ldx/st/stx insns become valid
for kernel and these insns can be properly checked by verifier.
The LLVM compiler [1] implemented the above BPF_REG_PARAMS insns.
Yonghong Song [Wed, 13 May 2026 04:50:54 +0000 (21:50 -0700)]
bpf: Prepare architecture JIT support for stack arguments
Add bpf_jit_supports_stack_args() as a weak function defaulting to
false. Architectures that implement JIT support for stack arguments
override it to return true.
Reject BPF functions with more than 5 parameters at verification
time if the architecture does not support stack arguments.
Yonghong Song [Wed, 13 May 2026 04:50:49 +0000 (21:50 -0700)]
bpf: Reject stack arguments in non-JITed programs
The interpreter does not understand the bpf register r11
(BPF_REG_PARAMS) used for stack arguments. So reject interpreter
usage if stack arguments are used either in the main program or
any subprogram.
Yonghong Song [Wed, 13 May 2026 04:50:40 +0000 (21:50 -0700)]
bpf: Extend liveness analysis to track stack argument slots
BPF_REG_PARAMS (R11) is at index MAX_BPF_REG, which is beyond the
register tracking arrays in const_fold.c and liveness.c. Handle it
explicitly to avoid out-of-bounds accesses.
Extend the arg tracking dataflow to cover stack arg slots. Otherwise,
pointers passed through stack args are invisible to liveness, causing
the pointed-to stack slots to be incorrectly poisoned.
Extend the at_out tracking array to MAX_AT_TRACK_REGS (registers
plus stack arg slots) so that outgoing stack arg stores are tracked
alongside registers. Add a separate at_stack_arg_entry array in
compute_subprog_args(), passed to arg_track_xfer(), to restore
FP-derived values on incoming stack arg reads.
Extend record_call_access() to check stack arg slots for FP-derived
pointers at kfunc call sites, reusing the record_arg_access() helper
extracted in the previous patch. Pass stack arg state from caller to
callee in analyze_subprog() so that callees can track pointers received
through stack args, hence avoid poisoning.
Skip stack arg instructions in record_load_store_access(). Stack arg
STX uses dst_reg=BPF_REG_PARAMS (index 11), but at[11] is repurposed
to track the value stored in stack arg slot 0. Without the skip, if a
prior stack arg STX stored an FP-derived pointer (e.g., fp-64) into
slot 0, a subsequent stack arg STX would read that FP-derived value as
the base pointer and spuriously mark a regular stack slot (e.g., fp-72
from -64 + -8) as accessed in the liveness bitmap.
Extend arg_track_log() to log state transitions for outgoing stack arg
slots at indices MAX_BPF_REG through MAX_AT_TRACK_REGS-1. Without this,
changes to at_out[11..17] caused by stack arg store instructions are
silently omitted from BPF_LOG_LEVEL2 output. For example, when a
caller passes fp-64 through a stack argument:
Without the fix, the "sa0=fp0-64" annotation at insn 13 would not
appear, making it harder to debug liveness analysis for programs
that pass FP-derived pointers through stack arguments.
Extend has_fp_args() to also check stack arg slots for FP-derived
pointers, so that callees receiving pointers only through stack args
are still recursively analyzed.
Yonghong Song [Wed, 13 May 2026 04:50:35 +0000 (21:50 -0700)]
bpf: Use arg_is_fp() in has_fp_args()
Replace "frame != ARG_NONE" with arg_is_fp() in has_fp_args().
The function's purpose is to check whether any argument is derived
from a frame pointer, which is exactly what arg_is_fp() tests
(frame >= 0 || frame == ARG_IMPRECISE). Using the dedicated
predicate is clearer and more consistent with the rest of the file.
Yonghong Song [Wed, 13 May 2026 04:50:30 +0000 (21:50 -0700)]
bpf: Refactor record_call_access() to extract per-arg logic
Extract the per-argument FP-derived pointer handling from
record_call_access() into a new record_arg_access() helper.
The existing loop body — checking arg_is_fp, querying stack access
bytes, and calling record_stack_access/record_imprecise — will be
reused for stack argument slots in the next patch. Factoring it out
now avoids duplicating the logic.
Yonghong Song [Wed, 13 May 2026 04:50:25 +0000 (21:50 -0700)]
bpf: Add precision marking and backtracking for stack argument slots
Extend the precision marking and backtracking infrastructure to
support stack argument slots (r11-based accesses). Without this,
precision demands for scalar values passed through stack arguments
are silently dropped, which could allow the verifier to incorrectly
prune states with different constant values in stack arg slots.
Yonghong Song [Wed, 13 May 2026 04:50:20 +0000 (21:50 -0700)]
bpf: Refactor jmp history to use dedicated spi/frame fields
Move stack slot index (spi) and frame number out of the flags field
in bpf_jmp_history_entry into dedicated bitfields. This simplifies
the encoding and makes room for new flags.
Previously, spi and frame were packed into the lower 9 bits of the
12-bit flags field (3 bits frame + 6 bits spi), with INSN_F_STACK_ACCESS
at BIT(9) and INSN_F_DST/SRC_REG_STACK at BIT(10)/BIT(11).
But this has no room for an INSN_F_* flag for stack arguments.
To resolve this issue, bpf_jmp_history_entry field idx is narrowed to
20 bits (sufficient for insn indices up to 1M), and the freed bits hold
spi (6 bits) and frame (3 bits) as dedicated struct fields. The flags
enum is simplified accordingly:
INSN_F_STACK_ACCESS -> BIT(0)
INSN_F_DST_REG_STACK -> BIT(1)
INSN_F_SRC_REG_STACK -> BIT(2)
which allows more room for additional INSN_F_* flags.
bpf_push_jmp_history() now takes explicit spi and frame parameters
instead of encoding them into flags. The insn_stack_access_flags(),
insn_stack_access_spi(), and insn_stack_access_frameno() helpers are
removed.
Yonghong Song [Wed, 13 May 2026 04:50:15 +0000 (21:50 -0700)]
bpf: Support stack arguments for bpf functions
Currently BPF functions (subprogs) are limited to 5 register arguments.
With [1], the compiler can emit code that passes additional arguments
via a dedicated stack area through bpf register BPF_REG_PARAMS (r11),
introduced in an earlier patch ([2]).
The compiler uses positive r11 offsets for incoming (callee-side) args
and negative r11 offsets for outgoing (caller-side) args, following the
x86_64/arm64 calling convention direction. There is an 8-byte gap at
offset 0 separating two regions:
Incoming (callee reads): r11+8 (arg6), r11+16 (arg7), ...
Outgoing (caller writes): r11-8 (arg6), r11-16 (arg7), ...
The following is an example to show how stack arguments are saved
and transferred between caller and callee:
int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
...
bar(a1, a2, a3, a4, a5, a6, a7, a8);
...
}
The verifier tracks outgoing stack arguments in stack_arg_regs[] and
out_stack_arg_cnt in bpf_func_state, separately from the regular
r10 stack. The callee does not copy incoming args — it reads them
directly from the caller's outgoing slots at positive r11 offsets.
Similar to stacksafe(), introduce stack_arg_safe() to do pruning
check.
Outgoing stack arg slots are invalidated when the callee returns
(e.g. in prepare_func_exit), not at call time. This allows the callee to
read incoming args from the caller's outgoing slots during
verification. The following are a few examples.
Example 3:
The compiler can hoist the shared stack arg stores above the branch:
*(u64 *)(r11 - 16) = r7;
if cond goto else;
*(u64 *)(r11 - 8) = r8;
call bar1; // arg6 = r8, arg7 = r7
goto end;
else:
*(u64 *)(r11 - 8) = r9;
call bar2; // arg6 = r9, arg7 = r7
end:
Example 4:
Within a loop:
loop:
*(u64 *)(r11 - 8) = r6; // arg6, before loop
call bar; // reuses arg6 each iteration
if ... goto loop;
A separate max_out_stack_arg_cnt field in bpf_subprog_info tracks
the deepest outgoing slot actually written. This intends to
reject programs that write to slots beyond what any callee expects.
It is necessary for JIT.
Similar to typical compiler generated code, enforce the following
orderings:
- all stack arg reads must be ahead of any stack arg write
- all stack arg reads must be before any bpf func, kfunc and helpers
This is needed as JIT may emit 'mov' insns for read/write with
the same register and bpf function, kfunc and helper will invalidate
all arguments immediately after the call.
Callback functions with stack arguments need kernel setup parameter
types (including stack parameters) properly and then callback function
can retrieve such information for verification purpose.
Global subprogs and freplace with >5 args are not yet supported.
Yonghong Song [Wed, 13 May 2026 04:50:10 +0000 (21:50 -0700)]
bpf: Set sub->arg_cnt earlier in btf_prepare_func_args()
Move the "sub->arg_cnt = nargs" assignment to immediately after
nargs is computed from btf_type_vlen(), instead of at the end of
btf_prepare_func_args().
btf_prepare_func_args() can return -EINVAL early in several cases,
e.g. when a static function has some non-int/enum arguments.
Since -EINVAL from btf_prepare_func_args() does not immediately
reject verification, arg_cnt remains zero after the early return.
This causes later stack argument based load/store insns to
incorrectly assume the function has no arguments.
Setting arg_cnt right after nargs ensures it is available regardless
of which path btf_prepare_func_args() takes.
Yonghong Song [Wed, 13 May 2026 04:50:05 +0000 (21:50 -0700)]
bpf: Add helper functions for r11-based stack argument insns
Add three static inline helper functions — is_stack_arg_ldx(),
is_stack_arg_st(), and is_stack_arg_stx() — that identify r11-based
(BPF_REG_PARAMS) instructions used for stack argument passing. These
helpers encapsulate the detailed encoding requirements (operand size,
register, offset alignment and sign) and hide raw BPF_REG_PARAMS usage
from the verifier, making call sites more readable and explicit.
A later patch ("bpf: Enable r11 based insns") will wire these helpers
into the verifier. Until then, check_and_resolve_insns() rejects any
r11-based registers.
Yonghong Song [Wed, 13 May 2026 04:50:00 +0000 (21:50 -0700)]
bpf: Remove copy_register_state wrapper function
Remove the copy_register_state() helper which was just a plain struct
assignment wrapper and replace all call sites with direct struct
assignment. This simplifies the code in preparation for upcoming stack
argument support.
Yonghong Song [Wed, 13 May 2026 04:49:54 +0000 (21:49 -0700)]
bpf: Convert bpf_get_spilled_reg macro to static inline function
Convert the bpf_get_spilled_reg() macro to a static inline function
for better type safety and readability. This also simplifies the macro
definition in preparation for upcoming stack argument support which
will introduce additional macros.
====================
bpf: Extend BPF syscall with common attributes support
This patch series builds upon the discussion in
"[PATCH bpf-next v4 0/4] bpf: Improve error reporting for freplace attachment failure" [1].
This patch series introduces support for *common attributes* in the BPF
syscall, providing a unified mechanism for passing shared metadata across
all BPF commands, initially used by BPF_PROG_LOAD, BPF_BTF_LOAD, and
BPF_MAP_CREATE.
The initial set of common attributes includes:
1. 'log_buf': User-provided buffer for storing log output.
2. 'log_size': Size of the provided log buffer.
3. 'log_level': Verbosity level for logging.
4. 'log_true_size': Actual log size reported by kernel.
With this extension, the BPF syscall will be able to return meaningful
error messages (e.g., map creation failures), improving debuggability
and user experience.
Changes:
v13 -> v14:
* Replace __u64 with __aligned_u64 in struct bpf_common_attr in patch #1
(per bot+bpf-ci).
* Add a single line comment for preserving original error in patch #6
(per bot+bpf-ci and Alexei).
* Drop unused label in patch #6 (per bot+bpf-ci).
* v13: https://lore.kernel.org/bpf/20260511152817.89191-1-leon.hwang@linux.dev/
v12 -> v13:
* Rebase on bpf-next tree to resolve code conflict.
* Report log_true_size on success path in patch #6.
* Check size instead of common->log_buf in patch #6 (per bot+bpf-ci).
* v12: https://lore.kernel.org/bpf/20260420141804.27179-1-leon.hwang@linux.dev/
v11 -> v12:
* Drop "log_" prefix in struct bpf_log_attr in patch #3.
* Drop "log_" prefix in struct bpf_log_opts in patch #7.
* Copy log_true_size using copy_to_bpfptr_offset() in patch #3 (per Alexei).
* v11: https://lore.kernel.org/bpf/20260216150445.68278-1-leon.hwang@linux.dev/
v10 -> v11:
* Collect Acked-by from Andrii, thanks.
* Validate whether log_buf, log_size, and log_level are valid by reusing
bpf_verifier_log_attr_valid() in patch #4 (per Andrii).
* v10: https://lore.kernel.org/bpf/20260211151115.78013-1-leon.hwang@linux.dev/
v9 -> v10:
* Collect Acked-by from Andrii, thanks.
* Address comments from Andrii:
* Drop log NULL check in bpf_log_attr_finalize().
* Return -EFAULT early in bpf_log_attr_finalize().
* Validate whether log_buf, log_size, and log_level are set.
* Keep log_buf, log_size, log_level, and user-pointer log_true_size in struct
bpf_log_attr.
* Make prog_load and btf_load work with the new struct bpf_log_attr.
* Add comment to log_true_size of struct bpf_log_opts in libbpf.
* Address comment from Alexei:
* Avoid using BPF_LOG_FIXED as log_level in tests.
* v9: https://lore.kernel.org/bpf/20260202144046.30651-1-leon.hwang@linux.dev/
v8 -> v9:
* Rework reporting 'log_true_size' for prog_load, btf_load, and map_create to
simplify struct bpf_log_attr (per Alexei).
* v8: https://lore.kernel.org/bpf/20260126151409.52072-1-leon.hwang@linux.dev/
v7 -> v8:
* Return 0 when fd < 0 and errno != EFAULT in probe_sys_bpf_ext(), then simplify
probe_bpf_syscall_common_attrs() (per Alexei and Andrii).
* v7: https://lore.kernel.org/bpf/20260123032445.125259-1-leon.hwang@linux.dev/
v6 -> v7:
* Return -errno when fd < 0 and errno != EFAULT in probe_sys_bpf_ext().
* Convert return value of probe_sys_bpf_ext() to bool in
probe_bpf_syscall_common_attrs().
* Address comments from Andrii:
* Drop the comment, and handle fd >= 0 case explicitly in
probe_sys_bpf_ext().
* Return an error when fd >= 0 in probe_sys_bpf_ext().
* v6: https://lore.kernel.org/bpf/20260120152424.40766-1-leon.hwang@linux.dev/
v5 -> v6:
* Address comments from Andrii:
* Update some variables' name.
* Drop unnecessary 'close(fd)' in libbpf.
* Rename FEAT_EXTENDED_SYSCALL to FEAT_BPF_SYSCALL_COMMON_ATTRS with
updated description in libbpf.
* Use EINVAL instead of EUSERS, as EUSERS is not used in bpf yet.
* Rename struct bpf_syscall_common_attr_opts to bpf_log_opts in libbpf.
* Add 'OPTS_SET(log_opts, log_true_size, 0);' in libbpf's 'bpf_map_create()'.
* v5: https://lore.kernel.org/bpf/20260112145616.44195-1-leon.hwang@linux.dev/
v4 -> v5:
* Rework reporting 'log_true_size' for prog_load, btf_load, and map_create
(per Alexei).
* v4: https://lore.kernel.org/bpf/20260106172018.57757-1-leon.hwang@linux.dev/
RFC v3 -> v4:
* Drop RFC.
* Address comments from Andrii:
* Add parentheses in 'sys_bpf_ext()'.
* Avoid creating new fd in 'probe_sys_bpf_ext()'.
* Add a new struct to wrap log fields in libbpf.
* Address comments from Alexei:
* Do not skip writing to user space when log_true_size is zero.
* Do not use 'bool' arguments.
* Drop the adding WARN_ON_ONCE()'s.
* v3: https://lore.kernel.org/bpf/20251002154841.99348-1-leon.hwang@linux.dev/
RFC v2 -> RFC v3:
* Rename probe_sys_bpf_extended to probe_sys_bpf_ext.
* Refactor reporting 'log_true_size' for prog_load.
* Refactor reporting 'btf_log_true_size' for btf_load.
* Add warnings for internal bugs in map_create.
* Check log_true_size in test cases.
* Address comment from Alexei:
* Change kvzalloc/kvfree to kzalloc/kfree.
* Address comments from Andrii:
* Move BPF_COMMON_ATTRS to 'enum bpf_cmd' alongside brief comment.
* Add bpf_check_uarg_tail_zero() for extra checks.
* Rename sys_bpf_extended to sys_bpf_ext.
* Rename sys_bpf_fd_extended to sys_bpf_ext_fd.
* Probe the new feature using NULL and -EFAULT.
* Move probe_sys_bpf_ext to libbpf_internal.h and drop LIBBPF_API.
* Return -EUSERS when log attrs are conflict between bpf_attr and
bpf_common_attr.
* Avoid touching bpf_vlog_init().
* Update the reason messages in map_create.
* Finalize the log using __cleanup().
* Report log size to users.
* Change type of log_buf from '__u64' to 'const char *' and cast type
using ptr_to_u64() in bpf_map_create().
* Do not return -EOPNOTSUPP when kernel doesn't support this feature
in bpf_map_create().
* Add log_level support for map creation for consistency.
* Address comment from Eduard:
* Use common_attrs->log_level instead of BPF_LOG_FIXED.
* v2: https://lore.kernel.org/bpf/20250911163328.93490-1-leon.hwang@linux.dev/
RFC v1 -> RFC v2:
* Fix build error reported by test bot.
* Address comments from Alexei:
* Drop new uapi for freplace.
* Add common attributes support for prog_load and btf_load.
* Add common attributes support for map_create.
* v1: https://lore.kernel.org/bpf/20250728142346.95681-1-leon.hwang@linux.dev/
====================
Leon Hwang [Tue, 12 May 2026 15:31:56 +0000 (23:31 +0800)]
libbpf: Add syscall common attributes support for map_create
With the previous commit adding common attribute support for
BPF_MAP_CREATE, users can now retrieve detailed error messages when map
creation fails via the log_buf field.
Introduce struct bpf_log_opts with the following fields:
log_buf, log_size, log_level, and log_true_size.
Extend bpf_map_create_opts with a new field log_opts, allowing users to
capture and inspect log messages on map creation failures.
Leon Hwang [Tue, 12 May 2026 15:31:55 +0000 (23:31 +0800)]
bpf: Add syscall common attributes support for map_create
Many BPF_MAP_CREATE validation failures currently return -EINVAL without
any explanation to userspace.
Plumb common syscall log attributes into map_create(), create a verifier
log from bpf_common_attr::log_buf/log_size/log_level, and report
map-creation failure reasons through that buffer.
This improves debuggability by allowing userspace to inspect why map
creation failed and read back log_true_size from common attributes.
Leon Hwang [Tue, 12 May 2026 15:31:54 +0000 (23:31 +0800)]
bpf: Add syscall common attributes support for btf_load
BPF_BTF_LOAD can now take log parameters from both union bpf_attr and
struct bpf_common_attr, with the same merge rules as BPF_PROG_LOAD:
- if both sides provide a complete log tuple (buf/size/level) and they
match, use it;
- if only one side provides log parameters, use that one;
- if both sides provide complete tuples but they differ, return -EINVAL.
Leon Hwang [Tue, 12 May 2026 15:31:53 +0000 (23:31 +0800)]
bpf: Add syscall common attributes support for prog_load
BPF_PROG_LOAD can now take log parameters from both union bpf_attr and
struct bpf_common_attr. The merge rules are:
- if both sides provide a complete log tuple (buf/size/level) and they
match, use it;
- if only one side provides log parameters, use that one;
- if both sides provide complete tuples but they differ, return -EINVAL.
Leon Hwang [Tue, 12 May 2026 15:31:52 +0000 (23:31 +0800)]
bpf: Refactor reporting log_true_size for prog_load
The next commit will add support for reporting logs via extended common
attributes, including 'log_true_size'.
To prepare for that, refactor the 'log_true_size' reporting logic by
introducing a new struct bpf_log_attr to encapsulate log-related behavior:
* bpf_log_attr_init(): initialize log fields, which will support
extended common attributes in the next commit.
* bpf_log_attr_finalize(): handle log finalization and write back
'log_true_size' to userspace.
Leon Hwang [Tue, 12 May 2026 15:31:51 +0000 (23:31 +0800)]
libbpf: Add support for extended BPF syscall
To support the extended BPF syscall introduced in the previous commit,
introduce the following internal APIs:
* 'sys_bpf_ext()'
* 'sys_bpf_ext_fd()'
They wrap the raw 'syscall()' interface to support passing extended
attributes.
* 'probe_sys_bpf_ext()'
Check whether current kernel supports the BPF syscall common attributes.
Leon Hwang [Tue, 12 May 2026 15:31:50 +0000 (23:31 +0800)]
bpf: Extend BPF syscall with common attributes support
Add generic BPF syscall support for passing common attributes.
The initial set of common attributes includes:
1. 'log_buf': User-provided buffer for storing logs.
2. 'log_size': Size of the log buffer.
3. 'log_level': Log verbosity level.
4. 'log_true_size': Actual log size reported by kernel.
The common-attribute pointer and its size are passed as the 4th and 5th
syscall arguments. A new command bit, 'BPF_COMMON_ATTRS' ('1 << 16'),
indicates that common attributes are supplied.
This commit adds syscall and uapi plumbing. Command-specific handling is
added in follow-up patches.
Ihor Solodrai [Sat, 9 May 2026 00:57:30 +0000 (17:57 -0700)]
selftests/bpf: Use both hrtimer enqueue helpers in vmlinux test
The vmlinux selftest triggers nanosleep and checks that both kprobe
and fentry programs observe the hrtimer enqueue path.
After the hrtimer_start_expires_user() conversion [1], nanosleep
reaches hrtimer_start_range_ns_user() instead of
hrtimer_start_range_ns(). Hard-coding either symbol makes the test
fail either on bpf tree or on linux-next [2].
Update the test to resolve the target symbol at runtime via
libbpf_find_vmlinux_btf_id(). This is a nice example of how to modify
a BPF program to work on both older and newer kernel revision.
Changelog:
RFC: https://lore.kernel.org/all/20260420111726.2118636-1-puranjay@kernel.org/
Changes in v1:
- Replace bpf_get_cpu_time_counter() with bpf_ktime_get_ns()
- Replace bpf_repeat() with plain for loop and may_goto
- Refactor collect_measurements() to reuse bench_force_done()
- Remove histogram, verbose calibration output, and per-scenario status prints
- Trim run script table to p50/stddev/p99
- Set env.quiet when --machine-readable is passed
- Add || true to run script benchmark invocation for set -e safety
- Add bpf-nop benchmark as timing overhead baseline (patch 3)
- Use named struct for LRU inner map to fix build on older toolchains
This series adds an XDP load-balancer benchmark (based on Katran) to the BPF
selftest bench framework.
Motivation
----------
Existing BPF bench tests measure individual operations (map lookups,
kprobes, ring buffers) in isolation. Production BPF programs combine
parsing, map lookups, branching, and packet rewriting in a single call
chain. The performance characteristics of such programs depend on the
interaction of these operations -- register pressure, spills, inlining
decisions, branch layout -- which isolated micro-benchmarks do not
capture.
This benchmark implements a simplified L4 load-balancer modeled after
katran [1]. The BPF program reproduces katran's core datapath:
L3/L4 parsing -> VIP hash lookup -> per-CPU LRU connection table
with consistent-hash fallback -> real server selection -> per-VIP
and per-real stats -> IPIP/IP6IP6 encapsulation
The BPF code exercises hash maps, array-of-maps (per-CPU LRU),
percpu arrays, jhash, bpf_xdp_adjust_head(), bpf_ktime_get_ns(),
and bpf_get_smp_processor_id() in a single pipeline.
This is intended as the first in a series of BPF workload benchmarks
covering other use cases (sched_ext, etc.).
Design
------
A userspace loop calling bpf_prog_test_run_opts(repeat=1) would
measure syscall overhead, not BPF program cost -- the ~4 ns early-exit
paths would be buried under kernel entry/exit. Using repeat=N is
also unsuitable: the kernel re-runs the same packet without resetting
state between iterations, so the second iteration of an encap scenario
would process an already-encapsulated packet.
Instead, timing is measured inside the BPF program using
bpf_ktime_get_ns(). BENCH_BPF_LOOP() brackets N iterations with
timestamp reads using a plain for loop with may_goto, runs a
caller-supplied reset block between iterations to undo side effects
(e.g. strip encapsulation), and records the elapsed time per batch.
One extra untimed iteration runs afterward for output validation.
Auto-calibration picks a batch size targeting ~10 ms per invocation.
A proportionality sanity check verifies that 2N iterations take ~2x
as long as N.
24 scenarios cover the code-path matrix:
- Protocol: TCP, UDP
- Address family: IPv4, IPv6, cross-AF (IPv4-in-IPv6)
- LRU state: hit, miss (16M flow space), diverse (4K flows), cold
- Consistent-hash: direct (LRU bypass)
- TCP flags: SYN (skip LRU, force CH), RST (skip LRU insert)
- Early exits: unknown VIP, non-IP, ICMP, fragments, IP options
Each scenario validates correctness before benchmarking by comparing
the output packet byte-for-byte against a pre-built expected packet
and checking BPF map counters.
Wire up the userspace side of the XDP load-balancer benchmark.
24 scenarios cover the full code-path matrix: TCP/UDP, IPv4/IPv6,
cross-AF encap, LRU hit/miss/diverse/cold, consistent-hash bypass,
SYN/RST flag handling, and early exits (unknown VIP, non-IP, ICMP,
fragments, IP options).
Before benchmarking each scenario validates correctness: the output
packet is compared byte-for-byte against a pre-built expected packet
and BPF map counters are checked against the expected values.
Add the BPF datapath for the XDP load-balancer benchmark, a
simplified L4 load-balancer inspired by katran.
The pipeline: L3/L4 parse -> VIP lookup -> per-CPU LRU connection
table or consistent-hash fallback -> real server lookup -> per-VIP
and per-real stats -> IPIP/IP6IP6 encapsulation. TCP SYN forces
the consistent-hash path (skipping LRU); TCP RST skips LRU insert
to avoid polluting the table.
process_packet() is marked __noinline so that the BENCH_BPF_LOOP
reset block (which strips encapsulation) operates on valid packet
pointers after bpf_xdp_adjust_head().
selftests/bpf: Add XDP load-balancer common definitions
Add the shared header for the XDP load-balancer benchmark. This
defines the data structures used by both the BPF program and
userspace: flow_key, vip_definition, real_definition, and the
stats/control structures.
Also provides the encapsulation source-address helpers shared
between the BPF datapath (for encap) and userspace (for building
expected output packets used in validation).
selftests/bpf: Add bpf-nop benchmark for timing overhead baseline
Add a minimal benchmark that measures the overhead of the batch-timing
infrastructure itself. The BPF program runs an empty BENCH_BPF_LOOP body
(~1.5-2 ns/op), establishing the floor cost that all timing-library
benchmarks include.
[root@virtme-ng tools/testing/selftests/bpf]# sudo ./bench -a -p8 bpf-nop
Setting up benchmark 'bpf-nop'...
Benchmark 'bpf-nop' started.
bpf-nop: median 1.82 ns/op, stddev 0.01, p99 1.86 (1754 samples)
Add a reusable timing library for BPF benchmarks that need to measure
BPF program execution time.
The BPF side (progs/bench_bpf_timing.bpf.h) provides per-CPU sample
arrays and BENCH_BPF_LOOP(), a macro that brackets batch_iters
iterations with bpf_ktime_get_ns() reads and records the elapsed time.
One extra untimed iteration runs afterward for output validation.
The userspace side (benchs/bench_bpf_timing.c) collects samples from
the skeleton BSS, computes percentile statistics, and auto-calibrates
batch_iters to target ~10 ms per batch.
selftests/bpf: Add bench_force_done() for early benchmark completion
The bench framework waits for duration_sec to elapse before collecting
results. Benchmarks that know exactly how many samples they need can
call bench_force_done() to signal completion early, avoiding wasted
wall-clock time.
Also refactor collect_measurements() to reuse bench_force_done()
instead of open-coding the same mutex/cond_signal sequence.
Hyunwoo Kim [Fri, 8 May 2026 08:53:09 +0000 (17:53 +0900)]
rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present
The DATA-packet handler in rxrpc_input_call_event() and the RESPONSE
handler in rxrpc_verify_response() copy the skb to a linear one before
calling into the security ops only when skb_cloned() is true. An skb
that is not cloned but still carries externally-owned paged fragments
(e.g. SKBFL_SHARED_FRAG set by splice() into a UDP socket via
__ip_append_data, or a chained skb_has_frag_list()) falls through to
the in-place decryption path, which binds the frag pages directly into
the AEAD/skcipher SGL via skb_to_sgvec().
Extend the gate to also unshare when skb_has_frag_list() or
skb_has_shared_frag() is true. This catches the splice-loopback vector
and other externally-shared frag sources while preserving the
zero-copy fast path for skbs whose frags are kernel-private (e.g. NIC
page_pool RX, GRO). The OOM/trace handling already in place is reused.
Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()") Cc: stable@vger.kernel.org Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com> Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev> Acked-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 10 May 2026 15:10:47 +0000 (08:10 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk driver fixes from Stephen Boyd:
- Mark the DDR bus clk critical in the SpaceMiT driver so that
boot doesn't fail
- Fix boot on Mobile EyeQ by creating the auxiliary device for
the ethernet PHY
- Plug an OF node leak in Rockchip rk808 clk driver
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: rk808: fix OF node reference imbalance
MAINTAINERS: add myself as a reviewer for the clk subsystem
reset: eyeq: drop device_set_of_node_from_dev() done by parent
clk: eyeq: add EyeQ5 children auxiliary device for generic PHYs
clk: eyeq: use the auxiliary device creation helper
clk: spacemit: k3: mark top_dclk as CLK_IS_CRITICAL
Linus Torvalds [Sun, 10 May 2026 01:42:54 +0000 (18:42 -0700)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Fix sk_local_storage diag dump via netlink (Amery Hung)
- Fix off-by-one in arena direct-value access (Junyoung Jang)
- Reject TCP_NODELAY in bpf-tcp congestion control (KaFai Wan)
- Fix type confusion in bpf_*_sock() (Kuniyuki Iwashima)
- Reject TX-only AF_XDP sockets (Linpu Yu)
- Don't run arg-tracking analysis twice on main subprog (Paul Chaignon)
- Fix NULL pointer dereference in bpf_sk_storage_clone and fib lookup
(Weiming Shi)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
bpf: Fix off-by-one boundary validation in arena direct-value access
xskmap: reject TX-only AF_XDP sockets
bpf: Don't run arg-tracking analysis twice on main subprog
bpf: Free reuseport cBPF prog after RCU grace period.
bpf: tcp: Fix type confusion in sol_tcp_sockopt().
bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().
bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().
mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.
bpf: tcp: Fix type confusion in bpf_tcp_sock().
tools/headers: Regenerate stddef.h to fix BPF selftests
bpf: Fix sk_local_storage diag dumping uninitialized special fields
bpf: Fix NULL pointer dereference in bpf_skb_fib_lookup()
sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}().
bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
bpf: Reject TCP_NODELAY in bpf-tcp-cc
bpf: Reject TCP_NODELAY in TCP header option callbacks
Junyoung Jang [Sun, 26 Apr 2026 17:25:05 +0000 (02:25 +0900)]
bpf: Fix off-by-one boundary validation in arena direct-value access
BPF_MAP_TYPE_ARENA accepts BPF_PSEUDO_MAP_VALUE offsets at exactly
the end of the arena mapping (off == arena_size). The boundary check
in arena_map_direct_value_addr() uses `>` instead of `>=`, which
incorrectly allows a one-past-end pointer to be accepted.
Change the condition to `>=` to correctly reject offsets that fall
outside the valid arena user_vm range.
Linpu Yu [Fri, 8 May 2026 14:43:43 +0000 (22:43 +0800)]
xskmap: reject TX-only AF_XDP sockets
XSKMAP entries are used as redirect targets for incoming XDP frames.
A TX-only AF_XDP socket lacks an Rx ring and cannot handle redirected
traffic, but xsk_map_update_elem() currently allows such sockets to
be inserted into the map.
Redirecting packets to such a socket on the veth generic-XDP path
causes a kernel crash in xsk_generic_rcv().
This became possible after xsk_is_setup_for_bpf_map() was removed from
the XSKMAP update path, which allowed bound TX-only sockets to be
inserted into the map.
Reject TX-only sockets during XSKMAP updates to avoid the crash.
They remain fully operational for pure Tx purposes outside XSKMAP.
Fixes: 968be23ceaca ("xsk: Fix possible segfault at xskmap entry insertion") Reported-by: Juefei Pu <tomapufckgml@gmail.com> Reported-by: Yuan Tan <yuantan098@gmail.com> Reported-by: Xin Liu <bird@lzu.edu.cn> Signed-off-by: Yifan Wu <yifanwucs@gmail.com> Signed-off-by: Linpu Yu <linpu5433@gmail.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20260508144344.694-1-linpu5433@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Paul Chaignon [Thu, 7 May 2026 18:22:06 +0000 (20:22 +0200)]
bpf: Don't run arg-tracking analysis twice on main subprog
Because subprog 0, the main subprog, is considered a global function,
we end up running the arg-tracking dataflow analysis twice on it. That
results in slightly longer verification but mostly in more verbose
verifier logs. This patch fixes it by keeping only the iteration over
global subprogs.
When running over all of Cilium's programs with BPF_LOG_LEVEL2, this
reduces verbosity by ~20% on average.
Linus Torvalds [Sat, 9 May 2026 18:47:39 +0000 (11:47 -0700)]
Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux
Pull fsverity fix from Eric Biggers:
"Fix a regression in overlayfs caused by an fsverity API change"
* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
ovl: fix verity lazy-load guard broken by fsverity_active() semantic change
Linus Torvalds [Sat, 9 May 2026 15:32:50 +0000 (08:32 -0700)]
Merge tag 'hwmon-for-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- ads7871: Fix endianness bug in 16-bit register reads
- lm75: Fix configuration register writes and AS6200/TMP112 setup and
alarm handling
- lm63: Fix TOCTOU problems
- corsair-psu: Close HID device on probe errors
- ltc2992: Fix overflow and threshold range
- Documentation: fix link to ideapad-laptop.c file
- Remove stale CONFIG_SENSORS_SBRMI Makefile reference
* tag 'hwmon-for-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (ads7871) Fix endianness bug in 16-bit register reads
hwmon: (lm75) Fix configuration register writes.
hwmon: (lm75) Fix AS6200 and TMP112 setup and alarm handling
hwmon: (lm63) Add locking to avoid TOCTOU
hwmon: (corsair-psu) Close HID device on probe errors
hwmon: Remove stale CONFIG_SENSORS_SBRMI Makefile reference
Documentation: hwmon: fix link to ideapad-laptop.c file
hwmon: (ltc2992) Fix u32 overflow in power read path
hwmon: (ltc2992) Clamp threshold writes to hardware range
Linus Torvalds [Sat, 9 May 2026 15:10:07 +0000 (08:10 -0700)]
Merge tag 'i2c-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fixes from Wolfram Sang:
- sanitize more input parameters in the core (found by syzkaller)
- usual set of driver fixes (proper completion handling, applying
quirks, correct workqueue selection...)
- ID additions to simplify dependency handling
- new email address for Peter Rosin
* tag 'i2c-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: smbus: reject oversized block transfers in the common path
MAINTAINERS: Update mail for Peter Rosin
i2c: stub: Reject I2C block transfers with invalid length
i2c: Compare the return value of gpiod_get_direction against GPIO_LINE_DIRECTION_OUT
i2c: dev: prevent integer overflow in I2C_TIMEOUT ioctl
i2c: acpi: Add ELAN0678 to i2c_acpi_force_100khz_device_ids
dt-bindings: i2c: apple,i2c: Add t8122 compatible
i2c: stm32f7: reinit_completion() per transfer not per msg
dt-bindings: i2c: amlogic: Add compatible for T7 SOC
i2c: testunit: Replace system_long_wq with system_dfl_long_wq
Linus Torvalds [Sat, 9 May 2026 15:03:21 +0000 (08:03 -0700)]
Merge tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- Fix KASAN sanitization flag for core_$(BITS).o
- Fixes for handling offset values in pseries htmdump
- Fix interrupt mask in cpm1_gpiochip_add16()
- ps3/pasemi fixes to drop redundant result assignment
- Fixes in papr-hvpipe code path
- powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events
Thanks to Aboorva Devarajan, Athira Rajeev, Christophe Leroy (CS GROUP),
Geert Uytterhoeven, Haren Myneni, Krzysztof Kozlowski, Mukesh Kumar
Chaurasiya (IBM), Nathan Chancellor, Ritesh Harjani (IBM), Shivani
Nittor, Sourabh Jain, Thomas Zimmermann, and Venkat Rao Bagalkote.
* tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (21 commits)
powerpc/pasemi: Drop redundant res assignment
powerpc/ps3: Drop redundant result assignment
powerpc/vdso: Drop -DCC_USING_PATCHABLE_FUNCTION_ENTRY from 32-bit flags with clang
arch/powerpc: Drop CONFIG_FIRMWARE_EDID from defconfig files
powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events
powerpc/8xx: Fix interrupt mask in cpm1_gpiochip_add16()
powerpc/vmx: avoid KASAN instrumentation in enter_vmx_ops() for kexec
powerpc/kdump: fix KASAN sanitization flag for core_$(BITS).o
pseries/papr-hvpipe: Fix style and checkpatch issues in enable_hvpipe_IRQ()
pseries/papr-hvpipe: Refactor and simplify hvpipe_rtas_recv_msg()
pseries/papr-hvpipe: Kill task_struct pointer from struct hvpipe_source_info
pseries/papr-hvpipe: Simplify spin unlock usage in papr_hvpipe_handle_release()
pseries/papr-hvpipe: Fix the usage of copy_to_user()
pseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init()
pseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle()
pseries/papr-hvpipe: Prevent kernel stack memory leak to userspace
pseries/papr-hvpipe: Fix race with interrupt handler
powerpc/pseries/htmdump: Add memory configuration dump support to htmdump module
powerpc/pseries/htmdump: Fix the offset value used in htm status dump
powerpc/pseries/htmdump: Fix the offset value used in processor configuration dump
...
Linus Torvalds [Sat, 9 May 2026 03:28:45 +0000 (20:28 -0700)]
Merge tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- Fix memory map enumeration bug in the Xen e820 parsing code (Juergen
Gross)
- Re-enable e820 BIOS fallback if e820 table is empty (David Gow)
* tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/boot/e820: Re-enable BIOS fallback if e820 table is empty
x86/xen: Fix a potential problem in xen_e820_resolve_conflicts()
Linus Torvalds [Sat, 9 May 2026 02:42:10 +0000 (19:42 -0700)]
Merge tag 'sched-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix spurious failures in rseq self-tests (Mark Brown)
- Fix rseq rseq::cpu_id_start ABI regression due to TCMalloc's creative
use of the supposedly read-only field
The fix is to introduce a new ABI variant based on a new (larger)
rseq area registration size, to keep the TCMalloc use of rseq
backwards compatible on new kernels (Thomas Gleixner)
- Fix wakeup_preempt_fair() for not waking up task (Vincent Guittot)
- Fix s64 mult overflow in vruntime_eligible() (Zhan Xusheng)
* tag 'sched-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/fair: Fix wakeup_preempt_fair() for not waking up task
sched/fair: Fix overflow in vruntime_eligible()
selftests/rseq: Expand for optimized RSEQ ABI v2
rseq: Reenable performance optimizations conditionally
rseq: Implement read only ABI enforcement for optimized RSEQ V2 mode
selftests/rseq: Validate legacy behavior
selftests/rseq: Make registration flexible for legacy and optimized mode
selftests/rseq: Skip tests if time slice extensions are not available
rseq: Revert to historical performance killing behaviour
rseq: Don't advertise time slice extensions if disabled
rseq: Protect rseq_reset() against interrupts
rseq: Set rseq::cpu_id_start to 0 on unregistration
selftests/rseq: Don't run tests with runner scripts outside of the scripts
Linus Torvalds [Sat, 9 May 2026 02:39:18 +0000 (19:39 -0700)]
Merge tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events fixes from Ingo Molnar:
- Fix deadlock in the perf_mmap() failure path (Peter Zijlstra)
- Intel ACR (Auto Counter Reload) fixes (Dapeng Mi):
- Fix validation and configuration of ACR masks
- Fix ACR rescheduling bug causing stale masks
- Disable the PMI on ACR-enabled hardware
- Enable ACR on Panther Cover uarch too
* tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/intel: Enable auto counter reload for DMR
perf/x86/intel: Disable PMI for self-reloaded ACR events
perf/x86/intel: Always reprogram ACR events to prevent stale masks
perf/x86/intel: Improve validation and configuration of ACR masks
perf/core: Fix deadlock in perf_mmap() failure path
Linus Torvalds [Fri, 8 May 2026 23:08:58 +0000 (16:08 -0700)]
Merge tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull PCI fixes from Bjorn Helgaas:
- Don't fallback to bus reset after failed slot reset; a bus reset
isn't safe if the .reset_slot() callback is implemented (Keith Busch)
- Update saved_config_space upon resource assignment to fix passthrough
regressions when x86 pcibios_assign_resources() updates BARs (Lukas
Wunner)
- Initialize a temporary pci_dev->dev in sysfs 'new_id' attribute to
fix a lockdep regression after driver_override was moved from PCI to
device core (Samiullah Khawaja)
- Update MAINTAINERS email addresses (Marek Vasut, Hans Zhang)
- Add MAINTAINERS reviewer for PCIe Cadence IP (Aksh Garg)
* tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
MAINTAINERS: Add Aksh Garg as PCIe CADENCE reviewer
MAINTAINERS: Update Hans Zhang email for PCIe CIX Sky1
MAINTAINERS: Update Marek Vasut email for PCIe R-Car
PCI: Initialize temporary device in new_id_store()
PCI: Update saved_config_space upon resource assignment
PCI: Don't fallback to bus reset after failed slot reset
PCI: Initialize temporary device in new_id_store()
When setting new_id of a PCI device driver using sysfs a lockdep splat
occurs. This is because new_id_store() builds a temporary pci_dev for
pci_match_device(), which calls device_match_driver_override(). That
depends on the driver_override.lock added by cb3d1049f4ea ("driver core:
generalize driver_override in struct device").
The new driver_override.lock was not initialized in the temporary pci_dev,
resulting in this lockdep splat.
Initialize the temporary pci_dev to fix this.
Repro:
Build with CONFIG_LOCKDEP=y, boot with QEMU, and add a new ID:
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 2 UID: 0 PID: 177 Comm: liveupdate-iomm Not tainted 7.0.0+ #9 PREEMPT(full)
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5d/0x80
register_lock_class+0x77e/0x790
lock_acquire+0xbf/0x2e0
pci_match_device+0x24/0x180
new_id_store+0x189/0x1d0
kernfs_fop_write_iter+0x14f/0x210
vfs_write+0x263/0x5e0
ksys_write+0x79/0xf0
do_syscall_64+0x117/0xf80
Fixes: 10a4206a2401 ("PCI: use generic driver_override infrastructure") Fixes: 8895d3bcb8ba ("PCI: Fail new_id for vendor/device values already built into driver") Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
[bhelgaas: add commit log details and repro, trim backtrace] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Link: https://patch.msgid.link/20260505234327.716630-1-skhawaja@google.com
PCI: Update saved_config_space upon resource assignment
Bernd reports passthrough failure of a Digital Devices Cine S2 V6 DVB
adapter plugged into an ASRock X570S PG Riptide board with BIOS version
P5.41 (09/07/2023):
ddbridge 0000:05:00.0: detected Digital Devices Cine S2 V6 DVB adapter
ddbridge 0000:05:00.0: cannot read registers
ddbridge 0000:05:00.0: fail
BIOS assigns an incorrect BAR to the DVB adapter which doesn't fit into the
upstream bridge window. The kernel corrects the BAR assignment:
pci 0000:07:00.0: BAR 0 [mem 0xfffffffffc500000-0xfffffffffc50ffff 64bit]: can't claim; no compatible bridge window
pci 0000:07:00.0: BAR 0 [mem 0xfc500000-0xfc50ffff 64bit]: assigned
Correction of the BAR assignment happens in an x86-specific fs_initcall,
pcibios_assign_resources(), after device enumeration in a subsys_initcall.
This order was introduced at the behest of Linus in 2004:
No other architecture performs such a late BAR correction.
Bernd bisected the issue to commit a2f1e22390ac ("PCI/ERR: Ensure error
recoverability at all times"), but it only occurs in the absence of commit 4d4c10f763d7 ("PCI: Explicitly put devices into D0 when initializing").
This combination exists in stable kernel v6.12.70, but not in mainline,
hence Bernd cannot reproduce the issue with mainline.
Since a2f1e22390ac, config space is saved on enumeration, prior to BAR
correction. Upon passthrough, the corrected BAR is overwritten with the
incorrect saved value by:
But only if the device's current_state is PCI_UNKNOWN, as it was prior to
commit 4d4c10f763d7. Since the commit, it is PCI_D0, which changes the
behavior of vfio_pci_set_power_state() to no longer restore the state
without saving it first.
Alexandre is reporting the same issue as Bernd, but in his case, mainline
is affected as well. The difference is that on Alexandre's system, the
host kernel binds a driver to the device which is unbound prior to
passthrough, whereas on Bernd's system no driver gets bound by the host
kernel.
Unbinding sets current_state to PCI_UNKNOWN in pci_device_remove(), so when
vfio-pci is subsequently bound to the device, pci_restore_state() is once
again called without invoking pci_save_state() first.
To robustly fix the issue, always update saved_config_space upon resource
assignment.
Reported-by: Bernd Schumacher <bernd@bschu.de> Closes: https://lore.kernel.org/r/acfZrlP0Ua_5D3U4@eldamar.lan/ Reported-by: Alexandre N. <an.tech@mailo.com> Closes: https://lore.kernel.org/r/dd3c3358-de0f-4a56-9c81-04aceaab4058@mailo.com/ Fixes: a2f1e22390ac ("PCI/ERR: Ensure error recoverability at all times") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Bernd Schumacher <bernd@bschu.de> Tested-by: Alexandre N. <an.tech@mailo.com> Cc: stable@vger.kernel.org # v6.12+ Link: https://patch.msgid.link/febc3f354e0c1f5a9f5b3ee9ffddaa44caccf651.1776268054.git.lukas@wunner.de
bpf: Free reuseport cBPF prog after RCU grace period.
Eulgyu Kim reported the splat below with a repro. [0]
The repro sets up a UDP reuseport group with a cBPF prog and
replaces it with a new one while another thread is sending
a UDP packet to the group.
The reuseport prog is freed by sk_reuseport_prog_free().
bpf_prog_put() is called for "e"BPF prog to destruct through
multiple stages while cBPF prog is freed immediately by
bpf_release_orig_filter() and bpf_prog_free().
If a reuseport prog is detached from the setsockopt() path
(reuseport_attach_prog() or reuseport_detach_prog()),
sk_reuseport_prog_free() is called without waiting for RCU
readers to complete, resulting in various bugs.
Let's defer freeing the reuseport cBPF prog after one RCU
grace period.
Note "e"BPF prog is safe as is unless the fast path starts
to touch fields destroyed in bpf_prog_put_deferred() and
__bpf_prog_put_noref().
Linus Torvalds [Fri, 8 May 2026 20:18:13 +0000 (13:18 -0700)]
Merge tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- Fix for ublk not doing an actual issue from the task_work fallback
path. Any request hitting that should be canceled automatically
- Fix for uring_cmd prep side handling, for the block side uring_cmd
discard handling
- Fix for missing validation of the io and physical block size shifts
- Fix for a use-after-free in ublk's cancel command handling
* tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
ublk: fix use-after-free in ublk_cancel_cmd()
ublk: validate physical_bs_shift, io_min_shift and io_opt_shift
block: only read from sqe on initial invocation of blkdev_uring_cmd()
ublk: don't issue uring_cmd from fallback task work
Linus Torvalds [Fri, 8 May 2026 20:12:48 +0000 (13:12 -0700)]
Merge tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Ensure that the absolute timeouts for both the command side and the
waiting side honor the callers time namespace
- Ensure tracked NAPI entries are cleared at unregistration time, as
the NAPI polling loop checks the list state rather than the general
NAPI state. This can lead to NAPI polling even after unregistration
has been done. If unregistered, all NAPI polling should be disabled
- Fix for eventfd recursive invocation handling
* tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
io_uring/eventfd: reset deferred signal state
io_uring/napi: clear tracked NAPI entries on unregister
bpf: tcp: Fix type confusion in sol_tcp_sockopt().
sol_tcp_sockopt() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:
socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
Let's use sk_is_tcp().
Note that initially sol_tcp_sockopt() checked sk->sk_prot->setsockopt.
Fixes: 2ab42c7b871f ("bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260504210610.180150-7-kuniyu@google.com
mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
bpf_mptcp_sock_from_subflow() only checks if sk->sk_protocol is
IPPROTO_TCP, but RAW socket can bypass it:
socket(AF_INET, SOCK_RAW, IPPROTO_TCP)
In this case, it would NOT be valid to call sk_is_mptcp() which will
assume sk is a pointer to a struct tcp_sock, and wrongly checks for:
tcp_sk(sk)->is_mptcp.
Linus Torvalds [Fri, 8 May 2026 17:24:35 +0000 (10:24 -0700)]
Merge tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
- Fix for two ACL issues (security fix to validate dacloffset better
and chmod fix)
- Fix out of bounds reads (in check_wsl_eas and smb2_check_msg for
symlinks)
- Two Kerberos fixes including an important one when AES-256 encryption
chosen
- Fix open_cached_dir problem when directory leases disabled
* tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
smb: client: validate dacloffset before building DACL pointers
smb/client: fix out-of-bounds read in smb2_compound_op()
smb/client: fix out-of-bounds read in symlink_data()
smb: client: Zero-pad short GSS session keys per MS-SMB2
smb: client: Use FullSessionKey for AES-256 encryption key derivation
smb: client: use kzalloc to zero-initialize security descriptor buffer
cifs: abort open_cached_dir if we don't request leases
Linus Torvalds [Fri, 8 May 2026 17:14:51 +0000 (10:14 -0700)]
Merge tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There's two main series here, fixing issues that came up in the
Microchip QSPI and Freescale i.MX drivers. Both of those could result
in some quite noticable issues if they were encountered in production.
We also have one minor documentation fix in the ch341 driver"
* tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: ch341: correct company name in MODULE_DESCRIPTION
spi: microchip-core-qspi: remove some inline markings
spi: microchip-core-qspi: don't attempt to transmit during emulated read-only dual/quad operations
spi: microchip-core-qspi: control built-in cs manually
spi: imx: Propagate prepare_transfer() error from spi_imx_setupxfer()
spi: imx: Fix UAF on package-1 prepare failure in spi_imx_dma_data_prepare()
spi: imx: Fix precedence bug in spi_imx_dma_max_wml_find()
The buggy address belongs to the object at ffff88801083d280
which belongs to the cache RAW of size 1792
The buggy address is located 1248 bytes inside of
allocated 1792-byte region [ffff88801083d280, ffff88801083d980)
Fixes: 655a51e536c0 ("bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock") Reported-by: Damiano Melotti <melotti@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://patch.msgid.link/20260504210610.180150-2-kuniyu@google.com
Linus Torvalds [Fri, 8 May 2026 15:23:06 +0000 (08:23 -0700)]
Merge tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"Weekly fixes, lots of them but all pretty small, amdgpu and xe are the
usual but then a large amount of fixes all over.
xe:
- Add NULL check for media_gt in intel_hdcp_gsc_check_status
- Fix EAGAIN sign in pf_migration_consume
- Fix MMIO access using PF view instead of VF view during migration
- Exclude indirect ring state page from ADS engine state size
* tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel: (37 commits)
drm: Set old handle to NULL before prime swap in change_handle
drm/bochs: Drop manual put on probe error path
drm/xe/guc: Exclude indirect ring state page from ADS engine state size
drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration
drm/xe/pf: Fix EAGAIN sign in pf_migration_consume()
drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status()
drm/exynos: remove bridge when component_add fails
drm/amdgpu: nuke amdgpu_userq_fence_slab v2
drm/amdgpu/userq: fix access to stale wptr mapping
drm/amdkfd: Check if there are kfd porcesses using adev by kfd_processes_count
drm/amdgpu: zero-initialize GART table on allocation
drm/amdgpu/sdma4: replace BUG_ON with WARN_ON in fence emission
drm/radeon: add missing revision check for CI
drm/amdgpu/pm: align Hawaii mclk workaround with radeon
drm/amdgpu/pm: add missing revision check for CI
drm/amdgpu/gfx9: drop unnecessary 64-bit fence flag check in KIQ
drm/amdkfd: Make all TLB-flushes heavy-weight
drm/panel: himax-hx83102: restore MODE_LPM after sending disable cmds
drm/panel: boe-tv101wum-nl6: restore MODE_LPM after sending disable cmds
drm/panel: feiyang-fy07024di26a30d: return display-on error
...
Linus Torvalds [Fri, 8 May 2026 15:16:07 +0000 (08:16 -0700)]
Merge tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
"Core:
- Cache-flushing fix for non-x86 platforms
AMD-Vi:
- Security fix when SEV-SNP is enabled
- Operator precedence fix in DTE setting"
* tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
iommu/amd: Fix precedence order in set_dte_passthrough()
iommu/pages: Fix iommu_pages_flush_incoherent() for non-x86
iommu/amd: Use maximum PPR log buffer size when SNP is enabled on Family 0x19
iommu/amd: Use maximum Event log buffer size when SNP is enabled on Family 0x19
Ming Lei [Fri, 8 May 2026 12:37:46 +0000 (20:37 +0800)]
ublk: fix use-after-free in ublk_cancel_cmd()
When ublk_reset_ch_dev() clears io->cmd via ublk_queue_reinit()
concurrently with ublk_cancel_cmd(), ublk_cancel_cmd() can read a
stale pointer and pass it to io_uring_cmd_done(), causing a
use-after-free.
Fix by synchronizing the two paths with ubq->cancel_lock:
- ublk_cancel_cmd(): read and clear io->cmd under cancel_lock,
then call io_uring_cmd_done() on the saved local copy outside
the lock.
- ublk_reset_ch_dev(): hold cancel_lock across ublk_queue_reinit()
so that io->cmd and io->flags are cleared atomically with respect
to ublk_cancel_cmd().
Francis, David [Tue, 28 Apr 2026 19:25:50 +0000 (19:25 +0000)]
drm: Set old handle to NULL before prime swap in change_handle
There was a potential race condition in change_handle. The ioctl
briefly had a single object with two idr entries; a concurrent
gem_close could delete the object and remove one of the handles
while leaving the other one dangling, which could subsequently
be dereferenced for a use-after-free.
To fix this, do the same dance that gem_close itself does.
(f6cd7daecff5 drm: Release driver references to handle before making it available again)
First idr_replace the old handle to NULL. Later, if the prime
operations are successful, actually close it.
create_tail required a similar dance to avoid a similar problem.
(bd46cece51a3 drm/gem: Fix race in drm_gem_handle_create_tail())
It idr_allocs the new handle with NULL, then swaps in the correct
object later to avoid races. We don't need to do that here, since
the only operations that could race are drm_prime, and
change_handle holds the prime lock for the entire duration.
v2: cleanups of error paths
Signed-off-by: David Francis <David.Francis@amd.com> Co-authored-by: Dave Airlie <airlied@gmail.com> Reported-by: Puttimet Thammasaeng <pwn8official@gmail.com> Tested-by: Vitaly Prosyak <Vitaly.Prosyak@amd.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: stable@vger.kernel.org Cc: Christian Koenig <Christian.Koenig@amd.com> Fixes: 53096728b8910 ("drm: Add DRM prime interface to reassign GEM handle") Signed-off-by: Dave Airlie <airlied@redhat.com>
Linus Torvalds [Fri, 8 May 2026 00:26:43 +0000 (17:26 -0700)]
Merge tag 'selinux-pr-20260507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux
Pull selinux fixes from Paul Moore:
- Allow for multiple opens of /sys/fs/selinux/policy
Prevent a single process from blocking others from reading the
SELinux policy loaded in the kernel. This does have the side effect
of potentially allowing userspace to trigger additional kernel memory
allocations as part of the open/read operation, but this is mitigated
by requiring the SELinux security/read_policy permission.
- Reduce the critical sections where the SELinux policy mutex is held
This includes the patch to the policy loader code where we move the
permission checks and an allocation outside the mutex as well as the
the patch to checkreqprot which drops the code/lock entirely.
While the checkreqprot code had effectively been dropped in an
earlier release, portions of the code still remained that would have
triggered the mutex to perform an IMA measurement. This finally drops
all of that while preserving the user visible behavior.
- Eliminate potential sources of log spamming
There were a few areas where processes could flood the system logs
and hide other, more critical events. The previously disabled
checkreqprot and runtime disable knobs in selinuxfs were two such
areas that have now been greatly simplified and a pr_err() replaced
with a pr_err_once().
The third such place is the /sys/fs/selinux/user file, which hasn't
been used by a userspace release since 2020 and was scheduled for
removal after 2025; this effectively disables this functionality, but
similar to checkreqprot, it is done in a way that should not break
old userspace.
* tag 'selinux-pr-20260507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
selinux: shrink critical section in sel_write_load()
selinux: allow multiple opens of /sys/fs/selinux/policy
selinux: prune /sys/fs/selinux/user
selinux: prune /sys/fs/selinux/disable
selinux: prune /sys/fs/selinux/checkreqprot
Tabrez Ahmed [Sat, 2 May 2026 02:08:42 +0000 (07:38 +0530)]
hwmon: (ads7871) Fix endianness bug in 16-bit register reads
The ads7871_read_reg16() function relies on spi_w8r16() to read the
16-bit sensor output. The ADS7871 device transmits the Least Significant
Byte (LSB) first.
On Little-Endian architectures, spi_w8r16() correctly reconstructs the
16-bit value. However, on Big-Endian architectures, the byte swapping
causes the first received byte (LSB) to be placed in the most significant
byte of the u16, resulting in corrupted voltage readings.
To fix this, cast the integer result of spi_w8r16() to a restricted
__le16 type and convert it to the host CPU's native byte order using
le16_to_cpu(). Negative error codes returned by the SPI core are caught
and returned prior to the conversion to avoid mangling the error status.
Reported-by: Sashiko <sashiko-bot@kernel.org> Closes: https://sashiko.dev/#/patchset/20260418034601.90226-1-tabreztalks@gmail.com Fixes: e0c70b8078629 ("hwmon: add TI ads7871 a/d converter driver") Suggested-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Tabrez Ahmed <tabreztalks@gmail.com> Link: https://lore.kernel.org/r/20260502020844.110038-2-tabreztalks@gmail.com Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Sensors configurations are defined by set and clear masks. These
do not follow a consistent "clear mask is a superset of set mask"
rule. This relaxed definition breaks lm75_write_config()
Basically all bits from set_mask that are not defined in clr_mask are
dropped. Fix that by enhancing the helper to always combine clr_mask
and set_mask into the mask bits of regmap_update_bits().
hwmon: (lm75) Fix AS6200 and TMP112 setup and alarm handling
The initialization of the AS6200 has two shortcomings
- The device-add-commit states "Conversion mode: continuous" but the
the lm75_params structure uses set_mask = 0x94c0. This activates
single shot mode (bit 15). According to the datasheet "The device
features a single shot measurement mode if the device is in sleep
mode (SM=1)". This is quite contradictionary.
- It is the only device that activates polarity active-high (bit 10)
All this is paired with a undefined clear mask bug in function
lm75_write_config() that was introduced with a later refactoring
commit.
[as6200] = {
.config_reg_16bits = true,
.set_mask = 0x94C0,
-> .clr_mask not defined here
.default_resolution = 12,
...
static inline int lm75_write_config(struct lm75_data *data, u16 set_mask,
u16 clr_mask)
{
return regmap_update_bits(data->regmap, LM75_REG_CONF,
clr_mask | LM75_SHUTDOWN, set_mask);
}
regmap_update_bits() requires clr_mask to be a superset of set_mask.
So basically all sensors with "wrong" masks like the AS6200 are not
initialized as intended.
Fix that by
- Change the set_mask to 0xc010 to reflect the current active-low
setup properly and to drive the sensor in continous mode. This
takes into account that the config register is little endian and
the first byte sent to the chip is the LSB.
- Adapt the alarm handling so it can report the alarm correctly
even if it is high active. This is done by comparing config register
bit 5 and 10 (translated to 2 and 13).
This commit does not introduce any ABI breakage as the mutliple bugs
effectly drive the AS6200 in standard active-low mode.
Fixes: 4b6358e1fe46 ("hwmon: (lm75) Add AMS AS6200 temperature sensor") Suggested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de> Link: https://lore.kernel.org/r/20260502173207.3567876-2-markus.stockhausen@gmx.de
[groeck: Update set_mask for as6200 further: As modeled, the upper bits
contain the conversion rate, so the config register needs to be set to
0xc010 instead of 0x10c0 to reflect 8 samples/s and 4 consecutive faults.
Fix the same problem for TMP112.] Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Dave Airlie [Thu, 7 May 2026 22:51:01 +0000 (08:51 +1000)]
Merge tag 'drm-xe-fixes-2026-05-07' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
Driver Changes:
- Add NULL check for media_gt in intel_hdcp_gsc_check_status (Gustavo)
- Fix EAGAIN sign in pf_migration_consume (Shuicheng)
- Fix MMIO access using PF view instead of VF view during migration (Shuicheng)
- Exclude indirect ring state page from ADS engine state size (Satya)
Dave Airlie [Thu, 7 May 2026 22:34:34 +0000 (08:34 +1000)]
Merge tag 'drm-rust-fixes-2026-05-07' of https://gitlab.freedesktop.org/drm/rust/kernel into drm-fixes
DRM Rust fixes for v7.1-rc3
- Fix unsound initialization in drm::Device::new(); if pinned
initialization of drm::Device::Data fails, make sure
drm::Device::release() isn't called, so we don't run the data's
destructor
- Fix missing GEM state cleanup in the init failure case; call
drm_gem_private_object_fini() if drm_gem_object_init() fails
- Fix wrong ARef import in the DRM shmem GEM helper abstraction
- Replace the nouveau mailing list with the new nova-gpu mailing list
for both nova-core and nova-drm, and remove unused patchwork entries
smb: client: validate dacloffset before building DACL pointers
parse_sec_desc(), build_sec_desc(), and the chown path in
id_mode_to_cifs_acl() all add the server-supplied dacloffset to pntsd
before proving a DACL header fits inside the returned security
descriptor.
On 32-bit builds a malicious server can return dacloffset near
U32_MAX, wrap the derived DACL pointer below end_of_acl, and then slip
past the later pointer-based bounds checks. build_sec_desc() and
id_mode_to_cifs_acl() can then dereference DACL fields from the wrapped
pointer in the chmod/chown rewrite paths.
Validate dacloffset numerically before building any DACL pointer and
reuse the same helper at the three DACL entry points.
Fixes: bc3e9dd9d104 ("cifs: Change SIDs in ACEs while transferring file ownership.") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
Zisen Ye [Wed, 6 May 2026 03:49:08 +0000 (11:49 +0800)]
smb/client: fix out-of-bounds read in smb2_compound_op()
If a server sends a truncated response but a large OutputBufferLength, and
terminates the EA list early, check_wsl_eas() returns success without
validating that the entire OutputBufferLength fits within iov_len.
Then smb2_compound_op() does:
memcpy(idata->wsl.eas, data[0], size[0]);
Where size[0] is OutputBufferLength. If iov_len is smaller than size[0],
memcpy can read beyond the end of the rsp_iov allocation and leak adjacent
kernel heap memory.
Zisen Ye [Sat, 2 May 2026 10:48:36 +0000 (18:48 +0800)]
smb/client: fix out-of-bounds read in symlink_data()
Since smb2_check_message() returns success without length validation for
the symlink error response, in symlink_data() it is possible for
iov->iov_len to be smaller than sizeof(struct smb2_err_rsp). If the buffer
only contains the base SMB2 header (64 bytes), accessing
err->ErrorContextCount (at offset 66) or err->ByteCount later in
symlink_data() will cause an out-of-bounds read.