Merge branch 'bpf-support-stack-arguments-for-bpf-functions-and-kfuncs'

Yonghong Song says:

====================
bpf: Support stack arguments for BPF functions and kfuncs

Currently, BPF function calls and kfuncs are limited to 5 register
parameters. For BPF functions that need more than 5 parameters,
developers can mark the function always_inline or pack the extra
parameters into a struct and pass a pointer to it, although both
are inconvenient. For kfuncs there is no workaround when more
than 5 parameters are needed.

This patch set lifts the 5-argument limit by introducing stack-based
argument passing for BPF functions and kfunc's, coordinated with
compiler support in LLVM [1]. The compiler emits loads/stores through
a new BPF register r11 (BPF_REG_PARAMS) to pass arguments beyond
the 5th, keeping the stack arg area separate from the r10-based program
stack. The current maximum number of arguments is capped at
MAX_BPF_FUNC_ARGS (12), which is sufficient for the vast majority of
use cases.

All kfunc/bpf-function arguments are caller saved, including stack
arguments. For register arguments (r1-r5), the verifier already marks
them as clobbered after each call. For stack arguments, the verifier
invalidates all outgoing stack arg slots immediately after a call,
requiring the compiler to re-store them before any subsequent call.
This follows the native calling convention where all function
parameters are caller saved.

The x86_64 JIT translates r11-relative accesses to RBP-relative
native instructions. Each function's stack allocation is extended
by 'max_outgoing' bytes to hold the outgoing arg area below the
callee-saved registers. This simplifies the implementation, since
r10 can be reused for stack argument access. At both BPF-to-BPF and
kfunc calls, outgoing args are placed directly at the locations the
native calling convention expects, so the callee reads its incoming
parameters straight from the caller.

Global subprogs and freplace progs with >5 args are not yet supported.
Only x86_64 and arm64 are supported for now. The same selftests run
on both x86_64 and arm64. Please see each individual patch for details.

  [1] https://github.com/llvm/llvm-project/pull/189060

Changelogs:
  v3 -> v4:
    - v3: https://lore.kernel.org/bpf/20260511053301.1878610-1-yonghong.song@linux.dev/
    - Added no_stack_arg_load comparison in func_states_equal() to ensure
      correctness of pruning.
    - Shrink bpf_jmp_history_entry.flags to 4 bits to match the number of flags.
    - Instead of passing bpf_subprog_info to JIT, use prog->aux->func_idx to
      find corresponding bpf_subprog_info from 'env'.
    - For patch 'bpf: Reject stack arguments if tail call reachable', use stack_arg_cnt
      instead of just incoming stack arg cnt.
    - Tighten invalidate_outgoing_stack_args() for kfunc/helper/bpf-to-bpf calls.
    - Disable private stack in verifier for x86_64 instead of in JIT.
  v2 -> v3:
    - v2: https://lore.kernel.org/bpf/20260507212942.1122000-1-yonghong.song@linux.dev/
    - In do_check_common() and for main prog, if btf does not match the actual
      parameters, verification continues and ignores arg_cnt. Set
      arg_cnt=1 explicitly to prevent any incoming stack arguments.
    - Remove the loop which clears the current frame's stack slot and sets the
      upper-level frame's stack slot. It is not needed unless there is a bug.
      Add a verifier_bug if the bug happens.
    - For liveness, avoid r11 based load/stores mixing with r10 based stack tracking.
      Also, print out stack arguments properly.
    - Pass bpf_subprog_info to the JIT to avoid copying bpf_subprog_info fields to
      bpf_prog_aux.
    - Fix the missed allocation free for test infra BTF fixup.
    - Remove the selftest result for the precision backtracking test since the result
      can change (two possible outputs).
  v1 -> v2:
    - v1: https://lore.kernel.org/bpf/20260424171433.2034470-1-yonghong.song@linux.dev/
    - Several refactorings (convert bpf_get_spilled_reg macro to static inline func,
      remove copy_register_state(), refactor jmp history, refactor record_call_access(), etc.),
      suggested by Eduard.
    - Use incoming_stack_arg_cnt/stack_arg_cnt instead of incoming_stack_arg_depth/stack_arg_depth,
      suggested by Eduard.
    - Fix a stack arg pruning bug, from Eduard.
    - Fix a bug in precision marking and backtracking: the callee needs to get the
      stack arg value from its callers. Fixed with help from Eduard.
    - Set sub->arg_cnt earlier in btf_prepare_func_args(), this will avoid having
      incoming_stack_arg_cnt in bpf_subprog_info.
    - Do stack-arg liveness analysis together with r10 based liveness analysis,
      suggested by Eduard.
    - Fix a few tests to ensure that r11-based loads cannot be ahead of r11-based stores,
      and r11-based loads cannot be after kfunc/helper/bpf-function.
====================

Link: https://patch.msgid.link/20260513044949.2382019-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Enable stack argument tests for arm64

Now that arm64 supports stack arguments, enable the existing stack_arg,
stack_arg_kfunc and verifier_stack_arg tests for __TARGET_ARCH_arm64.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045204.2403441-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, arm64: Add JIT support for stack arguments

Implement stack argument passing for BPF-to-BPF and kfunc calls with
more than 5 parameters on arm64, following the AAPCS64 calling
convention.

BPF R1-R5 already map to x0-x4. With BPF_REG_0 moved to x8 by the
previous commit, x5-x7 are free for arguments 6-8. Arguments 9-12
spill onto the stack at [SP+0], [SP+8], ... and the callee reads
them from [FP+16], [FP+24], ... (above the saved FP/LR pair).

BPF convention uses fixed offsets from BPF_REG_PARAMS (r11): off=-8 is
always arg 6, off=-16 arg 7, etc. The verifier invalidates all outgoing
stack arg slots after each call, so the compiler must re-store before
every call. This means x5-x7 don't need to be saved on stack.
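
The mapping described above can be sketched as a small lookup. This is an
illustration only, not the JIT code: the sketch_a64_* names are hypothetical,
and the displacements simply restate the [SP+0], [SP+8], ... and
[FP+16], [FP+24], ... locations given in this message.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; not part of the arm64 JIT. */
enum sketch_a64_kind { SKETCH_A64_NONE, SKETCH_A64_REG, SKETCH_A64_MEM };

struct sketch_a64_loc {
	enum sketch_a64_kind kind;
	int reg;	/* x-register number when kind is SKETCH_A64_REG */
	int disp;	/* byte offset from SP (caller) or FP (callee) for MEM */
};

/* Where argument arg_no (6..12) lives, seen from the caller or the callee. */
static struct sketch_a64_loc sketch_a64_arg(int arg_no, bool callee_side)
{
	struct sketch_a64_loc loc = { SKETCH_A64_NONE, -1, 0 };

	if (arg_no >= 6 && arg_no <= 8) {
		/* args 6-8 travel in x5-x7 */
		loc.kind = SKETCH_A64_REG;
		loc.reg = arg_no - 1;
	} else if (arg_no >= 9 && arg_no <= 12) {
		/* args 9-12 spill to the stack; the callee sees them
		 * 16 bytes higher, above the saved FP/LR pair */
		loc.kind = SKETCH_A64_MEM;
		loc.disp = (arg_no - 9) * 8 + (callee_side ? 16 : 0);
	}
	return loc;
}
```

Arguments 1-5 are outside the scope of this sketch and report
SKETCH_A64_NONE.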

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045158.2402494-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, arm64: Map BPF_REG_0 to x8 instead of x7

Move the BPF return value register from x7 to x8, freeing x7 for use
as an argument register. AAPCS64 designates x8 as the indirect result
location register; it is caller-saved and not used for argument
passing, making it a suitable home for BPF_REG_0.

This is a prerequisite for stack argument support, which needs x5-x7
to pass arguments 6-8 to native kfuncs following the AAPCS64 calling
convention.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045153.2402197-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add precision backtracking test for stack arguments

Add a test that verifies precision backtracking works correctly
across BPF-to-BPF calls when stack arguments are involved.

The test passes a size value as incoming stack arg (arg6) to a
subprog, which forwards it as the mem__sz parameter (outgoing arg7)
to bpf_kfunc_call_stack_arg_mem. The expected __msg annotations
verify that precision propagates from the kfunc's mem__sz argument
back through the subprog frame to the caller's outgoing stack arg
store.

A companion BTF file (btf__stack_arg_precision.c) provides named
parameter BTF for the __naked subprog via __btf_func_path.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045148.2400087-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add verifier tests for stack argument validation

Add inline-asm based verifier tests that exercise stack argument
validation logic directly.

Positive tests:
  - Subprog call with 6 args
  - Two sequential calls to different subprogs (6-arg and 7-arg)
  - An r11 store shared by both branches

Negative tests — verifier rejection:
  - Read from uninitialized incoming stack arg slot
  - Gap in outgoing slots: only r11-16 written, r11-8 missing
  - Write at r11-80, exceeding max 7 stack args
  - Missing store on one branch with a shared store
  - First call has proper stack arguments and the second
    call tries to inherit them without re-storing
  - r11 load ordering issue

Negative tests — pointer/ref tracking:
  - Pruning type mismatch: one branch stores PTR_TO_STACK, the
    other stores a scalar, callee dereferences — must not prune
  - Release invalidation: bpf_sk_release invalidates a socket
    pointer stored in a stack arg slot
  - Packet pointer invalidation: bpf_skb_pull_data invalidates
    a packet pointer stored in a stack arg slot
  - Null propagation: PTR_TO_MAP_VALUE_OR_NULL stored in stack
    arg slot, null branch attempts dereference via callee

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045143.2399278-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add BTF fixup for __naked subprog parameter names

When __naked subprogs are used in verifier tests, clang drops
parameter names from their BTF FUNC_PROTO entries. This prevents
the verifier from resolving stack argument slots by name.

Add a __btf_func_path(path) annotation that points to a separate
BTF file containing properly-named FUNC entries. The test_loader
matches FUNC entries by name, detects anonymous parameters, and
replaces the FUNC_PROTO with a new one that carries parameter
names from the custom file while preserving the original type IDs.

The custom BTF file also serves as btf_custom_path for kfunc
resolution when no separate btf_custom_path is specified.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045138.2398886-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tests for stack argument validation

Add negative tests for kfunc argument checking (a kfunc call with
a >8 byte struct as stack argument is rejected) and for the verifier
(invalid uses of r11 for stack arguments are rejected).

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045132.2398371-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tests for BPF function stack arguments

Add selftests covering stack argument passing for both BPF-to-BPF
subprog calls and kfunc calls with more than 5 arguments. All tests
are guarded by __BPF_FEATURE_STACK_ARGUMENT and __TARGET_ARCH_x86.

BPF-to-BPF subprog call tests (stack_arg.c):
  - Scalar stack args
  - Pointer stack args
  - Mixed pointer/scalar stack args
  - Nested calls
  - Dynptr stack arg
  - Two callees with different stack arg counts
  - Async callback

Kfunc call tests (stack_arg_kfunc.c, with bpf_testmod kfuncs):
  - Scalar stack args
  - Pointer stack args
  - Mixed pointer/scalar stack args
  - Dynptr stack arg
  - Memory buffer + size pair
  - Iterator
  - Const string pointer
  - Timer pointer

Acked-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045127.2397187-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf,x86: Implement JIT support for stack arguments

Add x86_64 JIT support for BPF functions and kfuncs with more than
5 arguments. The extra arguments are passed through a stack area
addressed by register r11 (BPF_REG_PARAMS) in BPF bytecode,
which the JIT translates to native code.

The JIT follows the x86-64 calling convention for both BPF-to-BPF
and kfunc calls:
  - Arg 6 is passed in the R9 register
  - Args 7+ are passed on the stack

Incoming arg 6 (BPF r11+8) is translated to a MOV from R9 rather
than a memory load. Incoming args 7+ (BPF r11+16, r11+24, ...) map
directly to [rbp + 16], [rbp + 24], ..., matching the x86-64 stack
layout after CALL + PUSH RBP, so no offset adjustment is needed.

tail_call_reachable is rejected by the verifier and priv_stack is
disabled for x86_64 when stack args exist, so R9 is always
available. When BPF bytecode writes to the arg-6 stack slot
(offset -8), the JIT emits a MOV into R9 instead of a memory store.
Outgoing args 7+ are placed at [rsp] in a pre-allocated area below
callee-saved registers, using:
  native_off = outgoing_arg_base - outgoing_rsp - bpf_off - 16

The native x86_64 stack layout with stack arguments:

  high address
  +-------------------------+
  | incoming stack arg N    |  [rbp + 16 + (N-7)*8]  (from caller)
  | ...                     |
  | incoming stack arg 7    |  [rbp + 16]
  +-------------------------+
  | return address          |  [rbp + 8]
  | saved rbp               |  [rbp]
  +-------------------------+
  | BPF program stack       |  (round_up(stack_depth, 8) bytes)
  +-------------------------+
  | callee-saved regs       |  (r12, rbx, r13, r14, r15 as needed)
  +-------------------------+
  | outgoing arg M          |  [rsp + (M-7)*8]
  | ...                     |
  | outgoing arg 7          |  [rsp]
  +-------------------------+  rsp
  low address
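
As a rough illustration of the layout above, the following sketch maps an
argument number to its native x86_64 location. It is not the JIT code: the
sketch_x86_* names are hypothetical, and the displacements are taken directly
from the diagram ([rbp + 16 + (N-7)*8] incoming, [rsp + (M-7)*8] outgoing).

```c
#include <stdbool.h>

/* Hypothetical types for illustration; not part of the x86_64 JIT. */
enum sketch_x86_kind { SKETCH_X86_NONE, SKETCH_X86_R9, SKETCH_X86_MEM };

struct sketch_x86_loc {
	enum sketch_x86_kind kind;
	int disp;	/* from rbp when incoming, from rsp when outgoing */
};

/* Native location of stack argument arg_no (6..12). */
static struct sketch_x86_loc sketch_x86_arg(int arg_no, bool incoming)
{
	struct sketch_x86_loc loc = { SKETCH_X86_NONE, 0 };

	if (arg_no == 6) {
		/* arg 6 is passed in R9, never in memory */
		loc.kind = SKETCH_X86_R9;
	} else if (arg_no >= 7 && arg_no <= 12) {
		loc.kind = SKETCH_X86_MEM;
		/* incoming args sit above the return address and saved rbp */
		loc.disp = (arg_no - 7) * 8 + (incoming ? 16 : 0);
	}
	return loc;
}
```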

Acked-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045122.2393118-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Disable private stack for x86_64 if stack arguments used

Other architectures like arm64, riscv, etc. have enough registers,
so for them private stack can be used together with
stack arguments.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045114.2392291-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Reject stack arguments if tail call reachable

Tail calls are deprecated and will be replaced by indirect calls
in the future. Reject programs that combine tail calls with stack
arguments rather than adding complexity for a deprecated feature.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045109.2392108-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Support stack arguments for kfunc calls

Extend the stack argument mechanism to kfunc calls, allowing kfuncs
with more than 5 parameters to receive additional arguments via the
r11-based stack arg area.

For kfuncs, the caller is a BPF program and the callee is a kernel
function. The BPF program writes outgoing args at negative r11
offsets, following the same convention as BPF-to-BPF calls:

  Outgoing: r11 - 8 (arg6), ..., r11 - N*8 (last arg)

The following is an example:

  int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
    ...
    kfunc1(a1, a2, a3, a4, a5, a6, a7, a8);
    ...
    kfunc2(a1, a2, a3, a4, a5, a6, a7, a8, a9);
    ...
  }

   Caller (foo), generated by llvm
   ===============================
   Incoming (positive offsets):
     r11+8:  [incoming arg 6]
     r11+16: [incoming arg 7]

   Outgoing for kfunc1 (negative offsets):
     r11-8:  [outgoing arg 6]
     r11-16: [outgoing arg 7]
     r11-24: [outgoing arg 8]

   Outgoing for kfunc2 (negative offsets):
     r11-8:  [outgoing arg 6]
     r11-16: [outgoing arg 7]
     r11-24: [outgoing arg 8]
     r11-32: [outgoing arg 9]

The JIT later marshals outgoing arguments to the native calling
convention for kfunc1() and kfunc2().

For kfunc calls where stack args are used as constant or size
parameters, a mark_stack_arg_precision() helper is used to propagate
precision and do proper backtracking.

There are two places where meta->release_regno must keep regno for
later release of the reference, and 'cur_aux(env)->arg_prog = regno'
likewise keeps regno for later fixup. Since stack arguments don't have
a valid register number (regno is negative), these three cases are
rejected for now if the argument is on the stack.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045104.2391543-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Enable r11 based insns

BPF_REG_PARAMS (r11) is used for stack argument accesses and
the following are the only insns that may reference r11:
    - load incoming stack arg
    - store register to outgoing stack arg
    - store immediate to outgoing stack arg

The detailed insn format can be found in the is_stack_arg_ldx/st/stx()
helpers. After this patch, stack arg ldx/st/stx insns are accepted
by the kernel and are properly checked by the verifier.

The LLVM compiler [1] implemented the above BPF_REG_PARAMS insns.

  [1] https://github.com/llvm/llvm-project/pull/189060

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045059.2391192-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Prepare architecture JIT support for stack arguments

Add bpf_jit_supports_stack_args() as a weak function defaulting to
false. Architectures that implement JIT support for stack arguments
override it to return true.

Reject BPF functions with more than 5 parameters at verification
time if the architecture does not support stack arguments.
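
A minimal user-space sketch of the weak-default pattern follows. The
sketch_* names and the -1 return value are hypothetical; the kernel uses its
own __weak annotation and error codes, which are not shown in this message.

```c
#include <stdbool.h>

/* Generic default. An architecture with JIT support would provide a
 * non-weak definition that returns true and wins at link time. */
__attribute__((weak)) bool sketch_jit_supports_stack_args(void)
{
	return false;
}

/* Hypothetical verification-time check: more than 5 parameters is only
 * acceptable when the architecture opts in. Returns 0 on success. */
static int sketch_check_arg_cnt(int nargs)
{
	if (nargs > 5 && !sketch_jit_supports_stack_args())
		return -1;	/* placeholder for the real error code */
	return 0;
}
```

With only the weak default linked in, any function with more than 5
parameters is rejected, while functions with up to 5 are unaffected.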

Acked-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045054.2390945-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Reject stack arguments in non-JITed programs

The interpreter does not understand the bpf register r11
(BPF_REG_PARAMS) used for stack arguments. So reject interpreter
usage if stack arguments are used either in the main program or
any subprogram.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045049.2390444-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Extend liveness analysis to track stack argument slots

BPF_REG_PARAMS (R11) is at index MAX_BPF_REG, which is beyond the
register tracking arrays in const_fold.c and liveness.c. Handle it
explicitly to avoid out-of-bounds accesses.

Extend the arg tracking dataflow to cover stack arg slots. Otherwise,
pointers passed through stack args are invisible to liveness, causing
the pointed-to stack slots to be incorrectly poisoned.

Extend the at_out tracking array to MAX_AT_TRACK_REGS (registers
plus stack arg slots) so that outgoing stack arg stores are tracked
alongside registers. Add a separate at_stack_arg_entry array in
compute_subprog_args(), passed to arg_track_xfer(), to restore
FP-derived values on incoming stack arg reads.

Extend record_call_access() to check stack arg slots for FP-derived
pointers at kfunc call sites, reusing the record_arg_access() helper
extracted in the previous patch. Pass stack arg state from caller to
callee in analyze_subprog() so that callees can track pointers received
through stack args and thus avoid poisoning.

Skip stack arg instructions in record_load_store_access(). Stack arg
STX uses dst_reg=BPF_REG_PARAMS (index 11), but at[11] is repurposed
to track the value stored in stack arg slot 0. Without the skip, if a
prior stack arg STX stored an FP-derived pointer (e.g., fp-64) into
slot 0, a subsequent stack arg STX would read that FP-derived value as
the base pointer and spuriously mark a regular stack slot (e.g., fp-72
from -64 + -8) as accessed in the liveness bitmap.

Extend arg_track_log() to log state transitions for outgoing stack arg
slots at indices MAX_BPF_REG through MAX_AT_TRACK_REGS-1. Without this,
changes to at_out[11..17] caused by stack arg store instructions are
silently omitted from BPF_LOG_LEVEL2 output. For example, when a
caller passes fp-64 through a stack argument:

  subprog#0:
   10: (bf) r6 = r10
   11: (07) r6 += -64
   12: (7b) *(u64 *)(r11 -8) = r6
sa0: none -> fp0-64
   13: (85) call pc+5

Without the fix, the "sa0: none -> fp0-64" transition at insn 12
would not appear.

Extend print_subprog_arg_access() to include stack arg slots in the
per-instruction FP-derived state dump. For example:

  subprog#0:
   12: (7b) *(u64 *)(r11 - 8) = r6  // r6=fp0-64
   13: (85) call pc+5              // r6=fp0-64 sa0=fp0-64

Without the fix, the "sa0=fp0-64" annotation at insn 13 would not
appear, making it harder to debug liveness analysis for programs
that pass FP-derived pointers through stack arguments.

Extend has_fp_args() to also check stack arg slots for FP-derived
pointers, so that callees receiving pointers only through stack args
are still recursively analyzed.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045043.2389049-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Use arg_is_fp() in has_fp_args()

Replace "frame != ARG_NONE" with arg_is_fp() in has_fp_args().
The function's purpose is to check whether any argument is derived
from a frame pointer, which is exactly what arg_is_fp() tests
(frame >= 0 || frame == ARG_IMPRECISE). Using the dedicated
predicate is clearer and more consistent with the rest of the file.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045035.2388671-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Refactor record_call_access() to extract per-arg logic

Extract the per-argument FP-derived pointer handling from
record_call_access() into a new record_arg_access() helper.

The existing loop body — checking arg_is_fp, querying stack access
bytes, and calling record_stack_access/record_imprecise — will be
reused for stack argument slots in the next patch. Factoring it out
now avoids duplicating the logic.

No functional change.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045030.2388067-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add precision marking and backtracking for stack argument slots

Extend the precision marking and backtracking infrastructure to
support stack argument slots (r11-based accesses). Without this,
precision demands for scalar values passed through stack arguments
are silently dropped, which could allow the verifier to incorrectly
prune states with different constant values in stack arg slots.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045025.2387526-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Refactor jmp history to use dedicated spi/frame fields

Move stack slot index (spi) and frame number out of the flags field
in bpf_jmp_history_entry into dedicated bitfields. This simplifies
the encoding and makes room for new flags.

Previously, spi and frame were packed into the lower 9 bits of the
12-bit flags field (3 bits frame + 6 bits spi), with INSN_F_STACK_ACCESS
at BIT(9) and INSN_F_DST/SRC_REG_STACK at BIT(10)/BIT(11).
This leaves no room for an INSN_F_* flag for stack arguments.

To resolve this issue, bpf_jmp_history_entry field idx is narrowed to
20 bits (sufficient for insn indices up to 1M), and the freed bits hold
spi (6 bits) and frame (3 bits) as dedicated struct fields. The flags
enum is simplified accordingly:
  INSN_F_STACK_ACCESS  -> BIT(0)
  INSN_F_DST_REG_STACK -> BIT(1)
  INSN_F_SRC_REG_STACK -> BIT(2)
which allows more room for additional INSN_F_* flags.
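
The bit budget described above can be sketched as follows. This shows only
the idx/spi/frame split stated in this message; the struct name, field order
and the omission of the remaining fields (prev_idx, flags, linked regs) are
assumptions made for illustration, not the kernel definition.

```c
#include <stdbool.h>

/* New flag values as listed in this commit message. */
enum {
	SKETCH_INSN_F_STACK_ACCESS  = 1u << 0,
	SKETCH_INSN_F_DST_REG_STACK = 1u << 1,
	SKETCH_INSN_F_SRC_REG_STACK = 1u << 2,
};

/* Hypothetical layout of the first 32-bit word only. */
struct sketch_jmp_history_word {
	unsigned int idx   : 20;	/* insn index, up to 1M */
	unsigned int spi   : 6;		/* stack slot index */
	unsigned int frame : 3;		/* frame number */
};

/* Check that the values fit before storing them in the bitfields. */
static bool sketch_history_fits(unsigned int idx, unsigned int spi,
				unsigned int frame)
{
	return idx < (1u << 20) && spi < (1u << 6) && frame < (1u << 3);
}
```

The three fields use 29 of the 32 bits, so spi and frame no longer
compete with the INSN_F_* flags for space.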

bpf_push_jmp_history() now takes explicit spi and frame parameters
instead of encoding them into flags. The insn_stack_access_flags(),
insn_stack_access_spi(), and insn_stack_access_frameno() helpers are
removed.

No functional change.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045020.2385962-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Support stack arguments for bpf functions

Currently BPF functions (subprogs) are limited to 5 register arguments.
With [1], the compiler can emit code that passes additional arguments
via a dedicated stack area through bpf register BPF_REG_PARAMS (r11),
introduced in an earlier patch ([2]).

The compiler uses positive r11 offsets for incoming (callee-side) args
and negative r11 offsets for outgoing (caller-side) args, following the
x86_64/arm64 calling convention direction. There is an 8-byte gap at
offset 0 separating two regions:
  Incoming (callee reads):   r11+8 (arg6), r11+16 (arg7), ...
  Outgoing (caller writes):  r11-8 (arg6), r11-16 (arg7), ...
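
The offset rule can be written down as a one-line computation. The helper
below is a hypothetical illustration, not kernel or compiler code; it assumes
the 12-argument cap (MAX_BPF_FUNC_ARGS) mentioned in the cover letter.

```c
#include <stdbool.h>

enum { SKETCH_MAX_REG_ARGS = 5, SKETCH_MAX_FUNC_ARGS = 12 };

/* r11-relative byte offset for argument arg_no (arg 6 is the first
 * stack argument). Returns 0 for arguments that travel in r1-r5 or
 * that exceed the cap; offset 0 itself is never a valid slot. */
static int sketch_stack_arg_off(int arg_no, bool outgoing)
{
	int slot;

	if (arg_no <= SKETCH_MAX_REG_ARGS || arg_no > SKETCH_MAX_FUNC_ARGS)
		return 0;
	slot = arg_no - SKETCH_MAX_REG_ARGS;	/* 1 for arg6, 2 for arg7, ... */
	return outgoing ? -8 * slot : 8 * slot;
}
```

For example, arg 7 is written by the caller at r11-16 and read by the
callee at r11+16.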

The following is an example to show how stack arguments are saved
and transferred between caller and callee:

  int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
    ...
    bar(a1, a2, a3, a4, a5, a6, a7, a8);
    ...
  }

  Caller (foo)                           Callee (bar)
  ============                           ============
  Incoming (positive offsets):           Incoming (positive offsets):

  r11+8:  [incoming arg 6]               r11+8:  [incoming arg 6] <-+
  r11+16: [incoming arg 7]               r11+16: [incoming arg 7] <-|+
                                         r11+24: [incoming arg 8] <-||+
  Outgoing (negative offsets):                                      |||
  r11-8:  [outgoing arg 6 to bar] -------->-------------------------+||
  r11-16: [outgoing arg 7 to bar] -------->--------------------------+|
  r11-24: [outgoing arg 8 to bar] -------->---------------------------+

If the bpf function has more than one call:

  int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) {
    ...
    bar1(a1, a2, a3, a4, a5, a6, a7, a8);
    ...
    bar2(a1, a2, a3, a4, a5, a6, a7, a8, a9);
    ...
  }

  Caller (foo)                             Callee (bar2)
  ============                             ==============
  Incoming (positive offsets):             Incoming (positive offsets):

  r11+8:  [incoming arg 6]                 r11+8:  [incoming arg 6] <+
  r11+16: [incoming arg 7]                 r11+16: [incoming arg 7] <|+
                                           r11+24: [incoming arg 8] <||+
  Outgoing for bar2 (negative offsets):    r11+32: [incoming arg 9] <|||+
  r11-8:  [outgoing arg 6] ---->----------->-------------------------+|||
  r11-16: [outgoing arg 7] ---->----------->--------------------------+||
  r11-24: [outgoing arg 8] ---->----------->---------------------------+|
  r11-32: [outgoing arg 9] ---->----------->----------------------------+

The verifier tracks outgoing stack arguments in stack_arg_regs[] and
out_stack_arg_cnt in bpf_func_state, separately from the regular
r10 stack. The callee does not copy incoming args — it reads them
directly from the caller's outgoing slots at positive r11 offsets.
Similar to stacksafe(), introduce stack_arg_safe() to do pruning
check.

Outgoing stack arg slots are invalidated when the callee returns
(e.g. in prepare_func_exit), not at call time. This allows the callee to
read incoming args from the caller's outgoing slots during
verification. The following are a few examples.

Example 1:
  *(u64 *)(r11 - 8) = r6;
  *(u64 *)(r11 - 16) = r7;
  call bar1;                // arg6 = r6, arg7 = r7
  call bar2;                // expects 2 stack arguments; rejected, since
                            // the slots were invalidated after bar1

Example 2:
To fix Example 1:
  *(u64 *)(r11 - 8) = r6;
  *(u64 *)(r11 - 16) = r7;
  call bar1;                // arg6 = r6, arg7 = r7
  *(u64 *)(r11 - 8) = r8;
  *(u64 *)(r11 - 16) = r9;
  call bar2;                // arg6 = r8, arg7 = r9

Example 3:
The compiler can hoist the shared stack arg stores above the branch:
  *(u64 *)(r11 - 16) = r7;
  if cond goto else;
    *(u64 *)(r11 - 8) = r8;
    call bar1;               // arg6 = r8, arg7 = r7
    goto end;
  else:
    *(u64 *)(r11 - 8) = r9;
    call bar2;               // arg6 = r9, arg7 = r7
  end:

Example 4:
Within a loop:
  loop:
    *(u64 *)(r11 - 8) = r6;  // arg6, stored again on each iteration
    call bar;                // consumes arg6; slot is invalid afterwards
    if ... goto loop;
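
The invalidation rule behind Examples 1, 2 and 4 can be modeled with a toy
state machine. This is a simplified sketch, not the verifier: the sketch_*
names are hypothetical, branches are not modeled, and the limit of 7 slots
assumes the 12-argument cap with 5 arguments in registers.

```c
#include <stdbool.h>
#include <string.h>

enum { SKETCH_MAX_STACK_ARGS = 7 };

struct sketch_out_args {
	/* written[0] models r11-8, written[1] models r11-16, ... */
	bool written[SKETCH_MAX_STACK_ARGS];
};

/* Model "*(u64 *)(r11 - off) = reg"; off is the positive magnitude. */
static bool sketch_store(struct sketch_out_args *s, int off)
{
	if (off <= 0 || off % 8 != 0 || off / 8 > SKETCH_MAX_STACK_ARGS)
		return false;
	s->written[off / 8 - 1] = true;
	return true;
}

/* Model a call to a callee expecting nr_stack_args stack arguments.
 * Every expected slot must have been written since the last call, and
 * all slots become invalid once the call is done. */
static bool sketch_call(struct sketch_out_args *s, int nr_stack_args)
{
	bool ok = nr_stack_args >= 0 && nr_stack_args <= SKETCH_MAX_STACK_ARGS;
	int i;

	for (i = 0; ok && i < nr_stack_args; i++)
		ok = s->written[i];
	memset(s->written, 0, sizeof(s->written));
	return ok;
}
```

In this model, Example 1 fails on the second call because nothing was
stored after the first one, while Example 2 passes because both slots are
written again before bar2.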

A separate max_out_stack_arg_cnt field in bpf_subprog_info tracks
the deepest outgoing slot actually written. It is intended to
reject programs that write to slots beyond what any callee expects.
The JIT also relies on it.

Similar to typical compiler-generated code, enforce the following
orderings:
  - all stack arg reads must come before any stack arg write
  - all stack arg reads must come before any call to a bpf function,
    kfunc or helper
This is needed because the JIT may emit 'mov' insns that use the same
register for reads and writes, and because a bpf function, kfunc or
helper call invalidates all arguments immediately after the call.

For callback functions with stack arguments, the kernel needs to set
up the parameter types (including stack parameters) properly so that
the callback function can retrieve them for verification.

Global subprogs and freplace with >5 args are not yet supported.

  [1] https://github.com/llvm/llvm-project/pull/189060
  [2] https://lore.kernel.org/bpf/20260423033506.2542005-1-yonghong.song@linux.dev/

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045015.2385013-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Set sub->arg_cnt earlier in btf_prepare_func_args()

Move the "sub->arg_cnt = nargs" assignment to immediately after
nargs is computed from btf_type_vlen(), instead of at the end of
btf_prepare_func_args().

btf_prepare_func_args() can return -EINVAL early in several cases,
e.g. when a static function has some non-int/enum arguments.
Since -EINVAL from btf_prepare_func_args() does not immediately
reject verification, arg_cnt remains zero after the early return.
This causes later stack argument based load/store insns to
incorrectly assume the function has no arguments.

Setting arg_cnt right after nargs ensures it is available regardless
of which path btf_prepare_func_args() takes.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045010.2384635-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add helper functions for r11-based stack argument insns

Add three static inline helper functions — is_stack_arg_ldx(),
is_stack_arg_st(), and is_stack_arg_stx() — that identify r11-based
(BPF_REG_PARAMS) instructions used for stack argument passing. These
helpers encapsulate the detailed encoding requirements (operand size,
register, offset alignment and sign) and hide raw BPF_REG_PARAMS usage
from the verifier, making call sites more readable and explicit.

A later patch ("bpf: Enable r11 based insns") will wire these helpers
into the verifier. Until then, check_and_resolve_insns() rejects any
insn that references r11.
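
A rough sketch of what such predicates might check is shown below. The
exact rules live in the real helpers and are not spelled out in this message,
so the conditions here are assumptions: 64-bit accesses only, r11 as the base
register, 8-byte-aligned offsets, positive for loads and negative for stores.
The struct and function names are hypothetical; the opcode values are the
standard BPF encodings.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BPF_LDX		0x01
#define SKETCH_BPF_STX		0x03
#define SKETCH_BPF_DW		0x18
#define SKETCH_BPF_MEM		0x60
#define SKETCH_REG_PARAMS	11

/* Simplified stand-in for struct bpf_insn. */
struct sketch_insn {
	uint8_t code;
	uint8_t dst_reg;
	uint8_t src_reg;
	int16_t off;
	int32_t imm;
};

/* Assumed shape of an incoming stack arg load: rX = *(u64 *)(r11 + off) */
static bool sketch_is_stack_arg_ldx(const struct sketch_insn *insn)
{
	return insn->code == (SKETCH_BPF_LDX | SKETCH_BPF_MEM | SKETCH_BPF_DW) &&
	       insn->src_reg == SKETCH_REG_PARAMS &&
	       insn->off > 0 && insn->off % 8 == 0;
}

/* Assumed shape of an outgoing stack arg store: *(u64 *)(r11 - off) = rX */
static bool sketch_is_stack_arg_stx(const struct sketch_insn *insn)
{
	return insn->code == (SKETCH_BPF_STX | SKETCH_BPF_MEM | SKETCH_BPF_DW) &&
	       insn->dst_reg == SKETCH_REG_PARAMS &&
	       insn->off < 0 && insn->off % 8 == 0;
}
```

Keeping these checks in one place means callers never have to compare
against the raw register number themselves.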

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045005.2383881-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Remove copy_register_state wrapper function

Remove the copy_register_state() helper which was just a plain struct
assignment wrapper and replace all call sites with direct struct
assignment. This simplifies the code in preparation for upcoming stack
argument support.

No functional change.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513045000.2382933-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Convert bpf_get_spilled_reg macro to static inline function

Convert the bpf_get_spilled_reg() macro to a static inline function
for better type safety and readability. This also simplifies the macro
definition in preparation for upcoming stack argument support which
will introduce additional macros.

No functional change.

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260513044954.2382693-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf-extend-bpf-syscall-with-common-attributes-support'

Leon Hwang says:

====================
bpf: Extend BPF syscall with common attributes support

This patch series builds upon the discussion in
"[PATCH bpf-next v4 0/4] bpf: Improve error reporting for freplace attachment failure" [1].

This patch series introduces support for *common attributes* in the BPF
syscall, providing a unified mechanism for passing shared metadata across
all BPF commands, initially used by BPF_PROG_LOAD, BPF_BTF_LOAD, and
BPF_MAP_CREATE.

The initial set of common attributes includes:

1. 'log_buf': User-provided buffer for storing log output.
2. 'log_size': Size of the provided log buffer.
3. 'log_level': Verbosity level for logging.
4. 'log_true_size': Actual log size reported by kernel.

With this extension, the BPF syscall will be able to return meaningful
error messages (e.g., map creation failures), improving debuggability
and user experience.

Links:
[1] https://lore.kernel.org/bpf/20250224153352.64689-1-leon.hwang@linux.dev/

Changes:
v13 -> v14:
* Replace __u64 with __aligned_u64 in struct bpf_common_attr in patch #1
  (per bot+bpf-ci).
* Add a single line comment for preserving original error in patch #6
  (per bot+bpf-ci and Alexei).
* Drop unused label in patch #6 (per bot+bpf-ci).
* v13: https://lore.kernel.org/bpf/20260511152817.89191-1-leon.hwang@linux.dev/

v12 -> v13:
* Rebase on bpf-next tree to resolve code conflict.
* Report log_true_size on success path in patch #6.
* Check size instead of common->log_buf in patch #6 (per bot+bpf-ci).
* v12: https://lore.kernel.org/bpf/20260420141804.27179-1-leon.hwang@linux.dev/

v11 -> v12:
* Drop "log_" prefix in struct bpf_log_attr in patch #3.
* Drop "log_" prefix in struct bpf_log_opts in patch #7.
* Copy log_true_size using copy_to_bpfptr_offset() in patch #3 (per Alexei).
* v11: https://lore.kernel.org/bpf/20260216150445.68278-1-leon.hwang@linux.dev/

v10 -> v11:
* Collect Acked-by from Andrii, thanks.
* Validate whether log_buf, log_size, and log_level are valid by reusing
  bpf_verifier_log_attr_valid() in patch #4 (per Andrii).
* v10: https://lore.kernel.org/bpf/20260211151115.78013-1-leon.hwang@linux.dev/

v9 -> v10:
* Collect Acked-by from Andrii, thanks.
* Address comments from Andrii:
  * Drop log NULL check in bpf_log_attr_finalize().
  * Return -EFAULT early in bpf_log_attr_finalize().
  * Validate whether log_buf, log_size, and log_level are set.
  * Keep log_buf, log_size, log_level, and user-pointer log_true_size in struct
    bpf_log_attr.
  * Make prog_load and btf_load work with the new struct bpf_log_attr.
  * Add comment to log_true_size of struct bpf_log_opts in libbpf.
* Address comment from Alexei:
  * Avoid using BPF_LOG_FIXED as log_level in tests.
* v9: https://lore.kernel.org/bpf/20260202144046.30651-1-leon.hwang@linux.dev/

v8 -> v9:
* Rework reporting 'log_true_size' for prog_load, btf_load, and map_create to
  simplify struct bpf_log_attr (per Alexei).
* v8: https://lore.kernel.org/bpf/20260126151409.52072-1-leon.hwang@linux.dev/

v7 -> v8:
* Return 0 when fd < 0 and errno != EFAULT in probe_sys_bpf_ext(), then simplify
  probe_bpf_syscall_common_attrs() (per Alexei and Andrii).
* v7: https://lore.kernel.org/bpf/20260123032445.125259-1-leon.hwang@linux.dev/

v6 -> v7:
* Return -errno when fd < 0 and errno != EFAULT in probe_sys_bpf_ext().
* Convert return value of probe_sys_bpf_ext() to bool in
  probe_bpf_syscall_common_attrs().
* Address comments from Andrii:
  * Drop the comment, and handle fd >= 0 case explicitly in
    probe_sys_bpf_ext().
  * Return an error when fd >= 0 in probe_sys_bpf_ext().
* v6: https://lore.kernel.org/bpf/20260120152424.40766-1-leon.hwang@linux.dev/

v5 -> v6:
* Address comments from Andrii:
  * Rename some variables.
  * Drop unnecessary 'close(fd)' in libbpf.
  * Rename FEAT_EXTENDED_SYSCALL to FEAT_BPF_SYSCALL_COMMON_ATTRS with
    updated description in libbpf.
  * Use EINVAL instead of EUSERS, as EUSERS is not used in bpf yet.
  * Rename struct bpf_syscall_common_attr_opts to bpf_log_opts in libbpf.
  * Add 'OPTS_SET(log_opts, log_true_size, 0);' in libbpf's 'bpf_map_create()'.
* v5: https://lore.kernel.org/bpf/20260112145616.44195-1-leon.hwang@linux.dev/

v4 -> v5:
* Rework reporting 'log_true_size' for prog_load, btf_load, and map_create
  (per Alexei).
* v4: https://lore.kernel.org/bpf/20260106172018.57757-1-leon.hwang@linux.dev/

RFC v3 -> v4:
* Drop RFC.
* Address comments from Andrii:
  * Add parentheses in 'sys_bpf_ext()'.
  * Avoid creating new fd in 'probe_sys_bpf_ext()'.
  * Add a new struct to wrap log fields in libbpf.
* Address comments from Alexei:
  * Do not skip writing to user space when log_true_size is zero.
  * Do not use 'bool' arguments.
  * Drop the added WARN_ON_ONCE()s.
* v3: https://lore.kernel.org/bpf/20251002154841.99348-1-leon.hwang@linux.dev/

RFC v2 -> RFC v3:
* Rename probe_sys_bpf_extended to probe_sys_bpf_ext.
* Refactor reporting 'log_true_size' for prog_load.
* Refactor reporting 'btf_log_true_size' for btf_load.
* Add warnings for internal bugs in map_create.
* Check log_true_size in test cases.
* Address comment from Alexei:
  * Change kvzalloc/kvfree to kzalloc/kfree.
* Address comments from Andrii:
  * Move BPF_COMMON_ATTRS to 'enum bpf_cmd' alongside brief comment.
  * Add bpf_check_uarg_tail_zero() for extra checks.
  * Rename sys_bpf_extended to sys_bpf_ext.
  * Rename sys_bpf_fd_extended to sys_bpf_ext_fd.
  * Probe the new feature using NULL and -EFAULT.
  * Move probe_sys_bpf_ext to libbpf_internal.h and drop LIBBPF_API.
  * Return -EUSERS when log attrs conflict between bpf_attr and
    bpf_common_attr.
  * Avoid touching bpf_vlog_init().
  * Update the reason messages in map_create.
  * Finalize the log using __cleanup().
  * Report log size to users.
  * Change type of log_buf from '__u64' to 'const char *' and cast type
    using ptr_to_u64() in bpf_map_create().
  * Do not return -EOPNOTSUPP when kernel doesn't support this feature
    in bpf_map_create().
  * Add log_level support for map creation for consistency.
* Address comment from Eduard:
  * Use common_attrs->log_level instead of BPF_LOG_FIXED.
* v2: https://lore.kernel.org/bpf/20250911163328.93490-1-leon.hwang@linux.dev/

RFC v1 -> RFC v2:
* Fix build error reported by test bot.
* Address comments from Alexei:
  * Drop new uapi for freplace.
  * Add common attributes support for prog_load and btf_load.
  * Add common attributes support for map_create.
* v1: https://lore.kernel.org/bpf/20250728142346.95681-1-leon.hwang@linux.dev/
====================

Link: https://patch.msgid.link/20260512153157.28382-1-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tests to verify map create failure log

Add tests to verify that the kernel reports the expected error messages
and correct log_true_size when map creation fails.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260512153157.28382-9-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Add syscall common attributes support for map_create

With the previous commit adding common attribute support for
BPF_MAP_CREATE, users can now retrieve detailed error messages when map
creation fails via the log_buf field.

Introduce struct bpf_log_opts with the following fields:
log_buf, log_size, log_level, and log_true_size.

Extend bpf_map_create_opts with a new field log_opts, allowing users to
capture and inspect log messages on map creation failures.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260512153157.28382-8-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add syscall common attributes support for map_create

Many BPF_MAP_CREATE validation failures currently return -EINVAL without
any explanation to userspace.

Plumb common syscall log attributes into map_create(), create a verifier
log from bpf_common_attr::log_buf/log_size/log_level, and report
map-creation failure reasons through that buffer.

This improves debuggability by allowing userspace to inspect why map
creation failed and read back log_true_size from common attributes.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260512153157.28382-7-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add syscall common attributes support for btf_load

BPF_BTF_LOAD can now take log parameters from both union bpf_attr and
struct bpf_common_attr, with the same merge rules as BPF_PROG_LOAD:

- if both sides provide a complete log tuple (buf/size/level) and they
  match, use it;
- if only one side provides log parameters, use that one;
- if both sides provide complete tuples but they differ, return -EINVAL.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260512153157.28382-6-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Add syscall common attributes support for prog_load

BPF_PROG_LOAD can now take log parameters from both union bpf_attr and
struct bpf_common_attr. The merge rules are:

- if both sides provide a complete log tuple (buf/size/level) and they
  match, use it;
- if only one side provides log parameters, use that one;
- if both sides provide complete tuples but they differ, return -EINVAL.
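
The merge rules above can be sketched in plain C as follows. This is a
minimal illustration, not the kernel code: struct log_tuple and
merge_log_tuples() are hypothetical names, and a side is treated as
"providing" log parameters when any of its three fields is non-zero,
which is an assumption. Validation of each tuple's completeness is not
modelled.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for one side's log parameters. */
struct log_tuple {
	uint64_t buf;   /* user pointer to the log buffer */
	uint32_t size;  /* size of that buffer */
	uint32_t level; /* log verbosity level */
};

/* ASSUMPTION: a side "provides" a tuple if any field is non-zero. */
static bool log_tuple_set(const struct log_tuple *t)
{
	return t->buf || t->size || t->level;
}

static bool log_tuple_equal(const struct log_tuple *a,
			    const struct log_tuple *b)
{
	return a->buf == b->buf && a->size == b->size &&
	       a->level == b->level;
}

/*
 * Pick the effective log tuple from union bpf_attr ('attr') and
 * struct bpf_common_attr ('common'). Returns 0 and fills 'out', or
 * -EINVAL when both sides are set but disagree.
 */
static int merge_log_tuples(const struct log_tuple *attr,
			    const struct log_tuple *common,
			    struct log_tuple *out)
{
	bool attr_set = log_tuple_set(attr);
	bool common_set = log_tuple_set(common);

	if (attr_set && common_set) {
		if (!log_tuple_equal(attr, common))
			return -EINVAL;
		*out = *attr;
		return 0;
	}

	/* At most one side is set; an all-zero result means no logging. */
	*out = common_set ? *common : *attr;
	return 0;
}
```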

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260512153157.28382-5-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Refactor reporting log_true_size for prog_load

The next commit will add support for reporting logs via extended common
attributes, including 'log_true_size'.

To prepare for that, refactor the 'log_true_size' reporting logic by
introducing a new struct bpf_log_attr to encapsulate log-related behavior:

* bpf_log_attr_init(): initialize log fields, which will support
  extended common attributes in the next commit.
* bpf_log_attr_finalize(): handle log finalization and write back
  'log_true_size' to userspace.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260512153157.28382-4-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Add support for extended BPF syscall

To support the extended BPF syscall introduced in the previous commit,
introduce the following internal APIs:

* 'sys_bpf_ext()'
* 'sys_bpf_ext_fd()'
  They wrap the raw 'syscall()' interface to support passing extended
  attributes.
* 'probe_sys_bpf_ext()'
  Checks whether the current kernel supports the BPF syscall common attributes.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260512153157.28382-3-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Extend BPF syscall with common attributes support

Add generic BPF syscall support for passing common attributes.

The initial set of common attributes includes:

1. 'log_buf': User-provided buffer for storing logs.
2. 'log_size': Size of the log buffer.
3. 'log_level': Log verbosity level.
4. 'log_true_size': Actual log size reported by kernel.

The common-attribute pointer and its size are passed as the 4th and 5th
syscall arguments. A new command bit, 'BPF_COMMON_ATTRS' ('1 << 16'),
indicates that common attributes are supplied.
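
As a rough sketch of how the command bit composes with a regular
command, the helpers below split a combined value back into its two
parts. The 'BPF_COMMON_ATTRS' value comes from this message and
'BPF_PROG_LOAD' is 5 in enum bpf_cmd; the helper names are hypothetical
and the actual syscall plumbing is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define BPF_COMMON_ATTRS (1u << 16) /* bit named in this commit message */
#define BPF_PROG_LOAD    5          /* value from enum bpf_cmd */

/* True when the caller signalled that common attributes are supplied. */
static bool cmd_has_common_attrs(uint32_t cmd)
{
	return (cmd & BPF_COMMON_ATTRS) != 0;
}

/* Strip the flag bit to recover the underlying BPF command. */
static uint32_t cmd_base(uint32_t cmd)
{
	return cmd & ~BPF_COMMON_ATTRS;
}
```

A caller would pass 'BPF_PROG_LOAD | BPF_COMMON_ATTRS' as the command,
with the common-attribute pointer and size as the 4th and 5th syscall
arguments.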

This commit adds syscall and uapi plumbing. Command-specific handling is
added in follow-up patches.

Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260512153157.28382-2-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Use both hrtimer enqueue helpers in vmlinux test

The vmlinux selftest triggers nanosleep and checks that both kprobe
and fentry programs observe the hrtimer enqueue path.

After the hrtimer_start_expires_user() conversion [1], nanosleep
reaches hrtimer_start_range_ns_user() instead of
hrtimer_start_range_ns(). Hard-coding either symbol makes the test
fail either on bpf tree or on linux-next [2].

Update the test to resolve the target symbol at runtime via
libbpf_find_vmlinux_btf_id(). This is a nice example of how to modify
a BPF program to work on both older and newer kernel revisions.

[1] https://lore.kernel.org/all/20260408114952.062400833@kernel.org/
[2] https://github.com/kernel-patches/bpf/actions/runs/25485909958/job/74782902203

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260509005730.250956-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'selftests-bpf-add-xdp-load-balancer-benchmark'

Puranjay Mohan says:

====================
selftests/bpf: Add XDP load-balancer benchmark

Changelog:
RFC: https://lore.kernel.org/all/20260420111726.2118636-1-puranjay@kernel.org/
Changes in v1:
- Replace bpf_get_cpu_time_counter() with bpf_ktime_get_ns()
- Replace bpf_repeat() with plain for loop and may_goto
- Refactor collect_measurements() to reuse bench_force_done()
- Remove histogram, verbose calibration output, and per-scenario status prints
- Trim run script table to p50/stddev/p99
- Set env.quiet when --machine-readable is passed
- Add || true to run script benchmark invocation for set -e safety
- Add bpf-nop benchmark as timing overhead baseline (patch 3)
- Use named struct for LRU inner map to fix build on older toolchains

This series adds an XDP load-balancer benchmark (based on Katran) to the BPF
selftest bench framework.

Motivation
----------

Existing BPF bench tests measure individual operations (map lookups,
kprobes, ring buffers) in isolation.  Production BPF programs combine
parsing, map lookups, branching, and packet rewriting in a single call
chain.  The performance characteristics of such programs depend on the
interaction of these operations -- register pressure, spills, inlining
decisions, branch layout -- which isolated micro-benchmarks do not
capture.

This benchmark implements a simplified L4 load-balancer modeled after
katran [1].  The BPF program reproduces katran's core datapath:

  L3/L4 parsing -> VIP hash lookup -> per-CPU LRU connection table
  with consistent-hash fallback -> real server selection -> per-VIP
  and per-real stats -> IPIP/IP6IP6 encapsulation

The BPF code exercises hash maps, array-of-maps (per-CPU LRU),
percpu arrays, jhash, bpf_xdp_adjust_head(), bpf_ktime_get_ns(),
and bpf_get_smp_processor_id() in a single pipeline.

This is intended as the first in a series of BPF workload benchmarks
covering other use cases (sched_ext, etc.).

Design
------

A userspace loop calling bpf_prog_test_run_opts(repeat=1) would
measure syscall overhead, not BPF program cost -- the ~4 ns early-exit
paths would be buried under kernel entry/exit.  Using repeat=N is
also unsuitable: the kernel re-runs the same packet without resetting
state between iterations, so the second iteration of an encap scenario
would process an already-encapsulated packet.

Instead, timing is measured inside the BPF program using
bpf_ktime_get_ns().  BENCH_BPF_LOOP() brackets N iterations with
timestamp reads using a plain for loop with may_goto, runs a
caller-supplied reset block between iterations to undo side effects
(e.g. strip encapsulation), and records the elapsed time per batch.
One extra untimed iteration runs afterward for output validation.

Auto-calibration picks a batch size targeting ~10 ms per invocation.
A proportionality sanity check verifies that 2N iterations take ~2x
as long as N.
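
The calibration and sanity check described above can be approximated
with the sketch below. It is only an illustration: the function names
and the percentage tolerance are hypothetical and not taken from the
series; only the ~10 ms target and the "2N takes ~2x as long as N"
idea come from the text.

```c
#include <stdbool.h>
#include <stdint.h>

#define TARGET_BATCH_NS 10000000ULL /* ~10 ms per invocation */

/*
 * Scale a trial run (trial_iters iterations taking trial_ns) to a batch
 * size whose expected run time is about TARGET_BATCH_NS.
 */
static uint64_t calibrate_batch_iters(uint64_t trial_iters, uint64_t trial_ns)
{
	uint64_t iters;

	if (trial_iters == 0 || trial_ns == 0)
		return 1;

	iters = TARGET_BATCH_NS * trial_iters / trial_ns;
	return iters ? iters : 1;
}

/*
 * Proportionality check: running 2N iterations should take roughly
 * twice as long as N. tolerance_pct is a hypothetical knob.
 */
static bool timing_is_proportional(uint64_t ns_n, uint64_t ns_2n,
				   unsigned int tolerance_pct)
{
	uint64_t expected = 2 * ns_n;
	uint64_t diff = ns_2n > expected ? ns_2n - expected
					 : expected - ns_2n;

	return diff * 100 <= expected * tolerance_pct;
}
```

For example, a trial of 1000 iterations taking 74000 ns (74 ns/op)
scales to a batch of 135135 iterations.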

24 scenarios cover the code-path matrix:

  - Protocol: TCP, UDP
  - Address family: IPv4, IPv6, cross-AF (IPv4-in-IPv6)
  - LRU state: hit, miss (16M flow space), diverse (4K flows), cold
  - Consistent-hash: direct (LRU bypass)
  - TCP flags: SYN (skip LRU, force CH), RST (skip LRU insert)
  - Early exits: unknown VIP, non-IP, ICMP, fragments, IP options

Each scenario validates correctness before benchmarking by comparing
the output packet byte-for-byte against a pre-built expected packet
and checking BPF map counters.

Sample single-scenario output:

  $ sudo ./bench xdp-lb --scenario tcp-v4-lru-hit
  Setting up benchmark 'xdp-lb'...
  Benchmark 'xdp-lb' started.
  tcp-v4-lru-hit: median 74.51 ns/op, stddev 0.11, p99 74.81 (202 samples)

Sample run script output:

  $ ./benchs/run_bench_xdp_lb.sh

  XDP load-balancer benchmark
  ===========================
  +----------------------------------+----------+---------+----------+
  | Single-flow baseline             |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | tcp-v4-lru-hit                   |    74.30 |    0.08 |    74.48 |
  | tcp-v4-ch                        |   101.73 |    0.11 |   102.01 |
  | tcp-v6-lru-hit                   |    76.77 |    0.14 |    77.04 |
  | tcp-v6-ch                        |   121.40 |    0.10 |   121.65 |
  | udp-v4-lru-hit                   |   107.42 |    0.22 |   107.90 |
  | udp-v6-lru-hit                   |   110.21 |    0.12 |   110.45 |
  | tcp-v4v6-lru-hit                 |    74.82 |    0.35 |    75.43 |
  +----------------------------------+----------+---------+----------+
  | Diverse flows (4K src addrs)     |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | tcp-v4-lru-diverse               |    86.63 |    0.37 |    89.04 |
  | tcp-v4-ch-diverse                |   104.09 |    0.19 |   105.67 |
  | tcp-v6-lru-diverse               |    89.34 |    0.42 |    90.70 |
  | tcp-v6-ch-diverse                |   122.20 |    0.21 |   123.78 |
  | udp-v4-lru-diverse               |   119.37 |    0.58 |   123.10 |
  +----------------------------------+----------+---------+----------+
  | TCP flags                        |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | tcp-v4-syn                       |   165.52 |   15.68 |   198.34 |
  | tcp-v4-rst-miss                  |   161.34 |    2.69 |   172.64 |
  +----------------------------------+----------+---------+----------+
  | LRU stress                       |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | tcp-v4-lru-miss                  |   440.39 |   35.75 |   550.62 |
  | udp-v4-lru-miss                  |   571.88 |   57.38 |   680.61 |
  | tcp-v4-lru-warmup                |   317.75 |    9.55 |   356.20 |
  +----------------------------------+----------+---------+----------+
  | Early exits                      |      p50 |  stddev |      p99 |
  +----------------------------------+----------+---------+----------+
  | pass-v4-no-vip                   |    18.26 |    0.13 |    18.66 |
  | pass-v6-no-vip                   |    19.08 |    0.01 |    19.10 |
  | pass-v4-icmp                     |     6.81 |    0.02 |     6.86 |
  | pass-non-ip                      |     5.71 |    0.03 |     5.76 |
  | drop-v4-frag                     |     6.09 |    0.01 |     6.10 |
  | drop-v4-options                  |     5.88 |    0.00 |     5.89 |
  | drop-v6-frag                     |     6.00 |    0.03 |     6.04 |
  +----------------------------------+----------+---------+----------+

Patches
-------

Patch 1 adds bench_force_done() to the bench framework so benchmarks
can signal early completion when enough samples have been collected.

Patch 2 adds the shared BPF batch-timing library (BPF-side timing
arrays, BENCH_BPF_LOOP macro, userspace statistics and calibration).

Patch 3 adds a bpf-nop benchmark as a timing overhead baseline and
usage example for the timing library.

Patch 4 adds the common header shared between the BPF program and
userspace (flow_key, vip_definition, real_definition, encap helpers).

Patch 5 adds the XDP load-balancer BPF program.

Patch 6 adds the userspace benchmark driver with 24 scenarios,
packet construction, validation, and bench framework integration.

Patch 7 adds the run script for running all scenarios.

[1] https://github.com/facebookincubator/katran
====================

Link: https://patch.msgid.link/20260427232313.1582588-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add XDP load-balancer benchmark run script

Add a convenience script that runs all 24 XDP load-balancer scenarios
and formats the results as a table with median, stddev, and p99
columns.

./benchs/run_bench_xdp_lb.sh

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-8-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add XDP load-balancer benchmark driver

Wire up the userspace side of the XDP load-balancer benchmark.

24 scenarios cover the full code-path matrix: TCP/UDP, IPv4/IPv6,
cross-AF encap, LRU hit/miss/diverse/cold, consistent-hash bypass,
SYN/RST flag handling, and early exits (unknown VIP, non-IP, ICMP,
fragments, IP options).

Before benchmarking, each scenario validates correctness: the output
packet is compared byte-for-byte against a pre-built expected packet
and BPF map counters are checked against the expected values.

Usage:
sudo ./bench -a -w3 -p1 xdp-lb --scenario tcp-v4-lru-hit
sudo ./bench xdp-lb --list-scenarios

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-7-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add XDP load-balancer BPF program

Add the BPF datapath for the XDP load-balancer benchmark, a
simplified L4 load-balancer inspired by katran.

The pipeline: L3/L4 parse -> VIP lookup -> per-CPU LRU connection
table or consistent-hash fallback -> real server lookup -> per-VIP
and per-real stats -> IPIP/IP6IP6 encapsulation. TCP SYN forces
the consistent-hash path (skipping LRU); TCP RST skips LRU insert
to avoid polluting the table.

process_packet() is marked __noinline so that the BENCH_BPF_LOOP
reset block (which strips encapsulation) operates on valid packet
pointers after bpf_xdp_adjust_head().

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-6-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add XDP load-balancer common definitions

Add the shared header for the XDP load-balancer benchmark. This
defines the data structures used by both the BPF program and
userspace: flow_key, vip_definition, real_definition, and the
stats/control structures.

Also provides the encapsulation source-address helpers shared
between the BPF datapath (for encap) and userspace (for building
expected output packets used in validation).

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-5-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add bpf-nop benchmark for timing overhead baseline

Add a minimal benchmark that measures the overhead of the batch-timing
infrastructure itself. The BPF program runs an empty BENCH_BPF_LOOP body
(~1.5-2 ns/op), establishing the floor cost that all timing-library
benchmarks include.

[root@virtme-ng tools/testing/selftests/bpf]# sudo ./bench -a -p8 bpf-nop
Setting up benchmark 'bpf-nop'...
Benchmark 'bpf-nop' started.
bpf-nop: median 1.82 ns/op, stddev 0.01, p99 1.86 (1754 samples)

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-4-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add BPF batch-timing library

Add a reusable timing library for BPF benchmarks that need to measure
BPF program execution time.

The BPF side (progs/bench_bpf_timing.bpf.h) provides per-CPU sample
arrays and BENCH_BPF_LOOP(), a macro that brackets batch_iters
iterations with bpf_ktime_get_ns() reads and records the elapsed time.
One extra untimed iteration runs afterward for output validation.

The userspace side (benchs/bench_bpf_timing.c) collects samples from
the skeleton BSS, computes percentile statistics, and auto-calibrates
batch_iters to target ~10 ms per batch.
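
For illustration, percentile statistics over collected samples could be
computed along these lines. This sketch uses the nearest-rank method,
which is an assumption; the library's actual method, its data layout,
and the conversion from per-batch time to ns/op are not shown here, and
the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* qsort() comparator for ascending uint64_t values. */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a;
	uint64_t y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/*
 * Nearest-rank percentile: sorts 'samples' in place and returns the
 * value at rank ceil(pct / 100 * n). Returns 0 for an empty array.
 */
static uint64_t percentile(uint64_t *samples, size_t n, unsigned int pct)
{
	size_t rank;

	if (n == 0)
		return 0;

	qsort(samples, n, sizeof(*samples), cmp_u64);

	rank = (pct * n + 99) / 100;
	if (rank == 0)
		rank = 1;
	if (rank > n)
		rank = n;
	return samples[rank - 1];
}
```

With this definition, the median is percentile(samples, n, 50) and p99
is percentile(samples, n, 99).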

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add bench_force_done() for early benchmark completion

The bench framework waits for duration_sec to elapse before collecting
results. Benchmarks that know exactly how many samples they need can
call bench_force_done() to signal completion early, avoiding wasted
wall-clock time.

Also refactor collect_measurements() to reuse bench_force_done()
instead of open-coding the same mutex/cond_signal sequence.

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260427232313.1582588-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.1-rc3

Cross-merge BPF and other fixes after downstream PR.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge tag 'edac_urgent_for_v7.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras

Pull EDAC fix from Borislav Petkov:

- Fix a string leak in the versalnet driver

* tag 'edac_urgent_for_v7.1_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/versalnet: Fix device name memory leak

rxrpc: Also unshare DATA/RESPONSE packets when paged frags are present

The DATA-packet handler in rxrpc_input_call_event() and the RESPONSE
handler in rxrpc_verify_response() copy the skb to a linear one before
calling into the security ops only when skb_cloned() is true.  An skb
that is not cloned but still carries externally-owned paged fragments
(e.g. SKBFL_SHARED_FRAG set by splice() into a UDP socket via
__ip_append_data, or a chained skb_has_frag_list()) falls through to
the in-place decryption path, which binds the frag pages directly into
the AEAD/skcipher SGL via skb_to_sgvec().

Extend the gate to also unshare when skb_has_frag_list() or
skb_has_shared_frag() is true.  This catches the splice-loopback vector
and other externally-shared frag sources while preserving the
zero-copy fast path for skbs whose frags are kernel-private (e.g. NIC
page_pool RX, GRO).  The OOM/trace handling already in place is reused.

Fixes: d0d5c0cd1e71 ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Cc: stable@vger.kernel.org
Signed-off-by: Hyunwoo Kim <imv4bel@gmail.com>
Reviewed-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux

Pull clk driver fixes from Stephen Boyd:

- Mark the DDR bus clk critical in the SpaceMiT driver so that
   boot doesn't fail

- Fix boot on Mobileye EyeQ by creating the auxiliary device for
   the ethernet PHY

- Plug an OF node leak in Rockchip rk808 clk driver

* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
  clk: rk808: fix OF node reference imbalance
  MAINTAINERS: add myself as a reviewer for the clk subsystem
  reset: eyeq: drop device_set_of_node_from_dev() done by parent
  clk: eyeq: add EyeQ5 children auxiliary device for generic PHYs
  clk: eyeq: use the auxiliary device creation helper
  clk: spacemit: k3: mark top_dclk as CLK_IS_CRITICAL

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

- Fix sk_local_storage diag dump via netlink (Amery Hung)

- Fix off-by-one in arena direct-value access (Junyoung Jang)

- Reject TCP_NODELAY in bpf-tcp congestion control (KaFai Wan)

- Fix type confusion in bpf_*_sock() (Kuniyuki Iwashima)

- Reject TX-only AF_XDP sockets (Linpu Yu)

- Don't run arg-tracking analysis twice on main subprog (Paul Chaignon)

- Fix NULL pointer dereference in bpf_sk_storage_clone and fib lookup
   (Weiming Shi)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Fix off-by-one boundary validation in arena direct-value access
  xskmap: reject TX-only AF_XDP sockets
  bpf: Don't run arg-tracking analysis twice on main subprog
  bpf: Free reuseport cBPF prog after RCU grace period.
  bpf: tcp: Fix type confusion in sol_tcp_sockopt().
  bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().
  bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().
  mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()
  selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.
  bpf: tcp: Fix type confusion in bpf_tcp_sock().
  tools/headers: Regenerate stddef.h to fix BPF selftests
  bpf: Fix sk_local_storage diag dumping uninitialized special fields
  bpf: Fix NULL pointer dereference in bpf_skb_fib_lookup()
  sockmap: Fix sk_psock_drop() race vs sock_map_{unhash,close,destroy}().
  bpf: Fix NULL pointer dereference in bpf_sk_storage_clone and diag paths
  selftests/bpf: Verify bpf-tcp-cc rejects TCP_NODELAY
  selftests/bpf: Test TCP_NODELAY in TCP hdr opt callbacks
  bpf: Reject TCP_NODELAY in bpf-tcp-cc
  bpf: Reject TCP_NODELAY in TCP header option callbacks

bpf: Fix off-by-one boundary validation in arena direct-value access

BPF_MAP_TYPE_ARENA accepts BPF_PSEUDO_MAP_VALUE offsets at exactly
the end of the arena mapping (off == arena_size). The boundary check
in arena_map_direct_value_addr() uses `>` instead of `>=`, which
incorrectly allows a one-past-end pointer to be accepted.

Change the condition to `>=` to correctly reject offsets that fall
outside the valid arena user_vm range.
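
A simplified model of the corrected check is sketched below. The
function name, its signature, and the error value are placeholders
chosen for this illustration; they are not the signature of
arena_map_direct_value_addr().

```c
#include <errno.h>
#include <stdint.h>

/*
 * Valid byte offsets into an arena of arena_size bytes are
 * 0 .. arena_size - 1, so off == arena_size is already one past the
 * end. The error value is a placeholder for this sketch.
 */
static int check_direct_value_off(uint64_t off, uint64_t arena_size)
{
	if (off >= arena_size) /* the buggy form used '>' here */
		return -ERANGE;
	return 0;
}
```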

Fixes: 317460317a02 ("bpf: Introduce bpf_arena.")
Signed-off-by: Junyoung Jang <graypanda.inzag@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260426172505.1947915-1-graypanda.inzag@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

xskmap: reject TX-only AF_XDP sockets

XSKMAP entries are used as redirect targets for incoming XDP frames.
A TX-only AF_XDP socket lacks an Rx ring and cannot handle redirected
traffic, but xsk_map_update_elem() currently allows such sockets to
be inserted into the map.

Redirecting packets to such a socket on the veth generic-XDP path
causes a kernel crash in xsk_generic_rcv().

This became possible after xsk_is_setup_for_bpf_map() was removed from
the XSKMAP update path, which allowed bound TX-only sockets to be
inserted into the map.

Reject TX-only sockets during XSKMAP updates to avoid the crash.
They remain fully operational for pure Tx purposes outside XSKMAP.

Fixes: 968be23ceaca ("xsk: Fix possible segfault at xskmap entry insertion")
Reported-by: Juefei Pu <tomapufckgml@gmail.com>
Reported-by: Yuan Tan <yuantan098@gmail.com>
Reported-by: Xin Liu <bird@lzu.edu.cn>
Signed-off-by: Yifan Wu <yifanwucs@gmail.com>
Signed-off-by: Linpu Yu <linpu5433@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20260508144344.694-1-linpu5433@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Don't run arg-tracking analysis twice on main subprog

Because subprog 0, the main subprog, is considered a global function,
we end up running the arg-tracking dataflow analysis twice on it. That
results in slightly longer verification but mostly in more verbose
verifier logs. This patch fixes it by keeping only the iteration over
global subprogs.

When running over all of Cilium's programs with BPF_LOG_LEVEL2, this
reduces verbosity by ~20% on average.

Fixes: bf0c571f7feb6 ("bpf: introduce forward arg-tracking dataflow analysis")
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/e4d7b53d4963ef520541a782f5fc8108a168877c.1778176504.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity fix from Eric Biggers:
"Fix a regression in overlayfs caused by an fsverity API change"

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  ovl: fix verity lazy-load guard broken by fsverity_active() semantic change

Merge tag 'rust-fixes-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux

Pull Rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:

    - Add 'bindgen' target to make UML 32-bit builds work with GCC

    - Disable two Clippy warnings ('collapsible_{if,match}')

  'pin-init' crate:

    - Fix unsoundness issue that created &'static references"

* tag 'rust-fixes-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: allow `clippy::collapsible_if` globally
  rust: allow `clippy::collapsible_match` globally
  rust: pin-init: fix incorrect accessor reference lifetime
  rust: pin-init: internal: move alignment check to `make_field_check`
  rust: arch: um: Fix building 32-bit UML with GCC

Merge tag 'hwmon-for-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- ads7871: Fix endianness bug in 16-bit register reads

- lm75: Fix configuration register writes and AS6200/TMP112 setup and
   alarm handling

- lm63: Fix TOCTOU problems

- corsair-psu: Close HID device on probe errors

- ltc2992: Fix overflow and threshold range

- Documentation: fix link to ideapad-laptop.c file

- Remove stale CONFIG_SENSORS_SBRMI Makefile reference

* tag 'hwmon-for-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (ads7871) Fix endianness bug in 16-bit register reads
  hwmon: (lm75) Fix configuration register writes.
  hwmon: (lm75) Fix AS6200 and TMP112 setup and alarm handling
  hwmon: (lm63) Add locking to avoid TOCTOU
  hwmon: (corsair-psu) Close HID device on probe errors
  hwmon: Remove stale CONFIG_SENSORS_SBRMI Makefile reference
  Documentation: hwmon: fix link to ideapad-laptop.c file
  hwmon: (ltc2992) Fix u32 overflow in power read path
  hwmon: (ltc2992) Clamp threshold writes to hardware range

Merge tag 'staging-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging

Pull staging driver fixes from Greg KH:
"Here are two small staging driver fixes for 7.1-rc3.  They are:

   - vme_user root device leak fix

   - NULL dereference bugfix in the rtl8723bs driver

  Both of these have been in linux-next all this week with no reported
  issues"

* tag 'staging-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: rtl8723bs: os_dep: avoid NULL pointer dereference in rtw_cbuf_alloc
  staging: vme_user: fix root device leak on init failure

Merge tag 'usb-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb

Pull USB driver fixes from Greg KH:
"Here are some small USB driver fixes for 7.1-rc3 to resolve some
  reported issues, and a new device id. These are:

   - usblp driver heap leak fixes

   - ulpi driver memory leak fix

   - typec driver fixes

   - dwc3 driver fix

   - omap dma driver fix

   - new option driver device id addition

  All of these have been in linux-next for over a week with no reported
  issues"

* tag 'usb-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
  USB: serial: option: add Telit Cinterion LE910Cx compositions
  usb: usblp: fix uninitialized heap leak via LPGETSTATUS ioctl
  usb: usblp: fix heap leak in IEEE 1284 device ID via short response
  usb: dwc3: Move GUID programming after PHY initialization
  usb: typec: tcpm: fix debug accessory mode detection for sink ports
  usb: typec: tcpm: reset internal port states on soft reset AMS
  usb: ulpi: fix memory leak on ulpi_register() error paths
  USB: omap_udc: DMA: Don't enable burst 4 mode

Merge tag 'i2c-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

- sanitize more input parameters in the core (found by syzkaller)

- usual set of driver fixes (proper completion handling, applying
   quirks, correct workqueue selection...)

- ID additions to simplify dependency handling

- new email address for Peter Rosin

* tag 'i2c-for-7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: smbus: reject oversized block transfers in the common path
  MAINTAINERS: Update mail for Peter Rosin
  i2c: stub: Reject I2C block transfers with invalid length
  i2c: Compare the return value of gpiod_get_direction against GPIO_LINE_DIRECTION_OUT
  i2c: dev: prevent integer overflow in I2C_TIMEOUT ioctl
  i2c: acpi: Add ELAN0678 to i2c_acpi_force_100khz_device_ids
  dt-bindings: i2c: apple,i2c: Add t8122 compatible
  i2c: stm32f7: reinit_completion() per transfer not per msg
  dt-bindings: i2c: amlogic: Add compatible for T7 SOC
  i2c: testunit: Replace system_long_wq with system_dfl_long_wq

Merge tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Madhavan Srinivasan:

- Fix KASAN sanitization flag for core_$(BITS).o

- Fixes for handling offset values in pseries htmdump

- Fix interrupt mask in cpm1_gpiochip_add16()

- ps3/pasemi fixes to drop redundant result assignment

- Fixes in papr-hvpipe code path

- powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events

Thanks to Aboorva Devarajan, Athira Rajeev, Christophe Leroy (CS GROUP),
Geert Uytterhoeven, Haren Myneni, Krzysztof Kozlowski, Mukesh Kumar
Chaurasiya (IBM), Nathan Chancellor, Ritesh Harjani (IBM), Shivani
Nittor, Sourabh Jain, Thomas Zimmermann, and Venkat Rao Bagalkote.

* tag 'powerpc-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (21 commits)
  powerpc/pasemi: Drop redundant res assignment
  powerpc/ps3: Drop redundant result assignment
  powerpc/vdso: Drop -DCC_USING_PATCHABLE_FUNCTION_ENTRY from 32-bit flags with clang
  arch/powerpc: Drop CONFIG_FIRMWARE_EDID from defconfig files
  powerpc/perf: Update check for PERF_SAMPLE_DATA_SRC marked events
  powerpc/8xx: Fix interrupt mask in cpm1_gpiochip_add16()
  powerpc/vmx: avoid KASAN instrumentation in enter_vmx_ops() for kexec
  powerpc/kdump: fix KASAN sanitization flag for core_$(BITS).o
  pseries/papr-hvpipe: Fix style and checkpatch issues in enable_hvpipe_IRQ()
  pseries/papr-hvpipe: Refactor and simplify hvpipe_rtas_recv_msg()
  pseries/papr-hvpipe: Kill task_struct pointer from struct hvpipe_source_info
  pseries/papr-hvpipe: Simplify spin unlock usage in papr_hvpipe_handle_release()
  pseries/papr-hvpipe: Fix the usage of copy_to_user()
  pseries/papr-hvpipe: Fix & simplify error handling in papr_hvpipe_init()
  pseries/papr-hvpipe: Fix null ptr deref in papr_hvpipe_dev_create_handle()
  pseries/papr-hvpipe: Prevent kernel stack memory leak to userspace
  pseries/papr-hvpipe: Fix race with interrupt handler
  powerpc/pseries/htmdump: Add memory configuration dump support to htmdump module
  powerpc/pseries/htmdump: Fix the offset value used in htm status dump
  powerpc/pseries/htmdump: Fix the offset value used in processor configuration dump
  ...

Merge tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

- Fix memory map enumeration bug in the Xen e820 parsing code (Juergen
   Gross)

- Re-enable e820 BIOS fallback if e820 table is empty (David Gow)

* tag 'x86-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/boot/e820: Re-enable BIOS fallback if e820 table is empty
  x86/xen: Fix a potential problem in xen_e820_resolve_conflicts()

Merge tag 'timers-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timer fix from Ingo Molnar:
"Fix CPU hotplug activation race in the timer migration code, by
  Frederic Weisbecker"

* tag 'timers-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  timers/migration: Fix another hotplug activation race

Merge tag 'sched-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull scheduler fixes from Ingo Molnar:

- Fix spurious failures in rseq self-tests (Mark Brown)

- Fix rseq rseq::cpu_id_start ABI regression due to TCMalloc's creative
   use of the supposedly read-only field

   The fix is to introduce a new ABI variant based on a new (larger)
   rseq area registration size, to keep the TCMalloc use of rseq
   backwards compatible on new kernels (Thomas Gleixner)

- Fix wakeup_preempt_fair() for not waking up task (Vincent Guittot)

- Fix s64 mult overflow in vruntime_eligible() (Zhan Xusheng)

* tag 'sched-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/fair: Fix wakeup_preempt_fair() for not waking up task
  sched/fair: Fix overflow in vruntime_eligible()
  selftests/rseq: Expand for optimized RSEQ ABI v2
  rseq: Reenable performance optimizations conditionally
  rseq: Implement read only ABI enforcement for optimized RSEQ V2 mode
  selftests/rseq: Validate legacy behavior
  selftests/rseq: Make registration flexible for legacy and optimized mode
  selftests/rseq: Skip tests if time slice extensions are not available
  rseq: Revert to historical performance killing behaviour
  rseq: Don't advertise time slice extensions if disabled
  rseq: Protect rseq_reset() against interrupts
  rseq: Set rseq::cpu_id_start to 0 on unregistration
  selftests/rseq: Don't run tests with runner scripts outside of the scripts

Merge tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull perf events fixes from Ingo Molnar:

- Fix deadlock in the perf_mmap() failure path (Peter Zijlstra)

- Intel ACR (Auto Counter Reload) fixes (Dapeng Mi):
     - Fix validation and configuration of ACR masks
     - Fix ACR rescheduling bug causing stale masks
     - Disable the PMI on ACR-enabled hardware
     - Enable ACR on Panther Cover uarch too

* tag 'perf-urgent-2026-05-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf/x86/intel: Enable auto counter reload for DMR
  perf/x86/intel: Disable PMI for self-reloaded ACR events
  perf/x86/intel: Always reprogram ACR events to prevent stale masks
  perf/x86/intel: Improve validation and configuration of ACR masks
  perf/core: Fix deadlock in perf_mmap() failure path

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fix from Catalin Marinas:

- ptrace(PTRACE_SETREGSET) fix to zero the target's fpsimd_state rather
   than the tracer's

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64/fpsimd: ptrace: zero target's fpsimd_state, not the tracer's

Merge tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull PCI fixes from Bjorn Helgaas:

- Don't fall back to bus reset after failed slot reset; a bus reset
   isn't safe if the .reset_slot() callback is implemented (Keith Busch)

- Update saved_config_space upon resource assignment to fix passthrough
   regressions when x86 pcibios_assign_resources() updates BARs (Lukas
   Wunner)

- Initialize a temporary pci_dev->dev in sysfs 'new_id' attribute to
   fix a lockdep regression after driver_override was moved from PCI to
   device core (Samiullah Khawaja)

- Update MAINTAINERS email addresses (Marek Vasut, Hans Zhang)

- Add MAINTAINERS reviewer for PCIe Cadence IP (Aksh Garg)

* tag 'pci-v7.1-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  MAINTAINERS: Add Aksh Garg as PCIe CADENCE reviewer
  MAINTAINERS: Update Hans Zhang email for PCIe CIX Sky1
  MAINTAINERS: Update Marek Vasut email for PCIe R-Car
  PCI: Initialize temporary device in new_id_store()
  PCI: Update saved_config_space upon resource assignment
  PCI: Don't fallback to bus reset after failed slot reset

MAINTAINERS: Add Aksh Garg as PCIe CADENCE reviewer

I wish to contribute to the review process for Cadence PCIe IP drivers,
hence add myself as a reviewer.

Signed-off-by: Aksh Garg <a-garg7@ti.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260508060951.840233-1-a-garg7@ti.com

MAINTAINERS: Update Hans Zhang email for PCIe CIX Sky1

Update my email address as my work email account is no longer in use.

Signed-off-by: Hans Zhang <18255117159@163.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260508023006.1787674-1-18255117159@163.com

MAINTAINERS: Update Marek Vasut email for PCIe R-Car

Use up to date address. No functional change.

Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20260428052030.51101-1-marek.vasut+renesas@mailbox.org

PCI: Initialize temporary device in new_id_store()

When setting new_id of a PCI device driver via sysfs, a lockdep splat
occurs. This is because new_id_store() builds a temporary pci_dev for
pci_match_device(), which calls device_match_driver_override().  That
depends on the driver_override.lock added by cb3d1049f4ea ("driver core:
generalize driver_override in struct device").

The new driver_override.lock was not initialized in the temporary pci_dev,
resulting in this lockdep splat.

Initialize the temporary pci_dev to fix this.

Repro:

  Build with CONFIG_LOCKDEP=y, boot with QEMU, and add a new ID:

  # echo "8086 10f5" > /sys/bus/pci/drivers/e1000e/new_id

  INFO: trying to register non-static key.
  The code is fine but needs lockdep annotation, or maybe
  you didn't initialize this object before use?
  turning off the locking correctness validator.
  CPU: 2 UID: 0 PID: 177 Comm: liveupdate-iomm Not tainted 7.0.0+ #9 PREEMPT(full)
  Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
  Call Trace:
   <TASK>
   dump_stack_lvl+0x5d/0x80
   register_lock_class+0x77e/0x790
   lock_acquire+0xbf/0x2e0
   pci_match_device+0x24/0x180
   new_id_store+0x189/0x1d0
   kernfs_fop_write_iter+0x14f/0x210
   vfs_write+0x263/0x5e0
   ksys_write+0x79/0xf0
   do_syscall_64+0x117/0xf80

Fixes: 10a4206a2401 ("PCI: use generic driver_override infrastructure")
Fixes: 8895d3bcb8ba ("PCI: Fail new_id for vendor/device values already built into driver")
Signed-off-by: Samiullah Khawaja <skhawaja@google.com>
[bhelgaas: add commit log details and repro, trim backtrace]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Danilo Krummrich <dakr@kernel.org>
Link: https://patch.msgid.link/20260505234327.716630-1-skhawaja@google.com

PCI: Update saved_config_space upon resource assignment

Bernd reports passthrough failure of a Digital Devices Cine S2 V6 DVB
adapter plugged into an ASRock X570S PG Riptide board with BIOS version
P5.41 (09/07/2023):

  ddbridge 0000:05:00.0: detected Digital Devices Cine S2 V6 DVB adapter
  ddbridge 0000:05:00.0: cannot read registers
  ddbridge 0000:05:00.0: fail

BIOS assigns an incorrect BAR to the DVB adapter which doesn't fit into the
upstream bridge window.  The kernel corrects the BAR assignment:

  pci 0000:07:00.0: BAR 0 [mem 0xfffffffffc500000-0xfffffffffc50ffff 64bit]: can't claim; no compatible bridge window
  pci 0000:07:00.0: BAR 0 [mem 0xfc500000-0xfc50ffff 64bit]: assigned

Correction of the BAR assignment happens in an x86-specific fs_initcall,
pcibios_assign_resources(), after device enumeration in a subsys_initcall.
This order was introduced at the behest of Linus in 2004:

  https://git.kernel.org/tglx/history/c/a06a30144bbc

No other architecture performs such a late BAR correction.

Bernd bisected the issue to commit a2f1e22390ac ("PCI/ERR: Ensure error
recoverability at all times"), but it only occurs in the absence of commit
4d4c10f763d7 ("PCI: Explicitly put devices into D0 when initializing").
This combination exists in stable kernel v6.12.70, but not in mainline,
hence Bernd cannot reproduce the issue with mainline.

Since a2f1e22390ac, config space is saved on enumeration, prior to BAR
correction.  Upon passthrough, the corrected BAR is overwritten with the
incorrect saved value by:

  vfio_pci_core_register_device()
    vfio_pci_set_power_state()
      pci_restore_state()

But only if the device's current_state is PCI_UNKNOWN, as it was prior to
commit 4d4c10f763d7.  Since the commit, it is PCI_D0, which changes the
behavior of vfio_pci_set_power_state() to no longer restore the state
without saving it first.

Alexandre is reporting the same issue as Bernd, but in his case, mainline
is affected as well.  The difference is that on Alexandre's system, the
host kernel binds a driver to the device which is unbound prior to
passthrough, whereas on Bernd's system no driver gets bound by the host
kernel.

Unbinding sets current_state to PCI_UNKNOWN in pci_device_remove(), so when
vfio-pci is subsequently bound to the device, pci_restore_state() is once
again called without invoking pci_save_state() first.

To robustly fix the issue, always update saved_config_space upon resource
assignment.

Reported-by: Bernd Schumacher <bernd@bschu.de>
Closes: https://lore.kernel.org/r/acfZrlP0Ua_5D3U4@eldamar.lan/
Reported-by: Alexandre N. <an.tech@mailo.com>
Closes: https://lore.kernel.org/r/dd3c3358-de0f-4a56-9c81-04aceaab4058@mailo.com/
Fixes: a2f1e22390ac ("PCI/ERR: Ensure error recoverability at all times")
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Bernd Schumacher <bernd@bschu.de>
Tested-by: Alexandre N. <an.tech@mailo.com>
Cc: stable@vger.kernel.org # v6.12+
Link: https://patch.msgid.link/febc3f354e0c1f5a9f5b3ee9ffddaa44caccf651.1776268054.git.lukas@wunner.de
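
The ordering problem described above can be modeled in a few lines of
userspace C. This is a sketch only: struct mock_dev and the mock_*()
helpers are hypothetical, a single BAR stands in for the whole config
space, and the update_saved flag models the fix rather than quoting it.

```c
#include <stdint.h>

/* Hypothetical one-BAR model of a PCI device. */
struct mock_dev {
	uint64_t bar0;		/* live BAR value */
	uint64_t saved_bar0;	/* BAR copy held in saved config space */
};

static inline void mock_save_state(struct mock_dev *d)
{
	d->saved_bar0 = d->bar0;
}

static inline void mock_restore_state(struct mock_dev *d)
{
	d->bar0 = d->saved_bar0;
}

/* Late BAR correction; update_saved != 0 models the fix. */
static inline void mock_assign_resource(struct mock_dev *d, uint64_t bar,
					int update_saved)
{
	d->bar0 = bar;
	if (update_saved)
		d->saved_bar0 = bar;
}

/* Enumerate (save), correct the BAR, then restore without re-saving. */
static inline uint64_t mock_passthrough_bar(uint64_t bios_bar,
					    uint64_t fixed_bar,
					    int update_saved)
{
	struct mock_dev d = { .bar0 = bios_bar, .saved_bar0 = 0 };

	mock_save_state(&d);
	mock_assign_resource(&d, fixed_bar, update_saved);
	mock_restore_state(&d);
	return d.bar0;
}
```

With update_saved set to 0 the restore brings back the BIOS value, and
with update_saved set to 1 the corrected BAR survives the restore.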

bpf: Free reuseport cBPF prog after RCU grace period.

Eulgyu Kim reported the splat below with a repro. [0]

The repro sets up a UDP reuseport group with a cBPF prog and
replaces it with a new one while another thread is sending
a UDP packet to the group.

The reuseport prog is freed by sk_reuseport_prog_free().
For an "e"BPF prog, bpf_prog_put() tears the prog down in multiple
stages, whereas a cBPF prog is freed immediately by
bpf_release_orig_filter() and bpf_prog_free().

If a reuseport prog is detached from the setsockopt() path
(reuseport_attach_prog() or reuseport_detach_prog()),
sk_reuseport_prog_free() is called without waiting for RCU
readers to complete, resulting in various bugs.

Let's defer freeing the reuseport cBPF prog until after one RCU
grace period.

Note "e"BPF prog is safe as is unless the fast path starts
to touch fields destroyed in bpf_prog_put_deferred() and
__bpf_prog_put_noref().

[0]:
BUG: KASAN: vmalloc-out-of-bounds in reuseport_select_sock+0xedc/0x1220 net/core/sock_reuseport.c:596
Read of size 4 at addr ffffc9000051e004 by task slowme/10208
CPU: 6 UID: 1000 PID: 10208 Comm: slowme Not tainted 7.0.0-geb7ac95ff75e #32 PREEMPT(full)
Hardware name: QEMU Ubuntu 24.04 PC v2 (i440FX + PIIX, arch_caps fix, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
Call Trace:
<IRQ>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_address_description mm/kasan/report.c:378 [inline]
print_report+0xca/0x240 mm/kasan/report.c:482
kasan_report+0x118/0x150 mm/kasan/report.c:595
reuseport_select_sock+0xedc/0x1220 net/core/sock_reuseport.c:596
udp4_lib_lookup2+0x3bc/0x950 net/ipv4/udp.c:495
__udp4_lib_lookup+0x768/0xe20 net/ipv4/udp.c:723
__udp4_lib_lookup_skb+0x297/0x390 net/ipv4/udp.c:752
__udp4_lib_rcv+0x1312/0x2620 net/ipv4/udp.c:2752
ip_protocol_deliver_rcu+0x282/0x440 net/ipv4/ip_input.c:207
ip_local_deliver_finish+0x3bb/0x6f0 net/ipv4/ip_input.c:241
NF_HOOK+0x30c/0x3a0 include/linux/netfilter.h:318
NF_HOOK+0x30c/0x3a0 include/linux/netfilter.h:318
__netif_receive_skb_one_core net/core/dev.c:6181 [inline]
__netif_receive_skb net/core/dev.c:6294 [inline]
process_backlog+0xaa4/0x1960 net/core/dev.c:6645
__napi_poll+0xae/0x340 net/core/dev.c:7709
napi_poll net/core/dev.c:7772 [inline]
net_rx_action+0x5d7/0xf50 net/core/dev.c:7929
handle_softirqs+0x22b/0x870 kernel/softirq.c:622
do_softirq+0x76/0xd0 kernel/softirq.c:523
</IRQ>
<TASK>
__local_bh_enable_ip+0xf8/0x130 kernel/softirq.c:450
local_bh_enable include/linux/bottom_half.h:33 [inline]
rcu_read_unlock_bh include/linux/rcupdate.h:924 [inline]
__dev_queue_xmit+0x1dd7/0x3710 net/core/dev.c:4890
neigh_output include/net/neighbour.h:556 [inline]
ip_finish_output2+0xca9/0x1070 net/ipv4/ip_output.c:237
NF_HOOK_COND include/linux/netfilter.h:307 [inline]
ip_output+0x29f/0x450 net/ipv4/ip_output.c:438
ip_send_skb+0x45/0xc0 net/ipv4/ip_output.c:1508
udp_send_skb+0xb04/0x1510 net/ipv4/udp.c:1195
udp_sendmsg+0x1a71/0x2350 net/ipv4/udp.c:1485
sock_sendmsg_nosec net/socket.c:727 [inline]
__sock_sendmsg net/socket.c:742 [inline]
__sys_sendto+0x554/0x680 net/socket.c:2206
__do_sys_sendto net/socket.c:2213 [inline]
__se_sys_sendto net/socket.c:2209 [inline]
__x64_sys_sendto+0xde/0x100 net/socket.c:2209
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0x160/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x415a2d
Code: b3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007f6bc31e41e8 EFLAGS: 00000212 ORIG_RAX: 000000000000002c
RAX: ffffffffffffffda RBX: 00007f6bc31e4cdc RCX: 0000000000415a2d
RDX: 0000000000000001 RSI: 00007f6bc31e421f RDI: 0000000000000003
RBP: 00007f6bc31e4240 R08: 00007f6bc31e4220 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000212 R12: 00007f6bc31e46c0
R13: ffffffffffffffb8 R14: 0000000000000000 R15: 00007ffc9b0d70b0
</TASK>

Fixes: 538950a1b752 ("soreuseport: setsockopt SO_ATTACH_REUSEPORT_[CE]BPF")
Reported-by: Eulgyu Kim <eulgyukim@snu.ac.kr>
Reported-by: Taeyang Lee <0wn@theori.io>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20260426012647.3233119-1-kuniyu@google.com

Merge tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

- Fix for ublk not doing an actual issue from the task_work fallback
   path. Any request hitting that should be canceled automatically

- Fix for uring_cmd prep side handling, for the block side uring_cmd
   discard handling

- Fix for missing validation of the io and physical block size shifts

- Fix for a use-after-free in ublk's cancel command handling

* tag 'block-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  ublk: fix use-after-free in ublk_cancel_cmd()
  ublk: validate physical_bs_shift, io_min_shift and io_opt_shift
  block: only read from sqe on initial invocation of blkdev_uring_cmd()
  ublk: don't issue uring_cmd from fallback task work

Merge tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

- Ensure that the absolute timeouts for both the command side and the
   waiting side honor the caller's time namespace

- Ensure tracked NAPI entries are cleared at unregistration time, as
   the NAPI polling loop checks the list state rather than the general
   NAPI state. This can lead to NAPI polling even after unregistration
   has been done. If unregistered, all NAPI polling should be disabled

- Fix for eventfd recursive invocation handling

* tag 'io_uring-7.1-20260508' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/wait: honour caller's time namespace for IORING_ENTER_ABS_TIMER
  io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS
  io_uring/eventfd: reset deferred signal state
  io_uring/napi: clear tracked NAPI entries on unregister

Merge branch 'bpf-tcp-fix-type-confusion-in-bpf-helper-functions'

Kuniyuki Iwashima says:

====================
bpf: tcp: Fix type confusion in bpf helper functions.

bpf_tcp_sock() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

The same issues exist in other bpf functions:

  * bpf_mptcp_sock_from_subflow()
  * bpf_skc_to_tcp_sock()
  * bpf_skc_to_tcp6_sock()
  * sol_tcp_sockopt()

Patch 1 fixes bpf_tcp_sock() and Patch 2 adds a test for it.
Patch 3 ~ 6 fix the rest of the functions above.

Changes:
  v2:
    * Invert if (err) to if (!err) in the selftest
    * Add patch 3 ~ 6

  v1: https://lore.kernel.org/bpf/20260430184405.1227386-1-kuniyu@google.com/
      https://lore.kernel.org/mptcp/20260430-mptcp-bpf-mptcp-sock-type-v1-1-d2ed5cda7da9@kernel.org/
====================

Link: https://patch.msgid.link/20260504210610.180150-1-kuniyu@google.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
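
The difference between the two checks can be sketched in userspace C.
This assumes, as the series suggests, that the fix amounts to testing
the socket type in addition to the protocol. struct mock_sock and both
helpers are hypothetical, and check_like_sk_is_tcp() is a model of
that idea, not a copy of the kernel's sk_is_tcp().

```c
#include <stdbool.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical stand-in for the three struct sock fields involved. */
struct mock_sock {
	int family;	/* AF_INET or AF_INET6 */
	int type;	/* SOCK_STREAM, SOCK_RAW, ... */
	int protocol;	/* IPPROTO_TCP, ... */
};

/* Protocol-only check: a RAW socket created with IPPROTO_TCP passes. */
static inline bool check_protocol_only(const struct mock_sock *sk)
{
	return sk->protocol == IPPROTO_TCP;
}

/* Stricter check: family, socket type and protocol must all match. */
static inline bool check_like_sk_is_tcp(const struct mock_sock *sk)
{
	return (sk->family == AF_INET || sk->family == AF_INET6) &&
	       sk->type == SOCK_STREAM &&
	       sk->protocol == IPPROTO_TCP;
}
```

For a mock socket built like socket(AF_INET, SOCK_RAW, IPPROTO_TCP),
the first helper returns true and the second returns false, while an
ordinary SOCK_STREAM TCP socket passes both.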

bpf: tcp: Fix type confusion in sol_tcp_sockopt().

sol_tcp_sockopt() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Let's use sk_is_tcp().

Note that initially sol_tcp_sockopt() checked sk->sk_prot->setsockopt.

Fixes: 2ab42c7b871f ("bpf: Check the protocol of a sock to agree the calls to bpf_setsockopt().")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-7-kuniyu@google.com

bpf: tcp: Fix type confusion in bpf_skc_to_tcp6_sock().

bpf_skc_to_tcp6_sock() only checks if sk->sk_protocol is IPPROTO_TCP
and sk->sk_family is AF_INET6, but RAW socket can bypass it:

  socket(AF_INET6, SOCK_RAW, IPPROTO_TCP)

Let's check sk->sk_type too.

Fixes: af7ec1383361 ("bpf: Add bpf_skc_to_tcp6_sock() helper")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-6-kuniyu@google.com

bpf: tcp: Fix type confusion in bpf_skc_to_tcp_sock().

bpf_skc_to_tcp_sock() only checks if sk->sk_protocol is
IPPROTO_TCP, but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Let's use sk_is_tcp().

Fixes: 478cfbdf5f13 ("bpf: Add bpf_skc_to_{tcp, tcp_timewait, tcp_request}_sock() helpers")
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-5-kuniyu@google.com

mptcp: bpf: Fix type confusion in bpf_mptcp_sock_from_subflow()

bpf_mptcp_sock_from_subflow() only checks if sk->sk_protocol is
IPPROTO_TCP, but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

In this case, it would NOT be valid to call sk_is_mptcp() which will
assume sk is a pointer to a struct tcp_sock, and wrongly checks for:
tcp_sk(sk)->is_mptcp.

Fixes: 3bc253c2e652 ("bpf: Add bpf_skc_to_mptcp_sock_proto")
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260504210610.180150-4-kuniyu@google.com

selftest: bpf: Add test for bpf_tcp_sock() and RAW socket.

Let's extend sockopt_sk.c to cover bpf_tcp_sock() for the
wrong socket type.

Before:
  # ./test_progs -t sockopt_sk
  [  151.948613] ==================================================================
  [  151.951376] BUG: KASAN: slab-out-of-bounds in sol_tcp_sockopt+0xc7/0x8e0
  [  151.954159] Read of size 8 at addr ffff88801083d760 by task test_progs/1259
  ...
  run_test:FAIL:getsetsockopt unexpected error: -1 (errno 0)
  #427     sockopt_sk:FAIL

After:
  #427     sockopt_sk:OK

While at it, missing free() is fixed up.

Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20260504210610.180150-3-kuniyu@google.com

Merge tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:

- Fix for two ACL issues (security fix to validate dacloffset better
   and chmod fix)

- Fix out of bounds reads (in check_wsl_eas and smb2_check_msg for
   symlinks)

- Two Kerberos fixes, including an important one for when AES-256
   encryption is chosen

- Fix open_cached_dir problem when directory leases disabled

* tag 'v7.1-rc3-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: validate dacloffset before building DACL pointers
  smb/client: fix out-of-bounds read in smb2_compound_op()
  smb/client: fix out-of-bounds read in symlink_data()
  smb: client: Zero-pad short GSS session keys per MS-SMB2
  smb: client: Use FullSessionKey for AES-256 encryption key derivation
  smb: client: use kzalloc to zero-initialize security descriptor buffer
  cifs: abort open_cached_dir if we don't request leases

Merge tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"There are two main series here, fixing issues that came up in the
  Microchip QSPI and Freescale i.MX drivers. Both of those could result
  in some quite noticeable issues if they were encountered in production.
  We also have one minor documentation fix in the ch341 driver"

* tag 'spi-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: ch341: correct company name in MODULE_DESCRIPTION
  spi: microchip-core-qspi: remove some inline markings
  spi: microchip-core-qspi: don't attempt to transmit during emulated read-only dual/quad operations
  spi: microchip-core-qspi: control built-in cs manually
  spi: imx: Propagate prepare_transfer() error from spi_imx_setupxfer()
  spi: imx: Fix UAF on package-1 prepare failure in spi_imx_dma_data_prepare()
  spi: imx: Fix precedence bug in spi_imx_dma_max_wml_find()

Merge tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fix from Mark Brown:
"A straightforward fix for an incorrect description of one of the
  regulators on the Qualcomm PMH0101"

* tag 'regulator-fix-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: qcom-rpmh: Fix index for pmh0101 ldo16

bpf: tcp: Fix type confusion in bpf_tcp_sock().

bpf_tcp_sock() only checks if sk->sk_protocol is IPPROTO_TCP,
but RAW socket can bypass it:

  socket(AF_INET, SOCK_RAW, IPPROTO_TCP)

Calling bpf_setsockopt() in SOCKOPT prog triggers out-of-bounds
access to another slab object. [0]

Let's use sk_is_tcp().

[0]:
BUG: KASAN: slab-out-of-bounds in sol_tcp_sockopt (net/core/filter.c:5519)
Read of size 8 at addr ffff88801083d760 by task test_progs/1259

CPU: 1 UID: 0 PID: 1259 Comm: test_progs Tainted: G OE 7.0.0-11175-gb5c111f4967b #1 PREEMPT(full)
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-debian-1.17.0-1 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl (lib/dump_stack.c:94 lib/dump_stack.c:120)
print_report (mm/kasan/report.c:378 mm/kasan/report.c:482)
kasan_report (mm/kasan/report.c:595)
sol_tcp_sockopt (net/core/filter.c:5519)
__bpf_getsockopt (net/core/filter.c:5633)
bpf_sk_getsockopt (net/core/filter.c:5654)
bpf_prog_629ba00a1601e9f2__setsockopt+0x86/0x22c
__cgroup_bpf_run_filter_setsockopt (./include/linux/bpf.h:1402 ./include/linux/filter.h:722 ./include/linux/filter.h:729 kernel/bpf/cgroup.c:81 kernel/bpf/cgroup.c:2026)
do_sock_setsockopt (net/socket.c:2363)
__x64_sys_setsockopt (net/socket.c:2406)
do_syscall_64 (arch/x86/entry/syscall_64.c:63)
entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:121)
RIP: 0033:0x7f85f82fe7de
Code: 55 48 63 c9 48 63 ff 45 89 c9 48 89 e5 48 83 ec 08 6a 2c e8 34 69 f7 ff c9 c3 66 90 f3 0f 1e fa 49 89 ca b8 36 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 e1
RSP: 002b:00007ffe59dcecd8 EFLAGS: 00000202 ORIG_RAX: 0000000000000036
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f85f82fe7de
RDX: 000000000000001c RSI: 0000000000000006 RDI: 000000000000000d
RBP: 00007ffe59dcef20 R08: 000000000000003c R09: 0000000000000000
R10: 00007ffe59dcef00 R11: 0000000000000202 R12: 00007ffe59dcf268
R13: 0000000000000003 R14: 00007f85f9da5000 R15: 000055b2f3201400
</TASK>

The buggy address belongs to the object at ffff88801083d280
which belongs to the cache RAW of size 1792
The buggy address is located 1248 bytes inside of
allocated 1792-byte region [ffff88801083d280, ffff88801083d980)

Fixes: 655a51e536c0 ("bpf: Add struct bpf_tcp_sock and BPF_FUNC_tcp_sock")
Reported-by: Damiano Melotti <melotti@google.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://patch.msgid.link/20260504210610.180150-2-kuniyu@google.com

Merge tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Weekly fixes, lots of them but all pretty small, amdgpu and xe are the
  usual but then a large amount of fixes all over.

  core:
   - fix race condition in handle change ioctl

  fb-helper:
   - fix clipping

  rust:
   - fix unsound initialization
   - fix GEM state cleanup
   - fix wrong ARef import

  ttm:
   - update GPU MM stats on pool shrinking

  i915:
   - Re-enable ccs modifiers on dg2

  nova:
   - fix mailing list

  xe:
   - Add NULL check for media_gt in intel_hdcp_gsc_check_status
   - Fix EAGAIN sign in pf_migration_consume
   - Fix MMIO access using PF view instead of VF view during migration
   - Exclude indirect ring state page from ADS engine state size

  amdgpu:
   - GFX9 fixes
   - Hawaii SMU fixes
   - SDMA4 fix
   - GART fix
   - Userq fixes

  amdkfd:
   - GPUVM TLB flush fix
   - Hotplug fix

  radeon:
   - Hawaii SMU fixes

  bochs:
   - fix managed cleanup

  bridge:
   - tda998x: fix sparse warnings on type correctness

  etnaviv:
   - schedule armed jobs

  exynos:
   - managed bridge cleanup

  ivpu:
   - disallow reexport of GEM buffer objects

  nouveau:
   - revert support for GA100

  panel:
   - boe-tv101wum-nl6: use correct MIPI_DSI mode
   - feiyang-fy07024di26a30d: fix error reporting
   - himax-hx83102: use correct MIPI_DSI mode
   - himax-hx83121a: fix error checks
   - himax-hx83121a: select DRM_DISPLAY_DSC_HELPER

  qaic:
   - fix RAS message handling

  qxl:
   - clean up polling

  sti:
   - managed bridge cleanup

* tag 'drm-fixes-2026-05-08-1' of https://gitlab.freedesktop.org/drm/kernel: (37 commits)
  drm: Set old handle to NULL before prime swap in change_handle
  drm/bochs: Drop manual put on probe error path
  drm/xe/guc: Exclude indirect ring state page from ADS engine state size
  drm/xe/pf: Fix MMIO access using PF view instead of VF view during migration
  drm/xe/pf: Fix EAGAIN sign in pf_migration_consume()
  drm/xe/hdcp: Add NULL check for media_gt in intel_hdcp_gsc_check_status()
  drm/exynos: remove bridge when component_add fails
  drm/amdgpu: nuke amdgpu_userq_fence_slab v2
  drm/amdgpu/userq: fix access to stale wptr mapping
  drm/amdkfd: Check if there are kfd porcesses using adev by kfd_processes_count
  drm/amdgpu: zero-initialize GART table on allocation
  drm/amdgpu/sdma4: replace BUG_ON with WARN_ON in fence emission
  drm/radeon: add missing revision check for CI
  drm/amdgpu/pm: align Hawaii mclk workaround with radeon
  drm/amdgpu/pm: add missing revision check for CI
  drm/amdgpu/gfx9: drop unnecessary 64-bit fence flag check in KIQ
  drm/amdkfd: Make all TLB-flushes heavy-weight
  drm/panel: himax-hx83102: restore MODE_LPM after sending disable cmds
  drm/panel: boe-tv101wum-nl6: restore MODE_LPM after sending disable cmds
  drm/panel: feiyang-fy07024di26a30d: return display-on error
  ...

Merge tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial into usb-linus

Johan writes:

USB serial device ids for 7.1-rc3

Here are some new modem device ids.

This one has been in linux-next with no reported issues.

* tag 'usb-serial-7.1-rc3' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/johan/usb-serial:
USB: serial: option: add Telit Cinterion LE910Cx compositions

Merge tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux

Pull iommu fixes from Joerg Roedel:
"Core:
   - Cache-flushing fix for non-x86 platforms

  AMD-Vi:
   - Security fix when SEV-SNP is enabled
   - Operator precedence fix in DTE setting"

* tag 'iommu-fixes-v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
  iommu/amd: Fix precedence order in set_dte_passthrough()
  iommu/pages: Fix iommu_pages_flush_incoherent() for non-x86
  iommu/amd: Use maximum PPR log buffer size when SNP is enabled on Family 0x19
  iommu/amd: Use maximum Event log buffer size when SNP is enabled on Family 0x19

ublk: fix use-after-free in ublk_cancel_cmd()

When ublk_reset_ch_dev() clears io->cmd via ublk_queue_reinit()
concurrently with ublk_cancel_cmd(), ublk_cancel_cmd() can read a
stale pointer and pass it to io_uring_cmd_done(), causing a
use-after-free.

Fix by synchronizing the two paths with ubq->cancel_lock:

- ublk_cancel_cmd(): read and clear io->cmd under cancel_lock,
  then call io_uring_cmd_done() on the saved local copy outside
  the lock.

- ublk_reset_ch_dev(): hold cancel_lock across ublk_queue_reinit()
  so that io->cmd and io->flags are cleared atomically with respect
  to ublk_cancel_cmd().
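
The locking pattern described above can be sketched in userspace with
a pthread mutex. This is only an illustration under stated
assumptions: the demo_* types and functions are hypothetical and do
not mirror the ublk structures, and the completion counter exists only
to make the behaviour observable.

```c
#include <pthread.h>
#include <stddef.h>

struct demo_io {
	void *cmd;            /* pending command, NULL when none */
	unsigned int flags;
};

struct demo_queue {
	pthread_mutex_t cancel_lock;
	struct demo_io io;
	int completions;      /* counts demo_cmd_done() calls (demo only) */
};

/* Stand-in for the completion call made on a cancelled command. */
static void demo_cmd_done(struct demo_queue *q, void *cmd)
{
	(void)cmd;
	q->completions++;
}

/* Cancel path: take the pointer under the lock, complete outside it. */
static void demo_cancel_cmd(struct demo_queue *q)
{
	void *cmd;

	pthread_mutex_lock(&q->cancel_lock);
	cmd = q->io.cmd;
	q->io.cmd = NULL;
	pthread_mutex_unlock(&q->cancel_lock);

	if (cmd)
		demo_cmd_done(q, cmd);
}

/* Reset path: clear cmd and flags while holding the same lock. */
static void demo_queue_reinit(struct demo_queue *q)
{
	pthread_mutex_lock(&q->cancel_lock);
	q->io.cmd = NULL;
	q->io.flags = 0;
	pthread_mutex_unlock(&q->cancel_lock);
}
```

Because the cancel path clears the pointer before dropping the lock,
a second cancel or a later reset can never hand the same command to
the completion call twice.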

Fixes: 216c8f5ef0f2 ("ublk: replace monitor with cancelable uring_cmd")
Signed-off-by: Ming Lei <tom.leiming@gmail.com>
Link: https://patch.msgid.link/20260508123746.242018-1-tom.leiming@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>

drm: Set old handle to NULL before prime swap in change_handle

There was a potential race condition in change_handle. The ioctl
briefly had a single object with two idr entries; a concurrent
gem_close could delete the object and remove one of the handles
while leaving the other one dangling, which could subsequently
be dereferenced for a use-after-free.

To fix this, do the same dance that gem_close itself does.
(f6cd7daecff5 drm: Release driver references to handle before making it available again)
First idr_replace the old handle to NULL. Later, if the prime
operations are successful, actually close it.

create_tail required a similar dance to avoid a similar problem.
(bd46cece51a3 drm/gem: Fix race in drm_gem_handle_create_tail())
It idr_allocs the new handle with NULL, then swaps in the correct
object later to avoid races. We don't need to do that here, since
the only operations that could race are drm_prime, and
change_handle holds the prime lock for the entire duration.

v2: cleanups of error paths

Signed-off-by: David Francis <David.Francis@amd.com>
Co-authored-by: Dave Airlie <airlied@gmail.com>
Reported-by: Puttimet Thammasaeng <pwn8official@gmail.com>
Tested-by: Vitaly Prosyak <Vitaly.Prosyak@amd.com>
Cc: Simona Vetter <simona@ffwll.ch>
Cc: stable@vger.kernel.org
Cc: Christian Koenig <Christian.Koenig@amd.com>
Fixes: 53096728b8910 ("drm: Add DRM prime interface to reassign GEM handle")
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge tag 'amd-drm-fixes-7.1-2026-05-06' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-7.1-2026-05-06:

amdgpu:
- GFX9 fixes
- Hawaii SMU fixes
- SDMA4 fix
- GART fix
- Userq fixes

amdkfd:
- GPUVM TLB flush fix
- Hotplug fix

radeon:
- Hawaii SMU fixes

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260506154631.1733034-1-alexander.deucher@amd.com

Merge tag 'drm-misc-fixes-2026-05-07' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

bochs:
- fix managed cleanup

bridge:
- tda998x: fix sparse warnings on type correctness

etnaviv:
- schedule armed jobs

exynos:
- managed bridge cleanup

fb-helper:
- fix clipping

ivpu:
- disallow reexport of GEM buffer objects

nouveau:
- revert support for GA100

panel:
- boe-tv101wum-nl6: use correct MIPI_DSI mode
- feiyang-fy07024di26a30d: fix error reporting
- himax-hx83102: use correct MIPI_DSI mode
- himax-hx83121a: fix error checks
- himax-hx83121a: select DRM_DISPLAY_DSC_HELPER

qaic:
- fix RAS message handling

qxl:
- clean up polling

sti:
- managed bridge cleanup

ttm:
- update GPU MM stats on pool shrinking

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260507115213.GA206508@linux.fritz.box

Merge tag 'selinux-pr-20260507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux

Pull selinux fixes from Paul Moore:

- Allow for multiple opens of /sys/fs/selinux/policy

   Prevent a single process from blocking others from reading the
   SELinux policy loaded in the kernel. This does have the side effect
   of potentially allowing userspace to trigger additional kernel memory
   allocations as part of the open/read operation, but this is mitigated
   by requiring the SELinux security/read_policy permission.

- Reduce the critical sections where the SELinux policy mutex is held

   This includes the patch to the policy loader code where we move the
   permission checks and an allocation outside the mutex as well as the
   patch to checkreqprot which drops the code/lock entirely.

   While the checkreqprot code had effectively been dropped in an
   earlier release, portions of the code still remained that would have
   triggered the mutex to perform an IMA measurement. This finally drops
   all of that while preserving the user visible behavior.

- Eliminate potential sources of log spamming

   There were a few areas where processes could flood the system logs
   and hide other, more critical events. The previously disabled
   checkreqprot and runtime disable knobs in selinuxfs were two such
   areas that have now been greatly simplified and a pr_err() replaced
   with a pr_err_once().

   The third such place is the /sys/fs/selinux/user file, which hasn't
   been used by a userspace release since 2020 and was scheduled for
   removal after 2025; this effectively disables this functionality, but
   similar to checkreqprot, it is done in a way that should not break
   old userspace.

* tag 'selinux-pr-20260507' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux:
  selinux: shrink critical section in sel_write_load()
  selinux: allow multiple opens of /sys/fs/selinux/policy
  selinux: prune /sys/fs/selinux/user
  selinux: prune /sys/fs/selinux/disable
  selinux: prune /sys/fs/selinux/checkreqprot

hwmon: (ads7871) Fix endianness bug in 16-bit register reads

The ads7871_read_reg16() function relies on spi_w8r16() to read the
16-bit sensor output. The ADS7871 device transmits the Least Significant
Byte (LSB) first.

On Little-Endian architectures, spi_w8r16() correctly reconstructs the
16-bit value. However, on Big-Endian architectures, the byte swapping
causes the first received byte (LSB) to be placed in the most significant
byte of the u16, resulting in corrupted voltage readings.

To fix this, cast the integer result of spi_w8r16() to a restricted
__le16 type and convert it to the host CPU's native byte order using
le16_to_cpu(). Negative error codes returned by the SPI core are caught
and returned prior to the conversion to avoid mangling the error status.
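
For illustration, a userspace analogue of the conversion: the sketch
below rebuilds the value from the two wire bytes with explicit shifts,
which gives the same result on any host byte order. It does not use
the kernel's __le16 type or le16_to_cpu(); the demo_* helpers and the
status parameter are hypothetical.

```c
#include <stdint.h>

/* Assemble a 16-bit value from two bytes received LSB first.
 * Explicit shifts give the same result on little- and big-endian hosts. */
static uint16_t demo_le16_from_wire(const uint8_t buf[2])
{
	return (uint16_t)(buf[0] | ((uint16_t)buf[1] << 8));
}

/* Hypothetical wrapper: a negative status is an error code and is
 * returned untouched, before any byte-order handling. */
static int demo_read_reg16(int status, const uint8_t buf[2])
{
	if (status < 0)
		return status;
	return demo_le16_from_wire(buf);
}
```

With wire bytes 0x34 then 0x12, the helper yields 0x1234 regardless of
the CPU, and an error status such as -5 passes through unchanged.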

Reported-by: Sashiko <sashiko-bot@kernel.org>
Closes: https://sashiko.dev/#/patchset/20260418034601.90226-1-tabreztalks@gmail.com
Fixes: e0c70b8078629 ("hwmon: add TI ads7871 a/d converter driver")
Suggested-by: David Laight <david.laight.linux@gmail.com>
Signed-off-by: Tabrez Ahmed <tabreztalks@gmail.com>
Link: https://lore.kernel.org/r/20260502020844.110038-2-tabreztalks@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

hwmon: (lm75) Fix configuration register writes.

Sensors configurations are defined by set and clear masks. These
do not follow a consistent "clear mask is a superset of set mask"
rule. This relaxed definition breaks lm75_write_config():

  static inline int lm75_write_config(struct lm75_data *data, u16 set_mask,
                                      u16 clr_mask)
  {
          return regmap_update_bits(data->regmap, LM75_REG_CONF,
                                    clr_mask | LM75_SHUTDOWN, set_mask);
  }

Basically all bits from set_mask that are not defined in clr_mask are
dropped. Fix that by enhancing the helper to always combine clr_mask
and set_mask into the mask bits of regmap_update_bits().
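
A small model of the masking behaviour, assuming the usual
read-modify-write semantics of regmap_update_bits(), where only bits
covered by the mask are written. The demo_* functions and the
DEMO_SHUTDOWN value are hypothetical; the "fixed" variant is a sketch
of the idea described above, not the actual patch.

```c
#include <stdint.h>

#define DEMO_SHUTDOWN 0x0001u  /* hypothetical stand-in for LM75_SHUTDOWN */

/* Model of an update-bits helper: only bits set in mask are changed. */
static uint16_t demo_update_bits(uint16_t old, uint16_t mask, uint16_t val)
{
	return (uint16_t)((old & ~mask) | (val & mask));
}

/* Old behaviour: set_mask bits that are not in clr_mask are dropped. */
static uint16_t demo_write_config_old(uint16_t reg, uint16_t set_mask,
				      uint16_t clr_mask)
{
	return demo_update_bits(reg, clr_mask | DEMO_SHUTDOWN, set_mask);
}

/* Sketch of the fix: the mask always covers both clr_mask and set_mask. */
static uint16_t demo_write_config_fixed(uint16_t reg, uint16_t set_mask,
					uint16_t clr_mask)
{
	return demo_update_bits(reg, clr_mask | set_mask | DEMO_SHUTDOWN,
				set_mask);
}
```

With set_mask = 0x00c0 and an empty clr_mask, the old variant leaves
the register at 0 while the fixed variant writes 0x00c0.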

Fixes: 6da24a25f766 ("hwmon: (lm75) Hide register size differences in regmap access functions")
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://lore.kernel.org/r/20260502173207.3567876-3-markus.stockhausen@gmx.de
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

hwmon: (lm75) Fix AS6200 and TMP112 setup and alarm handling

The initialization of the AS6200 has two shortcomings:

- The device-add-commit states "Conversion mode: continuous" but
  the lm75_params structure uses set_mask = 0x94c0. This activates
  single shot mode (bit 15). According to the datasheet "The device
  features a single shot measurement mode if the device is in sleep
  mode (SM=1)". This is quite contradictory.
- It is the only device that activates polarity active-high (bit 10)

All this is paired with an undefined clear mask bug in function
lm75_write_config() that was introduced with a later refactoring
commit.

  [as6200] = {
          .config_reg_16bits = true,
          .set_mask = 0x94C0,
          -> .clr_mask not defined here
          .default_resolution = 12,
  ...
  static inline int lm75_write_config(struct lm75_data *data, u16 set_mask,
                                      u16 clr_mask)
  {
          return regmap_update_bits(data->regmap, LM75_REG_CONF,
                                    clr_mask | LM75_SHUTDOWN, set_mask);
  }

regmap_update_bits() only writes the bits covered by its mask argument,
so clr_mask must be a superset of set_mask. As a result, all sensors
with "wrong" masks like the AS6200 are not initialized as intended.

Fix that by

- Change the set_mask to 0xc010 to reflect the current active-low
  setup properly and to drive the sensor in continuous mode. This
  takes into account that the config register is little endian and
  the first byte sent to the chip is the LSB.
- Adapt the alarm handling so it can report the alarm correctly
  even if it is active-high. This is done by comparing config register
  bits 5 and 10 (translated to 2 and 13).

This commit does not introduce any ABI breakage as the multiple bugs
effectively drive the AS6200 in standard active-low mode.

Fixes: 4b6358e1fe46 ("hwmon: (lm75) Add AMS AS6200 temperature sensor")
Suggested-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Markus Stockhausen <markus.stockhausen@gmx.de>
Link: https://lore.kernel.org/r/20260502173207.3567876-2-markus.stockhausen@gmx.de
[groeck: Update set_mask for as6200 further: As modeled, the upper bits
contain the conversion rate, so the config register needs to be set to
0xc010 instead of 0x10c0 to reflect 8 samples/s and 4 consecutive faults.
Fix the same problem for TMP112.]
Signed-off-by: Guenter Roeck <linux@roeck-us.net>

Merge tag 'drm-xe-fixes-2026-05-07' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

UAPI Changes:

Cross-subsystem Changes:

Core Changes:

Driver Changes:
- Add NULL check for media_gt in intel_hdcp_gsc_check_status (Gustavo)
- Fix EAGAIN sign in pf_migration_consume (Shuicheng)
- Fix MMIO access using PF view instead of VF view during migration (Shuicheng)
- Exclude indirect ring state page from ADS engine state size (Satya)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Matthew Brost <matthew.brost@intel.com>
Link: https://patch.msgid.link/afw5lsrjE4pStEml@gsse-cloud1.jf.intel.com

Merge tag 'drm-rust-fixes-2026-05-07' of https://gitlab.freedesktop.org/drm/rust/kernel into drm-fixes

DRM Rust fixes for v7.1-rc3

- Fix unsound initialization in drm::Device::new(); if pinned
  initialization of drm::Device::Data fails, make sure
  drm::Device::release() isn't called, so we don't run the data's
  destructor

- Fix missing GEM state cleanup in the init failure case; call
  drm_gem_private_object_fini() if drm_gem_object_init() fails

- Fix wrong ARef import in the DRM shmem GEM helper abstraction

- Replace the nouveau mailing list with the new nova-gpu mailing list
  for both nova-core and nova-drm, and remove unused patchwork entries

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: "Danilo Krummrich" <dakr@kernel.org>
Link: https://patch.msgid.link/DIBZJ40ZC4J3.Y1DLA7JTS2PC@kernel.org

Merge tag 'drm-intel-fixes-2026-05-06' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Re-enable ccs modifiers on dg2 (Juha-Pekka Heikkila)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://patch.msgid.link/aftSjG1D0-hKISDy@linux

smb: client: validate dacloffset before building DACL pointers

parse_sec_desc(), build_sec_desc(), and the chown path in
id_mode_to_cifs_acl() all add the server-supplied dacloffset to pntsd
before proving a DACL header fits inside the returned security
descriptor.

On 32-bit builds a malicious server can return dacloffset near
U32_MAX, wrap the derived DACL pointer below end_of_acl, and then slip
past the later pointer-based bounds checks. build_sec_desc() and
id_mode_to_cifs_acl() can then dereference DACL fields from the wrapped
pointer in the chmod/chown rewrite paths.

Validate dacloffset numerically before building any DACL pointer and
reuse the same helper at the three DACL entry points.
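
A minimal sketch of what validating an offset numerically can look
like, as opposed to forming base + offset first and comparing
pointers. The helper name and parameters are hypothetical and not
taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true only if a header of hdr_len bytes starting
 * at offset fits inside a descriptor of desc_len bytes. The comparison
 * uses a subtraction that cannot wrap, so an offset near UINT32_MAX is
 * rejected before any pointer is derived from it. */
static bool demo_offset_fits(uint32_t offset, size_t hdr_len, size_t desc_len)
{
	if (hdr_len > desc_len)
		return false;
	return offset <= desc_len - hdr_len;
}
```

For a 64-byte descriptor and an 8-byte header, offsets 0 through 56
are accepted, while 57 and UINT32_MAX are both refused.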

Fixes: bc3e9dd9d104 ("cifs: Change SIDs in ACEs while transferring file ownership.")
Cc: stable@vger.kernel.org
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: fix out-of-bounds read in smb2_compound_op()

If a server sends a truncated response but a large OutputBufferLength, and
terminates the EA list early, check_wsl_eas() returns success without
validating that the entire OutputBufferLength fits within iov_len.

Then smb2_compound_op() does:
memcpy(idata->wsl.eas, data[0], size[0]);

Where size[0] is OutputBufferLength. If iov_len is smaller than size[0],
memcpy can read beyond the end of the rsp_iov allocation and leak adjacent
kernel heap memory.
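
The missing check can be illustrated with a bounded copy in plain C.
This is a sketch under stated assumptions: demo_copy_checked() and its
parameters are hypothetical, claimed_len plays the role of
OutputBufferLength and avail_len the bytes actually present in the
response buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bounded copy: refuse to copy when the length claimed by
 * the peer exceeds what was received or what the destination can hold. */
static int demo_copy_checked(void *dst, size_t dst_len,
			     const void *src, size_t avail_len,
			     size_t claimed_len)
{
	if (claimed_len > avail_len || claimed_len > dst_len)
		return -1;
	memcpy(dst, src, claimed_len);
	return 0;
}
```

A response that claims 8 bytes but only carries 4 is rejected and the
destination is left untouched.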

Link: https://lore.kernel.org/linux-cifs/d998240c-aca9-420d-9dbd-f5ba24af19e0@chenxiaosong.com/
Fixes: ea41367b2a60 ("smb: client: introduce SMB2_OP_QUERY_WSL_EA")
Cc: stable@vger.kernel.org
Signed-off-by: Zisen Ye <zisenye@stu.xidian.edu.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>

smb/client: fix out-of-bounds read in symlink_data()

Since smb2_check_message() returns success without length validation for
the symlink error response, in symlink_data() it is possible for
iov->iov_len to be smaller than sizeof(struct smb2_err_rsp). If the buffer
only contains the base SMB2 header (64 bytes), accessing
err->ErrorContextCount (at offset 66) or err->ByteCount later in
symlink_data() will cause an out-of-bounds read.
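
To make the offsets concrete, the sketch below uses a hypothetical
structure whose layout is modeled only on the numbers quoted above (a
64-byte header followed by the error fields). It is not the real
struct smb2_err_rsp definition, and demo_can_read_err_fields() is an
illustrative name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: 64-byte base header, then the error fields. */
struct demo_err_rsp {
	uint8_t  hdr[64];
	uint16_t structure_size;       /* offset 64 */
	uint8_t  error_context_count;  /* offset 66 */
	uint8_t  reserved;
	uint32_t byte_count;           /* offset 68 */
};

/* The error fields may only be read if the whole fixed part arrived. */
static bool demo_can_read_err_fields(size_t iov_len)
{
	return iov_len >= sizeof(struct demo_err_rsp);
}
```

A buffer holding only the 64-byte header fails the check, so the
fields at offsets 66 and 68 are never touched.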

Link: https://lore.kernel.org/linux-cifs/297d8d9b-adf7-42fd-a1c2-5b1f230032bc@chenxiaosong.com/
Fixes: 76894f3e2f71 ("cifs: improve symlink handling for smb2+")
Cc: Stable@vger.kernel.org
Signed-off-by: Zisen Ye <zisenye@stu.xidian.edu.cn>
Reviewed-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Signed-off-by: Steve French <stfrench@microsoft.com>