git.ipfire.org Git - thirdparty/linux.git/log

]> git.ipfire.org Git - thirdparty/linux.git/log

projects / thirdparty / linux.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

KP Singh [Mon, 1 Jun 2026 15:02:47 +0000 (17:02 +0200)]

selftests/bpf: Adjust verifier_map_ptr for the map's excl field

Adding the u32 excl field at offset 32 of struct bpf_map right after the
sha[SHA256_DIGEST_SIZE] hash shifts the ops pointer from offset 32 to 40.
Therefore, fix up the test case.

  # LDLIBS=-static PKG_CONFIG='pkg-config --static' ./vmtest.sh -- ./test_progs -t verifier_map_ptr
  [...]
  #637/1   verifier_map_ptr/bpf_map_ptr: read with negative offset rejected:OK
  #637/2   verifier_map_ptr/bpf_map_ptr: read with negative offset rejected @unpriv:OK
  #637/3   verifier_map_ptr/bpf_map_ptr: write rejected:OK
  #637/4   verifier_map_ptr/bpf_map_ptr: write rejected @unpriv:OK
  #637/5   verifier_map_ptr/bpf_map_ptr: read non-existent field rejected:OK
  #637/6   verifier_map_ptr/bpf_map_ptr: read non-existent field rejected @unpriv:OK
  #637/7   verifier_map_ptr/bpf_map_ptr: read ops field accepted:OK
  #637/8   verifier_map_ptr/bpf_map_ptr: read ops field accepted @unpriv:OK
  [...]
  Summary: 2/18 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: KP Singh <kpsingh@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260601150248.394863-7-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Borkmann [Mon, 1 Jun 2026 15:02:46 +0000 (17:02 +0200)]

libbpf: Skip max_entries override on signed loaders

bpf_gen__map_create() lets the host-supplied loader ctx override a
map's max_entries at runtime (map_desc[idx].max_entries, when non-zero).
This is how the light skeleton sizes maps to the target machine, but
it happens after emit_signature_match() and is covered by neither the
signed loader instructions nor the hashed blob.

For a signed loader this means an untrusted host can re-dimension the
program's maps, outside what the signature attests to. Gate the override
on gen_hash so signed loaders use the signer-provided max_entries baked
into the blob.

Fixes: ea923080c145 ("libbpf: Embed and verify the metadata hash in the loader")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260601150248.394863-6-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Borkmann [Mon, 1 Jun 2026 15:02:45 +0000 (17:02 +0200)]

libbpf: Skip initial_value override on signed loaders

bpf_gen__map_update_elem() emits code that, when the host-supplied
loader ctx provides a non-NULL map_desc[idx].initial_value, overwrites
the blob value with bytes read from the host (bpf_copy_from_user /
bpf_probe_read_kernel) before the BPF_MAP_UPDATE_ELEM that populates
the program's .data/.rodata/.bss maps.

This override runs after emit_signature_match() has validated map->sha[],
and initial_value is part of neither the signed loader instructions nor
the hashed data blob. For a signed loader this lets an untrusted host
substitute global-variable contents into a program whose code carries
a valid signature, thus weakening what the signature attests to.

The blob already contains the signer-provided value (added via add_data()
and covered by the embedded, signed hash), so simply skip emitting the
override for signed loaders (gen_hash). Runtime initialization stays
available for the unsigned light-skeleton path as before. The jump
offsets within the override block are internal to it, so guarding the
whole block leaves them unchanged.

Fixes: ea923080c145 ("libbpf: Embed and verify the metadata hash in the loader")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260601150248.394863-5-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

KP Singh [Mon, 1 Jun 2026 15:02:44 +0000 (17:02 +0200)]

libbpf: Reject non-exclusive metadata maps in the signed loader

The loader verifies map->sha against the metadata hash in its
instructions. map->sha is calculated when BPF_OBJ_GET_INFO_BY_FD is
called on the frozen map.

While the map is frozen, the /signed loader/ must also ensure the map
is exclusive, as, without exclusivity (which a hostile host could just
omit when loading the loader), another BPF program with map access can
mutate the contents afterwards, so the check passes on stale data.

With the extra check as part of the signed loader, it now refuses to
move on with map->sha validation if the host set it up wrongly.

Fixes: fb2b0e290147 ("libbpf: Update light skeleton for signing")
Signed-off-by: KP Singh <kpsingh@kernel.org>
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260601150248.394863-4-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Borkmann [Mon, 1 Jun 2026 15:02:43 +0000 (17:02 +0200)]

bpf: Drop redundant hash_buf from map_get_hash operation

bpf_map_get_info_by_fd() is the only caller of the ->map_get_hash
and always invokes it with hash_buf == map->sha and hash_buf_size
of SHA256_DIGEST_SIZE. array_map_get_hash() in turn lets sha256()
write the digest directly into that buffer (map->sha) and then
performs a trailing memcpy(), which evaluates to memcpy(map->sha,
map->sha, 32): a redundant self-copy. The hash_buf_size argument
was never used at all. Simplify this a bit, no functional change.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260601150248.394863-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Borkmann [Mon, 1 Jun 2026 15:02:42 +0000 (17:02 +0200)]

bpf: Reject exclusive maps as inner maps in map-in-map

An exclusive map (created with excl_prog_hash) is bound to a single
program by hash: check_map_prog_compatibility() refuses to load any
program whose digest does not match map->excl_prog_sha. That check
only runs for maps a program references directly, i.e. its used_maps.
A map reached at runtime through a map-of-maps is never in used_maps,
and bpf_map_meta_equal() does not consider excl_prog_sha, so an
exclusive map can be inserted into a non-exclusive outer map and
then looked up and mutated by an unrelated program, bypassing the
exclusivity guarantee.

For the signed loader this defeats the metadata map exclusivity check
added in the signed loader: the cached map->sha[] is validated against
the signed hash while another program on a hostile host rewrites the
frozen map's contents through the outer map.

Fixes: baefdbdf6812 ("bpf: Implement exclusive map creation")
Reported-by: sashiko <sashiko@sashiko.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260601150248.394863-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Tue, 2 Jun 2026 01:31:42 +0000 (18:31 -0700)]

Merge branch 'refactor-verifier-object-relationship-tracking'

Amery Hung says:

====================
Refactor verifier object relationship tracking

Hi all,

This patchset cleans up dynptr handling, refactors object relationship
tracking in the verifier by introducing parent_id and folding ref_obj_id
into id, and fixes dynptr use-after-free bugs where file/skb dynptrs
are not invalidated when the parent referenced object is freed.

* Motivation *

In BPF qdisc programs, an skb can be freed through kfuncs. However,
since dynptr does not track the parent referenced object (e.g., skb),
the verifier does not invalidate the dynptr after the skb is freed,
resulting in use-after-free. The same issue also affects file dynptr.

The figure below shows the current state of object tracking. The
verifier tracks objects using three fields: id for nullness tracking,
ref_obj_id for lifetime tracking, and dynptr_id for tracking the parent
dynptr of a slice (PTR_TO_MEM only). While dynptr_id links slices to
their parent dynptr, there is no field that links a dynptr back to its
parent skb. When the skb is freed via release_reference(ref_obj_id=1),
only objects with ref_obj_id=1 are invalidated. Since skb dynptr is
non-referenced (ref_obj_id=0), the dynptr and its derived slices remain
accessible.

Current: object (id, ref_obj_id, dynptr_id)
  id         = unique id of the object (for nullness tracking)
  ref_obj_id = id of the referenced object (for lifetime tracking)
  dynptr_id  = id of the parent dynptr (only for PTR_TO_MEM slices)

                      skb (0,1,0)
                             ^^
                          ! No link from dynptr to skb !
                             |+------------------------------+
                             |           bpf_dynptr_clone    |
                 dynptr A (2,0,0)                dynptr C (4,0,0)
                           ^                               ^
        bpf_dynptr_slice   |                               |
                           |                               |
              slice B (3,0,2)                 slice D (5,0,4)

* Why not simply use ref_obj_id to track the parent? *

A natural first approach is to link dynptr to its parent by sharing
the parent's ref_obj_id and propagating it to slices. Now, releasing
the skb via release_reference(ref_obj_id=1) correctly invalidates all
derived objects.

Attempted fix: share parent's ref_obj_id

                      skb (0,1,0)
                             ^^
                             ||
                             |+------------------------------+
                             |           bpf_dynptr_clone    |
                 dynptr A (2,1,0)                dynptr C (4,1,0)
                           ^                               ^
        bpf_dynptr_slice   |                               |
                           |                               |
              slice B (3,1,2)                 slice D (5,1,4)

However, this approach does not generalize to all dynptr types.
Referenced dynptrs such as file dynptr acquire their own ref_obj_id to
track the dynptr's lifetime. Since ref_obj_id is already used for the
dynptr's own reference, it cannot also be used to point to the parent
file object. While it is possible to add specialized handling for
individual dynptr types [0], it adds complexity and does not generalize.

An alternative approach is to avoid introducing a new field and instead
repurpose ref_obj_id as parent_id by folding lifetime tracking into id
[1]. In this design, each object is represented as (id, ref_obj_id)
where id is used for both nullness and lifetime tracking, and ref_obj_id
tracks the parent object's id.

Attempted: object (id, ref_obj_id)
  id         = id of the object (for nullness and lifetime tracking)
  ref_obj_id = id of the parent object
  '          = id is referenced

                        skb (1',0)
                             ^^
                             ||
        bpf_dynptr_from_skb  |+------------------------------+
                             |      bpf_dynptr_clone(A, C)   |
                 dynptr A (2,1')                 dynptr C (4,1')
                           ^                               ^
        bpf_dynptr_slice   |                               |
                           |                               |
                slice B (3,2)                   slice D (5,4)

However, this design cannot express the relationship between referenced
socket pointers and their casted counterparts. After pointer casting,
the original and casted pointers need the same lifetime (same ref_obj_id
in the current design) but different nullness (different id). The casted
pointer may be NULL even if the original is valid. With id serving as
the only field for both nullness and lifetime, and ref_obj_id repurposed
as parent, there is no way to express "different identity, same
lifetime."

Referenced socket pointer (expressed using current design):

                                C = ptr_casting_function(A)
                ptr A (1,1,0)                     ptr C (2,1,0)
                         ^                                 ^
                         |                                 |
                        ptr C may be NULL even if ptr A is valid
                        but they have the same lifetime

* New Design: parent_id with branch splitting and intermediate reference *

The patchset folds ref_obj_id into id and adds parent_id to
bpf_reg_state (patch 5). A child object's parent_id points to the
parent object's id. This replaces the PTR_TO_MEM-specific dynptr_id.
Whether a register is referenced is determined by checking if its id
appears in the reference array via reg_is_referenced() rather than
reading a dedicated ref_obj_id field.

Pointer casting:

The challenge with pointer casting is that a cast result may be NULL
even when the source is valid, requiring distinct identity but shared
lifetime. This is solved using branch splitting: when a helper like
bpf_sk_fullsock() is called with a referenced pointer, the verifier
pushes an explicit NULL branch and assigns the cast result the same id
as the source. Since the cast may return NULL for a non-NULL input, the
NULL case is explored as a separate verifier branch. This allows
releasing any of the original or cast pointers to invalidate all others,
while avoiding the need for a separate tracking mechanism.

Referenced dynptrs:

The challenge with referenced dynptrs is that clones of a referenced
dynptr have the same lifetime but different identities. When a
referenced dynptr is overwritten, only slices derived from it will be
invalidated. To solve this, the verifier creates an intermediate
reference. This reference serves as a shared lifetime anchor for the
dynptr and all its clones. All clones share the same parent_id but get
unique ids for independent slice tracking. Releasing a referenced dynptr
releases the intermediate reference, which in turn invalidates all
clones and their derived slices. If the parent object is released while
the intermediate reference still exists, it is reported as a leaked
reference.

Release cascading:

When releasing an object, release_reference() performs a stack-based DFS
to invalidate all descendants. It walks the object tree via parent_id
links, invalidating registers and dynptr stack slots. Child references
encountered during traversal are reported as leaked references.

parent_id is also added to bpf_reference_state to enable intermediate
reference. When acquiring a reference, a parent_id can be specified to
link the new reference to an existing one (e.g., file dynptr's
intermediate reference has parent_id linking to the file's reference).

Final: object (id, parent_id)
  id        = unique id of the object (for nullness and lifetime
              tracking)
  parent_id = id of the parent object (for object relationship
              tracking)
  I         = intermediate reference serving as lifetime anchor in
              acquired_refs
  '         = id is referenced (appears in reference array)

                          skb (1',0)
                               ^^
                               ||
          bpf_dynptr_from_skb  |+------------------------------+
                               |      bpf_dynptr_clone(A, C)   |
                   dynptr A (2,1')                 dynptr C (4,1')
                             ^                               ^
          bpf_dynptr_slice   |                               |
                             |                               |
                  slice B (3,2)                   slice D (5,4)

* Preserving reg->id after null-check *

For parent_id tracking to work, child objects need to refer to the
parent's id. This requires two preparatory changes: assigning reg->id
when reading referenced kptrs from program context (patch 3), and
preserving reg->id of pointer objects after null-check (patch 4).
Previously, null-check would clear reg->id, making it impossible for
children to reference the parent afterward. The latter causes a slight
increase in verified states for some programs. One selftest object
sees +19 states (+5.01%). For Meta BPF objects, the increase is
also minor, with the largest being +34 states (+3.63%).

* Object relationship in different scenarios (for reference) *

The figures below show how the final design handles all four
combinations of referenced/non-referenced dynptr with
referenced/non-referenced parent.

(1) Non-referenced dynptr with referenced parent (e.g., skb in Qdisc):

                          skb (1',0)
                               ^^
                               ||
          bpf_dynptr_from_skb  |+------------------------------+
                               |      bpf_dynptr_clone(A, C)   |
                   dynptr A (2,1')                 dynptr C (4,1')

                         dynptr A and C live independently

(2) Non-referenced dynptr with non-referenced parent (e.g., skb in TC,
    always valid):

      bpf_dynptr_from_skb
                                  bpf_dynptr_clone(A, C)
             dynptr A (1,0)                    dynptr C (2,0)

                         dynptr A and C live independently

(3) Referenced dynptr with referenced parent:

                     file (1',0)
                           ^
     bpf_dynptr_from_file  |
                     I (2',1')  <-- intermediate reference
                        ^^
                        ||
                        |+-------------------------------+
                        |       bpf_dynptr_clone(A, C)   |
            dynptr A (3,2')                  dynptr C (4,2')

                        dynptr A and C have the same lifetime

  Releasing either dynptr releases I, invalidating both.
  Releasing file (1') detects I as a leaked reference.

(4) Referenced dynptr with non-referenced parent:

bpf_ringbuf_reserve_dynptr
                     I (1',0)  <-- intermediate reference
                        ^^
                        ||
                        |+--------------------------------+
                        |       bpf_dynptr_clone(A, C)    |
            dynptr A (2,1')                   dynptr C (3,1')

                      dynptr A and C have the same lifetime

[0] https://lore.kernel.org/bpf/20250414161443.1146103-2-memxor@gmail.com/
[1] https://github.com/ameryhung/bpf/commits/obj_relationship_v2_no_parent_id/

Changelog:

v5 -> v6
  - Squash "bpf: Fold ref_obj_id into id and introduce virtual references"
    (v5 patch 9) into "bpf: Refactor object relationship tracking and
    fix dynptr UAF bug" (now patch 5). ref_obj_id is removed in the same
    patch that introduces parent_id, eliminating the intermediate state
    where both coexist (Eduard)
  - Drop virtual references for pointer casting. Instead, cast results
    reuse the source pointer's id and use branch splitting to explore
    the NULL case as a separate verifier branch. This avoids adding
    virtual reference infrastructure for a case that can be handled more
    simply (Eduard, Andrii)
  - Address nit from Eduard
Link: https://lore.kernel.org/bpf/20260519181314.2731658-1-ameryhung@gmail.com/
v4 -> v5
  - Add patch 9 folding ref_obj_id into id and introducing virtual
    references for pointer casting and referenced dynptr clones (Eduard, Andrii)
  - Add patch 10 fixing dynptr ref counting to scan all call frames
    instead of only the current frame (Eduard)
  - Add utility function validate_ref_obj() (Eduard)
Link: https://lore.kernel.org/bpf/20260506142709.2298255-1-ameryhung@gmail.com/
v3 -> v4
  - Add patch 1 clean up mark_stack_slot_obj_read() and callers
    (to address v3 ignoring err returned from mark_dynptr_read) (Andrii)
  - Fix release_reference() and move the logic allowing destroying a
    referenced object when refcnt > 1 from
    destroy_if_stack_slots_dynptr() to release_reference() (Mykyta)
  - Add patch 7 introducing ref_obj_desc and unifying ref_obj handling
    (to address Eduard's concern about unclear meta->{id,ref_obj_id}
    initialization/use and confusing function arguments of
    process_dynptr_func())
  - Add patch 8 unifying release_regno handling so that bpf_kptr_xchg
    also use release_reference()
Link: https://lore.kernel.org/bpf/20260421221016.2967924-1-ameryhung@gmail.com/
v2 -> v3
  - Rebase to bpf-next/master
  - Update veristat numbers
  - Update commit msg to explain multiple dropped checks (Mykyta, Andrii)
  - Reuse idmap as idstack in release_reference() and check for
    duplicate id (Mykyta, Andrii)
  - Change to use RUN_TEST for qdisc dynptr selftest (Eduard)
Link: https://lore.kernel.org/bpf/20260307064439.3247440-1-ameryhung@gmail.com/
v1 -> v2
  - Redesign: Use object (id, ref_obj_id, parent_id) instead of
    (id, ref_obj_id) as it cannot express ptr casting without
    introducing specialized code to handle the case
  - Use stack-based DFS to release objects to avoid recursion (Andrii)
  - Keep reg->id after null check
  - Add dynptr cleanup
  - Fix dynptr kfunc arg type determination
  - Add a file dynptr UAF selftest
Link: https://lore.kernel.org/bpf/20260202214817.2853236-1-ameryhung@gmail.com/
---
====================

Link: https://patch.msgid.link/20260529014936.2811085-1-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:36 +0000 (18:49 -0700)]

selftests/bpf: Test using dynptr after freeing the underlying object

Make sure the verifier invalidates the dynptr and dynptr slice derived
from an skb after the skb is freed.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-14-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:35 +0000 (18:49 -0700)]

selftests/bpf: Test using file dynptr after the reference on file is dropped

File dynptr and slice should be invalidated when the parent file's
reference is dropped in the program. Without the verifier tracking
dyntpr's parent referenced object, the dynptr would continute to be
incorrectly used even if the underlying file is being tear down or gone.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-13-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:34 +0000 (18:49 -0700)]

selftests/bpf: Test using slice after invalidating dynptr clone

The parent object of a cloned dynptr is skb not the original dynptr.
Invalidate the original dynptr should not prevent the program from
using the slice derived from the cloned dynptr.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-12-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:33 +0000 (18:49 -0700)]

selftests/bpf: Test creating dynptr from dynptr data and slice

The verifier currently does not allow creating dynptr from dynptr data
or slice. Add a selftest to test this explicitly.

Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-11-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:32 +0000 (18:49 -0700)]

bpf: Fix dynptr ref counting to scan all call frames

When checking whether a referenced dynptr can be overwritten,
destroy_if_dynptr_stack_slot only counted sibling dynptrs in the
current call frame. If a clone sharing the same virtual ref parent
existed in a different frame (e.g., passed to a subprog), it would
not be counted, causing the verifier to incorrectly reject the
overwrite with "cannot overwrite referenced dynptr".

Fix by extracting the counting into dynptr_ref_cnt() which uses
bpf_for_each_reg_in_vstate_mask() to scan dynptr stack slots across
all call frames.

Fixes: 017f5c4ef73c ("bpf: Allow overwriting referenced dynptr when refcnt > 1")
Reported-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-10-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:31 +0000 (18:49 -0700)]

bpf: Unify release handling for helpers and kfuncs

Introduce release_reg() to consolidate the release logic shared by both
helpers and kfuncs: dynptr release, kptr_xchg percpu-to-RCU conversion,
regular reference release, and NULL pass-through. NULL pass-through is
only allowed if the prototype indicates the argument may be null.

Determine release_regno from the function prototype/metadata before
argument checking, rather than discovering it dynamically during
argument processing. For helpers, scan the arg_type array in
check_func_proto() via check_proto_release_reg(). For kfuncs, set
release_regno to BPF_REG_1 in bpf_fetch_kfunc_arg_meta() when
KF_RELEASE is set. In the future when we start adding decl_tag to
kfunc arguments, we can just look at the function prototype instead
of a release_regno.

Extract ref_convert_alloc_rcu_protected() and
invalidate_rcu_protected_refs() to make it more clear what the code is
doing. For ref_convert_alloc_rcu_protected(), it pre-converts
MEM_ALLOC | MEM_PERCPU registers to MEM_RCU (clearing id so they
survive), then calls release_reference() to invalidate the remaining
registers and release the reference state.

Add KF_RELEASE to bpf_dynptr_file_discard() so its release_regno is set
via fetch_kfunc_meta rather than being assigned manually in the dynptr
argument processing. Set arg_type to ARG_PTR_TO_DYNPTR for
KF_ARG_PTR_TO_DYNPTR so that check_func_arg_reg_off() correctly allows
non-zero stack offsets for dynptr release arguments same as helper.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-9-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:30 +0000 (18:49 -0700)]

bpf: Unify referenced object tracking in verifier

Helpers and kfuncs independently tracked referenced object metadata
using standalone id fields in their respective arg_meta structs.
This led to duplicated logic and inconsistent error handling between the
two paths.

Introduce struct ref_obj_desc to consolidate id and parent_id along with
a count of how many arguments carry a reference. Add update_ref_obj() to
populate it from a bpf_reg_state, replacing open-coded assignments in
check_func_arg(), check_kfunc_args(), and process_iter_arg(). Add
validate_ref_obj() to check for ambiguous ref_obj before using it.

For ref_obj releasing helpers and kfuncs, keep checking it before
calling update_ref_obj() for now. A later patch will make these
functions not depending on ref_obj. For other users of ref_obj, move the
checks to the use locations. For helper, this means moving the checks
inside helper_multiple_ref_obj_use() to use locations.
is_acquire_function() is dropped as ref_obj is never used.

Pass ref_obj_desc into process_dynptr_func()/mark_stack_slots_dynptr()
instead of a bare parent_id to make it less confusing.

Drop the selftest introduced in 7ec899ac90a2 ("selftests/bpf: Negative
test case for ref_obj_id in args") since the verifier no longer
complains about ambiguous ref_obj if it is not used.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-8-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:29 +0000 (18:49 -0700)]

bpf: Remove redundant dynptr arg check for helper

unmark_stack_slots_dynptr() already makes sure that CONST_PTR_TO_DYNPTR
cannot be released. process_dynptr_func() also prevents passing
uninitialized dynptr to helpers expecting initialized dynptr. Now that
unmark_stack_slots_dynptr() also reports error returned from
release_reference(), there should be no reason to keep these redundant
checks.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-7-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:28 +0000 (18:49 -0700)]

bpf: Refactor object relationship tracking and fix dynptr UAF bug

Refactor object relationship tracking in the verifier and fix a dynptr
use-after-free bug where file/skb dynptrs are not invalidated when the
parent referenced object is freed.

Add parent_id to bpf_reg_state to precisely track child-parent
relationships. A child object's parent_id points to the parent object's
id. This replaces the PTR_TO_MEM-specific dynptr_id.

Remove ref_obj_id from bpf_reg_state by folding its role into the
existing id field. Previously, id tracked pointer identity for null
checking while ref_obj_id tracked the owning reference for lifetime
management. These are now unified: acquire helpers and kfuncs set id
to the acquired reference id, and release paths use id directly.

Add reg_is_referenced() which checks if a register is referenced by
looking up its id in the reference array. This replaces all former
ref_obj_id checks.

For release_reference(), invalidating an object now also invalidates
all descendants by traversing the object tree. This is done using
stack-based DFS to avoid recursive call chains of release_reference() ->
unmark_stack_slots_dynptr() -> release_reference(). Referenced objects
encountered during tree traversal are reported as leaked references.

Add parent_id to bpf_reference_state to enable hierarchical reference
tracking. When acquiring a reference, a parent_id can be specified to
link the new reference to an existing one (e.g., referenced dynptrs
acquire a reference with parent_id linking to the parent object's
reference).

Pointer casting:

For pointer casting helpers (bpf_sk_fullsock, bpf_tcp_sock), instead of
propagating ref_obj_id, the cast result reuses the same reference id as
the source pointer. Since the cast may return NULL for a non-NULL input,
the NULL case is explored as a separate verifier branch. This allows
releasing any of the original or cast pointers to invalidate all others.

Referenced dynptrs:

When constructing a referenced dynptr, acquire a intermediate reference
with parent_id linking to the parent referenced object. The dynptr and
all clones share the same parent_id (pointing to the intermediate ref)
but get unique ids for independent slice tracking. Releasing a
referenced dynptr releases the parent reference, which in turn
invalidates all clones and their derived slices.

Owning to non-owning reference conversion:

After converting owning to non-owning by clearing id (e.g.,
object(id=1) -> object(id=0)), the verifier releases the reference
state via release_reference_nomark().

Note that the error message "reference has not been acquired before" in
the helper and kfunc release paths is removed. This message was already
unreachable. The verifier only calls release_reference() after
confirming the reference is valid, so the condition could never trigger
in practice.

Fixes: 870c28588afa ("bpf: net_sched: Add basic bpf qdisc kfuncs")
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-6-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:27 +0000 (18:49 -0700)]

bpf: Preserve reg->id of pointer objects after null-check

Preserve reg->id of pointer objects after null-checking the register so
that children objects derived from it can still refer to it in the new
object relationship tracking mechanism introduced in a later patch. This
change incurs a slight increase in the number of states in one selftest
bpf object, rbtree_search.bpf.o. For Meta bpf objects, the increase of
states is also negligible.

Selftest BPF objects with insns_diff > 0

Program                   Insns (A)  Insns (B)  Insns   (DIFF)  States (A)  States (B)  States (DIFF)
------------------------  ---------  ---------  --------------  ----------  ----------  -------------
rbtree_search                  6820       7326   +506 (+7.42%)         379         398   +19 (+5.01%)

Meta BPF objects with insns_diff > 0

Program                   Insns (A)  Insns (B)  Insns   (DIFF)  States (A)  States (B)  States (DIFF)
------------------------  ---------  ---------  --------------  ----------  ----------  -------------
ned_imex_be_tclass               52         57     +5 (+9.62%)           5           6   +1 (+20.00%)
ned_imex_be_tclass               52         57     +5 (+9.62%)           5           6   +1 (+20.00%)
ned_skop_auto_flowlabel         523        526     +3 (+0.57%)          39          40    +1 (+2.56%)
ned_skop_mss                    289        292     +3 (+1.04%)          20          20    +0 (+0.00%)
ned_skopt_bet_classifier         78         82     +4 (+5.13%)           8           8    +0 (+0.00%)
dctcp_update_alpha              252        320   +68 (+26.98%)          21          27   +6 (+28.57%)
dctcp_update_alpha              252        320   +68 (+26.98%)          21          27   +6 (+28.57%)
ned_ts_func                     119        126     +7 (+5.88%)           6           7   +1 (+16.67%)
tw_egress                      1119       1128     +9 (+0.80%)          95          96    +1 (+1.05%)
tw_ingress                     1128       1137     +9 (+0.80%)          95          96    +1 (+1.05%)
tw_tproxy_router               4380       4465    +85 (+1.94%)         114         118    +4 (+3.51%)
tw_tproxy_router4              3093       3170    +77 (+2.49%)          83          88    +5 (+6.02%)
ttls_tc_ingress               34656      35717  +1061 (+3.06%)         936         970   +34 (+3.63%)
tw_twfw_egress               222327     222338    +11 (+0.00%)       10563       10564    +1 (+0.01%)
tw_twfw_ingress               78295      78299     +4 (+0.01%)        3825        3826    +1 (+0.03%)
tw_twfw_tc_eg                222839     222859    +20 (+0.01%)       10584       10585    +1 (+0.01%)
tw_twfw_tc_in                 78295      78299     +4 (+0.01%)        3825        3826    +1 (+0.03%)
tw_twfw_egress                 8080       8085     +5 (+0.06%)         456         456    +0 (+0.00%)
tw_twfw_ingress                8053       8056     +3 (+0.04%)         454         454    +0 (+0.00%)
tw_twfw_tc_eg                  8154       8174    +20 (+0.25%)         456         457    +1 (+0.22%)
tw_twfw_tc_in                  8060       8063     +3 (+0.04%)         455         455    +0 (+0.00%)
tw_twfw_egress               222327     222338    +11 (+0.00%)       10563       10564    +1 (+0.01%)
tw_twfw_ingress               78295      78299     +4 (+0.01%)        3825        3826    +1 (+0.03%)
tw_twfw_tc_eg                222839     222859    +20 (+0.01%)       10584       10585    +1 (+0.01%)
tw_twfw_tc_in                 78295      78299     +4 (+0.01%)        3825        3826    +1 (+0.03%)
tw_twfw_egress                 8080       8085     +5 (+0.06%)         456         456    +0 (+0.00%)
tw_twfw_ingress                8053       8056     +3 (+0.04%)         454         454    +0 (+0.00%)
tw_twfw_tc_eg                  8154       8174    +20 (+0.25%)         456         457    +1 (+0.22%)
tw_twfw_tc_in                  8060       8063     +3 (+0.04%)         455         455    +0 (+0.00%)

Looking into rbtree_search, the reason for such increase is that the
verifier has to explore the main loop shown below for one more iteration
until state pruning decides the current state is safe.

long rbtree_search(void *ctx)
{
...
bpf_spin_lock(&glock0);
rb_n = bpf_rbtree_root(&groot0);
while (can_loop) {
if (!rb_n) {
bpf_spin_unlock(&glock0);
return __LINE__;
}

n = rb_entry(rb_n, struct node_data, r0);
if (lookup_key == n->key0)
break;
if (nr_gc < NR_NODES)
gc_ns[nr_gc++] = rb_n;
if (lookup_key < n->key0)
rb_n = bpf_rbtree_left(&groot0, rb_n);
else
rb_n = bpf_rbtree_right(&groot0, rb_n);
}
...
}

Below is what the verifier sees at the start of each iteration
(65: may_goto) after preserving id of rb_n. Without id of rb_n, the
verifier stops exploring the loop at iter 16.

           rb_n  gc_ns[15]
iter 15    257   257

iter 16    290   257    rb_n: idmap add 257->290
                        gc_ns[15]: check 257 != 290 --> state not equal

iter 17    325   257    rb_n: idmap add 290->325
                        gc_ns[15]: idmap add 257->257 --> state safe

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-5-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:26 +0000 (18:49 -0700)]

bpf: Assign reg->id when getting referenced kptr from ctx

Assign reg->id when getting referenced kptr from read program context
to be consistent with R0 of KF_ACQUIRE kfunc. skb dynptr will track the
referenced skb in qdisc programs using a new field reg->parent_id in
a later patch.

Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-4-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:25 +0000 (18:49 -0700)]

bpf: Unify dynptr handling in the verifier

Simplify dynptr checking for helper and kfunc by unifying it. Remember
the initialized dynptr (i.e.,g !(arg_type |= MEM_UNINIT)) pass to a
dynptr kfunc during process_dynptr_func() so that we can easily
retrieve the information for verification later. By saving it in
meta->dynptr, there is no need to call dynptr helpers such as
dynptr_id(), dynptr_ref_obj_id() and dynptr_type() in check_func_arg().

Remove and open code the helpers in process_dynptr_func() when
saving id, ref_obj_id, and type.

Besides, since dynptr ref_obj_id information is now pass around in
meta->bpf_dynptr_desc, drop the check in helper_multiple_ref_obj_use.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-3-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Amery Hung [Fri, 29 May 2026 01:49:24 +0000 (18:49 -0700)]

bpf: Simplify mark_stack_slot_obj_read() and callers

Rename mark_stack_slot_obj_read() as mark_stack_slots_scratched() and
directly call it from functions processing iter, dynptr and irq_flag.
Commit 6762e3a0bce5 ("bpf: simplify liveness to use (callsite, depth)
keyed func_instances") has removed the dynamic liveness component in
mark_stack_slot_obj_read(). The function effectively only marks stack
slots as scratched and always succeed. Therefore, return void, drop the
unused bpf_reg_state argument and rename it to
mark_stack_slots_scratched() to reflect what it does now.

In addition, to prepare for unifying dynptr handling, dynptr_get_spi()
will be moved out of mark_dynptr_read(). As mark_dynptr_read() would join
mark_iter_read() as a thin wrapper of mark_stack_slots_scratched(), just
open code these helpers.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Amery Hung <ameryhung@gmail.com>
Link: https://lore.kernel.org/r/20260529014936.2811085-2-ameryhung@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Paul Moore [Sat, 23 May 2026 16:00:26 +0000 (12:00 -0400)]

bpf: Fix security_bpf_prog_load() error handling

If security_bpf_prog_load() fails there is no need to call into
security_bpf_prog_free() as the LSM will handle the cleanup of any partial
LSM state before returning to the caller with an error. Thankfully this
isn't an issue with any of the existing code as the LSMs which currently
provide BPF hook callback implementations don't allocate any internal
state, but this is something we want to fix for potential future users.

Signed-off-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/20260523160025.16363-2-paul@paul-moore.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Taegu Ha [Thu, 28 May 2026 06:21:55 +0000 (15:21 +0900)]

bpf: reject overlarge global subprog argument sizes

Global subprogram argument checking derives generic pointer sizes from BTF
and passes the resolved size to check_mem_reg() as a u32. The access-size
validation path then uses a signed int, and stack pointers negate the value
before calling check_helper_mem_access().

This creates a wrap when BTF describes a pointee size larger than S32_MAX.
For example, a global subprogram argument of type:

  int (*p)[0x3fffffff]

has a BTF-resolved pointee size of 0xfffffffc bytes. At a call site the
caller can pass a pointer to a 4-byte stack slot at fp-4. The current
PTR_TO_STACK path computes:

  size = -(int)mem_size

so 0xfffffffc becomes -4 as a signed int and the negation validates only
a 4-byte stack range. That range is covered by the caller's stack slot,
so the call is accepted.

The callee is then verified independently with R1 as PTR_TO_MEM and
mem_size 0xfffffffc. A small instruction such as:

  r0 = *(u32 *)(r1 + 4)

is accepted as being inside that BTF-described memory region. At run time,
however, the actual argument value is still fp-4, so r1 + 4 addresses fp+0,
outside the 4-byte object that the caller provided.

Reject sizes that cannot be represented by the verifier's signed
access-size API before the stack-specific negation. Add a verifier
regression test for the oversized BTF argument.

Fixes: 2cb27158adb3 ("bpf: poison dead stack slots")
Signed-off-by: Taegu Ha <hataegu0826@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260528062155.3988156-1-hataegu0826@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Mon, 1 Jun 2026 00:49:21 +0000 (17:49 -0700)]

Merge branch 'bpf-arm64-stack-argument-fixes'

Puranjay Mohan says:

====================
bpf, arm64: Stack argument fixes

Patch 1 fixes a redundant MOV in the arm64 JIT's
emit_stack_arg_store_imm() and clarifies the stack layout comments. This
is not a bug fix but an improvement.

Patch 2 bumps the stack argument tests from 6-8 args to at least 10 so
they actually exercise the native stack on arm64, where x0-x7 cover the
first 8 arguments.
====================

Link: https://patch.msgid.link/20260528161750.1900674-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Puranjay Mohan [Thu, 28 May 2026 16:17:48 +0000 (09:17 -0700)]

selftests/bpf: Use at least 10 args in stack argument tests

On arm64, the first 8 arguments are passed in registers (x0-x7), so
tests with 8 or fewer arguments never exercise the native stack argument
path in the JIT. Increase argument counts to at least 10 across all
BPF-to-BPF subprog and kfunc stack argument tests so that at least 2
arguments land on the arm64 stack.

For the two-callees test, bump foo1 from 8 to 10 and foo2 from 10 to 12
args to preserve the different-stack-depth flavor of the test.

The bpf_kfunc_call_stack_arg_mem kfunc is left unchanged at 7 args to
avoid breaking the precision backtracking test which relies on hardcoded
verifier log instruction indices.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20260528161750.1900674-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Puranjay Mohan [Thu, 28 May 2026 16:17:47 +0000 (09:17 -0700)]

bpf, arm64: Fix redundant MOV and clarify stack arg comments

emit_stack_arg_store_imm() materializes the immediate into tmp and
then moves tmp to the target register (x5-x7). Emit the immediate
directly into the target register to avoid the redundant MOV.

While here, qualify the bare "FP" in the stack-layout ASCII art as
"A64_FP" so it is not confused with BPF_FP, and note that incoming
stack arguments sit above the FP/LR pair pushed by the callee
prologue.

Suggested-by: Will Deacon <will@kernel.org>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260528161750.1900674-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Borkmann [Fri, 29 May 2026 16:28:29 +0000 (18:28 +0200)]

libbpf: Skip endianness swap when loader generation failed

bpf_gen__prog_load() byte-swaps the program insns and the {func,line}_info
and CO-RE relo blobs in place for cross-endian targets. The blob offsets
come from add_data(), which returns 0 on failure: realloc_data_buf() either
frees and NULLs gen->data_start (realloc OOM) or returns early on an
already-latched gen->error, leaving a stale, possibly too-small buffer.

Neither bswap site checked for this. With gen->swapped_endian set and a
failed generation, "gen->data_start + off" becomes NULL + 0. Guard the
same way via !gen->error so they are skipped once generation has failed.

Fixes: 8ca3323dce43 ("libbpf: Support creating light skeleton of either endianness")
Reported-by: sashiko <sashiko@sashiko.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260529162829.315921-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Borkmann [Fri, 29 May 2026 09:41:18 +0000 (11:41 +0200)]

libbpf: Also reset {insn,data}_cur on realloc failure

realloc_insn_buf() as well as realloc_data_buf() free and NULL
gen->insn_start / gen->data_start on -ENOMEM but leave gen->insn_cur /
gen->data_cur pointing into the old, freed buffer. Just reset the
cursors to NULL alongside the base pointers so the freed state is
coherent.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260529094119.307264-3-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Borkmann [Fri, 29 May 2026 09:41:17 +0000 (11:41 +0200)]

libbpf: Skip hash computation when loader generation failed

bpf_gen__finish() calls compute_sha_update_offsets() gated only on
the gen_hash option, without first consulting gen->error. On a failed
generation this is buggy: a failed realloc_data_buf() sets gen->data_start
to NULL (leaving gen->data_cur dangling), so compute_sha_update_offsets()
runs libbpf_sha256() over a NULL buffer with a bogus length; a failed
realloc_insn_buf() likewise sets gen->insn_start to NULL and the hash
immediates get patched through that NULL base.

The computed program is discarded in either case, since the following
"if (!gen->error)" block does not publish opts->insns once an error is
set. Thus, skip the hash pass when generation has already failed.

Fixes: ea923080c145 ("libbpf: Embed and verify the metadata hash in the loader")
Reported-by: sashiko <sashiko@sashiko.dev>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260529094119.307264-2-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Borkmann [Fri, 29 May 2026 09:41:16 +0000 (11:41 +0200)]

libbpf: Drop redundant self-loop in emit_check_err

When the cleanup-label jump offset does not fit in s16, emit_check_err()
sets gen->error = -ERANGE and then emits a BPF_JMP_IMM(BPF_JA, 0, 0, -1)
self-loop.

The latter emit() is dead: gen->error is assigned on the preceding line,
and emit() then bails out early in realloc_insn_buf() the moment gen->error
is set, so the jump is never written into the instruction stream.

gen->error alone already marks the generation as failed. This is a follow-up
to 7dd62566e0d1 ("libbpf: fix off-by-one in emit_signature_match jump offset")
which removed the jump in emit_signature_match() but not in other locations.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260529094119.307264-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Martin KaFai Lau [Fri, 29 May 2026 20:39:09 +0000 (13:39 -0700)]

bpf: Update bpf maintainers

I am making a life change and will take a long break
from my current work, so I will step down from the "M:" responsibility.

I am currently a "R:" in "BPF [GENERAL]", this part stays unchanged.
I am folding most of the parts into "BPF [GENERAL]".

For "BPF [BTF]", it is long overdue as I am no longer involved.
It is folded into the "BPF [GENERAL]".

The "BPF [STORAGE & CGROUPS]" will also be covered by "BPF [GENERAL]".

For struct_ops, its usage is no longer limited to networking,
so this naturally should move back to "BPF [GENERAL]".

For the reuseport, it will continue to be maintained together
by "BPF [GENERAL]" and the "NETWORKING [SOCKETS]".

For other "BPF [NETWORKING]...", I am moving myself to "R:".

Thanks!

Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/r/20260529203909.1222164-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Sun, 31 May 2026 16:16:55 +0000 (09:16 -0700)]

Merge branch 'bpf-align-syscall-writeback-behavior-with-user-declared-size'

Yuyang Huang says:

====================
bpf: Align syscall writeback behavior with user-declared size

This series fixes an out-of-bounds write vulnerability in BPF_PROG_QUERY
while maintaining backward compatibility for older userspace applications.

BPF_PROG_QUERY unconditionally writes back the 'query.revision' field
to userspace. If userspace passes a smaller 'bpf_attr' structure (e.g. 40
bytes, which was the cgroup query layout before 'query.revision' was
added), the kernel performs an out-of-bounds write.

We address this by propagating the user-provided 'uattr_size' down to
the cgroup query handlers and conditionally skipping the write-back of
'query.revision' if the buffer is too small. This allows legacy cgroup
queries to succeed safely.

tcx and netkit queries are left unchanged since they were introduced in
the same merge window as 'query.revision' and have no legacy callers.

Finally, we add a selftest to verify these boundary behaviors.

Changes since v2:
- Propagate uattr_size to __cgroup_bpf_query() and conditionally write
  revision (instead of unconditionally rejecting smaller sizes in front-gate).
- Update BPF selftests to verify that cgroup queries succeed with
  OLD_QUERY_SIZE without writing revision, and succeed with FULL_QUERY_SIZE.
- Remove early size checks in the front-gate to keep the patch minimal.

Changes since v1:
- Simplify the kernel fix to checking the size only in bpf_prog_query().
- Revert all other subsystem query plumbing changes.
- Update BPF selftest to target BPF_CGROUP_INET_INGRESS cgroup query, and
  add verification for attr size boundaries.
====================

Link: https://patch.msgid.link/20260531075600.4058207-1-yuyanghuang@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Yuyang Huang [Sun, 31 May 2026 07:56:00 +0000 (15:56 +0800)]

selftests/bpf: add verification for BPF_PROG_QUERY attr size boundaries

Add a new selftest to verify that the BPF syscall (specifically
BPF_PROG_QUERY) correctly handles different user-declared attribute sizes.

Specifically, verify that:
- For cgroup queries, a query with a size that covers 'prog_cnt' but is
  smaller than 'revision' (OLD_QUERY_SIZE) succeeds, but does not write
  to 'revision' (verifying backward compatibility).
- A query with full size (FULL_QUERY_SIZE) succeeds and writes both
  'prog_cnt' and 'revision'.

Fixes: 120933984460 ("bpf: Implement mprog API on top of existing cgroup progs")
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Link: https://lore.kernel.org/r/20260531075600.4058207-3-yuyanghuang@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Yuyang Huang [Sun, 31 May 2026 07:55:59 +0000 (15:55 +0800)]

bpf: fix BPF_PROG_QUERY OOB write and cgroup backward compat

BPF_PROG_QUERY writes back the 'query.revision' field unconditionally to
userspace. If userspace passes a smaller 'bpf_attr' structure (e.g. 40
bytes, which was the layout before the addition of 'query.revision'),
the kernel performs an out-of-bounds write.

Fix this by propagating the user-provided attribute size 'uattr_size'
down to the cgroup query handlers, and conditionally skipping writing
the revision field to userspace when the provided buffer size is
insufficient.

query.revision in bpf_mprog_query is structurally identical to the
cgroup case: a late tail field, written unconditionally.

But the backward-compat hazard is not the same.

The min-historical-size test is per command, and bpf_mprog_query only
serves attach types that were born with revision in the struct:

- tcx_prog_query -> BPF_TCX_INGRESS/EGRESS
- netkit_prog_query -> BPF_NETKIT_PRIMARY/PEER

tcx, netkit, the revision field, and bpf_mprog_query itself all landed in
the same v6.6 merge window (053c8e1f235d added the mprog query API +
revision; tcx in e420bed02507, netkit in 35dfaad7188c). There has never
been a tcx/netkit BPF_PROG_QUERY userspace that doesn't know about
revision. So for these commands the minimum legitimate struct already
covers offset 56-64 — no old binary can be broken here.

Contrast with cgroup: BPF_PROG_QUERY on cgroup attach types shipped in
2017; revision write-back was bolted on years later (120933984460). That
path has a real population of pre-revision callers.

Fixes: 120933984460 ("bpf: Implement mprog API on top of existing cgroup progs")
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Yuyang Huang <yuyanghuang@google.com>
Link: https://lore.kernel.org/r/20260531075600.4058207-2-yuyanghuang@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Suchit Karunakaran [Sun, 24 May 2026 02:58:53 +0000 (08:28 +0530)]

bpf: replace pop/push emptiness check with bpf_list_empty()

Simplify fq_flows_is_empty() by replacing the pop/push based emptiness
check with a direct call to bpf_list_empty().
This avoids unnecessary list mutation and simplifies the code while
preserving correctness.

Signed-off-by: Suchit Karunakaran <suchitkarunakaran@gmail.com>
Changes since v1:
- Removed unused variable node
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260524025853.13786-1-suchitkarunakaran@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Leon Hwang [Thu, 21 May 2026 14:29:09 +0000 (22:29 +0800)]

bpf: Fix race between bpf_map_new_fd() and close_fd()

Because there is time gap between bpf_map_new_fd() and close_fd(), a
concurrent thread is able to close the new fd and opens a new, unrelated
file with the exact same fd number. Thereafter, this close_fd() might
inadvertently close the unrelated file.

To avoid such regression, do finalize log before security_bpf_map_create().

However, in order to achieve it, move bpf_get_file_flag(),
security_bpf_map_create(), bpf_map_alloc_id(), and bpf_map_new_fd() from
__map_create() to map_create(). And, rename __map_create() to
map_create_alloc() meanwhile.

Then, in order to reuse the map and token when all checks pass in
map_create_alloc(), pass "struct bpf_map **" and "struct bpf_token **" to
map_create_alloc().

Fixes: 49f9b2b2a18c ("bpf: Add syscall common attributes support for map_create")
Signed-off-by: Leon Hwang <leon.hwang@linux.dev>
Link: https://lore.kernel.org/r/20260521142909.95818-1-leon.hwang@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Andrii Nakryiko [Thu, 28 May 2026 21:58:14 +0000 (14:58 -0700)]

Merge branch 'bpf-implement-stack_map_get_build_id_offset_sleepable'

Ihor Solodrai says:

====================
bpf: Implement stack_map_get_build_id_offset_sleepable()

The series introduces stack_map_get_build_id_offset_sleepable(),
fixing a gap with parsing build_id in sleepable context in stackmap.c

In particular, this fixes a deadlock in
stack_map_get_build_id_offset() doing a blocking __kernel_read(),
which happens since commit 777a8560fd29 ("lib/buildid: use
__kernel_read() for sleepable context").

See previous revisions for more details.
---

v6->v7:
  * Addressed feedback from Andrii (mostly patch #2):
    * implement proper CONFIG_PER_VMA_LOCK=n support, following a
      VMA locking pattern similar to one used in PROCMAP_QUERY
    * change the contract of stack_map_lock_vma(): if a non-NULL VMA
      is returned, then a read lock is held
      * remove now unnecessary vma_locked flag
    * and various other nits
  * Add vma_is_anonymous() checks where appropriate (AIs)
v6: https://lore.kernel.org/bpf/20260521225022.2695755-1-ihor.solodrai@linux.dev/

v5->v6:
  * Misc refactoring (Andrii):
    * add stack_map_build_id_set_valid() helper
    * simplify control flow in stack_map_get_build_id_offset_sleepable()
v5: https://lore.kernel.org/bpf/20260515005244.1333013-1-ihor.solodrai@linux.dev/

v4->v5:
  * Add comments explaining mmap_read_trylock() (Shakeel)
  * Rebase on bpf-next (Alexei)
v4: https://lore.kernel.org/bpf/20260514184727.1067141-1-ihor.solodrai@linux.dev/

v3->v4:
  * Change Fixes tag in patch #2 (AI)
  * Nit in caching implementation (Mykyta)
v3: https://lore.kernel.org/bpf/20260512032906.2670326-1-ihor.solodrai@linux.dev/

v2->v3:
  * Split patch #2 in two: stack_map_get_build_id_offset_sleepable()
    implementation, and then introduce caching
  * Drop taking mmap_lock if CONFIG_PER_VMA_LOCK=n, fall back to raw
    IPs instead
  * Cache vm_{start,end} in addition to prev_file (Mykyta)
v2: https://lore.kernel.org/bpf/20260409010604.1439087-1-ihor.solodrai@linux.dev/

v1->v2:
  * Addressed feedback from Puranjay:
    * split out a small refactoring patch
    * use mmap_read_trylock()
    * take into account CONFIG_PER_VMA_LOCK
    * replace find_vma() with vma_lookup()
    * cache prev_build_id to avoid re-parsing the same file
  * Snapshot vm_pgoff and vm_start before unlocking (AI)
  * To avoid repetitive unlocking statements, introduce struct
    stack_map_vma_lock to hold relevant lock state info and add an
    unlock helper
v1: https://lore.kernel.org/bpf/20260407223003.720428-1-ihor.solodrai@linux.dev/

---
====================

Link: https://patch.msgid.link/20260525223948.1920986-1-ihor.solodrai@linux.dev
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>

commit | commitdiff | tree

Ihor Solodrai [Mon, 25 May 2026 22:39:48 +0000 (15:39 -0700)]

bpf: Cache build IDs in sleepable stackmap path

Stack traces often contain adjacent IPs from the same VMA or from
different VMAs backed by the same ELF file. Cache the last successfully
parsed build id together with the resolved VMA range and backing file
so the sleepable build id path can avoid repeated VMA locking and file
parsing in common cases.

Suggested-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260525223948.1920986-4-ihor.solodrai@linux.dev

commit | commitdiff | tree

Ihor Solodrai [Mon, 25 May 2026 22:39:47 +0000 (15:39 -0700)]

bpf: Avoid faultable build ID reads under mm locks

Sleepable build ID parsing can block in __kernel_read() [1], so the
stackmap sleepable path must not call it while holding mmap_lock or a
per-VMA read lock.

The issue and the fix are conceptually similar to a recent procfs
patch [2]. A similar VMA locking pattern has already been used in
PROCMAP_QUERY [3].

Resolve each covered VMA with a stable read-side reference, preferring
lock_vma_under_rcu() and falling back to mmap_read_trylock() only long
enough to acquire the VMA read lock. Take a reference to the backing
file, drop the VMA lock, and then parse the build ID through
(sleepable) build_id_parse_file().

We have to use mmap_read_trylock() (and give up on failure) in this
context because taking mmap_read_lock() is generally unsafe on code
paths reachable from BPF programs [4], and may lead to deadlocks.

[1] https://lore.kernel.org/all/20251218005818.614819-1-shakeel.butt@linux.dev/
[2] https://lore.kernel.org/all/20260128183232.2854138-1-andrii@kernel.org/
[3] https://lore.kernel.org/all/20250808152850.2580887-1-surenb@google.com/
[4] https://lore.kernel.org/bpf/2895ecd8-df1e-4cc0-b9f9-aef893dc2360@linux.dev/

Fixes: d4dd9775ec24 ("bpf: wire up sleepable bpf_get_stack() and bpf_get_task_stack() helpers")
Suggested-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260525223948.1920986-3-ihor.solodrai@linux.dev

commit | commitdiff | tree

Ihor Solodrai [Mon, 25 May 2026 22:39:46 +0000 (15:39 -0700)]

bpf: Factor out stack_map build ID helpers

Factor out helpers from stack_map_get_build_id_offset() in
preparation for adding a sleepable build ID resolution path:
stack_map_build_id_set_ip(), stack_map_build_id_offset(), and
stack_map_build_id_set_valid().

While here, refactor stack_map_get_build_id_offset():
  * use continue-driven control flow in the main loop and remove
    build_id_valid label
  * update prev_vma and prev_build_id on the fall-back-to-IP branch so
    the cache reflects the actual VMA seen on the previous IP [1]
  * guard fetch_build_id() with vma_is_anonymous() [2] to skip parse
    attempts that would otherwise fail the ELF magic check

[1] https://lore.kernel.org/bpf/CAEf4Bzac9uWWqBvzH0iFzKvJcq3vxscZ3pKm0sUHmN-F-z9wVQ@mail.gmail.com/
[2] https://lore.kernel.org/bpf/226398c1ff3f2b686c0aeb010408d85fb15df13f9ff60a045bee31e79b9e41e9@mail.kernel.org/

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/bpf/20260525223948.1920986-2-ihor.solodrai@linux.dev

commit | commitdiff | tree

Tiezhu Yang [Tue, 26 May 2026 06:39:36 +0000 (14:39 +0800)]

libbpf: Add __NR_bpf definition for LoongArch

LoongArch uses the generic syscall table, where __NR_bpf is defined
as 280 in include/uapi/asm-generic/unistd.h.

To align with other architectures, add the __NR_bpf definition for
LoongArch to avoid a potential compilation failure: "error __NR_bpf
not defined. libbpf does not support your arch."

This is a follow up patch of:

  commit b0c47807d31d ("bpf: Add sparc support to tools and samples.")
  commit bad1926dd2f6 ("bpf, s390: fix build for libbpf and selftest suite")
  commit ca31ca8247e2 ("tools/bpf: fix perf build error with uClibc (seen on ARC)")
  commit e32cb12ff52a ("bpf, mips: Fix build errors about __NR_bpf undeclared")

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260526063936.16769-1-yangtiezhu@loongson.cn

commit | commitdiff | tree

Carlos Llamas [Sat, 23 May 2026 16:27:21 +0000 (16:27 +0000)]

libbpf: Fix UAF in strset__add_str()

strset_add_str_mem() might reallocate the strset data buffer in order to
accommodate the provided string 's'. However, if 's' points to a string
already present in the buffer, it becomes dangling after the realloc.
This leads to a use-after-free when attempting to memcpy() the string
into the new buffer.

One scenario that triggers this problematic path is when resolve_btfids
attempts to patch kfunc prototypes using existing BTF parameter names:

| resolve_btfids: function bpf_list_push_back_impl already exists in BTF
| Segmentation fault (core dumped)

Compiling resolve_btfids with fsanitize=address generates a detailed
report of the UAF:

| =================================================================
| ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4
| ==1507892==ERROR: AddressSanitizer: heap-use-after-free on address 0x7f4c4a500bd4 at pc 0x55d25155a2a8 bp 0x7ffcef879060 sp 0x7ffcef878818
| READ of size 5 at 0x7f4c4a500bd4 thread T0
|     #0 0x55d25155a2a7 in memcpy (tools/bpf/resolve_btfids/resolve_btfids+0xcf2a7)
|     #1 0x55d2515d708e in strset__add_str tools/lib/bpf/strset.c:162:2
|     #2 0x55d2515c730b in btf__add_str tools/lib/bpf/btf.c:2109:8
|     #3 0x55d2515c9020 in btf__add_func_param tools/lib/bpf/btf.c:3108:14
|     #4 0x55d25159f0b5 in process_kfunc_with_implicit_args tools/bpf/resolve_btfids/main.c:1196:9
|     #5 0x55d25159e004 in btf2btf tools/bpf/resolve_btfids/main.c:1229:9
|     #6 0x55d25159cee7 in main tools/bpf/resolve_btfids/main.c:1535:6
|     #7 0x7f4c78e29f76 in __libc_start_call_main csu/../sysdeps/nptl/libc_start_call_main.h:58:16
|     #8 0x7f4c78e2a026 in __libc_start_main csu/../csu/libc-start.c:360:3
|     #9 0x55d2514bb860 in _start (tools/bpf/resolve_btfids/resolve_btfids+0x30860)
|
| 0x7f4c4a500bd4 is located 13268 bytes inside of 2829000-byte region [0x7f4c4a4fd800,0x7f4c4a7b02c8)
| freed by thread T0 here:
|     #0 0x55d25155b700 in realloc (tools/bpf/resolve_btfids/resolve_btfids+0xd0700)
|     #1 0x55d2515c426c in libbpf_reallocarray tools/lib/bpf/./libbpf_internal.h:220:9
|     #2 0x55d2515c426c in libbpf_add_mem tools/lib/bpf/btf.c:224:13
|
| previously allocated by thread T0 here:
|     #0 0x55d25155b2e3 in malloc (tools/bpf/resolve_btfids/resolve_btfids+0xd02e3)
|     #1 0x55d2515d6e7d in strset__new tools/lib/bpf/strset.c:58:20

While resolve_btfids could be refactored to avoid this call path, let's
instead fix this issue at the source in strset__add_str() and avoid
similar scenarios.

Let's check if set->strs_data was reallocated and whether 's' points to
an internal string within the old strset buffer. In such case, 's' is
reconstructed to point to the new buffer.

While already here, also fix strset__find_str() which suffers from the
same problem by factoring out the common operations into a new helper
function strset_str_append().

Fixes: 90d76d3ececc ("libbpf: Extract internal set-of-strings datastructure APIs")
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Suggested-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20260523162722.2718940-1-cmllamas@google.com

commit | commitdiff | tree

Siddharth Nayyar [Wed, 20 May 2026 09:40:44 +0000 (09:40 +0000)]

bpftool: Fix typo in struct_ops map FD generation for light skeleton

When generating light skeletons for BPF programs containing struct_ops
maps, bpftool incorrectly outputs a stray literal 't' instead of a tab
character for the map file descriptor member in the links structure.
This causes a compilation error when the generated light skeleton is
used.

Correct the format string by replacing 't' with '\t'.

Fixes: 08ac454e258e ("libbpf: Auto-attach struct_ops BPF maps in BPF skeleton")
Signed-off-by: Siddharth Nayyar <sidnayyar@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/bpf/20260520-struct_ops_gen_typo_fix-v1-1-4dee3771da46@google.com

commit | commitdiff | tree

Michael Bommarito [Fri, 22 May 2026 20:13:53 +0000 (16:13 -0400)]

libbpf: Harden parse_vma_segs() path parsing

parse_vma_segs() in tools/lib/bpf/usdt.c parses /proc/<pid>/maps
with two widthless scansets, "%s" into mode[16] and "%[^\n]"
into line[4096]. A VMA name in maps is not limited to that local
buffer; a deeply nested backing path can produce a maps record long
enough to overflow the stack buffer.

Bound both scansets to the declared buffer sizes ("%15s" for mode[16]
and "%4095[^\n]" for line[4096]) and drain any residue past line[4094]
with "%*[^\n]" before the trailing "\n". Without the drain, the residue
of an over-long record would stay in the stream and break the next
"%zx-%zx" parse, so the loop would exit early and silently skip later
maps records.

Also stop using sscanf(..., "%s") to peel the /proc/<pid>/root prefix
from lib_path. Parse the pid and prefix length with "%n", check for the
following slash, and copy the remainder with libbpf_strlcpy(). That
removes a second unbounded stack write and preserves paths containing
spaces.

Fixes: 74cc6311cec9 ("libbpf: Add USDT notes parsing and resolution logic")
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/bpf/20260522201353.1454653-1-michael.bommarito@gmail.com

commit | commitdiff | tree

Tejun Heo [Wed, 27 May 2026 19:26:32 +0000 (09:26 -1000)]

bpf: Fix bpf_arena_handle_page_fault() redefinition without CONFIG_BPF_SYSCALL

On configs with CONFIG_BPF=y but CONFIG_BPF_SYSCALL=n (e.g. arm
multi_v7_defconfig), kernel/bpf/core.c defines a __weak
bpf_arena_handle_page_fault() while bpf_defs.h already supplies a static
inline stub for it, causing a redefinition error. Build the __weak
definition only under CONFIG_BPF_SYSCALL, matching the bpf_defs.h
declaration and the CONFIG_BPF_SYSCALL-gated strong definition in arena.c.

Fixes: dc11a4dba246 ("bpf: Recover arena kernel faults with scratch page")
Reported-by: Mark Brown <broonie@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20260527192632.2109419-1-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Mon, 25 May 2026 15:35:07 +0000 (08:35 -0700)]

Merge branch 'arena_direct_access'

Tejun Heo says:

====================
This makes BPF arena memory directly dereferenceable from kernel code
(struct_ops callbacks, kfuncs). Each arena gets a per-arena scratch page
that an arch fault hook installs into empty PTEs on kernel-side faults,
after KFENCE. The faulting instruction retries and the violation is reported
through the program's BPF stream.

v4:
- Patch 1: note that the strict-zero cmpxchg is narrower than pte_none() in
  inline comments on both x86 and arm64. (Andrea)
- Patch 2: stub bpf_arena_handle_page_fault() for !CONFIG_BPF_SYSCALL via a
  new include/linux/bpf_defs.h. (lkp)
- Patch 7: scx_arena_alloc() retries via a loop instead of a single retry on
  pool growth. (Andrea)
- Picked up Reviewed-by tags from Emil and Andrea.

v3: https://lore.kernel.org/r/20260520235052.4180316-1-tj@kernel.org
v2: https://lore.kernel.org/r/20260517211232.1670594-1-tj@kernel.org
v1 (RFC): https://lore.kernel.org/r/20260427105109.2554518-1-tj@kernel.org

Motivation
----------

sched_ext's ops_cid.set_cmask() hands the BPF scheduler a struct scx_cmask
*. The kernel translates a kernel cpumask to a cmask, but it had no way to
write into the arena, so the cmask lived in kernel memory and was passed as
a trusted pointer. BPF cmask helpers all operate on arena cmasks though, so
the BPF side had to word-by-word probe-read the kernel cmask into an arena
cmask via cmask_copy_from_kernel() before any helper could touch it. It
works, but is clumsy.

The shape isn't unique to set_cmask. Sub-scheduler support is on the way and
more sched_ext callbacks will want to pass structured data to BPF. Anywhere
a kfunc or struct_ops callback wants to hand a struct to a BPF program,
arena residence is the natural answer.

Approach
--------

Each arena gets a per-arena scratch page. Arenas stay sparsely mapped as
today - PTEs are populated only for allocated pages. A new arch fault hook
(bpf_arena_handle_page_fault) is wired into x86 page_fault_oops() and arm64
__do_kernel_fault(), after KFENCE. When a kernel-side access faults inside
an arena's kern_vm range, the helper walks the stack to find the BPF program
responsible, range-checks the fault address against prog->aux->arena, and
atomically installs the scratch page into the empty PTE via the new
ptep_try_set() wrapper. The kernel instruction retries and reads/writes the
scratch page. Free paths and map destruction treat scratch as non-owned.
Real allocation refuses to overwrite scratch (apply_range_set_cb returns
-EBUSY). A scratched address stays dead until map destroy, since its
presence means the BPF program has already malfunctioned.

The mechanism is default behavior - no UAPI flag.

What this preserves
-------------------

All the debugging properties of today's sparse-PTE design are preserved:

* BPF programs still fault on unmapped arena accesses. The fault semantics
  (instruction retry with rdst = 0) and the violation report through
  bpf_streams are unchanged for prog-side accesses.

* The first kernel-side touch of an unmapped address is reported via
  bpf_streams the same way as a prog-side fault, with the stack walk
  attributing it to the originating prog.

* User-side fault on a never-scratched address still lazy-allocates a real
  page (or returns SIGSEGV under BPF_F_SEGV_ON_FAULT). User-side fault on a
  scratched address SIGSEGVs.

What changes for the kernel-side caller is just that an unmapped deref no
longer oopses - it retries through the scratch page and emits a violation
report. The same shape today's BPF instruction faults have.

Patches 1-2 (atomic PTE install + arena scratch-page recovery)
--------------------------------------------------------------

  mm: Add ptep_try_set() for lockless empty-slot installs
  bpf: Recover arena kernel faults with scratch page

Patches 3-5 (helpers used by struct_ops registration)
-----------------------------------------------------

  bpf: Add sleepable variant of bpf_arena_alloc_pages for kernel callers
  bpf: Add bpf_struct_ops_for_each_prog()
  bpf/arena: Add bpf_arena_map_kern_vm_start() and bpf_prog_arena()
====================

Link: https://lore.kernel.org/bpf/20260522172219.1423324-1-tj@kernel.org/
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Mon, 25 May 2026 13:33:15 +0000 (06:33 -0700)]

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.1-rc5

Cross-merge BPF and other fixes after downstream PR.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Linus Torvalds [Sun, 24 May 2026 20:48:06 +0000 (13:48 -0700)]

Linux 7.1-rc5

commit | commitdiff | tree

Linus Torvalds [Sun, 24 May 2026 19:50:36 +0000 (12:50 -0700)]

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"arm64:

   - Fix ITS EventID sanitisation when restoring an interrupt
     translation table.

   - Fix PPI memory leak when failing to initialise a vcpu.

   - Correctly return an error when the validation of a hypervisor trace
     descriptor fails, and limit this validation to protected mode only.

  RISC-V:

   - Fix invalid HVA warning in steal-time recording

   - Return SBI_ERR_FAILURE to guest upon OOM in pmu_event_info() and
     pmu_snapshot_set_shmem()

   - Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler

   - Fix sign extension of value for MMIO loads

  s390:

   - Fix bugs in vSIE (nested virtualization) and UCONTROL, caused by
     the page table rewrite.

  x86:

   - Apply erratum #1235 workaround (disable AVIC IPI virtualization) on
     Hygon Family 18h, just like on AMD Family 17h.

   - When KVM_CAP_X86_APIC_BUS_CYCLES_NS is queried on a specific VM,
     return the VM's configured APIC bus frequency instead of the
     default. This is less confusing (read: not wrong) and makes it
     easier to fill in CPUID information that communicates the APIC bus
     frequency to the guest.

  Selftests:

   - Do not include glibc-internal <bits/endian.h>; it worked by chance
     and broke building KVM selftests with musl"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: SVM: Disable AVIC IPI virtualization on Hygon Family 18h (erratum #1235)
  KVM: selftests: Verify that KVM returns the configured APIC cycle length
  KVM: x86: Return the VM's configured APIC bus frequency when queried
  KVM: selftests: elf: Include <endian.h> instead of <bits/endian.h>
  KVM: s390: Properly reset zero bit in PGSTE
  KVM: s390: vsie: Fix redundant rmap entries
  KVM: s390: vsie: Fix unshadowing logic
  KVM: s390: Fix leaking kvm_s390_mmu_cache in case of errors
  KVM: s390: vsie: Fix memory leak when unshadowing
  KVM: arm64: Fix nVHE/pKVM hyp tracing error on invalid desc
  KVM: arm64: vgic: Free private_irqs when init fails after allocation
  KVM: arm64: vgic-its: Reject restored DTE with out-of-range num_eventid_bits
  RISC-V: KVM: Fix sign extension for MMIO loads
  RISC-V: KVM: Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler
  riscv: kvm: return SBI_ERR_FAILURE for pmu_event_info() when OOM
  riscv: kvm: return SBI_ERR_FAILURE for pmu_snapshot_set_shmem() when OOM
  RISC-V: KVM: Fix invalid HVA warning in steal-time recording

commit | commitdiff | tree

Linus Torvalds [Sun, 24 May 2026 18:00:45 +0000 (11:00 -0700)]

Merge tag 'x86-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Ingo Molnar:

- On SEV guests, handle set_memory_{encrypted,decrypted}() failures
   more conservatively by assuming that all affected pages are
   unencrypted (Carlos López)

- Disable broadcast TLB flush when PCID is disabled (Tom Lendacky)

- Fix VMX vs. hrtimer_rearm_deferred() regression (Peter Zijlstra)

- Move IRQ/NMI dispatch code from KVM into x86 core, to prepare for a
   KVM x2apic fix (Peter Zijlstra)

- Fix incorrect munmap() size on map_vdso() failure (Guilherme Giacomo
   Simoes)

* tag 'x86-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  virt: sev-guest: Explicitly leak pages in unknown state
  x86/mm: Disable broadcast TLB flush when PCID is disabled
  x86/kvm/vmx: Fix VMX vs hrtimer_rearm_deferred()
  x86/kvm/vmx: Move IRQ/NMI dispatch from KVM into x86 core
  x86/vdso: Fix incorrect size in munmap() on map_vdso() failure

commit | commitdiff | tree

Linus Torvalds [Sun, 24 May 2026 17:55:21 +0000 (10:55 -0700)]

Merge tag 'irq-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull irqchip driver fixes from Ingo Molnar:

- Fix the hardware probing error path of the renesas-rzt2h
   irqchip driver

- Fix the exynos-combiner irqchip driver on -rt kernels
   by turning the IRQ controller spinlock into a raw spinlock

* tag 'irq-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  irqchip/renesas-rzt2h: Use pm_runtime_put_sync() in probe error path
  irqchip/exynos-combiner: Switch to raw_spinlock

commit | commitdiff | tree

Linus Torvalds [Sun, 24 May 2026 17:48:55 +0000 (10:48 -0700)]

Merge tag 'core-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull debugobjects fix from Ingo Molnar::

- Fix debugobjects regression on -rt kernels: don't fill the pool
   (which uses a coarse lock) if ->pi_blocked_on, because that messes up
   the priority inheritance of callers

* tag 'core-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  debugobjects: Do not fill_pool() if pi_blocked_on

commit | commitdiff | tree

Linus Torvalds [Sun, 24 May 2026 17:37:55 +0000 (10:37 -0700)]

Merge tag 'hwmon-for-v7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging

Pull hwmon fixes from Guenter Roeck:

- adm1266: Various fixes from Abdurrahman Hussain

   The fixed issues were reported by Sashiko as part of a code review of
   a functional change in the driver.

- lenovo-ec-sensors: Convert to devm_request_region() to fix
   release_region cleanup, and fix EC "MCHP" signature validation logic,
   from Kean Ren

* tag 'hwmon-for-v7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
  hwmon: (pmbus/adm1266) serialize sequencer_state debugfs read with pmbus_lock
  hwmon: (pmbus/adm1266) serialize NVMEM blackbox read with pmbus_lock
  hwmon: (pmbus/adm1266) serialize GPIO PMBus accesses with pmbus_lock
  hwmon: (pmbus/adm1266) register the nvmem device after pmbus_do_probe()
  hwmon: (pmbus/adm1266) register the gpio_chip after pmbus_do_probe()
  hwmon: (pmbus/adm1266) reject short block-read responses in the GPIO accessors
  hwmon: (pmbus/adm1266) don't clobber GPIO bits before PDIO read in get_multiple
  hwmon: (pmbus/adm1266) cap PDIO scan in get_multiple at ADM1266_PDIO_NR
  hwmon: (pmbus/adm1266) bounce blackbox records through a protocol-sized buffer
  hwmon: (pmbus/adm1266) include adapter number in GPIO line label
  hwmon: (pmbus/adm1266) include PEC byte in pmbus_block_xfer read buffer
  hwmon: (pmbus/adm1266) reject implausible blackbox record_count
  hwmon: (pmbus/adm1266) widen blackbox-info buffer to I2C_SMBUS_BLOCK_MAX
  hwmon: (pmbus/adm1266) seed timestamp from the real-time clock
  hwmon: (lenovo-ec-sensors): Fix EC "MCHP" signature validation logic
  hwmon: (lenovo-ec-sensors): Convert to devm_request_region()

commit | commitdiff | tree

Nathan Chancellor [Mon, 18 May 2026 22:17:14 +0000 (15:17 -0700)]

drm/msm: Restore second parameter name in purge() and evict()

After commit 3392291fc509 ("drm/msm: Fix shrinker deadlock"), all
supported versions of clang warn (or error with CONFIG_WERROR=y):

  drivers/gpu/drm/msm/msm_gem_shrinker.c:105:58: error: omitting the parameter name in a function definition is a C23 extension [-Werror,-Wc23-extensions]
    105 | purge(struct drm_gem_object *obj, struct ww_acquire_ctx *)
        |                                                          ^
  drivers/gpu/drm/msm/msm_gem_shrinker.c:117:58: error: omitting the parameter name in a function definition is a C23 extension [-Werror,-Wc23-extensions]
    117 | evict(struct drm_gem_object *obj, struct ww_acquire_ctx *)
        |                                                          ^
  2 errors generated.

With older but supported versions of GCC, this is an unconditional hard error:

  drivers/gpu/drm/msm/msm_gem_shrinker.c: In function 'purge':
  drivers/gpu/drm/msm/msm_gem_shrinker.c:105:35: error: parameter name omitted
   purge(struct drm_gem_object *obj, struct ww_acquire_ctx *)
                                     ^~~~~~~~~~~~~~~~~~~~~~~
  drivers/gpu/drm/msm/msm_gem_shrinker.c: In function 'evict':
  drivers/gpu/drm/msm/msm_gem_shrinker.c:117:35: error: parameter name omitted
   evict(struct drm_gem_object *obj, struct ww_acquire_ctx *)
                                     ^~~~~~~~~~~~~~~~~~~~~~~

Restore the parameter name to clear up the warnings, renaming it
"unused" to make it clear it is only needed to satisfy the prototype of
drm_gem_lru_scan().

Cc: stable@vger.kernel.org
Fixes: 3392291fc509 ("drm/msm: Fix shrinker deadlock")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

commit | commitdiff | tree

Linus Torvalds [Sun, 24 May 2026 16:53:17 +0000 (09:53 -0700)]

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

- Fix bpf_throw() and global subprog combination (Kumar Kartikeya
   Dwivedi)

- Fix out of bounds access in BPF interpreter (Yazhou Tang)

- Fix potential out of bounds access in inner per-cpu array map
   (Guannan Wang)

- Reject NULL data/sig in bpf_verify_pkcs7_signature (KP Singh)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  libbpf: fix off-by-one in emit_signature_match jump offset
  bpf: Reject NULL data/sig in bpf_verify_pkcs7_signature
  selftests/bpf: Cover global subprog exception leaks
  bpf: Check global subprog exception paths
  bpf: make bpf_session_is_return() reference optional
  bpf: Use array_map_meta_equal for percpu array inner map replacement
  selftests/bpf: Add test for large offset bpf-to-bpf call
  bpf: Fix s16 truncation for large bpf-to-bpf call offsets
  bpf: Fix out-of-bounds read in bpf_patch_call_args()

commit | commitdiff | tree

Linus Torvalds [Sat, 23 May 2026 23:59:02 +0000 (16:59 -0700)]

Merge tag 'v7.1-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- fix for creating tmpfiles

- fix durable reconnect error path

- validate SID in security descriptor when inheriting DACL

* tag 'v7.1-rc5-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  smb/server: promote S_DEL_ON_CLS to S_DEL_PENDING when close
  ksmbd: validate SID in parent security descriptor during ACL inheritance
  ksmbd: fix durable reconnect error path file lifetime

commit | commitdiff | tree

Linus Torvalds [Sat, 23 May 2026 23:54:48 +0000 (16:54 -0700)]

Merge tag 'for-7.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"A batch of fixes to simple quotas:

   - add conditional rescheduling point not dependent on the lock during
     inode iterations to avoid delays with PREEMPT_NONE enabled

   - fix subvolume deletion so it does not break the squota invariants

   - properly handle enabling squota, tracking extents in the initial
     transaction

   - catch and warn about underflows, clamp to zero to avoid further
     problems

  And one fix to inode size handling:

   - fix handling of preallocated extents beyond i_size when not using
     the no-holes feature"

* tag 'for-7.1-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: swallow btrfs_record_squota_delta() ENOENT
  btrfs: clamp to avoid squota underflow
  btrfs: fix squota accounting during enable generation
  btrfs: check for subvolume before deleting squota qgroup
  btrfs: always drop root->inodes lock before cond_resched()
  btrfs: mark file extent range dirty after converting prealloc extents

commit | commitdiff | tree

Linus Torvalds [Sat, 23 May 2026 23:51:22 +0000 (16:51 -0700)]

Merge tag 'xfs-fixes-7.1-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fix from Carlos Maiolino:
"A single fix for a race in xfs buffer cache which may lead to
  filesystem shutdown due to inconsistent metadata if the buffer
  lookup happens to find an old dead buffer still in the cache"

* tag 'xfs-fixes-7.1-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: fix a buffer lookup against removal race

commit | commitdiff | tree

Linus Torvalds [Sat, 23 May 2026 16:21:08 +0000 (09:21 -0700)]

Merge tag 'nios2_updates_for_v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux

Pull nios2 fixes from Dinh Nguyen:

- Implement _THIS_IP_ for inline asm

- Add Simon Schuster as a maintainer and mark the NIOS2 as Supported

* tag 'nios2_updates_for_v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
nios2: Implement _THIS_IP_ using inline asm
MAINTAINERS: arch/nios2: Add Simon Schuster as co-maintainer

commit | commitdiff | tree

Linus Torvalds [Sat, 23 May 2026 16:13:00 +0000 (09:13 -0700)]

Merge tag 'loongarch-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson

Pull LoongArch fixes from Huacai Chen:
"Rework KASLR to avoid initrd overlap, remove some unused code to avoid
  a build warning, fix some bugs in kprobes and KVM"

* tag 'loongarch-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
  LoongArch: KVM: Move some variable declarations to paravirt.h
  LoongArch: kprobes: Fix handling of fatal unrecoverable recursions
  LoongArch: kprobes: Use larch_insn_text_copy() to patch instructions
  LoongArch: Remove unused code to avoid build warning
  LoongArch: Avoid initrd overlap during kernel relocation
  LoongArch: Skip relocation-time KASLR if already applied
  efi/loongarch: Randomize kernel preferred address for KASLR

commit | commitdiff | tree

KP Singh [Fri, 22 May 2026 21:53:36 +0000 (23:53 +0200)]

libbpf: fix off-by-one in emit_signature_match jump offset

The offset for the cleanup-label jump is computed before the MOV R7
instruction is emitted, but the JMP lands after it. Account for the
extra insn in the offset calculation (-2 instead of -1). Drop the
redundant self-loop in the else branch; gen->error = -ERANGE already
marks the generation as failed.

Fixes: fb2b0e290147 ("libbpf: Update light skeleton for signing")
Signed-off-by: KP Singh <kpsingh@kernel.org>
Link: https://lore.kernel.org/r/20260522215337.662271-2-kpsingh@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Linus Torvalds [Sat, 23 May 2026 14:49:05 +0000 (07:49 -0700)]

Merge tag 'driver-core-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core

Pull driver core fixes from Danilo Krummrich:

- Remove the software node on platform device release(); without this,
   the software node remains registered after the device is gone and a
   subsequent platform_device_register_full() reusing the same node
   fails with -EBUSY

- In sysfs_update_group(), do not remove a pre-existing directory when
   create_files() fails; the previous code would silently destroy a
   sysfs group that the caller did not create

- Set fwnode->secondary to NULL in fwnode_init() to avoid dereferencing
   uninitialized memory (e.g. in dev_to_swnode()) when the firmware node
   is allocated on the stack or via a non-zeroing allocator

* tag 'driver-core-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/driver-core/driver-core:
  device property: set fwnode->secondary to NULL in fwnode_init()
  sysfs: don't remove existing directory on update failure
  driver core: platform: remove software node on release()

commit | commitdiff | tree

Linus Torvalds [Sat, 23 May 2026 14:32:39 +0000 (07:32 -0700)]

Merge tag 'i2c-for-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:
"Core:
   - smbus: fix a potential uninitialization bug

  Tegra:
   - drop runtime PM reference when exiting on mutex_lock failure
   - preserve transfer errors when releasing the mutex"

* tag 'i2c-for-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  i2c: smbus: fix a potential uninitialization bug
  i2c: tegra: make tegra_i2c_mutex_unlock() return void
  i2c: tegra: fix pm_runtime leak on mutex_lock failure

commit | commitdiff | tree

Linus Torvalds [Sat, 23 May 2026 14:17:27 +0000 (07:17 -0700)]

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Jason Gunthorpe:

- syzbot triggred crash in rxe due to concurrent plug/unplug

- Possible non-zero'd memory exposed to userspace in bnxt_re

- Malicous 'magic packet' with SIW causes a buffer overflow

- Tighten the new uAPI validation code to not crash in debugging prints
   and have the right module dependencies in drivers

- mana was missing the max_msg_sz report to userspace

- UAF in rtrs on an error path

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  RDMA/rtrs: Fix use-after-free in path file creation cleanup
  RDMA/mana_ib: Report max_msg_sz in mana_ib_query_port
  RDMA/core: Do not read wild stack memory in uverbs_get_handler_fn()
  RDMA/core: Move the _ib_copy_validate_udata* functions to ib_core_uverbs
  RDMA/siw: Reject MPA FPDU length underflow before signed receive math
  RDMA/bnxt_re: zero shared page before exposing to userspace
  selftests/rdma: explicitly skip tests when required modules are missing
  RDMA/nldev: Add mutual exclusion in nldev_dellink()

commit | commitdiff | tree

Carlos Maiolino [Sat, 23 May 2026 14:15:18 +0000 (16:15 +0200)]

Merge tag 'xfs-fixes-7.1-rc5' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux into test_merge

xfs: fixes for v7.1-rc5

Signed-off-by: Carlos Maiolino <cem@kernel.org>
Lines starting with '#' will be ignored.

commit | commitdiff | tree

Linus Torvalds [Sat, 23 May 2026 14:13:06 +0000 (07:13 -0700)]

Merge tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/fwctl/fwctl

Pull fwctl fix from Jason Gunthorpe:

- Buffer overflow due to missing input validation in pds

* tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/fwctl/fwctl:
fwctl: pds: Validate RPC input size before parsing

commit | commitdiff | tree

Tejun Heo [Fri, 22 May 2026 17:22:16 +0000 (07:22 -1000)]

bpf/arena: Add bpf_arena_map_kern_vm_start() and bpf_prog_arena()

struct bpf_arena is opaque to callers outside arena.c. Add two helpers
for struct_ops subsystems that need to reach into an arena:

  bpf_arena_map_kern_vm_start(struct bpf_map *map)
    returns @map's kern_vm_start. A sched_ext follow-up needs this
    to translate kern_va <-> uaddr.

  bpf_prog_arena(struct bpf_prog *prog)
    returns the bpf_map of the arena referenced by @prog (NULL if
    @prog references no arena). The verifier enforces at most one
    arena per program. Used by struct_ops callers that auto-discover
    an arena from a member prog and need to take a map reference.

Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260522172219.1423324-6-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Tejun Heo [Fri, 22 May 2026 17:22:15 +0000 (07:22 -1000)]

bpf: Add bpf_struct_ops_for_each_prog()

Add a helper that walks the member progs of the struct_ops map
containing a given @kdata vmtable. struct_ops ->reg() callbacks (and
similar) sometimes need to inspect the loaded BPF programs, e.g. to
discover maps they reference via prog->aux->used_maps.

The implementation mirrors bpf_struct_ops_id(): container_of @kdata
to recover the bpf_struct_ops_map, then iterate st_map->links[i]->prog
for i in [0, funcs_cnt). Same access pattern, no new locking - by the
time ->reg() fires st_map is fully populated and stable.

A sched_ext follow-up walks the member progs of a cid-form scheduler's
struct_ops map, reads prog->aux->arena directly, and requires all member
progs to reference exactly one arena, without requiring the BPF program
to call a registration kfunc.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260522172219.1423324-5-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Tejun Heo [Fri, 22 May 2026 17:22:14 +0000 (07:22 -1000)]

bpf: Add sleepable variant of bpf_arena_alloc_pages for kernel callers

The existing kernel-side export of bpf_arena_alloc_pages is _non_sleepable
only - it's used by the verifier to inline the kfunc when the call site is
non-sleepable. There is no sleepable equivalent for kernel callers. The
kfunc bpf_arena_alloc_pages itself is BPF-only.

sched_ext needs sleepable kernel-side allocs for its arena pool init/grow
paths. Add bpf_arena_alloc_pages_sleepable() mirroring the _non_sleepable
wrapper but passing sleepable=true to arena_alloc_pages().

Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Link: https://lore.kernel.org/r/20260522172219.1423324-4-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Kumar Kartikeya Dwivedi [Fri, 22 May 2026 17:22:13 +0000 (07:22 -1000)]

bpf: Recover arena kernel faults with scratch page

BPF arena usage is becoming more prevalent, but kernel <-> BPF communication
over arena memory is awkward today. Data has to be staged through a trusted
kernel pointer with extra code and copying on the BPF side. While reads
through arena pointers can use a fault-safe helper, writes don't have a good
solution. The in-line alternative would need instruction emulation or asm
fixup labels.

Enable direct kernel-side reads and writes within GUARD_SZ / 2 of any
handed-in arena pointer, without bounds checking. A per-arena scratch page
is installed by the arch fault path into empty arena kernel PTEs - x86 from
page_fault_oops() for not-present faults, arm64 from __do_kernel_fault() for
translation faults, both after the existing exception-table and KFENCE
handling. The faulting instruction retries and the access is also reported
through the program's BPF stream, preserving error reporting.

bpf_prog_find_from_stack() resolves the current BPF program (and its arena)
from the kernel stack - no new bpf_run_ctx state is added. Recovery covers
the 4 GiB arena plus the upper half-guard (GUARD_SZ / 2). The lower
half-guard is excluded because well-behaved kfuncs only access forward from
arena pointers. The kfunc-author contract - access at most GUARD_SZ / 2 past
a handed-in pointer - is documented in Documentation/bpf/kfuncs.rst.

The install is lock-free via ptep_try_set(). On race-loss the winning
installer's PTE is already valid, so the access retry succeeds. The arena
clear path uses ptep_get_and_clear() so installer and clearer race through
atomic accessors. No flush_tlb_kernel_range() afterwards. Stale "not mapped"
entries just cause one extra re-fault, cheaper than a global IPI on every
install.

Scratch exists only to keep the kernel from oopsing on an in-line arena
access. Its presence at a PTE means the BPF program has already
malfunctioned, and the violation is reported through the program's BPF
stream. The only requirement for behavior on a scratched PTE is that the
kernel doesn't crash. In particular, any user-side access through such a PTE
may segfault. The shared scratch page is freed once during map destruction.

BPF instruction faults continue to use the existing JIT exception-table
path. This patch changes only the kernel-text fault path. No UAPI flag is
added. The new behavior is the default.

v2: Use ptep_get_and_clear() in apply_range_clear_cb(). (David)
v3: Stub bpf_arena_handle_page_fault() for !CONFIG_BPF_SYSCALL. (lkp)

Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Cc: David Hildenbrand <david@kernel.org>
Link: https://lore.kernel.org/r/20260522172219.1423324-3-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Tejun Heo [Fri, 22 May 2026 17:22:12 +0000 (07:22 -1000)]

mm: Add ptep_try_set() for lockless empty-slot installs

Add ptep_try_set(ptep, new_pte): atomically set *ptep to new_pte iff it is
currently pte_none(). Returns true on success, false if the slot was already
populated or the arch has no implementation.

The intended caller is the upcoming bpf_arena kernel-side fault recovery
path. The install runs from a page fault that can be nested under locks
held by the faulting kernel caller (e.g. a BPF program holding
raw_res_spin_lock_irqsave on its arena's spinlock), so trylock-and-retry
would A-A deadlock. Lock-free cmpxchg is the only viable option, which
constrains this helper to special kernel page tables where concurrent
writers cooperate via atomic accessors.

The generic version in <linux/pgtable.h> returns false. x86 and arm64
override with try_cmpxchg-based implementations on the underlying pteval.
Other architectures get the false stub - the callers there already fall
through to oops.

v2: Rename to ptep_try_set(). Tighten kerneldoc. (David, Alexei)
v3: Note that strict-zero cmpxchg is narrower than pte_none(). (Andrea)

Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Acked-by: David Hildenbrand (arm) <david@kernel.org>
Link: https://lore.kernel.org/r/20260522172219.1423324-2-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Tina Zhang [Fri, 22 May 2026 04:00:14 +0000 (12:00 +0800)]

KVM: SVM: Disable AVIC IPI virtualization on Hygon Family 18h (erratum #1235)

Hygon Family 18h CPUs are derived from AMD Family 17h (Zen1) silicon and
share the same erratum #1235: hardware may read a stale IsRunning=1 bit
during ICR write emulation and silently fail to generate an
AVIC_IPI_FAILURE_TARGET_NOT_RUNNING VM-Exit on the sending vCPU.

The absence of the VM-Exit causes KVM to miss the required wakeup of
blocking target vCPUs, leading to hung vCPUs and unbounded delays in
guest execution.

Extend the existing AMD Family 17h erratum #1235 workaround to also cover
Hygon Family 18h. With IPI virtualization disabled, KVM never sets
IsRunning=1 in the Physical ID table, so every non-self IPI generates a
VM-Exit and is correctly emulated.

Fixes: 8de4a1c8164e ("KVM: SVM: Disable (x2)AVIC IPI virtualization if CPU has erratum #1235")
Cc: <stable@vger.kernel.org>
Signed-off-by: Tina Zhang <zhang_wei@open-hieco.net>
Message-ID: <20260522040014.3380201-1-zhang_wei@open-hieco.net>

commit | commitdiff | tree

Sean Christopherson [Fri, 22 May 2026 17:35:26 +0000 (10:35 -0700)]

KVM: selftests: Verify that KVM returns the configured APIC cycle length

Add checks in the APIC bus clock test to verify that querying
KVM_CAP_X86_APIC_BUS_CYCLES_NS on the VM after changing the frequency
returns the VM's actual APIC cycle length, not KVM's default. For
giggles, verify that KVM still returns its default frequency for the
system-scoped check.

Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260522173526.3539407-3-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Sean Christopherson [Fri, 22 May 2026 17:35:25 +0000 (10:35 -0700)]

KVM: x86: Return the VM's configured APIC bus frequency when queried

When KVM_CAP_X86_APIC_BUS_CYCLES_NS is queried on a specific VM, return the
VM's configured APIC bus frequency, not KVM's default. Aside from the fact
that returning the default frequency is blatantly wrong if userspace has
changed the frequency, returning the configured frequency means userspace
can blindly trust the result, e.g. when filling PV CPUID information that
communicates the APIC bus frequency to the guest.

Fixes: 6fef518594bc ("KVM: x86: Add a capability to configure bus frequency for APIC timer")
Reported-by: David Woodhouse <dwmw2@infradead.org>
Closes: https://lore.kernel.org/all/ab84153e33fbe7c25667f595c56b310d4d5a93ef.camel@infradead.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260522173526.3539407-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Hisam Mehboob [Thu, 9 Apr 2026 16:40:22 +0000 (21:40 +0500)]

KVM: selftests: elf: Include <endian.h> instead of <bits/endian.h>

<bits/endian.h> is a glibc-internal header that explicitly states it
should never be included directly:

#error "Never use <bits/endian.h> directly; include <endian.h> instead."

Replace it with the correct public header <endian.h> which works on
all C libraries including musl. Building KVM selftests with musl-gcc
fails with:

lib/elf.c:10:10: fatal error: bits/endian.h: No such file or directory

Fixes: 6089ae0bd5e1 ("kvm: selftests: add sync_regs_test")
Signed-off-by: Hisam Mehboob <hisamshar@gmail.com>
Message-ID: <20260409164020.1575176-4-hisamshar@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

commit | commitdiff | tree

Paolo Bonzini [Sat, 23 May 2026 08:04:35 +0000 (10:04 +0200)]

Merge tag 'kvm-riscv-fixes-7.1-1' of https://github.com/kvm-riscv/linux into HEAD

KVM/riscv fixes for 7.1, take #1

- Fix invalid HVA warning in steal-time recording
- Return SBI_ERR_FAILURE to guest upon OOM in pmu_event_info()
and pmu_snapshot_set_shmem()
- Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler
- Fix sign extension of value for MMIO loads

commit | commitdiff | tree

Paolo Bonzini [Sat, 23 May 2026 08:03:58 +0000 (10:03 +0200)]

Merge tag 'kvm-s390-master-7.1-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD

KVM: s390: some vSIE and UCONTROL fixes

Fix some memory issues and some hangs in vSIE.

commit | commitdiff | tree

Paolo Bonzini [Sat, 23 May 2026 08:03:10 +0000 (10:03 +0200)]

Merge tag 'kvmarm-fixes-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 7.1, take #3

- Fix ITS EventID sanitisation when restoring an interrupt translation
table.

- Fix PPI memory leak when failing to initialise a vcpu.

- Correctly return an error when the validation of a hypervisor trace
descriptor fails, and limit this validation to protected mode only.

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 23:43:33 +0000 (16:43 -0700)]

Merge tag 'sched_ext-for-7.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

- Spurious WARN in ops_dequeue() racing with concurrent dispatch

- Self-deadlock between scheduler disable and a concurrent sub-sched
   enable

* tag 'sched_ext-for-7.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  sched_ext: Fix spurious WARN on stale ops_state in ops_dequeue()
  sched_ext: Fix deadlock between scx_root_disable() and concurrent forks

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 23:28:47 +0000 (16:28 -0700)]

Merge tag 'cgroup-for-7.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
"Two rstat fixes:

   - Out-of-bounds access in the css_rstat_updated() BPF kfunc when
     called with an unchecked user-supplied cpu

   - Over-strict NMI guard after the recent switch to try_cmpxchg left
     sparc and ppc64 unable to queue rstat updates from NMI"

* tag 'cgroup-for-7.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: rstat: relax NMI guard after switch to try_cmpxchg
  cgroup/rstat: validate cpu before css_rstat_cpu() access

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 23:15:32 +0000 (16:15 -0700)]

Merge tag 'drm-fixes-2026-05-23' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"Regular fixes pull, amdgpu/xe being the usual, with bonus msm content
  to bulk things out, otherwise it has the usual scattered changes, with
  amdxdna dropping a badly thought out userspace api.

  gem:
   - clean up LRU locking

  msm:
   - Core:
     - Fixed bindings for SM8650, SM8750 and Eliza
     - Don't use UTS_RELEASE directly
     - Fix typo in clock-names property
   - DPU:
      - Fixed CWB description on Kaanapali
      - Fixed scanline strides for YUV UBWC formats
      - Stopped DSI register dumping to access past the end of region
   - DSI:
      - Fix dumping unaligned regions
   - GPU:
      - Fix GMEM_BASE for a6xx gen3
      - Fix userspace reachable crash on a2xx-a4xx
      - Fix sysprof_active for counter collection with IFPC enabled GPUs
      - Fix shrinker lockdep

  amdgpu:
   - Userq fixes
   - VPE fix
   - SMU 15 fix
   - Misc fixes
   - VCE fixes
   - DC bios parsing fixes
   - DC aux fix
   - Mode1 reset fix
   - RAS fixes

  amdkfd:
   - Misc fixes

  radeon:
   - CS parser fix

  xe:
   - SRIOV related fixes
   - Fix leak and double-free
   - Multi-cast register fixes
   - Multi-queue fix

  i915:
   - Fix joiner color pipeline selection [display]
   - Fix readback for target_rr in Adaptive Sync SDP [dp]
   - Apply Intel DPCD workaround when SDP on prior line used [psr]

  amdxdna:
   - remove mmap and export for ubuf

  bridge:
   - chipone-icn6211: managed bridge cleanup
   - lt66121: acquire reset GPIO
   - megachips: fix clean up on failed IRQ requests

  v3d:
   - fix UAF in error code paths
   - release GEM-object ref on free'd jobs

  virtio:
   - use uninterruptible resv locking in plane updates

  mediatek:
   - fix sparse warnings"

* tag 'drm-fixes-2026-05-23' of https://gitlab.freedesktop.org/drm/kernel: (78 commits)
  drm/xe/oa: Fix exec_queue leak on width check in stream open
  drm/virtio: use uninterruptible resv lock for plane updates
  drm/amdgpu: fix handling in amdgpu_userq_create
  drm/radeon/evergreen_cs: Add missing NULL prefix check in surface check
  drm/amdgpu: userq_va_mapped should remain true once done
  drm/amdgpu: avoid integer overflow in VA range check
  drm/amd/ras: Fix UMC error address allocation leak
  drm/amdgpu: unmap all user mappings of framebuffer and doorbell before mode1 reset
  drm/amd/display: Validate payload length and link_index in dc_process_dmub_aux_transfer_async
  drm/amd/display: Validate GPIO pin LUT table size before iterating
  drm/amd/display: Fix integer overflow in bios_get_image()
  drm/amdkfd: Check bounds for allocate_sdma_queue restore_sdma_id
  drm/amdgpu: use atomic operation to achieve lockless serialization
  drm/amdkfd: Check bounds on allocate_doorbell
  drm/amdgpu/vce3: Fix VCE 3 firmware size and offsets
  drm/amdgpu/vce2: Fix VCE 2 firmware size and offsets
  drm/amdgpu/vce1: Stop using amdgpu_vce_resume
  drm/amdgpu/vce1: Fix VCE 1 firmware size and offsets
  drm/amdgpu/vce1: Don't repeat GTT MGR node allocation
  drm/amdgpu/vce1: Check if VRAM address is lower than GART.
  ...

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 23:08:06 +0000 (16:08 -0700)]

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Small fixes, two in drivers and the remaining a sign conversion probem
  in sd with no user visible consequences (non-zero is error)"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: target: tcm_loop: Fix NULL ptr dereference
  scsi: isci: Fix use-after-free in device removal path
  scsi: sd: Fix return code handling in sd_spinup_disk()

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 22:45:26 +0000 (15:45 -0700)]

Merge tag 'platform-drivers-x86-v7.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver fixes from

- Add ACPI_HANDLE()/ACPI_COMPANION() NULL checks (many drivers) to
   handle match overrides gracefully

- asus-armoury:
    - Fix mini-LED mode get/set
    - Add support for FA401EA, FX607VU, G614FR, and GU605CP

- bitland-mifs-wmi:
    - Add CONFIG_LEDS_CLASS dependency

- hp-wmi:
    - Add thermal support for Omen 16-c0xxx (board 8902)

- intel/vsec:
    - Fix enable_cnt imbalance due to PCIe error recovery

- surface/aggregator_registry:
    - Remove battery & AC nodes on Surface Laptop 7 to avoid duplicated
      devices

- uniwill-laptop:
    - Handle uninitialized and invalid charging threshold values
    - Accept charging threshold of 0 through power supply sysfs ABI and
      clamp it to 1
    - Make 'force' parameter to work also when device descriptor is
      found
    - Do not enable charging limit despite the 'force' parameter to
      avoid permanent damage to battery

* tag 'platform-drivers-x86-v7.1-4' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (35 commits)
  platform/x86: bitland-mifs-wmi: add CONFIG_LEDS_CLASS dependency
  platform/x86: wireless-hotkey: Check ACPI_COMPANION() against NULL
  platform/x86: toshiba_haps: Check ACPI_COMPANION() against NULL
  platform/x86: toshiba_bluetooth: Check ACPI_COMPANION() against NULL
  platform/x86: toshiba_acpi: Check ACPI_COMPANION() against NULL
  platform/x86: system76: Check ACPI_COMPANION() against NULL
  platform/x86: sony-laptop: Check ACPI_COMPANION() against NULL
  platform/x86: panasonic-laptop: Check ACPI_COMPANION() against NULL
  platform/x86: lg-laptop: Check ACPI_COMPANION() against NULL
  platform/x86: intel/smartconnect: Check ACPI_HANDLE() against NULL
  platform/x86: intel/rst: Check ACPI_COMPANION() against NULL
  platform/x86: fujitsu-tablet: Check ACPI_COMPANION() against NULL
  platform/x86: fujitsu: Check ACPI_COMPANION() against NULL
  platform/x86: eeepc-laptop: Check ACPI_COMPANION() against NULL
  platform/x86: dell/dell-rbtn: Check ACPI_COMPANION() against NULL
  platform/x86: asus-laptop: Check ACPI_COMPANION() against NULL
  platform/x86: acer-wireless: Check ACPI_COMPANION() against NULL
  platform/x86: asus-armoury: add support for GU605CP
  platform/x86: asus-armoury: add support for FA401EA
  platform/x86: asus-armoury: add support for G614FR
  ...

commit | commitdiff | tree

Dave Airlie [Fri, 22 May 2026 21:50:49 +0000 (07:50 +1000)]

Merge tag 'drm-xe-fixes-2026-05-21' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

- SRIOV related fixes (Wajdeczko, Mohanram)
- Fix leak and double-free (Lin)
- Multi-cast register fixes (Gustavo)
- Multi-queue fix (Niranjana)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://patch.msgid.link/ag9rR5VwCdkA0lzI@intel.com

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 20:23:21 +0000 (13:23 -0700)]

Merge tag 'phy-fixes-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy

Pull phy fixes from Vinod Koul:

- Big pile of Qualcomm DP/eDP config fixes and kaanapali PHY PLL
   lock failure fix

- Apple typec switch/mux leak fix

- Marvell incoorect register fix for mvebu utmi phy

- Tegra per-pad calibration fix

* tag 'phy-fixes-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/phy/linux-phy:
  phy: qcom: qmp-usbc: Fix out-of-bounds array access in dp swing config
  phy: apple: atc: Fix typec switch/mux leak on unbind
  phy: spacemit: Remove incorrect clk_disable() in spacemit_usb2phy_init()
  phy: eswin: Fix incorrect error check in probe()
  phy: qcom-qmp-ufs: Fix kaanapali PHY PLL lock failure after SM8650 G4 fix
  phy: exynos5-usbdrd: fix USB 2.0 HS PHY tuning values for Exynos7870
  phy: tegra: xusb: Fix per-pad high-speed termination calibration
  phy: marvell: mvebu-a3700-utmi: fix incorrect USB2_PHY_CTRL register access
  phy: qcom: edp: Add PHY-specific LDO config for eDP low vdiff
  phy: qcom: edp: Fix AUX_CFG8 programming for DP mode
  phy: qcom: edp: Add SC7280/SC8180X swing/pre-emphasis tables
  phy: qcom: edp: Add eDP/DP mode switch support
  phy: qcom: edp: Unify generic DP/eDP swing and pre-emphasis tables

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 20:19:41 +0000 (13:19 -0700)]

Merge tag 'spi-fix-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"Another batch of driver fixes from Johan fixing error handling paths,
  plus another from Felix. We also have a new device ID added in the DT
  bindings for SpacemiT K3"

* tag 'spi-fix-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: dt-bindings: fsl-qspi: support SpacemiT K3
  spi: ti-qspi: fix use-after-free after DMA setup failure
  spi: sprd: fix error pointer deref after DMA setup failure
  spi: qup: fix error pointer deref after DMA setup failure
  spi: mtk-snfi: Fix resource leak in mtk_snand_read_page_cache()
  spi: ep93xx: fix error pointer deref after DMA setup failure

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 20:17:29 +0000 (13:17 -0700)]

Merge tag 'regulator-fix-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A couple of fixes here, one very minor Kconfig fix and a fix for a
  nasty issue with error reporting in the tps65219 driver"

* tag 'regulator-fix-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: tps65219: fix irq_data.rdev not being assigned
  regulator: Kconfig: fix a typo in help

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 19:33:28 +0000 (12:33 -0700)]

Merge tag 'pinctrl-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl

Pull pin control fixes from Linus Walleij:

- Implement the GPIO .get_direction() callback in the Mediatek driver
   to rid dmesg warnings

- Mark the Qualcomm IPQ4019 pins used as GPIO as using the GPIO pin
   function, so there is no conflict with orthogonal muxing

- Fix incorrect settings of the "PUPD" (pull-up-pull-down) register
   during suspend/resume in the Renesas RZG2L

- Fix the SMT register cache to be per-bank in the Renesas RZG2L

- Fix the QDSS track clock and control pin group names in the Qualcomm
   Eliza driver

- Fix a deadlock in the Amlogic driver, caused by playing around in
   sysfs

- Fix some GPIO wakeup interrupt handling in Qualcomm QCS615. and a
   similar fix for the Qualcomm SM8150

- Allow parsing DTs without explicit function nodes in the Freescale
   i.MX1 driver

- Enable the IRQ for the WACF2200 touchscreen using a DMI quirk

* tag 'pinctrl-v7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
  pinctrl-amd: enable IRQ for WACF2200 touchscreen on Lenovo Yoga 7 14AGP11
  pinctrl: imx1: Allow parsing DT without function nodes
  pinctrl: qcom: Fix wakeirq map by removing disconnected irqs for sm8150
  pinctrl: qcom: Fix GPIO to PDC wake irq map for qcs615
  pinctrl: meson: amlogic-a4: fix deadlock issue
  pinctrl: qcom: eliza: Fix QDSS trace clock/control pingroup names
  pinctrl: renesas: rzg2l: Fix SMT register cache handling
  pinctrl: renesas: rzg2l: Fix incorrect PUPD register offset for high pins during suspend/resume
  pinctrl: qcom: ipq4019: mark gpio as a GPIO pin function
  pinctrl: mediatek: moore: implement gpio_chip::get_direction()

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 19:28:47 +0000 (12:28 -0700)]

Merge tag 'gpio-fixes-for-v7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- propagate the error code from regulator_enable() in resume path in
   gpio-pca953x

- take the device lock when calling device_is_bound() in virtual GPIO
   drivers

- fix software node leak in remove path in gpio-aggregator

- fix a potential use-after-free in gpio-aggregator

- harden the GPIO character device uAPI: check that line config
   attributes are correctly zeroed

* tag 'gpio-fixes-for-v7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: virtuser: lock device when calling device_is_bound()
  gpio: aggregator: lock device when calling device_is_bound()
  gpio: sim: lock device when calling device_is_bound()
  gpio: aggregator: remove the software node when deactivating the aggregator
  gpio: aggregator: fix a potential use-after-free
  gpio: cdev: check if uAPI v2 config attributes are correctly zeroed
  gpio: pca953x: propagate regulator_enable() error from resume

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 19:22:22 +0000 (12:22 -0700)]

Merge tag 'sound-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"As expected, we still continue receiving lots of small fixes.

  One major change is about HD-audio pending IRQ handling, but this
  would influence only on odd machines or slow VMs. There are a few
  other fixes for the core part, but most of them are not-too-serious
  UAF fixes, while the rest are mostly device-specific fixes and quirks.

  ALSA Core:
   - Fix for PCM silencing with bogus iov_iter
   - Fixes for past-the-end iterators in timer and seq
   - Serialization of UMP output teardown
   - Rate-limit ELD parsing errors

  HD-audio:
   - Fixes for IRQ work handling and SSID matching
   - Various Realtek quirks for HP and ASUS laptops, including LED fixes

  ASoC:
   - Intel: ACPI match table updates for PTL, NVL, and ARL platforms
   - Cirrus Logic: Fixes for cs-amp-lib and cs35l56 codecs
   - Various platform fixes for AMD, FSL SAI, TI OMAP, and Qualcomm
   - DT-binding fix for MediaTek

  Others:
   - USB ua101: Reject too-short USB descriptors
   - Scarlett2: Fix for flash writes
   - ASIHPI: Fix for potential OOB access
   - AMD SPI: Fix for bus number in ACPI probe

  MAINTAINERS:
   - Updates for SOF and TI maintainers"

* tag 'sound-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (47 commits)
  ASoC: codecs: pcm512x: fix null-ptr dereference in pcm512x_overclock_xxx_put()
  ASoC: Intel: soc-acpi-intel-ptl-match: Remove unnecessary cs42l43 match
  ASoC: soc-acpi-intel-ptl-match: Make Chrome matches conditional
  ASoC: Intel: soc-acpi: Add entry for sof_es8336 in NVL match table.
  ASoC: Intel: sof_sdw: Add support for nvlrvp in NVL platform
  ASoC: cs-amp-lib: Fix typo in error message: write -> read
  ASoC: cs-amp-lib: Fix missing dput() after debugfs_lookup()
  ASoC: cs-amp-lib: Fix wrong sizeof() in _cs_amp_set_efi_calibration_data()
  ASoC: cs35l56: Fix flushing of IRQ work in cs35l56_sdw_remove()
  MAINTAINERS: ASoC: Intel/SOF: Remove Ranjani Sridharan as maintainer
  ALSA: seq: Serialize UMP output teardown with event_input
  ALSA: scarlett2: Allow flash writes ending at segment boundary
  ALSA: hda/realtek: Add LED quirk for HP ProBook 430 G6
  ALSA: hda/intel: Make sure to cancel irq-pending work at closing PCM stream
  ALSA: hda: Move irq pending work into hda-intel stream
  ASoC: soc-utils: Add missing va_end in snd_soc_ret()
  ALSA: ua101: Reject too-short USB descriptors
  ALSA: hda/realtek: Fix mute and mic-mute LEDs for HP 16 Piston OmniBook X
  ALSA: seq: avoid past-the-end iterator in snd_seq_create_port()
  ALSA: timer: avoid past-the-end iterator in snd_timer_dev_register()
  ...

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 19:06:23 +0000 (12:06 -0700)]

Merge tag 'block-7.1-20260522' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

- NVMe pull request via Keith:
      - Fix memory leak for peer-to-peer addresses
      - Fix dma map leaks on resource errors

- Another bio integrity fix, fixing a recent regression

- Fix for an issue with the request pre-allocation and caching when IO
   is queued, where if a bio split occurred and ended up blocking, the
   list could be corrupted

* tag 'block-7.1-20260522' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  block: avoid use-after-free in disk_free_zone_resources()
  blk-mq: pop cached request if it is usable
  nvme-pci: fix dma mapping leak on data setup error
  nvme-pci: fix dma_vecs leak on p2p memory
  bio-integrity-fs: pass data iter to bio_integrity_verify()

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 18:53:28 +0000 (11:53 -0700)]

Merge tag 'io_uring-7.1-20260522' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

- Fix for an issue with IORING_OP_NOP and using injection results

- Fix for an issue in IORING_OP_WAITID, where the info state was
   assumed cleared by the lower level syscall handler, but for some
   cases it is not. Just clear the data upfront, so that non-initialized
   data isn't copied back to userspace

- Fix for a lockdep reported issue, where IORING_OP_BIND enters file
   create and hence hits mnt_want_write(), which creates a three part
   lockdep cycle between the super lock, io_uring's uring_lock, and the
   cred mutex

- Fix a regression introduced in this cycle with how linked timeouts
   are deleted

- Ensure that the ->opcode nospec indexing on the opcode issue side
   covers all the cases

* tag 'io_uring-7.1-20260522' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/nop: pass all errors to userspace
  io_uring/timeout: splice timed out link in timeout handler
  io_uring: propagate array_index_nospec opcode into req->opcode
  io_uring/waitid: clear waitid info before copying it to userspace
  io_uring/net: punt IORING_OP_BIND async if it needs file create

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 17:52:26 +0000 (10:52 -0700)]

Merge tag 'v7.1-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
- Fix missing lock
- Fix dentry in use after unmounting
- cifs.upcall security fix
- require CAP_NET_ADMIN for swn netlink
- change allocation in DUP_CTX_STR to GFP_KERNEL
- minor smbdirect debug fix
- handle_read_data() folio fix

* tag 'v7.1-rc5-smb3-client-fixes' of git://git.samba.org/sfrench/cifs-2.6:
  smb: client: change allocation requirements in DUP_CTX_STR macro
  smb: client: require net admin for CIFS SWN netlink
  smb: smbdirect: divide, not multiply, milliseconds by 1000
  cifs: Fix busy dentry used after unmounting
  smb: client: use data_len for SMB2 READ encrypted folioq copy
  smb: client: reject userspace cifs.spnego descriptions
  smb: client: protect tc_count increment in smb2_find_smb_sess_tcon_unlocked()

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 17:44:18 +0000 (10:44 -0700)]

Merge tag 'zonefs-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs

Pull zonefs fix from Damien Le Moal:

- Avoid potential overflow when converting a zonefs file number string
to an inode number (from Johannes)

* tag 'zonefs-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/zonefs:
zonefs: handle integer overflow in zonefs_fname_to_fno

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 14:13:13 +0000 (07:13 -0700)]

Merge tag 'pm-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"These fix maximum frequency computation in the intel_pstate driver for
  two processor models, update its documentation and fix issues related
  to the dynamic EPP support (added during the current development
  cycle) in the amd-pstate driver:

   - Fix maximum frequency computation in the intel_pstate driver for
     Raptor Lake-E and Bartlett Lake that are SMP platforms derived from
     hybrid ones (Rafael Wysocki, Henry Tseng)

   - Fix the description of asymmetric packing with SMT in the
     intel_pstate driver documentation (Ricardo Neri)

   - Fix multiple amd-pstate driver issues related to dynamic EPP
     support added recently, including making it opt-in only (K Prateek
     Nayak, Mario Limonciello)"

* tag 'pm-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq/amd-pstate: Drop Kconfig option for dynamic EPP
  cpufreq: intel_pstate: Use HYBRID_SCALING_FACTOR_ADL for Bartlett Lake
  cpufreq: intel_pstate: Use correct scaling factor on Raptor Lake-E
  Documentation: intel_pstate: Fix description of asymmetric packing with SMT
  cpufreq/amd-pstate-ut: Drop policy reference before driver switch
  cpufreq/amd-pstate: Use "epp_default_dc" as default when dynamic_epp is disabled
  cpufreq/amd-pstate: Reorder notifier unregistration and floor perf reset
  cpufreq/amd-pstate: Allow writes to dynamic_epp when state isn't modified
  cpufreq/amd-pstate: Return -ENOMEM on failure to allocate profile_name
  cpufreq/amd-pstate: Grab "amd_pstate_driver_lock" when toggling dynamic_epp

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 14:06:21 +0000 (07:06 -0700)]

Merge tag 'acpi-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI support fix from Rafael Wysocki:
"Unbreak system wakeup on critical battery status in the ACPI battery
driver inadvertently broken during the 7.0 development cycle"

* tag 'acpi-7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: battery: Fix system wakeup on critical battery status

commit | commitdiff | tree

Damien Le Moal [Fri, 22 May 2026 11:56:22 +0000 (20:56 +0900)]

block: avoid use-after-free in disk_free_zone_resources()

The function disk_update_zone_resources() may call
disk_free_zone_resources() in case of error, and following this,
blk_revalidate_disk_zones() will again calls disk_free_zone_resources() if
disk_update_zone_resources() failed. If a zone worker thread is being used
(which is the default for a rotational media zoned device),
disk_free_zone_resources() will try to stop the zone worker thread twice
because disk->zone_wplugs_worker is not reset to NULL when the worker
thread is stopped the first time.

In disk_free_zone_resources(), fix this by correctly clearing
disk->zone_wplugs_worker to NULL when the worker thread is stopped.

And while at it, since disk_free_zone_resources() is always called after a
failed call to disk_update_zone_resources(), remove the unnecessary call
to disk_free_zone_resources() in disk_update_zone_resources().

Fixes: 1365b6904fd0 ("block: allow submitting all zone writes from a single context")
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260522115622.588535-1-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 13:53:11 +0000 (06:53 -0700)]

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

- Handle probe on hinted conditional branch instructions.

   BC.cond instructions can be simulated in the same way as B.cond
   instructions, so extend the decode mask for B.cond to cover BC.cond

- Flush the walk cache when unsharing PMD tables. Recent changes to
   huge_pmd_unshare() introduced mmu_gather::unshared_tables but the
   arm64 code was still treating the TLB flushing as only targeting leaf
   entries (TLBI VALE1IS).

   Fix it by using non-leaf-only instructions (TLBI VAE1IS) when
   tlb->unshared_tables is set

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: tlb: Flush walk cache when unsharing PMD tables
  arm64: probes: Handle probes on hinted conditional branch instructions

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 13:40:31 +0000 (06:40 -0700)]

Merge tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux

Pull s390 fixes from Alexander Gordeev:

- Fix PAI NNPA mismatch between counting and recording, where sampling
   reports twice the value

- Fix loss of PAI counter increments during recording on systems with
   many CPUs under heavy load, while counting is not affected

- On some supported machines, CHSC cannot access memory outside the DMA
   zone, causing CHSC command failures. Restore GFP_DMA flag when
   allocating memory for CHSC control blocks

- Align the numbering scheme for higher-level topology structures like
   socket, book, drawer with other hardware identifiers e.g. in sysfs,
   procfs and tools like lscpu

* tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
  s390/topology: Use zero-based numbering for containing entities
  s390/cio: Restore GFP_DMA for CHSC allocation
  s390/pai: Fix missing PAI counter increments under heavy load
  s390/pai: Disable duplicate read of kernel PAI counter value

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 13:23:56 +0000 (06:23 -0700)]

Merge tag 'slab-for-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fix from Vlastimil Babka:

- Stable fix for a missing cpus_read_lock in one of the cpu sheaves
flushing paths (Qing Wang)

* tag 'slab-for-7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
mm/slub: hold cpus_read_lock around flush_rcu_sheaves_on_cache()

commit | commitdiff | tree

Linus Torvalds [Fri, 22 May 2026 13:16:00 +0000 (06:16 -0700)]

Merge tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux

Pull dma-mapping fixes from Marek Szyprowski:
"Two minor updates for the DMA-mapping code, mainly fixing some rare
  corner cases (Petr Tesarik, Jianpeng Chang)"

* tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux:
  dma-mapping: move dma_map_resource() sanity check into debug code
  dma-direct: fix use of max_pfn

A mirror of Linus' kernel repository