git.ipfire.org Git - thirdparty/linux.git/log

bpf: Improve bounds when tnum has a single possible value

We're hitting an invariant violation in Cilium that sometimes leads to
BPF programs being rejected and Cilium failing to start [1]. The
following extract from verifier logs shows what's happening:

  from 201 to 236: R1=0 R6=ctx() R7=1 R9=scalar(smin=umin=smin32=umin32=3584,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) R10=fp0
  236: R1=0 R6=ctx() R7=1 R9=scalar(smin=umin=smin32=umin32=3584,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100)) R10=fp0
  ; if (magic == MARK_MAGIC_HOST || magic == MARK_MAGIC_OVERLAY || magic == MARK_MAGIC_ENCRYPT) @ bpf_host.c:1337
  236: (16) if w9 == 0xe00 goto pc+45   ; R9=scalar(smin=umin=smin32=umin32=3585,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100))
  237: (16) if w9 == 0xf00 goto pc+1
  verifier bug: REG INVARIANTS VIOLATION (false_reg1): range bounds violation u64=[0xe01, 0xe00] s64=[0xe01, 0xe00] u32=[0xe01, 0xe00] s32=[0xe01, 0xe00] var_off=(0xe00, 0x0)

We reach instruction 236 with two possible values for R9, 0xe00 and
0xf00. This is perfectly reflected in the tnum, but of course the ranges
are less accurate and cover [0xe00; 0xf00]. Taking the fallthrough path
at instruction 236 allows the verifier to reduce the range to
[0xe01; 0xf00]. The tnum is however not updated.

With these ranges, at instruction 237, the verifier is not able to
deduce that R9 is always equal to 0xf00. Hence the fallthrough pass is
explored first, the verifier refines the bounds using the assumption
that R9 != 0xf00, and ends up with an invariant violation.

This pattern of impossible branch + bounds refinement is common to all
invariant violations seen so far. The long-term solution is likely to
rely on the refinement + invariant violation check to detect dead
branches, as started by Eduard. To fix the current issue, we need
something with less refactoring that we can backport.

This patch uses the tnum_step helper introduced in the previous patch to
detect the above situation. In particular, three cases are now detected
in the bounds refinement:

1. The u64 range and the tnum only overlap in umin.
   u64:  ---[xxxxxx]-----
   tnum: --xx----------x-

2. The u64 range and the tnum only overlap in the maximum value
   represented by the tnum, called tmax.
   u64:  ---[xxxxxx]-----
   tnum: xx-----x--------

3. The u64 range and the tnum only overlap in between umin (excluded)
   and umax.
   u64:  ---[xxxxxx]-----
   tnum: xx----x-------x-

To detect these three cases, we call tnum_step(tnum, umin), which
returns the smallest member of the tnum greater than umin, called
tnum_next here. We're in case (1) if umin is part of the tnum and
tnum_next is greater than umax. We're in case (2) if umin is not part of
the tnum and tnum_next is equal to tmax. Finally, we're in case (3) if
umin is not part of the tnum, tnum_next is inferior or equal to umax,
and calling tnum_step a second time gives us a value past umax.

This change implements these three cases. With it, the above bytecode
looks as follows:

  0: (85) call bpf_get_prandom_u32#7    ; R0=scalar()
  1: (47) r0 |= 3584                    ; R0=scalar(smin=0x8000000000000e00,umin=umin32=3584,smin32=0x80000e00,var_off=(0xe00; 0xfffffffffffff1ff))
  2: (57) r0 &= 3840                    ; R0=scalar(smin=umin=smin32=umin32=3584,smax=umax=smax32=umax32=3840,var_off=(0xe00; 0x100))
  3: (15) if r0 == 0xe00 goto pc+2      ; R0=3840
  4: (15) if r0 == 0xf00 goto pc+1
  4: R0=3840
  6: (95) exit

In addition to the new selftests, this change was also verified with
Agni [3]. For the record, the raw SMT is available at [4]. The property
it verifies is that: If a concrete value x is contained in all input
abstract values, after __update_reg_bounds, it will continue to be
contained in all output abstract values.

Link: https://github.com/cilium/cilium/issues/44216
Link: https://pchaigno.github.io/test-verifier-complexity.html
Link: https://github.com/bpfverif/agni
Link: https://pastebin.com/raw/naCfaqNx
Fixes: 0df1a55afa83 ("bpf: Warn on internal verifier errors")
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Tested-by: Marco Schirrmeister <mschirrmeister@gmail.com>
Co-developed-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com>
Link: https://lore.kernel.org/r/ef254c4f68be19bd393d450188946821c588565d.1772225741.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Introduce tnum_step to step through tnum's members

This commit introduces tnum_step(), a function that, when given t, and a
number z returns the smallest member of t larger than z. The number z
must be greater or equal to the smallest member of t and less than the
largest member of t.

The first step is to compute j, a number that keeps all of t's known
bits, and matches all unknown bits to z's bits. Since j is a member of
the t, it is already a candidate for result. However, we want our result
to be (minimally) greater than z.

There are only two possible cases:

(1) Case j <= z. In this case, we want to increase the value of j and
make it > z.
(2) Case j > z. In this case, we want to decrease the value of j while
keeping it > z.

(Case 1) j <= z

t = xx11x0x0
z = 10111101 (189)
j = 10111000 (184)
         ^
         k

(Case 1.1) Let's first consider the case where j < z. We will address j
== z later.

Since z > j, there had to be a bit position that was 1 in z and a 0 in
j, beyond which all positions of higher significance are equal in j and
z. Further, this position could not have been unknown in a, because the
unknown positions of a match z. This position had to be a 1 in z and
known 0 in t.

Let k be position of the most significant 1-to-0 flip. In our example, k
= 3 (starting the count at 1 at the least significant bit).  Setting (to
1) the unknown bits of t in positions of significance smaller than
k will not produce a result > z. Hence, we must set/unset the unknown
bits at positions of significance higher than k. Specifically, we look
for the next larger combination of 1s and 0s to place in those
positions, relative to the combination that exists in z. We can achieve
this by concatenating bits at unknown positions of t into an integer,
adding 1, and writing the bits of that result back into the
corresponding bit positions previously extracted from z.

>From our example, considering only positions of significance greater
than k:

t =  xx..x
z =  10..1
    +    1
     -----
     11..0

This is the exact combination 1s and 0s we need at the unknown bits of t
in positions of significance greater than k. Further, our result must
only increase the value minimally above z. Hence, unknown bits in
positions of significance smaller than k should remain 0. We finally
have,

result = 11110000 (240)

(Case 1.2) Now consider the case when j = z, for example

t = 1x1x0xxx
z = 10110100 (180)
j = 10110100 (180)

Matching the unknown bits of the t to the bits of z yielded exactly z.
To produce a number greater than z, we must set/unset the unknown bits
in t, and *all* the unknown bits of t candidates for being set/unset. We
can do this similar to Case 1.1, by adding 1 to the bits extracted from
the masked bit positions of z. Essentially, this case is equivalent to
Case 1.1, with k = 0.

t =  1x1x0xxx
z =  .0.1.100
    +       1
    ---------
     .0.1.101

This is the exact combination of bits needed in the unknown positions of
t. After recalling the known positions of t, we get

result = 10110101 (181)

(Case 2) j > z

t = x00010x1
z = 10000010 (130)
j = 10001011 (139)
^
k

Since j > z, there had to be a bit position which was 0 in z, and a 1 in
j, beyond which all positions of higher significance are equal in j and
z. This position had to be a 0 in z and known 1 in t. Let k be the
position of the most significant 0-to-1 flip. In our example, k = 4.

Because of the 0-to-1 flip at position k, a member of t can become
greater than z if the bits in positions greater than k are themselves >=
to z. To make that member *minimally* greater than z, the bits in
positions greater than k must be exactly = z. Hence, we simply match all
of t's unknown bits in positions more significant than k to z's bits. In
positions less significant than k, we set all t's unknown bits to 0
to retain minimality.

In our example, in positions of greater significance than k (=4),
t=x000. These positions are matched with z (1000) to produce 1000. In
positions of lower significance than k, t=10x1. All unknown bits are set
to 0 to produce 1001. The final result is:

result = 10001001 (137)

This concludes the computation for a result > z that is a member of t.

The procedure for tnum_step() in this commit implements the idea
described above. As a proof of correctness, we verified the algorithm
against a logical specification of tnum_step. The specification asserts
the following about the inputs t, z and output res that:

1. res is a member of t, and
2. res is strictly greater than z, and
3. there does not exist another value res2 such that
3a. res2 is also a member of t, and
3b. res2 is greater than z
3c. res2 is smaller than res

We checked the implementation against this logical specification using
an SMT solver. The verification formula in SMTLIB format is available
at [1]. The verification returned an "unsat": indicating that no input
assignment exists for which the implementation and the specification
produce different outputs.

In addition, we also automatically generated the logical encoding of the
C implementation using Agni [2] and verified it against the same
specification. This verification also returned an "unsat", confirming
that the implementation is equivalent to the specification. The formula
for this check is also available at [3].

Link: https://pastebin.com/raw/2eRWbiit
Link: https://github.com/bpfverif/agni
Link: https://pastebin.com/raw/EztVbBJ2
Co-developed-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Signed-off-by: Srinivas Narayana <srinivas.narayana@rutgers.edu>
Co-developed-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Santosh Nagarakatte <santosh.nagarakatte@rutgers.edu>
Signed-off-by: Harishankar Vishwanathan <harishankar.vishwanathan@gmail.com>
Link: https://lore.kernel.org/r/93fdf71910411c0f19e282ba6d03b4c65f9c5d73.1772225741.git.paul.chaignon@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'bpf-cpumap-devmap-fix-per-cpu-bulk-queue-races-on-preempt_rt'

Jiayuan Chen says:

====================
bpf: Fix per-CPU bulk queue races on PREEMPT_RT

On PREEMPT_RT kernels, local_bh_disable() only calls migrate_disable()
(when PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable
preemption. This means CFS scheduling can preempt a task inside the
per-CPU bulk queue (bq) operations in cpumap and devmap, allowing
another task on the same CPU to concurrently access the same bq,
leading to use-after-free, list corruption, and kernel panics.

Patch 1 fixes the cpumap race in bq_flush_to_queue(), originally
reported by syzbot [1].

Patch 2 fixes the same class of race in devmap's bq_xmit_all(),
identified by code inspection after Sebastian Andrzej Siewior pointed
out that devmap has the same per-CPU bulk queue pattern [2].

Both patches use local_lock_nested_bh() to serialize access to the
per-CPU bq. On non-RT this is a pure lockdep annotation with no
overhead; on PREEMPT_RT it provides a per-CPU sleeping lock.

To reproduce the devmap race, insert an mdelay(100) in bq_xmit_all()
after "cnt = bq->count" and before the actual transmit loop. Then pin
two threads to the same CPU, each running BPF_PROG_TEST_RUN with an XDP
program that redirects to a DEVMAP entry (e.g. a veth pair). CFS
timeslicing during the mdelay window causes interleaving. Without the
fix, KASAN reports null-ptr-deref due to operating on freed frames:

  BUG: KASAN: null-ptr-deref in __build_skb_around+0x22d/0x340
  Write of size 32 at addr 0000000000000d50 by task devmap_race_rep/449

  CPU: 0 UID: 0 PID: 449 Comm: devmap_race_rep Not tainted 6.19.0+ #31 PREEMPT_RT
  Call Trace:
   <TASK>
   __build_skb_around+0x22d/0x340
   build_skb_around+0x25/0x260
   __xdp_build_skb_from_frame+0x103/0x860
   veth_xdp_rcv_bulk_skb.isra.0+0x162/0x320
   veth_xdp_rcv.constprop.0+0x61e/0xbb0
   veth_poll+0x280/0xb50
   __napi_poll.constprop.0+0xa5/0x590
   net_rx_action+0x4b0/0xea0
   handle_softirqs.isra.0+0x1b3/0x780
   __local_bh_enable_ip+0x12a/0x240
   xdp_test_run_batch.constprop.0+0xedd/0x1f60
   bpf_test_run_xdp_live+0x304/0x640
   bpf_prog_test_run_xdp+0xd24/0x1b70
   __sys_bpf+0x61c/0x3e00
   </TASK>

  Kernel panic - not syncing: Fatal exception in interrupt

[1] https://lore.kernel.org/all/69369331.a70a0220.38f243.009d.GAE@google.com/T/
[2] https://lore.kernel.org/bpf/20260212023634.366343-1-jiayuan.chen@linux.dev/

v3 -> v4: https://lore.kernel.org/all/20260213034018.284146-1-jiayuan.chen@linux.dev/
- Move panic trace to cover letter. (Sebastian Andrzej Siewior)
- Add Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> to both patches
  from cover letter.

v2 -> v3: https://lore.kernel.org/bpf/20260212023634.366343-1-jiayuan.chen@linux.dev/
- Fix commit message: remove incorrect "spin_lock() becomes rt_mutex"
claim, the per-CPU bq has no spin_lock at all. (Sebastian Andrzej Siewior)
- Fix commit message: accurately describe local_lock_nested_bh()
behavior instead of referencing local_lock(). (Sebastian Andrzej Siewior)
- Remove incomplete discussion of snapshot alternative.
(Sebastian Andrzej Siewior)
- Remove panic trace from commit message. (Sebastian Andrzej Siewior)
- Add patch 2/2 for devmap, same race pattern. (Sebastian Andrzej Siewior)

v1 -> v2: https://lore.kernel.org/bpf/20260211064417.196401-1-jiayuan.chen@linux.dev/
- Use local_lock_nested_bh()/local_unlock_nested_bh() instead of
local_lock()/local_unlock(), since these paths already run under
local_bh_disable(). (Sebastian Andrzej Siewior)
- Replace "Caller must hold bq->bq_lock" comment with
lockdep_assert_held() in bq_flush_to_queue(). (Sebastian Andrzej Siewior)
- Fix Fixes tag to 3253cb49cbad ("softirq: Allow to drop the
softirq-BKL lock on PREEMPT_RT") which is the actual commit that
makes the race possible. (Sebastian Andrzej Siewior)
====================

Link: https://patch.msgid.link/20260225121459.183121-1-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Fix race in devmap on PREEMPT_RT

On PREEMPT_RT kernels, the per-CPU xdp_dev_bulk_queue (bq) can be
accessed concurrently by multiple preemptible tasks on the same CPU.

The original code assumes bq_enqueue() and __dev_flush() run atomically
with respect to each other on the same CPU, relying on
local_bh_disable() to prevent preemption. However, on PREEMPT_RT,
local_bh_disable() only calls migrate_disable() (when
PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable
preemption, which allows CFS scheduling to preempt a task during
bq_xmit_all(), enabling another task on the same CPU to enter
bq_enqueue() and operate on the same per-CPU bq concurrently.

This leads to several races:

1. Double-free / use-after-free on bq->q[]: bq_xmit_all() snapshots
   cnt = bq->count, then iterates bq->q[0..cnt-1] to transmit frames.
   If preempted after the snapshot, a second task can call bq_enqueue()
   -> bq_xmit_all() on the same bq, transmitting (and freeing) the
   same frames. When the first task resumes, it operates on stale
   pointers in bq->q[], causing use-after-free.

2. bq->count and bq->q[] corruption: concurrent bq_enqueue() modifying
   bq->count and bq->q[] while bq_xmit_all() is reading them.

3. dev_rx/xdp_prog teardown race: __dev_flush() clears bq->dev_rx and
   bq->xdp_prog after bq_xmit_all(). If preempted between
   bq_xmit_all() return and bq->dev_rx = NULL, a preempting
   bq_enqueue() sees dev_rx still set (non-NULL), skips adding bq to
   the flush_list, and enqueues a frame. When __dev_flush() resumes,
   it clears dev_rx and removes bq from the flush_list, orphaning the
   newly enqueued frame.

4. __list_del_clearprev() on flush_node: similar to the cpumap race,
   both tasks can call __list_del_clearprev() on the same flush_node,
   the second dereferences the prev pointer already set to NULL.

The race between task A (__dev_flush -> bq_xmit_all) and task B
(bq_enqueue -> bq_xmit_all) on the same CPU:

  Task A (xdp_do_flush)          Task B (ndo_xdp_xmit redirect)
  ----------------------         --------------------------------
  __dev_flush(flush_list)
    bq_xmit_all(bq)
      cnt = bq->count  /* e.g. 16 */
      /* start iterating bq->q[] */
    <-- CFS preempts Task A -->
                                   bq_enqueue(dev, xdpf)
                                     bq->count == DEV_MAP_BULK_SIZE
                                     bq_xmit_all(bq, 0)
                                       cnt = bq->count  /* same 16! */
                                       ndo_xdp_xmit(bq->q[])
                                       /* frames freed by driver */
                                       bq->count = 0
    <-- Task A resumes -->
      ndo_xdp_xmit(bq->q[])
      /* use-after-free: frames already freed! */

Fix this by adding a local_lock_t to xdp_dev_bulk_queue and acquiring
it in bq_enqueue() and __dev_flush(). These paths already run under
local_bh_disable(), so use local_lock_nested_bh() which on non-RT is
a pure annotation with no overhead, and on PREEMPT_RT provides a
per-CPU sleeping lock that serializes access to the bq.

Fixes: 3253cb49cbad ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
Reported-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20260225121459.183121-3-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Fix race in cpumap on PREEMPT_RT

On PREEMPT_RT kernels, the per-CPU xdp_bulk_queue (bq) can be accessed
concurrently by multiple preemptible tasks on the same CPU.

The original code assumes bq_enqueue() and __cpu_map_flush() run
atomically with respect to each other on the same CPU, relying on
local_bh_disable() to prevent preemption. However, on PREEMPT_RT,
local_bh_disable() only calls migrate_disable() (when
PREEMPT_RT_NEEDS_BH_LOCK is not set) and does not disable
preemption, which allows CFS scheduling to preempt a task during
bq_flush_to_queue(), enabling another task on the same CPU to enter
bq_enqueue() and operate on the same per-CPU bq concurrently.

This leads to several races:

1. Double __list_del_clearprev(): after bq->count is reset in
   bq_flush_to_queue(), a preempting task can call bq_enqueue() ->
   bq_flush_to_queue() on the same bq when bq->count reaches
   CPU_MAP_BULK_SIZE. Both tasks then call __list_del_clearprev()
   on the same bq->flush_node, the second call dereferences the
   prev pointer that was already set to NULL by the first.

2. bq->count and bq->q[] races: concurrent bq_enqueue() can corrupt
   the packet queue while bq_flush_to_queue() is processing it.

The race between task A (__cpu_map_flush -> bq_flush_to_queue) and
task B (bq_enqueue -> bq_flush_to_queue) on the same CPU:

  Task A (xdp_do_flush)          Task B (cpu_map_enqueue)
  ----------------------         ------------------------
  bq_flush_to_queue(bq)
    spin_lock(&q->producer_lock)
    /* flush bq->q[] to ptr_ring */
    bq->count = 0
    spin_unlock(&q->producer_lock)
                                   bq_enqueue(rcpu, xdpf)
    <-- CFS preempts Task A -->      bq->q[bq->count++] = xdpf
                                     /* ... more enqueues until full ... */
                                     bq_flush_to_queue(bq)
                                       spin_lock(&q->producer_lock)
                                       /* flush to ptr_ring */
                                       spin_unlock(&q->producer_lock)
                                       __list_del_clearprev(flush_node)
                                         /* sets flush_node.prev = NULL */
    <-- Task A resumes -->
    __list_del_clearprev(flush_node)
      flush_node.prev->next = ...
      /* prev is NULL -> kernel oops */

Fix this by adding a local_lock_t to xdp_bulk_queue and acquiring it
in bq_enqueue() and __cpu_map_flush(). These paths already run under
local_bh_disable(), so use local_lock_nested_bh() which on non-RT is
a pure annotation with no overhead, and on PREEMPT_RT provides a
per-CPU sleeping lock that serializes access to the bq.

To reproduce, insert an mdelay(100) between bq->count = 0 and
__list_del_clearprev() in bq_flush_to_queue(), then run reproducer
provided by syzkaller.

Fixes: 3253cb49cbad ("softirq: Allow to drop the softirq-BKL lock on PREEMPT_RT")
Reported-by: syzbot+2b3391f44313b3983e91@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/69369331.a70a0220.38f243.009d.GAE@google.com/T/
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Jiayuan Chen <jiayuan.chen@shopee.com>
Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev>
Link: https://lore.kernel.org/r/20260225121459.183121-2-jiayuan.chen@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'close-race-in-freeing-special-fields-and-map-value'

Kumar Kartikeya Dwivedi says:

====================
Close race in freeing special fields and map value

There exists a race across various map types where the freeing of
special fields (tw, timer, wq, kptr, etc.) can be done eagerly when a
logical delete operation is done on a map value, such that the program
which continues to have access to such a map value can recreate the
fields and cause them to leak.

The set contains fixes for this case. It is a continuation of Mykyta's
previous attempt in [0], but applies to all fields. A test is included
which reproduces the bug reliably in absence of the fixes.

Local Storage Benchmarks
------------------------
Evaluation Setup: Benchmarked on a dual-socket Intel Xeon Gold 6348 (Ice
Lake) @ 2.60GHz (56 cores / 112 threads), with the CPU governor set to
performance. Bench was pinned to a single NUMA node throughout the test.

Benchmark comes from [1] using the following command:
./bench -p 1 local-storage-create --storage-type <socket,task> --batch-size <16,32,64>

Before the test, 10 runs of all cases ([socket|task] x 3 batch sizes x 7
iterations per batch size) are done to warm up and prime the machine.

Then, 3 runs of all cases are done (with and without the patch, across
reboots).

For each comparison, we have 21 samples, i.e. per batch size (e.g.
socket 16) of a given local storage, we have 3 runs x 7 iterations.

The statistics (mean, median, stddev) and t-test is done for each
scenario (local storage and batch size pair) individually (21 samples
for either case). All values are for local storage creations in thousand
creations / sec (k/s).

       Baseline (without patch)               With patch                       Delta
     Case      Median        Mean   Std. Dev.   Median        Mean   Std. Dev.   Median       %
---------------------------------------------------------------------------------------------------
socket 16     432.026     431.941    1.047     431.347     431.953    1.635      -0.679    -0.16%
socket 32     432.641     432.818    1.535     432.488     432.302    1.508      -0.153    -0.04%
socket 64     431.504     431.996    1.337     429.145     430.326    2.469      -2.359    -0.55%
  task 16      38.816      39.382    1.456      39.657      39.337    1.831      +0.841    +2.17%
  task 32      38.815      39.644    2.690      38.721      39.122    1.636      -0.094    -0.24%
  task 64      37.562      38.080    1.701      39.554      38.563    1.689      +1.992    +5.30%

The cases for socket are within the range of noise, and improvements in task
local storage are due to high variance (CV ~4%-6% across batch sizes). The only
statistically significant case worth mentioning is socket with batch size 64
with p-value from t-test < 0.05, but the absolute difference is small (~2k/s).

TL;DR there doesn't appear to be any significant regression or improvement.

  [0]: https://lore.kernel.org/bpf/20260216131341.1285427-1-mykyta.yatsenko5@gmail.com
  [1]: https://lore.kernel.org/bpf/20260205222916.1788211-1-ameryhung@gmail.com

Changelog:
----------
v2 -> v3
v2: https://lore.kernel.org/bpf/20260227052031.3988575-1-memxor@gmail.com

* Add syzbot Tested-by.
* Add Amery's Reviewed-by.
* Fix missing rcu_dereference_check() in __bpf_selem_free_rcu. (BPF CI Bot)
* Remove migrate_disable() in bpf_selem_free_rcu. (Alexei)

v1 -> v2
v1: https://lore.kernel.org/bpf/20260225185121.2057388-1-memxor@gmail.com

* Add Paul's Reviewed-by.
* Fix use-after-free in accessing bpf_mem_alloc embedded in map. (syzbot CI)
* Add benchmark numbers for local storage.
* Add extra test case for per-cpu hashmap coverage with up to 16 refcount leaks.
* Target bpf tree.
====================

Link: https://patch.msgid.link/20260227224806.646888-1-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add tests for special fields races

Add a couple of tests to ensure that the refcount drops to zero when we
exercise the race where creation of a special field succeeds the logical
bpf_obj_free_fields done when deleting an element. Prior to previous
changes, the fields would be freed eagerly and repopulate and end up
leaking, causing the reference to not drop down correctly. Running this
test on a kernel without fixes will cause a hang in delete_module, since
the module reference stays active due to the leaked kptr not dropping
it. After the fixes tests succeed as expected.

Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260227224806.646888-6-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Retire rcu_trace_implies_rcu_gp() from local storage

This assumption will always hold going forward, hence just remove the
various checks and assume it is true with a comment for the uninformed
reader.

Reviewed-by: Paul E. McKenney <paulmck@kernel.org>
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260227224806.646888-5-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Delay freeing fields in local storage

Currently, when use_kmalloc_nolock is false, the freeing of fields for a
local storage selem is done eagerly before waiting for the RCU or RCU
tasks trace grace period to elapse. This opens up a window where the
program which has access to the selem can recreate the fields after the
freeing of fields is done eagerly, causing memory leaks when the element
is finally freed and returned to the kernel.

Make a few changes to address this. First, delay the freeing of fields
until after the grace periods have expired using a __bpf_selem_free_rcu
wrapper which is eventually invoked after transitioning through the
necessary number of grace period waits. Replace usage of the kfree_rcu
with call_rcu to be able to take a custom callback. Finally, care needs
to be taken to extend the rcu barriers for all cases, and not just when
use_kmalloc_nolock is true, as RCU and RCU tasks trace callbacks can be
in flight for either case and access the smap field, which is used to
obtain the BTF record to walk over special fields in the map value.

While we're at it, drop migrate_disable() from bpf_selem_free_rcu, since
migration should be disabled for RCU callbacks already.

Fixes: 9bac675e6368 ("bpf: Postpone bpf_obj_free_fields to the rcu callback")
Reviewed-by: Amery Hung <ameryhung@gmail.com>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260227224806.646888-4-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Lose const-ness of map in map_check_btf()

BPF hash map may now use the map_check_btf() callback to decide whether
to set a dtor on its bpf_mem_alloc or not. Unlike C++ where members can
opt out of const-ness using mutable, we must lose the const qualifier on
the callback such that we can avoid the ugly cast. Make the change and
adjust all existing users, and lose the comment in hashtab.c.

Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260227224806.646888-3-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Register dtor for freeing special fields

There is a race window where BPF hash map elements can leak special
fields if the program with access to the map value recreates these
special fields between the check_and_free_fields done on the map value
and its eventual return to the memory allocator.

Several ways were explored prior to this patch, most notably [0] tried
to use a poison value to reject attempts to recreate special fields for
map values that have been logically deleted but still accessible to BPF
programs (either while sitting in the free list or when reused). While
this approach works well for task work, timers, wq, etc., it is harder
to apply the idea to kptrs, which have a similar race and failure mode.

Instead, we change bpf_mem_alloc to allow registering destructor for
allocated elements, such that when they are returned to the allocator,
any special fields created while they were accessible to programs in the
mean time will be freed. If these values get reused, we do not free the
fields again before handing the element back. The special fields thus
may remain initialized while the map value sits in a free list.

When bpf_mem_alloc is retired in the future, a similar concept can be
introduced to kmalloc_nolock-backed kmem_cache, paired with the existing
idea of a constructor.

Note that the destructor registration happens in map_check_btf, after
the BTF record is populated and (at that point) avaiable for inspection
and duplication. Duplication is necessary since the freeing of embedded
bpf_mem_alloc can be decoupled from actual map lifetime due to logic
introduced to reduce the cost of rcu_barrier()s in mem alloc free path in
9f2c6e96c65e ("bpf: Optimize rcu_barrier usage between hash map and bpf_mem_alloc.").

As such, once all callbacks are done, we must also free the duplicated
record. To remove dependency on the bpf_map itself, also stash the key
size of the map to obtain value from htab_elem long after the map is
gone.

[0]: https://lore.kernel.org/bpf/20260216131341.1285427-1-mykyta.yatsenko5@gmail.com

Fixes: 14a324f6a67e ("bpf: Wire up freeing of referenced kptr")
Fixes: 1bfbc267ec91 ("bpf: Enable bpf_timer and bpf_wq in any context")
Reported-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20260227224806.646888-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Fix OOB read in dmabuf_collector

Dmabuf name allocations can be less than DMA_BUF_NAME_LEN characters,
but bpf_probe_read_kernel always tries to read exactly that many bytes.
If a name is less than DMA_BUF_NAME_LEN characters,
bpf_probe_read_kernel will read past the end. bpf_probe_read_kernel_str
stops at the first NUL terminator so use it instead, like
iter_dmabuf_for_each already does.

Fixes: ae5d2c59ecd7 ("selftests/bpf: Add test for dmabuf_iter")
Reported-by: Jerome Lee <jaewookl@quicinc.com>
Signed-off-by: T.J. Mercier <tjmercier@google.com>
Link: https://lore.kernel.org/r/20260225003349.113746-1-tjmercier@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Fix a memory leak in xdp_flowtable test

test_progs run with ASAN reported [1]:

  ==126==ERROR: LeakSanitizer: detected memory leaks

  Direct leak of 32 byte(s) in 1 object(s) allocated from:
      #0 0x7f1ff3cfa340 in calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:77
      #1 0x5610c15bb520 in bpf_program_attach_fd /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/lib/bpf/libbpf.c:13164
      #2 0x5610c15bb740 in bpf_program__attach_xdp /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/lib/bpf/libbpf.c:13204
      #3 0x5610c14f91d3 in test_xdp_flowtable /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/prog_tests/xdp_flowtable.c:138
      #4 0x5610c1533566 in run_one_test /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/test_progs.c:1406
      #5 0x5610c1537fb0 in main /codebuild/output/src685977285/src/actions-runner/_work/vmtest/vmtest/src/tools/testing/selftests/bpf/test_progs.c:2097
      #6 0x7f1ff25df1c9  (/lib/x86_64-linux-gnu/libc.so.6+0x2a1c9) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb)
      #7 0x7f1ff25df28a in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x2a28a) (BuildId: 8e9fd827446c24067541ac5390e6f527fb5947bb)
      #8 0x5610c0bd3180 in _start (/tmp/work/vmtest/vmtest/selftests/bpf/test_progs+0x593180) (BuildId: cdf9f103f42307dc0a2cd6cfc8afcbc1366cf8bd)

Fix by properly destroying bpf_link on exit in xdp_flowtable test.

[1] https://github.com/kernel-patches/vmtest/actions/runs/22361085418/job/64716490680

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com>
Link: https://lore.kernel.org/r/20260225003351.465104-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Fix stack-out-of-bounds write in devmap

get_upper_ifindexes() iterates over all upper devices and writes their
indices into an array without checking bounds.

Also the callers assume that the max number of upper devices is
MAX_NEST_DEV and allocate excluded_devices[1+MAX_NEST_DEV] on the stack,
but that assumption is not correct and the number of upper devices could
be larger than MAX_NEST_DEV (e.g., many macvlans), causing a
stack-out-of-bounds write.

Add a max parameter to get_upper_ifindexes() to avoid the issue.
When there are too many upper devices, return -EOVERFLOW and abort the
redirect.

To reproduce, create more than MAX_NEST_DEV(8) macvlans on a device with
an XDP program attached using BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS.
Then send a packet to the device to trigger the XDP redirect path.

Reported-by: syzbot+10cc7f13760b31bd2e61@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/698c4ce3.050a0220.340abe.000b.GAE@google.com/T/
Fixes: aeea1b86f936 ("bpf, devmap: Exclude XDP broadcast to master device")
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Kohei Enju <kohei@enjuk.jp>
Link: https://lore.kernel.org/r/20260225053506.4738-1-kohei@enjuk.jp
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Fix kprobe_multi cookies access in show_fdinfo callback

We don't check if cookies are available on the kprobe_multi link
before accessing them in show_fdinfo callback, we should.

Cc: stable@vger.kernel.org
Fixes: da7e9c0a7fbc ("bpf: Add show_fdinfo for kprobe_multi")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20260225111249.186230-1-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf, arm64: Force 8-byte alignment for JIT buffer to prevent atomic tearing

struct bpf_plt contains a u64 target field. Currently, the BPF JIT
allocator requests an alignment of 4 bytes (sizeof(u32)) for the JIT
buffer.

Because the base address of the JIT buffer can be 4-byte aligned (e.g.,
ending in 0x4 or 0xc), the relative padding logic in build_plt() fails
to ensure that target lands on an 8-byte boundary.

This leads to two issues:
1. UBSAN reports misaligned-access warnings when dereferencing the
   structure.
2. More critically, target is updated concurrently via WRITE_ONCE() in
   bpf_arch_text_poke() while the JIT'd code executes ldr. On arm64,
   64-bit loads/stores are only guaranteed to be single-copy atomic if
   they are 64-bit aligned. A misaligned target risks a torn read,
   causing the JIT to jump to a corrupted address.

Fix this by increasing the allocation alignment requirement to 8 bytes
(sizeof(u64)) in bpf_jit_binary_pack_alloc(). This anchors the base of
the JIT buffer to an 8-byte boundary, allowing the relative padding math
in build_plt() to correctly align the target field.

Fixes: b2ad54e1533e ("bpf, arm64: Implement bpf_arch_text_poke() for arm64")
Signed-off-by: Fuad Tabba <tabba@google.com>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20260226075525.233321-1-tabba@google.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'selftests-bpf-fixes-for-userspace-asan'

Ihor Solodrai says:

====================
selftests/bpf: Fixes for userspace ASAN

This series includes various fixes aiming to enable test_progs run
with userspace address sanitizer on BPF CI.

The first five patches add a simplified implementation of strscpy() to
selftests/bpf and then replace strcpy/strncpy usages across the tests
with it. See relevant discussions [1][2].

Patch #6 fixes the selftests/bpf/test_progs build with:

    SAN_CFLAGS="-fsanitize=address -fno-omit-frame-pointer"

The subsequent patches fix bugs reported by the address sanitizer on
attempt to run the tests.

[1] https://lore.kernel.org/bpf/CAADnVQ+9uw2_o388j43EWiAPdMB=3FLx2jq-9zRSvqrv-wgRag@mail.gmail.com/
[2] https://lore.kernel.org/bpf/20260220182011.802116-1-ihor.solodrai@linux.dev/
---

v3->v4:
  - combine strscpy and ASAN series into one (Alexei)
  - make the count arg of strscpy() optional via macro and fixup
    relevant call sites (Alexei)
  - remove strscpy_cat() from this series (Alexei)

v3: https://lore.kernel.org/bpf/20260220222604.1155148-1-ihor.solodrai@linux.dev/

v2->v3:
  - rebase on top of "selftests/bpf: Add and use strscpy()"
    - https://lore.kernel.org/bpf/20260220182011.802116-1-ihor.solodrai@linux.dev/
  - uprobe_multi_test.c: memset static struct child at the beginning
    of a test *and* zero out child->thread in release_child (patch #9,
    Mykyta)
  - nits in test_sysctl.c (patch #11, Eduard)
  - bpftool_helpers.c: update to use strscpy (patch #14, Alexei)
  - add __asan_on_error handler to still dump test logs even with ASAN
    build (patch #15, Mykyta)

v2: https://lore.kernel.org/bpf/20260218003041.1156774-1-ihor.solodrai@linux.dev/

v1->v2:
  - rebase on bpf (v1 was targeting bpf-next)
  - add ASAN flag handling in selftests/bpf/Makefile (Eduard)
  - don't override SIGSEGV handler in test_progs with ASAN (Eduard)
  - add error messages in detect_bpftool_path (Mykyta)
  - various nits (Eduard, Jiri, Mykyta, Alexis)

v1: https://lore.kernel.org/bpf/20260212011356.3266753-1-ihor.solodrai@linux.dev/
====================

Link: https://patch.msgid.link/20260223190736.649171-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Don't override SIGSEGV handler with ASAN

test_progs has custom SIGSEGV handler, which interferes with the
address sanitizer [1]. Add an #ifndef to avoid this.

Additionally, declare an __asan_on_error() to dump the test logs in
the same way it happens in the custom SIGSEGV handler.

[1] https://lore.kernel.org/bpf/73d832948b01dbc0ebc60d85574bdf8537f3a810.camel@gmail.com/

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223191118.655185-3-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Check BPFTOOL env var in detect_bpftool_path()

The bpftool_maps_access and bpftool_metadata tests may fail on BPF CI
with "command not found", depending on a workflow.
This happens because detect_bpftool_path() only checks two hardcoded
relative paths:
- ./tools/sbin/bpftool
- ../tools/sbin/bpftool

Add support for a BPFTOOL environment variable that allows specifying
the exact path to the bpftool binary.

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223191118.655185-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Fix out-of-bounds array access bugs reported by ASAN

- kmem_cache_iter: remove unnecessary debug output
- lwt_seg6local: change the type of foobar to char[]
  - the sizeof(foobar) returned the pointer size and not a string
    length as intended
- verifier_log: increase prog_name buffer size in verif_log_subtest()
  - compiler has a conservative estimate of fixed_log_sz value, making
    ASAN complain on snprint() call

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223191118.655185-1-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Fix array bounds warning in jit_disasm_helpers

Compiler cannot infer upper bound for labels.cnt and warns about
potential buffer overflow in snprintf. Add an explicit bounds
check (... && i < MAX_LOCAL_LABELS) in the loop condition to fix the
warning.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-18-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Free bpf_object in test_sysctl

ASAN reported a resource leak due to the bpf_object not being tracked
in test_sysctl. Add obj field to struct sysctl_test to properly clean
it up.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-17-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Fix resource leaks caused by missing cleanups

ASAN reported a number of resource leaks:
  - Add missing  *__destroy(skel) calls
  - Replace bpf_link__detach() with bpf_link__destroy() where appropriate
  - cgrp_local_storage: Add bpf_link__destroy() when bpf_iter_create fails
  - dynptr: Add missing bpf_object__close()

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-16-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Fix double thread join in uprobe_multi_test

ASAN reported a "joining already joined thread" error. The
release_child() may be called multiple times for the same struct
child.

Fix by resetting child->thread to 0 after pthread_join.

Also memset(0) static child variable in test_attach_api().

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-15-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Fix use-after-free in xdp_metadata test

ASAN reported a use-after-free in close_xsk().

The xsk->socket internally references xsk->umem via socket->ctx->umem,
so the socket must be deleted before the umem. Fix the order of
operations in close_xsk().

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-14-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

veristat: Fix a memory leak for preset ENUMERATOR

ASAN detected a memory leak in veristat. The cleanup code handling
ENUMERATOR value missed freeing strdup-ed svalue. Fix it.

Acked-by: Mykyta Yatsenko <yatsenko@meta.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-13-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Fix cleanup in check_fd_array_cnt__fd_array_too_big()

The Close() macro uses the passed in expression three times, which
leads to repeated execution in case it has side effects. That is,
Close(i--) would decrement i three times.

ASAN caught a stack-buffer-undeflow error at a point where this was
overlooked. Fix it.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-12-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Fix memory leaks in tests

Fix trivial memory leaks detected by userspace ASAN:
  - htab_update: free value buffer in test_reenter_update cleanup
  - test_xsk: inline pkt_stream_replace() in testapp_stats_rx_full()
    and testapp_stats_fill_empty()
  - testing_helpers: free buffer allocated by getline() in
    parse_test_list_file

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-11-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Refactor bpf_get_ksyms() trace helper

ASAN reported a memory leak in bpf_get_ksyms(): it allocates a struct
ksyms internally and never frees it.

Move struct ksyms to trace_helpers.h and return it from the
bpf_get_ksyms(), giving ownership to the caller. Add filtered_syms and
filtered_cnt fields to the ksyms to hold the filtered array of
symbols, previously returned by bpf_get_ksyms().

Fixup the call sites: kprobe_multi_test and bench_trigger.

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20260223190736.649171-10-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add DENYLIST.asan

Add a denylist file for tests that should be skipped when built with
userspace ASAN:

    $ make ... SAN_CFLAGS="-fsanitize=address -fno-omit-frame-pointer"

Skip the following tests:

- *arena*: userspace ASAN does not understand BPF arena maps and gets
  confused particularly when map_extra is non-zero
  - non-zero map_extra leads to mmap with MAP_FIXED, and ASAN treats
    this as an unknown memory region
- task_local_data: ASAN complains about "incorrect" aligned_alloc()
  usage, but it's intentional in the test
- uprobe_multi_test: very slow with ASAN enabled

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-9-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

resolve_btfids: Fix memory leaks reported by ASAN

Running resolve_btfids with ASAN reveals memory leaks in btf_id
handling.

- Change get_id() to use a local buffer
- Make btf_id__add() strdup the name internally
- Add btf_id__free_all() that frees all nodese of a tree
- Call the cleanup function on exit for every tree

Acked-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-8-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Pass through build flags to bpftool and resolve_btfids

EXTRA_* and SAN_* build flags were not correctly propagated to bpftool
and resolve_btids when building selftests/bpf. This led to various
build errors on attempt to build with SAN_CFLAGS="-fsanitize=address",
for example.

Fix the makefiles to address this:
  - Pass SAN_CFLAGS/SAN_LDFLAGS to bpftool and resolve_btfids build
  - Propagate EXTRA_LDFLAGS to resolve_btfids link command
  - Use pkg-config to detect zlib and zstd for resolve_btfids, similar
    libelf handling

Also check for ASAN flag in selftests/bpf/Makefile for convenience.

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-7-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Use memcpy() for bounded non-NULL-terminated copies

Replace strncpy() with memcpy() in cases where the source is
non-NULL-terminated and the copy length is known.

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-6-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Use strscpy in bpftool_helpers.c

Replace strncpy() calls in bpftool_helpers.c with strscpy().

Pass the destination buffer size to detect_bpftool_path() instead of
hardcoding BPFTOOL_PATH_MAX_LEN.

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-5-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Replace strncpy() with strscpy()

strncpy() does not guarantee NULL-termination and is considered
deprecated [1]. Replace strncpy() calls with strscpy().

[1] https://docs.kernel.org/process/deprecated.html#strncpy-on-nul-terminated-strings

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-4-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Replace strcpy() calls with strscpy()

strcpy() does not perform bounds checking and is considered deprecated
[1]. Replace strcpy() calls with strscpy() defined in bpf_util.h.

[1] https://docs.kernel.org/process/deprecated.html#strcpy

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-3-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: Add simple strscpy() implementation

Replace bpf_strlcpy() in bpf_util.h with a sized_strscpy(), which is a
simplified sized_strscpy() from the kernel (lib/string.c [1]). It:
  * takes a count (destination size) parameter
  * guarantees NULL-termination
  * returns the number of characters copied or -E2BIG

Re-define strscpy macro similar to in-kernel implementation [2]: allow
the count parameter to be optional.

Add #ifdef-s to tools/include/linux/args.h, as they may be defined in
other system headers (for example, __CONCAT in sys/cdefs.h).

Fixup the single existing bpf_strlcpy() call in cgroup_helpers.c

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/string.c?h=v6.19#n113
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/string.h?h=v6.19#n91

Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20260223190736.649171-2-ihor.solodrai@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Remove WARN_ALL_UNSEEDED_RANDOM kernel config option

This config option goes way back - it used to be an internal debug
option to random.c (at that point called DEBUG_RANDOM_BOOT), then was
renamed and exposed as a config option as CONFIG_WARN_UNSEEDED_RANDOM,
and then further renamed to the current CONFIG_WARN_ALL_UNSEEDED_RANDOM.

It was all done with the best of intentions: the more limited
rate-limited reports were reporting some cases, but if you wanted to see
all the gory details, you'd enable this "ALL" option.

However, it turns out - perhaps not surprisingly - that when people
don't care about and fix the first rate-limited cases, they most
certainly don't care about any others either, and so warning about all
of them isn't actually helping anything.

And the non-ratelimited reporting causes problems, where well-meaning
people enable debug options, but the excessive flood of messages that
nobody cares about will hide actual real information when things go
wrong.

I just got a kernel bug report (which had nothing to do with randomness)
where two thirds of the the truncated dmesg was just variations of

random: get_random_u32 called from __get_random_u32_below+0x10/0x70 with crng_init=0

and in the process early boot messages had been lost (in addition to
making the messages that _hadn't_ been lost harder to read).

The proper way to find these things for the hypothetical developer that
cares - if such a person exists - is almost certainly with boot time
tracing. That gives you the option to get call graphs etc too, which is
likely a requirement for fixing any problems anyway.

See Documentation/trace/boottime-trace.rst for that option.

And if we for some reason do want to re-introduce actual printing of
these things, it will need to have some uniqueness filtering rather than
this "just print it all" model.

Fixes: cc1e127bfa95 ("random: remove ratelimiting for in-kernel unseeded randomness")
Acked-by: Jason Donenfeld <Jason@zx2c4.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

default_gfp(): avoid using the "newfangled" __VA_OPT__ trick

The default_gfp() helper that I added is not wrong, but it turns out
that it causes unnecessary headaches for 'sparse' which doesn't support
the use of __VA_OPT__ (introduced in C++20 and C23, and supported by gcc
and clang for a long time).

We do already use __VA_OPT__ in some other cases in the kernel (drm/xe
and btrfs), but it has been fairly limited. Now it triggers for pretty
much everything, and sparse ends up not working at all.

We can use the traditional gcc ',##__VA_ARGS__' syntax instead: it may
not be the "C standard" way and is slightly less natural in this
context, but it is the traditional model for this and avoids the sparse
problem.

Reported-and-tested-by: Ricardo Ribalda <ribalda@chromium.org>
Reported-and-tested-by: Richard Fitzgerald <rf@opensource.cirrus.com>
Reported-by: Ben Dooks <ben.dooks@codethink.co.uk>
Fixes: e19e1b480ac7 ("add default_gfp() helper macro and use it in the new *alloc_obj() helpers")
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Linux 7.0-rc1

Merge tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux

Pull fsverity fixes from Eric Biggers:

- Fix a build error on parisc

- Remove the non-large-folio-aware function fsverity_verify_page()

* tag 'fsverity-for-linus' of git://git.kernel.org/pub/scm/fs/fsverity/linux:
  fsverity: fix build error by adding fsverity_readahead() stub
  fsverity: remove fsverity_verify_page()
  f2fs: make f2fs_verify_cluster() partially large-folio-aware
  f2fs: remove unnecessary ClearPageUptodate in f2fs_verify_cluster()

Merge tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux

Pull crypto library fix from Eric Biggers:
"Fix a big endian specific issue in the PPC64-optimized AES code"

* tag 'libcrypto-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
lib/crypto: powerpc/aes: Fix rndkey_from_vsx() on big endian CPUs

CREDITS: Add -next to Stephen Rothwell's entry

Stephen retired and stepped back from -next maintainership, update his
entry in CREDITS to recognise his 18 years of hard work making it what
it is today and all the impact it's had on our development process.

Also update to his current GnuPG key while we're here.

Acked-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: SeongJae Park <sj@kernel.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Mark Brown <broonie@kernel.org>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

x509: select CONFIG_CRYPTO_LIB_SHA256

The x509 public key code gained a dependency on the sha256 hash
implementation, causing a rare link time failure in randconfig
builds:

  arm-linux-gnueabi-ld: crypto/asymmetric_keys/x509_public_key.o: in function `x509_get_sig_params':
  x509_public_key.c:(.text.x509_get_sig_params+0x12): undefined reference to `sha256'
  arm-linux-gnueabi-ld: (sha256): Unknown destination type (ARM/Thumb) in crypto/asymmetric_keys/x509_public_key.o
  x509_public_key.c:(.text.x509_get_sig_params+0x12): dangerous relocation: unsupported relocation

Select the necessary library code from Kconfig.

Fixes: 2c62068ac86b ("x509: Separately calculate sha256 for blacklist")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

xz: fix arm fdt compile error for kmalloc replacement

Align to the commit bf4afc53b77a ("Convert 'alloc_obj' family to use the
new default GFP_KERNEL argument") update the 'kmalloc_obj' declaration
for userspace to fix below compile error:

  In file included from arch/arm/boot/compressed/../../../../lib/decompress_unxz.c:241,
                   from arch/arm/boot/compressed/decompress.c:56:
  arch/arm/boot/compressed/../../../../lib/xz/xz_dec_stream.c: In function 'xz_dec_init':
  arch/arm/boot/compressed/../../../../lib/xz/xz_dec_stream.c:787:28: error: implicit declaration of function 'kmalloc_obj'; did you mean 'kmalloc'? [-Wimplicit-function-declaration]
     787 |         struct xz_dec *s = kmalloc_obj(*s);
         |                            ^~~~~~~~~~~
         |                            kmalloc

Signed-off-by: Haiyue Wang <haiyuewa@163.com>
Fixes: 69050f8d6d07 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types")
Fixes: bf4afc53b77a ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument")
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Lasse Collin <lasse.collin@tukaani.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'rtc-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux

Pull RTC updates from Alexandre Belloni:

- loongson: Loongson-2K0300 support

- s35390a: nvmem support

- zynqmp: rework calibration

* tag 'rtc-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux:
  rtc: ds1390: fix number of bytes read from RTC
  rtc: class: Remove duplicate check for alarm
  rtc: optee: simplify OP-TEE context match
  rtc: interface: Alarm race handling should not discard preceding error
  rtc: s35390a: implement nvmem support
  rtc: loongson: Add Loongson-2K0300 support
  dt-bindings: rtc: loongson: Document Loongson-2K0300 compatible
  dt-bindings: rtc: loongson: Correct Loongson-1C interrupts property
  dt-bindings: rtc: renesas,rz-rtca3: Add RZ/V2N support
  dt-bindings: rtc: cpcap: convert to schema
  rtc: zynqmp: use dynamic max and min offset ranges
  rtc: zynqmp: rework set_offset
  rtc: zynqmp: rework read_offset
  rtc: zynqmp: check calibration max value
  rtc: zynqmp: correct frequency value
  rtc: amlogic-a4: Remove IRQF_ONESHOT
  rtc: pcf8563: use correct of_node for output clock
  rtc: max31335: use correct CONFIG symbol in IS_REACHABLE()
  rtc: nvvrs: Add ARCH_TEGRA to the NV VRS RTC driver

Merge tag 'rust-fixes-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux

Pull rust fixes from Miguel Ojeda:
"Toolchain and infrastructure:

   - Pass '-Zunstable-options' flag required by the future Rust 1.95.0

   - Fix 'objtool' warning for Rust 1.84.0

  'kernel' crate:

   - 'irq' module: add missing bound detected by the future Rust 1.95.0

   - 'list' module: add missing 'unsafe' blocks and placeholder safety
     comments to macros (an issue for future callers within the crate)

  'pin-init' crate:

   - Clean Clippy warning that changed behavior in the future Rust
     1.95.0"

* tag 'rust-fixes-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: list: Add unsafe blocks for container_of and safety comments
  rust: pin-init: replace clippy `expect` with `allow`
  rust: irq: add `'static` bounds to irq callbacks
  objtool/rust: add one more `noreturn` Rust function
  rust: kbuild: pass `-Zunstable-options` for Rust 1.95.0

Merge tag 'trace-rv-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull runtime verifier fix from Steven Rostedt:

- Fix multiple definition of __pcpu_unique_da_mon_this

   After refactoring monitors, we used static per-cpu variables with the
   same names across different per-cpu monitors. This is explicitly
   disallowed for modules on some architectures (alpha) or if
   CONFIG_DEBUG_FORCE_WEAK_PER_CPU is enabled (e.g. Fedora's debug
   kernel). Make sure all those variables have different names to avoid
   compilation issues.

* tag 'trace-rv-7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  rv: Fix multiple definition of __pcpu_unique_da_mon_this

Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL uses

Conversion performed via this Coccinelle script:

  // SPDX-License-Identifier: GPL-2.0-only
  // Options: --include-headers-for-types --all-includes --include-headers --keep-comments
  virtual patch

  @gfp depends on patch && !(file in "tools") && !(file in "samples")@
  identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
    kzalloc_obj,kzalloc_objs,kzalloc_flex,
    kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
    kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
  @@

   ALLOC(...
  - , GFP_KERNEL
   )

  $ make coccicheck MODE=patch COCCI=gfp.cocci

Build and boot tested x86_64 with Fedora 42's GCC and Clang:

Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01

Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Convert more 'alloc_obj' cases to default GFP_KERNEL arguments

This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.

Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.

So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.

The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Convert 'alloc_flex' family to use the new default GFP_KERNEL argument

This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.

As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Convert 'alloc_obj' family to use the new default GFP_KERNEL argument

This was done entirely with mindless brute force, using

git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/$alloc_objs*(.*$, GFP_KERNEL)/\1)/'

to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.

Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.

For the same reason the 'flex' versions will be done as a separate
conversion.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

add default_gfp() helper macro and use it in the new *alloc_obj() helpers

Most simple allocations use GFP_KERNEL, and with the new allocation
helpers being introduced, let's just take advantage of that to simplify
that default case.

It's a numbers game:

git grep 'alloc_obj(' |
sed 's/.*$GFP_[_A-Z]*$.*/\1/' |
sort | uniq -c | sort -n | tail

shows that about 90% of all those new allocator instances just use that
standard GFP_KERNEL.

Those helpers are already macros, and we can easily just make it be the
default case when the gfp argument is missing.

And yes, we could do that for all the legacy interfaces too, but let's
keep it to just the new ones at least for now, since those all got
converted recently anyway, so this is not any "extra" noise outside of
that limited conversion.

And, in fact, I want to do this before doing the -rc1 release, exactly
so that we don't get extra merge conflicts.

Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

slab.h: disable completely broken overflow handling in flex allocations

Commit 69050f8d6d07 ("treewide: Replace kmalloc with kmalloc_obj for
non-scalar types") started using the new allocation helpers, and in the
process showed that they were completely non-working.

The overflow logic in overflows_flex_counter_type() is completely the
wrong way around, and that broke __alloc_flex() completely.  By chance,
the resulting code was then such a mess that clang generated
sufficiently garbage code that objtool warned about it all.  Which made
it somewhat quicker to narrow things down.

While fixing overflows_flex_counter_type() would presumably fix this
all, I'm excising the whole broken overflow logic from __alloc_flex(),
because we don't want that kind of code in basic allocation functions
anyway.

That (no longer) broken overflows_flex_counter_type() thing needs to be
inserted into the actual __set_flex_counter() logic in the unlikely case
that we ever want this at all.  And made conditional.

Fixes: 81cee9166a90 ("compiler_types: Introduce __flex_counter() and family")
Fixes: 69050f8d6d07 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types")
Cc: Kees Cook <kees@kernel.org>
Link: https://lore.kernel.org/all/CAHk-=whEd020BYzGTzYrENjD9Z5_82xx6h8HsQvH5xDSnv0=Hw@mail.gmail.com/
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kmalloc_obj conversion from Kees Cook:
"This does the tree-wide conversion to kmalloc_obj() and friends using
  coccinelle, with a subsequent small manual cleanup of whitespace
  alignment that coccinelle does not handle.

  This uncovered a clang bug in __builtin_counted_by_ref(), so the
  conversion is preceded by disabling that for current versions of
  clang.  The imminent clang 22.1 release has the fix.

  I've done allmodconfig build tests for x86_64, arm64, i386, and arm. I
  did defconfig builds for alpha, m68k, mips, parisc, powerpc, riscv,
  s390, sparc, sh, arc, csky, xtensa, hexagon, and openrisc"

* tag 'kmalloc_obj-treewide-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  kmalloc_obj: Clean up after treewide replacements
  treewide: Replace kmalloc with kmalloc_obj for non-scalar types
  compiler_types: Disable __builtin_counted_by_ref for Clang

Merge tag 'perf-tools-for-v7.0-1-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools

Pull perf tools updates from Arnaldo Carvalho de Melo:

- Introduce 'perf sched stats' tool with record/report/diff workflows
   using schedstat counters

- Add a faster libdw based addr2line implementation and allow selecting
   it or its alternatives via 'perf config addr2line.style='

- Data-type profiling fixes and improvements including the ability to
   select fields using 'perf report''s -F/-fields, e.g.:

     'perf report --fields overhead,type'

- Add 'perf test' regression tests for Data-type profiling with C and
   Rust workloads

- Fix srcline printing with inlines in callchains, make sure this has
   coverage in 'perf test'

- Fix printing of leaf IP in LBR callchains

- Fix display of metrics without sufficient permission in 'perf stat'

- Print all machines in 'perf kvm report -vvv', not just the host

- Switch from SHA-1 to BLAKE2s for build ID generation, remove SHA-1
   code

- Fix 'perf report's histogram entry collapsing with '-F' option

- Use system's cacheline size instead of a hardcoded value in 'perf
   report'

- Allow filtering conversion by time range in 'perf data'

- Cover conversion to CTF using 'perf data' in 'perf test'

- Address newer glibc const-correctness (-Werror=discarded-qualifiers)
   issues

- Fixes and improvements for ARM's CoreSight support, simplify ARM SPE
   event config in 'perf mem', update docs for 'perf c2c' including the
   ARM events it can be used with

- Build support for generating metrics from arch specific python
   script, add extra AMD, Intel, ARM64 metrics using it

- Add AMD Zen 6 events and metrics

- Add JSON file with OpenHW Risc-V CVA6 hardware counters

- Add 'perf kvm' stats live testing

- Add more 'perf stat' tests to 'perf test'

- Fix segfault in `perf lock contention -b/--use-bpf`

- Fix various 'perf test' cases for s390

- Build system cleanups, bump minimum shellcheck version to 0.7.2

- Support building the capstone based annotation routines as a plugin

- Allow passing extra Clang flags via EXTRA_BPF_FLAGS

* tag 'perf-tools-for-v7.0-1-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: (255 commits)
  perf test script: Add python script testing support
  perf test script: Add perl script testing support
  perf script: Allow the generated script to be a path
  perf test: perf data --to-ctf testing
  perf test: Test pipe mode with data conversion --to-json
  perf json: Pipe mode --to-ctf support
  perf json: Pipe mode --to-json support
  perf check: Add libbabeltrace to the listed features
  perf build: Allow passing extra Clang flags via EXTRA_BPF_FLAGS
  perf test data_type_profiling.sh: Skip just the Rust tests if code_with_type workload is missing
  tools build: Fix feature test for rust compiler
  perf libunwind: Fix calls to thread__e_machine()
  perf stat: Add no-affinity flag
  perf evlist: Reduce affinity use and move into iterator, fix no affinity
  perf evlist: Missing TPEBS close in evlist__close()
  perf evlist: Special map propagation for tool events that read on 1 CPU
  perf stat-shadow: In prepare_metric fix guard on reading NULL perf_stat_evsel
  Revert "perf tool_pmu: More accurately set the cpus for tool events"
  tools build: Emit dependencies file for test-rust.bin
  tools build: Make test-rust.bin be removed by the 'clean' target
  ...

Merge tag 'cocci-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux

Pull coccinelle updates from Julia Lawall:
"This simplifies and clarifies the handling of output generated by
  Coccinelle that is sent to standard error.

  By default, this goes to /dev/null. Remind the user of that and
  encourage them to provide another file name (Benjamin Philip)"

* tag 'cocci-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jlawall/linux:
  Documentation: Coccinelle: document debug log handling
  scripts: coccicheck: warn on unset debug file
  scripts: coccicheck: simplify debug file handling

Merge tag 'ntb-7.0' of https://github.com/jonmason/ntb

Pull NTB (PCIe non-transparent bridge) updates from Jon Mason:
"NTB updates include debugfs improvements, correctness fixes, cleanups,
  and new hardware support:

  ntb_transport QP stats are converted to seq_file, a tx_memcpy_offload
  module parameter is introduced with associated ordering fixes, and a
  debugfs queue name truncation bug is corrected.

  Additional fixes address format specifier mismatches in ntb_tool and
  boundary conditions in the Switchtec driver, while unused MSI helpers
  are removed and the codebase migrates to dma_map_phys().

  Intel Gen6 (Diamond Rapids) NTB support is also added"

* tag 'ntb-7.0' of https://github.com/jonmason/ntb:
  NTB: ntb_transport: Use seq_file for QP stats debugfs
  NTB: ntb_transport: Fix too small buffer for debugfs_name
  ntb/ntb_tool: correct sscanf format for u64 and size_t in tool_peer_mw_trans_write
  ntb: intel: Add Intel Gen6 NTB support for DiamondRapids
  NTB/msi: Remove unused functions
  ntb: ntb_hw_switchtec: Increase MAX_MWS limit to 256
  ntb: ntb_hw_switchtec: Fix array-index-out-of-bounds access
  ntb: ntb_hw_switchtec: Fix shift-out-of-bounds for 0 mw lut
  NTB: epf: allow built-in build
  ntb: migrate to dma_map_phys instead of map_page
  NTB: ntb_transport: Add 'tx_memcpy_offload' module option
  NTB: ntb_transport: Remove unused 'retries' field from ntb_queue_entry

Merge tag 'io_uring-20260221' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

- A fix for a missing URING_CMD128 opcode check, fixing an issue with
   the SQE mixed mode support introduced in 6.19. Merged late due to
   having multiple dependencies

- Add sqe->cmd size checking for big SQEs, similar to what we have for
   normal sized SQEs

- Fix a race condition in zcrx, that leads to a double free

* tag 'io_uring-20260221' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring: Add size check for sqe->cmd
  io_uring: add IORING_OP_URING_CMD128 to opcode checks
  io_uring/zcrx: fix user_ref race between scrub and refill paths

Merge tag 'fixes-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock

Pull memblock fix from Mike Rapoport:
"Fix detection of NUMA node for CXL windows

  phys_to_target_node() may assign a CXL Fixed Memory Window to the
  wrong NUMA node when a CXL node resides in the gap of discontinuous
  System RAM node.

  Fix this by checking both numa_meminfo and numa_reserved_meminfo,
  preferring the reserved NID when the address appears in both"

* tag 'fixes-2026-02-21' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock:
  mm: numa_memblks: Identify the accurate NUMA ID of CFMW

Merge tag 'sched_ext-for-7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext

Pull sched_ext fixes from Tejun Heo:

- Various bug fixes for the example schedulers and selftests

* tag 'sched_ext-for-7.0-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
  tools/sched_ext: fix getopt not re-parsed on restart
  tools/sched_ext: scx_userland: fix data races on shared counters
  tools/sched_ext: scx_pair: fix stride == 0 crash on single-CPU systems
  tools/sched_ext: scx_central: fix CPU_SET and skeleton leak on early exit
  tools/sched_ext: scx_userland: fix stale data on restart
  tools/sched_ext: scx_flatcg: fix potential stack overflow from VLA in fcg_read_stats
  selftests/sched_ext: Fix rt_stall flaky failure
  tools/sched_ext: scx_userland: fix restart and stats thread lifecycle bugs
  tools/sched_ext: scx_central: fix sched_setaffinity() call with the set size
  tools/sched_ext: scx_flatcg: zero-initialize stats counter array

Merge tag 'v7.0-rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:
"Two small fixes:

   - fix potential deadlock

   - minor cleanup"

* tag 'v7.0-rc-part2-ksmbd-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: call ksmbd_vfs_kern_path_end_removing() on some error paths
  smb: server: Remove duplicate include of misc.h

Documentation: Coccinelle: document debug log handling

The current debug documentation does not mention that logs are printed
to stdout unless DEBUG_FILE is set. It also doesn't mention that
Coccinelle cannot overwrite debug files.

Document this behaviour in the examples and reference it in the
debugging section.

Signed-off-by: Benjamin Philip <benjamin.philip495@gmail.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>

scripts: coccicheck: warn on unset debug file

coccicheck prints debug logs to stdout unless a debug file has been set.
This makes it hard to read coccinelle's suggested changes, especially
for someone new to coccicheck.

From this commit, we warn about this behaviour from within the script on
an unset debug file. Explicitly setting the debug file to /dev/null
suppresses the warning while keeping the default.

Signed-off-by: Benjamin Philip <benjamin.philip495@gmail.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>

scripts: coccicheck: simplify debug file handling

This commit separates handling unset files and pre-existing files. It
also eliminates a duplicated check for unset files in run_cmd_parmap().

Signed-off-by: Benjamin Philip <benjamin.philip495@gmail.com>
Signed-off-by: Julia Lawall <julia.lawall@inria.fr>

kmalloc_obj: Clean up after treewide replacements

Coccinelle doesn't handle re-indenting line escapes. Fix the 2 places
where these got misaligned.

Remove 2 now-redundant type casts, found with:
$ git grep -P 'struct (\S+).*\)\s*k\S+alloc_(objs?|flex)\(struct \1'

Signed-off-by: Kees Cook <kees@kernel.org>

treewide: Replace kmalloc with kmalloc_obj for non-scalar types

This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:

Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)

Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)

Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)

(where TYPE may also be *VAR)

The resulting allocations no longer return "void *", instead returning
"TYPE *".

Signed-off-by: Kees Cook <kees@kernel.org>

compiler_types: Disable __builtin_counted_by_ref for Clang

Unfortunately, there is a corner case of __builtin_counted_by_ref()
usage that crashes[1] Clang since support was introduced in Clang 19.
Disable it prior to Clang 22. Found while tested kmalloc_obj treewide
refactoring (via kmalloc_flex() usage).

Link: https://github.com/llvm/llvm-project/issues/182575
Signed-off-by: Kees Cook <kees@kernel.org>

tools/sched_ext: fix getopt not re-parsed on restart

After goto restart, optind retains its advanced position from the
previous getopt loop, causing getopt() to immediately return -1.
This silently drops all command-line options on the restarted skeleton.

Reset optind to 1 at the restart label so options are re-parsed.

Affected schedulers: scx_simple, scx_central, scx_flatcg, scx_pair,
scx_sdt, scx_cpu0.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

tools/sched_ext: scx_userland: fix data races on shared counters

The stats thread reads nr_vruntime_enqueues, nr_vruntime_dispatches,
nr_vruntime_failed, and nr_curr_enqueued concurrently with the main
thread writing them, with no synchronization.

Use __atomic builtins with relaxed ordering for all accesses to these
counters to eliminate the data races.

Only display accuracy is affected, not scheduling correctness.

Signed-off-by: David Carlier <devnexen@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>

Merge tag 'spi-fix-v7.0-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi

Pull spi fixes from Mark Brown:
"There's a relatively large but ultimately simple fix for spidev here
  which addresses some ABBA races by simplifying down to just using a
  single lock, it's not clear to me that there was ever any benefit in
  having the two separate locks in the first place.

  We also have simple missing error check fix in in the wpcm-fiu driver"

* tag 'spi-fix-v7.0-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
  spi: spidev: fix lock inversion between spi_lock and buf_lock
  spi: wpcm-fiu: Fix potential NULL pointer dereference in wpcm_fiu_probe()

Merge tag 'regulator-fix-v7.0-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator

Pull regulator fixes from Mark Brown:
"A few driver specific fixes, plus a patch from Bjorn which removes a
  fixed limit on regulator names that was breaking some Qualcomm
  systems"

* tag 'regulator-fix-v7.0-merge-window' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
  regulator: s2mps11: fix pctrlsel macro usage in s2mpg10_of_parse_cb()
  regulator: s2mps11: drop redundant sanity checks in s2mpg10_of_parse_cb()
  regulator: core: Remove regulator supply_name length limit
  regulator: mt6363: Fix interrmittent timeout

Merge tag 'pci-v7.0-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

- Fix bridge window selection bug that prevented resource assignment
   (Kai-Heng Feng)

- Fix bridge window sizing, which failed to assign resources for
   windows containing only optional resources (ROMs, SR-IOV BARs, etc)
   (Ilpo Järvinen)

- Select CONFIGFS_FS when PCI_EPF_TEST is enabled to avoid a link error
   (Arnd Bergmann)

- Fix recently merged Endpoint inbound submapping feature (Koichiro
   Den)

* tag 'pci-v7.0-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI: dwc: ep: Always clear IB maps on BAR update
  PCI: dwc: ep: Return after clearing BAR-match inbound mapping
  PCI: endpoint: pci-epf-test: Select configfs
  PCI: Account fully optional bridge windows correctly
  PCI: Validate window resource type in pbus_select_window_for_type()

Merge tag 'dmi-for-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging

Pull dmi update from Jean Delvare:

- include product_family info in dmi-id modalias

* tag 'dmi-for-v7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
firmware/dmi: Include product_family info to modalias

Merge tag 'gpio-fixes-for-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux

Pull gpio fixes from Bartosz Golaszewski:

- add a missing IS_ERR() check in gpio-nomadik

- fix a NULL-pointer dereference in GPIO character device code

- restore label matching in swnode-lookup due to reported regressions
   in existing users (this will get removed again once we audit and
   update all drivers)

- fix remove path in GPIO sysfs code

- normalize the return value of gpio_chip::get() in gpio-amd-fch

* tag 'gpio-fixes-for-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux:
  gpio: amd-fch: ionly return allowed values from amd_fch_gpio_get()
  gpio: sysfs: fix chip removal with GPIOs exported over sysfs
  gpio: swnode: restore the swnode-name-against-chip-label matching
  gpio: cdev: Avoid NULL dereference in linehandle_create()
  gpio: nomadik: Add missing IS_ERR() check

Merge tag 'i2c-for-7.0-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull more i2c updates from Wolfram Sang:
"Designware:
   - refactor the transfer path to support I2C_M_STOP
   - handle pm runtime by using the active auto try macros
   - handle controllers lacking explicit START and STOP conditions
   - general cleanups

  Other i2c drivers:
   - qualcomm: add support for qcs8300-cci
   - amd8111: general cleanups
   - cp2112: add DT bindings"

* tag 'i2c-for-7.0-part2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  dt-bindings: i2c: Add CP2112 HID USB to SMBus Bridge
  i2c: amd8111: switch to devm_ functions
  i2c: amd8111: Remove spaces in MODULE_* macros
  i2c: designware-platdrv: fix cleanup on probe failure
  i2c: designware-platdrv: simplify reset control
  dt-bindings: i2c: qcom-cci: Document qcs8300 compatible
  i2c: designware: Remove dead code in AMD ISP case
  i2c: designware: Support of controller with IC_EMPTYFIFO_HOLD_MASTER disabled
  i2c: designware: Use runtime PM macro for auto-cleanup
  i2c: designware: Implement I2C_M_STOP support

Merge tag 'sound-fix-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"Here are a bunch of updates, but there should be no big surprises;
  mostly device-specific quirks and fix-ups or non-code changes:

   - Quirks for ASoC AMD, HD-audio and USB-audio

   - Fixes in ASoC fsl, rockchip, renesas, aw codecs

   - Fixes for USB-audio packet handling in the implicit feedback mode

   - Updates of SPDX license IDs in some files"

* tag 'sound-fix-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (28 commits)
  ASoC: rockchip: i2s-tdm: Use param rate if not provided by set_sysclk
  ALSA: hda/hdmi: Add quirk for TUXEDO IBS14G6
  ASoC: dt-bindings: asahi-kasei,ak5558: Fix the supply names
  ASoC: dt-bindings: asahi-kasei,ak4458: Fix the supply names
  ASoC: dt-bindings: asahi-kasei,ak4458: set unevaluatedProperties:false
  ASoC: amd: amd_sdw: add machine driver quirk for Lenovo models
  ASoC: amd: acp: Add ACP7.0 match entries for Realtek parts
  ALSA: echoaudio: Add SPDX ids to some files
  ALSA: isa: Add SPDX id lines to some files
  ALSA: core: Add SPDX license id to files
  ASoC: tas2783A: add explicit port prepare handling
  ASoC: renesas: rz-ssi: Fix playback and capture
  ALSA: hda/realtek: Fix headset mic on ASUS Zenbook 14 UX3405MA
  ALSA: hda/conexant: Fix headphone jack handling on Acer Swift SF314
  ASoC: qcom: sm8250: Add quinary MI2S support
  ASoC: amd: yc: Add DMI quirk for ASUS Vivobook Pro 15X M6501RR
  ALSA: usb-audio: Avoid potentially repeated XRUN error messages
  ALSA: usb-audio: Add sanity check for OOB writes at silencing
  ALSA: usb-audio: Optimize the copy of packet sizes for implicit fb handling
  ALSA: usb-audio: Update the number of packets properly at receiving
  ...

Merge tag 'drm-next-2026-02-21' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"This is the fixes and cleanups for the end of the merge window, it's
  nearly all amdgpu, with some amdkfd, then a pagemap core fix, i915/xe
  display fixes, and some xe driver fixes.

  Nothing seems out of the ordinary, except amdgpu is a little more
  volume than usual.

  pagemap:
   - drm/pagemap: pass pagemap_addr by reference

  amdgpu:
   - DML 2.1 fixes
   - Panel replay fixes
   - Display writeback fixes
   - MES 11 old firmware compat fix
   - DC CRC improvements
   - DPIA fixes
   - XGMI fixes
   - ASPM fix
   - SMU feature bit handling fixes
   - DC LUT fixes
   - RAS fixes
   - Misc memory leak in error path fixes
   - SDMA queue reset fixes
   - PG handling fixes
   - 5 level GPUVM page table fix
   - SR-IOV fix
   - Queue reset fix
   - SMU 13.x fixes
   - DC resume lag fix
   - MPO fixes
   - DCN 3.6 fix
   - VSDB fixes
   - HWSS clean up
   - Replay fixes
   - DCE cursor fixes
   - DCN 3.5 SR DDR5 latency fixes
   - HPD fixes
   - Error path unwind fixes
   - SMU13/14 mode1 reset fixes
   - PSP 15 updates
   - SMU 15 updates
   - Sync fix in amdgpu_dma_buf_move_notify()
   - HAINAN fix
   - PSP 13.x fix
   - GPUVM locking fix
   - Fixes for DC analog support
   - DC FAMS fixes
   - DML 2.1 fixes
   - eDP fixes
   - Misc DC fixes
   - Fastboot fix
   - 3DLUT fixes
   - GPUVM fixes
   - 64bpp format fix
   - Fix for MacBooks with switchable gfx

  amdkfd:
   - Fix possible double deletion of validate list
   - Event setup fix
   - Device disconnect regression fix
   - APU GTT as VRAM fix
   - Fix piority inversion with MQDs
   - NULL check fix

  radeon:
   - HAINAN fix

  i915/xe display:
   - Regresion fix for HDR 4k displays (#15503)
   - Fixup for Dell XPS 13 7390 eDP rate limit
   - Memory leak fix on ACPI _DSM handling
   - Add missing slice count check during DP mode validation

  xe:
   - drm/xe: Prevent VFs from exposing the CCS mode sysfs file
   - SRIOV related fixes
   - PAT cache fix
   - MMIO read fix
   - W/a fixes
   - Adjust type of xe_modparam.force_vram_bar_size
   - Wedge mode fix
   - HWMon fix

* tag 'drm-next-2026-02-21' of https://gitlab.freedesktop.org/drm/kernel: (143 commits)
  drm/amd/display: Remove unneeded DAC link encoder register
  drm/amd/display: Enable DAC in DCE link encoder
  drm/amd/display: Set CRTC source for DAC using registers
  drm/amd/display: Initialize DAC in DCE link encoder using VBIOS
  drm/amd/display: Turn off DAC in DCE link encoder using VBIOS
  drm/amd/display: Don't call find_analog_engine() twice
  drm/amdgpu: fix 4-level paging if GMC supports 57-bit VA v2
  drm/amdgpu: keep vga memory on MacBooks with switchable graphics
  drm/amdgpu: Set atomics to true for xgmi
  drm/amdkfd: Check for NULL return values
  drm/amd/display: Use same max plane scaling limits for all 64 bpp formats
  drm/amdgpu: Set vmid0 PAGE_TABLE_DEPTH for GFX12.1
  drm/amdkfd: Disable MQD queue priority
  drm/amd/display: Remove conditional for shaper 3DLUT power-on
  drm/amd/display: Check return of shaper curve to HW format
  drm/amd/display: Correct logic check error for fastboot
  drm/amd/display: Skip eDP detection when no sink
  Revert "drm/amd/display: Add Gfx Base Case For Linear Tiling Handling"
  Revert "drm/amd/display: Correct hubp GfxVersion verification"
  Revert "drm/amd/display: Add Handling for gfxversion DcGfxBase"
  ...

Merge tag 'fbdev-for-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev

Pull more fbdev updates from Helge Deller:
"Code cleanups for the au1100fb fbdev driver (Uwe Kleine-König)"

* tag 'fbdev-for-7.0-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/linux-fbdev:
  fbdev: au1100fb: Replace license boilerplate by SPDX header
  fbdev: au1100fb: Fold au1100fb.h into its only user
  fbdev: au1100fb: Replace custom printk wrappers by pr_*
  fbdev: au1100fb: Make driver compilable on non-mips platforms
  fbdev: au1100fb: Use proper conversion specifiers in printk formats
  fbdev: au1100fb: Mark several local functions as static
  fbdev: au1100fb: Don't store device specific data in global variables

Merge tag 'trace-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace

Pull tracing fixes from Steven Rostedt:

- Fix possible dereference of uninitialized pointer

   When validating the persistent ring buffer on boot up, if the first
   validation fails, a reference to "head_page" is performed in the
   error path, but it skips over the initialization of that variable.
   Move the initialization before the first validation check.

- Fix use of event length in validation of persistent ring buffer

   On boot up, the persistent ring buffer is checked to see if it is
   valid by several methods. One being to walk all the events in the
   memory location to make sure they are all valid. The length of the
   event is used to move to the next event. This length is determined by
   the data in the buffer. If that length is corrupted, it could
   possibly make the next event to check located at a bad memory
   location.

   Validate the length field of the event when doing the event walk.

- Fix function graph on archs that do not support use of ftrace_ops

   When an architecture defines HAVE_DYNAMIC_FTRACE_WITH_ARGS, it means
   that its function graph tracer uses the ftrace_ops of the function
   tracer to call its callbacks. This allows a single registered
   callback to be called directly instead of checking the callback's
   meta data's hash entries against the function being traced.

   For architectures that do not support this feature, it must always
   call the loop function that tests each registered callback (even if
   there's only one). The loop function tests each callback's meta data
   against its hash of functions and will call its callback if the
   function being traced is in its hash map.

   The issue was that there was no check against this and the direct
   function was being called even if the architecture didn't support it.
   This meant that if function tracing was enabled at the same time as a
   callback was registered with the function graph tracer, its callback
   would be called for every function that the function tracer also
   traced, even if the callback's meta data only wanted to be called
   back for a small subset of functions.

   Prevent the direct calling for those architectures that do not
   support it.

- Fix references to trace_event_file for hist files

   The hist files used event_file_data() to get a reference to the
   associated trace_event_file the histogram was attached to. This would
   return a pointer even if the trace_event_file is about to be freed
   (via RCU). Instead it should use the event_file_file() helper that
   returns NULL if the trace_event_file is marked to be freed so that no
   new references are added to it.

- Wake up hist poll readers when an event is being freed

   When polling on a hist file, the task is only awoken when a hist
   trigger is triggered. This means that if an event is being freed
   while there's a task waiting on its hist file, it will need to wait
   until the hist trigger occurs to wake it up and allow the freeing to
   happen. Note, the event will not be completely freed until all
   references are removed, and a hist poller keeps a reference. But it
   should still be woken when the event is being freed.

* tag 'trace-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Wake up poll waiters for hist files when removing an event
  tracing: Fix checking of freed trace_event_file for hist files
  fgraph: Do not call handlers direct when not using ftrace_ops
  tracing: ring-buffer: Fix to check event length before using
  ring-buffer: Fix possible dereference of uninitialized pointer

Merge tag 'for-7.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- multiple error handling fixes of unexpected conditions

- reset block group size class once it becomes empty so that
   its class can be changed

- error message level adjustments

- fixes of returned error values

- use correct block reserve for delayed refs

* tag 'for-7.0-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: fix invalid leaf access in btrfs_quota_enable() if ref key not found
  btrfs: fix lost error return in btrfs_find_orphan_roots()
  btrfs: fix lost return value on error in finish_verity()
  btrfs: change unaligned root messages to error level in btrfs_validate_super()
  btrfs: use the correct type to initialize block reserve for delayed refs
  btrfs: do not ASSERT() when the fs flips RO inside btrfs_repair_io_failure()
  btrfs: reset block group size class when it becomes empty
  btrfs: replace BUG() with error handling in __btrfs_balance()
  btrfs: handle unexpected exact match in btrfs_set_inode_index_count()

Merge tag 'ecryptfs-7.0-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs

Pull ecryptfs updates from Tyler Hicks:
"This consists of some really minor typo fixes that fell through the
  cracks and some more recent code cleanups:

   - Comment typo fixes

   - Removal of an unused function declaration

   - Use strscpy() instead of the deprecated strcpy()

   - Use string copying helpers instead of memcpy() and manually
     terminating strings"

* tag 'ecryptfs-7.0-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
  ecryptfs: Replace memcpy + NUL termination in ecryptfs_copy_filename
  ecryptfs: Drop redundant NUL terminations after calling ecryptfs_to_hex
  ecryptfs: Replace memcpy + NUL termination in ecryptfs_new_file_context
  ecryptfs: Replace strcpy with strscpy in ecryptfs_validate_options
  ecryptfs: Replace strcpy with strscpy in ecryptfs_cipher_code_to_string
  ecryptfs: Replace strcpy with strscpy in ecryptfs_set_default_crypt_stat_vals
  ecryptfs: simplify list initialization in ecryptfs_parse_packet_set()
  ecryptfs: Remove unused declartion ecryptfs_fill_zeros()
  ecryptfs: Fix packet format comment in parse_tag_67_packet()
  ecryptfs: comment typo fix
  ecryptfs: keystore: Fix typo 'the the' in comment

NTB: ntb_transport: Use seq_file for QP stats debugfs

The ./qp*/stats debugfs file for each NTB transport QP is currently
implemented with a hand-crafted kmalloc() buffer and a series of
scnprintf() calls. This is a pre-seq_file style pattern and makes future
extensions easy to truncate.

Convert the stats file to use the seq_file helpers via
DEFINE_SHOW_ATTRIBUTE(), which simplifies the code and lets the seq_file
core handle buffering and partial reads.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

NTB: ntb_transport: Fix too small buffer for debugfs_name

The buffer used for "qp%d" was only 4 bytes, which truncates names like
"qp10" to "qp1" and causes multiple queues to share the same directory.

Enlarge the buffer and use sizeof() to avoid truncation.

Fixes: fce8a7bb5b4b ("PCI-Express Non-Transparent Bridge Support")
Cc: <stable@vger.kernel.org> # v3.9+
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

ntb/ntb_tool: correct sscanf format for u64 and size_t in tool_peer_mw_trans_write

The sscanf() call in tool_peer_mw_trans_write() uses "%lli:%zi" to parse
user input into 'u64 addr' and 'size_t wsize'. This is incorrect:

- "%lli" expects a signed long long *, but 'addr' is u64 (unsigned).
   Input like "0x8000000000000000" is misinterpreted as negative,
   leading to corrupted address values.

- "%zi" expects a signed ssize_t *, but 'wsize' is size_t (unsigned).
   Input of "-1" is successfully parsed and stored as SIZE_MAX
   (e.g., 0xFFFFFFFFFFFFFFFF), which may cause buffer overflows
   or infinite loops in subsequent memory operations.

Fix by using format specifiers that match the actual variable types:
- "%llu" for u64 (supports hex/decimal, standard for kernel u64 parsing)
- "%zu" for size_t (standard and safe; rejects negative input)

Signed-off-by: yangqixiao <yangqixiao@inspur.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

ntb: intel: Add Intel Gen6 NTB support for DiamondRapids

Add DiamondRapids NTB support by adding the DID and adjust the changed
PPD0 offset.

Signed-off-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

NTB/msi: Remove unused functions

ntbm_msi_free_irq() and ntb_msi_peer_addr() were both added in 2019's
commit 26b3a37b9284 ("NTB: Introduce MSI library")
but have remained unused.

Remove them, and the ntbm_msi_callback_match() helper that
was used by ntbm_msi_free_irq().

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

ntb: ntb_hw_switchtec: Increase MAX_MWS limit to 256

Microchip NTB switchtec devices supports up to 512 LUT's across all
NT partitions. This patch enable symmetric NTB configuration to utilize
all 512 memory windows across 2 peers partitions.

Signed-off-by: Maciej Grochowski <Maciej.Grochowski@sony.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

ntb: ntb_hw_switchtec: Fix array-index-out-of-bounds access

Number of MW LUTs depends on NTB configuration and can be set to MAX_MWS,
This patch protects against invalid index out of bounds access to mw_sizes
When invalid access print message to user that configuration is not valid.

Signed-off-by: Maciej Grochowski <Maciej.Grochowski@sony.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

ntb: ntb_hw_switchtec: Fix shift-out-of-bounds for 0 mw lut

Number of MW LUTs depends on NTB configuration and can be set to zero,
in such scenario rounddown_pow_of_two will cause undefined behaviour and
should not be performed.
This patch ensures that rounddown_pow_of_two is called on valid value.

Signed-off-by: Maciej Grochowski <Maciej.Grochowski@sony.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

NTB: epf: allow built-in build

ntb_hw_epf works just as well when built into the kernel image. Don't
force module build.

Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

ntb: migrate to dma_map_phys instead of map_page

After introduction of dma_map_phys(), there is no need to convert
from physical address to struct page in order to map page. So let's
use it directly.

Signed-off-by: Chu Guangqing <chuguangqing@inspur.com>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

NTB: ntb_transport: Add 'tx_memcpy_offload' module option

Some platforms (e.g. R-Car S4) do not gain from using a DMAC on TX path
in ntb_transport and end up CPU-bound on memcpy_toio(). Add a module
parameter 'tx_memcpy_offload' that moves the TX memcpy_toio() and
descriptor writes to a per-QP kernel thread. It is disabled by default.

This change also fixes a rare ordering hazard in ntb_tx_copy_callback(),
that was observed on R-Car S4 once throughput improved with the new
module parameter: the DONE flag write to the peer MW, which is WC
mapped, could be observed after the DB/MSI trigger. Both operations are
posted PCIe MWr (often via different OB iATUs), so WC buffering and
bridges may reorder visibility. Insert dma_mb() to enforce store->load
ordering and then read back hdr->flags to flush the posted write before
ringing the doorbell / issuing MSI.

While at it, update tx_index with WRITE_ONCE() at the earlier possible
location to make ntb_transport_tx_free_entry() robust.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

NTB: ntb_transport: Remove unused 'retries' field from ntb_queue_entry

Drop the unused field 'retries' from struct ntb_queue_entry for
simplicity's sake.

Signed-off-by: Koichiro Den <den@valinux.co.jp>
Signed-off-by: Jon Mason <jdmason@kudzu.us>

rtc: ds1390: fix number of bytes read from RTC

The spi_write_then_read() reads 8 bytes starting from
DS1390_REG_SECONDS (== 0x01), so the last byte read would already
be part of the alarm (Tenths and Hundredths of Seconds) feature.

However 7 bytes are engouh -- seconds (0x01), minutes (0x02), hours (0x03),
day (0x04), date (0x05), month/century (0x06) and year (0x07).

Signed-off-by: Andreas Gabriel-Platschek <andi.platschek@gmail.com>
Link: https://patch.msgid.link/20260209053439.313825-1-andi.platschek@gmail.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

rtc: class: Remove duplicate check for alarm

In __devm_rtc_register_device(), the callee rtc_initialize_alarm()
will check the alarm, there is no need to check in advance,
so remove it.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://patch.msgid.link/20260122090031.3871746-1-ruanjinjie@huawei.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

rtc: optee: simplify OP-TEE context match

Simplify the TEE implementor ID match by returning the boolean
expression directly instead of going through an if/else.

Signed-off-by: Rouven Czerwinski <rouven.czerwinski@linaro.org>
Link: https://patch.msgid.link/20260126-optee-simplify-context-match-v1-3-d4104e526cb6@linaro.org
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

Merge tag 'apparmor-pr-2026-02-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor

Pull AppArmor updates from John Johansen:
"Features:
   - add .kunitconfig
   - audit execpath in userns mediation
   - add support loading per permission tagging

  Cleanups:
   - remove unused percpu critical sections in buffer management
   - document the buffer hold, add an overflow guard
   - split xxx_in_ns into its two separate semantic use cases
   - remove apply_modes_to_perms from label_match
   - refactor/cleanup cred helper fns.
   - guard against free attachment/data routines being called with NULL
   - drop in_atomic flag in common_mmap, common_file_perm, and cleanup
   - make str table more generic and be able to have multiple entries
   - Replace deprecated strcpy with memcpy in gen_symlink_name
   - Replace deprecated strcpy in d_namespace_path
   - Replace sprintf/strcpy with scnprintf/strscpy in aa_policy_init
   - replace sprintf with snprintf in aa_new_learning_profile

  Bug Fixes:
   - fix cast in format string DEBUG statement
   - fix make aa_labelmatch return consistent
   - fix fmt string type error in process_strs_entry
   - fix kernel-doc comments for inview
   - fix invalid deref of rawdata when export_binary is unset
   - avoid per-cpu hold underflow in aa_get_buffer
   - fix fast path cache check for unix sockets
   - fix rlimit for posix cpu timers
   - fix label and profile debug macros
   - move check for aa_null file to cover all cases
   - return -ENOMEM in unpack_perms_table upon alloc failure
   - fix boolean argument in apparmor_mmap_file
   - Fix & Optimize table creation from possibly unaligned memory
   - Allow apparmor to handle unaligned dfa tables
   - fix NULL deref in aa_sock_file_perm
   - fix NULL pointer dereference in __unix_needs_revalidation
   - fix signedness bug in unpack_tags()"

* tag 'apparmor-pr-2026-02-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jj/linux-apparmor: (34 commits)
  apparmor: fix signedness bug in unpack_tags()
  apparmor: fix cast in format string DEBUG statement
  apparmor: fix aa_label to return state from compount and component match
  apparmor: fix fmt string type error in process_strs_entry
  apparmor: fix kernel-doc comments for inview
  apparmor: fix invalid deref of rawdata when export_binary is unset
  apparmor: add .kunitconfig
  apparmor: cleanup remove unused percpu critical sections in buffer management
  apparmor: document the buffer hold, add an overflow guard
  apparmor: avoid per-cpu hold underflow in aa_get_buffer
  apparmor: split xxx_in_ns into its two separate semantic use cases
  apparmor: make label_match return a consistent value
  apparmor: remove apply_modes_to_perms from label_match
  apparmor: fix fast path cache check for unix sockets
  apparmor: fix rlimit for posix cpu timers
  apparmor: refactor/cleanup cred helper fns.
  apparmor: fix label and profile debug macros
  apparmor: move check for aa_null file to cover all cases
  apparmor: guard against free routines being called with a NULL
  apparmor: return -ENOMEM in unpack_perms_table upon alloc failure
  ...

rtc: interface: Alarm race handling should not discard preceding error

Commit 795cda8338ea ("rtc: interface: Fix long-standing race when setting
alarm") should not discard any errors from the preceding validations.

Prior to that commit, if the alarm feature was disabled, or the
set_alarm failed, a meaningful error code would be returned to the
caller for further action.

After, more often than not, the __rtc_read_time will cause a success
return code instead, misleading the caller.

An example of this is when timer_enqueue is called for a rtc-abx080x
device. Since that driver does not clear the alarm feature bit, but
instead relies on the set_alarm operation to return invalid, the discard
of the return code causes very different behaviour; i.e.
hwclock: select() to /dev/rtc0 to wait for clock tick timed out

Fixes: 795cda8338ea ("rtc: interface: Fix long-standing race when setting alarm")
Signed-off-by: Anthony Pighin (Nokia) <anthony.pighin@nokia.com>
Reviewed-by: Esben Haabendal <esben@geanix.com>
Tested-by: Nick Bowler <nbowler@draconx.ca>
Link: https://patch.msgid.link/BN0PR08MB6951415A751F236375A2945683D1A@BN0PR08MB6951.namprd08.prod.outlook.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>

Merge tag 'kmalloc_obj-prep-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux

Pull kmalloc_obj prep from Kees Cook:
"Fixes for return types to prepare for the kmalloc_obj treewide
  conversion, that haven't yet appeared during the merge window:
  dm-crypt, dm-zoned, drm/msm, and arm64 kvm"

* tag 'kmalloc_obj-prep-v7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
  KVM: arm64: vgic: Handle const qualifier from gic_kvm_info allocation type
  drm/msm: Adjust msm_iommu_pagetable_prealloc_allocate() allocation type
  dm: dm-zoned: Adjust dmz_load_mapping() allocation type
  dm-crypt: Adjust crypt_alloc_tfms_aead() allocation type