git.ipfire.org Git - thirdparty/linux.git/log

]> git.ipfire.org Git - thirdparty/linux.git/log

projects / thirdparty / linux.git / log

summary | shortlog | log | commit | commitdiff | tree
first ⋅ prev ⋅ next

commit | commitdiff | tree

Kumar Kartikeya Dwivedi [Wed, 29 Oct 2025 18:18:27 +0000 (18:18 +0000)]

rqspinlock: Disable queue destruction for deadlocks

Disable propagation and unwinding of the waiter queue in case the head
waiter detects a deadlock condition, but keep it enabled in case of the
timeout fallback.

Currently, when the head waiter experiences an AA deadlock, it will
signal all its successors in the queue to exit with an error. This is
not ideal for cases where the same lock is held in contexts which can
cause errors in an unrestricted fashion (e.g., BPF programs, or kernel
paths invoked through BPF programs), and core kernel logic which is
written in a correct fashion and does not expect deadlocks.

The same reasoning can be extended to ABBA situations. Depending on the
actual runtime schedule, one or both of the head waiters involved in an
ABBA situation can detect and exit directly without terminating their
waiter queue. If the ABBA situation manifests again, the waiters will
keep exiting until progress can be made, or a timeout is triggered in
case of more complicated locking dependencies.

We still preserve the queue destruction in case of timeouts, as either
the locking dependencies are too complex to be captured by AA and ABBA
heuristics, or the owner is perpetually stuck. As such, it would be
unwise to continue to apply the timeout for each new head waiter without
terminating the queue, since we may end up waiting for more than 250 ms
in aggregate with all participants in the locking transaction.

The patch itself is fairly simple; we can simply signal our successor to
become the next head waiter, and leave the queue without attempting to
acquire the lock.

With this change, the behavior for waiters in case of deadlocks
experienced by a predecessor changes. It is guaranteed that call sites
will no longer receive errors if the predecessors encounter deadlocks
and the successors do not participate in one. This should lower the
failure rate for waiters that are not doing improper locking opreations,
just because they were unlucky to queue behind a misbehaving waiter.
However, timeouts are still a possibility, hence they must be accounted
for, so users cannot rely upon errors not occuring at all.

Suggested-by: Amery Hung <ameryhung@gmail.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/r/20251029181828.231529-2-memxor@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Wed, 29 Oct 2025 19:59:07 +0000 (19:59 +0000)]

selftests/bpf: Fix intermittent failures in file_reader test

file_reader/on_open_expect_fault intermittently fails when test_progs
runs tests in parallel, because it expects a page fault on first read.
Another file_reader test running concurrently may have already pulled
the same pages into the page cache, eliminating the fault and causing a
spurious failure.

Make file_reader/on_open_expect_fault read from a file region that does
not overlap with other file_reader tests, so the initial access still
faults even under parallel execution.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev>
Link: https://lore.kernel.org/r/20251029195907.858217-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Martin KaFai Lau [Wed, 29 Oct 2025 00:38:43 +0000 (17:38 -0700)]

Merge branch 'selftests-bpf-convert-test_tc_tunnel-sh-to-test_progs'

Alexis Lothoré says:

====================
Hello,
this is the v3 of test_tc_tunnel conversion into test_progs framework.
This new revision:
- fixes a few issues spotted by the bot reviewer
- removes any test ensuring connection failure (and so depending on a
  timout) to keep the execution time reasonable

test_tc_tunnel.sh tests a variety of tunnels based on BPF: packets are
encapsulated by a BPF program on the client egress. We then check that
those packets can be decapsulated on server ingress side, either thanks
to kernel-based or BPF-based decapsulation. Those tests are run thanks
to two veths in two dedicated namespaces.

- patches 1 and 2 are preparatory patches
- patch 3 introduce tc_tunnel test into test_progs
- patch 4 gets rid of the test_tc_tunnel.sh script

The new test has been executed both in some x86 local qemu machine, as
well as in CI:

  # ./test_progs -a tc_tunnel
  #454/1   tc_tunnel/ipip_none:OK
  #454/2   tc_tunnel/ipip6_none:OK
  #454/3   tc_tunnel/ip6tnl_none:OK
  #454/4   tc_tunnel/sit_none:OK
  #454/5   tc_tunnel/vxlan_eth:OK
  #454/6   tc_tunnel/ip6vxlan_eth:OK
  #454/7   tc_tunnel/gre_none:OK
  #454/8   tc_tunnel/gre_eth:OK
  #454/9   tc_tunnel/gre_mpls:OK
  #454/10  tc_tunnel/ip6gre_none:OK
  #454/11  tc_tunnel/ip6gre_eth:OK
  #454/12  tc_tunnel/ip6gre_mpls:OK
  #454/13  tc_tunnel/udp_none:OK
  #454/14  tc_tunnel/udp_eth:OK
  #454/15  tc_tunnel/udp_mpls:OK
  #454/16  tc_tunnel/ip6udp_none:OK
  #454/17  tc_tunnel/ip6udp_eth:OK
  #454/18  tc_tunnel/ip6udp_mpls:OK
  #454     tc_tunnel:OK
  Summary: 1/18 PASSED, 0 SKIPPED, 0 FAILED
====================

Link: https://patch.msgid.link/20251027-tc_tunnel-v3-0-505c12019f9d@bootlin.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>

commit | commitdiff | tree

Alexis Lothoré (eBPF Foundation) [Mon, 27 Oct 2025 14:51:56 +0000 (15:51 +0100)]

selftests/bpf: Remove test_tc_tunnel.sh

Now that test_tc_tunnel.sh scope has been ported to the test_progs
framework, remove it.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251027-tc_tunnel-v3-4-505c12019f9d@bootlin.com

commit | commitdiff | tree

Alexis Lothoré (eBPF Foundation) [Mon, 27 Oct 2025 14:51:55 +0000 (15:51 +0100)]

selftests/bpf: Integrate test_tc_tunnel.sh tests into test_progs

The test_tc_tunnel.sh script checks that a large variety of tunneling
mechanisms handled by the kernel can be handled as well by eBPF
programs. While this test shares similarities with test_tunnel.c (which
is already integrated in test_progs), those are testing slightly
different things:
- test_tunnel.c creates a tunnel interface, and then get and set tunnel
  keys in packet metadata, from BPF programs.
- test_tc_tunnels.sh manually parses/crafts packets content

Bring the tests covered by test_tc_tunnel.sh into the test_progs
framework, by creating a dedicated test_tc_tunnel.sh. This new test
defines a "generic" runner which, for each test configuration:
- will configure the relevant veth pair, each of those isolated in a
  dedicated namespace
- will check that traffic will fail if there is only an encapsulating
  program attached to one veth egress
- will check that traffic succeed if we enable some decapsulation module
  on kernel side
- will check that traffic still succeeds if we replace the kernel
  decapsulation with some eBPF ingress decapsulation.

Example of the new test execution:

  # ./test_progs -a tc_tunnel
  #447/1   tc_tunnel/ipip_none:OK
  #447/2   tc_tunnel/ipip6_none:OK
  #447/3   tc_tunnel/ip6tnl_none:OK
  #447/4   tc_tunnel/sit_none:OK
  #447/5   tc_tunnel/vxlan_eth:OK
  #447/6   tc_tunnel/ip6vxlan_eth:OK
  #447/7   tc_tunnel/gre_none:OK
  #447/8   tc_tunnel/gre_eth:OK
  #447/9   tc_tunnel/gre_mpls:OK
  #447/10  tc_tunnel/ip6gre_none:OK
  #447/11  tc_tunnel/ip6gre_eth:OK
  #447/12  tc_tunnel/ip6gre_mpls:OK
  #447/13  tc_tunnel/udp_none:OK
  #447/14  tc_tunnel/udp_eth:OK
  #447/15  tc_tunnel/udp_mpls:OK
  #447/16  tc_tunnel/ip6udp_none:OK
  #447/17  tc_tunnel/ip6udp_eth:OK
  #447/18  tc_tunnel/ip6udp_mpls:OK
  #447     tc_tunnel:OK
  Summary: 1/18 PASSED, 0 SKIPPED, 0 FAILED

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251027-tc_tunnel-v3-3-505c12019f9d@bootlin.com

commit | commitdiff | tree

Alexis Lothoré (eBPF Foundation) [Mon, 27 Oct 2025 14:51:54 +0000 (15:51 +0100)]

selftests/bpf: Make test_tc_tunnel.bpf.c compatible with big endian platforms

When trying to run bpf-based encapsulation in a s390x environment, some
parts of test_tc_tunnel.bpf.o do not encapsulate correctly the traffic,
leading to tests failures. Adding some logs shows for example that
packets about to be sent on an interface with the ip6vxlan_eth program
attached do not have the expected value 5 in the ip header ihl field,
and so are ignored by the program.

This phenomenon appears when trying to cross-compile the selftests,
rather than compiling it from a virtualized host: the selftests build
system may then wrongly pick some host headers. If <asm/byteorder.h>
ends up being picked on the host (and if the host has a endianness
different from the target one), it will then expose wrong endianness
defines (e.g __LITTLE_ENDIAN_BITFIELD instead of __BIT_ENDIAN_BITFIELD),
and it will for example mess up the iphdr structure layout used in the
ebpf program.

To prevent this, directly use the vmlinux.h header generated by the
selftests build system rather than including directly specific kernel
headers. As a consequence, add some missing definitions that are not
exposed by vmlinux.h, and adapt the bitfield manipulations to allow
building and using the program on both types of platforms.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251027-tc_tunnel-v3-2-505c12019f9d@bootlin.com

commit | commitdiff | tree

Alexis Lothoré (eBPF Foundation) [Mon, 27 Oct 2025 14:51:53 +0000 (15:51 +0100)]

selftests/bpf: Add tc helpers

The test_tunnel.c file defines small fonctions to easily attach eBPF
programs to tc hooks, either on egress, ingress or both.

Create a shared helper in network_helpers.c so that other tests can
benefit from it.

Signed-off-by: Alexis Lothoré (eBPF Foundation) <alexis.lothore@bootlin.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251027-tc_tunnel-v3-1-505c12019f9d@bootlin.com

commit | commitdiff | tree

Jianyun Gao [Mon, 27 Oct 2025 03:20:08 +0000 (11:20 +0800)]

libbpf: Fix the incorrect reference to the memlock_rlim variable in the comment.

The variable "memlock_rlim_max" referenced in the comment does not exist.
I think that the author probably meant the variable "memlock_rlim". So,
correct it.

Signed-off-by: Jianyun Gao <jianyungao89@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251027032008.738944-1-jianyungao89@gmail.com

commit | commitdiff | tree

Jianyun Gao [Fri, 24 Oct 2025 08:08:02 +0000 (16:08 +0800)]

libbpf: Optimize the redundant code in the bpf_object__init_user_btf_maps() function.

In the elf_sec_data() function, the input parameter 'scn' will be
evaluated. If it is NULL, then it will directly return NULL. Therefore,
the return value of the elf_sec_data() function already takes into
account the case where the input parameter scn is NULL. Therefore,
subsequently, the code only needs to check whether the return value of
the elf_sec_data() function is NULL.

Signed-off-by: Jianyun Gao <jianyungao89@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/bpf/20251024080802.642189-1-jianyungao89@gmail.com

commit | commitdiff | tree

Arnaud Lecomte [Sat, 25 Oct 2025 19:29:41 +0000 (19:29 +0000)]

bpf: Fix stackmap overflow check in __bpf_get_stackid()

Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.

Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0")
Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com
Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20251025192941.1500-1-contact@arnaud-lcm.com
Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b

commit | commitdiff | tree

Arnaud Lecomte [Sat, 25 Oct 2025 19:28:58 +0000 (19:28 +0000)]

bpf: Refactor stack map trace depth calculation into helper function

Extract the duplicated maximum allowed depth computation for stack
traces stored in BPF stacks from bpf_get_stackid() and __bpf_get_stack()
into a dedicated stack_map_calculate_max_depth() helper function.

This unifies the logic for:
- The max depth computation
- Enforcing the sysctl_perf_event_max_stack limit

No functional changes for existing code paths.

Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/bpf/20251025192858.31424-1-contact@arnaud-lcm.com

commit | commitdiff | tree

Zhang Chujun [Tue, 28 Oct 2025 06:33:45 +0000 (14:33 +0800)]

bpftool: Fix missing closing parethesis for BTF_KIND_UNKN

In the btf_dumper_do_type function, the debug print statement for
BTF_KIND_UNKN was missing a closing parenthesis in the output format.
This patch adds the missing ')' to ensure proper formatting of the
dump output.

Signed-off-by: Zhang Chujun <zhangchujun@cmss.chinamobile.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251028063345.1911-1-zhangchujun@cmss.chinamobile.com

commit | commitdiff | tree

Xu Kuohai [Sat, 18 Oct 2025 03:57:38 +0000 (11:57 +0800)]

selftests/bpf/benchs: Add overwrite mode benchmark for BPF ring buffer

Add --rb-overwrite option to benchmark BPF ring buffer in overwrite mode.
Since overwrite mode is not yet supported by libbpf for consumer, also add
--rb-bench-producer option to benchmark producer directly without a consumer.

Benchmarks on an x86_64 and an arm64 CPU are shown below for reference.

- AMD EPYC 9654 (x86_64)

Ringbuf, multi-producer contention in overwrite mode, no consumer
=================================================================
rb-prod nr_prod 1    32.180 ± 0.033M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 2    9.617 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 3    8.810 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 4    9.272 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 8    9.173 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 12   3.086 ± 0.032M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 16   2.945 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 20   2.519 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 24   2.545 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 28   2.363 ± 0.024M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 32   2.357 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 36   2.267 ± 0.011M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 40   2.284 ± 0.020M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 44   2.215 ± 0.025M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 48   2.193 ± 0.023M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 52   2.208 ± 0.024M/s (drops 0.000 ± 0.000M/s)

- HiSilicon Kunpeng 920 (arm64)

Ringbuf, multi-producer contention in overwrite mode, no consumer
=================================================================
rb-prod nr_prod 1    14.478 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 2    21.787 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 3    6.045 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 4    5.352 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 8    4.850 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 12   3.542 ± 0.016M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 16   3.509 ± 0.021M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 20   3.171 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 24   3.154 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 28   2.974 ± 0.015M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 32   3.167 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 36   2.903 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 40   2.866 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 44   2.914 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-prod nr_prod 48   2.806 ± 0.012M/s (drops 0.000 ± 0.000M/s)
Rb-prod nr_prod 52   2.840 ± 0.012M/s (drops 0.000 ± 0.000M/s)

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251018035738.4039621-4-xukuohai@huaweicloud.com

commit | commitdiff | tree

Xu Kuohai [Sat, 18 Oct 2025 03:57:37 +0000 (11:57 +0800)]

selftests/bpf: Add overwrite mode test for BPF ring buffer

Add overwrite mode test for BPF ring buffer. The test creates a BPF ring
buffer in overwrite mode, then repeatedly reserves and commits records
to check if the ring buffer works as expected both before and after
overwriting occurs.

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251018035738.4039621-3-xukuohai@huaweicloud.com

commit | commitdiff | tree

Xu Kuohai [Sat, 18 Oct 2025 03:57:36 +0000 (11:57 +0800)]

bpf: Add overwrite mode for BPF ring buffer

When the BPF ring buffer is full, a new event cannot be recorded until one
or more old events are consumed to make enough space for it. In cases such
as fault diagnostics, where recent events are more useful than older ones,
this mechanism may lead to critical events being lost.

So add overwrite mode for BPF ring buffer to address it. In this mode, the
new event overwrites the oldest event when the buffer is full.

The basic idea is as follows:

1. producer_pos tracks the next position to record new event. When there
   is enough free space, producer_pos is simply advanced by producer to
   make space for the new event.

2. To avoid waiting for consumer when the buffer is full, a new variable,
   overwrite_pos, is introduced for producer. It points to the oldest event
   committed in the buffer. It is advanced by producer to discard one or more
   oldest events to make space for the new event when the buffer is full.

3. pending_pos tracks the oldest event to be committed. pending_pos is never
   passed by producer_pos, so multiple producers never write to the same
   position at the same time.

The following example diagrams show how it works in a 4096-byte ring buffer.

1. At first, {producer,overwrite,pending,consumer}_pos are all set to 0.

   0       512      1024    1536     2048     2560     3072     3584       4096
   +-----------------------------------------------------------------------+
   |                                                                       |
   |                                                                       |
   |                                                                       |
   +-----------------------------------------------------------------------+
   ^
   |
   |
producer_pos = 0
overwrite_pos = 0
pending_pos = 0
consumer_pos = 0

2. Now reserve a 512-byte event A.

   There is enough free space, so A is allocated at offset 0. And producer_pos
   is advanced to 512, the end of A. Since A is not submitted, the BUSY bit is
   set.

   0       512      1024    1536     2048     2560     3072     3584       4096
   +-----------------------------------------------------------------------+
   |        |                                                              |
   |   A    |                                                              |
   | [BUSY] |                                                              |
   +-----------------------------------------------------------------------+
   ^        ^
   |        |
   |        |
   |    producer_pos = 512
   |
overwrite_pos = 0
pending_pos = 0
consumer_pos = 0

3. Reserve event B, size 1024.

   B is allocated at offset 512 with BUSY bit set, and producer_pos is advanced
   to the end of B.

   0       512      1024    1536     2048     2560     3072     3584       4096
   +-----------------------------------------------------------------------+
   |        |                 |                                            |
   |   A    |        B        |                                            |
   | [BUSY] |      [BUSY]     |                                            |
   +-----------------------------------------------------------------------+
   ^                          ^
   |                          |
   |                          |
   |                   producer_pos = 1536
   |
overwrite_pos = 0
pending_pos = 0
consumer_pos = 0

4. Reserve event C, size 2048.

   C is allocated at offset 1536, and producer_pos is advanced to 3584.

   0       512      1024    1536     2048     2560     3072     3584       4096
   +-----------------------------------------------------------------------+
   |        |                 |                                   |        |
   |    A   |        B        |                 C                 |        |
   | [BUSY] |      [BUSY]     |               [BUSY]              |        |
   +-----------------------------------------------------------------------+
   ^                                                              ^
   |                                                              |
   |                                                              |
   |                                                    producer_pos = 3584
   |
overwrite_pos = 0
pending_pos = 0
consumer_pos = 0

5. Submit event A.

   The BUSY bit of A is cleared. B becomes the oldest event to be committed, so
   pending_pos is advanced to 512, the start of B.

   0       512      1024    1536     2048     2560     3072     3584       4096
   +-----------------------------------------------------------------------+
   |        |                 |                                   |        |
   |    A   |        B        |                 C                 |        |
   |        |      [BUSY]     |               [BUSY]              |        |
   +-----------------------------------------------------------------------+
   ^        ^                                                     ^
   |        |                                                     |
   |        |                                                     |
   |   pending_pos = 512                                  producer_pos = 3584
   |
overwrite_pos = 0
consumer_pos = 0

6. Submit event B.

   The BUSY bit of B is cleared, and pending_pos is advanced to the start of C,
   which is now the oldest event to be committed.

   0       512      1024    1536     2048     2560     3072     3584       4096
   +-----------------------------------------------------------------------+
   |        |                 |                                   |        |
   |    A   |        B        |                 C                 |        |
   |        |                 |               [BUSY]              |        |
   +-----------------------------------------------------------------------+
   ^                          ^                                   ^
   |                          |                                   |
   |                          |                                   |
   |                     pending_pos = 1536               producer_pos = 3584
   |
overwrite_pos = 0
consumer_pos = 0

7. Reserve event D, size 1536 (3 * 512).

   There are 2048 bytes not being written between producer_pos (currently 3584)
   and pending_pos, so D is allocated at offset 3584, and producer_pos is advanced
   by 1536 (from 3584 to 5120).

   Since event D will overwrite all bytes of event A and the first 512 bytes of
   event B, overwrite_pos is advanced to the start of event C, the oldest event
   that is not overwritten.

   0       512      1024    1536     2048     2560     3072     3584       4096
   +-----------------------------------------------------------------------+
   |                 |        |                                   |        |
   |      D End      |        |                 C                 | D Begin|
   |      [BUSY]     |        |               [BUSY]              | [BUSY] |
   +-----------------------------------------------------------------------+
   ^                 ^        ^
   |                 |        |
   |                 |   pending_pos = 1536
   |                 |   overwrite_pos = 1536
   |                 |
   |             producer_pos=5120
   |
consumer_pos = 0

8. Reserve event E, size 1024.

   Although there are 512 bytes not being written between producer_pos and
   pending_pos, E cannot be reserved, as it would overwrite the first 512
   bytes of event C, which is still being written.

9. Submit event C and D.

   pending_pos is advanced to the end of D.

   0       512      1024    1536     2048     2560     3072     3584       4096
   +-----------------------------------------------------------------------+
   |                 |        |                                   |        |
   |      D End      |        |                 C                 | D Begin|
   |                 |        |                                   |        |
   +-----------------------------------------------------------------------+
   ^                 ^        ^
   |                 |        |
   |                 |   overwrite_pos = 1536
   |                 |
   |             producer_pos=5120
   |             pending_pos=5120
   |
consumer_pos = 0

The performance data for overwrite mode will be provided in a follow-up
patch that adds overwrite-mode benchmarks.

A sample of performance data for non-overwrite mode, collected on an x86_64
CPU and an arm64 CPU, before and after this patch, is shown below. As we can
see, no obvious performance regression occurs.

- x86_64 (AMD EPYC 9654)

Before:

Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1  11.623 ± 0.027M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 2  15.812 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 3  7.871 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 4  6.703 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 8  2.896 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 12 2.054 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 16 1.864 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 20 1.580 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 24 1.484 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 28 1.369 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 32 1.316 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 36 1.272 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 40 1.239 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 44 1.226 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 48 1.213 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 52 1.193 ± 0.001M/s (drops 0.000 ± 0.000M/s)

After:

Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1  11.845 ± 0.036M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 2  15.889 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 3  8.155 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 4  6.708 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 8  2.918 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 12 2.065 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 16 1.870 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 20 1.582 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 24 1.482 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 28 1.372 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 32 1.323 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 36 1.264 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 40 1.236 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 44 1.209 ± 0.002M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 48 1.189 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 52 1.165 ± 0.002M/s (drops 0.000 ± 0.000M/s)

- arm64 (HiSilicon Kunpeng 920)

Before:

Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1  11.310 ± 0.623M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 2  9.947 ± 0.004M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 3  6.634 ± 0.011M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 4  4.502 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 8  3.888 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 12 3.372 ± 0.005M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 16 3.189 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 20 2.998 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 24 3.086 ± 0.018M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 28 2.845 ± 0.004M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 32 2.815 ± 0.008M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 36 2.771 ± 0.009M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 40 2.814 ± 0.011M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 44 2.752 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 48 2.695 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 52 2.710 ± 0.006M/s (drops 0.000 ± 0.000M/s)

After:

Ringbuf, multi-producer contention
==================================
rb-libbpf nr_prod 1  11.283 ± 0.550M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 2  9.993 ± 0.003M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 3  6.898 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 4  5.257 ± 0.001M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 8  3.830 ± 0.005M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 12 3.528 ± 0.013M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 16 3.265 ± 0.018M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 20 2.990 ± 0.007M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 24 2.929 ± 0.014M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 28 2.898 ± 0.010M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 32 2.818 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 36 2.789 ± 0.012M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 40 2.770 ± 0.006M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 44 2.651 ± 0.007M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 48 2.669 ± 0.005M/s (drops 0.000 ± 0.000M/s)
rb-libbpf nr_prod 52 2.695 ± 0.009M/s (drops 0.000 ± 0.000M/s)

Signed-off-by: Xu Kuohai <xukuohai@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251018035738.4039621-2-xukuohai@huaweicloud.com

commit | commitdiff | tree

Alexei Starovoitov [Mon, 27 Oct 2025 16:56:28 +0000 (09:56 -0700)]

Merge branch 'bpf-introduce-file-dynptr'

Mykyta Yatsenko says:

====================
bpf: Introduce file dynptr

From: Mykyta Yatsenko <yatsenko@meta.com>

This series adds a new dynptr kind, file dynptr, which enables BPF
programs to perform safe reads from files in a structured way.
Initial motivations include:
* Parsing the executable’s ELF to locate thread-local variable symbols
* Capturing stack traces when frame pointers are disabled

By leveraging the existing dynptr abstraction, we reuse the verifier’s
lifetime/size checks and keep the API consistent with existing dynptr
read helpers.

Technical details:
1. Reuses the existing freader library to read files a folio at a time.
2. bpf_dynptr_slice() and bpf_dynptr_read() always copy data from folios
into a program-provided buffer; zero-copy access is intentionally not
supported to keep it simple.
3. Reads may sleep if the requested folios are not in the page cache.
4. Few verifier changes required:
  * Support dynptr destruction in kfuncs
  * Add kfunc address substitution based on whether the program runs in
  a sleepable or non-sleepable context.

Testing:
The final patch adds a selftest that validates BPF program reads the
same data as userspace, page faults are enabled in sleepable context and
disabled in non-sleepable.

Changelog:
---
v4 -> v5
v4: https://lore.kernel.org/all/20251021200334.220542-1-mykyta.yatsenko5@gmail.com/
* Inlined and removed kfunc_call_imm(), run overflow check for call_imm
only if !bpf_jit_supports_far_kfunc_call().

v3 -> v4
v3: https://lore.kernel.org/bpf/20251020222538.932915-1-mykyta.yatsenko5@gmail.com/
* Remove ringbuf usage from selftests
* bpf_dynptr_set_null(ptr) when discarding file dynptr
* call kfunc_call_imm() in specialize_kfunc() only, removed
call from add_kfunc_call()

v2 -> v3
v2: https://lore.kernel.org/bpf/20251015161155.120148-1-mykyta.yatsenko5@gmail.com/
* Add negative tests
* Rewrote tests to use LSM for bpf_get_task_exe_file()
* Move call_imm overflow check into kfunc_call_imm()

v1 -> v2
v1: https://lore.kernel.org/bpf/20251003160416.585080-1-mykyta.yatsenko5@gmail.com/
* Remove ELF parsing selftest
* Expanded u32 -> u64 refactoring, changes in include/uapi/linux/bpf.h
* Removed freader.{c,h}, instead move freader definitions into
buildid.h.
* Small refactoring of the multiple folios reading algorithm
* Directly return error after unmark_stack_slots_dynptr().
* Make kfuncs receive trusted arguments.
* Remove enum bpf_is_sleepable, use bool instead
* Remove unnecessary sorting from specialize_kfunc()
* Remove bool kfunc_in_sleepable_ctx; field from the struct
bpf_insn_aux_data, rely on non_sleepable field introduced by Kumar
* Refactor selftests, do madvise(...MADV_PAGEOUT) for all pages read by
the test
* Introduce the test for non-sleepable case, verify it fails with -EFAULT
====================

Link: https://lore.kernel.org/r/20251026203853.135105-1-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:53 +0000 (20:38 +0000)]

selftests/bpf: add file dynptr tests

Introducing selftests for validating file-backed dynptr works as
expected.
* validate implementation supports dynptr slice and read operations
* validate destructors should be paired with initializers
* validate sleepable progs can page in.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-11-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:52 +0000 (20:38 +0000)]

bpf: dispatch to sleepable file dynptr

File dynptr reads may sleep when the requested folios are not in
the page cache. To avoid sleeping in non-sleepable contexts while still
supporting valid sleepable use, given that dynptrs are non-sleepable by
default, enable sleeping only when bpf_dynptr_from_file() is invoked
from a sleepable context.

This change:
  * Introduces a sleepable constructor: bpf_dynptr_from_file_sleepable()
  * Override non-sleepable constructor with sleepable if it's always
  called in sleepable context

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-10-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:51 +0000 (20:38 +0000)]

bpf: verifier: refactor kfunc specialization

Move kfunc specialization (function address substitution) to later stage
of verification to support a new use case, where we need to take into
consideration whether kfunc is called in sleepable context.

Minor refactoring in add_kfunc_call(), making sure that if function
fails, kfunc desc is not added to tab->descs (previously it could be
added or not, depending on what failed).

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-9-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:50 +0000 (20:38 +0000)]

bpf: add kfuncs and helpers support for file dynptrs

Add support for file dynptr.

Introduce struct bpf_dynptr_file_impl to hold internal state for file
dynptrs, with 64-bit size and offset support.

Introduce lifecycle management kfuncs:
  - bpf_dynptr_from_file() for initialization
  - bpf_dynptr_file_discard() for destruction

Extend existing helpers to support file dynptrs in:
  - bpf_dynptr_read()
  - bpf_dynptr_slice()

Write helpers (bpf_dynptr_write() and bpf_dynptr_data()) are not
modified, as file dynptr is read-only.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-8-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:49 +0000 (20:38 +0000)]

bpf: add plumbing for file-backed dynptr

Add the necessary verifier plumbing for the new file-backed dynptr type.
Introduce two kfuncs for its lifecycle management:
* bpf_dynptr_from_file() for initialization
* bpf_dynptr_file_discard() for destruction

Currently there is no mechanism for kfunc to release dynptr, this patch
add one:
* Dynptr release function sets meta->release_regno
* Call unmark_stack_slots_dynptr() if meta->release_regno is set and
dynptr ref_obj_id is set as well.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-7-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:48 +0000 (20:38 +0000)]

bpf: verifier: centralize const dynptr check in unmark_stack_slots_dynptr()

Move the const dynptr check into unmark_stack_slots_dynptr() so callers
don’t have to duplicate it. This puts the validation next to the code
that manipulates dynptr stack slots and allows upcoming changes to reuse
it directly.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-6-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:47 +0000 (20:38 +0000)]

lib/freader: support reading more than 2 folios

freader_fetch currently reads from at most two folios. When a read spans
into a third folio, the overflow bytes are copied adjacent to the second
folio’s data instead of being handled as a separate folio.
This patch modifies fetch algorithm to support reading from many folios.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Reviewed-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20251026203853.135105-5-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:46 +0000 (20:38 +0000)]

lib: move freader into buildid.h

Move struct freader and prototypes of the functions operating on it into
the buildid.h.

This allows reusing freader outside buildid, e.g. for file dynptr
support added later.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Link: https://lore.kernel.org/r/20251026203853.135105-4-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:45 +0000 (20:38 +0000)]

bpf: widen dynptr size/offset to 64 bit

Dynptr currently caps size and offset at 24 bits, which isn’t sufficient
for file-backed use cases; even 32 bits can be limiting. Refactor dynptr
helpers/kfuncs to use 64-bit size and offset, ensuring consistency
across the APIs.

This change does not affect internals of xdp, skb or other dynptrs,
which continue to behave as before. Also it does not break binary
compatibility.

The widening enables large-file access support via dynptr, implemented
in the next patches.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-3-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:44 +0000 (20:38 +0000)]

selftests/bpf: remove unnecessary kfunc prototypes

Remove unnecessary kfunc prototypes from test programs, these are
provided by vmlinux.h

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251026203853.135105-2-mykyta.yatsenko5@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Anton Protopopov [Sun, 19 Oct 2025 20:21:41 +0000 (20:21 +0000)]

libbpf: fix formatting of bpf_object__append_subprog_code

The commit 6c918709bd30 ("libbpf: Refactor bpf_object__reloc_code")
added the bpf_object__append_subprog_code() with incorrect indentations.
Use tabs instead. (This also makes a consequent commit better readable.)

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20251019202145.3944697-14-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Anton Protopopov [Sun, 19 Oct 2025 20:21:37 +0000 (20:21 +0000)]

bpf: make bpf_insn_successors to return a pointer

The bpf_insn_successors() function is used to return successors
to a BPF instruction. So far, an instruction could have 0, 1 or 2
successors. Prepare the verifier code to introduction of instructions
with more than 2 successors (namely, indirect jumps).

To do this, introduce a new struct, struct bpf_iarray, containing
an array of bpf instruction indexes and make bpf_insn_successors
to return a pointer of that type. The storage for all instructions
is allocated in the env->succ, which holds an array of size 2,
to be used for all instructions.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251019202145.3944697-10-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Anton Protopopov [Sun, 19 Oct 2025 20:21:31 +0000 (20:21 +0000)]

bpf: generalize and export map_get_next_key for arrays

The kernel/bpf/array.c file defines the array_map_get_next_key()
function which finds the next key for array maps. It actually doesn't
use any map fields besides the generic max_entries field. Generalize
it, and export as bpf_array_get_next_key() such that it can be
re-used by other array-like maps.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251019202145.3944697-4-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Anton Protopopov [Sun, 19 Oct 2025 20:21:30 +0000 (20:21 +0000)]

bpf: save the start of functions in bpf_prog_aux

Introduce a new subprog_start field in bpf_prog_aux. This field may
be used by JIT compilers wanting to know the real absolute xlated
offset of the function being jitted. The func_info[func_id] may have
served this purpose, but func_info may be NULL, so JIT compilers
can't rely on it.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251019202145.3944697-3-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Anton Protopopov [Sun, 19 Oct 2025 20:21:29 +0000 (20:21 +0000)]

bpf: fix the return value of push_stack

In [1] Eduard mentioned that on push_stack failure verifier code
should return -ENOMEM instead of -EFAULT. After checking with the
other call sites I've found that code randomly returns either -ENOMEM
or -EFAULT. This patch unifies the return values for the push_stack
(and similar push_async_cb) functions such that error codes are
always assigned properly.

[1] https://lore.kernel.org/bpf/20250615085943.3871208-1-a.s.protopopov@gmail.com

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251019202145.3944697-2-a.s.protopopov@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Shardul Bankar [Tue, 21 Oct 2025 08:08:46 +0000 (13:38 +0530)]

bpf: Clarify get_outer_instance() handling in propagate_to_outer_instance()

propagate_to_outer_instance() calls get_outer_instance() and uses the
returned pointer to reset and commit stack write marks. Under normal
conditions, update_instance() guarantees that an outer instance exists,
so get_outer_instance() cannot return an ERR_PTR.

However, explicitly checking for IS_ERR(outer_instance) makes this code
more robust and self-documenting. It reduces cognitive load when reading
the control flow and silences potential false-positive reports from
static analysis or automated tooling.

No functional change intended.

Signed-off-by: Shardul Bankar <shardulsb08@gmail.com>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://lore.kernel.org/r/20251021080849.860072-1-shardulsb08@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Daniel Borkmann [Mon, 20 Oct 2025 07:54:41 +0000 (09:54 +0200)]

bpf: Do not let BPF test infra emit invalid GSO types to stack

Yinhao et al. reported that their fuzzer tool was able to trigger a
skb_warn_bad_offload() from netif_skb_features() -> gso_features_check().
When a BPF program - triggered via BPF test infra - pushes the packet
to the loopback device via bpf_clone_redirect() then mentioned offload
warning can be seen. GSO-related features are then rightfully disabled.

We get into this situation due to convert___skb_to_skb() setting
gso_segs and gso_size but not gso_type. Technically, it makes sense
that this warning triggers since the GSO properties are malformed due
to the gso_type. Potentially, the gso_type could be marked non-trustworthy
through setting it at least to SKB_GSO_DODGY without any other specific
assumptions, but that also feels wrong given we should not go further
into the GSO engine in the first place.

The checks were added in 121d57af308d ("gso: validate gso_type in GSO
handlers") because there were malicious (syzbot) senders that combine
a protocol with a non-matching gso_type. If we would want to drop such
packets, gso_features_check() currently only returns feature flags via
netif_skb_features(), so one location for potentially dropping such skbs
could be validate_xmit_unreadable_skb(), but then otoh it would be
an additional check in the fast-path for a very corner case. Given
bpf_clone_redirect() is the only place where BPF test infra could emit
such packets, lets reject them right there.

Fixes: 850a88cc4096 ("bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN")
Fixes: cf62089b0edd ("bpf: Add gso_size to __sk_buff")
Reported-by: Yinhao Hu <dddddd@hust.edu.cn>
Reported-by: Kaiyan Mei <M202472210@hust.edu.cn>
Reported-by: Dongliang Mu <dzm91@hust.edu.cn>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://patch.msgid.link/20251020075441.127980-1-daniel@iogearbox.net

commit | commitdiff | tree

Puranjay Mohan [Fri, 17 Oct 2025 14:17:25 +0000 (14:17 +0000)]

selftests/bpf: Fix list_del() in arena list

The __list_del fuction doesn't set the previous node's next pointer to
the next node of the node to be deleted. It just updates the local variable
and not the actual pointer in the previous node.

The test was passing up till now because the bpf code is doing bpf_free()
after list_del and therfore reading head->first from the userspace will
read all zeroes. But after arena_list_del() is finished, head->first should
point to NULL;

Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251017141727.51355-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Chu Guangqing [Wed, 15 Oct 2025 01:50:24 +0000 (09:50 +0800)]

samples/bpf: Fix spelling typos in samples/bpf

do_hbm_test.sh:
The comment incorrectly used "upcomming" instead of "upcoming".

hbm.c
The comment incorrectly used "Managment" instead of "Management".
The comment incorrectly used "Currrently" instead of "Currently".

tcp_cong_kern.c
The comment incorrectly used "deteremined" instead of "determined".

tracex1.bpf.c
The comment incorrectly used "loobpack" instead of "loopback".

Signed-off-by: Chu Guangqing <chuguangqing@inspur.com>
Link: https://lore.kernel.org/r/20251015015024.2212-2-chuguangqing@inspur.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Yonghong Song [Tue, 14 Oct 2025 05:16:39 +0000 (22:16 -0700)]

selftests/bpf: Fix selftest verif_scale_strobemeta failure with llvm22

With latest llvm22, I hit the verif_scale_strobemeta selftest failure
below:
  $ ./test_progs -n 618
  libbpf: prog 'on_event': BPF program load failed: -E2BIG
  libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG --
  BPF program is too large. Processed 1000001 insn
  verification time 7019091 usec
  stack depth 488
  processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0
  -- END PROG LOAD LOG --
  libbpf: prog 'on_event': failed to load: -E2BIG
  libbpf: failed to load object 'strobemeta.bpf.o'
  scale_test:FAIL:expect_success unexpected error: -7 (errno 7)
  #618     verif_scale_strobemeta:FAIL

But if I increase the verificaiton insn limit from 1M to 10M, the above
test_progs run actually will succeed. The below is the result from veristat:
  $ ./veristat strobemeta.bpf.o
  Processing 'strobemeta.bpf.o'...
  File              Program   Verdict  Duration (us)    Insns  States  Program size  Jited size
  ----------------  --------  -------  -------------  -------  ------  ------------  ----------
  strobemeta.bpf.o  on_event  success       90250893  9777685  358230         15954       80794
  ----------------  --------  -------  -------------  -------  ------  ------------  ----------
  Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs.

Further debugging shows the llvm commit [1] is responsible for the verificaiton
failure as it tries to convert certain switch statement to if-condition. Such
change may cause different transformation compared to original switch statement.

In bpf program strobemeta.c case, the initial llvm ir for read_int_var() function is
  define internal void @read_int_var(ptr noundef %0, i64 noundef %1, ptr noundef %2,
      ptr noundef %3, ptr noundef %4) #2 !dbg !535 {
    %6 = alloca ptr, align 8
    %7 = alloca i64, align 8
    %8 = alloca ptr, align 8
    %9 = alloca ptr, align 8
    %10 = alloca ptr, align 8
    %11 = alloca ptr, align 8
    %12 = alloca i32, align 4
    ...
    %20 = icmp ne ptr %19, null, !dbg !561
    br i1 %20, label %22, label %21, !dbg !562

  21:                                               ; preds = %5
    store i32 1, ptr %12, align 4
    br label %48, !dbg !563

  22:
    %23 = load ptr, ptr %9, align 8, !dbg !564
    ...

  47:                                               ; preds = %38, %22
    store i32 0, ptr %12, align 4, !dbg !588
    br label %48, !dbg !588

  48:                                               ; preds = %47, %21
    call void @llvm.lifetime.end.p0(ptr %11) #4, !dbg !588
    %49 = load i32, ptr %12, align 4
    switch i32 %49, label %51 [
      i32 0, label %50
      i32 1, label %50
    ]

  50:                                               ; preds = %48, %48
    ret void, !dbg !589

  51:                                               ; preds = %48
    unreachable
  }

Note that the above 'switch' statement is added by clang frontend.
Without [1], the switch statement will survive until SelectionDag,
so the switch statement acts like a 'barrier' and prevents some
transformation involved with both 'before' and 'after' the switch statement.

But with [1], the switch statement will be removed during middle end
optimization and later middle end passes (esp. after inlining) have more
freedom to reorder the code.

The following is the related source code:

  static void *calc_location(struct strobe_value_loc *loc, void *tls_base):
        bpf_probe_read_user(&tls_ptr, sizeof(void *), dtv);
        /* if pointer has (void *)-1 value, then TLS wasn't initialized yet */
        return tls_ptr && tls_ptr != (void *)-1
                ? tls_ptr + tls_index.offset
                : NULL;

  In read_int_var() func, we have:
        void *location = calc_location(&cfg->int_locs[idx], tls_base);
        if (!location)
                return;

        bpf_probe_read_user(value, sizeof(struct strobe_value_generic), location);
        ...

The static func calc_location() is called inside read_int_var(). The asm code
without [1]:
     77: .123....89 (85) call bpf_probe_read_user#112
     78: ........89 (79) r1 = *(u64 *)(r10 -368)
     79: .1......89 (79) r2 = *(u64 *)(r10 -8)
     80: .12.....89 (bf) r3 = r2
     81: .123....89 (0f) r3 += r1
     82: ..23....89 (07) r2 += 1
     83: ..23....89 (79) r4 = *(u64 *)(r10 -464)
     84: ..234...89 (a5) if r2 < 0x2 goto pc+13
     85: ...34...89 (15) if r3 == 0x0 goto pc+12
     86: ...3....89 (bf) r1 = r10
     87: .1.3....89 (07) r1 += -400
     88: .1.3....89 (b4) w2 = 16
In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place,
so the verifier actually prefers to do verification first at 'r1 = r10' etc.

The asm code with [1]:
    119: .123....89 (85) call bpf_probe_read_user#112
    120: ........89 (79) r1 = *(u64 *)(r10 -368)
    121: .1......89 (79) r2 = *(u64 *)(r10 -8)
    122: .12.....89 (bf) r3 = r2
    123: .123....89 (0f) r3 += r1
    124: ..23....89 (07) r2 += -1
    125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6
    126: ........89 (05) goto pc+17
    ...
    144: ........89 (b4) w1 = 0
    145: .1......89 (6b) *(u16 *)(r8 +80) = r1
In this case, if 'r2 < 0xfffffffe' is true, the control will go to
non-null 'location' branch, so 'goto pc+17' will actually go to
null 'location' branch. This seems causing tremendous amount of
verificaiton state.

To fix the issue, rewrite the following code
  return tls_ptr && tls_ptr != (void *)-1
                ? tls_ptr + tls_index.offset
                : NULL;
to if/then statement and hopefully these explicit if/then statements
are sticky during middle-end optimizations.

Test with llvm20 and llvm21 as well and all strobemeta related selftests
are passed.

  [1] https://github.com/llvm/llvm-project/pull/161000

Signed-off-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251014051639.1996331-1-yonghong.song@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Sun, 19 Oct 2025 02:23:08 +0000 (19:23 -0700)]

Merge branch 'bpf-mm-related-minor-changes'

Yafang Shao says:
====================
These two minor patches were developed during the implementation of
BPF-THP:
https://lwn.net/Articles/1042138/

As suggested by Andrii, they are being submitted separately.
====================

Link: https://patch.msgid.link/20251016063929.13830-1-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Yafang Shao [Thu, 16 Oct 2025 06:39:29 +0000 (14:39 +0800)]

bpf: mark vma->{vm_mm,vm_file} as __safe_trusted_or_null

The vma->vm_mm might be NULL and it can be accessed outside of RCU. Thus,
we can mark it as trusted_or_null. With this change, BPF helpers can safely
access vma->vm_mm to retrieve the associated mm_struct from the VMA.
Then we can make policy decision from the VMA.

The "trusted" annotation enables direct access to vma->vm_mm within kfuncs
marked with KF_TRUSTED_ARGS or KF_RCU, such as bpf_task_get_cgroup1() and
bpf_task_under_cgroup(). Conversely, "null" enforcement requires all
callsites using vma->vm_mm to perform NULL checks.

The lsm selftest must be modified because it directly accesses vma->vm_mm
without a NULL pointer check; otherwise it will break due to this
change.

For the VMA based THP policy, the use case is as follows,

  @mm = @vma->vm_mm; // vm_area_struct::vm_mm is trusted or null
  if (!@mm)
      return;
  bpf_rcu_read_lock(); // rcu lock must be held to dereference the owner
  @owner = @mm->owner; // mm_struct::owner is rcu trusted or null
  if (!@owner)
    goto out;
  @cgroup1 = bpf_task_get_cgroup1(@owner, MEMCG_HIERARCHY_ID);

  /* make the decision based on the @cgroup1 attribute */

  bpf_cgroup_release(@cgroup1); // release the associated cgroup
out:
  bpf_rcu_read_unlock();

PSI memory information can be obtained from the associated cgroup to inform
policy decisions. Since upstream PSI support is currently limited to cgroup
v2, the following example demonstrates cgroup v2 implementation:

  @owner = @mm->owner;
  if (@owner) {
      // @ancestor_cgid is user-configured
      @ancestor = bpf_cgroup_from_id(@ancestor_cgid);
      if (bpf_task_under_cgroup(@owner, @ancestor)) {
          @psi_group = @ancestor->psi;

          /* Extract PSI metrics from @psi_group and
           * implement policy logic based on the values
           */

      }
  }

The vma::vm_file can also be marked with __safe_trusted_or_null.

No additional selftests are required since vma->vm_file and vma->vm_mm are
already validated in the existing selftest suite.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Link: https://lore.kernel.org/r/20251016063929.13830-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Yafang Shao [Thu, 16 Oct 2025 06:39:28 +0000 (14:39 +0800)]

bpf: mark mm->owner as __safe_rcu_or_null

When CONFIG_MEMCG is enabled, we can access mm->owner under RCU. The
owner can be NULL. With this change, BPF helpers can safely access
mm->owner to retrieve the associated task from the mm. We can then make
policy decision based on the task attribute.

The typical use case is as follows,

  bpf_rcu_read_lock(); // rcu lock must be held for rcu trusted field
  @owner = @mm->owner; // mm_struct::owner is rcu trusted or null
  if (!@owner)
      goto out;

  /* Do something based on the task attribute */

out:
  bpf_rcu_read_unlock();

Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Link: https://lore.kernel.org/r/20251016063929.13830-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Tiezhu Yang [Sat, 18 Oct 2025 08:28:15 +0000 (16:28 +0800)]

selftests/bpf: Silence unused-but-set build warnings

There are some set but not used build errors when compiling bpf selftests
with the latest upstream mainline GCC, at the beginning add the attribute
__maybe_unused for the variables, but it is better to just add the option
-Wno-unused-but-set-variable to CFLAGS in Makefile to disable the errors
instead of hacking the tests.

  tools/testing/selftests/bpf/map_tests/lpm_trie_map_basic_ops.c:229:36:
  error: variable ‘n_matches_after_delete’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/map_tests/lpm_trie_map_basic_ops.c:229:25:
  error: variable ‘n_matches’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/prog_tests/bpf_cookie.c:426:22:
  error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/prog_tests/find_vma.c:52:22:
  error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/prog_tests/perf_branches.c:67:22:
  error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]

  tools/testing/selftests/bpf/prog_tests/perf_link.c:15:22:
  error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]

Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
Link: https://lore.kernel.org/r/20251018082815.20622-1-yangtiezhu@loongson.cn
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Alexei Starovoitov [Sun, 19 Oct 2025 01:20:57 +0000 (18:20 -0700)]

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf at 6.18-rc2

Cross-merge BPF and other fixes after downstream PR.

No conflicts.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Linus Torvalds [Sat, 18 Oct 2025 20:05:13 +0000 (10:05 -1000)]

Merge tag 'rust-rustfmt' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux

Pull rustfmt fixes from Miguel Ojeda:
"Rust 'rustfmt' cleanup

  'rustfmt', by default, formats imports in a way that is prone to
  conflicts while merging and rebasing, since in some cases it condenses
  several items into the same line.

  Document in our guidelines that we will handle this for the moment
  with the trailing empty comment workaround and make the tree
  'rustfmt'-clean again"

* tag 'rust-rustfmt' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
  rust: bitmap: fix formatting
  rust: cpufreq: fix formatting
  rust: alloc: employ a trailing comment to keep vertical layout
  docs: rust: add section on imports formatting

commit | commitdiff | tree

Linus Torvalds [Sat, 18 Oct 2025 18:38:28 +0000 (08:38 -1000)]

Merge tag 'tpmdd-next-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd

Pull tpm fix from Jarkko Sakkinen:
"Correct the state transitions for ARM FF-A to match the spec and how
tpm_crb behaves on other platforms"

* tag 'tpmdd-next-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm_crb: Add idle support for the Arm FF-A start method

commit | commitdiff | tree

Linus Torvalds [Sat, 18 Oct 2025 18:35:09 +0000 (08:35 -1000)]

Merge tag 'pci-v6.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci

Pull pci fixes from Bjorn Helgaas:

- Search for MSI Capability with correct ID to fix an MSI regression on
   platforms with Cadence IP (Hans Zhang)

- Revert early bridge resource set up to fix resource assignment
   failures that broke at least alpha boot and Snapdragon ath12k WiFi
   (Ilpo Järvinen)

- Implement VMD .irq_startup()/.irq_shutdown() to fix IRQ issues that
   caused boot crashes and broken devices below VMD (Inochi Amaoto)

- Select CONFIG_SCREEN_INFO on X86 to fix black screen on boot when
   SCREEN_INFO not selected (Mario Limonciello)

* tag 'pci-v6.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/VGA: Select SCREEN_INFO on X86
  PCI: vmd: Override irq_startup()/irq_shutdown() in vmd_init_dev_msi_info()
  PCI: Revert early bridge resource set up
  PCI: cadence: Search for MSI Capability with correct ID

commit | commitdiff | tree

Linus Torvalds [Sat, 18 Oct 2025 18:22:07 +0000 (08:22 -1000)]

Merge tag 'cxl-fixes-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl

Pull Compute Express Link fixes from Dave Jiang:
"A small collection of CXL fixes. In addition to some misc fixes for
  the CXL subsystem, a number of fixes for CXL extended linear cache
  support are included to make it functional again.

   - Avoid missing port component registers setup due to dport
     enumeration failure

   - Add check for no entries in cxl_feature_info to address accessing
     invalid pointer.

   - Use %pa printk format to emit resource_size_t in
     validate_region_offset()

  CXL extended linear cache support fixes:

   - Fix setup of memory resource in cxl_acpi_set_cache_size()

   - Set range param for region_res_match_cxl_range() as const
     (addresses a compile warning for match_region_by_range() fix)

   - Fix match_region_by_range() to use region_res_match_cxl_range()

   - Subtract to find an hpa_alias0 in cxl_poison events to correct the
     alias math calculation"

* tag 'cxl-fixes-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl/trace: Subtract to find an hpa_alias0 in cxl_poison events
  cxl/region: Use %pa printk format to emit resource_size_t
  cxl: Fix match_region_by_range() to use region_res_match_cxl_range()
  cxl: Set range param for region_res_match_cxl_range() as const
  cxl/acpi: Fix setup of memory resource in cxl_acpi_set_cache_size()
  cxl/features: Add check for no entries in cxl_feature_info
  cxl/port: Avoid missing port component registers setup

commit | commitdiff | tree

Linus Torvalds [Sat, 18 Oct 2025 18:18:18 +0000 (08:18 -1000)]

Merge tag 'hid-for-linus-2025101701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid

Pull HID fixes from Jiri Kosina:

- fix for sticky fingers handling in hid-multitouch (Benjamin
   Tissoires)

- fix for reporting of 0 battery levels (Dmitry Torokhov)

- build fix for hid-haptic in certain configurations (Jonathan Denose)

- improved probe and avoiding spamming kernel log by hid-nintendo
   (Vicki Pfau)

- fix for OOB in hid-cp2112 (Deepak Sharma)

- interrupt handling fix for intel-thc-hid (Even Xu)

- a couple of new device IDs and device-specific quirks

* tag 'hid-for-linus-2025101701' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
  HID: logitech-hidpp: Add HIDPP_QUIRK_RESET_HI_RES_SCROLL
  selftests/hid: add tests for missing release on the Dell Synaptics
  HID: multitouch: fix sticky fingers
  HID: multitouch: fix name of Stylus input devices
  HID: hid-input: only ignore 0 battery events for digitizers
  HID: hid-debug: Fix spelling mistake "Rechargable" -> "Rechargeable"
  HID: Kconfig: Fix build error from CONFIG_HID_HAPTIC
  HID: nintendo: Rate limit IMU compensation message
  HID: nintendo: Wait longer for initial probe
  HID: core: Add printk_ratelimited variants to hid_warn() etc
  HID: quirks: Add ALWAYS_POLL quirk for VRS R295 steering wheel
  HID: quirks: avoid Cooler Master MM712 dongle wakeup bug
  HID: cp2112: Add parameter validation to data length
  HID: intel-thc-hid: intel-quickspi: Add ARL PCI Device Id's
  HID: intel-thc-hid: Intel-quickspi: switch first interrupt from level to edge detection
  HID: intel-thc-hid: intel-quicki2c: Fix wrong type casting

commit | commitdiff | tree

Linus Torvalds [Sat, 18 Oct 2025 18:00:43 +0000 (08:00 -1000)]

Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Pull bpf fixes from Alexei Starovoitov:

- Replace bpf_map_kmalloc_node() with kmalloc_nolock() to fix kmemleak
   imbalance in tracking of bpf_async_cb structures (Alexei Starovoitov)

- Make selftests/bpf arg_parsing.c more robust to errors (Andrii
   Nakryiko)

- Fix redefinition of 'off' as different kind of symbol when I40E
   driver is builtin (Brahmajit Das)

- Do not disable preemption in bpf_test_run (Sahil Chandna)

- Fix memory leak in __lookup_instance error path (Shardul Bankar)

- Ensure test data is flushed to disk before reading it (Xing Guo)

* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  selftests/bpf: Fix redefinition of 'off' as different kind of symbol
  bpf: Do not disable preemption in bpf_test_run().
  bpf: Fix memory leak in __lookup_instance error path
  selftests: arg_parsing: Ensure data is flushed to disk before reading.
  bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.
  selftests/bpf: make arg_parsing.c more robust to crashes
  bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path

commit | commitdiff | tree

Linus Torvalds [Sat, 18 Oct 2025 17:23:59 +0000 (07:23 -1000)]

Merge tag 'exfat-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat

Pull exfat fixes from Namjae Jeon:

- Fix out-of-bounds in FS_IOC_SETFSLABEL

- Add validation for stream entry size to prevent infinite loop

* tag 'exfat-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: fix out-of-bounds in exfat_nls_to_ucs2()
exfat: fix improper check of dentry.stream.valid_size

commit | commitdiff | tree

Linus Torvalds [Sat, 18 Oct 2025 17:18:48 +0000 (07:18 -1000)]

Merge tag 'nfs-for-6.18-2' of git://git.linux-nfs.org/projects/anna/linux-nfs

Pull NFS client fixes from Anna Schumaker:

- Fix for FlexFiles mirror->dss allocation

- Apply delay_retrans to async operations

- Check if suid/sgid is cleared after a write when needed

- Fix setting the state renewal timer for early mounts after a reboot

* tag 'nfs-for-6.18-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFS4: Fix state renewals missing after boot
  NFS: check if suid/sgid was cleared after a write as needed
  NFS4: Apply delay_retrans to async operations
  NFSv4/flexfiles: fix to allocate mirror->dss before use

commit | commitdiff | tree

Linus Torvalds [Sat, 18 Oct 2025 17:11:32 +0000 (07:11 -1000)]

Merge tag '6.18-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6

Pull smb client fixes from Steve French:
"smb client fixes, security and smbdirect improvements, and some minor cleanup:

   - Important OOB DFS fix

   - Fix various potential tcon refcount leaks

   - smbdirect (RDMA) fixes (following up from test event a few weeks
     ago):

      - Fixes to improve and simplify handling of memory lifetime of
        smbdirect_mr_io structures, when a connection gets disconnected

      - Make sure we really wait to reach SMBDIRECT_SOCKET_DISCONNECTED
        before destroying resources

      - Make sure the send/recv submission/completion queues are large
        enough to avoid ib_post_send() from failing under pressure

   - convert cifs.ko to use the recommended crypto libraries (instead of
     crypto_shash), this also can improve performance

   - Three small cleanup patches"

* tag '6.18-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
  smb: client: Consolidate cmac(aes) shash allocation
  smb: client: Remove obsolete crypto_shash allocations
  smb: client: Use HMAC-MD5 library for NTLMv2
  smb: client: Use MD5 library for SMB1 signature calculation
  smb: client: Use MD5 library for M-F symlink hashing
  smb: client: Use HMAC-SHA256 library for SMB2 signature calculation
  smb: client: Use HMAC-SHA256 library for key generation
  smb: client: Use SHA-512 library for SMB3.1.1 preauth hash
  cifs: parse_dfs_referrals: prevent oob on malformed input
  smb: client: Fix refcount leak for cifs_sb_tlink
  smb: client: let smbd_destroy() wait for SMBDIRECT_SOCKET_DISCONNECTED
  smb: move some duplicate definitions to common/cifsglob.h
  smb: client: let destroy_mr_list() keep smbdirect_mr_io memory if registered
  smb: client: let destroy_mr_list() call ib_dereg_mr() before ib_dma_unmap_sg()
  smb: client: call ib_dma_unmap_sg if mr->sgt.nents is not 0
  smb: client: improve logic in smbd_deregister_mr()
  smb: client: improve logic in smbd_register_mr()
  smb: client: improve logic in allocate_mr_list()
  smb: client: let destroy_mr_list() remove locked from the list
  smb: client: let destroy_mr_list() call list_del(&mr->list)
  ...

commit | commitdiff | tree

Linus Torvalds [Sat, 18 Oct 2025 17:07:14 +0000 (07:07 -1000)]

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm

Pull kvm fixes from Paolo Bonzini:
"ARM:

   - Fix the handling of ZCR_EL2 in NV VMs

   - Pick the correct translation regime when doing a PTW on the back of
     a SEA

   - Prevent userspace from injecting an event into a vcpu that isn't
     initialised yet

   - Move timer save/restore to the sysreg handling code, fixing EL2
     timer access in the process

   - Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug

   - Fix trapping configuration when the host isn't GICv3

   - Improve the detection of HCR_EL2.E2H being RES1

   - Drop a spurious 'break' statement in the S1 PTW

   - Don't try to access SPE when owned by EL3

  Documentation updates:

   - Document the failure modes of event injection

   - Document that a GICv3 guest can be created on a GICv5 host with
     FEAT_GCIE_LEGACY

  Selftest improvements:

   - Add a selftest for the effective value of HCR_EL2.AMO

   - Address build warning in the timer selftest when building with
     clang

   - Teach irqfd selftests about non-x86 architectures

   - Add missing sysregs to the set_id_regs selftest

   - Fix vcpu allocation in the vgic_lpi_stress selftest

   - Correctly enable interrupts in the vgic_lpi_stress selftest

  x86:

   - Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test
     for the bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return
     -EAGAIN if userspace deletes/moves memslot during prefault")

   - Don't try to get PMU capabilities from perf when running a CPU with
     hybrid CPUs/PMUs, as perf will rightly WARN.

  guest_memfd:

   - Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a
     more generic KVM_CAP_GUEST_MEMFD_FLAGS

   - Add a guest_memfd INIT_SHARED flag and require userspace to
     explicitly set said flag to initialize memory as SHARED,
     irrespective of MMAP.

     The behavior merged in 6.18 is that enabling mmap() implicitly
     initializes memory as SHARED, which would result in an ABI
     collision for x86 CoCo VMs as their memory is currently always
     initialized PRIVATE.

   - Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with
     private memory, to enable testing such setups, i.e. to hopefully
     flush out any other lurking ABI issues before 6.18 is officially
     released.

   - Add testcases to the guest_memfd selftest to cover guest_memfd
     without MMAP, and host userspace accesses to mmap()'d private
     memory"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits)
  arm64: Revamp HCR_EL2.E2H RES1 detection
  KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available
  KVM: arm64: Compute per-vCPU FGTs at vcpu_load()
  KVM: arm64: selftests: Fix misleading comment about virtual timer encoding
  KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list
  KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit
  KVM: arm64: Kill leftovers of ad-hoc timer userspace access
  KVM: arm64: Fix WFxT handling of nested virt
  KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure
  KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure
  KVM: arm64: Move CNT*_CTL_EL0 userspace accessors to generic infrastructure
  KVM: arm64: Add timer UAPI workaround to sysreg infrastructure
  KVM: arm64: Make timer_set_offset() generally accessible
  KVM: arm64: Replace timer context vcpu pointer with timer_id
  KVM: arm64: Introduce timer_context_to_vcpu() helper
  KVM: arm64: Hide CNTHV_*_EL2 from userspace for nVHE guests
  Documentation: KVM: Update GICv3 docs for GICv5 hosts
  KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests
  KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress
  KVM: arm64: selftests: Allocate vcpus with correct size
  ...

commit | commitdiff | tree

Linus Torvalds [Sat, 18 Oct 2025 17:02:28 +0000 (07:02 -1000)]

Merge tag 'powerpc-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux

Pull powerpc fixes from Madhavan Srinivasan:

- Fix to handle NULL pointer dereference at irq domain teardown

- Fix for handling extraction of struct xive_irq_data

- Fix to skip parameter area allocation when fadump disabled

Thanks to Ganesh Goudar, Hari Bathini, Nam Cao, Ritesh Harjani (IBM),
Sourabh Jain, and Venkat Rao Bagalkote,

* tag 'powerpc-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc/fadump: skip parameter area allocation when fadump is disabled
  powerpc, ocxl: Fix extraction of struct xive_irq_data
  powerpc/pseries/msi: Fix NULL pointer dereference at irq domain teardown

commit | commitdiff | tree

Linus Torvalds [Sat, 18 Oct 2025 16:59:25 +0000 (06:59 -1000)]

Merge tag 'slab-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab

Pull slab fixes from Vlastimil Babka:

- Fixes for two bugs that can be triggered when debugging options are
   enabled (Hao Ge, Vlastimil Babka)

* tag 'slab-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
  slab: reset slab->obj_ext when freeing and it is OBJEXTS_ALLOC_FAIL
  slab: fix clearing freelist in free_deferred_objects()

commit | commitdiff | tree

Stuart Yoder [Sat, 18 Oct 2025 11:25:18 +0000 (14:25 +0300)]

tpm_crb: Add idle support for the Arm FF-A start method

According to the CRB over FF-A specification [1], a TPM that implements
the ABI must comply with the TCG PTP specification. This requires support
for the Idle and Ready states.

This patch implements CRB control area requests for goIdle and
cmdReady on FF-A based TPMs.

The FF-A message used to notify the TPM of CRB updates includes a
locality parameter, which provides a hint to the TPM about which
locality modified the CRB. This patch adds a locality parameter
to __crb_go_idle() and __crb_cmd_ready() to support this.

[1] https://developer.arm.com/documentation/den0138/latest/

Signed-off-by: Stuart Yoder <stuart.yoder@arm.com>
Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

commit | commitdiff | tree

Paolo Bonzini [Sat, 18 Oct 2025 08:25:43 +0000 (10:25 +0200)]

Merge tag 'kvm-x86-fixes-6.18-rc2' of https://github.com/kvm-x86/linux into HEAD

KVM x86 fixes for 6.18:

- Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the
   bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return -EAGAIN if userspace
   deletes/moves memslot during prefault")

- Don't try to get PMU capabbilities from perf when running a CPU with hybrid
   CPUs/PMUs, as perf will rightly WARN.

- Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more
   generic KVM_CAP_GUEST_MEMFD_FLAGS

- Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set
   said flag to initialize memory as SHARED, irrespective of MMAP.  The
   behavior merged in 6.18 is that enabling mmap() implicitly initializes
   memory as SHARED, which would result in an ABI collision for x86 CoCo VMs
   as their memory is currently always initialized PRIVATE.

- Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private
   memory, to enable testing such setups, i.e. to hopefully flush out any
   other lurking ABI issues before 6.18 is officially released.

- Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP,
   and host userspace accesses to mmap()'d private memory.

commit | commitdiff | tree

Paolo Bonzini [Sat, 18 Oct 2025 08:25:31 +0000 (10:25 +0200)]

Merge tag 'kvmarm-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD

KVM/arm64 fixes for 6.18, take #1

Improvements and bug fixes:

- Fix the handling of ZCR_EL2 in NV VMs
  (20250926194108.84093-1-oliver.upton@linux.dev)

- Pick the correct translation regime when doing a PTW on
  the back of a SEA (20250926224246.731748-1-oliver.upton@linux.dev)

- Prevent userspace from injecting an event into a vcpu that isn't
  initialised yet (20250930085237.108326-1-oliver.upton@linux.dev)

- Move timer save/restore to the sysreg handling code, fixing EL2 timer
  access in the process (20250929160458.3351788-1-maz@kernel.org)

- Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug
  (20250924235150.617451-1-oliver.upton@linux.dev)

- Fix trapping configuration when the host isn't GICv3
  (20251007160704.1673584-1-sascha.bischoff@arm.com)

- Improve the detection of HCR_EL2.E2H being RES1
  (20251009121239.29370-1-maz@kernel.org)

- Drop a spurious 'break' statement in the S1 PTW
  (20250930135621.162050-1-osama.abdelkader@gmail.com)

- Don't try to access SPE when owned by EL3
  (20251010174707.1684200-1-mukesh.ojha@oss.qualcomm.com)

Documentation updates:

- Document the failure modes of event injection
  (20250930233620.124607-1-oliver.upton@linux.dev)

- Document that a GICv3 guest can be created on a GICv5 host
  with FEAT_GCIE_LEGACY (20251007154848.1640444-1-sascha.bischoff@arm.com)

Selftest improvements:

- Add a selftest for the effective value of HCR_EL2.AMO
  (20250926224454.734066-1-oliver.upton@linux.dev)

- Address build warning in the timer selftest when building
  with clang (20250926155838.2612205-1-seanjc@google.com)

- Teach irq_fd selftests about non-x86 architectures
  (20250930193301.119859-1-oliver.upton@linux.dev)

- Add missing sysregs to the set_id_regs selftest
  (20251012154352.61133-1-zenghui.yu@linux.dev)

- Fix vcpu allocation in the vgic_lpi_stress selftest
  (20251008154520.54801-1-zenghui.yu@linux.dev)

- Correctly enable interrupts in the vgic_lpi_stress selftest
  (20251007195254.260539-1-oliver.upton@linux.dev)

commit | commitdiff | tree

Linus Torvalds [Fri, 17 Oct 2025 23:04:21 +0000 (13:04 -1000)]

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

- Explicitly encode the XZR register if the value passed to
   write_sysreg_s() is 0.

   The GIC CDEOI instruction is encoded as a system register write with
   XZR as the source register. However, clang does not honour the "Z"
   register constraint, leading to incorrect code generation

- Ensure the interrupts (DAIF.IF) are unmasked when completing
   single-step of a suspended breakpoint before calling
   exit_to_user_mode().

   With pseudo-NMIs, interrupts are (additionally) masked at the PMR_EL1
   register, handled by local_irq_*()

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
  arm64: debug: always unmask interrupts in el0_softstp()
  arm64/sysreg: Fix GIC CDEOI instruction encoding

commit | commitdiff | tree

Linus Torvalds [Fri, 17 Oct 2025 22:59:31 +0000 (12:59 -1000)]

Merge tag 'riscv-for-linux-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V fixes from Paul Walmsley:

- Disable CFI with Rust for any platform other than x86 and ARM64

- Keep task mm_cpumasks up-to-date to avoid triggering M-mode firmware
   warnings if the kernel tries to send an IPI to an offline CPU

- Improve kprobe address validation performance and avoid desyncs
   (following x86)

- Avoid duplicate device probes by avoiding DT hardware probing when
   ACPI is enabled in early boot

- Use the correct set of dependencies for
   CONFIG_ARCH_HAS_ELF_CORE_EFLAGS, avoiding an allnoconfig warning

- Fix a few other minor issues

* tag 'riscv-for-linux-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: kprobes: convert one final __ASSEMBLY__ to __ASSEMBLER__
  riscv: Respect dependencies of ARCH_HAS_ELF_CORE_EFLAGS
  riscv: acpi: avoid errors caused by probing DT devices when ACPI is used
  riscv: kprobes: Fix probe address validation
  riscv: entry: fix typo in comment 'instruciton' -> 'instruction'
  RISC-V: clear hot-unplugged cores from all task mm_cpumasks to avoid rfence errors
  riscv: kgdb: Ensure that BUFMAX > NUMREGBYTES
  rust: cfi: only 64-bit arm and x86 support CFI_CLANG

commit | commitdiff | tree

Brahmajit Das [Fri, 17 Oct 2025 17:15:51 +0000 (22:45 +0530)]

selftests/bpf: Fix redefinition of 'off' as different kind of symbol

This fixes the following build error

   CLNG-BPF [test_progs] verifier_global_ptr_args.bpf.o
progs/verifier_global_ptr_args.c:228:5: error: redefinition of 'off' as
different kind of symbol
   228 | u32 off;
       |     ^

The symbol 'off' was previously defined in
tools/testing/selftests/bpf/tools/include/vmlinux.h, which includes an
enum i40e_ptp_gpio_pin_state from
drivers/net/ethernet/intel/i40e/i40e_ptp.c:

enum i40e_ptp_gpio_pin_state {
end = -2,
invalid = -1,
off = 0,
in_A = 1,
in_B = 2,
out_A = 3,
out_B = 4,
};

This enum is included when CONFIG_I40E is enabled. As of commit
032676ff8217 ("LoongArch: Update Loongson-3 default config file"),
CONFIG_I40E is set in the defconfig, which leads to the conflict.

Renaming the local variable avoids the redefinition and allows the
build to succeed.

Suggested-by: Yonghong Song <yonghong.song@linux.dev>
Signed-off-by: Brahmajit Das <listout@listout.xyz>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20251017171551.53142-1-listout@listout.xyz
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Sahil Chandna [Tue, 14 Oct 2025 18:56:35 +0000 (00:26 +0530)]

bpf: Do not disable preemption in bpf_test_run().

The timer mode is initialized to NO_PREEMPT mode by default,
this disables preemption and force execution in atomic context
causing issue on PREEMPT_RT configurations when invoking
spin_lock_bh(), leading to the following warning:

BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6107, name: syz.0.17
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
Preemption disabled at:
[<ffffffff891fce58>] bpf_test_timer_enter+0xf8/0x140 net/bpf/test_run.c:42

Fix this, by removing NO_PREEMPT/NO_MIGRATE mode check.
Also, the test timer context no longer needs explicit calls to
migrate_disable()/migrate_enable() with rcu_read_lock()/rcu_read_unlock().
Use helpers rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate()
instead.

Reported-by: syzbot+1f1fbecb9413cdbfbef8@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=1f1fbecb9413cdbfbef8
Suggested-by: Yonghong Song <yonghong.song@linux.dev>
Suggested-by: Menglong Dong <menglong.dong@linux.dev>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Tested-by: syzbot+1f1fbecb9413cdbfbef8@syzkaller.appspotmail.com
Co-developed-by: Brahmajit Das <listout@listout.xyz>
Signed-off-by: Brahmajit Das <listout@listout.xyz>
Signed-off-by: Sahil Chandna <chandna.sahil@gmail.com>
Link: https://lore.kernel.org/r/20251014185635.10300-1-chandna.sahil@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

commit | commitdiff | tree

Ada Couprie Diaz [Tue, 14 Oct 2025 09:25:36 +0000 (10:25 +0100)]

arm64: debug: always unmask interrupts in el0_softstp()

We intend that EL0 exception handlers unmask all DAIF exceptions
before calling exit_to_user_mode().

When completing single-step of a suspended breakpoint, we do not call
local_daif_restore(DAIF_PROCCTX) before calling exit_to_user_mode(),
leaving all DAIF exceptions masked.

When pseudo-NMIs are not in use this is benign.

When pseudo-NMIs are in use, this is unsound. At this point interrupts
are masked by both DAIF.IF and PMR_EL1, and subsequent irq flag
manipulation may not work correctly. For example, a subsequent
local_irq_enable() within exit_to_user_mode_loop() will only unmask
interrupts via PMR_EL1 (leaving those masked via DAIF.IF), and
anything depending on interrupts being unmasked (e.g. delivery of
signals) will not work correctly.

This was detected by CONFIG_ARM64_DEBUG_PRIORITY_MASKING.

Move the call to `try_step_suspended_breakpoints()` outside of the check
so that interrupts can be unmasked even if we don't call the step handler.

Fixes: 0ac7584c08ce ("arm64: debug: split single stepping exception entry")
Cc: <stable@vger.kernel.org> # 6.17
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
[catalin.marinas@arm.com: added Mark's rewritten commit log and some whitespace]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

commit | commitdiff | tree

Lorenzo Pieralisi [Tue, 7 Oct 2025 10:26:00 +0000 (12:26 +0200)]

arm64/sysreg: Fix GIC CDEOI instruction encoding

The GIC CDEOI system instruction requires the Rt field to be set to 0b11111
otherwise the instruction behaviour becomes CONSTRAINED UNPREDICTABLE.

Currenly, its usage is encoded as a system register write, with a constant
0 value:

write_sysreg_s(0, GICV5_OP_GIC_CDEOI)

While compiling with GCC, the 0 constant value, through these asm
constraints and modifiers ('x' modifier and 'Z' constraint combo):

asm volatile(__msr_s(r, "%x0") : : "rZ" (__val));

forces the compiler to issue the XZR register for the MSR operation (ie
that corresponds to Rt == 0b11111) issuing the right instruction encoding.

Unfortunately LLVM does not yet understand that modifier/constraint
combo so it ends up issuing a different register from XZR for the MSR
source, which in turns means that it encodes the GIC CDEOI instruction
wrongly and the instruction behaviour becomes CONSTRAINED UNPREDICTABLE
that we must prevent.

Add a conditional to write_sysreg_s() macro that detects whether it
is passed a constant 0 value and issues an MSR write with XZR as source
register - explicitly doing what the asm modifier/constraint is meant to
achieve through constraints/modifiers, fixing the LLVM compilation issue.

Fixes: 7ec80fb3f025 ("irqchip/gic-v5: Add GICv5 PPI support")
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org>
Acked-by: Marc Zyngier <maz@kernel.org>
Cc: stable@vger.kernel.org
Cc: Sascha Bischoff <sascha.bischoff@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>

commit | commitdiff | tree

Linus Torvalds [Fri, 17 Oct 2025 15:45:54 +0000 (08:45 -0700)]

Merge tag 'io_uring-6.18-20251016' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull io_uring fixes from Jens Axboe:

- Revert of a change that went into an older kernel, and which has been
   reported to cause a regression for some write workloads on LVM while
   a snapshop is being created

- Fix a regression from this merge window, where some compilers (and/or
   certain .config options) would cause an earlier evaluations of a
   dereference which would then cause a NULL pointer dereference.

   I was only able to reproduce this with OPTIMIZE_FOR_SIZE=y, but David
   Howells hit it with just KASAN enabled. Depending on how things
   inlined, this makes sense

- Fix for a missing lock around a mem region unregistration

- Fix for ring resizing with the same placement after resize

* tag 'io_uring-6.18-20251016' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  io_uring/rw: check for NULL io_br_sel when putting a buffer
  io_uring: fix unexpected placement on same size resizing
  io_uring: protect mem region deregistration
  Revert "io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()"

commit | commitdiff | tree

Linus Torvalds [Fri, 17 Oct 2025 15:31:26 +0000 (08:31 -0700)]

Merge tag 'block-6.18-20251016' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux

Pull block fixes from Jens Axboe:

- NVMe pull request via Keith:
     - iostats accounting fixed on multipath retries (Amit)
     - secure concatenation response fixup (Martin)
     - tls partial record fixup (Wilfred)

- Fix for a lockdep reported issue with the elevator lock and
   blk group frozen operations

- Fix for a regression in this merge window, where updating
   'nr_requests' would not do the right thing for queues with
   shared tags

* tag 'block-6.18-20251016' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
  nvme/tcp: handle tls partially sent records in write_space()
  block: Remove elevator_lock usage from blkg_conf frozen operations
  blk-mq: fix stale tag depth for shared sched tags in blk_mq_update_nr_requests()
  nvme-auth: update sc_c in host response
  nvme-multipath: Skip nr_active increments in RETRY disposition

commit | commitdiff | tree

Linus Torvalds [Fri, 17 Oct 2025 15:22:20 +0000 (08:22 -0700)]

Merge tag 'mmc-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc

Pull mmc cleanup from Ulf Hansson:
"Move rpmb_frame struct and constants to rpmb common header

  This helps us to avoid sharing an immutable branch between our git
  trees. I was planning to send it before rc1, but I didn't make it"

* tag 'mmc-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  rpmb: move rpmb_frame struct and constants to common header

commit | commitdiff | tree

Linus Torvalds [Fri, 17 Oct 2025 15:20:10 +0000 (08:20 -0700)]

Merge tag 'sound-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"A collection of small fixes. All changes are rather boring
  device-specific fixes and quirks:

   - A few fixes for missing NULL checks

   - ASoC NAU8821 fixes for jack and irq handling

   - Various fixes for ASoC TAS2781, IDT821034, sc8280xp, max9809x,
     wcd938x, and SoundWire

   - Usual HD-audio and USB-audio quirks"

* tag 'sound-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (27 commits)
  ALSA: hda/realtek: Fix mute led for HP Omen 17-cb0xxx
  ALSA: usb-audio: fix vendor quirk for Logitech H390
  ALSA: usb-audio: add volume quirks for MS LifeChat LX-3000
  ASoC: amd/sdw_utils: avoid NULL deref when devm_kasprintf() fails
  ASoC: max98090/91: fixed max98091 ALSA widget powering up/down
  ASoC: dt-bindings: Add compatible string fsl,imx-audio-tlv320
  ASoC: codecs: wcd938x-sdw: remove redundant runtime pm calls
  ASoC: sdw_utils: add rt1321 part id to codec_info_list
  ALSA: usb-audio: Fix NULL pointer deference in try_to_register_card
  ALSA: firewire: amdtp-stream: fix enum kernel-doc warnings
  ALSA: usb-audio: add mixer_playback_min_mute quirk for Logitech H390
  ASoC: nau8821: Avoid unnecessary blocking in IRQ handler
  ASoC: nau8821: Add DMI quirk to bypass jack debounce circuit
  ASoC: nau8821: Consistently clear interrupts before unmasking
  ASoC: nau8821: Generalize helper to clear IRQ status
  ASoC: nau8821: Cancel jdet_work before handling jack ejection
  ASoC: codecs: Fix gain setting ranges for Renesas IDT821034 codec
  ASoC: tas2781: Update ti,tas2781.yaml for adding tas58xx
  ASoC: tas2781: Support more newly-released amplifiers tas58xx in the driver
  ASoC: qcom: sc8280xp: Add support for QCS615
  ...

commit | commitdiff | tree

Linus Torvalds [Fri, 17 Oct 2025 15:16:58 +0000 (08:16 -0700)]

Merge tag 'drm-fixes-2025-10-17' of https://gitlab.freedesktop.org/drm/kernel

Pull drm fixes from Dave Airlie:
"As per usual xe/amdgpu are the leaders, with some i915 and then a
  bunch of scattered fixes. There are a bunch of stability fixes for
  some older amdgpu cards.

  draw:
   - Avoid color truncation

  gpuvm:
   - Avoid kernel-doc warning

  sched:
   - Avoid double free

  i915:
   - Skip GuC communication warning if reset is in progress
   - Couple frontbuffer related fixes
   - Deactivate PSR only on LNL and when selective fetch enabled

  xe:
   - Increase global invalidation timeout to handle some workloads
   - Fix NPD while evicting BOs in an array of VM binds
   - Fix resizable BAR to account for possibly needing to move BARs
     other than the LMEMBAR
   - Fix error handling in xe_migrate_init()
   - Fix atomic fault handling with mixed mappings or if the page is
     already in VRAM
   - Enable media samplers power gating for platforms before Xe2
   - Fix de-registering exec queue from GuC when unbinding
   - Ensure data migration to system if indicated by madvise with SVM
   - Fix kerneldoc for kunit change
   - Always account for cacheline alignment on migration
   - Drop bogus assertion on eviction

  amdgpu:
   - Backlight fix
   - SI fixes
   - CIK fix
   - Make CE support debug only
   - IP discovery fix
   - Ring reset fixes
   - GPUVM fault memory barrier fix
   - Drop unused structures in amdgpu_drm.h
   - JPEG debugfs fix
   - VRAM handling fixes for GPUs without VRAM
   - GC 12 MES fixes

  amdkfd:
   - MES fix

  ast:
   - Fix display output after reboot

  bridge:
   - lt9211: Fix version check

  panthor:
   - Fix MCU suspend

  qaic:
   - Init bootlog in correct order
   - Treat remaining == 0 as error in find_and_map_user_pages()
   - Lock access to DBC request queue

  rockchip:
   - vop2: Fix destination size in atomic check"

* tag 'drm-fixes-2025-10-17' of https://gitlab.freedesktop.org/drm/kernel: (44 commits)
  drm/sched: Fix potential double free in drm_sched_job_add_resv_dependencies
  drm/xe/evict: drop bogus assert
  drm/xe/migrate: don't misalign current bytes
  drm/xe/kunit: Fix kerneldoc for parameterized tests
  drm/xe/svm: Ensure data will be migrated to system if indicated by madvise.
  drm/gpuvm: Fix kernel-doc warning for drm_gpuvm_map_req.map
  drm/i915/psr: Deactivate PSR only on LNL and when selective fetch enabled
  drm/ast: Blank with VGACR17 sync enable, always clear VGACRB6 sync off
  accel/qaic: Synchronize access to DBC request queue head & tail pointer
  accel/qaic: Treat remaining == 0 as error in find_and_map_user_pages()
  accel/qaic: Fix bootlog initialization ordering
  drm/rockchip: vop2: use correct destination rectangle height check
  drm/draw: fix color truncation in drm_draw_fill24
  drm/xe/guc: Check GuC running state before deregistering exec queue
  drm/xe: Enable media sampler power gating
  drm/xe: Handle mixed mappings and existing VRAM on atomic faults
  drm/xe/migrate: Fix an error path
  drm/xe: Move rebar to be done earlier
  drm/xe: Don't allow evicting of BOs in same VM in array of VM binds
  drm/xe: Increase global invalidation timeout to 1000us
  ...

commit | commitdiff | tree

Linus Torvalds [Fri, 17 Oct 2025 15:12:19 +0000 (08:12 -0700)]

Merge tag 'i2c-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux

Pull i2c fixes from Wolfram Sang:

- PM cleanup after all prerequisites are merged with rc1

- usbio: missing addition after all dependencies are in

- slimpro: DT binding schema conversion

* tag 'i2c-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
  dt-bindings: i2c: Convert apm,xgene-slimpro-i2c to DT schema
  i2c: usbio: Add ACPI device-id for MTL-CVF devices
  i2c: Remove redundant pm_runtime_mark_last_busy() calls

commit | commitdiff | tree

Dawn Gardner [Thu, 16 Oct 2025 18:42:06 +0000 (15:42 -0300)]

ALSA: hda/realtek: Fix mute led for HP Omen 17-cb0xxx

This laptop uses the ALC285 codec, fixed by enabling
the ALC285_FIXUP_HP_MUTE_LED quirk

Signed-off-by: Dawn Gardner <dawn.auroali@gmail.com>
Link: https://patch.msgid.link/20251016184218.31508-3-dawn.auroali@gmail.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Mario Limonciello (AMD) [Mon, 13 Oct 2025 22:08:26 +0000 (17:08 -0500)]

PCI/VGA: Select SCREEN_INFO on X86

commit 337bf13aa9dda ("PCI/VGA: Replace vga_is_firmware_default() with a
screen info check") introduced an implicit dependency upon SCREEN_INFO by
removing the open coded implementation.

If a user didn't have CONFIG_SCREEN_INFO set, vga_is_firmware_default()
would now return false. SCREEN_INFO is only used on X86 so add a
conditional select for SCREEN_INFO to ensure that the VGA arbiter works as
intended.

Fixes: 337bf13aa9dda ("PCI/VGA: Replace vga_is_firmware_default() with a screen info check")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Closes: https://lore.kernel.org/linux-pci/20251012182302.GA3412@sol/
Suggested-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Tested-by: Eric Biggers <ebiggers@kernel.org>
Link: https://patch.msgid.link/20251013220829.1536292-1-superm1@kernel.org

commit | commitdiff | tree

Inochi Amaoto [Tue, 14 Oct 2025 01:46:07 +0000 (09:46 +0800)]

PCI: vmd: Override irq_startup()/irq_shutdown() in vmd_init_dev_msi_info()

Since commit 54f45a30c0d0 ("PCI/MSI: Add startup/shutdown for per
device domains") set callback irq_startup() and irq_shutdown() of
the struct pci_msi[x]_template, __irq_startup() will always invokes
irq_startup() callback instead of irq_enable() callback overridden
in vmd_init_dev_msi_info(). This will not start the IRQ correctly.

Also override irq_startup()/irq_shutdown() in vmd_init_dev_msi_info(),
so the irq_startup() can invoke the real logic.

Fixes: 54f45a30c0d0 ("PCI/MSI: Add startup/shutdown for per device domains")
Reported-by: Kenneth Crudup <kenny@panix.com>
Closes: https://lore.kernel.org/r/8a923590-5b3a-406f-a324-7bd1cf894d8f@panix.com/
Reported-by: Genes Lists <lists@sapience.com>
Closes: https://lore.kernel.org/r/4b392af8847cc19720ffcd53865f60ab3edc56b3.camel@sapience.com
Reported-by: Todd Brandt <todd.e.brandt@intel.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220658
Reported-by: Oliver Hartkopp <socketcan@hartkopp.net>
Closes: https://lore.kernel.org/r/8d6887a5-60bc-423c-8f7a-87b4ab739f6a@hartkopp.net
Reported-by: Hervé <herve@dxcv.net>
Signed-off-by: Inochi Amaoto <inochiama@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Genes Lists <lists@sapience.com>
Tested-by: Oliver Hartkopp <socketcan@hartkopp.net>
Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com>
Tested-by: Hervé <herve@dxcv.net>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251014014607.612586-1-inochiama@gmail.com

commit | commitdiff | tree

Miguel Ojeda [Thu, 16 Oct 2025 22:57:54 +0000 (00:57 +0200)]

rust: bitmap: fix formatting

We do our best to keep the repository `rustfmt`-clean, thus run the tool
to fix the formatting issue.

Link: https://docs.kernel.org/rust/coding-guidelines.html#style-formatting
Link: https://rust-for-linux.com/contributing#submit-checklist-addendum
Fixes: 0f5878834d6c ("rust: bitmap: clean Rust 1.92.0 `unused_unsafe` warning")
Reviewed-by: Burak Emir <bqe@google.com>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

commit | commitdiff | tree

Dave Airlie [Thu, 16 Oct 2025 23:39:34 +0000 (09:39 +1000)]

Merge tag 'drm-xe-fixes-2025-10-16' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes

Driver Changes:
- Increase global invalidation timeout to handle some workloads
   (Kenneth Graunke)
- Fix NPD while evicting BOs in an array of VM binds (Matthew Brost)
- Fix resizable BAR to account for possibly needing to move BARs other
   than the LMEMBAR (Lucas De Marchi)
- Fix error handling in xe_migrate_init() (Thomas Hellström)
- Fix atomic fault handling with mixed mappings or if the page is
   already in VRAM (Matthew Brost)
- Enable media samplers power gating for platforms before Xe2 (Vinay
   Belgaumkar)
- Fix de-registering exec queue from GuC when unbinding (Matthew Brost)
- Ensure data migration to system if indicated by madvise with SVM
   (Thomas Hellström)
- Fix kerneldoc for kunit change (Matt Roper)
- Always account for cacheline alignment on migration (Matthew Auld)
- Drop bogus assertion on eviction (Matthew Auld)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Lucas De Marchi <lucas.demarchi@intel.com>
Link: https://lore.kernel.org/r/rch735eqkmprfyutk3ux2fsqa3e5ve4p77w7a5j66qdpgyquxr@ao3wzcqtpn6s

commit | commitdiff | tree

Dave Airlie [Thu, 16 Oct 2025 21:10:24 +0000 (07:10 +1000)]

Merge tag 'drm-misc-fixes-2025-10-16' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes

Short summary of fixes pull:

ast:
- Fix display output after reboot

bridge:
- lt9211: Fix version check

core:
- draw: Avoid color truncation
- gpuvm: Avoid kernel-doc warning
- sched: Avoid double free

panthor:
- Fix MCU suspend

qaic:
- Init bootlog in correct order
- Treat remaining == 0 as error in find_and_map_user_pages()
- Lock access to DBC request queue

rockchip:
- vop2: Fix destination size in atomic check

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://lore.kernel.org/r/20251016141607.GA73919@linux.fritz.box

commit | commitdiff | tree

Miguel Ojeda [Fri, 10 Oct 2025 17:43:51 +0000 (19:43 +0200)]

rust: cpufreq: fix formatting

We do our best to keep the repository `rustfmt`-clean, thus run the tool
to fix the formatting issue.

Link: https://docs.kernel.org/rust/coding-guidelines.html#style-formatting
Link: https://rust-for-linux.com/contributing#submit-checklist-addendum
Fixes: f97aef092e19 ("cpufreq: Make drivers using CPUFREQ_ETERNAL specify transition latency")
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

commit | commitdiff | tree

Miguel Ojeda [Fri, 10 Oct 2025 17:43:50 +0000 (19:43 +0200)]

rust: alloc: employ a trailing comment to keep vertical layout

Apply the formatting guidelines introduced in the previous commit to
make the file `rustfmt`-clean again.

Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

commit | commitdiff | tree

Miguel Ojeda [Fri, 10 Oct 2025 17:43:49 +0000 (19:43 +0200)]

docs: rust: add section on imports formatting

`rustfmt`, by default, formats imports in a way that is prone to conflicts
while merging and rebasing, since in some cases it condenses several
items into the same line.

For instance, Linus mentioned [1] that the following case:

    use crate::{
        fmt,
        page::AsPageIter,
    };

is compressed by `rustfmt` into:

    use crate::{fmt, page::AsPageIter};

which is undesirable.

Similarly, `rustfmt` may put several items in the same line even if the
braces span already multiple lines, e.g.:

    use kernel::{
        acpi, c_str,
        device::{property, Core},
        of, platform,
    };

The options that control the formatting behavior around imports are
generally unstable, and `rustfmt` releases do not allow to use nightly
features, unlike the compiler and other Rust tooling [2].

For the moment, we can introduce a workaround to prevent `rustfmt`
from compressing the example above -- the "trailing empty comment":

    use crate::{
        fmt,
        page::AsPageIter, //
    };

which is reminiscent of the trailing comma behavior in other formatters.
We already used empty comments for formatting purposes in the past,
e.g. in commit b9b701fce49a ("rust: clarify the language unstable features
in use").

In addition, `rustfmt` actually reformats with a vertical layout (i.e. it
does not put two items in the same line) when seeing such a comment,
i.e. it doesn't just preserve the formatting, which is good in the sense
that we can use it to easily reformat some imports, since it matches
the style we generally want to have.

A Git merge driver would help (suggested by Gary and Wedson), though
maintainers would need to set it up, the diffs would still be larger
and the formatting rules for imports would remain hard to predict.

Thus document the style that we will follow in the coding guidelines
by introducing a new section and explain how the trailing empty comment
works there too.

We discussed the issue with upstream Rust in our usual Rust <-> Rust
for Linux meeting [3], and there have also been a few other discussions
in parallel in issues [4][5] and Zulip [6]. We will see what happens,
but upstream Rust has already created a subteam of `rustfmt` to try
to overcome the bandwidth issue [7], which is a good signal, and some
organization work has already started (e.g. tracking issues). We will
continue our discussions with them about it.

Cc: Caleb Cartwright <caleb.cartwright@outlook.com>
Cc: Yacin Tmimi <yacintmimi@gmail.com>
Cc: Manish Goregaokar <manishsmail@gmail.com>
Cc: Deadbeef <ent3rm4n@gmail.com>
Cc: Cameron Steffen <cam.steffen94@gmail.com>
Cc: Jieyou Xu <jieyouxu@outlook.com>
Link: https://lore.kernel.org/all/CAHk-=wgO7S_FZUSBbngG5vtejWOpzDfTTBkVvP3_yjJmFddbzA@mail.gmail.com/
Link: https://github.com/rust-lang/rustfmt/issues/4884
Link: https://hackmd.io/iSCyY3JTTz-g8YM-nnzTTA
Link: https://github.com/rust-lang/rustfmt/issues/4991
Link: https://github.com/rust-lang/rustfmt/issues/3361
Link: https://rust-lang.zulipchat.com/#narrow/channel/392734-council/topic/rustfmt.20maintenance/near/543815381
Link: https://github.com/rust-lang/team/pull/2017
Reviewed-by: Benno Lossin <lossin@kernel.org>
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>

commit | commitdiff | tree

Dave Airlie [Thu, 16 Oct 2025 20:57:44 +0000 (06:57 +1000)]

Merge tag 'amd-drm-fixes-6.18-2025-10-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes

amd-drm-fixes-6.18-2025-10-16:

amdgpu:
- Backlight fix
- SI fixes
- CIK fix
- Make CE support debug only
- IP discovery fix
- Ring reset fixes
- GPUVM fault memory barrier fix
- Drop unused structures in amdgpu_drm.h
- JPEG debugfs fix
- VRAM handling fixes for GPUs without VRAM
- GC 12 MES fixes

amdkfd:
- MES fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20251016132224.2534946-1-alexander.deucher@amd.com

commit | commitdiff | tree

Dave Airlie [Thu, 16 Oct 2025 20:46:14 +0000 (06:46 +1000)]

Merge tag 'drm-intel-fixes-2025-10-16' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes

- Skip GuC communication warning if reset is in progress (Zhanjun)
- Couple frontbuffer related fixes (Ville)
- Deactivate PSR only on LNL and when selective fetch enabled (Jouni)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Rodrigo Vivi <rodrigo.vivi@intel.com>
Link: https://lore.kernel.org/r/aPDoguxlhXlvjNAi@intel.com

commit | commitdiff | tree

Jens Axboe [Thu, 16 Oct 2025 19:25:40 +0000 (13:25 -0600)]

Merge tag 'nvme-6.18-2025-10-16' of git://git.infradead.org/nvme into block-6.18

Pull NVMe fixes from Keith:

"- iostats accounting fixed on multipath retries (Amit)
- secure concatenation response fixup (Martin)
- tls partial record fixup (Wilfred)"

* tag 'nvme-6.18-2025-10-16' of git://git.infradead.org/nvme:
  nvme/tcp: handle tls partially sent records in write_space()
  nvme-auth: update sc_c in host response
  nvme-multipath: Skip nr_active increments in RETRY disposition

commit | commitdiff | tree

Wilfred Mallawa [Fri, 10 Oct 2025 07:19:42 +0000 (17:19 +1000)]

nvme/tcp: handle tls partially sent records in write_space()

With TLS enabled, records that are encrypted and appended to TLS TX
list can fail to see a retry if the underlying TCP socket is busy, for
example, hitting an EAGAIN from tcp_sendmsg_locked(). This is not known
to the NVMe TCP driver, as the TLS layer successfully generated a record.

Typically, the TLS write_space() callback would ensure such records are
retried, but in the NVMe TCP Host driver, write_space() invokes
nvme_tcp_write_space(). This causes a partially sent record in the TLS TX
list to timeout after not being retried.

This patch fixes the above by calling queue->write_space(), which calls
into the TLS layer to retry any pending records.

Fixes: be8e82caa685 ("nvme-tcp: enable TLS handshake upcall")
Signed-off-by: Wilfred Mallawa <wilfred.mallawa@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>

commit | commitdiff | tree

Takashi Iwai [Thu, 16 Oct 2025 18:14:24 +0000 (20:14 +0200)]

Merge tag 'asoc-fix-v6.18-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Fixes for v6.18

A moderately large collection of driver specific fixes, plus a few new
quirks and device IDs. The NAU8821 changes are a little large but more
in mechanical ways than in ways that are complex.

commit | commitdiff | tree

Linus Torvalds [Thu, 16 Oct 2025 17:58:49 +0000 (10:58 -0700)]

Merge tag 'f2fs-fix-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs

Pull f2fs fixes from Jaegeuk Kim:

- fix soft lockupg caused by iput() added in bc986b1d756482a ("fs: stop
   accessing ->i_count directly in f2fs and gfs2")

- fix a wrong block address map on multiple devices

Link: https://lore.kernel.org/oe-lkp/202509301450.138b448f-lkp@intel.com
* tag 'f2fs-fix-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
  f2fs: fix wrong block mapping for multi-devices
  f2fs: don't call iput() from f2fs_drop_inode()

commit | commitdiff | tree

Shardul Bankar [Thu, 16 Oct 2025 06:33:30 +0000 (12:03 +0530)]

bpf: Fix memory leak in __lookup_instance error path

When __lookup_instance() allocates a func_instance structure but fails
to allocate the must_write_set array, it returns an error without freeing
the previously allocated func_instance. This causes a memory leak of 192
bytes (sizeof(struct func_instance)) each time this error path is triggered.

Fix by freeing 'result' on must_write_set allocation failure.

Fixes: b3698c356ad9 ("bpf: callchain sensitive stack liveness tracking using CFG")
Reported-by: BPF Runtime Fuzzer (BRF)
Signed-off-by: Shardul Bankar <shardulsb08@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Link: https://patch.msgid.link/20251016063330.4107547-1-shardulsb08@gmail.com

commit | commitdiff | tree

Linus Torvalds [Thu, 16 Oct 2025 17:22:38 +0000 (10:22 -0700)]

Merge tag 'for-6.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:

- in tree-checker fix extref bounds check

- reorder send context structure to avoid
   -Wflex-array-member-not-at-end warning

- fix extent readahead length for compressed extents

- fix memory leaks on error paths (qgroup assign ioctl, zone loading
   with raid stripe tree enabled)

- fix how device specific mount options are applied, in particular the
   'ssd' option will be set unexpectedly

- fix tracking of relocation state when tasks are running and
   cancellation is attempted

- adjust assertion condition for folios allocated for scrub

- remove incorrect assertion checking for block group when populating
   free space tree

* tag 'for-6.18-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: send: fix -Wflex-array-member-not-at-end warning in struct send_ctx
  btrfs: tree-checker: fix bounds check in check_inode_extref()
  btrfs: fix memory leaks when rejecting a non SINGLE data profile without an RST
  btrfs: fix incorrect readahead expansion length
  btrfs: do not assert we found block group item when creating free space tree
  btrfs: do not use folio_test_partial_kmap() in ASSERT()s
  btrfs: only set the device specific options after devices are opened
  btrfs: fix memory leak on duplicated memory in the qgroup assign ioctl
  btrfs: fix clearing of BTRFS_FS_RELOC_RUNNING if relocation already running

commit | commitdiff | tree

Linus Torvalds [Thu, 16 Oct 2025 17:16:41 +0000 (10:16 -0700)]

Merge tag 'v6.18-rc1-smb-server-fixes' of git://git.samba.org/ksmbd

Pull smb server fixes from Steve French:

- Fix RPC hang due to locking bug

- Fix for memory leak in read and refcount leak (in session setup)

- Minor cleanup

* tag 'v6.18-rc1-smb-server-fixes' of git://git.samba.org/ksmbd:
  ksmbd: fix recursive locking in RPC handle list access
  smb/server: fix possible refcount leak in smb2_sess_setup()
  smb/server: fix possible memory leak in smb2_read()
  smb: server: Use common error handling code in smb_direct_rdma_xmit()

commit | commitdiff | tree

Linus Torvalds [Thu, 16 Oct 2025 16:41:21 +0000 (09:41 -0700)]

Merge tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
"Including fixes from CAN

  Current release - regressions:

    - udp: do not use skb_release_head_state() before
      skb_attempt_defer_free()

    - gro_cells: use nested-BH locking for gro_cell

    - dpll: zl3073x: increase maximum size of flash utility

  Previous releases - regressions:

    - core: fix lockdep splat on device unregister

    - tcp: fix tcp_tso_should_defer() vs large RTT

    - tls:
        - don't rely on tx_work during send()
        - wait for pending async decryptions if tls_strp_msg_hold fails

    - can: j1939: add missing calls in NETDEV_UNREGISTER notification
      handler

    - eth: lan78xx: fix lost EEPROM write timeout in
      lan78xx_write_raw_eeprom

  Previous releases - always broken:

    - ip6_tunnel: prevent perpetual tunnel growth

    - dpll: zl3073x: handle missing or corrupted flash configuration

    - can: m_can: fix pm_runtime and CAN state handling

    - eth:
        - ixgbe: fix too early devlink_free() in ixgbe_remove()
        - ixgbevf: fix mailbox API compatibility
        - gve: Check valid ts bit on RX descriptor before hw timestamping
        - idpf: cleanup remaining SKBs in PTP flows
        - r8169: fix packet truncation after S4 resume on RTL8168H/RTL8111H"

* tag 'net-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (50 commits)
  udp: do not use skb_release_head_state() before skb_attempt_defer_free()
  net: usb: lan78xx: fix use of improperly initialized dev->chipid in lan78xx_reset
  netdevsim: set the carrier when the device goes up
  selftests: tls: add test for short splice due to full skmsg
  selftests: net: tls: add tests for cmsg vs MSG_MORE
  tls: don't rely on tx_work during send()
  tls: wait for pending async decryptions if tls_strp_msg_hold fails
  tls: always set record_type in tls_process_cmsg
  tls: wait for async encrypt in case of error during latter iterations of sendmsg
  tls: trim encrypted message to match the plaintext on short splice
  tg3: prevent use of uninitialized remote_adv and local_adv variables
  MAINTAINERS: new entry for IPv6 IOAM
  gve: Check valid ts bit on RX descriptor before hw timestamping
  net: core: fix lockdep splat on device unregister
  MAINTAINERS: add myself as maintainer for b53
  selftests: net: check jq command is supported
  net: airoha: Take into account out-of-order tx completions in airoha_dev_xmit()
  tcp: fix tcp_tso_should_defer() vs large RTT
  r8152: add error handling in rtl8152_driver_init
  usbnet: Fix using smp_processor_id() in preemptible code warnings
  ...

commit | commitdiff | tree

Linus Torvalds [Thu, 16 Oct 2025 16:39:29 +0000 (09:39 -0700)]

Merge tag 'ata-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux

Pull ata fix from Niklas Cassel:

- Do not print an error message (and assume that the General Purpose
   Log Directory log page is not supported) for a device that reports a
   bogus General Purpose Logging Version.

   Unsurprisingly, many vendors fail to report the only valid General
   Purpose Logging Version (Damien)

* tag 'ata-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/libata/linux:
  ata: libata-core: relax checks in ata_read_log_directory()

commit | commitdiff | tree

Xing Guo [Thu, 16 Oct 2025 03:53:30 +0000 (11:53 +0800)]

selftests: arg_parsing: Ensure data is flushed to disk before reading.

test_parse_test_list_file writes some data to
/tmp/bpf_arg_parsing_test.XXXXXX and parse_test_list_file() will read
the data back. However, after writing data to that file, we forget to
call fsync() and it's causing testing failure in my laptop. This patch
helps fix it by adding the missing fsync() call.

Fixes: 64276f01dce8 ("selftests/bpf: Test_progs can read test lists from file")
Signed-off-by: Xing Guo <higuoxing@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20251016035330.3217145-1-higuoxing@gmail.com

commit | commitdiff | tree

Stuart Hayhurst [Mon, 6 Oct 2025 01:05:49 +0000 (02:05 +0100)]

HID: logitech-hidpp: Add HIDPP_QUIRK_RESET_HI_RES_SCROLL

The Logitech G502 Hero Wireless's high resolution scrolling resets after
being unplugged without notifying the driver, causing extremely slow
scrolling.

The only indication of this is a battery update packet, so add a quirk to
detect when the device is unplugged and re-enable the scrolling.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218037
Signed-off-by: Stuart Hayhurst <stuart.a.hayhurst@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.com>

commit | commitdiff | tree

Eric Dumazet [Wed, 15 Oct 2025 05:27:15 +0000 (05:27 +0000)]

udp: do not use skb_release_head_state() before skb_attempt_defer_free()

Michal reported and bisected an issue after recent adoption
of skb_attempt_defer_free() in UDP.

The issue here is that skb_release_head_state() is called twice per skb,
one time from skb_consume_udp(), then a second time from skb_defer_free_flush()
and napi_consume_skb().

As Sabrina suggested, remove skb_release_head_state() call from
skb_consume_udp().

Add a DEBUG_NET_WARN_ON_ONCE(skb_nfct(skb)) in skb_attempt_defer_free()

Many thanks to Michal, Sabrina, Paolo and Florian for their help.

Fixes: 6471658dc66c ("udp: use skb_attempt_defer_free()")
Reported-and-bisected-by: Michal Kubecek <mkubecek@suse.cz>
Closes: https://lore.kernel.org/netdev/gpjh4lrotyephiqpuldtxxizrsg6job7cvhiqrw72saz2ubs3h@g6fgbvexgl3r/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Michal Kubecek <mkubecek@suse.cz>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Cc: Florian Westphal <fw@strlen.de>
Reviewed-by: Sabrina Dubroca <sd@queasysnail.net>
Link: https://patch.msgid.link/20251015052715.4140493-1-edumazet@google.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

commit | commitdiff | tree

Hao Ge [Wed, 15 Oct 2025 14:16:42 +0000 (22:16 +0800)]

slab: reset slab->obj_ext when freeing and it is OBJEXTS_ALLOC_FAIL

If obj_exts allocation failed, slab->obj_exts is set to OBJEXTS_ALLOC_FAIL,
But we do not clear it when freeing the slab. Since OBJEXTS_ALLOC_FAIL and
MEMCG_DATA_OBJEXTS currently share the same bit position, during the
release of the associated folio, a VM_BUG_ON_FOLIO() check in
folio_memcg_kmem() is triggered because the OBJEXTS_ALLOC_FAIL flag was
not cleared, causing it to be interpreted as a kmem folio (non-slab)
with MEMCG_OBJEXTS_DATA flag set, which is invalid because
MEMCG_OBJEXTS_DATA is supposed to be set only on slabs.

Another problem that predates sharing the OBJEXTS_ALLOC_FAIL and
MEMCG_DATA_OBJEXTS bits is that on configurations with
is_check_pages_enabled(), the non-cleared bit in page->memcg_data will
trigger a free_page_is_bad() failure "page still charged to cgroup"

When freeing a slab, we clear slab->obj_exts if the obj_ext array has
been successfully allocated. So let's clear it also when the allocation
has failed.

Fixes: 09c46563ff6d ("codetag: debug: introduce OBJEXTS_ALLOC_FAIL to mark failed slab_ext allocations")
Fixes: 7612833192d5 ("slab: Reuse first bit for OBJEXTS_ALLOC_FAIL")
Link: https://lore.kernel.org/all/20251015141642.700170-1-hao.ge@linux.dev/
Cc: <stable@vger.kernel.org>
Signed-off-by: Hao Ge <gehao@kylinos.cn>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>

commit | commitdiff | tree

Tvrtko Ursulin [Wed, 15 Oct 2025 08:40:15 +0000 (09:40 +0100)]

drm/sched: Fix potential double free in drm_sched_job_add_resv_dependencies

When adding dependencies with drm_sched_job_add_dependency(), that
function consumes the fence reference both on success and failure, so in
the latter case the dma_fence_put() on the error path (xarray failed to
expand) is a double free.

Interestingly this bug appears to have been present ever since
commit ebd5f74255b9 ("drm/sched: Add dependency tracking"), since the code
back then looked like this:

drm_sched_job_add_implicit_dependencies():
...
       for (i = 0; i < fence_count; i++) {
               ret = drm_sched_job_add_dependency(job, fences[i]);
               if (ret)
                       break;
       }

       for (; i < fence_count; i++)
               dma_fence_put(fences[i]);

Which means for the failing 'i' the dma_fence_put was already a double
free. Possibly there were no users at that time, or the test cases were
insufficient to hit it.

The bug was then only noticed and fixed after
commit 9c2ba265352a ("drm/scheduler: use new iterator in drm_sched_job_add_implicit_dependencies v2")
landed, with its fixup of
commit 4eaf02d6076c ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies").

At that point it was a slightly different flavour of a double free, which
commit 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
noticed and attempted to fix.

But it only moved the double free from happening inside the
drm_sched_job_add_dependency(), when releasing the reference not yet
obtained, to the caller, when releasing the reference already released by
the former in the failure case.

As such it is not easy to identify the right target for the fixes tag so
lets keep it simple and just continue the chain.

While fixing we also improve the comment and explain the reason for taking
the reference and not dropping it.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Fixes: 963d0b356935 ("drm/scheduler: fix drm_sched_job_add_implicit_dependencies harder")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/dri-devel/aNFbXq8OeYl3QSdm@stanley.mountain/
Cc: Christian König <christian.koenig@amd.com>
Cc: Rob Clark <robdclark@chromium.org>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Matthew Brost <matthew.brost@intel.com>
Cc: Danilo Krummrich <dakr@kernel.org>
Cc: Philipp Stanner <phasta@kernel.org>
Cc: Christian König <ckoenig.leichtzumerken@gmail.com>
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org # v5.16+
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20251015084015.6273-1-tvrtko.ursulin@igalia.com

commit | commitdiff | tree

Mark Brown [Thu, 16 Oct 2025 11:41:35 +0000 (12:41 +0100)]

ASoC: nau8821: Fix IRQ handling and improve jack

Merge series from Cristian Ciocaltea <cristian.ciocaltea@collabora.com>:

This patch series addresses a set of issues in the Nuvoton NAU88L21
audio codec driver related to interrupt handling and jack hotplug
detection reliability.

The changes focus on:

* Eliminating race conditions between jack insertion and ejection events
* Ensuring interrupts are consistently and correctly cleared before
  unmasking
* Introducing a DMI-based quirk to bypass the jack debounce circuit on
  Valve Steam Deck, improving detection accuracy under stress
* Improving robustness of the IRQ handler by avoiding unnecessary
  blocking operations

The series has been tested on affected hardware to verify correct
behavior during repeated and rapid jack hotplug cycles.

commit | commitdiff | tree

Mark Brown [Thu, 16 Oct 2025 11:41:30 +0000 (12:41 +0100)]

ASoC: Add QCS615 sound card support

Merge series from Le Qi <le.qi@oss.qualcomm.com>:

This patch series adds support for the QCS615 sound card:
- Updates device tree bindings for SM8250 to include QCS615.
- Adds QCS615 support in the SC8280XP ASoC driver.

commit | commitdiff | tree

Pauli Virtanen [Wed, 15 Oct 2025 21:11:30 +0000 (00:11 +0300)]

ALSA: usb-audio: fix vendor quirk for Logitech H390

Vendor quirk QUIRK_FLAG_CTL_MSG_DELAY_1M was inadvertently missing when
adding quirk for Logitech H390. Add it back.

Fixes: 2b929b6eec0c ("ALSA: usb-audio: add mixer_playback_min_mute quirk for Logitech H390")
Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Pauli Virtanen [Wed, 15 Oct 2025 19:47:10 +0000 (22:47 +0300)]

ALSA: usb-audio: add volume quirks for MS LifeChat LX-3000

ID 045e:070f Microsoft Corp. LifeChat LX-3000 Headset
has muted minimum Speaker Playback Volume, and 4 amixer steps were
observed to produce 1 actual volume step.

Apply min_mute quirk and correct res=48 -> 4*48.
Tested with the device.

Signed-off-by: Pauli Virtanen <pav@iki.fi>
Signed-off-by: Takashi Iwai <tiwai@suse.de>

commit | commitdiff | tree

Matthew Auld [Fri, 10 Oct 2025 15:24:58 +0000 (16:24 +0100)]

drm/xe/evict: drop bogus assert

This assert can trigger here with non pin_map users that select
LATE_RESTORE, since the vmap is allowed to be NULL given that
save/restore can now use the blitter instead. The check here doesn't
seem to have much value anymore given that we no longer move pinned
memory, so any existing vmap is left well alone, and doesn't need to be
recreated upon restore, so just drop the assert here.

Fixes: 86f69c26113c ("drm/xe: use backup object for pinned save/restore")
Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6213
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Link: https://lore.kernel.org/r/20251010152457.177884-2-matthew.auld@intel.com
(cherry picked from commit a10b4a69c7f8f596d2c5218fbe84430734fab3b2)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

commit | commitdiff | tree

Matthew Auld [Fri, 10 Oct 2025 16:20:21 +0000 (17:20 +0100)]

drm/xe/migrate: don't misalign current bytes

If current bytes exceeds the max copy size, ensure the clamped size
still accounts for the XE_CACHELINE_BYTES alignment, otherwise we
trigger the assert in xe_migrate_vram with the size now being out of
alignment.

Fixes: 8c2d61e0e916 ("drm/xe/migrate: don't overflow max copy size")
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6212
Signed-off-by: Matthew Auld <matthew.auld@intel.com>
Cc: Stuart Summers <stuart.summers@intel.com>
Cc: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Matthew Brost <matthew.brost@intel.com>
Link: https://lore.kernel.org/r/20251010162020.190962-2-matthew.auld@intel.com
(cherry picked from commit 641bcf8731d21b56760e3646a39a65f471e9efd1)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

commit | commitdiff | tree

Matt Roper [Mon, 13 Oct 2025 15:30:15 +0000 (08:30 -0700)]

drm/xe/kunit: Fix kerneldoc for parameterized tests

Kunit's generate_params() was recently updated to take an additional
test context parameter.  Xe's IP and platform parameter generators were
updated accordingly at the same time, but the new parameter was not
added to the functions' kerneldoc, resulting in the following warnings:

   Warning: drivers/gpu/drm/xe/tests/xe_pci.c:78 function parameter 'test' not described in 'xe_pci_fake_data_gen_params'
   Warning: drivers/gpu/drm/xe/tests/xe_pci.c:254 function parameter 'test' not described in 'xe_pci_graphics_ip_gen_param'
   Warning: drivers/gpu/drm/xe/tests/xe_pci.c:278 function parameter 'test' not described in 'xe_pci_media_ip_gen_param'
   Warning: drivers/gpu/drm/xe/tests/xe_pci.c:302 function parameter 'test' not described in 'xe_pci_id_gen_param'
   Warning: drivers/gpu/drm/xe/tests/xe_pci.c:390 function parameter 'test' not described in 'xe_pci_live_device_gen_param'
   5 warnings as errors

Document the new parameter to eliminate the warnings and make CI happy.

Fixes: b9a214b5f6aa ("kunit: Pass parameterized test context to generate_params()")
Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com>
Link: https://lore.kernel.org/r/20251013153014.2362879-2-matthew.d.roper@intel.com
Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 89e347f8a70165d1e8d88a93d875da7742c902ce)
Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com>

A mirror of Linus' kernel repository