selftests/bpf: Skip tc_tunnel subtest if its setup fails
A subtest setup can fail in a wide variety of ways, so make sure not to
run it if an issue occurs during its setup. The return value is
already representing whether the setup succeeds or fails, it is just
about wiring it.
====================
selftests/bpf: Integrate test_xsk.c to test_progs framework
The test_xsk.sh script covers many AF_XDP use cases. The tests it runs
are defined in xksxceiver.c. Since this script is used to test real
hardware, the goal here is to leave it as it is, and only integrate the
tests that run on veth peers into the test_progs framework.
PATCH 1 extracts test_xsk[.c/.h] from xskxceiver[.c/.h] to make the
tests available to test_progs.
PATCH 2 to 7 fix small issues in the current test
PATCH 8 to 13 handle all errors to release resources instead of calling
exit() when any error occurs.
PATCH 14 isolates the tests that won't fit in the CI
PATCH 15 integrates the CI tests to the test_progs framework
Signed-off-by: Bastien Curutchet (eBPF Foundation) <bastien.curutchet@bootlin.com>
---
Changes in v7:
- Restore 'test_ns' prefix to allow parallel execution.
- PATCH 11: fix potential uninitialized variable spotted by AI.
- PACTH 12: fix potential resource leak spotted by AI
- Link to v6: https://lore.kernel.org/r/20251029-xsk-v6-0-5a63a64dff98@bootlin.com
Changes in v6:
- Setup veth peer once for each mode instead of once for each substest
- Rename the 'flaky' table 'skip-ci' table and move the automatically
skipped and the longest tests into it
- Link to v5: https://lore.kernel.org/r/20251016-xsk-v5-0-662c95eb8005@bootlin.com
Changes in v5:
- Rebase on latest bpf-next_base
- Move XDP_ADJUST_TAIL_SHRINK_MULTI_BUFF to the flaky table
- Add Maciej's reviewed-by
- Link to v4: https://lore.kernel.org/r/20250924-xsk-v4-0-20e57537b876@bootlin.com
Changes in v4:
- Fix test_xsk.sh's summary report.
- Merge PATCH 11 & 12 together, otherwise PATCH 11 fails to build.
- Split old PATCH 3 in two patches. The first one fixes
testapp_stats_rx_dropped(), the second one fixes
testapp_xdp_shared_umem(). The unecessary frees (in
testapp_stats_rx_full() and testapp_stats_fill_empty() are removed)
- Link to v3: https://lore.kernel.org/r/20250904-xsk-v3-0-ce382e331485@bootlin.com
Changes in v3:
- Rebase on latest bpf-next_base to integrate commit c9110e6f7237 ("selftests/bpf:
Fix count write in testapp_xdp_metadata_copy()").
- Move XDP_METADATA_COPY_* tests from flaky-tests to nominal tests
- Link to v2: https://lore.kernel.org/r/20250902-xsk-v2-0-17c6345d5215@bootlin.com
Changes in v2:
- Rebase on the latest bpf-next_base and integrate the newly added tests
to the work (adjust_tail* and tx_queue_consumer tests)
- Re-order patches to split xkxceiver sooner.
- Fix the bug reported by Maciej.
- Fix verbose mode in test_xsk.sh by keeping kselftest (remove PATCH 1,
7 and 8)
- Link to v1: https://lore.kernel.org/r/20250313-xsk-v1-0-7374729a93b9@bootlin.com
====================
selftests/bpf: test_xsk: Integrate test_xsk.c to test_progs framework
test_xsk.c isn't part of the test_progs framework.
Integrate the tests defined by test_xsk.c into the test_progs framework
through a new file : prog_tests/xsk.c. ZeroCopy mode isn't tested in it
as veth peers don't support it.
Move test_xsk{.c/.h} to prog_tests/.
Add the find_bit library to test_progs sources in the Makefile as it is
is used by test_xsk.c
Following tests won't fit in the CI:
- XDP_ADJUST_TAIL_* and SEND_RECEIVE_9K_PACKETS because of their
flakyness
- UNALIGNED_* because they depend on huge page allocations
- *_RING_SIZE because they depend on HW rings
- TEARDOWN because it's too long
Remove these tests from the nominal tests table so they won't be
run by the CI in upcoming patch.
Create a skip_ci_tests table to hold them.
Use this skip_ci table in xskxceiver.c to keep all the tests available
from the test_xsk.sh script.
selftests/bpf: test_xsk: Don't exit immediately on allocation failures
If any allocation in the pkt_stream_*() helpers fail, exit_with_error() is
called. This terminates the program immediately. It prevents the following
tests from running and isn't compliant with the CI.
Return NULL in case of allocation failure.
Return TEST_FAILURE when something goes wrong in the packet generation.
Clean up the resources if a failure happens between two steps of a test.
Move exit_with_error()'s definition into xskxceiver.c as it isn't used
anywhere else now.
selftests/bpf: test_xsk: Don't exit immediately if validate_traffic fails
__testapp_validate_traffic() calls exit_with_error() on failures. This
exits the program immediately. It prevents the following tests from
running and isn't compliant with the CI.
Return TEST_FAILURE instead of calling exit_with_error().
Release the resource of the 1st thread if a failure happens between its
creation and the creation of the second thread.
selftests/bpf: test_xsk: Don't exit immediately when workers fail
TX and RX workers can fail in many places. These failures trigger a call
to exit_with_error() which exits the program immediately. It prevents the
following tests from running and isn't compliant with the CI.
Add return value to functions that can fail.
Handle failures more smoothly through report_failure().
selftests/bpf: test_xsk: Don't exit immediately when gettimeofday fails
exit_with_error() is called when gettimeofday() fails. This exits the
program immediately. It prevents the following tests from being run and
isn't compliant with the CI.
Return TEST_FAILURE instead of calling exit_on_error().
selftests/bpf: test_xsk: Don't exit immediately when xsk_attach fails
xsk_reattach_xdp calls exit_with_error() on failures. This exits the
program immediately. It prevents the following tests from being run and
isn't compliant with the CI.
Add a return value to the functions handling XDP attachments to handle
errors more smoothly.
selftests/bpf: test_xsk: Add return value to init_iface()
init_iface() doesn't have any return value while it can fail. In case of
failure it calls exit_on_error() which exits the application
immediately. This prevents the following tests from being run and isn't
compliant with the CI
Add a return value to init_iface() so errors can be handled more
smoothly.
selftests/bpf: test_xsk: Release resources when swap fails
testapp_validate_traffic() doesn't release the sockets and the umem
created by the threads if the test isn't currently in its last step.
Thus, if the swap_xsk_resources() fails before the last step, the
created resources aren't cleaned up.
Clean the sockets and the umem in case of swap_xsk_resources() failure.
selftests/bpf: test_xsk: Wrap test clean-up in functions
The clean-up done at the end of a test in __testapp_validate_traffic()
isn't wrapped in a function. It isn't convenient if we want to use it
somewhere else in the code.
Wrap the clean-up in two new functions : the first deletes the sockets,
the second releases the umem.
selftests/bpf: test_xsk: fix memory leak in testapp_xdp_shared_umem()
testapp_xdp_shared_umem() generates pkt_stream on each xsk from xsk_arr,
where normally xsk_arr[0] gets pkt_streams and xsk_arr[1] have them NULLed.
At the end of the test pkt_stream_restore_default() only releases
xsk_arr[0] which leads to memory leaks.
Release the missing pkt_stream at the end of testapp_xdp_shared_umem()
selftests/bpf: test_xsk: fix memory leak in testapp_stats_rx_dropped()
testapp_stats_rx_dropped() generates pkt_stream twice. The last
generated is released by pkt_stream_restore_default() at the end of the
test but we lose the pointer of the first pkt_stream.
Release the 'middle' pkt_stream when it's getting replaced to prevent
memory leaks.
selftests/bpf: test_xsk: Fix __testapp_validate_traffic()'s return value
__testapp_validate_traffic is supposed to return an integer value that
tells if the test passed (0), failed (-1) or was skiped (2). It actually
returns a boolean in the end. This doesn't harm when the test is
successful but can lead to misinterpretation in case of failure as 1
will be returned instead of -1.
Return TEST_FAILURE (-1) in case of failure, TEST_PASS (0) otherwise.
AF_XDP features are tested by the test_xsk.sh script but not by the
test_progs framework. The tests used by the script are defined in
xksxceiver.c which can't be integrated in the test_progs framework as is.
Extract these test definitions from xskxceiver{.c/.h} to put them in new
test_xsk{.c/.h} files.
Keep the main() function and its unshared dependencies in xksxceiver to
avoid impacting the test_xsk.sh script which is often used to test real
hardware.
Move ksft_test_result_*() calls to xskxceiver.c to keep the kselftest's
report valid
Puranjay Mohan [Thu, 23 Oct 2025 16:14:44 +0000 (16:14 +0000)]
bpf: Use kmalloc_nolock() in bpf streams
BPF stream kfuncs need to be non-sleeping as they can be called from
programs running in any context, this requires a way to allocate memory
from any context. Currently, this is done by a custom per-CPU NMI-safe
bump allocation mechanism, backed by alloc_pages_nolock() and
free_pages_nolock() primitives.
As kmalloc_nolock() and kfree_nolock() primitives are available now, the
custom allocator can be removed in favor of these.
A couple of changes for rqspinlock, the first disables propagation of AA
and ABBA deadlocks to waiters succeeding the deadlocking waiter. A more
verbose rationale is available in the commit log. The second commit
expands the stress test to introduce a ABBCCA mode that will reliably
exercise the timeout fallback.
====================
selftests/bpf: Add ABBCCA case for rqspinlock stress test
Introduce a new mode for the rqspinlock stress test that exercises a
deadlock that won't be detected by the AA and ABBA checks, such that we
always reliably trigger the timeout fallback. We need 4 CPUs for this
particular case, as CPU 0 is untouched, and three participant CPUs for
triggering the ABBCCA case.
Refactor the lock acquisition paths in the module to better reflect the
three modes and choose the right lock depending on the context.
Also drop ABBA case from running by default as part of test progs, since
the stress test can consume a significant amount of time.
rqspinlock: Disable queue destruction for deadlocks
Disable propagation and unwinding of the waiter queue in case the head
waiter detects a deadlock condition, but keep it enabled in case of the
timeout fallback.
Currently, when the head waiter experiences an AA deadlock, it will
signal all its successors in the queue to exit with an error. This is
not ideal for cases where the same lock is held in contexts which can
cause errors in an unrestricted fashion (e.g., BPF programs, or kernel
paths invoked through BPF programs), and core kernel logic which is
written in a correct fashion and does not expect deadlocks.
The same reasoning can be extended to ABBA situations. Depending on the
actual runtime schedule, one or both of the head waiters involved in an
ABBA situation can detect and exit directly without terminating their
waiter queue. If the ABBA situation manifests again, the waiters will
keep exiting until progress can be made, or a timeout is triggered in
case of more complicated locking dependencies.
We still preserve the queue destruction in case of timeouts, as either
the locking dependencies are too complex to be captured by AA and ABBA
heuristics, or the owner is perpetually stuck. As such, it would be
unwise to continue to apply the timeout for each new head waiter without
terminating the queue, since we may end up waiting for more than 250 ms
in aggregate with all participants in the locking transaction.
The patch itself is fairly simple; we can simply signal our successor to
become the next head waiter, and leave the queue without attempting to
acquire the lock.
With this change, the behavior for waiters in case of deadlocks
experienced by a predecessor changes. It is guaranteed that call sites
will no longer receive errors if the predecessors encounter deadlocks
and the successors do not participate in one. This should lower the
failure rate for waiters that are not doing improper locking opreations,
just because they were unlucky to queue behind a misbehaving waiter.
However, timeouts are still a possibility, hence they must be accounted
for, so users cannot rely upon errors not occuring at all.
Mykyta Yatsenko [Wed, 29 Oct 2025 19:59:07 +0000 (19:59 +0000)]
selftests/bpf: Fix intermittent failures in file_reader test
file_reader/on_open_expect_fault intermittently fails when test_progs
runs tests in parallel, because it expects a page fault on first read.
Another file_reader test running concurrently may have already pulled
the same pages into the page cache, eliminating the fault and causing a
spurious failure.
Make file_reader/on_open_expect_fault read from a file region that does
not overlap with other file_reader tests, so the initial access still
faults even under parallel execution.
====================
Hello,
this is the v3 of test_tc_tunnel conversion into test_progs framework.
This new revision:
- fixes a few issues spotted by the bot reviewer
- removes any test ensuring connection failure (and so depending on a
timout) to keep the execution time reasonable
test_tc_tunnel.sh tests a variety of tunnels based on BPF: packets are
encapsulated by a BPF program on the client egress. We then check that
those packets can be decapsulated on server ingress side, either thanks
to kernel-based or BPF-based decapsulation. Those tests are run thanks
to two veths in two dedicated namespaces.
- patches 1 and 2 are preparatory patches
- patch 3 introduce tc_tunnel test into test_progs
- patch 4 gets rid of the test_tc_tunnel.sh script
The new test has been executed both in some x86 local qemu machine, as
well as in CI:
selftests/bpf: Integrate test_tc_tunnel.sh tests into test_progs
The test_tc_tunnel.sh script checks that a large variety of tunneling
mechanisms handled by the kernel can be handled as well by eBPF
programs. While this test shares similarities with test_tunnel.c (which
is already integrated in test_progs), those are testing slightly
different things:
- test_tunnel.c creates a tunnel interface, and then get and set tunnel
keys in packet metadata, from BPF programs.
- test_tc_tunnels.sh manually parses/crafts packets content
Bring the tests covered by test_tc_tunnel.sh into the test_progs
framework, by creating a dedicated test_tc_tunnel.sh. This new test
defines a "generic" runner which, for each test configuration:
- will configure the relevant veth pair, each of those isolated in a
dedicated namespace
- will check that traffic will fail if there is only an encapsulating
program attached to one veth egress
- will check that traffic succeed if we enable some decapsulation module
on kernel side
- will check that traffic still succeeds if we replace the kernel
decapsulation with some eBPF ingress decapsulation.
selftests/bpf: Make test_tc_tunnel.bpf.c compatible with big endian platforms
When trying to run bpf-based encapsulation in a s390x environment, some
parts of test_tc_tunnel.bpf.o do not encapsulate correctly the traffic,
leading to tests failures. Adding some logs shows for example that
packets about to be sent on an interface with the ip6vxlan_eth program
attached do not have the expected value 5 in the ip header ihl field,
and so are ignored by the program.
This phenomenon appears when trying to cross-compile the selftests,
rather than compiling it from a virtualized host: the selftests build
system may then wrongly pick some host headers. If <asm/byteorder.h>
ends up being picked on the host (and if the host has a endianness
different from the target one), it will then expose wrong endianness
defines (e.g __LITTLE_ENDIAN_BITFIELD instead of __BIT_ENDIAN_BITFIELD),
and it will for example mess up the iphdr structure layout used in the
ebpf program.
To prevent this, directly use the vmlinux.h header generated by the
selftests build system rather than including directly specific kernel
headers. As a consequence, add some missing definitions that are not
exposed by vmlinux.h, and adapt the bitfield manipulations to allow
building and using the program on both types of platforms.
Jianyun Gao [Mon, 27 Oct 2025 03:20:08 +0000 (11:20 +0800)]
libbpf: Fix the incorrect reference to the memlock_rlim variable in the comment.
The variable "memlock_rlim_max" referenced in the comment does not exist.
I think that the author probably meant the variable "memlock_rlim". So,
correct it.
Jianyun Gao [Fri, 24 Oct 2025 08:08:02 +0000 (16:08 +0800)]
libbpf: Optimize the redundant code in the bpf_object__init_user_btf_maps() function.
In the elf_sec_data() function, the input parameter 'scn' will be
evaluated. If it is NULL, then it will directly return NULL. Therefore,
the return value of the elf_sec_data() function already takes into
account the case where the input parameter scn is NULL. Therefore,
subsequently, the code only needs to check whether the return value of
the elf_sec_data() function is NULL.
Arnaud Lecomte [Sat, 25 Oct 2025 19:29:41 +0000 (19:29 +0000)]
bpf: Fix stackmap overflow check in __bpf_get_stackid()
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.
Fixes: ee2a098851bf ("bpf: Adjust BPF stack helper functions to accommodate skip > 0") Reported-by: syzbot+c9b724fbb41cf2538b7b@syzkaller.appspotmail.com Signed-off-by: Arnaud Lecomte <contact@arnaud-lcm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20251025192941.1500-1-contact@arnaud-lcm.com Closes: https://syzkaller.appspot.com/bug?extid=c9b724fbb41cf2538b7b
Arnaud Lecomte [Sat, 25 Oct 2025 19:28:58 +0000 (19:28 +0000)]
bpf: Refactor stack map trace depth calculation into helper function
Extract the duplicated maximum allowed depth computation for stack
traces stored in BPF stacks from bpf_get_stackid() and __bpf_get_stack()
into a dedicated stack_map_calculate_max_depth() helper function.
This unifies the logic for:
- The max depth computation
- Enforcing the sysctl_perf_event_max_stack limit
Zhang Chujun [Tue, 28 Oct 2025 06:33:45 +0000 (14:33 +0800)]
bpftool: Fix missing closing parethesis for BTF_KIND_UNKN
In the btf_dumper_do_type function, the debug print statement for
BTF_KIND_UNKN was missing a closing parenthesis in the output format.
This patch adds the missing ')' to ensure proper formatting of the
dump output.
Xu Kuohai [Sat, 18 Oct 2025 03:57:38 +0000 (11:57 +0800)]
selftests/bpf/benchs: Add overwrite mode benchmark for BPF ring buffer
Add --rb-overwrite option to benchmark BPF ring buffer in overwrite mode.
Since overwrite mode is not yet supported by libbpf for consumer, also add
--rb-bench-producer option to benchmark producer directly without a consumer.
Benchmarks on an x86_64 and an arm64 CPU are shown below for reference.
Xu Kuohai [Sat, 18 Oct 2025 03:57:37 +0000 (11:57 +0800)]
selftests/bpf: Add overwrite mode test for BPF ring buffer
Add overwrite mode test for BPF ring buffer. The test creates a BPF ring
buffer in overwrite mode, then repeatedly reserves and commits records
to check if the ring buffer works as expected both before and after
overwriting occurs.
Xu Kuohai [Sat, 18 Oct 2025 03:57:36 +0000 (11:57 +0800)]
bpf: Add overwrite mode for BPF ring buffer
When the BPF ring buffer is full, a new event cannot be recorded until one
or more old events are consumed to make enough space for it. In cases such
as fault diagnostics, where recent events are more useful than older ones,
this mechanism may lead to critical events being lost.
So add overwrite mode for BPF ring buffer to address it. In this mode, the
new event overwrites the oldest event when the buffer is full.
The basic idea is as follows:
1. producer_pos tracks the next position to record new event. When there
is enough free space, producer_pos is simply advanced by producer to
make space for the new event.
2. To avoid waiting for consumer when the buffer is full, a new variable,
overwrite_pos, is introduced for producer. It points to the oldest event
committed in the buffer. It is advanced by producer to discard one or more
oldest events to make space for the new event when the buffer is full.
3. pending_pos tracks the oldest event to be committed. pending_pos is never
passed by producer_pos, so multiple producers never write to the same
position at the same time.
The following example diagrams show how it works in a 4096-byte ring buffer.
1. At first, {producer,overwrite,pending,consumer}_pos are all set to 0.
There is enough free space, so A is allocated at offset 0. And producer_pos
is advanced to 512, the end of A. Since A is not submitted, the BUSY bit is
set.
There are 2048 bytes not being written between producer_pos (currently 3584)
and pending_pos, so D is allocated at offset 3584, and producer_pos is advanced
by 1536 (from 3584 to 5120).
Since event D will overwrite all bytes of event A and the first 512 bytes of
event B, overwrite_pos is advanced to the start of event C, the oldest event
that is not overwritten.
Although there are 512 bytes not being written between producer_pos and
pending_pos, E cannot be reserved, as it would overwrite the first 512
bytes of event C, which is still being written.
The performance data for overwrite mode will be provided in a follow-up
patch that adds overwrite-mode benchmarks.
A sample of performance data for non-overwrite mode, collected on an x86_64
CPU and an arm64 CPU, before and after this patch, is shown below. As we can
see, no obvious performance regression occurs.
This series adds a new dynptr kind, file dynptr, which enables BPF
programs to perform safe reads from files in a structured way.
Initial motivations include:
* Parsing the executable’s ELF to locate thread-local variable symbols
* Capturing stack traces when frame pointers are disabled
By leveraging the existing dynptr abstraction, we reuse the verifier’s
lifetime/size checks and keep the API consistent with existing dynptr
read helpers.
Technical details:
1. Reuses the existing freader library to read files a folio at a time.
2. bpf_dynptr_slice() and bpf_dynptr_read() always copy data from folios
into a program-provided buffer; zero-copy access is intentionally not
supported to keep it simple.
3. Reads may sleep if the requested folios are not in the page cache.
4. Few verifier changes required:
* Support dynptr destruction in kfuncs
* Add kfunc address substitution based on whether the program runs in
a sleepable or non-sleepable context.
Testing:
The final patch adds a selftest that validates BPF program reads the
same data as userspace, page faults are enabled in sleepable context and
disabled in non-sleepable.
Changelog:
---
v4 -> v5
v4: https://lore.kernel.org/all/20251021200334.220542-1-mykyta.yatsenko5@gmail.com/
* Inlined and removed kfunc_call_imm(), run overflow check for call_imm
only if !bpf_jit_supports_far_kfunc_call().
v3 -> v4
v3: https://lore.kernel.org/bpf/20251020222538.932915-1-mykyta.yatsenko5@gmail.com/
* Remove ringbuf usage from selftests
* bpf_dynptr_set_null(ptr) when discarding file dynptr
* call kfunc_call_imm() in specialize_kfunc() only, removed
call from add_kfunc_call()
v2 -> v3
v2: https://lore.kernel.org/bpf/20251015161155.120148-1-mykyta.yatsenko5@gmail.com/
* Add negative tests
* Rewrote tests to use LSM for bpf_get_task_exe_file()
* Move call_imm overflow check into kfunc_call_imm()
v1 -> v2
v1: https://lore.kernel.org/bpf/20251003160416.585080-1-mykyta.yatsenko5@gmail.com/
* Remove ELF parsing selftest
* Expanded u32 -> u64 refactoring, changes in include/uapi/linux/bpf.h
* Removed freader.{c,h}, instead move freader definitions into
buildid.h.
* Small refactoring of the multiple folios reading algorithm
* Directly return error after unmark_stack_slots_dynptr().
* Make kfuncs receive trusted arguments.
* Remove enum bpf_is_sleepable, use bool instead
* Remove unnecessary sorting from specialize_kfunc()
* Remove bool kfunc_in_sleepable_ctx; field from the struct
bpf_insn_aux_data, rely on non_sleepable field introduced by Kumar
* Refactor selftests, do madvise(...MADV_PAGEOUT) for all pages read by
the test
* Introduce the test for non-sleepable case, verify it fails with -EFAULT
====================
Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:53 +0000 (20:38 +0000)]
selftests/bpf: add file dynptr tests
Introducing selftests for validating file-backed dynptr works as
expected.
* validate implementation supports dynptr slice and read operations
* validate destructors should be paired with initializers
* validate sleepable progs can page in.
Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:52 +0000 (20:38 +0000)]
bpf: dispatch to sleepable file dynptr
File dynptr reads may sleep when the requested folios are not in
the page cache. To avoid sleeping in non-sleepable contexts while still
supporting valid sleepable use, given that dynptrs are non-sleepable by
default, enable sleeping only when bpf_dynptr_from_file() is invoked
from a sleepable context.
This change:
* Introduces a sleepable constructor: bpf_dynptr_from_file_sleepable()
* Override non-sleepable constructor with sleepable if it's always
called in sleepable context
Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:51 +0000 (20:38 +0000)]
bpf: verifier: refactor kfunc specialization
Move kfunc specialization (function address substitution) to later stage
of verification to support a new use case, where we need to take into
consideration whether kfunc is called in sleepable context.
Minor refactoring in add_kfunc_call(), making sure that if function
fails, kfunc desc is not added to tab->descs (previously it could be
added or not, depending on what failed).
Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:49 +0000 (20:38 +0000)]
bpf: add plumbing for file-backed dynptr
Add the necessary verifier plumbing for the new file-backed dynptr type.
Introduce two kfuncs for its lifecycle management:
* bpf_dynptr_from_file() for initialization
* bpf_dynptr_file_discard() for destruction
Currently there is no mechanism for kfunc to release dynptr, this patch
add one:
* Dynptr release function sets meta->release_regno
* Call unmark_stack_slots_dynptr() if meta->release_regno is set and
dynptr ref_obj_id is set as well.
Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:48 +0000 (20:38 +0000)]
bpf: verifier: centralize const dynptr check in unmark_stack_slots_dynptr()
Move the const dynptr check into unmark_stack_slots_dynptr() so callers
don’t have to duplicate it. This puts the validation next to the code
that manipulates dynptr stack slots and allows upcoming changes to reuse
it directly.
Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:47 +0000 (20:38 +0000)]
lib/freader: support reading more than 2 folios
freader_fetch currently reads from at most two folios. When a read spans
into a third folio, the overflow bytes are copied adjacent to the second
folio’s data instead of being handled as a separate folio.
This patch modifies fetch algorithm to support reading from many folios.
Mykyta Yatsenko [Sun, 26 Oct 2025 20:38:45 +0000 (20:38 +0000)]
bpf: widen dynptr size/offset to 64 bit
Dynptr currently caps size and offset at 24 bits, which isn’t sufficient
for file-backed use cases; even 32 bits can be limiting. Refactor dynptr
helpers/kfuncs to use 64-bit size and offset, ensuring consistency
across the APIs.
This change does not affect internals of xdp, skb or other dynptrs,
which continue to behave as before. Also it does not break binary
compatibility.
The widening enables large-file access support via dynptr, implemented
in the next patches.
Anton Protopopov [Sun, 19 Oct 2025 20:21:41 +0000 (20:21 +0000)]
libbpf: fix formatting of bpf_object__append_subprog_code
The commit 6c918709bd30 ("libbpf: Refactor bpf_object__reloc_code")
added the bpf_object__append_subprog_code() with incorrect indentations.
Use tabs instead. (This also makes a consequent commit better readable.)
Anton Protopopov [Sun, 19 Oct 2025 20:21:37 +0000 (20:21 +0000)]
bpf: make bpf_insn_successors to return a pointer
The bpf_insn_successors() function is used to return successors
to a BPF instruction. So far, an instruction could have 0, 1 or 2
successors. Prepare the verifier code to introduction of instructions
with more than 2 successors (namely, indirect jumps).
To do this, introduce a new struct, struct bpf_iarray, containing
an array of bpf instruction indexes and make bpf_insn_successors
to return a pointer of that type. The storage for all instructions
is allocated in the env->succ, which holds an array of size 2,
to be used for all instructions.
Anton Protopopov [Sun, 19 Oct 2025 20:21:31 +0000 (20:21 +0000)]
bpf: generalize and export map_get_next_key for arrays
The kernel/bpf/array.c file defines the array_map_get_next_key()
function which finds the next key for array maps. It actually doesn't
use any map fields besides the generic max_entries field. Generalize
it, and export as bpf_array_get_next_key() such that it can be
re-used by other array-like maps.
Anton Protopopov [Sun, 19 Oct 2025 20:21:30 +0000 (20:21 +0000)]
bpf: save the start of functions in bpf_prog_aux
Introduce a new subprog_start field in bpf_prog_aux. This field may
be used by JIT compilers wanting to know the real absolute xlated
offset of the function being jitted. The func_info[func_id] may have
served this purpose, but func_info may be NULL, so JIT compilers
can't rely on it.
Anton Protopopov [Sun, 19 Oct 2025 20:21:29 +0000 (20:21 +0000)]
bpf: fix the return value of push_stack
In [1] Eduard mentioned that on push_stack failure verifier code
should return -ENOMEM instead of -EFAULT. After checking with the
other call sites I've found that code randomly returns either -ENOMEM
or -EFAULT. This patch unifies the return values for the push_stack
(and similar push_async_cb) functions such that error codes are
always assigned properly.
Shardul Bankar [Tue, 21 Oct 2025 08:08:46 +0000 (13:38 +0530)]
bpf: Clarify get_outer_instance() handling in propagate_to_outer_instance()
propagate_to_outer_instance() calls get_outer_instance() and uses the
returned pointer to reset and commit stack write marks. Under normal
conditions, update_instance() guarantees that an outer instance exists,
so get_outer_instance() cannot return an ERR_PTR.
However, explicitly checking for IS_ERR(outer_instance) makes this code
more robust and self-documenting. It reduces cognitive load when reading
the control flow and silences potential false-positive reports from
static analysis or automated tooling.
Daniel Borkmann [Mon, 20 Oct 2025 07:54:41 +0000 (09:54 +0200)]
bpf: Do not let BPF test infra emit invalid GSO types to stack
Yinhao et al. reported that their fuzzer tool was able to trigger a
skb_warn_bad_offload() from netif_skb_features() -> gso_features_check().
When a BPF program - triggered via BPF test infra - pushes the packet
to the loopback device via bpf_clone_redirect() then mentioned offload
warning can be seen. GSO-related features are then rightfully disabled.
We get into this situation due to convert___skb_to_skb() setting
gso_segs and gso_size but not gso_type. Technically, it makes sense
that this warning triggers since the GSO properties are malformed due
to the gso_type. Potentially, the gso_type could be marked non-trustworthy
through setting it at least to SKB_GSO_DODGY without any other specific
assumptions, but that also feels wrong given we should not go further
into the GSO engine in the first place.
The checks were added in 121d57af308d ("gso: validate gso_type in GSO
handlers") because there were malicious (syzbot) senders that combine
a protocol with a non-matching gso_type. If we would want to drop such
packets, gso_features_check() currently only returns feature flags via
netif_skb_features(), so one location for potentially dropping such skbs
could be validate_xmit_unreadable_skb(), but then otoh it would be
an additional check in the fast-path for a very corner case. Given
bpf_clone_redirect() is the only place where BPF test infra could emit
such packets, lets reject them right there.
Fixes: 850a88cc4096 ("bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN") Fixes: cf62089b0edd ("bpf: Add gso_size to __sk_buff") Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Reported-by: Dongliang Mu <dzm91@hust.edu.cn> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20251020075441.127980-1-daniel@iogearbox.net
Puranjay Mohan [Fri, 17 Oct 2025 14:17:25 +0000 (14:17 +0000)]
selftests/bpf: Fix list_del() in arena list
The __list_del fuction doesn't set the previous node's next pointer to
the next node of the node to be deleted. It just updates the local variable
and not the actual pointer in the previous node.
The test was passing up till now because the bpf code is doing bpf_free()
after list_del and therfore reading head->first from the userspace will
read all zeroes. But after arena_list_del() is finished, head->first should
point to NULL;
Yonghong Song [Tue, 14 Oct 2025 05:16:39 +0000 (22:16 -0700)]
selftests/bpf: Fix selftest verif_scale_strobemeta failure with llvm22
With latest llvm22, I hit the verif_scale_strobemeta selftest failure
below:
$ ./test_progs -n 618
libbpf: prog 'on_event': BPF program load failed: -E2BIG
libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG --
BPF program is too large. Processed 1000001 insn
verification time 7019091 usec
stack depth 488
processed 1000001 insns (limit 1000000) max_states_per_insn 28 total_states 33927 peak_states 12813 mark_read 0
-- END PROG LOAD LOG --
libbpf: prog 'on_event': failed to load: -E2BIG
libbpf: failed to load object 'strobemeta.bpf.o'
scale_test:FAIL:expect_success unexpected error: -7 (errno 7)
#618 verif_scale_strobemeta:FAIL
But if I increase the verificaiton insn limit from 1M to 10M, the above
test_progs run actually will succeed. The below is the result from veristat:
$ ./veristat strobemeta.bpf.o
Processing 'strobemeta.bpf.o'...
File Program Verdict Duration (us) Insns States Program size Jited size
---------------- -------- ------- ------------- ------- ------ ------------ ----------
strobemeta.bpf.o on_event success 902508939777685 358230 15954 80794
---------------- -------- ------- ------------- ------- ------ ------------ ----------
Done. Processed 1 files, 0 programs. Skipped 1 files, 0 programs.
Further debugging shows the llvm commit [1] is responsible for the verificaiton
failure as it tries to convert certain switch statement to if-condition. Such
change may cause different transformation compared to original switch statement.
Note that the above 'switch' statement is added by clang frontend.
Without [1], the switch statement will survive until SelectionDag,
so the switch statement acts like a 'barrier' and prevents some
transformation involved with both 'before' and 'after' the switch statement.
But with [1], the switch statement will be removed during middle end
optimization and later middle end passes (esp. after inlining) have more
freedom to reorder the code.
The static func calc_location() is called inside read_int_var(). The asm code
without [1]:
77: .123....89 (85) call bpf_probe_read_user#112
78: ........89 (79) r1 = *(u64 *)(r10 -368)
79: .1......89 (79) r2 = *(u64 *)(r10 -8)
80: .12.....89 (bf) r3 = r2
81: .123....89 (0f) r3 += r1
82: ..23....89 (07) r2 += 1
83: ..23....89 (79) r4 = *(u64 *)(r10 -464)
84: ..234...89 (a5) if r2 < 0x2 goto pc+13
85: ...34...89 (15) if r3 == 0x0 goto pc+12
86: ...3....89 (bf) r1 = r10
87: .1.3....89 (07) r1 += -400
88: .1.3....89 (b4) w2 = 16
In this case, 'r2 < 0x2' and 'r3 == 0x0' go to null 'locaiton' place,
so the verifier actually prefers to do verification first at 'r1 = r10' etc.
The asm code with [1]:
119: .123....89 (85) call bpf_probe_read_user#112
120: ........89 (79) r1 = *(u64 *)(r10 -368)
121: .1......89 (79) r2 = *(u64 *)(r10 -8)
122: .12.....89 (bf) r3 = r2
123: .123....89 (0f) r3 += r1
124: ..23....89 (07) r2 += -1
125: ..23....89 (a5) if r2 < 0xfffffffe goto pc+6
126: ........89 (05) goto pc+17
...
144: ........89 (b4) w1 = 0
145: .1......89 (6b) *(u16 *)(r8 +80) = r1
In this case, if 'r2 < 0xfffffffe' is true, the control will go to
non-null 'location' branch, so 'goto pc+17' will actually go to
null 'location' branch. This seems causing tremendous amount of
verificaiton state.
To fix the issue, rewrite the following code
return tls_ptr && tls_ptr != (void *)-1
? tls_ptr + tls_index.offset
: NULL;
to if/then statement and hopefully these explicit if/then statements
are sticky during middle-end optimizations.
Test with llvm20 and llvm21 as well and all strobemeta related selftests
are passed.
Yafang Shao [Thu, 16 Oct 2025 06:39:29 +0000 (14:39 +0800)]
bpf: mark vma->{vm_mm,vm_file} as __safe_trusted_or_null
The vma->vm_mm might be NULL and it can be accessed outside of RCU. Thus,
we can mark it as trusted_or_null. With this change, BPF helpers can safely
access vma->vm_mm to retrieve the associated mm_struct from the VMA.
Then we can make policy decision from the VMA.
The "trusted" annotation enables direct access to vma->vm_mm within kfuncs
marked with KF_TRUSTED_ARGS or KF_RCU, such as bpf_task_get_cgroup1() and
bpf_task_under_cgroup(). Conversely, "null" enforcement requires all
callsites using vma->vm_mm to perform NULL checks.
The lsm selftest must be modified because it directly accesses vma->vm_mm
without a NULL pointer check; otherwise it will break due to this
change.
For the VMA based THP policy, the use case is as follows,
@mm = @vma->vm_mm; // vm_area_struct::vm_mm is trusted or null
if (!@mm)
return;
bpf_rcu_read_lock(); // rcu lock must be held to dereference the owner
@owner = @mm->owner; // mm_struct::owner is rcu trusted or null
if (!@owner)
goto out;
@cgroup1 = bpf_task_get_cgroup1(@owner, MEMCG_HIERARCHY_ID);
/* make the decision based on the @cgroup1 attribute */
bpf_cgroup_release(@cgroup1); // release the associated cgroup
out:
bpf_rcu_read_unlock();
PSI memory information can be obtained from the associated cgroup to inform
policy decisions. Since upstream PSI support is currently limited to cgroup
v2, the following example demonstrates cgroup v2 implementation:
@owner = @mm->owner;
if (@owner) {
// @ancestor_cgid is user-configured
@ancestor = bpf_cgroup_from_id(@ancestor_cgid);
if (bpf_task_under_cgroup(@owner, @ancestor)) {
@psi_group = @ancestor->psi;
/* Extract PSI metrics from @psi_group and
* implement policy logic based on the values
*/
}
}
The vma::vm_file can also be marked with __safe_trusted_or_null.
No additional selftests are required since vma->vm_file and vma->vm_mm are
already validated in the existing selftest suite.
Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Acked-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Link: https://lore.kernel.org/r/20251016063929.13830-3-laoar.shao@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Yafang Shao [Thu, 16 Oct 2025 06:39:28 +0000 (14:39 +0800)]
bpf: mark mm->owner as __safe_rcu_or_null
When CONFIG_MEMCG is enabled, we can access mm->owner under RCU. The
owner can be NULL. With this change, BPF helpers can safely access
mm->owner to retrieve the associated task from the mm. We can then make
policy decision based on the task attribute.
The typical use case is as follows,
bpf_rcu_read_lock(); // rcu lock must be held for rcu trusted field
@owner = @mm->owner; // mm_struct::owner is rcu trusted or null
if (!@owner)
goto out;
There are some set but not used build errors when compiling bpf selftests
with the latest upstream mainline GCC, at the beginning add the attribute
__maybe_unused for the variables, but it is better to just add the option
-Wno-unused-but-set-variable to CFLAGS in Makefile to disable the errors
instead of hacking the tests.
tools/testing/selftests/bpf/map_tests/lpm_trie_map_basic_ops.c:229:36:
error: variable ‘n_matches_after_delete’ set but not used [-Werror=unused-but-set-variable=]
tools/testing/selftests/bpf/map_tests/lpm_trie_map_basic_ops.c:229:25:
error: variable ‘n_matches’ set but not used [-Werror=unused-but-set-variable=]
tools/testing/selftests/bpf/prog_tests/bpf_cookie.c:426:22:
error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]
tools/testing/selftests/bpf/prog_tests/find_vma.c:52:22:
error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]
tools/testing/selftests/bpf/prog_tests/perf_branches.c:67:22:
error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]
tools/testing/selftests/bpf/prog_tests/perf_link.c:15:22:
error: variable ‘j’ set but not used [-Werror=unused-but-set-variable=]
Linus Torvalds [Sat, 18 Oct 2025 20:05:13 +0000 (10:05 -1000)]
Merge tag 'rust-rustfmt' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux
Pull rustfmt fixes from Miguel Ojeda:
"Rust 'rustfmt' cleanup
'rustfmt', by default, formats imports in a way that is prone to
conflicts while merging and rebasing, since in some cases it condenses
several items into the same line.
Document in our guidelines that we will handle this for the moment
with the trailing empty comment workaround and make the tree
'rustfmt'-clean again"
* tag 'rust-rustfmt' of git://git.kernel.org/pub/scm/linux/kernel/git/ojeda/linux:
rust: bitmap: fix formatting
rust: cpufreq: fix formatting
rust: alloc: employ a trailing comment to keep vertical layout
docs: rust: add section on imports formatting
Linus Torvalds [Sat, 18 Oct 2025 18:38:28 +0000 (08:38 -1000)]
Merge tag 'tpmdd-next-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull tpm fix from Jarkko Sakkinen:
"Correct the state transitions for ARM FF-A to match the spec and how
tpm_crb behaves on other platforms"
* tag 'tpmdd-next-v6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
tpm_crb: Add idle support for the Arm FF-A start method
Linus Torvalds [Sat, 18 Oct 2025 18:35:09 +0000 (08:35 -1000)]
Merge tag 'pci-v6.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci fixes from Bjorn Helgaas:
- Search for MSI Capability with correct ID to fix an MSI regression on
platforms with Cadence IP (Hans Zhang)
- Revert early bridge resource set up to fix resource assignment
failures that broke at least alpha boot and Snapdragon ath12k WiFi
(Ilpo Järvinen)
- Implement VMD .irq_startup()/.irq_shutdown() to fix IRQ issues that
caused boot crashes and broken devices below VMD (Inochi Amaoto)
- Select CONFIG_SCREEN_INFO on X86 to fix black screen on boot when
SCREEN_INFO not selected (Mario Limonciello)
* tag 'pci-v6.18-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
PCI/VGA: Select SCREEN_INFO on X86
PCI: vmd: Override irq_startup()/irq_shutdown() in vmd_init_dev_msi_info()
PCI: Revert early bridge resource set up
PCI: cadence: Search for MSI Capability with correct ID
Linus Torvalds [Sat, 18 Oct 2025 18:22:07 +0000 (08:22 -1000)]
Merge tag 'cxl-fixes-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl
Pull Compute Express Link fixes from Dave Jiang:
"A small collection of CXL fixes. In addition to some misc fixes for
the CXL subsystem, a number of fixes for CXL extended linear cache
support are included to make it functional again.
- Avoid missing port component registers setup due to dport
enumeration failure
- Add check for no entries in cxl_feature_info to address accessing
invalid pointer.
- Use %pa printk format to emit resource_size_t in
validate_region_offset()
CXL extended linear cache support fixes:
- Fix setup of memory resource in cxl_acpi_set_cache_size()
- Set range param for region_res_match_cxl_range() as const
(addresses a compile warning for match_region_by_range() fix)
- Fix match_region_by_range() to use region_res_match_cxl_range()
- Subtract to find an hpa_alias0 in cxl_poison events to correct the
alias math calculation"
* tag 'cxl-fixes-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
cxl/trace: Subtract to find an hpa_alias0 in cxl_poison events
cxl/region: Use %pa printk format to emit resource_size_t
cxl: Fix match_region_by_range() to use region_res_match_cxl_range()
cxl: Set range param for region_res_match_cxl_range() as const
cxl/acpi: Fix setup of memory resource in cxl_acpi_set_cache_size()
cxl/features: Add check for no entries in cxl_feature_info
cxl/port: Avoid missing port component registers setup
Linus Torvalds [Sat, 18 Oct 2025 18:00:43 +0000 (08:00 -1000)]
Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf
Pull bpf fixes from Alexei Starovoitov:
- Replace bpf_map_kmalloc_node() with kmalloc_nolock() to fix kmemleak
imbalance in tracking of bpf_async_cb structures (Alexei Starovoitov)
- Make selftests/bpf arg_parsing.c more robust to errors (Andrii
Nakryiko)
- Fix redefinition of 'off' as different kind of symbol when I40E
driver is builtin (Brahmajit Das)
- Do not disable preemption in bpf_test_run (Sahil Chandna)
- Fix memory leak in __lookup_instance error path (Shardul Bankar)
- Ensure test data is flushed to disk before reading it (Xing Guo)
* tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
selftests/bpf: Fix redefinition of 'off' as different kind of symbol
bpf: Do not disable preemption in bpf_test_run().
bpf: Fix memory leak in __lookup_instance error path
selftests: arg_parsing: Ensure data is flushed to disk before reading.
bpf: Replace bpf_map_kmalloc_node() with kmalloc_nolock() to allocate bpf_async_cb structures.
selftests/bpf: make arg_parsing.c more robust to crashes
bpf: test_run: Fix ctx leak in bpf_prog_test_run_xdp error path
Linus Torvalds [Sat, 18 Oct 2025 17:23:59 +0000 (07:23 -1000)]
Merge tag 'exfat-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat
Pull exfat fixes from Namjae Jeon:
- Fix out-of-bounds in FS_IOC_SETFSLABEL
- Add validation for stream entry size to prevent infinite loop
* tag 'exfat-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/exfat:
exfat: fix out-of-bounds in exfat_nls_to_ucs2()
exfat: fix improper check of dentry.stream.valid_size
Linus Torvalds [Sat, 18 Oct 2025 17:18:48 +0000 (07:18 -1000)]
Merge tag 'nfs-for-6.18-2' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client fixes from Anna Schumaker:
- Fix for FlexFiles mirror->dss allocation
- Apply delay_retrans to async operations
- Check if suid/sgid is cleared after a write when needed
- Fix setting the state renewal timer for early mounts after a reboot
* tag 'nfs-for-6.18-2' of git://git.linux-nfs.org/projects/anna/linux-nfs:
NFS4: Fix state renewals missing after boot
NFS: check if suid/sgid was cleared after a write as needed
NFS4: Apply delay_retrans to async operations
NFSv4/flexfiles: fix to allocate mirror->dss before use
Linus Torvalds [Sat, 18 Oct 2025 17:11:32 +0000 (07:11 -1000)]
Merge tag '6.18-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull smb client fixes from Steve French:
"smb client fixes, security and smbdirect improvements, and some minor cleanup:
- Important OOB DFS fix
- Fix various potential tcon refcount leaks
- smbdirect (RDMA) fixes (following up from test event a few weeks
ago):
- Fixes to improve and simplify handling of memory lifetime of
smbdirect_mr_io structures, when a connection gets disconnected
- Make sure we really wait to reach SMBDIRECT_SOCKET_DISCONNECTED
before destroying resources
- Make sure the send/recv submission/completion queues are large
enough to avoid ib_post_send() from failing under pressure
- convert cifs.ko to use the recommended crypto libraries (instead of
crypto_shash), this also can improve performance
- Three small cleanup patches"
* tag '6.18-rc1-smb-client-fixes' of git://git.samba.org/sfrench/cifs-2.6: (24 commits)
smb: client: Consolidate cmac(aes) shash allocation
smb: client: Remove obsolete crypto_shash allocations
smb: client: Use HMAC-MD5 library for NTLMv2
smb: client: Use MD5 library for SMB1 signature calculation
smb: client: Use MD5 library for M-F symlink hashing
smb: client: Use HMAC-SHA256 library for SMB2 signature calculation
smb: client: Use HMAC-SHA256 library for key generation
smb: client: Use SHA-512 library for SMB3.1.1 preauth hash
cifs: parse_dfs_referrals: prevent oob on malformed input
smb: client: Fix refcount leak for cifs_sb_tlink
smb: client: let smbd_destroy() wait for SMBDIRECT_SOCKET_DISCONNECTED
smb: move some duplicate definitions to common/cifsglob.h
smb: client: let destroy_mr_list() keep smbdirect_mr_io memory if registered
smb: client: let destroy_mr_list() call ib_dereg_mr() before ib_dma_unmap_sg()
smb: client: call ib_dma_unmap_sg if mr->sgt.nents is not 0
smb: client: improve logic in smbd_deregister_mr()
smb: client: improve logic in smbd_register_mr()
smb: client: improve logic in allocate_mr_list()
smb: client: let destroy_mr_list() remove locked from the list
smb: client: let destroy_mr_list() call list_del(&mr->list)
...
Linus Torvalds [Sat, 18 Oct 2025 17:07:14 +0000 (07:07 -1000)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix the handling of ZCR_EL2 in NV VMs
- Pick the correct translation regime when doing a PTW on the back of
a SEA
- Prevent userspace from injecting an event into a vcpu that isn't
initialised yet
- Move timer save/restore to the sysreg handling code, fixing EL2
timer access in the process
- Add FGT-based trapping of MDSCR_EL1 to reduce the overhead of debug
- Fix trapping configuration when the host isn't GICv3
- Improve the detection of HCR_EL2.E2H being RES1
- Drop a spurious 'break' statement in the S1 PTW
- Don't try to access SPE when owned by EL3
Documentation updates:
- Document the failure modes of event injection
- Document that a GICv3 guest can be created on a GICv5 host with
FEAT_GCIE_LEGACY
Selftest improvements:
- Add a selftest for the effective value of HCR_EL2.AMO
- Address build warning in the timer selftest when building with
clang
- Teach irqfd selftests about non-x86 architectures
- Add missing sysregs to the set_id_regs selftest
- Fix vcpu allocation in the vgic_lpi_stress selftest
- Correctly enable interrupts in the vgic_lpi_stress selftest
x86:
- Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test
for the bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return
-EAGAIN if userspace deletes/moves memslot during prefault")
- Don't try to get PMU capabilities from perf when running a CPU with
hybrid CPUs/PMUs, as perf will rightly WARN.
guest_memfd:
- Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a
more generic KVM_CAP_GUEST_MEMFD_FLAGS
- Add a guest_memfd INIT_SHARED flag and require userspace to
explicitly set said flag to initialize memory as SHARED,
irrespective of MMAP.
The behavior merged in 6.18 is that enabling mmap() implicitly
initializes memory as SHARED, which would result in an ABI
collision for x86 CoCo VMs as their memory is currently always
initialized PRIVATE.
- Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with
private memory, to enable testing such setups, i.e. to hopefully
flush out any other lurking ABI issues before 6.18 is officially
released.
- Add testcases to the guest_memfd selftest to cover guest_memfd
without MMAP, and host userspace accesses to mmap()'d private
memory"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (46 commits)
arm64: Revamp HCR_EL2.E2H RES1 detection
KVM: arm64: nv: Use FGT write trap of MDSCR_EL1 when available
KVM: arm64: Compute per-vCPU FGTs at vcpu_load()
KVM: arm64: selftests: Fix misleading comment about virtual timer encoding
KVM: arm64: selftests: Add an E2H=0-specific configuration to get_reg_list
KVM: arm64: selftests: Make dependencies on VHE-specific registers explicit
KVM: arm64: Kill leftovers of ad-hoc timer userspace access
KVM: arm64: Fix WFxT handling of nested virt
KVM: arm64: Move CNT*CT_EL0 userspace accessors to generic infrastructure
KVM: arm64: Move CNT*_CVAL_EL0 userspace accessors to generic infrastructure
KVM: arm64: Move CNT*_CTL_EL0 userspace accessors to generic infrastructure
KVM: arm64: Add timer UAPI workaround to sysreg infrastructure
KVM: arm64: Make timer_set_offset() generally accessible
KVM: arm64: Replace timer context vcpu pointer with timer_id
KVM: arm64: Introduce timer_context_to_vcpu() helper
KVM: arm64: Hide CNTHV_*_EL2 from userspace for nVHE guests
Documentation: KVM: Update GICv3 docs for GICv5 hosts
KVM: arm64: gic-v3: Only set ICH_HCR traps for v2-on-v3 or v3 guests
KVM: arm64: selftests: Actually enable IRQs in vgic_lpi_stress
KVM: arm64: selftests: Allocate vcpus with correct size
...
Linus Torvalds [Sat, 18 Oct 2025 17:02:28 +0000 (07:02 -1000)]
Merge tag 'powerpc-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Madhavan Srinivasan:
- Fix to handle NULL pointer dereference at irq domain teardown
- Fix for handling extraction of struct xive_irq_data
- Fix to skip parameter area allocation when fadump disabled
Thanks to Ganesh Goudar, Hari Bathini, Nam Cao, Ritesh Harjani (IBM),
Sourabh Jain, and Venkat Rao Bagalkote,
* tag 'powerpc-6.18-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/fadump: skip parameter area allocation when fadump is disabled
powerpc, ocxl: Fix extraction of struct xive_irq_data
powerpc/pseries/msi: Fix NULL pointer dereference at irq domain teardown
Linus Torvalds [Sat, 18 Oct 2025 16:59:25 +0000 (06:59 -1000)]
Merge tag 'slab-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab
Pull slab fixes from Vlastimil Babka:
- Fixes for two bugs that can be triggered when debugging options are
enabled (Hao Ge, Vlastimil Babka)
* tag 'slab-for-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab:
slab: reset slab->obj_ext when freeing and it is OBJEXTS_ALLOC_FAIL
slab: fix clearing freelist in free_deferred_objects()
Stuart Yoder [Sat, 18 Oct 2025 11:25:18 +0000 (14:25 +0300)]
tpm_crb: Add idle support for the Arm FF-A start method
According to the CRB over FF-A specification [1], a TPM that implements
the ABI must comply with the TCG PTP specification. This requires support
for the Idle and Ready states.
This patch implements CRB control area requests for goIdle and
cmdReady on FF-A based TPMs.
The FF-A message used to notify the TPM of CRB updates includes a
locality parameter, which provides a hint to the TPM about which
locality modified the CRB. This patch adds a locality parameter
to __crb_go_idle() and __crb_cmd_ready() to support this.
Paolo Bonzini [Sat, 18 Oct 2025 08:25:43 +0000 (10:25 +0200)]
Merge tag 'kvm-x86-fixes-6.18-rc2' of https://github.com/kvm-x86/linux into HEAD
KVM x86 fixes for 6.18:
- Expand the KVM_PRE_FAULT_MEMORY selftest to add a regression test for the
bug fixed by commit 3ccbf6f47098 ("KVM: x86/mmu: Return -EAGAIN if userspace
deletes/moves memslot during prefault")
- Don't try to get PMU capabbilities from perf when running a CPU with hybrid
CPUs/PMUs, as perf will rightly WARN.
- Rework KVM_CAP_GUEST_MEMFD_MMAP (newly introduced in 6.18) into a more
generic KVM_CAP_GUEST_MEMFD_FLAGS
- Add a guest_memfd INIT_SHARED flag and require userspace to explicitly set
said flag to initialize memory as SHARED, irrespective of MMAP. The
behavior merged in 6.18 is that enabling mmap() implicitly initializes
memory as SHARED, which would result in an ABI collision for x86 CoCo VMs
as their memory is currently always initialized PRIVATE.
- Allow mmap() on guest_memfd for x86 CoCo VMs, i.e. on VMs with private
memory, to enable testing such setups, i.e. to hopefully flush out any
other lurking ABI issues before 6.18 is officially released.
- Add testcases to the guest_memfd selftest to cover guest_memfd without MMAP,
and host userspace accesses to mmap()'d private memory.
Linus Torvalds [Fri, 17 Oct 2025 23:04:21 +0000 (13:04 -1000)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Explicitly encode the XZR register if the value passed to
write_sysreg_s() is 0.
The GIC CDEOI instruction is encoded as a system register write with
XZR as the source register. However, clang does not honour the "Z"
register constraint, leading to incorrect code generation
- Ensure the interrupts (DAIF.IF) are unmasked when completing
single-step of a suspended breakpoint before calling
exit_to_user_mode().
With pseudo-NMIs, interrupts are (additionally) masked at the PMR_EL1
register, handled by local_irq_*()
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: debug: always unmask interrupts in el0_softstp()
arm64/sysreg: Fix GIC CDEOI instruction encoding
- Avoid duplicate device probes by avoiding DT hardware probing when
ACPI is enabled in early boot
- Use the correct set of dependencies for
CONFIG_ARCH_HAS_ELF_CORE_EFLAGS, avoiding an allnoconfig warning
- Fix a few other minor issues
* tag 'riscv-for-linux-6.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: kprobes: convert one final __ASSEMBLY__ to __ASSEMBLER__
riscv: Respect dependencies of ARCH_HAS_ELF_CORE_EFLAGS
riscv: acpi: avoid errors caused by probing DT devices when ACPI is used
riscv: kprobes: Fix probe address validation
riscv: entry: fix typo in comment 'instruciton' -> 'instruction'
RISC-V: clear hot-unplugged cores from all task mm_cpumasks to avoid rfence errors
riscv: kgdb: Ensure that BUFMAX > NUMREGBYTES
rust: cfi: only 64-bit arm and x86 support CFI_CLANG
Brahmajit Das [Fri, 17 Oct 2025 17:15:51 +0000 (22:45 +0530)]
selftests/bpf: Fix redefinition of 'off' as different kind of symbol
This fixes the following build error
CLNG-BPF [test_progs] verifier_global_ptr_args.bpf.o
progs/verifier_global_ptr_args.c:228:5: error: redefinition of 'off' as
different kind of symbol
228 | u32 off;
| ^
The symbol 'off' was previously defined in
tools/testing/selftests/bpf/tools/include/vmlinux.h, which includes an
enum i40e_ptp_gpio_pin_state from
drivers/net/ethernet/intel/i40e/i40e_ptp.c:
This enum is included when CONFIG_I40E is enabled. As of commit 032676ff8217 ("LoongArch: Update Loongson-3 default config file"),
CONFIG_I40E is set in the defconfig, which leads to the conflict.
Renaming the local variable avoids the redefinition and allows the
build to succeed.
Suggested-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Brahmajit Das <listout@listout.xyz> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20251017171551.53142-1-listout@listout.xyz Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Sahil Chandna [Tue, 14 Oct 2025 18:56:35 +0000 (00:26 +0530)]
bpf: Do not disable preemption in bpf_test_run().
The timer mode is initialized to NO_PREEMPT mode by default,
this disables preemption and force execution in atomic context
causing issue on PREEMPT_RT configurations when invoking
spin_lock_bh(), leading to the following warning:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 6107, name: syz.0.17
preempt_count: 1, expected: 0
RCU nest depth: 1, expected: 1
Preemption disabled at:
[<ffffffff891fce58>] bpf_test_timer_enter+0xf8/0x140 net/bpf/test_run.c:42
Fix this, by removing NO_PREEMPT/NO_MIGRATE mode check.
Also, the test timer context no longer needs explicit calls to
migrate_disable()/migrate_enable() with rcu_read_lock()/rcu_read_unlock().
Use helpers rcu_read_lock_dont_migrate() and rcu_read_unlock_migrate()
instead.
Reported-by: syzbot+1f1fbecb9413cdbfbef8@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1f1fbecb9413cdbfbef8 Suggested-by: Yonghong Song <yonghong.song@linux.dev> Suggested-by: Menglong Dong <menglong.dong@linux.dev> Acked-by: Yonghong Song <yonghong.song@linux.dev> Tested-by: syzbot+1f1fbecb9413cdbfbef8@syzkaller.appspotmail.com Co-developed-by: Brahmajit Das <listout@listout.xyz> Signed-off-by: Brahmajit Das <listout@listout.xyz> Signed-off-by: Sahil Chandna <chandna.sahil@gmail.com> Link: https://lore.kernel.org/r/20251014185635.10300-1-chandna.sahil@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Ada Couprie Diaz [Tue, 14 Oct 2025 09:25:36 +0000 (10:25 +0100)]
arm64: debug: always unmask interrupts in el0_softstp()
We intend that EL0 exception handlers unmask all DAIF exceptions
before calling exit_to_user_mode().
When completing single-step of a suspended breakpoint, we do not call
local_daif_restore(DAIF_PROCCTX) before calling exit_to_user_mode(),
leaving all DAIF exceptions masked.
When pseudo-NMIs are not in use this is benign.
When pseudo-NMIs are in use, this is unsound. At this point interrupts
are masked by both DAIF.IF and PMR_EL1, and subsequent irq flag
manipulation may not work correctly. For example, a subsequent
local_irq_enable() within exit_to_user_mode_loop() will only unmask
interrupts via PMR_EL1 (leaving those masked via DAIF.IF), and
anything depending on interrupts being unmasked (e.g. delivery of
signals) will not work correctly.
This was detected by CONFIG_ARM64_DEBUG_PRIORITY_MASKING.
Move the call to `try_step_suspended_breakpoints()` outside of the check
so that interrupts can be unmasked even if we don't call the step handler.
Fixes: 0ac7584c08ce ("arm64: debug: split single stepping exception entry") Cc: <stable@vger.kernel.org> # 6.17 Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com>
[catalin.marinas@arm.com: added Mark's rewritten commit log and some whitespace] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The GIC CDEOI system instruction requires the Rt field to be set to 0b11111
otherwise the instruction behaviour becomes CONSTRAINED UNPREDICTABLE.
Currenly, its usage is encoded as a system register write, with a constant
0 value:
write_sysreg_s(0, GICV5_OP_GIC_CDEOI)
While compiling with GCC, the 0 constant value, through these asm
constraints and modifiers ('x' modifier and 'Z' constraint combo):
asm volatile(__msr_s(r, "%x0") : : "rZ" (__val));
forces the compiler to issue the XZR register for the MSR operation (ie
that corresponds to Rt == 0b11111) issuing the right instruction encoding.
Unfortunately LLVM does not yet understand that modifier/constraint
combo so it ends up issuing a different register from XZR for the MSR
source, which in turns means that it encodes the GIC CDEOI instruction
wrongly and the instruction behaviour becomes CONSTRAINED UNPREDICTABLE
that we must prevent.
Add a conditional to write_sysreg_s() macro that detects whether it
is passed a constant 0 value and issues an MSR write with XZR as source
register - explicitly doing what the asm modifier/constraint is meant to
achieve through constraints/modifiers, fixing the LLVM compilation issue.
Fixes: 7ec80fb3f025 ("irqchip/gic-v5: Add GICv5 PPI support") Suggested-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Acked-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org Cc: Sascha Bischoff <sascha.bischoff@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Fri, 17 Oct 2025 15:45:54 +0000 (08:45 -0700)]
Merge tag 'io_uring-6.18-20251016' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Revert of a change that went into an older kernel, and which has been
reported to cause a regression for some write workloads on LVM while
a snapshop is being created
- Fix a regression from this merge window, where some compilers (and/or
certain .config options) would cause an earlier evaluations of a
dereference which would then cause a NULL pointer dereference.
I was only able to reproduce this with OPTIMIZE_FOR_SIZE=y, but David
Howells hit it with just KASAN enabled. Depending on how things
inlined, this makes sense
- Fix for a missing lock around a mem region unregistration
- Fix for ring resizing with the same placement after resize
* tag 'io_uring-6.18-20251016' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/rw: check for NULL io_br_sel when putting a buffer
io_uring: fix unexpected placement on same size resizing
io_uring: protect mem region deregistration
Revert "io_uring/rw: drop -EOPNOTSUPP check in __io_complete_rw_common()"
Linus Torvalds [Fri, 17 Oct 2025 15:31:26 +0000 (08:31 -0700)]
Merge tag 'block-6.18-20251016' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull block fixes from Jens Axboe:
- NVMe pull request via Keith:
- iostats accounting fixed on multipath retries (Amit)
- secure concatenation response fixup (Martin)
- tls partial record fixup (Wilfred)
- Fix for a lockdep reported issue with the elevator lock and
blk group frozen operations
- Fix for a regression in this merge window, where updating
'nr_requests' would not do the right thing for queues with
shared tags
* tag 'block-6.18-20251016' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
nvme/tcp: handle tls partially sent records in write_space()
block: Remove elevator_lock usage from blkg_conf frozen operations
blk-mq: fix stale tag depth for shared sched tags in blk_mq_update_nr_requests()
nvme-auth: update sc_c in host response
nvme-multipath: Skip nr_active increments in RETRY disposition
Linus Torvalds [Fri, 17 Oct 2025 15:16:58 +0000 (08:16 -0700)]
Merge tag 'drm-fixes-2025-10-17' of https://gitlab.freedesktop.org/drm/kernel
Pull drm fixes from Dave Airlie:
"As per usual xe/amdgpu are the leaders, with some i915 and then a
bunch of scattered fixes. There are a bunch of stability fixes for
some older amdgpu cards.
draw:
- Avoid color truncation
gpuvm:
- Avoid kernel-doc warning
sched:
- Avoid double free
i915:
- Skip GuC communication warning if reset is in progress
- Couple frontbuffer related fixes
- Deactivate PSR only on LNL and when selective fetch enabled
xe:
- Increase global invalidation timeout to handle some workloads
- Fix NPD while evicting BOs in an array of VM binds
- Fix resizable BAR to account for possibly needing to move BARs
other than the LMEMBAR
- Fix error handling in xe_migrate_init()
- Fix atomic fault handling with mixed mappings or if the page is
already in VRAM
- Enable media samplers power gating for platforms before Xe2
- Fix de-registering exec queue from GuC when unbinding
- Ensure data migration to system if indicated by madvise with SVM
- Fix kerneldoc for kunit change
- Always account for cacheline alignment on migration
- Drop bogus assertion on eviction
amdgpu:
- Backlight fix
- SI fixes
- CIK fix
- Make CE support debug only
- IP discovery fix
- Ring reset fixes
- GPUVM fault memory barrier fix
- Drop unused structures in amdgpu_drm.h
- JPEG debugfs fix
- VRAM handling fixes for GPUs without VRAM
- GC 12 MES fixes
amdkfd:
- MES fix
ast:
- Fix display output after reboot
bridge:
- lt9211: Fix version check
panthor:
- Fix MCU suspend
qaic:
- Init bootlog in correct order
- Treat remaining == 0 as error in find_and_map_user_pages()
- Lock access to DBC request queue
rockchip:
- vop2: Fix destination size in atomic check"
* tag 'drm-fixes-2025-10-17' of https://gitlab.freedesktop.org/drm/kernel: (44 commits)
drm/sched: Fix potential double free in drm_sched_job_add_resv_dependencies
drm/xe/evict: drop bogus assert
drm/xe/migrate: don't misalign current bytes
drm/xe/kunit: Fix kerneldoc for parameterized tests
drm/xe/svm: Ensure data will be migrated to system if indicated by madvise.
drm/gpuvm: Fix kernel-doc warning for drm_gpuvm_map_req.map
drm/i915/psr: Deactivate PSR only on LNL and when selective fetch enabled
drm/ast: Blank with VGACR17 sync enable, always clear VGACRB6 sync off
accel/qaic: Synchronize access to DBC request queue head & tail pointer
accel/qaic: Treat remaining == 0 as error in find_and_map_user_pages()
accel/qaic: Fix bootlog initialization ordering
drm/rockchip: vop2: use correct destination rectangle height check
drm/draw: fix color truncation in drm_draw_fill24
drm/xe/guc: Check GuC running state before deregistering exec queue
drm/xe: Enable media sampler power gating
drm/xe: Handle mixed mappings and existing VRAM on atomic faults
drm/xe/migrate: Fix an error path
drm/xe: Move rebar to be done earlier
drm/xe: Don't allow evicting of BOs in same VM in array of VM binds
drm/xe: Increase global invalidation timeout to 1000us
...
commit 337bf13aa9dda ("PCI/VGA: Replace vga_is_firmware_default() with a
screen info check") introduced an implicit dependency upon SCREEN_INFO by
removing the open coded implementation.
If a user didn't have CONFIG_SCREEN_INFO set, vga_is_firmware_default()
would now return false. SCREEN_INFO is only used on X86 so add a
conditional select for SCREEN_INFO to ensure that the VGA arbiter works as
intended.
Fixes: 337bf13aa9dda ("PCI/VGA: Replace vga_is_firmware_default() with a screen info check") Reported-by: Eric Biggers <ebiggers@kernel.org> Closes: https://lore.kernel.org/linux-pci/20251012182302.GA3412@sol/ Suggested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Tested-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20251013220829.1536292-1-superm1@kernel.org
Inochi Amaoto [Tue, 14 Oct 2025 01:46:07 +0000 (09:46 +0800)]
PCI: vmd: Override irq_startup()/irq_shutdown() in vmd_init_dev_msi_info()
Since commit 54f45a30c0d0 ("PCI/MSI: Add startup/shutdown for per
device domains") set callback irq_startup() and irq_shutdown() of
the struct pci_msi[x]_template, __irq_startup() will always invokes
irq_startup() callback instead of irq_enable() callback overridden
in vmd_init_dev_msi_info(). This will not start the IRQ correctly.
Also override irq_startup()/irq_shutdown() in vmd_init_dev_msi_info(),
so the irq_startup() can invoke the real logic.
Dave Airlie [Thu, 16 Oct 2025 23:39:34 +0000 (09:39 +1000)]
Merge tag 'drm-xe-fixes-2025-10-16' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Driver Changes:
- Increase global invalidation timeout to handle some workloads
(Kenneth Graunke)
- Fix NPD while evicting BOs in an array of VM binds (Matthew Brost)
- Fix resizable BAR to account for possibly needing to move BARs other
than the LMEMBAR (Lucas De Marchi)
- Fix error handling in xe_migrate_init() (Thomas Hellström)
- Fix atomic fault handling with mixed mappings or if the page is
already in VRAM (Matthew Brost)
- Enable media samplers power gating for platforms before Xe2 (Vinay
Belgaumkar)
- Fix de-registering exec queue from GuC when unbinding (Matthew Brost)
- Ensure data migration to system if indicated by madvise with SVM
(Thomas Hellström)
- Fix kerneldoc for kunit change (Matt Roper)
- Always account for cacheline alignment on migration (Matthew Auld)
- Drop bogus assertion on eviction (Matthew Auld)
Miguel Ojeda [Fri, 10 Oct 2025 17:43:49 +0000 (19:43 +0200)]
docs: rust: add section on imports formatting
`rustfmt`, by default, formats imports in a way that is prone to conflicts
while merging and rebasing, since in some cases it condenses several
items into the same line.
For instance, Linus mentioned [1] that the following case:
use crate::{
fmt,
page::AsPageIter,
};
is compressed by `rustfmt` into:
use crate::{fmt, page::AsPageIter};
which is undesirable.
Similarly, `rustfmt` may put several items in the same line even if the
braces span already multiple lines, e.g.:
use kernel::{
acpi, c_str,
device::{property, Core},
of, platform,
};
The options that control the formatting behavior around imports are
generally unstable, and `rustfmt` releases do not allow to use nightly
features, unlike the compiler and other Rust tooling [2].
For the moment, we can introduce a workaround to prevent `rustfmt`
from compressing the example above -- the "trailing empty comment":
use crate::{
fmt,
page::AsPageIter, //
};
which is reminiscent of the trailing comma behavior in other formatters.
We already used empty comments for formatting purposes in the past,
e.g. in commit b9b701fce49a ("rust: clarify the language unstable features
in use").
In addition, `rustfmt` actually reformats with a vertical layout (i.e. it
does not put two items in the same line) when seeing such a comment,
i.e. it doesn't just preserve the formatting, which is good in the sense
that we can use it to easily reformat some imports, since it matches
the style we generally want to have.
A Git merge driver would help (suggested by Gary and Wedson), though
maintainers would need to set it up, the diffs would still be larger
and the formatting rules for imports would remain hard to predict.
Thus document the style that we will follow in the coding guidelines
by introducing a new section and explain how the trailing empty comment
works there too.
We discussed the issue with upstream Rust in our usual Rust <-> Rust
for Linux meeting [3], and there have also been a few other discussions
in parallel in issues [4][5] and Zulip [6]. We will see what happens,
but upstream Rust has already created a subteam of `rustfmt` to try
to overcome the bandwidth issue [7], which is a good signal, and some
organization work has already started (e.g. tracking issues). We will
continue our discussions with them about it.
Dave Airlie [Thu, 16 Oct 2025 20:57:44 +0000 (06:57 +1000)]
Merge tag 'amd-drm-fixes-6.18-2025-10-16' of https://gitlab.freedesktop.org/agd5f/linux into drm-fixes
amd-drm-fixes-6.18-2025-10-16:
amdgpu:
- Backlight fix
- SI fixes
- CIK fix
- Make CE support debug only
- IP discovery fix
- Ring reset fixes
- GPUVM fault memory barrier fix
- Drop unused structures in amdgpu_drm.h
- JPEG debugfs fix
- VRAM handling fixes for GPUs without VRAM
- GC 12 MES fixes
Dave Airlie [Thu, 16 Oct 2025 20:46:14 +0000 (06:46 +1000)]
Merge tag 'drm-intel-fixes-2025-10-16' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-fixes
- Skip GuC communication warning if reset is in progress (Zhanjun)
- Couple frontbuffer related fixes (Ville)
- Deactivate PSR only on LNL and when selective fetch enabled (Jouni)
Jens Axboe [Thu, 16 Oct 2025 19:25:40 +0000 (13:25 -0600)]
Merge tag 'nvme-6.18-2025-10-16' of git://git.infradead.org/nvme into block-6.18
Pull NVMe fixes from Keith:
"- iostats accounting fixed on multipath retries (Amit)
- secure concatenation response fixup (Martin)
- tls partial record fixup (Wilfred)"
* tag 'nvme-6.18-2025-10-16' of git://git.infradead.org/nvme:
nvme/tcp: handle tls partially sent records in write_space()
nvme-auth: update sc_c in host response
nvme-multipath: Skip nr_active increments in RETRY disposition