Daniel Wagner [Mon, 30 Mar 2026 22:53:10 +0000 (23:53 +0100)]
net: phy: bcm84881: add BCM84891/BCM84892 support
The BCM84891 and BCM84892 are 10GBASE-T PHYs in the same family as the
BCM84881, sharing the register map and most callbacks. They add USXGMII
as a host interface mode.
bcm8489x_config_init() is separate from bcm84881_config_init(): it
allows only USXGMII (the only host mode available on the tested
hardware) and clears MDIO_CTRL1_LPOWER, which is set at boot on the
tested platform. Does not recur on ifdown/ifup, cable events, or
link-partner advertisement changes, so config_init is sufficient.
For USXGMII, read_status() skips the 0x4011 host-mode register: it
returns the same value regardless of negotiated copper speed (USXGMII
symbol replication). Speed comes from phy_resolve_aneg_linkmode() via
standard C45 AN resolution.
Tested on TRENDnet TEG-S750 (RTL9303 + 1x BCM84891 + 4x BCM84892)
running OpenWrt, where the MDIO controller driver is currently
OpenWrt-specific. Link verified at 100M, 1G, 2.5G, 10G.
Signed-off-by: Daniel Wagner <wagner.daniel.t@gmail.com> Reviewed-by: Florian Fainelli <florian.fainelli@broadcom.com> Reviewed-by: Nicolai Buchwitz <nb@tipi-net.de> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/20260330225310.2801264-1-wagner.daniel.t@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Li Xiasong [Mon, 30 Mar 2026 12:03:35 +0000 (20:03 +0800)]
mptcp: fix soft lockup in mptcp_recvmsg()
syzbot reported a soft lockup in mptcp_recvmsg() [0].
When receiving data with MSG_PEEK | MSG_WAITALL flags, the skb is not
removed from the sk_receive_queue. This causes sk_wait_data() to always
find available data and never perform actual waiting, leading to a soft
lockup.
Fix this by adding a 'last' parameter to track the last peeked skb.
This allows sk_wait_data() to make informed waiting decisions and prevent
infinite loops when MSG_PEEK is used.
Eric Biggers [Tue, 31 Mar 2026 02:44:38 +0000 (19:44 -0700)]
lib/crypto: Include <crypto/utils.h> instead of <crypto/algapi.h>
Since the lib/crypto/ files that include <crypto/algapi.h> need it only
for the transitive inclusion of <crypto/utils.h> (and not all the
traditional crypto API stuff that the rest of <crypto/algapi.h> is
filled with), replace these inclusions with direct inclusions of
<crypto/utils.h>.
Eric Biggers [Tue, 31 Mar 2026 02:44:30 +0000 (19:44 -0700)]
lib/crypto: aesgcm: Don't disable IRQs during AES block encryption
aes_encrypt() now uses AES instructions when available instead of always
using table-based code. AES instructions are constant-time and don't
benefit from disabling IRQs as a constant-time hardening measure.
In fact, on two architectures (arm and riscv) disabling IRQs is
counterproductive because it prevents the AES instructions from being
used. (See the may_use_simd() implementation on those architectures.)
Therefore, let's remove the IRQ disabling/enabling and leave the choice
of constant-time hardening measures to the AES library code.
Note that currently the arm table-based AES code (which runs on arm
kernels that don't have ARMv8 CE) disables IRQs, while the generic
table-based AES code does not. So this does technically regress in
constant-time hardening when that generic code is used. But as
discussed in commit a22fd0e3c495 ("lib/crypto: aes: Introduce improved
AES library") I think just leaving IRQs enabled is the right choice.
Disabling them is slow and can cause problems, and AES instructions
(which modern CPUs have) solve the problem in a much better way anyway.
Eric Biggers [Tue, 31 Mar 2026 02:44:14 +0000 (19:44 -0700)]
lib/crypto: aescfb: Don't disable IRQs during AES block encryption
aes_encrypt() now uses AES instructions when available instead of always
using table-based code. AES instructions are constant-time and don't
benefit from disabling IRQs as a constant-time hardening measure.
In fact, on two architectures (arm and riscv) disabling IRQs is
counterproductive because it prevents the AES instructions from being
used. (See the may_use_simd() implementation on those architectures.)
Therefore, let's remove the IRQ disabling/enabling and leave the choice
of constant-time hardening measures to the AES library code.
Note that currently the arm table-based AES code (which runs on arm
kernels that don't have ARMv8 CE) disables IRQs, while the generic
table-based AES code does not. So this does technically regress in
constant-time hardening when that generic code is used. But as
discussed in commit a22fd0e3c495 ("lib/crypto: aes: Introduce improved
AES library") I think just leaving IRQs enabled is the right choice.
Disabling them is slow and can cause problems, and AES instructions
(which modern CPUs have) solve the problem in a much better way anyway.
Kees Cook [Tue, 24 Mar 2026 02:07:30 +0000 (19:07 -0700)]
lkdtm/fortify: Drop unneeded FORTIFY_STR_OBJECT test
The str* family of fortified functions all use member-sized limits
for a while now, so the FORTIFY_STR_OBJECT test is redundant to
FORTIFY_STR_MEMBER. While here, replace the strncpy() use with strscpy(),
as strncpy() is being removed.
Siddharth Nayyar [Thu, 26 Mar 2026 21:25:05 +0000 (21:25 +0000)]
module: use kflagstab instead of *_gpl sections
Read kflagstab section for vmlinux and modules to determine whether
kernel symbols are GPL only.
This patch eliminates the need for fragmenting the ksymtab for infering
the value of GPL-only symbol flag, henceforth stop populating *_gpl
versions of the ksymtab and kcrctab in modpost.
Signed-off-by: Siddharth Nayyar <sidnayyar@google.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Siddharth Nayyar [Thu, 26 Mar 2026 21:25:03 +0000 (21:25 +0000)]
module: add kflagstab section to vmlinux and modules
This patch introduces a __kflagstab section to store symbol flags in a
dedicated data structure, similar to how CRCs are handled in the
__kcrctab.
The flags for a given symbol in __kflagstab will be located at the same
index as the symbol's entry in __ksymtab and its CRC in __kcrctab. This
design decouples the flags from the symbol table itself, allowing us to
maintain a single, sorted __ksymtab. As a result, the symbol search
remains an efficient, single lookup, regardless of the number of flags
we add in the future.
The motivation for this change comes from the Android kernel, which uses
an additional symbol flag to restrict the use of certain exported
symbols by unsigned modules, thereby enhancing kernel security. This
__kflagstab can be implemented as a bitmap to efficiently manage which
symbols are available for general use versus those restricted to signed
modules only.
This section will contain read-only data for values of kernel symbol
flags in the form of an 8-bit bitsets for each kernel symbol. Each bit
in the bitset represents a flag value defined by ksym_flags enumeration.
Petr Pavlu ran a small test to get a better understanding of the
different section sizes resulting from this patch series. He used
v6.17-rc6 together with the openSUSE x86_64 config [1], which is fairly
large. The resulting vmlinux.bin (no debuginfo) had an on-disk size of
58 MiB, and included 5937 + 6589 (GPL-only) exported symbols.
The following table summarizes his measurements and calculations
regarding the sizes of all sections related to exported symbols:
| HAVE_ARCH_PREL32_RELOCATIONS | !HAVE_ARCH_PREL32_RELOCATIONS
Section | Base [B] | Ext. [B] | Sep. [B] | Base [B] | Ext. [B] | Sep. [B]
----------------------------------------------------------------------------------------
__ksymtab | 71244 | 200416 | 150312 | 142488 | 400832 | 300624
__ksymtab_gpl | 79068 | NA | NA | 158136 | NA | NA
__kcrctab | 23748 | 50104 | 50104 | 23748 | 50104 | 50104
__kcrctab_gpl | 26356 | NA | NA | 26356 | NA | NA
__ksymtab_strings | 253628 | 253628 | 253628 | 253628 | 253628 | 253628
__kflagstab | NA | NA | 12526 | NA | NA | 12526
----------------------------------------------------------------------------------------
Total | 454044 | 504148 | 466570 | 604356 | 704564 | 616882
Increase to base [%] | NA | 11.0 | 2.8 | NA | 16.6 | 2.1
The column "HAVE_ARCH_PREL32_RELOCATIONS -> Base" contains the measured
numbers. The rest of the values are calculated. The "Ext." column
represents an alternative approach of extending __ksymtab to include a
bitset of symbol flags, and the "Sep." column represents the approach of
having a separate __kflagstab. With HAVE_ARCH_PREL32_RELOCATIONS, each
kernel_symbol is 12 B in size and is extended to 16 B. With
!HAVE_ARCH_PREL32_RELOCATIONS, it is 24 B, extended to 32 B. Note that
this does not include the metadata needed to relocate __ksymtab*, which
is freed after the initial processing.
Adding __kflagstab as a separate section has a negligible impact, as
expected. When extending __ksymtab (kernel_symbol) instead, the worst
case with !HAVE_ARCH_PREL32_RELOCATIONS increases the export data size
by 16.6%. Note that the larger increase in size for the latter approach
is due to 4-byte alignment of kernel_symbol data structure, instead of
1-byte alignment for the flags bitset in __kflagstab in the former
approach.
Based on the above, it was concluded that introducing __kflagstab makes
sense, as the added complexity is minimal over extending kernel_symbol,
and there is overall simplification of symbol finding logic in the
module loader.
Signed-off-by: Siddharth Nayyar <sidnayyar@google.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
[Sami: Updated commit message to include details from the cover letter.] Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Siddharth Nayyar [Thu, 26 Mar 2026 21:25:02 +0000 (21:25 +0000)]
module: define ksym_flags enumeration to represent kernel symbol flags
The core architectural issue with kernel symbol flags is our reliance on
splitting the main symbol table, ksymtab. To handle a single boolean
property, such as GPL-only, all exported symbols are split across two
separate tables: __ksymtab and __ksymtab_gpl.
This design forces the module loader to perform a separate search on
each of these tables for every symbol it needs, for vmlinux and for all
previously loaded modules.
This approach is fundamentally not scalable. If we were to introduce a
second flag, we would need four distinct symbol tables. For n boolean
flags, this model requires an exponential growth to 2^n tables,
dramatically increasing complexity.
Another consequence of this fragmentation is degraded performance. For
example, a binary search on the symbol table of vmlinux, that would take
only 14 comparison steps (assuming ~2^14 or 16K symbols) in a unified
table, can require up to 26 steps when spread across two tables
(assuming both tables have ~2^13 symbols). This performance penalty
worsens as more flags are added.
To address this, symbol flags is an enumeration used to represent flags
as a bitset, for example a flag to tell if a symbol is GPL only.
The said bitset is introduced in subsequent patches and will contain
values of kernel symbol flags. These bitset will then be used to infer
flag values rather than fragmenting ksymtab for separating symbols with
different flag values, thereby eliminating the need to fragment the
ksymtab.
Link: https://lore.kernel.org/r/20260326-kflagstab-v5-0-fa0796fe88d9@google.com Signed-off-by: Siddharth Nayyar <sidnayyar@google.com> Reviewed-by: Petr Pavlu <petr.pavlu@suse.com>
[Sami: Updated the commit message to explain the use case for the series.] Signed-off-by: Sami Tolvanen <samitolvanen@google.com>
Fredric Cover [Mon, 30 Mar 2026 20:11:27 +0000 (13:11 -0700)]
fs/smb/client: fix out-of-bounds read in cifs_sanitize_prepath
When cifs_sanitize_prepath is called with an empty string or a string
containing only delimiters (e.g., "/"), the current logic attempts to
check *(cursor2 - 1) before cursor2 has advanced. This results in an
out-of-bounds read.
This patch adds an early exit check after stripping prepended
delimiters. If no path content remains, the function returns NULL.
The bug was identified via manual audit and verified using a
standalone test case compiled with AddressSanitizer, which
triggered a SEGV on affected inputs.
Signed-off-by: Fredric Cover <FredTheDude@proton.me> Reviewed-by: Henrique Carvalho <[2]henrique.carvalho@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
====================
Fix bpf_link grace period wait for tracepoints
A recent change to non-faultable tracepoints switched from
preempt-disabled critical sections to SRCU-fast, which breaks
assumptions in the bpf_link_free() path. Use call_srcu() to fix the
breakage.
* Introduce and switch to call_tracepoint_unregister_non_faultable(). (Steven)
* Address Andrii's comment and add Acked-by. (Andrii)
* Drop rcu_trace_implies_rcu_gp() conversion. (Alexei)
bpf: Fix grace period wait for tracepoint bpf_link
Recently, tracepoints were switched from using disabled preemption
(which acts as RCU read section) to SRCU-fast when they are not
faultable. This means that to do a proper grace period wait for programs
running in such tracepoints, we must use SRCU's grace period wait.
This is only for non-faultable tracepoints, faultable ones continue
using RCU Tasks Trace.
However, bpf_link_free() currently does call_rcu() for all cases when
the link is non-sleepable (hence, for tracepoints, non-faultable). Fix
this by doing a call_srcu() grace period wait.
As far RCU Tasks Trace gp -> RCU gp chaining is concerned, it is deemed
unnecessary for tracepoint programs. The link and program are either
accessed under RCU Tasks Trace protection, or SRCU-fast protection now.
The earlier logic of chaining both RCU Tasks Trace and RCU gp waits was
to generalize the logic, even if it conceded an extra RCU gp wait,
however that is unnecessary for tracepoints even before this change.
In practice no cost was paid since rcu_trace_implies_rcu_gp() was always
true. Hence we need not chaining any RCU gp after the SRCU gp.
For instance, in the non-faultable raw tracepoint, the RCU read section
of the program in __bpf_trace_run() is enclosed in the SRCU gp, likewise
for faultable raw tracepoint, the program is under the RCU Tasks Trace
protection. Hence, the outermost scope can be waited upon to ensure
correctness.
Also, sleepable programs cannot be attached to non-faultable
tracepoints, so whenever program or link is sleepable, only RCU Tasks
Trace protection is being used for the link and prog.
Fixes: a46023d5616e ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast") Reviewed-by: Sun Jian <sun.jian.kdev@gmail.com> Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://lore.kernel.org/r/20260331211021.1632902-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Mykyta Yatsenko [Tue, 31 Mar 2026 17:26:34 +0000 (18:26 +0100)]
selftests/bpf: Suppress veristat error messages in non-verbose mode
When running veristat across many BPF objects, expected load failures
produce noisy stderr output that obscures actual issues. Gate these
diagnostic messages behind --verbose.
Menglong Dong [Tue, 31 Mar 2026 07:04:34 +0000 (15:04 +0800)]
selftests/bpf: Test access to ringbuf position with map pointer
Add the testing to access the bpf_ringbuf with the map pointer.
"consumer_pos" and "producer_pos" is accessed in this testing. We reserve
128 bytes in the ringbuf to test the producer_pos, which should be
"128 + BPF_RINGBUF_HDR_SZ".
It will be helpful if we want to evaluate the usage of the ringbuf in bpf
prog with the consumer and producer position.
Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Leon Hwang <leon.hwang@linux.dev> Link: https://lore.kernel.org/bpf/20260331070434.10037-1-dongml2@chinatelecom.cn
Eyal Birger [Tue, 31 Mar 2026 13:06:12 +0000 (06:06 -0700)]
bpf: Clarify BPF_RB_NO_WAKEUP behavior for bpf_ringbuf_discard()
Clarify bpf_ringbuf_discard() documentation for BPF_RB_NO_WAKEUP.
Discarded ring buffer records are still left in the ring buffer and are
only skipped when user space consumes them. This can matter when
BPF_RB_NO_WAKEUP is used: a later submit relying on adaptive wakeup
might not wake the consumer, because the discarded record still needs to
be consumed first.
Zhengchuan Liang [Mon, 30 Mar 2026 08:46:24 +0000 (16:46 +0800)]
net: ipv6: flowlabel: defer exclusive option free until RCU teardown
`ip6fl_seq_show()` walks the global flowlabel hash under the seq-file
RCU read-side lock and prints `fl->opt->opt_nflen` when an option block
is present.
Exclusive flowlabels currently free `fl->opt` as soon as `fl->users`
drops to zero in `fl_release()`. However, the surrounding
`struct ip6_flowlabel` remains visible in the global hash table until
later garbage collection removes it and `fl_free_rcu()` finally tears it
down.
A concurrent `/proc/net/ip6_flowlabel` reader can therefore race that
early `kfree()` and dereference freed option state, triggering a crash
in `ip6fl_seq_show()`.
Fix this by keeping `fl->opt` alive until `fl_free_rcu()`. That matches
the lifetime already required for the enclosing flowlabel while readers
can still reach it under RCU.
Fixes: d3aedd5ebd4b ("ipv6 flowlabel: Convert hash list to RCU.") Reported-by: Yifan Wu <yifanwucs@gmail.com> Reported-by: Juefei Pu <tomapufckgml@gmail.com> Co-developed-by: Yuan Tan <yuantan098@gmail.com> Signed-off-by: Yuan Tan <yuantan098@gmail.com> Suggested-by: Xin Liu <bird@lzu.edu.cn> Tested-by: Ren Wei <enjou1224z@gmail.com> Signed-off-by: Zhengchuan Liang <zcliangcn@gmail.com> Signed-off-by: Ren Wei <n05ec@lzu.edu.cn> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/07351f0ec47bcee289576f39f9354f4a64add6e4.1774855883.git.zcliangcn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
In case rold->reg->range == BEYOND_PKT_END && rcur->reg->range == N
regsafe() may return true which may lead to current state with
valid packet range not being explored. Fix the bug.
Fixes: 6d94e741a8ff ("bpf: Support for pointers beyond pkt_end.") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Amery Hung <ameryhung@gmail.com> Acked-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/bpf/20260331204228.26726-1-alexei.starovoitov@gmail.com
Dmytro Maluka [Thu, 19 Mar 2026 15:38:12 +0000 (16:38 +0100)]
selftests/cpu-hotplug: Fix check for cpu hotplug not supported
If CONFIG_HOTPLUG_CPU is disabled, /sys/devices/system/cpu/cpu*
directories are still populated, so the test fails to correctly detect
that CPU hotplug is not supported.
Fix this by checking for the presence of 'online' files in those
directories instead. The 'online' node is created for the given CPU if
and only if this CPU supports hotplug. So if none of the CPUs have
'online' nodes, it means CPU hotplug is not supported.
pstore/ftrace: Keep ftrace module parameter and debugfs switch in sync
Commit a5d05b07961a ("pstore/ftrace: Allow immediate recording")
introduced a kernel parameter to enable early-boot collection for
ftrace frontend. But then, if we enable the debugfs later, the
parameter remains set as N. This is not a biggie, things work fine;
but at the same time, why not have both in sync if possible, right?
Cole Leavitt [Wed, 25 Feb 2026 23:54:06 +0000 (16:54 -0700)]
pstore/ram: fix resource leak when ioremap() fails
In persistent_ram_iomap(), ioremap() or ioremap_wc() may return NULL on
failure. Currently, if this happens, the function returns NULL without
releasing the memory region acquired by request_mem_region().
This leads to a resource leak where the memory region remains reserved
but unusable.
Additionally, the caller persistent_ram_buffer_map() handles NULL
correctly by returning -ENOMEM, but without this check, a NULL return
combined with request_mem_region() succeeding leaves resources in an
inconsistent state.
This is the ioremap() counterpart to commit 05363abc7625 ("pstore:
ram_core: fix incorrect success return when vmap() fails") which fixed
a similar issue in the vmap() path.
In order to set ECC on ramoops, the parameter "ecc" should be
used. The variable that carries this information is "ramoops_ecc".
Due to some confusion in the parameter setting functions, modinfo
ends-up showing both "ecc" and "ramoops_ecc" as valid parameters,
but only "ecc" is the valid one, hence this fix to the parameter
help text.
Seems the linux/memblock.h inclusion was added early on due
to usage of some memblock allocation routine. But that was
removed, header was forgotten, hence let's remove that.
Andrey Skvortsov [Sun, 15 Feb 2026 18:51:55 +0000 (21:51 +0300)]
pstore: fix ftrace dump, when ECC is enabled
total_size is sum of record->size and record->ecc_notice_size (ECC: No
errors detected). When ECC is not used, then there is no problem.
When ECC is enabled, then ftrace dump is decoded incorrectly after
restart.
First this affects starting offset calculation, that breaks
reading of all ftrace records.
Waiman Long [Tue, 31 Mar 2026 18:35:22 +0000 (14:35 -0400)]
workqueue: Remove HK_TYPE_WQ from affecting wq_unbound_cpumask
For historical reason, wq_unbound_cpumask is initially set as
intersection of HK_TYPE_DOMAIN, HK_TYPE_WQ and workqueue.unbound_cpus
boot command line option.
At run time, users can update the unbound cpumask via the
/sys/devices/virtual/workqueue/cpumask sysfs file. Creation
and modification of cpuset isolated partitions will also update
wq_unbound_cpumask based on the latest HK_TYPE_DOMAIN cpumask.
The HK_TYPE_WQ cpumask is out of the picture with these runtime updates.
Complete the transition by taking HK_TYPE_WQ out from the workqueue code
and make it depends on HK_TYPE_DOMAIN only from the housekeeping side.
The final goal is to eliminate HK_TYPE_WQ as a housekeeping cpumask type.
Kees Cook [Tue, 31 Mar 2026 16:37:19 +0000 (09:37 -0700)]
refcount: Remove unused __signed_wrap function annotations
With CONFIG_UBSAN_INTEGER_WRAP being replaced by Overflow Behavior
Types, remove the __signed_wrap function annotation as it is already
unused, and any future work here will use OBT annotations instead.
Dave Airlie [Tue, 31 Mar 2026 21:20:59 +0000 (07:20 +1000)]
Merge tag 'drm-rust-next-2026-03-30' of https://gitlab.freedesktop.org/drm/rust/kernel into drm-next
DRM Rust changes for v7.1-rc1
- DMA:
- Rework the DMA coherent API: introduce Coherent<T> as a generalized
container for arbitrary types, replacing the slice-only
CoherentAllocation<T>. Add CoherentBox for memory initialization
before exposing a buffer to hardware (converting to Coherent when
ready), and CoherentHandle for allocations without kernel mapping.
- Add Coherent::init() / init_with_attrs() for one-shot initialization
via pin-init, and from-slice constructors for both Coherent and
CoherentBox
- Add uaccess write_dma() for copying from DMA buffers to userspace
and BinaryWriter support for Coherent<T>
- DRM:
- Add GPU buddy allocator abstraction
- Add DRM shmem GEM helper abstraction
- Allow drm::Device to dispatch work and delayed work items to driver
private data
- Add impl_aref_for_gem_obj!() macro to reduce GEM refcount
boilerplate, and introduce DriverObject::Args for constructor
context
- Add dma_resv_lock helper and raw_dma_resv() accessor on GEM objects
- Clean up imports across the DRM module
- I/O:
- Merged via a signed tag from the driver-core tree: register!() macro
and I/O infrastructure improvements (IoCapable refactor, RelaxedMmio
wrapper, IoLoc trait, generic accessors, write_reg /
LocatedRegister)
- Nova (Core):
- Fix and harden the GSP command queue: correct write pointer
advancing, empty slot handling, and ring buffer indexing; add mutex
locking and make Cmdq a pinned type; distinguish wait vs no-wait
commands
- Add support for large RPCs via continuation records, splitting
oversized commands across multiple queue slots
- Simplify GSP sequencer and message handling code: remove unused
trait and Display impls, derive Debug and Zeroable where applicable,
warn on unconsumed message data
- Refactor Falcon firmware handling: create DMA objects lazily, add
PIO upload support, and use the Generic Bootloader to boot FWSEC on
Turing
- Convert all register definitions (PMC, PBUS, PFB, GC6, FUSE, PDISP,
Falcon) to the kernel register!() macro; add bounded_enum macro to
define enums usable as register fields
- Migrate all DMA usage to the new Coherent, CoherentBox, and
CoherentHandle APIs
- Harden firmware parsing with checked arithmetic throughout FWSEC,
Booter, RISC-V parsing paths
- Add debugfs support for reading GSP-RM log buffers; replace
module_pci_driver!() with explicit module init to support
module-level debugfs setup
- Fix auxiliary device registration for multi-GPU systems
Linus Torvalds [Tue, 31 Mar 2026 21:23:12 +0000 (14:23 -0700)]
Merge tag 'sched_ext-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext
Pull sched_ext fixes from Tejun Heo:
- Fix SCX_KICK_WAIT deadlock where multiple CPUs waiting for each other
in hardirq context form a cycle. Move the wait to a balance callback
which can drop the rq lock and process IPIs.
- Fix inconsistent NUMA node lookup in scx_select_cpu_dfl() where
the waker_node used cpu_to_node() while prev_cpu used
scx_cpu_node_if_enabled(), leading to undefined behavior when
per-node idle tracking is disabled.
* tag 'sched_ext-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext:
selftests/sched_ext: Add cyclic SCX_KICK_WAIT stress test
sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback
sched_ext: Fix inconsistent NUMA node lookup in scx_select_cpu_dfl()
Linus Torvalds [Tue, 31 Mar 2026 21:20:39 +0000 (14:20 -0700)]
Merge tag 'wq-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq
Pull workqueue fix from Tejun Heo:
- Fix false positive stall reports on weakly ordered architectures
where the lockless worklist/timestamp check in the watchdog can
observe stale values due to memory reordering.
Recheck under pool->lock to confirm.
* tag 'wq-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: Better describe stall check
workqueue: Fix false positive stall reports
Simon Liebold [Thu, 12 Mar 2026 14:02:00 +0000 (14:02 +0000)]
selftests/mqueue: Fix incorrectly named file
Commit 85506aca2eb4 ("selftests/mqueue: Set timeout to 180 seconds")
intended to increase the timeout for mq_perf_tests from the default
kselftest limit of 45 seconds to 180 seconds.
Unfortunately, the file storing this information was incorrectly named
`setting` instead of `settings`, causing the kselftest runner not to
pick up the limit and keep using the default 45 seconds limit.
Fix this by renaming it to `settings` to ensure that the kselftest
runner uses the increased timeout of 180 seconds for this test.
Fixes: 85506aca2eb4 ("selftests/mqueue: Set timeout to 180 seconds") Cc: <stable@vger.kernel.org> # 5.10.y Signed-off-by: Simon Liebold <simonlie@amazon.de> Link: https://lore.kernel.org/r/20260312140200.2224850-1-simonlie@amazon.de Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
Linus Torvalds [Tue, 31 Mar 2026 20:59:51 +0000 (13:59 -0700)]
Merge tag 'cgroup-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
- Fix cgroup rmdir racing with dying tasks.
Deferred task cgroup unlink introduced a window where cgroup.procs
is empty but the cgroup is still populated, causing rmdir to fail
with -EBUSY and selftest failures.
Make rmdir wait for dying tasks to fully leave and fix selftests to
not depend on synchronous populated updates.
- Fix cpuset v1 task migration failure from empty cpusets under strict
security policies.
When CPU hotplug removes the last CPU from a v1 cpuset, tasks must be
migrated to an ancestor without a security_task_setscheduler() check
that would block the migration.
* tag 'cgroup-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup/cpuset: Skip security check for hotplug induced v1 task migration
cgroup/cpuset: Simplify setsched decision check in task iteration loop of cpuset_can_attach()
cgroup: Fix cgroup_drain_dying() testing the wrong condition
selftests/cgroup: Don't require synchronous populated update on task exit
cgroup: Wait for dying tasks to leave on rmdir
Hangbin Liu [Wed, 25 Feb 2026 01:08:33 +0000 (01:08 +0000)]
selftests: Use ktap helpers for runner.sh
Instead of manually writing ktap messages, we should use the formal
ktap helpers in runner.sh. Brendan did some work in commit d9e6269e3303
("selftests/run_kselftest.sh: exit with error if tests fail") to make
run_kselftest.sh exit with the correct return value. However, the output
does not include the total results, such as how many tests passed or failed.
Let’s convert all manually printed messages in runner.sh to use the
formal ktap helpers. Here are what I changed:
1. Move TAP header from runner.sh to run_kselftest.sh, since
run_kselftest.sh is the only caller of run_many().
2. In run_kselftest.sh, call run_many() in main process to count the
pass/fail numbers.
3. In run_kselftest.sh, do not generate kselftest_failures_file. Just
use ktap_print_totals to report the result.
4. In runner.sh run_one(), get the return value and use ktap helpers for
all pass/fail reporting. This allows counting pass/fail numbers in the
main process.
5. In runner.sh run_in_netns(), also return the correct rc, so we can
count results during wait.
After the change, the printed result looks like:
not ok 4 4 selftests: clone3: clone3_cap_checkpoint_restore # exit=1
# Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0
]# echo $?
1
Fixed change log commit description errors and long lines:
Shuah Khan <skhan@linuxfoundation.org>
Akhil P Oommen [Fri, 27 Mar 2026 00:14:06 +0000 (05:44 +0530)]
drm/msm/adreno: Expose a PARAM to check AQE support
AQE (Applicaton Qrisc Engine) is required to support VK ray-pipeline. Two
conditions should be met to use this HW:
1. AQE firmware should be loaded and programmed
2. Preemption support
Expose a new MSM_PARAM to allow userspace to query its support.
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714685/
Message-ID: <20260327-a8xx-gpu-batch2-v2-17-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:14:05 +0000 (05:44 +0530)]
drm/msm/a6xx: Enable Preemption on X2-85
Add the save-restore register lists and set the necessary quirk flags
in the catalog to enable the Preemption feature on Adreno X2-85 GPU.
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714684/
Message-ID: <20260327-a8xx-gpu-batch2-v2-16-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:14:04 +0000 (05:44 +0530)]
drm/msm/a8xx: Preemption support for A840
The programing sequence related to preemption is unchanged from A7x. But
there is some code churn due to register shuffling in A8x. So, split out
the common code into a header file for code sharing and add/update
additional changes required to support preemption feature on A8x GPUs.
Finally, enable the preemption quirk in A840's catalog to enable this
feature.
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714682/
Message-ID: <20260327-a8xx-gpu-batch2-v2-15-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:14:03 +0000 (05:44 +0530)]
drm/msm/a8xx: Implement IFPC support for A840
Implement pwrup reglist support and add the necessary register
configurations to enable IFPC support on A840
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714679/
Message-ID: <20260327-a8xx-gpu-batch2-v2-14-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:14:02 +0000 (05:44 +0530)]
drm/msm/a6xx: Add SKU detection support for X2-85
Add the Speedbin table to the catalog to enable SKU detection support
for X2-85 GPU found in Glymur chipset. As this chipset support the SOFT
FUSE mechanism, enable the ADRENO_QUIRK_SOFTFUSE quirk too.
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714677/
Message-ID: <20260327-a8xx-gpu-batch2-v2-13-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:14:01 +0000 (05:44 +0530)]
drm/msm/a6xx: Add soft fuse detection support
Recent chipsets like Glymur supports a new mechanism for SKU detection.
A new CX_MISC register exposes the combined (or final) speedbin value
from both HW fuse register and the Soft Fuse register. Implement this new
SKU detection along with a new quirk to identify the GPUs that has soft
fuse support.
There is a side effect of this patch on A4x and older series. The
speedbin field in the MSM_PARAM_CHIPID will be 0 instead of 0xffff. This
should be okay as Mesa correctly handles it. Speedbin was not even a
thing when those GPUs' support were added.
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714676/
Message-ID: <20260327-a8xx-gpu-batch2-v2-12-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:14:00 +0000 (05:44 +0530)]
drm/msm/a8xx: Add SKU table for A840
Add the SKU table in the catalog for A840 GPU. This data helps to pick
the correct bin from the OPP table based on the speed_bin fuse value.
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714673/
Message-ID: <20260327-a8xx-gpu-batch2-v2-11-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:13:59 +0000 (05:43 +0530)]
drm/msm/a6xx: Update HFI definitions
Update the HFI definitions to support additional GMU based power
features.
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714671/
Message-ID: <20260327-a8xx-gpu-batch2-v2-10-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:13:58 +0000 (05:43 +0530)]
drm/msm/a6xx: Use packed structs for HFI
HFI related structs define the ABI between the KMD and the GMU firmware.
So, use packed structures to avoid unintended compiler inserted padding.
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714669/
Message-ID: <20260327-a8xx-gpu-batch2-v2-9-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:13:56 +0000 (05:43 +0530)]
drm/msm/a6xx: Add support for Debug HFI Q
Add the Debug HFI Queue which contains the F2H messages posted from the
GMU firmware. Having this data in coredump is useful to debug firmware
issues.
Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714666/
Message-ID: <20260327-a8xx-gpu-batch2-v2-7-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:13:55 +0000 (05:43 +0530)]
drm/msm/a6xx: Fix gpu init from secure world
A7XX_GEN2 and newer GPUs requires initialization of few configurations
related to features/power from secure world. The SCM call to do this
should be triggered after GDSC and clocks are enabled. So, keep this
sequence to a6xx_gmu_resume instead of the probe.
Also, simplify the error handling in a6xx_gmu_resume() using 'goto'
labels.
Fixes: 14b27d5df3ea ("drm/msm/a7xx: Initialize a750 "software fuse"") Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714664/
Message-ID: <20260327-a8xx-gpu-batch2-v2-6-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:13:53 +0000 (05:43 +0530)]
drm/msm/a6xx: Correct OOB usage
During the GMU resume sequence, using another OOB other than OOB_GPU may
confuse the internal state of GMU firmware. To align more strictly with
the downstream sequence, move the sysprof related OOB setup after the
OOB_GPU is cleared.
Fixes: 62cd0fa6990b ("drm/msm/adreno: Disable IFPC when sysprof is active") Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714659/
Message-ID: <20260327-a8xx-gpu-batch2-v2-4-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:13:52 +0000 (05:43 +0530)]
drm/msm/a6xx: Switch to preemption safe AO counter
CP_ALWAYS_ON_COUNTER is not save-restored during preemption, so it won't
provide accurate data about the 'submit' when preemption is enabled.
Switch to CP_ALWAYS_ON_CONTEXT which is preemption safe.
Fixes: e7ae83da4a28 ("drm/msm/a6xx: Implement preemption for a7xx targets") Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714657/
Message-ID: <20260327-a8xx-gpu-batch2-v2-3-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:13:51 +0000 (05:43 +0530)]
drm/msm/a8xx: Fix the ticks used in submit traces
GMU_ALWAYS_ON_COUNTER_* registers got moved in A8x, but currently, A6x
register offsets are used in the submit traces instead of A8x offsets.
To fix this, refactor a bit and use adreno_gpu->funcs->get_timestamp()
everywhere.
While we are at it, update a8xx_gmu_get_timestamp() to use the GMU AO
counter.
Fixes: 288a93200892 ("drm/msm/adreno: Introduce A8x GPU Support") Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714655/
Message-ID: <20260327-a8xx-gpu-batch2-v2-2-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Akhil P Oommen [Fri, 27 Mar 2026 00:13:50 +0000 (05:43 +0530)]
drm/msm/a6xx: Use barriers while updating HFI Q headers
To avoid harmful compiler optimizations and IO reordering in the HW, use
barriers and READ/WRITE_ONCE helpers as necessary while accessing the HFI
queue index variables.
Fixes: 4b565ca5a2cb ("drm/msm: Add A6XX device support") Signed-off-by: Akhil P Oommen <akhilpo@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/714653/
Message-ID: <20260327-a8xx-gpu-batch2-v2-1-2b53c38d2101@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Yasuaki Torimaru [Wed, 25 Mar 2026 11:46:34 +0000 (20:46 +0900)]
drm/msm/gem: fix error handling in msm_ioctl_gem_info_get_metadata()
msm_ioctl_gem_info_get_metadata() always returns 0 regardless of
errors. When copy_to_user() fails or the user buffer is too small,
the error code stored in ret is ignored because the function
unconditionally returns 0. This causes userspace to believe the
ioctl succeeded when it did not.
Additionally, kmemdup() can return NULL on allocation failure, but
the return value is not checked. This leads to a NULL pointer
dereference in the subsequent copy_to_user() call.
Add the missing NULL check for kmemdup() and return ret instead of 0.
Note that the SET counterpart (msm_ioctl_gem_info_set_metadata)
correctly returns ret.
Fixes: 9902cb999e4e ("drm/msm/gem: Add metadata") Cc: stable@vger.kernel.org Signed-off-by: Yasuaki Torimaru <yasuakitorimaru@gmail.com>
Patchwork: https://patchwork.freedesktop.org/patch/714478/
Message-ID: <20260325114635.383241-1-yasuakitorimaru@gmail.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Rob Clark [Wed, 25 Mar 2026 18:41:05 +0000 (11:41 -0700)]
drm/msm/shrinker: Fix can_block() logic
The intention here was to allow blocking if DIRECT_RECLAIM or if called
from kswapd and KSWAPD_RECLAIM is set.
Reported by Claude code review: https://lore.gitlab.freedesktop.org/drm-ai-reviews/review-patch9-20260309151119.290217-10-boris.brezillon@collabora.com/ on a panthor patch which had copied similar logic.
Reported-by: Boris Brezillon <boris.brezillon@collabora.com> Fixes: 7860d720a84c ("drm/msm: Fix build break with recent mm tree") Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Patchwork: https://patchwork.freedesktop.org/patch/714238/
Message-ID: <20260325184106.1259528-1-robin.clark@oss.qualcomm.com>
Rob Clark [Tue, 24 Mar 2026 22:05:17 +0000 (15:05 -0700)]
drm/msm: Disallow foreign mapping of _NO_SHARE
This restriction applies to mapping of _NO_SHARE objs in the kms vm as
well as importing/exporting BOs. Since the DPU has it's own VM, scanout
counts as "exporting" a BO from outside of it's host VM.
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/713897/
Message-ID: <20260324220519.1221471-1-robin.clark@oss.qualcomm.com>
Rob Clark [Mon, 16 Mar 2026 18:44:42 +0000 (11:44 -0700)]
drm/msm/vma: Avoid lock in VM_BIND fence signaling path
Use msm_gem_unpin_active(), similar to what is used in the GEM_SUBMIT
path. This avoids needing to hold the obj lock, and the end result is
the same. (As with GEM_SUBMIT, we know the fence isn't signaled yet.)
Reported-by: Akhil P Oommen <akhilpo@oss.qualcomm.com> Fixes: 2e6a8a1fe2b2 ("drm/msm: Add VM_BIND ioctl") Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/712230/
Message-ID: <20260316184442.673558-1-robin.clark@oss.qualcomm.com>
Rob Clark [Mon, 16 Mar 2026 18:34:33 +0000 (11:34 -0700)]
drm/msm/adreno: Change chip_id format
The "ipv4-style" %u.%u.%u.%u used to make sense when the chip_id was
simply encoding gen.major.minor.patch. But this hasn't been true for
at least a couple years.
Switch to %08x, which is still easy enough to read for older devices,
and much easier to read with the new scheme.
Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/712222/
Message-ID: <20260316183436.671482-2-robin.clark@oss.qualcomm.com>
dt-bindings: display/msm/gpu: Drop redundant reg-names in one if:then:
Top-level reg-names defines already proper order for "reg-names" with
minItems: 1, so no need to repeat it again in one of "if:then:" cases.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Reviewed-by: David Heidelberg <david@ixit.cz> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Acked-by: Rob Herring (Arm) <robh@kernel.org>
Patchwork: https://patchwork.freedesktop.org/patch/707987/
Message-ID: <20260301142033.88851-2-krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Anna Maniscalco [Tue, 10 Feb 2026 16:29:42 +0000 (17:29 +0100)]
drm/msm: always recover the gpu
Previously, in case there was no more work to do, recover worker
wouldn't trigger recovery and would instead rely on the gpu going to
sleep and then resuming when more work is submitted.
Recover_worker will first increment the fence of the hung ring so, if
there's only one job submitted to a ring and that causes an hang, it
will early out.
There's no guarantee that the gpu will suspend and resume before more
work is submitted and if the gpu is in a hung state it will stay in that
state and probably trigger a timeout again.
Just stop checking and always recover the gpu.
Signed-off-by: Anna Maniscalco <anna.maniscalco2000@gmail.com> Cc: stable@vger.kernel.org
Patchwork: https://patchwork.freedesktop.org/patch/704066/
Message-ID: <20260210-recovery_suspend_fix-v1-1-00ed9013da04@gmail.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Signed-off-by: Richard Acayan <mailingradian@gmail.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Patchwork: https://patchwork.freedesktop.org/patch/703803/
Message-ID: <20260210014603.1372-2-mailingradian@gmail.com> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Forbid recover/fault worker to enter memory reclaim (under
gpu->lock) to address this deadlock scenario.
Cc: Tomasz Figa <tfiga@chromium.org> Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com>
Patchwork: https://patchwork.freedesktop.org/patch/700978/
Message-ID: <20260127073341.2862078-1-senozhatsky@chromium.org> Signed-off-by: Rob Clark <robin.clark@oss.qualcomm.com>
Jialin Wang [Tue, 31 Mar 2026 10:05:09 +0000 (10:05 +0000)]
blk-iocost: fix busy_level reset when no IOs complete
When a disk is saturated, it is common for no IOs to complete within a
timer period. Currently, in this case, rq_wait_pct and missed_ppm are
calculated as 0, the iocost incorrectly interprets this as meeting QoS
targets and resets busy_level to 0.
This reset prevents busy_level from reaching the threshold (4) needed
to reduce vrate. On certain cloud storage, such as Azure Premium SSD,
we observed that iocost may fail to reduce vrate for tens of seconds
during saturation, failing to mitigate noisy neighbor issues.
Fix this by tracking the number of IO completions (nr_done) in a period.
If nr_done is 0 and there are lagging IOs, the saturation status is
unknown, so we keep busy_level unchanged.
The issue is consistently reproducible on Azure Standard_D8as_v5 (Dasv5)
VMs with 512GB Premium SSD (P20) using the script below. It was not
observed on GCP n2d VMs (with 100G pd-ssd and 1.5T local-ssd), and no
regressions were found with this patch. In this script, cgA performs
large IOs with iodepth=128, while cgB performs small IOs with iodepth=1
rate_iops=100 rw=randrw. With iocost enabled, we expect it to throttle
cgA, the submission latency (slat) of cgA should be significantly higher,
cgB can reach 200 IOPS and the completion latency (clat) should below.
Jackie Liu [Tue, 31 Mar 2026 08:50:54 +0000 (16:50 +0800)]
blk-cgroup: fix disk reference leak in blkcg_maybe_throttle_current()
Add the missing put_disk() on the error path in
blkcg_maybe_throttle_current(). When blkcg lookup, blkg lookup, or
blkg_tryget() fails, the function jumps to the out label which only
calls rcu_read_unlock() but does not release the disk reference acquired
by blkcg_schedule_throttle() via get_device(). Since current->throttle_disk
is already set to NULL before the lookup, blkcg_exit() cannot release
this reference either, causing the disk to never be freed.
Restore the reference release that was present as blk_put_queue() in the
original code but was inadvertently dropped during the conversion from
request_queue to gendisk.
Fixes: f05837ed73d0 ("blk-cgroup: store a gendisk to throttle in struct task_struct") Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260331085054.46857-1-liu.yun@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
selftests: harness: Detect illegal mixing of kselftest and harness functionality
Users may accidentally use the kselftest_test_result_*() functions in
their harness tests. If ksft_finished() is not used, the results
reported in this way are silently ignored.
Detect such false-positive cases and fail the test.
A more correct test would be to reject *any* usage of the ksft APIs but
that would force code churn on users.
Correct usages, which do use ksft_finished() will not trigger this
validation as the test will exit before it.
Waiman Long [Tue, 31 Mar 2026 15:11:08 +0000 (11:11 -0400)]
cgroup/cpuset: Skip security check for hotplug induced v1 task migration
When a CPU hot removal causes a v1 cpuset to lose all its CPUs, the
cpuset hotplug handler will schedule a work function to migrate tasks
in that cpuset with no CPU to its ancestor to enable those tasks to
continue running.
If a strict security policy is in place, however, the task migration
may fail when security_task_setscheduler() call in cpuset_can_attach()
returns a -EACCES error. That will mean that those tasks will have
no CPU to run on. The system administrators will have to explicitly
intervene to either add CPUs to that cpuset or move the tasks elsewhere
if they are aware of it.
This problem was found by a reported test failure in the LTP's
cpuset_hotplug_test.sh. Fix this problem by treating this special case as
an exception to skip the setsched security check in cpuset_can_attach()
when a v1 cpuset with tasks have no CPU left.
With that patch applied, the cpuset_hotplug_test.sh test can be run
successfully without failure.
Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
Waiman Long [Tue, 31 Mar 2026 15:11:07 +0000 (11:11 -0400)]
cgroup/cpuset: Simplify setsched decision check in task iteration loop of cpuset_can_attach()
Centralize the check required to run security_task_setscheduler() in
the task iteration loop of cpuset_can_attach() outside of the loop as
it has no dependency on the characteristics of the tasks themselves.
There is no functional change.
Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
selftests/tracing: Fix to make --logdir option work again
Since commit a0aa283c53a7 ("selftest/ftrace: Generalise ftracetest to
use with RV") moved the default LOG_DIR setting after --logdir option
parser, it overwrites the user given LOG_DIR.
This fixes it to check the --logdir option parameter when setting new
default LOG_DIR with a new TOP_DIR.
nilfs2: reject zero bd_oblocknr in nilfs_ioctl_mark_blocks_dirty()
nilfs_ioctl_mark_blocks_dirty() uses bd_oblocknr to detect dead blocks
by comparing it with the current block number bd_blocknr. If they differ,
the block is considered dead and skipped.
However, bd_oblocknr should never be 0 since block 0 typically stores the
primary superblock and is never a valid GC target block. A corrupted ioctl
request with bd_oblocknr set to 0 causes the comparison to incorrectly
match when the lookup returns -ENOENT and sets bd_blocknr to 0, bypassing
the dead block check and calling nilfs_bmap_mark() on a non-existent
block. This causes nilfs_btree_do_lookup() to return -ENOENT, triggering
the WARN_ON(ret == -ENOENT).
Fix this by rejecting ioctl requests with bd_oblocknr set to 0 at the
beginning of each iteration.
[ryusuke: slightly modified the commit message and comments for accuracy]
Linus Torvalds [Tue, 31 Mar 2026 17:28:08 +0000 (10:28 -0700)]
Merge tag 'fs_for_v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull udf fix from Jan Kara:
"Fix for a race in UDF that can lead to memory corruption"
* tag 'fs_for_v7.0-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
udf: Fix race between file type conversion and writeback
mpage: Provide variant of mpage_writepages() with own optional folio handler
Rosen Penev [Fri, 27 Mar 2026 02:48:28 +0000 (19:48 -0700)]
EDAC/mc: Use kzalloc_flex()
Convert struct mem_ctl_info to use flex array and use the new flex array
helpers to enable runtime bounds checking, including annotating the array
length member with __counted_by() for extra runtime analysis when requested.
Move memcpy() after the counter assignment so that it is initialized before
the first reference to the flex array, as the new attribute requires.
Biju Das [Sat, 28 Mar 2026 10:33:18 +0000 (10:33 +0000)]
irqchip/renesas-rzg2l: Clear the shared interrupt bit in rzg2l_irqc_free()
rzg2l_irqc_free() invokes irq_domain_free_irqs_common(), which internally
calls irq_domain_reset_irq_data(). That explicitly sets irq_data->hwirq to
0. Consequently, irqd_to_hwirq(d) returns 0 when called after it.
Since 0 falls outside the valid shared IRQ ranges,
rzg2l_irqc_is_shared_and_get_irq_num() evaluates to false, completely
bypassing the test_and_clear_bit() operation.
This leaves the bit set in priv->used_irqs, causing future allocations to
fail with -EBUSY.
Fix this by retrieving irq_data and caching hwirq before calling
irq_domain_free_irqs_common().
Ethan Tidmore [Tue, 24 Mar 2026 17:38:30 +0000 (12:38 -0500)]
ASoC: SOF: Intel: hda: Place check before dereference
The struct hext_stream is dereferenced before it is checked for NULL.
Although it can never be NULL due to a check prior to
hda_dsp_iccmax_stream_hw_params() being called, this change clears any
confusion regarding hext_stream possibly being NULL.
Check hext_stream for NULL and then assign its members.
Detected by Smatch:
sound/soc/sof/intel/hda-stream.c:488 hda_dsp_iccmax_stream_hw_params() warn:
variable dereferenced before check 'hext_stream' (see line 486)
drivers/video/fbdev/aty/atyfb_base.c:2327:24: warning: variable 'fb_list' set but not used [-Wunused-but-set-global]
2327 | static struct fb_info *fb_list = NULL;
Indeed, the last user of fb_list was removed in 2004, while the actual
linked list was removed in 2002.
Damien Le Moal [Thu, 26 Mar 2026 20:32:45 +0000 (05:32 +0900)]
zloop: add max_open_zones option
Introduce the new max_open_zones option to allow specifying a limit on
the maximum number of open zones of a zloop device. This change allows
creating a zloop device that can more closely mimick the characteristics
of a physical SMR drive.
When set to a non zero value, only up to max_open_zones zones can be in
the implicit open (BLK_ZONE_COND_IMP_OPEN) and explicit open
(BLK_ZONE_COND_EXP_OPEN) conditions at any time. The transition to the
implicit open condition of a zone on a write operation can result in an
implicit close of an already implicitly open zone. This is handled in
the function zloop_do_open_zone(). This function also handles
transitions to the explicit open condition. Implicit close transitions
are handled using an LRU ordered list of open zones which is managed
using the helper functions zloop_lru_rotate_open_zone() and
zloop_lru_remove_open_zone().
Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260326203245.946830-1-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
Unbeknownst to me, and unremarked upon by the checkpatch maintainer, this
same problem was also solved in the mm tree. Fixing it once is enough, so
this one comes out.
ALSA: hda/realtek: Add quirk for Lenovo Yoga Pro 7 14IMH9
The Lenovo Yoga Pro 7 14IMH9 (DMI: 83E2) shares PCI SSID 17aa:3847
with the Legion 7 16ACHG6, but has a different codec subsystem ID
(17aa:38cf). The existing SND_PCI_QUIRK for 17aa:3847 applies
ALC287_FIXUP_LEGION_16ACHG6, which attempts to initialize an external
I2C amplifier (CLSA0100) that is not present on the Yoga Pro 7 14IMH9.
As a result, pin 0x17 (bass speakers) is connected to DAC 0x06 which
has no volume control, making hardware volume adjustment completely
non-functional. Audio is either silent or at maximum volume regardless
of the slider position.
Add a HDA_CODEC_QUIRK entry using the codec subsystem ID (17aa:38cf)
to correctly identify the Yoga Pro 7 14IMH9 and apply
ALC287_FIXUP_YOGA9_14IMH9_BASS_SPK_PIN, which redirects pin 0x17 to
DAC 0x02 and restores proper volume control. The existing Legion entry
is preserved unchanged.
This follows the same pattern used for 17aa:386e, where Legion Y9000X
and Yoga Pro 7 14ARP8 share a PCI SSID but are distinguished via
HDA_CODEC_QUIRK.
Xiang Mei [Sat, 28 Mar 2026 06:30:00 +0000 (23:30 -0700)]
bridge: mrp: reject zero test interval to avoid OOM panic
br_mrp_start_test() and br_mrp_start_in_test() accept the user-supplied
interval value from netlink without validation. When interval is 0,
usecs_to_jiffies(0) yields 0, causing the delayed work
(br_mrp_test_work_expired / br_mrp_in_test_work_expired) to reschedule
itself with zero delay. This creates a tight loop on system_percpu_wq
that allocates and transmits MRP test frames at maximum rate, exhausting
all system memory and causing a kernel panic via OOM deadlock.
The same zero-interval issue applies to br_mrp_start_in_test_parse()
for interconnect test frames.
Use NLA_POLICY_MIN(NLA_U32, 1) in the nla_policy tables for both
IFLA_BRIDGE_MRP_START_TEST_INTERVAL and
IFLA_BRIDGE_MRP_START_IN_TEST_INTERVAL, so zero is rejected at the
netlink attribute parsing layer before the value ever reaches the
workqueue scheduling code. This is consistent with how other bridge
subsystems (br_fdb, br_mst) enforce range constraints on netlink
attributes.
Fixes: 20f6a05ef635 ("bridge: mrp: Rework the MRP netlink interface") Fixes: 7ab1748e4ce6 ("bridge: mrp: Extend MRP netlink interface for configuring MRP interconnect") Reported-by: Weiming Shi <bestswngs@gmail.com> Signed-off-by: Xiang Mei <xmei5@asu.edu> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260328063000.1845376-1-xmei5@asu.edu Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Alexander Duyck [Fri, 27 Mar 2026 20:44:45 +0000 (13:44 -0700)]
fbnic: Set Relaxed Ordering PCIe TLP attributes for DMA engines
Add ATTR CSR bit field definitions for the DMA engine TLP header
configuration registers:
AW_CFG: RDE_ATTR[17:15], RQM_ATTR[14:12], TQM_ATTR[11:9]
AR_CFG: TDE_ATTR[17:15], RQM_ATTR[14:12], TQM_ATTR[11:9]
These fields control the PCIe TLP attribute bits for outbound
transactions from the TQM, RQM, RDE (write path), and TDE (read path)
DMA engines. An enum is added with standard PCIe TLP attribute values:
NS (No Snoop), RO (Relaxed Ordering), and IDO (ID-based Ordering).
Read the PCIe Relaxed Ordering capability at probe time and store it in
fbnic_dev. Configure Relaxed Ordering on the PCIe TLP attributes in
fbnic_mbx_init_desc_ring when the capability is enabled. For the write
path (AW_CFG), set RO on RDE and TQM attributes. For the read path
(AR_CFG), set RO on all three attributes (TDE, RQM, TQM). This allows
the PCIe fabric to reorder these transactions for improved throughput.
David Timber [Mon, 16 Mar 2026 21:41:37 +0000 (06:41 +0900)]
exfat: fix s_maxbytes
With fallocate support, xfstest unit generic/213 fails with
QA output created by 213
We should get: fallocate: No space left on device
Strangely, xfs_io sometimes says "Success" when something went wrong
-fallocate: No space left on device
+fallocate: File too large
because sb->s_maxbytes is set to the volume size.
To be in line with other non-extent-based filesystems, set to max volume
size possible with the cluster size of the volume.
Signed-off-by: David Timber <dxdt@dev.snart.me> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org>
snd_soc_unbind_card() calls snd_soc_flush_all_delayed_work() (A),
but it will be called in soc_cleanup_card_resources() (B).
It is duplicated, let's remove it.
(B) static void soc_cleanup_card_resources(...)
{
...
/* flush delayed work before removing DAIs and DAPM widgets */
(A)' snd_soc_flush_all_delayed_work(card);
...
}
static void snd_soc_unbind_card(...)
{
if (snd_soc_card_is_instantiated(card)) {
card->instantiated = false;
Jackie Liu [Tue, 31 Mar 2026 11:12:16 +0000 (19:12 +0800)]
block: fix zones_cond memory leak on zone revalidation error paths
When blk_revalidate_disk_zones() fails after disk_revalidate_zone_resources()
has allocated args.zones_cond, the memory is leaked because no error path
frees it.
Fixes: 6e945ffb6555 ("block: use zone condition to determine conventional zones") Suggested-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Link: https://patch.msgid.link/20260331111216.24242-1-liu.yun@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>
Julian Braha [Wed, 25 Mar 2026 00:15:21 +0000 (00:15 +0000)]
ASoC: Intel: boards: fix unmet dependency on PINCTRL
This reverts commit c073f0757663 ("ASoC: Intel: sof_sdw: select PINCTRL_CS42L43 and SPI_CS42L43")
Currently, SND_SOC_INTEL_SOUNDWIRE_SOF_MACH selects PINCTRL_CS42L43
without also selecting or depending on PINCTRL, despite PINCTRL_CS42L43
depending on PINCTRL.
In response to v1 of this patch [1], Arnd pointed out that there is
no compile-time dependency sof_sdw and the PINCTRL_CS42L43 driver.
After testing, I can confirm that the kernel compiled with
SND_SOC_INTEL_SOUNDWIRE_SOF_MACH enabled and PINCTRL_CS42L43 disabled.
This unmet dependency was detected by kconfirm, a static analysis
tool for Kconfig.