Chen Yu [Wed, 13 May 2026 20:39:24 +0000 (13:39 -0700)]
sched/cache: Fix cache aware scheduling enabling for multi LLCs system
If there are multiple LLCs in the system, cache aware scheduling
should be enabled. However, there is a corner case where, if there
is a single NUMA node and a single LLC per node, cache aware
scheduling will be turned on in the current implementation -
because at this moment, the parent domain has not yet been
degenerated, and it is possible that the current domain has the
same cpu span as its parent. There is no need to turn cache aware
scheduling on in this scenario.
Fix it by iterating the parent domains to find a domain that is
a superset of the current sd_llc, so that later, after the duplicated
parent domains have been degenerated, cache aware scheduling will
take effect.
For example, the expected behavior would be:
2 sockets, 1 LLC per socket: MC span=0-3, PKG span=0-7, has_multi_llcs=true
1 socket, 2 LLCs per socket: MC span=0-3, PKG span=0-7, has_multi_llcs=true
2 sockets, 2 LLCs per socket: MC span=0-3, PKG span=0-7, has_multi_llcs=true
1 socket, 1 LLC per socket: MC span=0-3, PKG span=0-3, has_multi_llcs=false
Chen Yu [Wed, 13 May 2026 20:39:23 +0000 (13:39 -0700)]
sched/cache: Fix race condition during sched domain rebuild
sched_cache_active_set_unlocked() checks hardware support without
locks:
static void sched_cache_active_set(bool locked)
{
/* hardware does not support */
if (!static_branch_likely(&sched_cache_present)) {
_sched_cache_active_set(false, locked);
return;
}
...
If build_sched_domains() runs concurrently during CPU hotplug,
it can disable sched_cache_present under sched_domains_mutex
and the CPU hotplug lock. If a debugfs write thread evaluates
sched_cache_present as true right before that, and then blocks
or gets preempted, it might proceed to enable sched_cache_active
after the hardware support has been marked as absent. Make it
safer by acquiring cpus_read_lock() and sched_domains_mutex_lock()
when the user changes sched_cache_active via debugfs.
Chen Yu [Wed, 13 May 2026 20:39:22 +0000 (13:39 -0700)]
sched/cache: Fix checking active load balance by only considering the CFS task
The currently running task cur may not be a CFS task, such as
an RT or Deadline task. For non-CFS tasks, the task_util(cur)
utilization average is not maintained, so this might pass a
stale or meaningless value to can_migrate_llc().
Check if the task is CFS before getting its task_util().
There is a race condition that, after a task is enqueued
on a runqueue, task_llc(p) may change due to CPU hotplug,
because the llc_id is dynamically allocated and adjusted
at runtime.
Therefore, checking task_llc(p) to determine whether the
task is being dequeued from its preferred LLC is unreliable
and can cause inconsistent values.
To fix this problem, record whether p is enqueued on its
preferred LLC, in order to pair with account_llc_dequeue()
to maintain a consistent nr_pref_llc_running per runqueue.
This bug was reported by sashiko, and the solution was once
suggested by Prateek.
Fixes: 46afe3af7ead ("sched/cache: Track LLC-preferred tasks per runqueue") Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/0c8c6a1571d66792a4d2ff0103ba3cc13e059046.1778703694.git.tim.c.chen@linux.intel.com
Chen Yu [Wed, 13 May 2026 20:39:20 +0000 (13:39 -0700)]
sched/cache: Annotate lockless accesses to mm->sc_stat.cpu
mm->sc_stat.cpu is written by task_cache_work() and could be read
locklessly by several functions on other CPUs. Use READ_ONCE and
WRITE_ONCE on mm->sc_stat.cpu access and write to prevent inconsistent
values from compiler optimizations when there are multiple accesses.
For example in get_pref_llc(), if the writer updated the field between
two compiler-generated loads, the validation (e.g., cpu != -1) and
subsequent use (e.g., llc_id(cpu)) could operate on different values,
allowing a negative CPU ID to be used as an index.
Leave plain write in mm_init_sched(), where the mm is not
yet visible to other CPUs.
Chen Yu [Wed, 13 May 2026 20:39:19 +0000 (13:39 -0700)]
sched/cache: Fix potential NULL mm pointer access
A concurrent task exit might cause a NULL pointer dereference
in account_mm_sched(). Use the locally cached mm pointer instead,
since the active_mm reference guarantees the structure remains
allocated. Meanwhile, skip the kernel thread because it has
nothing to do with cache aware scheduling.
Chen Yu [Wed, 13 May 2026 20:39:17 +0000 (13:39 -0700)]
sched/cache: Add user control to adjust the aggressiveness of cache-aware scheduling
Introduce a set of debugfs knobs to control how aggressively the
cache aware scheduling does the task aggregation.
(1) aggr_tolerance
With sched_cache enabled, the scheduler uses a process's footprint
as a proxy for its LLC footprint to determine if aggregating tasks
on the preferred LLC could cause cache contention. If the footprint
exceeds the LLC size, aggregation is skipped. Since the kernel
cannot efficiently track per-task cache usage (resctrl is
user-space only), userspace can provide a more accurate hint.
Introduce /sys/kernel/debug/sched/llc_balancing/aggr_tolerance to
let users control how strictly footprint limits aggregation. Values
range from 0 to 100:
- 0: Cache-aware scheduling is disabled.
- 1: Strict; tasks with footprint larger than LLC size are skipped.
- >=100: Aggressive; tasks are aggregated regardless of footprint.
For example, with a 32MB L3 cache:
- aggr_tolerance=1 -> tasks with footprint > 32MB are skipped.
- aggr_tolerance=99 -> tasks with footprint > 784GB are skipped
(784GB = (1 + (99 - 1) * 256) * 32MB).
Similarly, /sys/kernel/debug/sched/llc_balancing/aggr_tolerance also
controls how strictly the number of active threads is considered when
doing cache aware load balance. The number of SMTs is also considered.
High SMT counts reduce the aggregation capacity, preventing excessive
task aggregation on SMT-heavy systems like Power10/Power11.
Yangyu suggested introducing separate aggregation controls for the
number of active threads and memory footprint checks. Since there are
plans to add per-process/task group controls, fine-grained tunables are
deferred to that implementation.
(2) epoch_period, epoch_affinity_timeout,
imb_pct, overaggr_pct are also turned into tunables.
Chen Yu [Wed, 13 May 2026 20:39:16 +0000 (13:39 -0700)]
sched/cache: Avoid cache-aware scheduling for memory-heavy processes
Prateek and Tingyin reported that memory-intensive workloads (such as
stream) can saturate memory bandwidth and caches on the preferred LLC
when sched_cache aggregates too many threads.
To mitigate this, estimate a process's memory footprint by comparing
its NUMA balancing fault statistics to the size of the LLC. If the
footprint exceeds the LLC size, skip cache-aware scheduling.
Note that footprint is only an approximation of the memory footprint,
since the kernel lacks suitable metrics to estimate the real working
set. If a user-provided hint is available in the future, it would be
more accurate. A later patch will allow users to provide a hint to
adjust this threshold.
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com> Suggested-by: Vern Hao <vernhao@tencent.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Tingyin Duan <tingyin.duan@gmail.com> Link: https://patch.msgid.link/95cf64a385bcc12f18dcebe9d59e8d3ba8bb318f.1778703694.git.tim.c.chen@linux.intel.com
Chen Yu [Wed, 13 May 2026 20:39:15 +0000 (13:39 -0700)]
sched/cache: Calculate the LLC size and store it in sched_domain
Cache aware scheduling needs to know the LLC size that a process
can use, so as to avoid memory-intensive tasks from being
over-aggregated on a single LLC.
Introduce a preparation patch to add get_effective_llc_bytes() to
get the LLC size that a CPU can use. The function can be further
enhanced by subtracting the LLC cache ways reserved by resctrl
(CAT in Intel RDT, etc).
Suggested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Tingyin Duan <tingyin.duan@gmail.com> Link: https://patch.msgid.link/37afee09ff608034da0ce149e72d33b6f4698edf.1778703694.git.tim.c.chen@linux.intel.com
Chen Yu [Wed, 13 May 2026 20:39:14 +0000 (13:39 -0700)]
sched/cache: Skip cache-aware scheduling for single-threaded processes
For a single thread, the current wakeup path tends to place it
on the same LLC where it was previously running with cache-hot
data. There is no need to enable cache-aware scheduling for
single-threaded processes for the following reasons:
1. Cache-aware scheduling primarily benefits multi-threaded
processes where threads share data. Single-threaded processes
typically have no inter-thread data sharing and thus gain little.
2. Enabling it incurs the additional overhead of tracking the
thread's residency in the LLCs.
3. Bypassing single-threaded processes avoids excessive
concentration of such tasks on a single LLC.
Nevertheless, this check can be omitted if users explicitly
provide hints for such single-threaded workloads where different
processes have shared memory, e.g., via prctl() or other interfaces
to be added in the future.
Chen Yu [Wed, 13 May 2026 20:39:13 +0000 (13:39 -0700)]
sched/cache: Disable cache aware scheduling for processes with high thread counts
A performance regression was observed by Prateek when running hackbench
with many threads per process (high fd count). To avoid this, processes
with a large number of active threads are excluded from cache-aware
scheduling.
With sched_cache enabled, record the number of active threads in each
process during the periodic task_cache_work(). While iterating over
CPUs, if the currently running task belongs to the same process as the
task that launched task_cache_work(), increment the active thread count.
If the number of active threads within the process exceeds the number
of Cores (divided by the SMT number) in the LLC, do not enable
cache-aware scheduling. However, on systems with a smaller number of
CPUs within 1 LLC, like Power10/Power11 with SMT4 and an LLC size of 4,
this check effectively disables cache-aware scheduling for any process.
One possible solution suggested by Peter is to use an LLC-mask instead
of a single LLC value for preference. Once there are a 'few' LLCs as
preference, this constraint becomes a little easier. It could be an
enhancement in the future.
For users who wish to perform task aggregation regardless, a debugfs knob
is provided for tuning in a subsequent change.
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com> Suggested-by: Aaron Lu <ziqianlu@bytedance.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Co-developed-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Tingyin Duan <tingyin.duan@gmail.com> Link: https://patch.msgid.link/d076cd21a8e6c6341d1e2d927e118db770ebb650.1778703694.git.tim.c.chen@linux.intel.com
Jianyong Wu [Wed, 13 May 2026 20:39:12 +0000 (13:39 -0700)]
sched/cache: Allow only 1 thread of the process to calculate the LLC occupancy
Scanning online CPUs to calculate the occupancy might be
time-consuming. Only allow 1 thread of the process to scan
the CPUs at the same time, which is similar to what
NUMA balance does in task_numa_work().
drm/xe/multi_queue: Fix secondary queue error case
If xe_lrc_create() fails, the secondary queue added to the
multi-queue group list is not removed before freeing the
queue. Fix error path handling for secondary queues by
removing it from the multi-queue group list at the right
place.
Reported-by: Sebastian Ă–sterlund <sebastian.osterlund@intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/work_items/7979 Fixes: d716a5088c88 ("drm/xe/multi_queue: Handle tearing down of a multi queue") Cc: stable@vger.kernel.org # v7.0+ Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patch.msgid.link/20260518191639.320890-2-niranjana.vishwanathapura@intel.com
Qing Ming [Sat, 16 May 2026 07:08:49 +0000 (15:08 +0800)]
cgroup/rstat: validate cpu before css_rstat_cpu() access
css_rstat_updated() is exposed as a BPF kfunc and accepts a
caller-provided cpu argument. The function uses cpu for per-cpu rstat
lookups without checking whether it refers to a valid possible CPU.
A BPF iter/cgroup program with CAP_BPF and CAP_PERFMON can pass an
invalid cpu value. On an unfixed UBSCAN_BOUNDS test kernel, cpu ==
0x7fffffff triggers:
UBSAN: array-index-out-of-bounds in kernel/cgroup/rstat.c:31:9
index 2147483647 is out of range for type 'long unsigned int [64]'
Call Trace:
css_rstat_updated
bpf_iter_run_prog
cgroup_iter_seq_show
bpf_seq_read
Add cpu validation to the BPF-facing css_rstat_updated() kfunc and
move the common implementation to __css_rstat_updated() for in-kernel
callers.
Fixes: a319185be9f5 ("cgroup: bpf: enable bpf programs to integrate with rstat") Signed-off-by: Qing Ming <a0yami@mailbox.org> Signed-off-by: Tejun Heo <tj@kernel.org>
Paul E. McKenney [Mon, 11 May 2026 17:54:41 +0000 (19:54 +0200)]
srcu: Don't queue workqueue handlers to never-online CPUs
While an srcu_struct structure is in the midst of switching from CPU-0
to all-CPUs state, it can attempt to invoke callbacks for CPUs that
have never been online. Worse yet, it can attempt in invoke callbacks
for CPUs that never will be online, even including imaginary CPUs not in
cpu_possible_mask. This can cause hangs on s390, which is not set up to
deal with workqueue handlers being scheduled on such CPUs. This commit
therefore causes Tree SRCU to refrain from queueing workqueue handlers
on CPUs that have not yet (and might never) come online.
Because callbacks are not invoked on CPUs that have not been
online, it is an error to invoke call_srcu(), synchronize_srcu(), or
synchronize_srcu_expedited() on a CPU that is not yet fully online.
However, it turns out to be less code to redirect the callbacks
from too-early invocations of call_srcu() than to warn about such
invocations. This commit therefore also redirects callbacks queued on
not-yet-fully-online CPUs to the boot CPU.
Reported-by: Vasily Gorbik <gor@linux.ibm.com> Fixes: 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when non-preemptible") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Samir <samir@linux.ibm.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Boqun Feng <boqun@kernel.org>
Ashutosh Desai [Sun, 10 May 2026 20:31:28 +0000 (20:31 +0000)]
drm/dp/mst: fix OOB reads on 2-byte fields in sideband reply parsers
Three sideband reply parsers read 16-bit fields as:
val = (raw->msg[idx] << 8) | (raw->msg[idx+1]);
and check bounds only after the fact. When idx == raw->curlen,
raw->msg[idx+1] reads one byte past the received message data into
the following struct fields (curchunk_len, curchunk_idx, curlen).
Affected functions:
- drm_dp_sideband_parse_enum_path_resources_ack()
full_payload_bw_number and avail_payload_bw_number fields
- drm_dp_sideband_parse_allocate_payload_ack()
allocated_pbn field
- drm_dp_sideband_parse_query_payload_ack()
allocated_pbn field
Fix by using a single combined check (idx + 2 > curlen) before each
2-byte read. Since the check is strictly tighter than idx > curlen,
no separate step is needed.
Ashutosh Desai [Sun, 10 May 2026 20:17:33 +0000 (20:17 +0000)]
drm/dp/mst: fix OOB reads in remote DPCD/I2C sideband reply parsers
drm_dp_sideband_parse_remote_dpcd_read() reads num_bytes from the raw
message and then unconditionally does:
memcpy(bytes, &raw->msg[idx], num_bytes);
without checking that idx + num_bytes <= raw->curlen. raw->msg[] is
256 bytes; if a malicious or misbehaving MST hub sets num_bytes larger
than the remaining payload, the memcpy reads past the received data
into whatever follows in raw->msg[].
drm_dp_sideband_parse_remote_i2c_read_ack() has the same flaw (noted
with a /* TODO check */ comment since the code was introduced).
Fix both functions by using a single combined check
(idx + num_bytes > curlen) before each memcpy. Since num_bytes is u8,
it is always >= 0, so this strictly subsumes the simpler idx > curlen
form and no separate step is needed.
Mark Brown [Mon, 18 May 2026 16:44:18 +0000 (17:44 +0100)]
ASoC: Add support for GPIOs driven amplifiers
Herve Codina <herve.codina@bootlin.com> says:
On some embedded system boards, audio amplifiers are designed using
discrete components such as op-amp, several resistors and switches to
either adjust the gain (switching resistors) or fully switch the
audio signal path (mute and/or bypass features).
Those switches are usually driven by simple GPIOs.
This kind of amplifiers are not handled in ASoC and the fallback is to
let the user-space handle those GPIOs out of the ALSA world.
In order to have those kind of amplifiers fully integrated in the audio
stack, this series introduces the audio-gpio-amp to handle them.
This new ASoC component allows to have the amplifiers seen as ASoC
auxiliarty devices and so it allows to control them through audio mixer
controls.
In order to ease the review, I choose to split modifications related
to the merge of the gpio-audio-amp part into the simple-amplfier driver
in several commits.
Herve Codina [Wed, 13 May 2026 08:17:00 +0000 (10:17 +0200)]
ASoC: simple-amplifier: Update author and copyright
After reworking the simple-amplifier driver and adding support for
gpio-audio-amp in the driver, add myself as the author of the
gpio-audio-amp part of the driver and add a related copyright.
Herve Codina [Wed, 13 May 2026 08:16:59 +0000 (10:16 +0200)]
ASoC: simple-amplifier: gpio-audio-amp: Add support for gain-labels
The possible gain values can be described using labels instead of gain
values in dB.
Those different labels are attached to a gpio values using the
gain-labels property.
Using the gain-labels description is mutually exclusive with gain-ranges
description used to describe the relationship between gpios values and
gain values.
Handle the gain-labels description and the related kcontrol.
Herve Codina [Wed, 13 May 2026 08:16:58 +0000 (10:16 +0200)]
ASoC: simple-amplifier: gpio-audio-amp: Add support for gain-ranges
The mapping between physical gain values and gpio values can be
expressed using ranges described in the gain-ranges property.
This gain-ranges property is an array of ranges.
Each range in the array is defined by the first point and last point in
the range. Those points are a pair of values, the gpios value and the
related gain (dB) value.
With that, a given range defines N possible items (from the first point
gpios value to the last point gpios value) in order to set a gain from
the first point gain value to the last point gain value.
Herve Codina [Wed, 13 May 2026 08:16:53 +0000 (10:16 +0200)]
ASoC: simple-amplifier: Introduce support for gpio-audio-amp
Improve the simple-amplifier introducing preliminary support for
gpio-audio-amp.
Those amplifiers are amplifiers driven by gpios.
This support introduction doesn't handle any GPIO yet but introduces
the compatible strings and the related DAPM table.
Two gpio-audio-amp are available: A mono and a stereo version.
The mono version has only one audio channel and gpio settings impact
features such as the gain or mute of this sole channel.
The stereo version has two channels (left and right). Gpio settings
impact both channels in the same manner and at the same time. For
instance, the gain setting set the gain of both channels as well as
the mute setting mutes both channels.
Herve Codina [Wed, 13 May 2026 08:16:52 +0000 (10:16 +0200)]
ASoC: simple-amplifier: Remove DAPM widgets and routes from the ASoC component driver
The simple-amplifier set the DAPM wigets and routes table in the ASoC
component driver. This is perfectly fine when the component has well
known DAPM tables.
The simple-amplifier is going to handle several kind of components based
on the driver compatible string. The DAPM table will not be the same for
all components supported by the driver.
In order to have different DAPM table based on matching compatible
strings, move those tables from the ASoC component driver to the device
compatible string matching data.
Add those DAPM widgets and routes dynamically during the ASoC component
probe operation.
Herve Codina [Wed, 13 May 2026 08:16:46 +0000 (10:16 +0200)]
ASoC: dt-bindings: Add support for the GPIOs driven amplifier
Some amplifiers based on analog switches and op-amps can be present in
the audio path and can be driven by GPIOs in order to control their gain
value, their mute and/or bypass functions.
Those components needs to be viewed as audio components in order to be
fully integrated in the audio path.
gpio-audio-amp allows to consider these GPIO driven amplifiers as
auxiliary audio devices.
Herve Codina [Wed, 13 May 2026 08:16:45 +0000 (10:16 +0200)]
of: Introduce of_property_read_s32_index()
Signed integers can be read from single value properties using
of_property_read_s32() but nothing exist to read signed integers
from multi-value properties.
Fix this lack adding of_property_read_s32_index().
Johan Hovold [Tue, 12 May 2026 07:48:09 +0000 (09:48 +0200)]
spi: ti-qspi: fix use-after-free after DMA setup failure
The driver falls back to PIO mode if DMA setup fails during probe.
Make sure to clear the DMA channel pointer also if buffer allocation
fails to avoid passing a pointer to the released channel to the DMA
engine (or trying to free the channel a second time on late probe errors
or driver unbind).
This issue was flagged by Sashiko when reviewing a devres allocation
conversion patch.
Johan Hovold [Tue, 12 May 2026 07:47:33 +0000 (09:47 +0200)]
spi: sprd: fix error pointer deref after DMA setup failure
The driver falls back to PIO mode if DMA setup fails during probe.
Make sure to check the dma.enabled flag before trying to release the DMA
channels also on late probe errors to avoid dereferencing an error
pointer (or attempting to release a channel a second time).
This issue was flagged by Sashiko when reviewing a devres allocation
conversion patch.
Shengjiu Wang [Tue, 12 May 2026 06:52:52 +0000 (14:52 +0800)]
ASoC: fsl_sai: Eliminate possible interrupt storm during probe
When the SAI peripheral is left in a running state by the bootloader,
the driver can experience an interrupt storm during probe that prevents
successful initialization. This occurs because the current code registers
the IRQ handler before resetting the hardware to a known state.
The issue manifests as:
- Continuous interrupts firing immediately after devm_request_irq()
- Driver probe failure or system hang
- Error messages about unhandled interrupts
This is particularly problematic on systems where U-Boot or other
bootloaders enable SAI for boot-time audio feedback or diagnostics
and don't properly disable it before handing control to Linux.
Fix this by reordering the probe sequence:
1. Add fsl_sai_reset_hw() to clear TCSR/RCSR control registers,
which disables the transmitter/receiver and all interrupt sources
2. Move devm_request_irq() to after hardware initialization
This ensures the SAI is in a clean reset state before the interrupt
handler can be invoked, preventing the storm while maintaining proper
error handling and cleanup paths.
Johan Hovold [Tue, 12 May 2026 07:43:34 +0000 (09:43 +0200)]
spi: qup: fix error pointer deref after DMA setup failure
The driver falls back to PIO mode if DMA setup fails during probe.
Make sure to the clear the DMA channel pointers on setup failure to
avoid dereferencing an error pointer (or attempting to release a channel
a second time) on later probe errors or driver unbind.
This issue was flagged by Sashiko when reviewing a devres allocation
conversion patch.
The MT8196 AFE probe assigns reserved memory with
of_reserved_mem_device_init(), but never releases it.
This leaks the reserved memory assignment on driver
removal and on later probe failures.
The same probe path also uses unchecked pm_runtime_get_sync() calls.
A failure while resuming the device can leave the runtime PM usage
count in an unexpected state.
The regmap error path returns directly while the device is still
runtime active, and the remove path drops a runtime PM reference even
though successful probe has already released its temporary reference.
Register a devm cleanup action for the reserved memory assignment,
use pm_runtime_resume_and_get(), and only drop runtime PM references
on paths where they are actually held.
Arnd Bergmann [Mon, 18 May 2026 14:59:15 +0000 (16:59 +0200)]
Merge tag 'soc_fsl-7.1-2' of https://git.kernel.org/pub/scm/linux/kernel/git/chleroy/linux into soc/drivers
FSL SOC Changes for 7.1
Freescale QUICC Engine:
- Add missing cleanup on device removal and switch to irq_domain_create_linear()
in interrupt controller for IO Ports
- Panic on ioremap() failure in qe_reset()
Freescale Management Complex:
- Move fsl-mc over to device MSI infrastructure
- Wait for the MC firmware to complete its boot
* tag 'soc_fsl-7.1-2' of https://git.kernel.org/pub/scm/linux/kernel/git/chleroy/linux:
bus: fsl-mc: wait for the MC firmware to complete its boot
soc: fsl: qe: panic on ioremap() failure in qe_reset()
soc: fsl: qe_ports_ic: switch to irq_domain_create_linear()
soc: fsl: qe_ports_ic: Add missing cleanup on device removal
virt: fsl_hypervisor: fix header kernel-doc warnings
platform-msi: Remove stale comment
fsl-mc: Remove legacy MSI implementation
fsl-mc: Switch over to per-device platform MSI
irqchip/gic-v3-its: Add fsl_mc device plumbing to the msi-parent handling
fsl-mc: Add minimal infrastructure to use platform MSI
fsl-mc: Remove MSI domain propagation to sub-devices
io_uring: propagate array_index_nospec opcode into req->opcode
Commit 1e988c3fe126 ("io_uring: prevent opcode speculation") added
array_index_nospec() to io_init_req(), but applied it only to a local
opcode variable. req->opcode is initialized from sqe->opcode before the
bounds check and remains the raw value.
Keep req->opcode as the canonical opcode in io_init_req(): reject
out-of-range values architecturally, then write the array_index_nospec()
result back to req->opcode before any table lookup. This keeps downstream
users of req->opcode from observing the raw user byte on a mispredicted
path.
No functional change: array_index_nospec() is a no-op for opcodes in
[0, IORING_OP_LAST), and out-of-range opcodes are still rejected at the
bounds check above the assignment.
arm64: defconfig: Enable PCI M.2 power sequencing driver
POWER_SEQUENCING_PCIE_M2 driver handles power supply to the PCIe M.2
connectors and is required on wide variety of ARM64 platforms such as
Qcom Snapdragon X Elite laptops and Mediatek Dojo Chromebooks.
soc: qcom: ice: Return proper error codes from devm_of_qcom_ice_get() instead of NULL
devm_of_qcom_ice_get() currently returns NULL if ICE SCM is not available
or "qcom,ice" property is not found in DT. But this confuses the clients
since NULL doesn't convey the reason for failure. So return proper error
codes instead of NULL.
soc: qcom: ice: Return -ENODEV if the ICE platform device is not found
By the time the consumer driver calls devm_of_qcom_ice_get(), all the
platform devices for ICE nodes would've been created by
of_platform_default_populate().
So for the absence of any platform device, -ENODEV should not returned, not
-EPROBE_DEFER.
Fixes: 2afbf43a4aec ("soc: qcom: Make the Qualcomm UFS/SDCC ICE a dedicated driver") Tested-by: Sumit Garg <sumit.garg@oss.qualcomm.com> # OP-TEE as TZ Acked-by: Sumit Garg <sumit.garg@oss.qualcomm.com> Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260518-qcom-ice-fix-v7-2-2a595382185b@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
soc: qcom: ice: Fix race between qcom_ice_probe() and of_qcom_ice_get()
The current platform driver design causes probe ordering races with
consumers (UFS, eMMC) due to ICE's dependency on SCM firmware calls. If ICE
probe fails (missing ICE SCM or DT registers), devm_of_qcom_ice_get() loops
with -EPROBE_DEFER, leaving consumers non-functional even when ICE should
be gracefully disabled. devm_of_qcom_ice_get() doesn't know if the ICE
driver probe has failed due to above reasons or it is waiting for the SCM
driver.
Moreover, there is no devlink dependency between ICE and consumer drivers
as 'qcom,ice' is not considered as a DT 'supplier'. So the consumer drivers
have no idea of when the ICE driver is going to probe.
To address these issues, store the error pointer in a global xarray with
ice node phandle as a key during probe in addition to the valid ice pointer
and synchronize both qcom_ice_probe() and of_qcom_ice_get() using a mutex.
If the xarray entry is NULL, then it implies that the driver is not
probed yet, so return -EPROBE_DEFER. If it has any error pointer, return
that error pointer directly. Otherwise, add the devlink as usual and return
the valid pointer to the consumer.
Xarray is used instead of platform drvdata, since driver core frees the
drvdata during probe failure. So it cannot be used to pass the error
pointer to the consumers.
Note that this change only fixes the standalone ICE DT node bindings and
not the ones with 'ice' range embedded in the consumer nodes, where there
is no issue.
Linus Torvalds [Mon, 18 May 2026 14:30:31 +0000 (07:30 -0700)]
Merge tag 'vfs-7.1-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs fixes from Christian Brauner:
"This contains a fixes for the current development cycle. Note that AI
related review sometimes delays fixes a bit because we find more fixes
for the fixes. I might try and send smaller but more fixes PRs if this
trend keeps up.
- Fix various netfslib bugs
- Fix an out-of-bounds write when listing idmappings
- Fix the return values in jfs_mkdir() and orangefs_mkdir()
- Fix a writeback writeback array overflow in fuse
- Fix a forced iversion increment on lazytime timestamp updates
- Reject a negative timeval component in kern_select()
- Fix error return when vfs_mkdir() fails in the cachefiles code
- Fix wrong error code returned for pidns ioctls"
* tag 'vfs-7.1-rc5.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (31 commits)
cachefiles: Fix error return when vfs_mkdir() fails
afs: Fix the locking used by afs_get_link()
netfs, afs: Fix write skipping in dir/link writepages
netfs: Fix netfs_read_folio() to wait on writeback
netfs: Fix folio->private handling in netfs_perform_write()
netfs: Fix partial invalidation of streaming-write folio
netfs: Fix potential UAF in netfs_unlock_abandoned_read_pages()
netfs: Fix leak of request in netfs_write_begin() error handling
netfs: Fix early put of sink folio in netfs_read_gaps()
netfs: Fix write streaming disablement if fd open O_RDWR
netfs: Fix read-gaps to remove netfs_folio from filled folio
netfs: Fix potential deadlock in write-through mode
netfs: Fix streaming write being overwritten
netfs: Defer the emission of trace_netfs_folio()
netfs: Fix netfs_invalidate_folio() to clear dirty bit if all changes gone
netfs: Fix overrun check in netfs_extract_user_iter()
netfs: fix error handling in netfs_extract_user_iter()
netfs: Fix potential uninitialised var in netfs_extract_user_iter()
netfs: fix VM_BUG_ON_FOLIO() issue in netfs_write_begin() call
netfs: Fix zeropoint update where i_size > remote_i_size
...
drm/mediatek: mtk_hdmi_ddc: Fix non-static global variable
The struct 'mtk_hdmi_ddc_driver' is not used outside of the
mtk_hdmi_ddc.c file, so make it static to silence sparse warning:
```
drivers/gpu/drm/mediatek/mtk_hdmi_ddc.c:331:24: sparse: warning: symbol
'mtk_hdmi_ddc_driver' was not declared. Should it be static?
```
drm/mediatek: mtk_cec: Fix non-static global variable
The struct 'mtk_cec_driver' is not used outside of the
mtk_cec.c file, so make it static to silence sparse warning:
```
drivers/gpu/drm/mediatek/mtk_cec.c:243:24: sparse: warning: symbol
'mtk_cec_driver' was not declared. Should it be static?
```
drm/mediatek: mtk_hdmi_v2: Fix non-static global variable
The struct 'mtk_hdmi_v2_clk_names' is not used outside of the
mtk_hdmi_v2.c file, so make it static to silence sparse warning:
```
drivers/gpu/drm/mediatek/mtk_hdmi_v2.c:53:12: sparse: warning: symbol
'mtk_hdmi_v2_clk_names' was not declared. Should it be static?
```
drm/mediatek: mtk_hdmi_ddc_v2: Fix non-static global variable
The struct 'mtk_hdmi_ddc_v2_driver' is not used outside of the
mtk_hdmi_ddc_v2.c file, so make it static to silence sparse warning:
```
drivers/gpu/drm/mediatek/mtk_hdmi_ddc_v2.c:392:24: sparse: warning:
symbol 'mtk_hdmi_ddc_v2_driver' was not declared. Should it be
static?
```
Potin Lai [Mon, 23 Mar 2026 12:41:06 +0000 (20:41 +0800)]
ARM: dts: aspeed: Add Meta SanMiguel BMC
Add linux device tree entry for Meta (Facebook) SanMiguel compute-tray
BMC using AT2620 SoC.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Potin Lai <potin.lai.pt@gmail.com> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Baochen Qiang [Thu, 14 May 2026 03:32:51 +0000 (11:32 +0800)]
wifi: ath12k: fix EHT TX MCS limitation due to wrong 20 MHz-only parsing
When connecting to an AP configured for EHT 20 MHz with a full EHT
MCS/NSS map (supporting MCS 0-13)
Supported EHT-MCS and NSS Set
EHT-MCS Map (BW <= 80MHz): 0x444444
.... .... .... .... .... 0100 = Rx Max Nss That Supports EHT-MCS 0-9: 4
.... .... .... .... 0100 .... = Tx Max Nss That Supports EHT-MCS 0-9: 4
.... .... .... 0100 .... .... = Rx Max Nss That Supports EHT-MCS 10-11: 4
.... .... 0100 .... .... .... = Tx Max Nss That Supports EHT-MCS 10-11: 4
.... 0100 .... .... .... .... = Rx Max Nss That Supports EHT-MCS 12-13: 4
0100 .... .... .... .... .... = Tx Max Nss That Supports EHT-MCS 12-13: 4
TX throughput is observed to be significantly lower than expected.
Investigation shows that TX rates are limited to EHT MCS 11, even though
the AP advertises support for EHT MCS 12/13.
The root cause is an incorrect parsing of the Supported EHT-MCS and NSS
Set element in ath12k_peer_assoc_h_eht().
IEEE Std 802.11be-2024 Figure 9-1074as describes the format for 20
MHz-Only Non-AP STAs.
IEEE Std 802.11be-2024 Figure 9-1074at describes the format for all
other AP and non-AP STAs.
Currently the first format is parsed when the peer advertises no wider
HE channel width support, without considering whether it is an AP or a
non-AP STA. This is incorrect: the peer AP's capabilities must be parsed
using Figure 9-1074at even when it operates on 20 MHz only. Parsing it
as Figure 9-1074as causes rx_tx_mcs13_max_nss to be interpreted as zero,
which is then passed to firmware, leading firmware to assume the peer
does not support MCS 13 and to limit TX rates at MCS 11.
Fix this by parsing the Figure 9-1074as format only when the peer is a
20 MHz-Only non-AP STA, i.e. when the local interface operates as AP or
mesh point.
Kyle Farnung [Thu, 14 May 2026 04:52:12 +0000 (21:52 -0700)]
wifi: ath11k: clear shared SRNG pointer state on restart
LMAC rings reuse the shared rdp/wrp pointer buffers without going
through the normal SRNG hw-init path that zeros non-LMAC ring
pointers. After restart, ath11k_hal_srng_clear() can therefore hand
stale hp/tp state from the previous firmware instance back to the new
one.
Clear the shared pointer buffers while keeping the allocations in
place so restart still avoids reallocating SRNG DMA memory, but starts
with fresh ring-pointer state.
Fixes: 32be3ca4cf78b ("wifi: ath11k: HAL SRNG: don't deinitialize and re-initialize again") Cc: stable@vger.kernel.org Closes: https://lore.kernel.org/all/CAOPSVF04q6uvVdq8GTRLHBrVMdpt9=o9wVcFMc6f-yhmSBcZqQ@mail.gmail.com/ Signed-off-by: Kyle Farnung <kfarnung@gmail.com> Reviewed-by: Rameshkumar Sundaram <rameshkumar.sundaram@oss.qualcomm.com> Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Link: https://patch.msgid.link/20260513-kfarnung-ath11k-srng-clear-pointer-state-v1-1-bc700dd8b333@gmail.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
Matthew Leach [Fri, 24 Apr 2026 09:50:35 +0000 (10:50 +0100)]
wifi: ath11k: fix peer resolution on rx path when peer_id=0
It has been observed that on certain chipsets a peer can be assigned
peer_id=0. For reception of non-aggregated MPDUs this is fine as
ath11k_dp_rx_h_find_peer() has a fallback case where it locates the peer
based upon the source MAC address. On an aggregated link, the mpdu_start
header is only populated by hardware on the first sub-MSDU. This causes
the peer resolution to be skipped for the subsequent MSDUs and the
encryption type of these frames to be set to an incorrect value,
resulting in these MSDUs being dropped by ieee80211.
ath11k_pci 0000:03:00.0: data rx skb 000000002f4b704d len 1534 peer xx:xx:xx:xx:xx:xx 0 ucast sn 3063 he160 rate_idx 9 vht_nss 2 freq 5240 band 1 flag 0x40d1a fcs-err 0 mic-err 0 amsdu-more 0 peer_id 0 first_msdu 1 last_msdu 0
ath11k_pci 0000:03:00.0: data rx skb 0000000038acd580 len 1534 peer (null) 0 ucast sn 3063 he160 rate_idx 9 vht_nss 2 freq 5240 band 1 flag 0x40d00 fcs-err 0 mic-err 0 amsdu-more 0 peer_id 0 first_msdu 0 last_msdu 1
Remove the null peer_id checks in ath11k_dp_rx_h_find_peer() and
ath11k_hal_rx_parse_mon_status_tlv(), allowing peers with an assigned ID
of 0 to be resolved.
Fixes: 2167fa606c0f ("ath11k: Add support for RX decapsulation offload") Reviewed-by: Baochen Qiang <baochen.qiang@oss.qualcomm.com> Signed-off-by: Matthew Leach <matthew.leach@collabora.com> Reviewed-by: P Praneesh <praneesh.p@oss.qualcomm.com> Link: https://patch.msgid.link/20260424-ath11k-null-peerid-workaround-v4-1-252b224d3cf6@collabora.com Signed-off-by: Jeff Johnson <jeff.johnson@oss.qualcomm.com>
The mt8167 DSI controller is fully compatible with the one found in
mt2701. Unfortunately the device tree has a dedicated compatible for
mt8167 since 2022 and it cannot be changed with a fallback nor removed at
this point. The only way to get the device to work is to add the
compatible to the driver.
CĂ¡ssio Gabriel [Mon, 18 May 2026 02:33:49 +0000 (23:33 -0300)]
ALSA: ctxfi: Keep line/mic notification controls per mixer
ctxfi stores the Line Capture Switch and Mic Capture Switch controls in
a file-scope kctls[] array so do_line_mic_switch() can notify the
opposite control when the shared line/mic input selection changes.
That storage is shared by all ctxfi cards. If more than one X-Fi card is
present, a later card can overwrite the pointers saved by an earlier one.
A control update on one card can then use another card's kcontrol object
for snd_ctl_notify(). If that other card is removed, the saved pointer can
also become stale.
Store those notification targets in struct ct_mixer instead. The mixer is
per-card state and matches the lifetime of the controls created for that
card.
drm/mediatek: Convert legacy DRM logging to drm_* helpers in mtk_dsi.c
Replace DRM_INFO(), DRM_WARN() and DRM_ERROR() calls in
drivers/gpu/drm/mediatek/mtk_dsi.c with the corresponding
drm_info(), drm_warn() and drm_err() helpers.
The drm_*() logging helpers take a struct drm_device * argument,
allowing the DRM core to prefix log messages with the correct device
name and instance. This is required to correctly distinguish log
messages on systems with multiple GPUs.
This change aligns the radeon driver with the DRM TODO item:
"Convert logging to drm_* functions with drm_device parameter".
Gustavo Sousa [Thu, 14 May 2026 21:44:46 +0000 (18:44 -0300)]
drm/xe: Define and use MCR version of COMMON_SLICE_CHICKEN4
The register COMMON_SLICE_CHICKEN4 is a MCR register on both Xe2 and
Xe3. Let's make sure to define a MCR version of it and use it for the
relevant IP versions.
Use XEHP_ as prefix for the register name, since it is MCR as of Xe_HP.
v2:
- Also change for one entry in lrc_tunnings, which was caught by
manual testing and add corresponging Fixes tag in commit message.
(Gustavo)
Fixes: 8d6f16f1f082 ("drm/xe: Extend Wa_22021007897 to Xe3 platforms") Fixes: e5c13e2c505b ("drm/xe/xe2hpg: Add Wa_22021007897") Fixes: 8ccf5f6b2295 ("drm/xe/tuning: Apply windower hardware filtering setting on Xe3 and Xe3p")
Bspec: 66534, 71185, 74417 Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20260514-rtp-mcr-check-v3-3-30dd47855fee@intel.com Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com>
(cherry picked from commit 75f65f1a4c06da1d87f28570a9d4cdad28f13360) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Reading debugfs file (/sys/kernel/debug/dri/0/gt*/pf/adverse_events)
with CFI (Control Flow Integrity) enabled, the kernel panics at
xe_gt_debugfs_simple_show+0x82/0xc0.
xe_gt_debugfs_simple_show() declare a function pointer expecting int
return type, but xe_gt_sriov_pf_monitor_print_events() is void return
type, leading to CFI failure and kernel panic.
Michal Wajdeczko [Thu, 14 May 2026 15:57:26 +0000 (17:57 +0200)]
drm/xe/vf: Fix signature of print functions
We have plugged-in existing VF print functions into our GT debugfs
show helper as-is, but we missed that the helper expects functions
to return int, while they were defined as void. This can lead to
errors being reported when CFI is enabled.
Fixes: 63d8cb8fe3dd ("drm/xe/vf: Expose SR-IOV VF attributes to GT debugfs") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Mohanram Meenakshisundaram <mohanram.meenakshisundaram@intel.com> Reviewed-by: Shuicheng Lin <shuicheng.lin@intel.com> Link: https://patch.msgid.link/20260514155726.7165-1-michal.wajdeczko@intel.com
(cherry picked from commit 314e31c9a8a1c421ee4f7f755b9348aefbbca090) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Shuicheng Lin [Mon, 11 May 2026 15:41:34 +0000 (15:41 +0000)]
drm/xe/gsc: Fix double-free of managed BO in error path
The error path in xe_gsc_init_post_hwconfig() explicitly frees a BO
allocated with xe_managed_bo_create_pin_map() via
xe_bo_unpin_map_no_vm(). Since the managed BO already has a devm
cleanup action registered, this causes a double-free when devm
unwinds during probe failure.
Remove the explicit free and let devm handle it, consistent with
all other xe_managed_bo_create_pin_map() callers.
Michal Wajdeczko [Mon, 11 May 2026 17:28:37 +0000 (19:28 +0200)]
drm/xe/memirq: Update interrupt handler logic
To workaround some corner case hardware limitations, new programming
note for the memory based interrupt handler suggests to assume that
some status bytes, like GT_MI_USER_INTERRUPT and GUC_INTR_GUC2HOST,
are always set. Update our interrupt handler to follow the new rules.
Bspec: 53672 Fixes: a6581ebe7685 ("drm/xe/vf: Introduce Memory Based Interrupts Handler") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patch.msgid.link/20260511172838.2299-2-michal.wajdeczko@intel.com
(cherry picked from commit 284f4cae4579eed9dd4406f18a6c1becc69f8931) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Introduce a device tree node for the AST2600 PWM/Tach controller.
Describe register range, clock, reset, and cell configuration.
Set status to "disabled" by default.
Prepares for enabling PWM and tachometer support on platforms
utilizing this SoC.
Signed-off-by: Billy Tsai <billy_tsai@aspeedtech.com> Signed-off-by: Andrew Jeffery <andrew@codeconstruct.com.au>
Michal Pecio [Mon, 18 May 2026 05:32:58 +0000 (07:32 +0200)]
usb: core: Clean up SuperSpeed/eUSB2 descriptor validation logging
Core usually prints endpoint addresses with 0x%X format.
Change this code to use it too, instead of just %d.
Particularly for IN, 0x83 seems more readable than 131.
While at that, fix checkpatch warnings about multi-line
quoted strings, as well as missing or doubled whitespace
in those strings.
Michal Pecio [Mon, 18 May 2026 05:32:07 +0000 (07:32 +0200)]
usb: core: Fix up Interrupt IN endpoints with bogus wBytesPerInterval
Tao Xue found that some common devices violate USB 3.x section 9.6.7
by reporting wBytesPerInterval lower than the size of packets they
actually send. I confirmed that AX88179 may set it to 0 and RTL8153
CDC configuration sets it to 8 but sends both 8 and 16 byte packets:
Most xHCI host controllers neglect interrupt bandwidth reservations
and let such devices exceed theirs, some fail the URB with EOVERFLOW.
Assume that wBytesPerInterval lower than wMaxPacketSize is bogus and
increase it to the worst case maximum on interrupt IN endpoints. This
solves xHCI problems and appears to have no other effect. Interrupt
transfers are not limited to one interval and drivers submit URBs of
class defined size without looking at wBytesPerInterval. Any multi-
interval transfer is considered terminated by a packet shorter than
wMaxPacketSize regardless of wBytesPerInterval - see USB3 8.10.3.
Stay in spec on OUT endpoints and isochronous. No buggy devices are
known and we don't want to risk sending more data than the device
is prepared to handle or confusing isoc drivers regarding altsetting
capacities guaranteed by the device itself. And don't complain when
wMaxPacketSize <= wBytesPerInterval < wMaxPacketSize * (bMaxBurst+1)
because enabling this seems to be the exact goal of the spec.
Reported-and-tested-by: Tao Xue <xuetao09@huawei.com> Closes: https://lore.kernel.org/linux-usb/20260402021400.28853-1-xuetao09@huawei.com/ Cc: stable@vger.kernel.org Signed-off-by: Michal Pecio <michal.pecio@gmail.com> Link: https://patch.msgid.link/20260518073207.5b7d26e7.michal.pecio@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Michal Pecio [Mon, 18 May 2026 05:31:21 +0000 (07:31 +0200)]
usb: core: Fix SuperSpeed root hub wMaxPacketSize
There is no good reason to have wBytesPerInterval < wMaxPacketSize -
either one is too low or the other too high, and we may want to warn
about such descriptors. Start with cleaning up our own root hubs.
USB 3.2 section 10.15.1 sets wMaxPacketSize and wBytesPerInterval of
SuperSpeed hub status endpoints at 2 bytes, so reduce wMaxPacketSize
from its former value of 4, which was derived from USB 2.0 spec and
the kernel's USB_MAXCHILDREN limit. They don't apply because USB 3.2
10.15.2.1 specifies SuperSpeed hubs to have up to 15 ports.
Boris Brezillon [Mon, 18 May 2026 11:41:45 +0000 (13:41 +0200)]
drm/gem: Make the GEM LRU lock part of drm_device
Recently, a few races have been discovered in the GEM LRU logic, all
of them caused by the fact the LRU lock is accessed through
gem->lru->lock, and that very same lock also protects changes to
gem->lru, leading to situations where gem->lru needs to first be
accessed without the lock held, to then get the lru to access the lock
through and finally take the lock and do the expected operation.
Currently, the only driver making use of this API (MSM) declares a
device-wide lock, and the user we're about to add (panthor) will
do the same. There's no evidence that we will ever have a driver
that wants different pools of LRUs protected by different locks under
the same drm_device. So we're better off moving this lock to drm_device
and always locking it through obj->dev->gem_lru_mutex, or directly
through dev->gem_lru_mutex.
If anyone ever needs more fine-grained locking, this can be revisited
to pass some drm_gem_lru_pool object representing the pool of LRUs
under a specific lock, but for now, the per-device lock seems to be
enough.
Fixes: e7c2af13f811 ("drm/gem: Add LRU/shrinker helper") Reported-by: Chia-I Wu <olvaffe@gmail.com> Closes: https://gitlab.freedesktop.org/panfrost/linux/-/work_items/86 Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com> Link: https://patch.msgid.link/20260518-panthor-shrinker-fixes-v4-1-1920234470d5@collabora.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
usb: typec: ucsi: ccg: reject firmware images without a ':' record header
do_flash() locates the first .cyacd record with
p = strnchr(fw->data, fw->size, ':');
while (p < eof) {
s = strnchr(p + 1, eof - p - 1, ':');
...
}
If the firmware image contains no ':' byte, strnchr() returns NULL.
NULL compares less than the valid kernel pointer eof, so the loop body
runs and strnchr() is called with p + 1 == (void *)1 and a length of
roughly (unsigned long)eof, causing a wonderful crash.
The not_signed_fw fallthrough earlier in do_flash() and the chip-state
branches in ccg_fw_update_needed() allow an unsigned blob to reach this
loop, so a root user who can place a crafted file under /lib/firmware
and write the do_flash sysfs attribute can trigger the oops.
Bail out with -EINVAL when the initial strnchr() returns NULL.
ends up copying close to UINT_MAX bytes from cdev->landing_page into
cdev->req->buf. KASAN reports a slab-out-of-bounds in composite_setup
on the kmalloc-2k gadget_info allocation, and FORTIFY_SOURCE traps the
memcpy as a 4294967293-byte field-spanning write into
url_descriptor->URL (size 252).
A USB host can reach this from a single SETUP packet against any
gadget that has webusb/use=1 and a landingPage configured.
Handle the small-wLength case before the math: when the host requested
fewer bytes than the URL descriptor header, only the header is
meaningful and no URL bytes need to be copied. Setting
landing_page_length to landing_page_offset makes the existing memcpy a
no-op and leaves the descriptor returned to the host unchanged for all
larger wLength values.
Dan Carpenter [Tue, 12 May 2026 10:14:59 +0000 (13:14 +0300)]
usb: typec: tipd: Fix error code in tps6598x_probe()
Set the error code on these two error paths. The existing code returns
success.
Fixes: 77ed2f4538da ("usb: typec: tipd: Use read_power_status function in probe") Fixes: 04041fd7d6ec ("usb: typec: tipd: Read data status in probe and cache its value") Cc: stable <stable@kernel.org> Signed-off-by: Dan Carpenter <error27@gmail.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://patch.msgid.link/agL9o7wUK1dOVBTy@stanley.mountain Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Nirmoy Das [Thu, 14 May 2026 14:42:57 +0000 (07:42 -0700)]
ovl: keep err zero after successful ovl_cache_get()
ovl_iterate_merged() stores PTR_ERR(cache) in err before checking
IS_ERR(cache). On success err holds the truncated cache pointer and
can be returned as a bogus non-zero error.
The syzbot reproducer reaches this through overlay-on-overlay readdir:
Fengnan Chang [Wed, 13 May 2026 09:13:49 +0000 (17:13 +0800)]
dm: limit target bio polling to one shot
dm_poll_bio() is the ->poll_bio() callback for a stacked dm device.
The caller only knows about the dm queue, so it may decide to do a
spinning poll if it thinks a single queue is being polled. Passing those
flags unchanged to the mapped clone lets blk_mq_poll() spin on a target
queue from inside dm_poll_bio().
With io_uring IOPOLL on a dm-stripe target this can keep a task in
dm_poll_bio() -> bio_poll() -> blk_mq_poll()
long enough to trigger an RCU CPU stall, before io_uring gets back to
io_iopoll_check() and its need_resched() check.
Keep dm's ->poll_bio() bounded by forcing one-shot polling for target
bios. The caller can invoke dm_poll_bio() again if it wants to keep
polling, and it also gets a chance to reap completions or reschedule
between passes.
Mikulas Patocka [Mon, 11 May 2026 11:04:16 +0000 (13:04 +0200)]
dm-ioctl: report an error if a device has no table
When we send a message to a device that has no table, the return code was
not set. The code would return "2", which is not considered a valid return value.
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Reviewed-by: Benjamin Marzinski <bmarzins@redhat.com>
As described in commit 2bc057692599 ("block: don't make REQ_POLLED imply
REQ_NOWAIT"), which fixed the same issue for the block device node, there
are valid cases to poll for I/O completion without REQ_NOWAIT.
Additionally, sing REQ_NOWAIT for file system writes is currently not
supported as file systems writes are not idempotent and would need a
retry of just the bio and not the entire operation to be fully supported.
Switch iomap to set REQ_POLLED and remove the now unused bio_set_polled
helper.
Gary Guo [Tue, 12 May 2026 12:09:53 +0000 (13:09 +0100)]
rust: pin-init: internal: project using full slot
Instead of projecting using pointer to a field project the full slot. This
further shifts the code generation from the initializer site to the struct
definition site, which means less code is generated overall.
It also makes the safety comment easier to justify, as now the projection
is done by the `#[pin_data]` macro which has full visibility of pinnedness
of fields.
The field alignment could also be checked on the `#[pin_data]` side;
however, since `init!()` macro works for other type of structs, we cannot
remove the alignment check from `init!`/`pin_init!` side anyway, so I opted
to still keep the alignment check in init.rs.
Gary Guo [Tue, 12 May 2026 12:09:52 +0000 (13:09 +0100)]
rust: pin-init: internal: project slots instead of references
By projecting slots, the `pin_init!` and `init!` code path can be more
unified. This also reduces the amount of macro-generated code and shifts
them to the shared infrastructure.
Gary Guo [Tue, 12 May 2026 12:09:50 +0000 (13:09 +0100)]
rust: pin-init: internal: use marker on drop guard type for pinned fields
Instead of projecting the created reference, simply create drop guards with
different marker types and have the `let_binding()` method of guards of
different marker produce different type instead.
This allows more flexible lifetime as this is now controlled by the guard.
This will be needed when implementing self-referential fields.
Gary Guo [Tue, 12 May 2026 12:09:49 +0000 (13:09 +0100)]
rust: pin-init: internal: init: handle code blocks early
`InitializerKind::Code` is a special case where it does not initialize a
field, and thus generate no guard and accessors. Handle it earlier and make
the rest of the code more linear.
Niklas Cassel [Thu, 14 May 2026 07:39:02 +0000 (09:39 +0200)]
ata: libata-scsi: do not needlessly defer commands when using PMP with FBS
The ACS specification does not allow a non-NCQ command to be issued while
an NCQ command is outstanding.
Commit 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation")
introduced a feature where a deferred non-NCQ command gets issued from a
workqueue. The design stores a single non-NCQ command per port.
However, when using Port Multipliers (PMPs), specifically PMPs that
support FIS-Based Switching (FBS), non-NCQ and NCQ commands can be mixed
on the same port, just not for the same link, see e.g. ata_std_qc_defer()
which is, and always has operated on a per-link basis.
Therefore, move the deferred_qc from struct ata_port to struct ata_link.
This way, when using a PMP with FBS, we will not needlessly defer commands
to all other links, just because one link issued a non-NCQ command while
having an NCQ command outstanding. Only commands for that specific link
will be deferred. This is in line with how PMPs with FBS worked before
commit 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation").
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") Tested-by: Tommy Kelly <linux@tkel.ly> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Niklas Cassel [Thu, 14 May 2026 07:39:01 +0000 (09:39 +0200)]
ata: libata-scsi: do not use the deferred QC feature on PMPs with CBS
When using Port Multipliers (PMPs) with Command-Based Switching (CBS), you
can only issue commands to one link at a time. For PMPs with CBS, there is
already code to handle commands being sent to different links in
sata_pmp_qc_defer_cmd_switch() using ap->excl_link. sata_sil24 also makes
use of ap->excl_link.
A user on the list reported that commit 0ea84089dbf6 ("ata: libata-scsi:
avoid Non-NCQ command starvation") broke PMPs with CBS. The commit
introduced code that stores a deferred qc in ap->deferred_qc, to later be
issued via a workqueue. It turns out that this change is incompatible with
the existing ap->excl_link handling used by PMPs with CBS.
Thus, modify sata_pmp_qc_defer_cmd_switch() and sil24_qc_defer() to return
ATA_DEFER_LINK_EXCL, and make sure that the deferred QC handling via
workqueue is not used for this return value.
This way, PMPs with CBS will work once again. Note that the starvation
referenced in commit 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ
command starvation") can only happen on libsas ports, and libsas does not
support Port Multipliers, thus there is no harm of reverting back to the
previous way of deferring commands for PMPs with CBS.
Non-libsas ports connected to anything but a PMP with CBS (e.g. a normal
drive or a PMP with FBS) will continue using the deferred workqueue, since
it does result in lower completion latencies for non-NCQ commands, even
though the workqueue is not strictly needed to avoid starvation for
non-libsas ports.
If we want to modify the scope of the workqueue issuing to also handle
PMPs with CBS, then we should ensure that we can save both NCQ and non-NCQ
commands in ap->deferred_qc, while also removing the existing PMP CBS
handling using ap->excl_link, such that we don't duplicate features.
While at it, also add a comment explaining how the ap->excl_link mechanism
works.
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") Tested-by: Tommy Kelly <linux@tkel.ly> Reported-by: Tommy Kelly <linux@tkel.ly> Closes: https://lore.kernel.org/linux-ide/ce09cc21-a8e9-4845-b205-35411e22fba9@tkel.ly/ Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>
Liviu Dudau [Thu, 7 May 2026 10:50:46 +0000 (11:50 +0100)]
drm/syncobj: Fix memory leak in drm_syncobj_find_fence()
Commit 18226ba52159 ("drm/syncobj: reject invalid flags in
drm_syncobj_find_fence") forgot to take into account the fact that
drm_syncobj_find() takes a reference to syncobj and returns early
without dropping the reference, leading to memory leaks.
Fixes: 18226ba52159 ("drm/syncobj: reject invalid flags in drm_syncobj_find_fence")
Reported by: Sam Spencer <sam.spencer@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Acked-by: Erik Kurzinger <ekurzinger@gmail.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://lore.kernel.org/all/20260507144425.2488057-1-liviu.dudau@arm.com
Niklas Cassel [Thu, 14 May 2026 07:39:00 +0000 (09:39 +0200)]
ata: libata-scsi: do not use the deferred QC feature for ATA_DEFER_PORT
The deferred QC feature was meant to handle mixed NCQ and non-NCQ commands,
i.e. for return value ATA_DEFER_LINK.
ATA_DEFER_PORT is returned by PATA drivers, but also certain SATA drivers
like sata_mv and sata_sil24 that uses ap->excl_link to workaround hardware
bugs in these HBAs. Regardless of the reason, using the deferred QC feature
for ATA_DEFER_PORT is always wrong, and will break the ap->excl_link usage
of the SATA drivers that rely on that feature.
Modify ata_scsi_qc_issue() to only use the deferred QC feature when mixing
NCQ and non-NCQ commands, i.e. ATA_DEFER_LINK.
Fixes: 0ea84089dbf6 ("ata: libata-scsi: avoid Non-NCQ command starvation") Tested-by: Tommy Kelly <linux@tkel.ly> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org>