Alex Deucher [Wed, 27 Aug 2025 15:34:14 +0000 (11:34 -0400)]
drm/amdgpu: clean up and unify hw fence handling
Decouple the amdgpu fence from the amdgpu_job structure.
This lets us clean up the separate fence ops for the embedded
fence and other fences. This also allows us to allocate the
vm fence up front when we allocate the job.
v2: Additional cleanup suggested by Christian
v3: Additional cleanups suggested by Christian
v4: Additional cleanups suggested by David and
vm fence fix
v5: cast seqno (David)
Cc: David.Wu3@amd.com Cc: christian.koenig@amd.com Tested-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: David (Ming Qiang) Wu <David.Wu3@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd: Stop overloading power limit with limit type
When passed around internally the upper 8 bits of power limit include
the limit type. This is non-obvious without digging into the nuances
of each function. Instead pass the limit type as an argument to all
applicable layers.
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 8 Oct 2025 19:07:53 +0000 (15:07 -0400)]
drm/amdgpu/userq: drop VCN and VPE doorbell handling
VCN and VPE userqs are not yet supported and this code is
not correct. Userspace should provide the correct
doorbell offset with in their doorbell page for the IP.
Adjusting it here will not work as expected as userspace
and the queue itself will have different offsets.
We need to add a INFO IOCTL query to get the offset and
range for each IP within the doorbell page to handle this
properly.
If a userq failed to suspend the rest of the suspend sequence may
have problems. Pass the error code up to the caller for a decision
on what to do.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd: Fix error handling with multiple userq IDRs
If multiple userq IDR are in use and there is an error handling one
at suspend or resume it will be silently discarded.
Switch the suspend/resume() code to use guards and return immediately.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
If IP suspend fails the callers should be notified so that they can
potentially react.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd: Don't always set IP block HW status to false
amdgpu_device_ip_suspend_phase2() calls amdgpu_ip_block_suspend()
which already sets HW block status to false when succeeding with
IP suspend. Remove the explicit call in
amdgpu_device_ip_suspend_phase2() so that the status is accurate.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd: Remove comment about handling errors in amdgpu_device_ip_suspend_phase1()
Error handling was introduced in commit e095026f0066 ("drm/amdgpu:
validate suspend before function call") so the comment about TODO is no
longer needed.
Fixes: e095026f0066 ("drm/amdgpu: validate suspend before function call") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_device_ip_suspend() doesn't have a caller outside of
amdgpu_device.c. Make it static.
No intended functional changes.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Why]
The shutdown() callback uses amdgpu_ip_suspend() which doesn't notify
drm clients during shutdown. This could lead to hangs.
[How]
Change amdgpu_pci_shutdown() to call the same sequence as suspend/resume.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
When a user unmaps a userq VA, the driver must ensure
the queue has no in-flight jobs. If there is pending work,
the kernel should wait for the attached eviction (bookkeeping)
fence to signal before deleting the mapping.
Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 25 Sep 2025 10:09:56 +0000 (12:09 +0200)]
drm/amdgpu: reduce queue timeout to 2 seconds v2
There has been multiple complains that 10 seconds are usually to long.
The original requirement for longer timeout came from compute tests on
AMDVLK, since that is no longer a topic reduce the timeout back to 2
seconds for all queues.
While at it also remove any special handling for compute queues under
SRIOV or pass through.
v2: fix checkpatch warning.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: remove gart_window_lock usage from gmc v12
This lock was part of the SDMA workaround originally implemented in
gmc_v10_0_flush_gpu_tlb (a70cb2176f7ef6f moved it to
amdgpu_gmc_flush_gpu_tlb).
This means this lock is useless and be safely dropped.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amdgpu_ttm_copy_mem_to_mem has a single caller, make sure the out
fence is non-NULL to simplify the code.
Since none of the pointers should be NULL, we can enable
__attribute__((nonnull))__.
While at it make the function static since it's only used from
amdgpuu_ttm.c.
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Driver implementation for cursor offloading to DMU
[Why]
We require an interlock between driver and firmware for upcoming
features and given that this could possibly happen on any single
cursor programming call (and that we can't asynchronously wait for
firmware to respond because of it) we'd be regressing cursor performance
by at least an extra 40us per call.
When we could possibly have cursor update every 20us - 100s from high
frequency gaming mice this means that we'd be stuttering or dropping
updates and impacting overall cursor performance.
We want a solution that can:
1. Interlock between other firmware features
2. Not stall out or require the DMCUB lock for every single update
[How]
When cursor offloading is enabled and supported by an ASIC driver will
route the cursor programming through to DMU as part of the regular
DC stream cursor programming interfaces for attributes and position.
The atomic pipe programming version will not be updated: this will still
follow the existing programming path by keeping track of a field that
specifies when the register writes should be deferred to DMU.
Cursor locking is not required when cursor offload is in progress since
the updates are consolidated and processed by DMU once at the end
of the frame in a periodic manner.
The shared buffer the firmware queries from is allocated along with the
rest of the scratch state region in an area that's accessible by
both firmware and driver.
The size of the cursor offload (v1) state will not change, but it does
have a unique union per ASIC version with room for expansion if needed.
When firmware features notifying DMU of DRR updates are not enabled we
now send an explicit vtotal min/max update via driver to DMU firmware
whenever the vtotal max changes. This is to allow the cursor programming
to determine the appropriate latch update point offset from vupdate.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
- Fix slice width calculation for YCbCr420
- Fix DTBCLK gating
- Use NRD cap as lttpr cap
- Consolidate DML2 FP guards
- DML2.1 Update
- Firmware Release 0.1.29.0 changes
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Add new interface for offloading cursor programming to DMUB.
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Prevent Gating DTBCLK before It Is Properly Latched
[why]
1. With allow_0_dtb_clk enabled, the time required to latch DTBCLK to 600 MHz
depends on the SMU. If DTBCLK is not latched to 600 MHz before set_mode completes,
gating DTBCLK causes the DP2 sink to lose its clock source.
2. The existing DTBCLK gating sequence ungates DTBCLK based on both pix_clk and ref_dtbclk,
but gates DTBCLK when either pix_clk or ref_dtbclk is zero.
pix_clk can be zero outside the set_mode sequence before DTBCLK is properly latched,
which can lead to DTBCLK being gated by mistake.
[how]
Consider both pixel_clk and ref_dtbclk when determining when it is safe to gate DTBCLK;
this is more accurate.
Reviewed-by: Charlene Liu <charlene.liu@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: lttpr cap should be nrd cap in bw_alloc mode
[WHY]
When bw allocation mode enabled, dpia may reports lttpr cap with
reduced common cap. It would cause driver not start pre-training with
max available bandwidth.
[How]
When bw allocation mode enabled, use NRD cap as lttpr cap.
Reviewed-by: Cruise Hung <cruise.hung@amd.com> Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Rename FAMS2 global control lock to DMUB HW control lock
[Why]
FAMS2 dictates whether the inbox0 HW lock is required, but it is not the
only feature that may determine this.
In order to leverage the faster inbox0 HW lock in place of the inbox1
ringbuffer based control lock it's desirable to utilize the HWSS
based locking protocol FAMS2 has already implemented.
[How]
Rename the FAMS2 global control lock to DMUB HW control lock.
This is purely a refactor with no functional change, the logic that will
determine which features need to enable this HW lock will be added in a
future commit.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Rename should_use_dmub_lock to reflect inbox1 usage
[Why]
Newer DCN use the DMCUB HW lock via inbox0 for performance reasons while
older ones will use inbox1.
The should_use_dmub_lock() function does not describe whether the lock
in general should be used, but whether it should be used via inbox1.
[How]
Rename the function to should_use_dmub_inbox1_lock() to reflect this.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Support possibly NULL link for should_use_dmub_lock
[Why]
It's possible to have a stream enabled without a link or link encoder.
There are cases where we'd still like to interlock the driver
programming from firmware programming to ensure that we don't put the
hardware in an undefined (or error) state if two programming sequences
are simultaneously executed on the same hardware blocks.
[How]
Add an explicit DC parameter to should_use_dmub_lock().
Make pointers to should_use_dmub_lock() const since it's a checker
function that shouldn't modify state.
Update the callsites to pass in DC explicitly.
Check that the link is non-NULL before deferencing and performing link
based checks.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ivan Lipski [Wed, 17 Sep 2025 15:21:30 +0000 (11:21 -0400)]
drm/amd/display: Consolidate two DML2 FP guards
[Why&How]
Consolidate two FP guards into one in dml2 since they are separated by
one line of code, independent from the guard.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Correct slice width calculation for YCbCr420
[Why]
-OVT compliance testing for 5120x2880p300Hz YCbCr420 was failing due to
incorrect slice width being calculated
[How]
-Ensure slice width is divisible by 2 for 420 to comply with spec
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Relja Vojvodic <rvojvodi@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[Summary of changes]
- Updated structs
- Renaming of variables for clarity
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Austin Zheng <Austin.Zheng@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Clean up the amdgpu hmm range functions for clearer
definition of each.
a. Split amdgpu_ttm_tt_get_user_pages_done into two:
1. amdgpu_hmm_range_valid: To check if the user pages
are valid and update seq num
2. amdgpu_hmm_range_free: Clean up the hmm range
and pfn memory.
b. amdgpu_ttm_tt_get_user_pages_done and
amdgpu_ttm_tt_discard_user_pages are similar function so remove
discard and directly use amdgpu_hmm_range_free to clean up the
hmm range and pfn memory.
Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: use user provided hmm_range buffer in amdgpu_ttm_tt_get_user_pages
update the amdgpu_ttm_tt_get_user_pages and all dependent function
along with it callers to use a user allocated hmm_range buffer instead
hmm layer allocates the buffer.
This is a need to get hmm_range pointers easily accessible
without accessing the bo and that is a requirement for the
userqueue to lock the userptrs effectively.
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Wed, 18 Jun 2025 14:31:15 +0000 (10:31 -0400)]
drm/amdkfd: fix suspend/resume all calls in mes based eviction path
Suspend/resume all gangs should be done with the device lock is held.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Harish Kasiviswanathan <harish.kasiviswanathan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Thu, 5 Jun 2025 14:18:37 +0000 (10:18 -0400)]
drm/amdgpu: enable suspend/resume all for gfx 12
Suspend/resume all gangs has been available for GFX12 for a while now
so enable it.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
By design the MES will return an array result that is twice the number
of hung doorbells it can report.
i.e. if up k reported doorbells are supported, then the
second half of the array, also of length k, holds the HQD information
(type/queue/pipe) where queue 1 corresponds to index 0 and k,
queue 2 corresponds to index 1 and k + 1 etc ...
The driver will use the HDQ info to target queue/pipe reset for
hardware scheduled user compute queues.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Thu, 9 Oct 2025 14:48:09 +0000 (10:48 -0400)]
drm/amdgpu: fix initialization of doorbell array for detect and hang
Initialized doorbells should be set to invalid rather than 0 to prevent
driver from over counting hung doorbells since it checks against the
invalid value to begin with.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Thu, 9 Oct 2025 14:45:42 +0000 (10:45 -0400)]
drm/amdgpu: fix gfx12 mes packet status return check
GFX12 MES uses low 32 bits of status return for success (1 or 0)
and high bits for debug information if low bits are 0.
GFX11 MES doesn't do this so checking full 64-bit status return
for 1 or 0 is still valid.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
Jesse.Zhang [Mon, 13 Oct 2025 05:46:12 +0000 (13:46 +0800)]
drm/amdgpu: Fix NULL pointer dereference in VRAM logic for APU devices
Previously, APU platforms (and other scenarios with uninitialized VRAM managers)
triggered a NULL pointer dereference in `ttm_resource_manager_usage()`. The root
cause is not that the `struct ttm_resource_manager *man` pointer itself is NULL,
but that `man->bdev` (the backing device pointer within the manager) remains
uninitialized (NULL) on APUs—since APUs lack dedicated VRAM and do not fully
set up VRAM manager structures. When `ttm_resource_manager_usage()` attempts to
acquire `man->bdev->lru_lock`, it dereferences the NULL `man->bdev`, leading to
a kernel OOPS.
1. **amdgpu_cs.c**: Extend the existing bandwidth control check in
`amdgpu_cs_get_threshold_for_moves()` to include a check for
`ttm_resource_manager_used()`. If the manager is not used (uninitialized
`bdev`), return 0 for migration thresholds immediately—skipping VRAM-specific
logic that would trigger the NULL dereference.
2. **amdgpu_kms.c**: Update the `AMDGPU_INFO_VRAM_USAGE` ioctl and memory info
reporting to use a conditional: if the manager is used, return the real VRAM
usage; otherwise, return 0. This avoids accessing `man->bdev` when it is
NULL.
3. **amdgpu_virt.c**: Modify the vf2pf (virtual function to physical function)
data write path. Use `ttm_resource_manager_used()` to check validity: if the
manager is usable, calculate `fb_usage` from VRAM usage; otherwise, set
`fb_usage` to 0 (APUs have no discrete framebuffer to report).
This approach is more robust than APU-specific checks because it:
- Works for all scenarios where the VRAM manager is uninitialized (not just APUs),
- Aligns with TTM's design by using its native helper function,
- Preserves correct behavior for discrete GPUs (which have fully initialized
`man->bdev` and pass the `ttm_resource_manager_used()` check).
v4: use ttm_resource_manager_used(&adev->mman.vram_mgr.manager) instead of checking the adev->gmc.is_app_apu flag (Christian)
Reviewed-by: Christian König <christian.koenig@amd.com> Suggested-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Fri, 10 Oct 2025 18:02:40 +0000 (23:32 +0530)]
drm/amdgpu: fix bit shift logic
BIT_ULL(n) sets nth bit, remove explicit shift and set the position
Fixes: a7a411e24626 ("drm/amdgpu: fix shift-out-of-bounds in amdgpu_debugfs_jpeg_sched_mask_set") Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Mon, 13 Oct 2025 06:06:42 +0000 (08:06 +0200)]
drm/amd/powerplay: Fix CIK shutdown temperature
Remove extra multiplication.
CIK GPUs such as Hawaii appear to use PP_TABLE_V0 in which case
the shutdown temperature is hardcoded in smu7_init_dpm_defaults
and is already multiplied by 1000. The value was mistakenly
multiplied another time by smu7_get_thermal_temperature_range.
Fixes: 4ba082572a42 ("drm/amd/powerplay: export the thermal ranges of VI asics (V2)") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1676 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Gui-Dong Han [Wed, 8 Oct 2025 03:43:27 +0000 (03:43 +0000)]
drm/amdgpu: use atomic functions with memory barriers for vm fault info
The atomic variable vm_fault_info_updated is used to synchronize access to
adev->gmc.vm_fault_info between the interrupt handler and
get_vm_fault_info().
The default atomic functions like atomic_set() and atomic_read() do not
provide memory barriers. This allows for CPU instruction reordering,
meaning the memory accesses to vm_fault_info and the vm_fault_info_updated
flag are not guaranteed to occur in the intended order. This creates a
race condition that can lead to inconsistent or stale data being used.
The previous implementation, which used an explicit mb(), was incomplete
and inefficient. It failed to account for all potential CPU reorderings,
such as the access of vm_fault_info being reordered before the atomic_read
of the flag. This approach is also more verbose and less performant than
using the proper atomic functions with acquire/release semantics.
Fix this by switching to atomic_set_release() and atomic_read_acquire().
These functions provide the necessary acquire and release semantics,
which act as memory barriers to ensure the correct order of operations.
It is also more efficient and idiomatic than using explicit full memory
barriers.
Fixes: b97dfa27ef3a ("drm/amdgpu: save vm fault information for amdkfd") Cc: stable@vger.kernel.org Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com> Signed-off-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Wed, 3 Sep 2025 17:48:23 +0000 (13:48 -0400)]
drm/amdgpu: set an error on all fences from a bad context
When we backup ring contents to reemit after a queue reset,
we don't backup ring contents from the bad context. When
we signal the fences, we should set an error on those
fences as well.
Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 15 Sep 2025 16:37:32 +0000 (12:37 -0400)]
drm/amdgpu: handle wrap around in reemit handling
Compare the sequence numbers directly.
Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Fri, 26 Sep 2025 21:31:32 +0000 (17:31 -0400)]
drm/amdgpu: fix handling of harvesting for ip_discovery firmware
Chips which use the IP discovery firmware loaded by the driver
reported incorrect harvesting information in the ip discovery
table in sysfs because the driver only uses the ip discovery
firmware for populating sysfs and not for direct parsing for the
driver itself as such, the fields that are used to print the
harvesting info in sysfs report incorrect data for some IPs. Populate
the relevant fields for this case as well.
Fixes: 514678da56da ("drm/amdgpu/discovery: fix fw based ip discovery") Acked-by: Tom St Denis <tom.stdenis@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Mon, 22 Sep 2025 12:18:16 +0000 (14:18 +0200)]
drm/amdgpu: block CE CS if not explicitely allowed by module option
The Constant Engine found on gfx6-gfx10 HW has been a notorious source of
problems.
RADV never used it in the first place, radeonsi only used it for a few
releases around 2017 for gfx6-gfx9 before dropping support for it as
well.
While investigating another problem I just recently found that submitting
to the CE seems to be completely broken on gfx9 for quite a while.
Since nobody complained about that problem it most likely means that
nobody is using any of the affected radeonsi versions on current Linux
kernels any more.
So to potentially phase out the support for the CE and eliminate another
source of problems block submitting CE IBs unless it is enabled again
using a debug flag.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Wed, 27 Aug 2025 12:47:23 +0000 (14:47 +0200)]
drm/amdgpu: remove two invalid BUG_ON()s
Those can be triggered trivially by userspace.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:26:13 +0000 (20:26 +0200)]
drm/amd: Disable ASPM on SI
Enabling ASPM causes randoms hangs on Tahiti and Oland on Zen4.
It's unclear if this is a platform-specific or GPU-specific issue.
Disable ASPM on SI for the time being.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Timur Kristóf [Fri, 26 Sep 2025 18:26:12 +0000 (20:26 +0200)]
drm/amd/pm: Disable MCLK switching on SI at high pixel clocks
On various SI GPUs, a flickering can be observed near the bottom
edge of the screen when using a single 4K 60Hz monitor over DP.
Disabling MCLK switching works around this problem.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Revert "drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume"
This fix regressed the original issue that commit 7875afafba84
("drm/amd/display: Fix brightness level not retained over reboot") solved,
so revert it until a different approach to solve the regression that
it caused with AMD_PRIVATE_COLOR is found.
Fixes: a490c8d77d50 ("drm/amd/display: Only restore backlight after amdgpu_dm_init or dm_resume") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4620 Cc: stable@vger.kernel.org Signed-off-by: Matthew Schwartz <matthew.schwartz@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Linus Torvalds [Sun, 12 Oct 2025 20:27:56 +0000 (13:27 -0700)]
Merge tag 'i2c-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c fix from Wolfram Sang:
"One revert because of a regression in the I2C core which has sadly not
showed up during its time in -next"
* tag 'i2c-for-6.18-rc1-hotfix' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
Revert "i2c: boardinfo: Annotate code used in init phase only"
Linus Torvalds [Sun, 12 Oct 2025 15:45:52 +0000 (08:45 -0700)]
Merge tag 'irq_urgent_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Borislav Petkov:
- Skip interrupt ID 0 in sifive-plic during suspend/resume because
ID 0 is reserved and accessing reserved register space could result
in undefined behavior
- Fix a function's retval check in aspeed-scu-ic
* tag 'irq_urgent_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/sifive-plic: Avoid interrupt ID 0 handling during suspend/resume
irqchip/aspeed-scu-ic: Fix an IS_ERR() vs NULL check
Linus Torvalds [Sat, 11 Oct 2025 23:06:04 +0000 (16:06 -0700)]
Merge tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing fixes from Steven Rostedt:
"The previous fix to trace_marker required updating trace_marker_raw as
well. The difference between trace_marker_raw from trace_marker is
that the raw version is for applications to write binary structures
directly into the ring buffer instead of writing ASCII strings. This
is for applications that will read the raw data from the ring buffer
and get the data structures directly. It's a bit quicker than using
the ASCII version.
Unfortunately, it appears that our test suite has several tests that
test writes to the trace_marker file, but lacks any tests to the
trace_marker_raw file (this needs to be remedied). Two issues came
about the update to the trace_marker_raw file that syzbot found:
- Fix tracing_mark_raw_write() to use per CPU buffer
The fix to use the per CPU buffer to copy from user space was
needed for both the trace_maker and trace_maker_raw file.
The fix for reading from user space into per CPU buffers properly
fixed the trace_marker write function, but the trace_marker_raw
file wasn't fixed properly. The user space data was correctly
written into the per CPU buffer, but the code that wrote into the
ring buffer still used the user space pointer and not the per CPU
buffer that had the user space data already written.
- Stop the fortify string warning from writing into trace_marker_raw
After converting the copy_from_user_nofault() into a memcpy(),
another issue appeared. As writes to the trace_marker_raw expects
binary data, the first entry is a 4 byte identifier. The entry
structure is defined as:
struct {
struct trace_entry ent;
int id;
char buf[];
};
The size of this structure is reserved on the ring buffer with:
size = sizeof(*entry) + cnt;
Then it is copied from the buffer into the ring buffer with:
memcpy(&entry->id, buf, cnt);
This use to be a copy_from_user_nofault(), but now converting it to
a memcpy() triggers the fortify-string code, and causes a warning.
The allocated space is actually more than what is copied, as the
cnt used also includes the entry->id portion. Allocating
sizeof(*entry) plus cnt is actually allocating 4 bytes more than
what is needed.
* tag 'trace-v6.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
tracing: Stop fortify-string from warning in tracing_mark_raw_write()
tracing: Fix tracing_mark_raw_write() to use buf and not ubuf
Linus Torvalds [Sat, 11 Oct 2025 22:47:12 +0000 (15:47 -0700)]
Merge tag 'kbuild-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux
Pull Kbuild fixes from Nathan Chancellor:
- Fix UAPI types check in headers_check.pl
- Only enable -Werror for hostprogs with CONFIG_WERROR / W=e
- Ignore fsync() error when output of gen_init_cpio is a pipe
- Several little build fixes for recent modules.builtin.modinfo series
* tag 'kbuild-fixes-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kbuild/linux:
kbuild: Use '--strip-unneeded-symbol' for removing module device table symbols
s390/vmlinux.lds.S: Move .vmlinux.info to end of allocatable sections
kbuild: Add '.rel.*' strip pattern for vmlinux
kbuild: Restore pattern to avoid stripping .rela.dyn from vmlinux
gen_init_cpio: Ignore fsync() returning EINVAL on pipes
scripts/Makefile.extrawarn: Respect CONFIG_WERROR / W=e for hostprogs
kbuild: uapi: Strip comments before size type check
Reported-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Closes: https://lore.kernel.org/r/29ec0082-4dd4-4120-acd2-44b35b4b9487@oss.qualcomm.com Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Linus Torvalds [Sat, 11 Oct 2025 18:56:47 +0000 (11:56 -0700)]
Merge tag 'rtc-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux
Pull RTC updates from Alexandre Belloni:
"This cycle, we have a new RTC driver, for the SpacemiT P1. The optee
driver gets alarm support. We also get a fix for a race condition that
was fairly rare unless while stress testing the alarms.
Subsystem:
- Fix race when setting alarm
- Ensure alarm irq is enabled when UIE is enabled
- remove unneeded 'fast_io' parameter in regmap_config
New driver:
- SpacemiT P1 RTC
Drivers:
- efi: Remove wakeup functionality
- optee: add alarms support
- s3c: Drop support for S3C2410
- zynqmp: Restore alarm functionality after kexec transition"
* tag 'rtc-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/abelloni/linux: (29 commits)
rtc: interface: Ensure alarm irq is enabled when UIE is enabled
rtc: tps6586x: Fix initial enable_irq/disable_irq balance
rtc: cpcap: Fix initial enable_irq/disable_irq balance
rtc: isl12022: Fix initial enable_irq/disable_irq balance
rtc: interface: Fix long-standing race when setting alarm
rtc: pcf2127: fix watchdog interrupt mask on pcf2131
rtc: zynqmp: Restore alarm functionality after kexec transition
rtc: amlogic-a4: Optimize global variables
rtc: sd2405al: Add I2C address.
rtc: Kconfig: move symbols to proper section
rtc: optee: make optee_rtc_pm_ops static
rtc: optee: Fix error code in optee_rtc_read_alarm()
rtc: optee: fix error code in probe()
dt-bindings: rtc: Convert apm,xgene-rtc to DT schema
rtc: spacemit: support the SpacemiT P1 RTC
rtc: optee: add alarm related rtc ops to optee rtc driver
rtc: optee: remove unnecessary memory operations
rtc: optee: fix memory leak on driver removal
rtc: x1205: Fix Xicor X1205 vendor prefix
dt-bindings: rtc: Fix Xicor X1205 vendor prefix
...
Linus Torvalds [Sat, 11 Oct 2025 18:49:00 +0000 (11:49 -0700)]
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Fixes only in drivers (ufs, mvsas, qla2xxx, target) that came in just
before or during the merge window.
The most important one is the qla2xxx which reverts a conversion to
fix flexible array member warnings, that went up in this merge window
but which turned out on further testing to be causing data corruption"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Include UTP error in INT_FATAL_ERRORS
scsi: ufs: sysfs: Make HID attributes visible
scsi: mvsas: Fix use-after-free bugs in mvs_work_queue
scsi: ufs: core: Fix PM QoS mutex initialization
scsi: ufs: core: Fix runtime suspend error deadlock
Revert "scsi: qla2xxx: Fix memcpy() field-spanning write issue"
scsi: target: target_core_configfs: Add length check to avoid buffer overflow
Linus Torvalds [Sat, 11 Oct 2025 18:19:16 +0000 (11:19 -0700)]
Merge tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull more x86 updates from Borislav Petkov:
- Remove a bunch of asm implementing condition flags testing in KVM's
emulator in favor of int3_emulate_jcc() which is written in C
- Replace KVM fastops with C-based stubs which avoids problems with the
fastop infra related to latter not adhering to the C ABI due to their
special calling convention and, more importantly, bypassing compiler
control-flow integrity checking because they're written in asm
- Remove wrongly used static branches and other ugliness accumulated
over time in hyperv's hypercall implementation with a proper static
function call to the correct hypervisor call variant
- Add some fixes and modifications to allow running FRED-enabled
kernels in KVM even on non-FRED hardware
- Add kCFI improvements like validating indirect calls and prepare for
enabling kCFI with GCC. Add cmdline params documentation and other
code cleanups
- Use the single-byte 0xd6 insn as the official #UD single-byte
undefined opcode instruction as agreed upon by both x86 vendors
- Other smaller cleanups and touchups all over the place
* tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
x86,retpoline: Optimize patch_retpoline()
x86,ibt: Use UDB instead of 0xEA
x86/cfi: Remove __noinitretpoline and __noretpoline
x86/cfi: Add "debug" option to "cfi=" bootparam
x86/cfi: Standardize on common "CFI:" prefix for CFI reports
x86/cfi: Document the "cfi=" bootparam options
x86/traps: Clarify KCFI instruction layout
compiler_types.h: Move __nocfi out of compiler-specific header
objtool: Validate kCFI calls
x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y
x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware
x86/fred: Install system vector handlers even if FRED isn't fully enabled
x86/hyperv: Use direct call to hypercall-page
x86/hyperv: Clean up hv_do_hypercall()
KVM: x86: Remove fastops
KVM: x86: Convert em_salc() to C
KVM: x86: Introduce EM_ASM_3WCL
KVM: x86: Introduce EM_ASM_1SRC2
KVM: x86: Introduce EM_ASM_2CL
KVM: x86: Introduce EM_ASM_2W
...
Linus Torvalds [Sat, 11 Oct 2025 17:51:14 +0000 (10:51 -0700)]
Merge tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Borislav Petkov:
- Simplify inline asm flag output operands now that the minimum
compiler version supports the =@ccCOND syntax
- Remove a bunch of AS_* Kconfig symbols which detect assembler support
for various instruction mnemonics now that the minimum assembler
version supports them all
- The usual cleanups all over the place
* tag 'x86_cleanups_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asm: Remove code depending on __GCC_ASM_FLAG_OUTPUTS__
x86/sgx: Use ENCLS mnemonic in <kernel/cpu/sgx/encls.h>
x86/mtrr: Remove license boilerplate text with bad FSF address
x86/asm: Use RDPKRU and WRPKRU mnemonics in <asm/special_insns.h>
x86/idle: Use MONITORX and MWAITX mnemonics in <asm/mwait.h>
x86/entry/fred: Push __KERNEL_CS directly
x86/kconfig: Remove CONFIG_AS_AVX512
crypto: x86 - Remove CONFIG_AS_VPCLMULQDQ
crypto: X86 - Remove CONFIG_AS_VAES
crypto: x86 - Remove CONFIG_AS_GFNI
x86/kconfig: Drop unused and needless config X86_64_SMP