git.ipfire.org Git - thirdparty/kernel/linux.git/log

drm/amd/display: properly handle family setting for early GC 11.5.4

Early variants need an override.

Fixes: 57d00816c6a9 ("drm/amdgpu: set family for GC 11.5.4")
Cc: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Cc: Roman Li <Roman.Li@amd.com>
Cc: Mario Limonciello <superm1@kernel.org>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Tested-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 922fccc2d3f8186008c19ba08a49ae8a9463cb50)

drm/amd/pm: Update emit clock logic

If only one level is enabled in clock table, there is no need to
follow the fine grained clock logic which expects a minimum of
two levels (min/max).

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7f19097af1496dd908a044ca95862f32d05f02df)

drm/amd/display: Update MCIF_ADDR macro to address IGT DWB regression

[Why]
A previous warning-fix commit updated type casts in the DCN3
mmhubbub code but missed updating the MCIF_ADDR macro to the
correct, fully parenthesized and casted version. This caused
a regression during DWB tests, where address values could be
misinterpreted, potentially leading to incorrect hardware
programming.

[How]
Updated the MCIF_ADDR macro in dcn30_mmhubbub.c to use the
proper parenthesization and type casting, ensuring correct
address handling. Removed redundant casts from REG_UPDATE
calls for improved clarity and consistency with current
coding standards.

Fixes: f4cdbb5d5405 ("drm/amd/display: Fix implicit narrowing conversion warnings")
Reviewed-by: Clayton King <clayton.king@amd.com>
Signed-off-by: Gaghik Khachatrian <gaghik.khachatrian@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4f251a5e9f2297023b00b7cab606de111931cfa3)

drm/amdgpu: rework userq fence signal processing

Move more code into a common userq function.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 12f52fab11500d0dce7d23c71909eaf0cf9aa701)

drm/amdgpu: fix build for CONFIG_DRM_FBDEV_EMULATION=n

The merge-commit 02e778f12359 ("Merge tag 'amd-drm-next-7.1-2026-03-12' of
https://gitlab.freedesktop.org/agd5f/linux into drm-next") removes the stub
for drm_fb_helper_gem_is_fb(), so the buld gets broken if DRM_FBDEV_EMULATION
is not set.

‘drm_fb_helper_gem_is_fb’; did you mean ‘drm_fb_helper_from_client’? [-Wimplicit-function-declaration]
1777 |                 if (!drm_fb_helper_gem_is_fb(dev->fb_helper, fb->obj[0])) {
      |                      ^~~~~~~~~~~~~~~~~~~~~~~
      |                      drm_fb_helper_from_client

Restore it.

Fixes: 02e778f12359 ("Merge tag 'amd-drm-next-7.1-2026-03-12' of https://gitlab.freedesktop.org/agd5f/linux into drm-next")
Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de>
Signed-off-by: Yury Norov <ynorov@nvidia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 7b81bc38e92c2522484c42671401eaa023ae8831)

drm/amdkfd: check if vm ready in svm map and unmap to gpu

Don't map or unmap svm range to gpu if vm is not ready for updates.

Why: DRM entity may already be killed when the svm worker try to
update gpu vm.

Signed-off-by: YuanShang <YuanShang.Mao@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 55f8e366c326980174a4f2b9501b524d8eb25135)

drm/amdkfd: validate SVM ioctl nattr against buffer size

Validate nattr field against the buffer size, preventing
out-of-bounds buffer access via user-controlled attribute count.

Reviewed-by: Amir Shetaia <Amir.Shetaia@amd.com>
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5eca8bfdfa456c3304ca77523718fe24254c172f)
Cc: stable@vger.kernel.org

drm/amdgpu: Avoid reset in AMDGPU unload path for APUs with GFX V11 and higher.

GFX V11 has GC block as default off IP.
Every time AMDGPU driver sends a request to PMFW
to unload MP1, PMFW will put GC in reset and
power down the voltage.Hence, skipping reset
for APUs with GFX V11 or later to avoid reset
related failures.

Fixes: 34355e61835e ("drm/amdgpu: Fix GFX hang on SteamDeck when amdgpu is reloaded")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Shubhankar Milind Sardeshpande <Shubhankar.MilindSardeshpande@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit d0a8cadffc818f51d05bc234d8da1af228bc59a3)
Cc: stable@vger.kernel.org

drm/amdgpu: Only send RMA CPER when threshold is exceeded

According to our documentation, the RMA should only occur when the
threshold has been exceeded, not met.

Fixes: 5028a24aa89a ("drm/amdgpu: Send applicable RMA CPERs at end of RAS init")
Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 8bc09a7d0e90ec45a0b4865661cf45cbbce1c3d7)

drm/amdgpu: fix root reservation in amdgpu_vm_handle_fault

svm_range_restore_pages might reserve the root bo so it must
be called after unreserving it.

Fixes: 1b135c6da061 ("drm/amdgpu: extract amdgpu_vm_lock_by_pasid from amdgpu_vm_handle_fault")
Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5cdc219fe86a1720aa4b5b4f42f11913146e6a93)

drm/amdgpu/gfx6: Support harvested SI chips with disabled TCCs (v2)

This commit fixes amdgpu to work on the Radeon HD 7870 XT
which has never worked with the Linux open source drivers before.

Some boards have "harvested" chips, meaning that some parts of
the chip are disabled and fused, and it's sold for cheaper and
under a different marketing name.
On a harvested chip, any of the following can be disabled:
- CUs (Compute Units)
- RBs (Render Backend, aka. ROP)
- Memory channels (ie. the chip has a lower bandwidth)
- TCCs (ie. less L2 cache)

Handle chips with harvested TCCs by patching the registers
that configure how TCCs are mapped.

If some TCCs are disabled, we need to make sure that
the disabled TCCs are not used, and the remaining TCCs
are used optimally.

TCP_CHAN_STEER_LO/HI control which TCC is used by TCP channels.
TCP_ADDR_CONFIG.NUM_TCC_BANKS controls how many channels are used.

Note that the TCC configuration is highly relevant to performance.
Suboptimal configuration (eg. CHAN_STEER=0) can significantly
reduce gaming performance.

For optimal performance:
- Rely on the CHAN_STEER from the golden registers table,
only skip disabled TCCs but keep the mapping order.
- Limit NUM_TCC_BANKS to number of active TCCs to avoid thrashing,
which performs better than using the same TCC twice.

v2:
- Also consider CGTS_USER_TCC_DISABLE for disabled TCCs.

Link: https://bugs.freedesktop.org/show_bug.cgi?id=60879
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/2664
Fixes: 2cd46ad22383 ("drm/amdgpu: add graphic pipeline implementation for si v8")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 00218d15528fab9f6b31241fe5904eea4fcaa30d)

drm/amdgpu/uvd3.1: Don't validate the firmware when already validated

UVD 3.1 firmware validation seems to always fail after
attempting it when it had already been validated.
(This works similarly with the VCE 1.0 as well.)

Don't attempt repeating the validation when it's already done.

This caused issues in situations when the system isn't able
to suspend the GPU properly and so the GPU isn't actually
powered down. Then amdgpu would fail when calling the IP
block resume function.

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/2887
Fixes: bb7978111dd3 ("drm/amdgpu: fix SI UVD firmware validate resume fail")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 889a2cfd889c4a4dd9d0c89ce9a8e60b78be71dd)

drm/amdgpu: fix AMDGPU_INFO_READ_MMR_REG

There were multiple issues in that code.

First of all the order between the reset semaphore and the mm_lock was
wrong (e.g. copy_to_user) was called while holding the lock.

Then we allocated memory while holding the reset semaphore which is also
a pretty big bug and can deadlock.

Then we used down_read_trylock() instead of waiting for the reset to
finish.

Signed-off-by: Christian König <christian.koenig@amd.com>
Fixes: 9e823f307074 ("drm/amdgpu: Block MMR_READ IOCTL in reset")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 361b6e6b303d4b691f6c5974d3eaab67ca6dd90e)

drm/amd/pm: fix missing fine-grained dpm table flag on aldebaran

Add the missing SMU_DPM_TABLE_FINE_GRAINED flag to aldebaran DPM table.
This fixes the pp_dpm_sclk node issue caused by missing flag configuration.

Fixes: 7ea1c722fe1d ("drm/amd/pm: Use common helper for aldebaran dpm table")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3427dea3a48ebddb491a26093f3627384b3cb2c2)

drm/amdgpu/gmc: Fix AMDGPU_GART_PLACEMENT_LOW to not overlap with VRAM

When the GART placement is set to AMDGPU_GART_PLACEMENT_LOW:
Make sure that GART does not overlap with VRAM when
VRAM is configured to be in the low address space.

Solve this according to the following logic:
- When GART fits before VRAM, use zero address for GART
- Otherwise, put GART after the end of VRAM, aligned to 4 GiB

Previously, I had assumed this was not possible
so it was OK to not handle it, but now we got a report
from a user who has a board that is configured this way.

Fixes: 917f91d8d8e8 ("drm/amdgpu/gmc: add a way to force a particular placement for GART")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 3d9de5d86a1658cadb311461b001eb1df67263ad)

amdkfd: Only ignore -ENOENT for KFD init failuires

When compiled without CONFIG_HSA_AMD KFD will return -ENOENT.
As other errors will cause KFD functionality issues this is the
only error code that should be ignored at init.

Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 4259a25341abf77939767215706f4e3cfd4b73b8)

drm/amdgpu: avoid double drm_exec_fini() in userq validate

When new_addition is true, amdgpu_userq_vm_validate() calls
drm_exec_fini(&exec) before iterating over the collected HMM ranges and
calling amdgpu_ttm_tt_get_user_pages().

If amdgpu_ttm_tt_get_user_pages() fails in that path, the code jumps to
unlock_all and calls drm_exec_fini(&exec) a second time on the same
exec object. drm_exec_fini() is not idempotent: it frees exec->objects
and may also drop exec->contended and finalize the ww acquire context.

Route that error path directly to the range cleanup once exec has
already been finalized.

Fixes: 42f148788469 ("drm/amdgpu/userqueue: validate userptrs for userqueues")
Issue found using a prototype static analysis tool
and confirmed by code review.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Hongyan Xu <getshell@seu.edu.cn>
Signed-off-by: Slavin Liu <220245772@seu.edu.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 2802952e4a07306da6ebe813ff1acacc5691851a)

drm/amd/display: Restore analog connector support

[Why]
The analog connector support was accidentally removed,
causing a crash when connecting an analog monitor.

[How]
This patch restores the functions and pointers required for proper analog
and DP bridge encoder support on legacy GPUs.

V2: Restore the external encoder control functions.

V3:
- Restore BIOS parser external encoder DAC load detection
- Restore stream initialization and source selection changes

Fixes: e56e3cff2a1b ("drm/amd/display: Sync dcn42 with DC 3.2.373")
Cc: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit cea8349e4494d2892ea57eef3fe4a8987464a876)

drm/amdgpu: fix zero-size GDS range init on RDNA4

RDNA4 (GFX 12) hardware removes the GDS, GWS, and OA on-chip memory
resources. The gfx_v12_0 initialisation code correctly leaves
adev->gds.gds_size, adev->gds.gws_size, and adev->gds.oa_size at
zero to reflect this.

amdgpu_ttm_init() unconditionally calls amdgpu_ttm_init_on_chip() for
each of these resources regardless of size. When the size is zero,
amdgpu_ttm_init_on_chip() forwards the call to ttm_range_man_init(),
which calls drm_mm_init(mm, 0, 0). drm_mm_init() immediately fires
DRM_MM_BUG_ON(start + size <= start) -- trivially true when size is
zero -- crashing the kernel during modprobe of amdgpu on an RX 9070 XT.

Guard against this by returning 0 early from
amdgpu_ttm_init_on_chip() when size_in_page is zero. This skips TTM
resource manager registration for hardware resources that are absent,
without affecting any other GPU type.

DRM_MM_BUG_ON() only asserts if CONFIG_DRM_DEBUG_MM is enabled in
the kernel config. This is apparently rarely enabled as these chips
have been in the market for over a year and this issue was only reported
now.

Link: https://lore.kernel.org/all/bug-221376-2300@https.bugzilla.kernel.org%2F/
Link: https://bugzilla.kernel.org/show_bug.cgi?id=221376
Oops-Analysis: http://oops.fenrus.org/reports/bugzilla.korg/221376/report.html
Assisted-by: GitHub Copilot:Claude Sonnet 4.6 linux-kernel-oops-x86.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: "Christian König" <christian.koenig@amd.com>
Cc: amd-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 5719ce5865279cad4fd5f01011fe037168503f2d)
Cc: stable@vger.kernel.org

Merge tag 'amd-drm-fixes-7.1-2026-04-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-fixes-7.1-2026-04-23:

amdgpu:
- DC idle state manager fix
- ASPM fix
- GPUVM SVM fix
- DCE 6 fix

amdkfd:
- num_of_nodes bounds check fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260423170129.2345978-1-alexander.deucher@amd.com

Merge tag 'drm-misc-next-fixes-2026-04-23' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

Short summary of fixes pull:

rcar-du:
- fix NULL-ptr crash

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260423130852.GA114622@linux.fritz.box

drm/amdkfd: Add upper bound check for num_of_nodes

drm/amdkfd: Add upper bound check for num_of_nodes
in kfd_ioctl_get_process_apertures_new.

Reviewed-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alysa Liu <Alysa.Liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 98ff46a5ea090c14d2cdb4f5b993b05d74f3949f)
Cc: stable@vger.kernel.org

drm: rcar-du: Fix crash when no CMM is available

Commit 3bce3fdd1ff2 ("drm: rcar-du: Don't leak device_link to CMM")
refactored CMM handling, and introduced an incorrect test for CMM
availability. When no CMM is present, the rcrtc->cmm field is NULL,
testing rcrtc->cmm->dev causes a NULL pointer dereference. This slipped
through testing as all tests were run with the CMM present.

Fix this issue by correctly testing for rcrtc->cmm.

Fixes: 3bce3fdd1ff2 ("drm: rcar-du: Don't leak device_link to CMM")
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Closes: https://lore.kernel.org/dri-devel/CAMuHMdXomz9GFDqkBjGX9Sda_GLccPcrihvFbOz0GAitDVNTbw@mail.gmail.com
Signed-off-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Link: https://patch.msgid.link/20260408124205.1962448-1-laurent.pinchart+renesas@ideasonboard.com
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
(cherry picked from commit 3e9a1da270ddff449b1ad9eadc958f43bc204bd2)
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>

Merge tag 'drm-intel-next-fixes-2026-04-22' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

- Fix uninitialized variable in the alignment loop [psr] (Jouni Högander)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://patch.msgid.link/aeh-dKTow5Fl4Iv4@linux

Merge tag 'amd-drm-next-7.1-2026-04-17' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-7.1-2026-04-17:

amdgpu:
- SMU 14 fixes
- Partition fixes
- SMUIO 15.x fix
- SR-IOV fixes
- JPEG fix
- PSP 15.x fix
- NBIF fix
- Devcoredump fixes
- DPC fix
- RAS fixes
- Aldebaran smu fix
- IP discovery fix
- SDMA 7.1 fix
- Runtime pm fix
- MES 12.1 fix
- DML2 fixes
- DCN 4.2 fixes
- YCbCr fixes
- Freesync fixes
- ISM fixes
- Overlay cursor fix
- DC FP fixes
- UserQ locking fixes

amdkfd:
- Fix memory clear handling

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260417225351.8714-1-alexander.deucher@amd.com

drm/amd/display: Disable 10-bit truncation and dithering on DCE 6.x

DCE 6.x doesn't support 10-bit truncation and 10-bit dithering
because the following fields are 1-bit only:
FMT_TEMPORAL_DITHER_DEPTH
FMT_SPATIAL_DITHER_DEPTH
FMT_TRUNCATE_DEPTH
Programming these fields to "2" will program them as if the
dithering option was 6-bit, resulting in sub-par picture
quality and an ugly "color banding" effect.

Note that a recent commit changed the default 10-bit dithering
option to DITHER_OPTION_SPATIAL10 which improves the picture
quality because it happens to look better, but is still not
actually supported by DCE 6.x versions.

When the color depth is 10-bit or more, just disable
any kind of dithering options on DCE 6.x.

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5151
Fixes: 529cad0f945c ("drm/amd/display: Add function to set dither option")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6be8ced880dfe29ce38c2d5e74489822da5c250e)

drm/amdgpu: OR init_pte_flags into invalid leaf PTE updates

Invalid leaf clears that only set AMDGPU_PTE_EXECUTABLE match the old
GMC9 fault-priority workaround but omit adev->gmc.init_pte_flags.
On GFX12 that includes AMDGPU_PTE_IS_PTE; without it, some cleared
PTEs can fault as no-retry and bypass the SVM/XNACK handler when a
VA is reused after a BO unmap.

Apply init_pte_flags in amdgpu_vm_pte_update_flags() alongside
EXECUTABLE so range-driven clears (e.g. amdgpu_vm_clear_freed) match
amdgpu_vm_pt_clear() for leaf templates.

Signed-off-by: Siwei He <siwei.he@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 9d47b2c36b9a6c6b844c33cab407a5d7ad102234)

drm/amd: Adjust ASPM support quirk to cover more Intel hosts

Some of the same issues identified in commit c770ef19673fb
("drm/amd/amdgpu: disable ASPM in some situations") also affect
Tiger Lake systems with GFX11 connected over USB4. Widen the net
to also match these hosts.

Fixes: d9b3a066dfcd ("drm/amd: Exclude dGPUs in eGPU enclosures from DPM quirks")
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5145
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 0a214d888485b9f35fe03882a92962e6d5697849)

drm/amd/display: Undo accidental fix revert in amdgpu_dm_ism.c

[Why]

Pausing DPM power profiles during static screen caused a bunch of
audio/performance/clock issues that were addressed in this fix:
'commit 1412482b7143 ("Revert "drm/amd/display: pause the workload setting in dm"")'

This logic in function amdgpu_dm_crtc_vblank_control_worker() was moved
to amdgpu_dm_ism.c, but the fix was lost in the process.

[How]

Reapply the fix to amdgpu_dm_ism.c

Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit bc621e91d6fc004cfae9148c5a91acad19ada3e4)

drm/i915/psr: Init variable to avoid early exit from et alignment loop

Uninitialized boolean variable may cause unwanted exit from et alignment
loop. Fix this by initializing it as false.

Fixes: 1be2fca84f52 ("drm/i915/psr: Repeat Selective Update area alignment")
Cc: <stable@vger.kernel.org> # v6.9+
Signed-off-by: Jouni Högander <jouni.hogander@intel.com>
Reviewed-by: Nemesa Garg <nemesa.garg@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Link: https://patch.msgid.link/20260413112345.88853-1-jouni.hogander@intel.com
(cherry picked from commit 289678a90b8cf81e3514c9d6c667235cd39c7acf)
Signed-off-by: Tvrtko Ursulin <tursulin@ursulin.net>

drm/amdgpu: drop userq fence driver refs out of fence process()

amdgpu_userq_wait_ioctl() takes extra references on waited-on fence
drivers and stores them in waitq->fence_drv_xa. When a new userq fence is
created, those references are transferred into userq_fence->fence_drv_array
so they can be released when the fence completes.

However, those inherited references are currently only dropped from
amdgpu_userq_fence_driver_process(). If a fence never reaches that path,
such as it is already signaled when created, so we need to explicitly release
those fences in that case.

v2: use a list(list_cut_before) for managing the signal userq driver fences.(Christian)
Link: https://patchwork.freedesktop.org/patch/718078/?series=164763&rev=2
v3: Doesn't cache the userq first unsignaled fence and use the cut before list
head directly.(Christian)

Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: unpin and unref doorbell and wptr outside mutex

In amdgpu_userq_destroy once unmap_helpder is called within mutex
there is no need to hold mutex.

This helps in avoiding a deadlock between doorbell and wptr ww mutex
and we could unpin and unref these bos outside mutex safely.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: use pm_runtime_resume_and_get and fix err handling

Use pm_runtime_resume_and_get instead of pm_runtime_get_sync as it
return error but put the reference in the function itself.

In goto statements we need to drop the pm reference too.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: unmap_helper dont return the queue state

We check for return value of amdgpu_userq_unmap_helper and
compare it against the queue->state which is logically
wrong and we should just check for failure and do the needfull.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: unmap is to be called before freeing doorbell/wptr bo

Unmap the queue after freeing doorbell and wptr memory is completely
wrong. Any operation on the queue needs the doorbell and wptr to be
valid and hence fixing the ordering.

Also since we are using amdgpu_bo_reserve in non interruptrable mode
so there is no need to check for its return values.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: hold root bo lock in caller of input_va_validate

Caller should hold the reservation lock for root.bo in func
amdgpu_userq_input_va_validate.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: caller to take reserv lock for vas_list_cleanup

In function amdgpu_userq_buffer_vas_list_cleanup, remove the
reservation lock for vm and caller should make sure it's taken
before locking userq_mutex.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: create_mqd does not need userq_mutex

Reshuffle the code to run create_mqd outside the mutex.
code here is mostly setting up software structure init
before actually registering the userqueue in the xa and
to the driver.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: dont lock root bo with userq_mutex held

Do not hold reservation lock for root bo if userq_mutex
is already held in the call flow this cause a lock
issue with ttm_bo_delayed_delete.

Its better to lock the vm->root.bo first and then go ahead
with userq_mutex so userq_mutex threads dont get stuck until
the reservation lock is held.

In this case it helps in the function amdgpu_userq_buffer_vas_mapped
for each queue during restore_all.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: fix kerneldoc for amdgpu_userq_ensure_ev_fence

Move the comment for the caller to the definition for
amdgpu_userq_ensure_ev_fence in kerneldoc format.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: clean the VA mapping list for failed queue creation

If the queue creation failed during mapping of the important VA's
like queue_va, rptr_va and wptr_va. These needs to be cleaned
as queue destroy will not be called for such queues as user never
get call to creation failure.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: avoid uneccessary locking in amdgpu_userq_create

Reorganise code to avoid holding mutex userq_mutex while
also trying to grab exec lock ww_mutex where its not needed
for function amdgpu_userq_input_va_validate

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix ISM teardown crash from NULL dc dereference

The Idle State Manager (ISM) uses delayed work to apply display idle
optimizations later, instead of immediately. This helps avoid rapid idle
transitions that can hurt power or performance.

A crash was seen during driver teardown. The system boots normally and
the driver loads successfully. Later, when the GPU is being stopped, the
log shows:

  amdgpu 0000:0e:00.0: finishing device.
  Workqueue: events_unbound dm_ism_sso_delayed_work_func [amdgpu]

After this, delayed ISM work still runs and reaches:

  dm_ism_sso_delayed_work_func()
    -> amdgpu_dm_ism_commit_event()
    -> dm_ism_commit_idle_optimization_state()
    -> dc_allow_idle_optimizations_internal()

The crash report showed:
  KASAN: null-ptr-deref in range [0x690-0x697]

Signature:
[22601.113316] KASAN: null-ptr-deref in range [0x0000000000000690-0x0000000000000697]
...
[22601.113368] Workqueue: events_unbound dm_ism_sso_delayed_work_func [amdgpu]
[22601.113930] RIP: 0010:dc_allow_idle_optimizations_internal+0xa6/0xc40 [amdgpu]
...
[22601.114491] RDX: dffffc0000000000 RSI: 0000000000000000 RDI: 0000000000000690
...
[22601.114561] Call Trace:
[22601.114566]  <TASK>
[22601.114572]  ? srso_alias_return_thunk+0x5/0xfbef5
[22601.114582]  ? update_load_avg+0x1b6/0x20b0
[22601.114593]  ? __pfx_dc_allow_idle_optimizations_internal+0x10/0x10 [amdgpu]
[22601.114932]  ? psi_group_change+0x4ed/0x8d0
[22601.114942]  dm_ism_commit_idle_optimization_state+0x214/0x570 [amdgpu]
[22601.115268]  amdgpu_dm_ism_commit_event+0xe1d/0x15a0 [amdgpu]
[22601.115588]  ? srso_alias_return_thunk+0x5/0xfbef5
[22601.115595]  ? __kasan_check_write+0x18/0x20
[22601.115603]  ? srso_alias_return_thunk+0x5/0xfbef5
[22601.115610]  ? mutex_lock+0x83/0xc0
[22601.115620]  dm_ism_sso_delayed_work_func+0x64/0x90 [amdgpu]

GDB resolved dc_allow_idle_optimizations_internal+0xa6 to:

  struct dc_state *context = dc->current_state;

The matching disassembly showed:

  mov %rdi, %r12
  mov 0x690(%r12), %r13

where r12 holds the dc pointer. A GDB layout dump of struct dc showed:

  /* 1680 | 8 */ struct dc_state *current_state;

Since 1680 decimal is 0x690, this confirms that current_state is at
offset 0x690. The faulting access was effectively:

  dc + 0x690

which indicates that dc was NULL at the time of dereference.

This shows that ISM work can still run during teardown after dc has
been cleared.

ISM is not expected to run after dc is destroyed. Fix this by disabling
ISM under dc_lock in amdgpu_dm_fini() before dc_destroy(), ensuring no
further ISM work runs after dc teardown.

Also add ASSERT(dm->dc) in amdgpu_dm_ism_commit_event() to enforce this
invariant, and ASSERT(mutex_is_locked(&dm->dc_lock)) in
amdgpu_dm_ism_disable() to clarify the locking requirement.

Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)")
Suggested-by: Leo Li <sunpeng.li@amd.com>
Cc: Ray Wu <ray.wu@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Move dml2_destroy to non-FPU compilation unit

On PREEMPT_RT kernels, vfree() can sleep because spin_lock is
converted to rt_mutex. dml2_destroy() calls vfree() while inside
an FPU-guarded region (preempt_count=2), which is illegal.

dml2_wrapper_fpu.c is compiled with CC_FLAGS_FPU which defines
_LINUX_FPU_COMPILATION_UNIT, making DC_RUN_WITH_PREEMPTION_ENABLED()
resolve to a no-op. This prevents the macro from cycling FPU
context off/on around vfree().

Move dml2_destroy() to dml2_wrapper.c (non-FPU compilation unit)
where DC_RUN_WITH_PREEMPTION_ENABLED() properly cycles DC_FP_END/
DC_FP_START around vfree(). This pairs it with dml2_allocate_memory()
which already lives there.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Rafal Ostrowski <rafal.ostrowski@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix fpu guard warning

[Why]
Due to improper fpu guarding, we encounter this warning during boot up:

[   10.027021] WARNING: drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/dc_fpu.c:58 at dc_assert_fp_enabled+0x12/0x20 [amdgpu], CPU#8: (udev-worker)/469
[   10.027644] Modules linked in: binfmt_misc snd_ctl_led nls_iso8859_1 intel_rapl_msr amd_atl intel_rapl_common amdgpu(+) snd_acp_legacy_mach snd_acp_mach snd_soc_nau8821 snd_acp3x_pdm_dma snd_acp3x_rn snd_soc_dmic snd_sof_amd_acp63 snd_sof_amd_vangogh snd_sof_amd_rembrandt snd_sof_amd_renoir snd_sof_amd_acp snd_sof_pci snd_hda_codec_alc269 snd_sof_xtensa_dsp snd_hda_scodec_component snd_hda_codec_realtek_lib snd_sof snd_hda_codec_generic snd_sof_utils snd_pci_ps snd_soc_acpi_amd_match snd_amd_sdw_acpi soundwire_amd snd_hda_codec_atihdmi soundwire_generic_allocation snd_hda_codec_hdmi soundwire_bus snd_soc_sdca edac_mce_amd snd_hda_intel snd_soc_core snd_hda_codec kvm_amd snd_compress snd_hda_core ac97_bus ee1004 amdxcp snd_pcm_dmaengine snd_intel_dspcfg snd_intel_sdw_acpi kvm drm_panel_backlight_quirks snd_rpl_pci_acp6x gpu_sched snd_hwdep snd_acp_pci irqbypass snd_amd_acpi_mach drm_buddy snd_acp_legacy_common snd_seq_midi ghash_clmulni_intel drm_ttm_helper aesni_intel snd_seq_midi_event snd_pci_acp6x joydev rapl
[   10.027750]  snd_pcm snd_rawmidi ttm snd_seq snd_pci_acp5x drm_exec drm_suballoc_helper snd_seq_device wmi_bmof snd_rn_pci_acp3x drm_display_helper snd_timer snd_acp_config cec snd_soc_acpi snd rc_core i2c_piix4 ccp snd_pci_acp3x i2c_smbus soundcore k10temp i2c_algo_bit spi_amd cdc_mbim input_leds cdc_wdm mac_hid sch_fq_codel msr parport_pc ppdev lp parport efi_pstore nfnetlink dmi_sysfs autofs4 cdc_ncm cdc_ether usbnet mii hid_logitech_hidpp hid_logitech_dj hid_generic nvme nvme_core ahci serio_raw nvme_keyring usbhid ucsi_acpi amd_xgbe nvme_auth libahci hkdf typec_ucsi video typec wmi i2c_hid_acpi i2c_hid hid
[   10.027853] CPU: 8 UID: 0 PID: 469 Comm: (udev-worker) Not tainted 6.19.0asdn-260408-asdn #1 PREEMPT(voluntary)
[   10.027858] Hardware name: AMD Crater-RN/Crater-RN, BIOS TCR1004A 03/12/2024
[   10.027861] RIP: 0010:dc_assert_fp_enabled+0x12/0x20 [amdgpu]
[   10.028416] Code: 00 00 00 00 00 0f 1f 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 65 8b 05 39 79 cc c4 85 c0 7e 07 31 c0 e9 9e 75 2a c3 <0f> 0b 31 c0 e9 95 75 2a c3 0f 1f 44 00 00 90 90 90 90 90 90 90 90
[   10.028420] RSP: 0018:ffffcca10188b348 EFLAGS: 00010246
[   10.028425] RAX: 0000000000000000 RBX: ffff88c6077f8000 RCX: 0000000000000000
[   10.028428] RDX: ffff88c607d0e400 RSI: ffffffffc204d860 RDI: ffff88c624c00000
[   10.028430] RBP: ffffcca10188b3e8 R08: ffff88c624c35c88 R09: 0000000000000000
[   10.028433] R10: 0000000000000000 R11: 0000000000000000 R12: ffffcca10188b548
[   10.028435] R13: ffff88c60be5bd00 R14: ffffffffc204d860 R15: ffff88c624c00000
[   10.028438] FS:  00007c80c2432980(0000) GS:ffff88cdc7464000(0000) knlGS:0000000000000000
[   10.028441] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   10.028443] CR2: 00007866ae013da8 CR3: 000000010a511000 CR4: 0000000000350ef0
[   10.028446] Call Trace:
[   10.028449]  <TASK>
[   10.028452]  ? dcn21_update_bw_bounding_box+0x38/0xb30 [amdgpu]
[   10.028991]  ? srso_return_thunk+0x5/0x5f
[   10.029001]  dc_create+0x37c/0x730 [amdgpu]
[   10.029505]  ? srso_return_thunk+0x5/0x5f
[   10.029512]  amdgpu_dm_init+0x374/0x2ff0 [amdgpu]
[   10.030053]  ? srso_return_thunk+0x5/0x5f
[   10.030057]  ? __irq_work_queue_local+0x61/0xe0
[   10.030063]  ? srso_return_thunk+0x5/0x5f
[   10.030067]  ? irq_work_queue+0x2f/0x70
[   10.030071]  ? srso_return_thunk+0x5/0x5f
[   10.030075]  ? __wake_up_klogd+0x75/0xa0
[   10.030081]  ? srso_return_thunk+0x5/0x5f
[   10.030085]  ? vprintk_emit+0x35b/0x3f0
[   10.030102]  dm_hw_init+0x1c/0x110 [amdgpu]
[   10.030625]  amdgpu_device_init+0x23e8/0x3210 [amdgpu]
[   10.031041]  ? pci_read+0x55/0x90
[   10.031047]  ? srso_return_thunk+0x5/0x5f
[   10.031051]  ? pci_read_config_word+0x27/0x50
[   10.031057]  ? srso_return_thunk+0x5/0x5f
[   10.031061]  ? do_pci_enable_device+0x155/0x180
[   10.031068]  amdgpu_driver_load_kms+0x1a/0xd0 [amdgpu]
[   10.031486]  amdgpu_pci_probe+0x28c/0x6f0 [amdgpu]
[   10.031902]  local_pci_probe+0x47/0xb0
[   10.031908]  pci_device_probe+0xf3/0x270
[   10.031914]  really_probe+0xf1/0x410
[   10.031920]  __driver_probe_device+0x8c/0x190
[   10.031924]  driver_probe_device+0x24/0xd0
[   10.031928]  __driver_attach+0x10b/0x240
[   10.031932]  ? __pfx___driver_attach+0x10/0x10
[   10.031936]  bus_for_each_dev+0x8c/0xf0
[   10.031942]  driver_attach+0x1e/0x30
[   10.031947]  bus_add_driver+0x160/0x2a0
[   10.031952]  driver_register+0x5e/0x130
[   10.031957]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
[   10.032361]  __pci_register_driver+0x5e/0x70
[   10.032366]  amdgpu_init+0x5d/0xff0 [amdgpu]
[   10.032768]  ? srso_return_thunk+0x5/0x5f
[   10.032773]  do_one_initcall+0x5d/0x340
[   10.032783]  do_init_module+0x97/0x2c0
[   10.032788]  load_module+0x2b49/0x2c30
[   10.032800]  init_module_from_file+0xf4/0x120
[   10.032804]  ? init_module_from_file+0xf4/0x120
[   10.032813]  idempotent_init_module+0x10f/0x300
[   10.032820]  __x64_sys_finit_module+0x73/0xf0
[   10.032824]  ? srso_return_thunk+0x5/0x5f
[   10.032829]  x64_sys_call+0x1d68/0x26b0
[   10.032834]  do_syscall_64+0x81/0x500
[   10.032839]  ? srso_return_thunk+0x5/0x5f
[   10.032843]  ? do_syscall_64+0x2e5/0x500
[   10.032848]  ? srso_return_thunk+0x5/0x5f
[   10.032852]  ? native_flush_tlb_global+0x95/0xb0
[   10.032860]  ? srso_return_thunk+0x5/0x5f
[   10.032864]  ? __flush_tlb_all+0x13/0x60
[   10.032870]  ? srso_return_thunk+0x5/0x5f
[   10.032874]  ? do_flush_tlb_all+0xe/0x20
[   10.032879]  ? srso_return_thunk+0x5/0x5f
[   10.032882]  ? __flush_smp_call_function_queue+0x9c/0x430
[   10.032888]  ? srso_return_thunk+0x5/0x5f
[   10.032897]  ? irqentry_exit+0xb2/0x740
[   10.032901]  ? srso_return_thunk+0x5/0x5f
[   10.032906]  ? srso_return_thunk+0x5/0x5f
[   10.032911]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[   10.032915] RIP: 0033:0x7c80c1d3490d
[   10.032920] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d3 f4 0f 00 f7 d8 64 89 01 48
[   10.032923] RSP: 002b:00007fff3a12fe28 EFLAGS: 00000246 ORIG_RAX: 0000000000000139
[   10.032928] RAX: ffffffffffffffda RBX: 00005c44096804f0 RCX: 00007c80c1d3490d
[   10.032930] RDX: 0000000000000000 RSI: 00005c4409681690 RDI: 000000000000002b
[   10.032933] RBP: 00007fff3a12fec0 R08: 0000000000000000 R09: 00005c4409681790
[   10.032935] R10: 0000000000000000 R11: 0000000000000246 R12: 00005c4409681690
[   10.032937] R13: 0000000000020000 R14: 00005c44094ff7f0 R15: 00005c4409681690
[   10.032945]  </TASK>
[   10.032948] ---[ end trace 0000000000000000 ]---

[How]
Add wrapper function to guard fpu properly for dcn21/dcn31/dcn315/dcn316.

Fixes: 3539437f354b ("drm/amd/display: Move FPU Guards From DML To DC - Part 1")
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Reviewed-by: Rafal Ostrowski <rafal.ostrowski@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Clear cached EDID pointer after drm_edid_free()

The driver stores EDID in amdgpu_connector->edid and uses it as a cache.

amdgpu_connector_get_edid() checks this pointer. If it is not NULL, it
assumes EDID is already present and does not read it again.

In some detect paths, the driver frees the EDID using drm_edid_free(),
but does not set the pointer to NULL. Because of this, the pointer still
looks valid even though the memory is already freed.

Later, when amdgpu_connector_get_edid() is called, it returns early and
does not read a new EDID. This can lead to using a freed pointer.

Fix this by setting amdgpu_connector->edid = NULL after drm_edid_free().

This makes sure the driver reads a fresh EDID and does not use invalid
memory.

Fixes: 71036457ad85 ("drm/amdgpu/amdgpu_connectors: remove amdgpu_connector_free_edid")
Reported-by: Dan Carpenter <error27@gmail.com>
Cc: Joshua Peisach <jpeisach@ubuntu.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Joshua Peisach <jpeisach@ubuntu.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Promote DC to 3.2.378

DC v3.2.378 summary:

New:
   - Add p-state schedule admissibility flags and frame-time utility

Fixes:
   - Fixed incorrect math_mod() result due to wrong variable in fmod implementation (Cc: stable)
   - Use overlay cursor when a color pipeline is active to avoid incorrect rendering
Cleanups:
   - Add const qualifiers to watermark params struct
   - Fix narrowing-conversion compiler warnings

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: add pstate schedule admissibility flags and frame-time utility

[Why]
Core needs to track pstate schedule admissibility for different global
change scenarios (fclk, temp read, PPT) and requires a reusable way to compute
per-stream frame time from timing parameters.

[How]
Extend dml2_core_internal_mode_support_info with:
fclk_pstate_schedule_admissible
temp_read_pstate_schedule_admissible
ppt_pstate_schedule_admissible
Add dummy_double_array[3][DML2_MAX_PLANES] to
dml2_core_calcs_mode_support_locals.
Introduce dml2_core_utils_get_frame_time_us() in dml2_core_utils.c and export
it in dml2_core_utils.h to compute frame time in microseconds from stream
timing (vline time * (vactive + vblank)).

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: add const qualifiers to watermark params struct

[why]
There are few non const input pointer fields. Setting them to const to
prevent future modification of read-only data.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: fix math_mod() using arg1 instead of arg2

[Why]
math_mod() multiplied by arg1 instead of arg2, returning a wrong
result for any non-trivial modulo operation.

[How]
Replace arg1 with arg2 in the subtraction term to correctly
implement fmod(arg1, arg2).

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Use overlay cursor when color pipeline is active

Force overlay cursor mode when an underlying plane has a non-bypassed
color pipeline to avoid incorrect cursor transformation.

Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix compiler warnings

[Why] Implicit conversions from wider integer types to byte-sized fields
were generating compiler warnings. These warnings hide intentional protocol
/storage boundaries and reduce signal quality during builds. Making
conversion intent explicit improves readability and warning hygiene
without changing behavior.

[How] Added explicit, type-safe casts at intentional narrow-storage
boundaries. Kept data models & runtime logic unchanged, only clarifying
conversion intent.

Functionality and behavior is unchanged; only type intent is explicit.
Aligned warning cleanup with existing coding standards for explicit
boundary conversions.

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Gaghik Khachatrian <gaghik.khachatrian@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: fix NULL ptr deref in ISM delayed work

dc_destroy() sets dm->dc to NULL before amdgpu_dm_ism_fini() is called,
leaving a window where in-flight ISM delayed work dereferences the stale
pointer. Call amdgpu_dm_ism_fini() in amdgpu_dm_fini() before dc_destroy().

Fixes: 754003486c3c ("drm/amd/display: Add Idle state manager(ISM)")
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add missing do_mccs parameter description

Add missing description for do_mccs parameter in
amdgpu_dm_update_freesync_caps.

Fixes the below with gcc W=1:
../display/amdgpu_dm/amdgpu_dm.c:13269 function parameter 'do_mccs' not described in 'amdgpu_dm_update_freesync_caps'

Fixes: 8dc88c6a5948 ("drm/amd/display: Avoid to do MCCS transaction if unnecessary")
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Wayne Lin <Wayne.Lin@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Remove redundant includes from DC

[Why]
The explicit include of linux/array_size.h in Display Core (DC) is
redundant. The ARRAY_SIZE macro is already provided by dm_services.h
(via os_types.h) which DC includes.

[How]
Remove the unnecessary #include <linux/array_size.h> from
dc_hw_sequencer.c and dce_clock_source.c.

Fixes: 2d2366176445 ("drm/amd/display: Replace inline NUM_ELEMENTS macro with ARRAY_SIZE")
CC: Linus Probert <linus.probert@gmail.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Promote DC to 3.2.377

This version brings along the following updates:

- Enable sink freesync via MCCS with pcon whitelist adjustments
- Rework YCbCr422 DSC policy
- Update DML2.1 parameters
- Fix coding style issues and compiler warnings

Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix coding style issue

[Why & How]
Function logic should put after variable declare section, so let's move it.

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Chuanyu Tseng <Chuanyu.Tseng@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Remove Duplicate Prefetch Parameter

[Why/How]
UrgLatency value is passed in twice to the prefetch calculations.
Once through the UrgentLatency term and once through the Turg term.
Only Turg is used in the prefetch calculation so remove the unused UrgentLatency parameter

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Zheng, Austin <Austin.Zheng@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add DCN42 PMO policy for DML2.1

[Why]
The MinTTU policy in DML2.1 does not guarantee that we support p-state
in blank. This is a delta vs dml2 and earlier revisions as the prefetch
mode override has been removed in favor of a more configurable pstate
optimizer.

[How]
Split off DCN42 with its own PMO helpers so that we can use a simpler
strategy of only allowing the mode if we support p-state in vblank and
if vactive has enough latency hiding.

The actual hookup to use these helpers in the PMO factory will be
done in a later patch to satisfy build system requirements.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: move memory latency update to dml for dcn42

Memory latencies are soc specific and should be part of dml soc
bounding box. This change removes them from clk_mgr and has
latency update happen based on memory type when dml socbb is being
updated.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix implicit narrowing conversions in modules

[Why]: Implicit narrowing of wider integer types (unsigned int, uint64_t)
into narrower fields (uint8_t, uint16_t, unsigned short) has potential
truncation issues.

[How]: For each warning site, added ASSERT(<value> <= 0xFFFF/0xFF) for
debug-mode bounds verification followed by an explicit cast. Typed
intermediate variables introduced where needed for clarity.

No functional change intended.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Gaghik Khachatrian <gaghik.khachatrian@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: update dcn42 memory latencies

Add latency update based on memory type to dml2.1

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix DCN42 gpuvm_min_page_size_kbytes in SOC BB

[Why & How]
To match the HW specification this should be 4, not 256.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Pass min page size from SOC BB to dml2_1 plane config

[Why]
Like dml2_0 this isn't guaranteed to be constant for every ASIC.

This can cause corruption or underflow for linear surfaces due to a
wrong PTE_ROW_HEIGHT_LINEAR value if not correctly specified.

[How]
Like dml2_0 pass in the SOC bb into the plane configuration population
functions.

Set both GPUVM and HostVM page sizes in the overrides.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Correct MALL parameters for DCN42 soc bb

[Why & How]
The MALL and DCC parameters were copied and pasted from a previous ASIC
but the correct value per HW specification should all be 0.

If not correct this can impact urgent bandwidth calculation and PMO.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix HostVMMinPageSize unit mismatch in DML2.1

[Why]
This was found back on DML2 but was missed when creating DML2.1.

The bottom layer calculation (CalculateHostVMDynamicLevels) expects
a value in bytes, not KB, but we pass in the value in KB (eg. 4).

This causes an extra page table level to be required in the prefetch
bytes which can be significant overhead - preventing some modes
from being supported that should otherwise be.

[How]
Correct the units by multiplying the input and override values by 1024.

Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Avoid to do MCCS transaction if unnecessary

We don't have to do MCCS/DDCCI transactions with sink side every time by calling
get_modes(). Limit it to be operated when hotplug occurs.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Enable sink freesync via MCCS

If sink like HDMI indicates supporting freesync via MCCS,
explicitly to send vcp set command on sink to enable freesync.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Read sink freesync support via mccs

If EDID AMD VSDB declares that sink supports MCCS method for freesync
usage, send mccs request to understand sink freesync current supporting
state.

If sink supports freesync but user toggles OSD to turn off it, disable
freesync.

If HDMI sink doesn't support MCCS method for freesync usage, disable
freesync as well.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Parse freesync mccs vcp code

[Why & How]
DMUB supports to parse freesynce mccs vcp code now. Store it for
later freesync mccs manipulation.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Adjust freesync pcon whitelist

Add more freesync supported pcon ID into the whitelist.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Remove unnecessary Freesync w/a from DCN32

[Why/How]
A workaround was previously used for certain Freesync cases that would
override the vstartup_start value from DML to position the SDP
correctly. This is no longer needed in DCN32 and above, so remove the
workaround.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Rework YCbCr422 DSC policy

- Reworked YCbCr4:2:2 Native/Simple policy decision making with DSC
enabled based on DSC caps and stream signal type

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Relja Vojvodic <Relja.Vojvodic@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: update dcn42 bounding box

[why]
update according hw spec.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Drop unused tiling formats from dml2

Remove unused legacy tiling format support from dml2.
Legacy asics don't use dml2.

Fixes: e56e3cff2a1b ("drm/amd/display: Sync dcn42 with DC 3.2.373")
Reviewed-by: Leo Li <sunpeng.li@amd.com>
Signed-off-by: Roman Li <Roman.Li@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix unused parameters warnings in dml2_0

[Why] Resolve warnings by marking unused parameters explicitly.

[How] Keep parameter names in signatures and add a line with
'(void)param;' inside the function body

Preserved function signatures and avoids breaking code paths that
may reference the parameter under conditional compilation.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Reviewed-by: Clayton King <clayton.king@amd.com>
Signed-off-by: Gaghik Khachatrian <gaghik.khachatrian@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mes_v12_1: Fix iterator reuse in mes_v12_1_test_ring()

This code waits for the MES self-test to complete by repeatedly checking
a register or memory value until it becomes valid or a timeout occurs.
The fix ensures the timeout counter works correctly by not reusing the
same variable inside another loop.

mes_v12_1_test_ring() uses 'i' as the outer timeout loop counter, but
reuses the same variable for the inner XCC scan in cooperative mode.

This makes the timeout counter ambiguous and can lead to incorrect
timeout handling. It also triggers a Smatch warning about reusing the
outer loop iterator.

Fix this by introducing a separate iterator for the inner XCC loop so
that 'i' continues to represent only the timeout wait duration.

drivers/gpu/drm/amd/amdgpu/mes_v12_1.c:2080 mes_v12_1_test_ring()
warn: reusing outside iterator: 'i'

drivers/gpu/drm/amd/amdgpu/mes_v12_1.c
    2069         atomic64_set((atomic64_t *)wptr_cpu_addr, wptr);
    2070         WDOORBELL64(doorbell_idx, wptr);
    2071
    2072         for (i = 0; i < adev->usec_timeout; i++) {

i is counting usec

    2073                 if (queue_type == AMDGPU_RING_TYPE_SDMA) {
    2074                         tmp = le32_to_cpu(*cpu_ptr);
    2075                 } else {
    2076                         if (!adev->mes.enable_coop_mode) {
    2077                                 tmp = RREG32_SOC15(GC, GET_INST(GC, xcc_id),
    2078                                                    regSCRATCH_REG0);
    2079                         } else {
--> 2080                                 for (i = 0; i < num_xcc; i++) {

and then re-used to count something else

Fixes: 44e5195fa3d4 ("drm/amdgpu/mes_v12_1: add mes self test")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Jack Xiao <Jack.Xiao@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Jack Xiao <Jack.Xiao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: add od table upload error message parsing for smu v14.0.x

parse and print detailed reasons for od table upload failures to
help users understand error causes.

example:
$ echo "0 30 40" | sudo tee fan_curve
$ echo "1 40 30" | sudo tee fan_curve
$ echo "c" | sudo tee fan_curve

kernel log:
[   75.040174] amdgpu 0000:0a:00.0: Failed to upload overdrive table, ret:-5
[   75.040178] amdgpu 0000:0a:00.0: Invalid overdrive table content: OD_FAN_CURVE_PWM_ERROR (13)
[   75.040181] amdgpu 0000:0a:00.0: Failed to upload overdrive table!

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: add read arg support to smu_cmn_update_table

Extend the smu_cmn_update_table function to support reading a 32-bit return
argument from the SMU firmware during table transfer operations.

- Rename the original function to smu_cmn_update_table_read_arg
- Add a uint32_t *read_arg output parameter to capture firmware response
- Pass the read_arg pointer to the SMU message command
- Keep full backward compatibility using a macro wrapper for the old API

This allows the driver to retrieve status codes, results, or configuration
feedback from the SMU firmware after table data transfer.

No functional changes for existing users of the original smu_cmn_update_table()
API.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: fix runtime PM imbalance issue in amdgpu_pm.c

Fix runtime PM counter imbalance to prevent device from failing to enter low power state

Fixes: a50d32c41fb2 ("drm/amd/pm: Deprecate print_clock_levels interface")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/sdma7.1: add support for disable_kq

Plumb in support for disabling kernel queues and make it
the default. For testing, kernel queues can be re-enabled
by setting amdgpu.user_queue=0. Kernel queues are still
created for use by the kernel driver for memory management,
etc., just not user submissions.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix IP discovery v0 handling

Cyan skillfish uses IP discovery v0. This was broken when the
IP discovery was refactored for newer versions.

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/5189
Fixes: d0c647a6aae2 ("drm/amdgpu/discovery: support new discovery binary header")
Signed-off-by: filippor <filippo.rossoni@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Fix mode2 reset ACK handling on aldebaran v2

aldebaran_mode2_reset() sends a mode2 reset message and waits for
an acknowledgment from the SMU.

The current ACK handling is incorrect.

The wait loop runs only when ret is -ETIME. But after a successful
async send, ret is 0. Because of this, the loop is skipped and the
code does not wait for the reset acknowledgment.

Also, the code checks for ret != 1 after calling
smu_msg_wait_response(). However, smu_msg_wait_response() returns
0 on success and negative error codes on failure. So checking
against 1 is wrong.

Return -EOPNOTSUPP when the firmware does not support this reset
message.

Fix this by setting ret to -ETIME before entering the wait loop,
checking for ret != 0 after getting the SMU response, and returning
-EOPNOTSUPP when the firmware does not support the message.

v2:
- Update ACK check to use ret != 0 instead of ret != 1, since
smu_msg_wait_response() returns 0 on success (Feifei)
- Remove unnecessary handling for ret == 0

Fixes: e42569d02acb ("drm/amd/pm: Modify mode2 msg sequence on aldebaran")
Reported-by: Dan Carpenter <error27@gmail.com>
Cc: Feifei Xu <Feifei.Xu@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: smu7: Remove stale error check in smu7_hwmgr_backend_init

smu7_hwmgr_backend_init() is responsible for initializing the SMU7 power
management backend. It allocates and sets up the backend structure,
initializes voltage tables, configures dependency tables, and prepares
platform-specific power and clock parameters.

The function follows a typical pattern where each initialization step
returns a status in "result", and failures are handled via a common
"goto fail" path that performs cleanup.

Commit 2c21648bb814 ("drm/amd/pm/smu7: Remove non-functional SMU7
voltage dependency on DAL") removed a function call in this
initialization sequence, but left behind the corresponding error check.

As a result, "result" is checked twice without being updated in between:

    result = smu7_init_voltage_dependency_on_display_clock_table(hwmgr);
    if (result)
        goto fail;

    ...

    if (result)
        goto fail;

The second check is redundant and unreachable for any new failure, since
no operation modifies "result" between the two checks. This triggers a
Smatch warning about a duplicate zero check and reduces code clarity.

Remove the stale error check to keep the control flow correct and
readable.

Fixes: 9f49e3d4cb86 ("drm/amd/pm/smu7: Remove non-functional SMU7 voltage dependency on DAL")
Reported-by: Dan Carpenter <error27@gmail.com>
Cc: Timur Kristóf <timur.kristof@gmail.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Avoid ECC status update in hw_fini for VF unload

VF sends IDH_REQ_GPU_FINI_ACCESS before hw_fini during unload.
PF no longer accepts requests, so skip ECC status update to prevent
mailbox timeout.

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix CPER ring header parsing

amdgpu_cper_ring_get_ent_sz() parses CPER headers directly from the
circular ring buffer to determine the current entry size. When the ring
is full and the write pointer lands near the end of the buffer, the
header can wrap across the ring boundary.

The existing code treats the 4-byte CPER signature as a C string and
uses strcmp() on in-ring binary data, then reads record_length through a
direct struct pointer cast. Both assumptions are unsafe for wrapped
entries and can read past the end of the ring mapping.

Fix the parser by comparing the signature as raw bytes and by copying
the header into a local buffer before reading record_length, handling
wraparound explicitly in both cases. This avoids out-of-bounds reads in
amdgpu_cper_ring_get_ent_sz() when the CPER ring is full or the current
entry starts at the tail of the ring.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix heap buffer overflow in amdgpu_coredump ring dump

The off variable in the ring content dump loop tracks a byte offset
accumulated from ring->ring_size (which is in bytes), but it is used
as an index into u32 *rings_dw.  C pointer arithmetic on a u32 pointer
automatically multiplies the index by sizeof(u32) = 4, so the actual
byte address accessed is:

    &rings_dw[off]  ==  (char *)rings_dw + off * 4

This means off is effectively quadrupled, causing a 4x overshoot.

Concrete example -- two rings, each ring_size = 8 192 bytes (8 KB):

    total_ring_size = 16 384 bytes
    rings_dw = kzalloc(16 384)          /* 16 KB buffer */

  Ring 0: off = 0
    memcpy(&rings_dw[0], ring0->ring, 8192)
        -> writes bytes 0 .. 8 191                              OK

    off += ring->ring_size            -> off = 8 192   (BUG)

  Ring 1: off = 8 192
    memcpy(&rings_dw[8192], ring1->ring, 8192)
        -> actual byte offset = 8 192 * 4 = 32 768
        -> writes bytes 32 768 .. 40 959
        -> but buffer is only 16 384 bytes!             OVERFLOW

With the fix (off += ring->ring_size / 4):

  Ring 0: off = 0
    memcpy(&rings_dw[0], ring0->ring, 8192)             OK
    off += 8 192 / 4                  -> off = 2 048

  Ring 1: off = 2 048
    memcpy(&rings_dw[2048], ring1->ring, 8192)
        -> byte offset = 2 048 * 4 = 8 192
        -> writes bytes 8 192 .. 16 383                 OK

KASAN catches the overflow as a slab-use-after-free when the write
lands on a quarantined slab object:

  BUG: KASAN: slab-use-after-free in amdgpu_coredump+0x775/0x13c0 [amdgpu]
  Write of size 8192 at addr ffff8890b2400000 by task kworker/u128:1/329
  Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
  Call Trace:
   __asan_memcpy+0x3c/0x60
   amdgpu_coredump+0x775/0x13c0 [amdgpu]
   amdgpu_job_timedout+0xdb5/0x1420 [amdgpu]

The corrupted object was a 4 KB drm_exec buffer from a completed
amdgpu_cs_ioctl -- the ring dump memcpy overshot into this freed
slab region.

Fix by accumulating off in dword units (ring->ring_size / 4) so the
u32* indexing produces the correct byte address.  The reader in
amdgpu_devcoredump_format() already consumes the stored offset as a
dword index (rings_dw[off + j / 4]), so no change is needed there.

Fixes: eea85914d15b ("drm/amdgpu: save ring content before resetting the device")
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: correct single device PCIe reset flow for DPC

For triggering the dpc event with a single device, we still need
to set the in_link_reset flag and the dpc status.

Signed-off-by: Ce Sun <cesun102@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix NULL pointer dereference in amdgpu_devcoredump_format

A race condition in the devcoredump code causes a NULL pointer
dereference in amdgpu_devcoredump_format() when multiple GPU resets
occur in quick succession.

The sequence of events:

1. First reset calls amdgpu_coredump(), creates coredump1, sets
   adev->coredump = coredump1, and queues the deferred work.
2. The deferred work begins executing (work_pending() returns false
   since the work is now running, not just queued).
3. A second reset calls amdgpu_coredump(). work_pending() returns
   false because the work is running, so amdgpu_coredump() proceeds:
   creates coredump2, overwrites adev->coredump = coredump2, and
   re-queues the deferred work with queue_work().
4. The first deferred work finishes and unconditionally sets
   adev->coredump = NULL, destroying the reference to coredump2.
5. The re-queued deferred work starts and reads
   adev->coredump = NULL. It then passes this NULL into
   amdgpu_devcoredump_format() which dereferences coredump->adev
   (offset 0 in the struct), triggering:

   KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
   RIP: 0010:amdgpu_devcoredump_format+0xa6/0x36b0 [amdgpu]

This was observed during the amd_deadlock IGT test where multiple
subtests trigger rapid ring resets. The dmesg log shows four
coredumps created within 120ms (at 102.377s, 104.424s, 104.492s,
and 104.497s), with the crash occurring 13ms after the last one.

Fix this with two changes:

- Replace work_pending() with work_busy() in amdgpu_coredump() to
  also reject new coredumps while the deferred work is executing,
  not just when it is queued. This closes the main race window.

- Add a defensive NULL check for adev->coredump at the start of
  amdgpu_devcoredump_deferred_work() to prevent the crash if the
  race still occurs (work_busy() is advisory, not a full barrier).

v2: Drop the job->pasid NULL guard -- that fix was independently
    submitted and merged as commit 4c1f0a162da5 ("drm/amdgpu: add
    job->pasid in check as amdgpu_job could be NULL") by Sunil
    Khatri, reviewed by Christian König.  Integrate with that
    patch as suggested by Christian.

Fixes: 4bbba79a7f1d ("drm/amdgpu: move devcoredump generation to a worker")
Cc: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add job->pasid in check as amdgpu_job could be NULL

In below stack job->pasid is accessed while job is NULL. Access it
within the check when job is non NULL.

Failure call stack.
[  222.653622] BUG: kernel NULL pointer dereference, address: 000000000000014c
[  222.653625] #PF: supervisor read access in kernel mode
[  222.653628] #PF: error_code(0x0000) - not-present page
[  222.653630] PGD 0 P4D 0
[  222.653635] Oops: Oops: 0000 [#1] SMP NOPTI
[  222.653639] CPU: 1 UID: 0 PID: 12 Comm: kworker/u96:0 Not tainted 6.19.0-amd-staging-drm-next #271 PREEMPT(voluntary)
[  222.653644] Hardware name: Gigabyte Technology Co., Ltd. X570 AORUS ELITE/X570 AORUS ELITE, BIOS F37c 05/12/2022
[  222.653646] Workqueue: amdgpu-reset-dev amdgpu_userq_reset_work [amdgpu]
[  222.653961] RIP: 0010:amdgpu_coredump+0x8b/0x470 [amdgpu]
[  222.654158] Code: 48 83 c4 20 5b 41 5c 41 5d 41 5e 41 5f 5d 31 c0 31 c9 31 ff 31 d2 31 f6 45 31 c0 45 31 db e9 8c a9 1a e2 88 58 48 44 88 68 49 <41> 8b b7 4c 01 00 00 89 b0 80 00 00 00 4d 85 ff 48 89 45 d0 0f 84
[  222.654161] RSP: 0018:ffffce68c0147c00 EFLAGS: 00010282
[  222.654165] RAX: ffff8bc337407740 RBX: 0000000000000000 RCX: 0000000000000000
[  222.654167] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
[  222.654170] RBP: ffffce68c0147c48 R08: 0000000000000000 R09: 0000000000000000
[  222.654172] R10: ffff8bc337407740 R11: ffffffffc10dda10 R12: ffff8bc2d2e00000
[  222.654174] R13: 0000000000000001 R14: ffff8bc2d2e5b368 R15: 0000000000000000
[  222.654176] FS:  0000000000000000(0000) GS:ffff8bc64a5fe000(0000) knlGS:0000000000000000
[  222.654179] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  222.654182] CR2: 000000000000014c CR3: 0000000135eca000 CR4: 0000000000350ef0
[  222.654184] Call Trace:
[  222.654187]  <TASK>
[  222.654190]  ? amdgpu_ip_block_resume+0x28/0x70 [amdgpu]
[  222.654376]  ? srso_return_thunk+0x5/0x5f
[  222.654382]  amdgpu_device_reinit_after_reset+0x184/0x320 [amdgpu]
[  222.654552]  amdgpu_do_asic_reset+0x129/0x160 [amdgpu]
[  222.654720]  amdgpu_device_asic_reset+0x92/0x710 [amdgpu]
[  222.654890]  amdgpu_device_gpu_recover+0x2ae/0x3d0 [amdgpu]
[  222.655060]  amdgpu_userq_reset_work+0x76/0xa0 [amdgpu]
[  222.655229]  process_scheduled_works+0x1f0/0x450
[  222.655235]  worker_thread+0x27f/0x370

Fixes: 32ab301b89b3 ("drm/amdgpu: store ib info for devcoredump")
Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Clear VRAM on allocation to prevent stale data exposure

KFD VRAM allocations set AMDGPU_GEM_CREATE_VRAM_WIPE_ON_RELEASE
but not AMDGPU_GEM_CREATE_VRAM_CLEARED, leaving freshly allocated
VRAM with stale data from prior use observable by compute kernels.

The GEM ioctl path already sets VRAM_CLEARED for all userspace
allocations via amdgpu_gem_create_ioctl() and
amdgpu_mode_dumb_create(). The KFD path was missing this flag,
allowing stale page table remnants to leak into user buffers.

This causes crashes in RCCL P2P transport where non-zero data in
ptrExchange/head/tail fields corrupts the protocol handshake.

Signed-off-by: Amir Shetaia <Amir.Shetaia@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amdgpu: Use NBIF offset for register RCC_STRAP0_RCC_DEV0_EPF0_STRAP0 .

Define and use regRCC_STRAP0_RCC_DEV0_EPF0_STRAP0_nbif_4_10,
to get correct rev_id in nbif_v6_3_1_get_rev_id().

Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Signed-off-by: Ramalingeswara Reddy, Kanala <Kanala.RamalingeswaraReddy@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amd: Add missing firmware declaration for PSP v15.0.0

PSP v15.0.0 needs both TOC and TA firmware. Without the declaration
it won't get included in initramfs and leads to following failure:

```
Direct firmware load for amdgpu/psp_15_0_0_ta.bin failed with error -2
early_init of IP block <psp> failed -19
Fatal error during GPU init
```

Fixes: 9b24f63d825e7 ("drm/amdgpu: Enable support for PSP 15_0_0")
Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

amdgpu/jpeg: fix deepsleep register for jpeg 5_0_0 and 5_0_2

PCTL0__MMHUB_DEEPSLEEP_IB is 0x69004 on MMHUB 4,1,0 and
and 0x60804 on MMHUB 4,2,0. 0x62a04 is on MMHUB 1,8,0/1.

The DS bits are adjusted to cover more JPEG engines and MMHUB
version.

Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amdgpu: gate VM CPU HDP flush on reset lock

During GPU reset, the application could still run CPU page table updates. Each commit called
amdgpu_device_flush_hdp(), which on SR-IOV sends work through the KIQ ring.
That can advance sync_seq while the GPU is being reset,
leaving fence writeback out of sync and causing amdgpu_fence_emit_polling()
to time out on later KIQ use.

Fix:
amdgpu_vm_cpu_commit():
  Reset will flush HDP anyway, the HDP flush in amdgpu_vm_cpu_commit() can be skipped
  when a reset is ongoging.
  Take reset_domain->sem with down_read_trylock() before amdgpu_device_flush_hdp().
  If the reset path holds the write lock, skip the HDP flush so no HDP-related HW
  access (including KIQ) runs during reset; state is re-established after reset.

Signed-off-by: Chenglei Xie <Chenglei.Xie@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amdgpu: Use SMUIO 15.0.0 offsets for TSC upper and lower count.

Define and use regGOLDEN_TSC_COUNT_UPPER_smu_15_0_0 and
regGOLDEN_TSC_COUNT_LOWER_smu_15_0_0 for TSC upper and lower count.

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Pratik Vishwakarma <Pratik.Vishwakarma@amd.com>
Signed-off-by: Ramalingeswara Reddy, Kanala <Kanala.RamalingeswaraReddy@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org

drm/amdgpu: Remove sys file compute_partition_mem_alloc_mode at module unload

Module reload would fail when create sys file that was not removed during
module unload.

Fixes: e0e9792ea2d4 ("drm/amdgpu: add an option to allow gpu partition allocate all available memory")
Signed-off-by: Xiaogang Chen <xiaogang.chen@amd.com>
Reviewed-by: Philip Yang <philip.yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: fix incorrect FeatureCtrlMask setting on smu v14.0.x

OverDriveTable.FanMinimumPwm and FeatureCtrlMask.PP_OD_FEATURE_FAN_LEGACY_BIT
have a hard dependency.
Invalid handling of this dependency leads to disabled thermal monitoring
and temperature boundary validation.

v2: squash in typo fix (Yang)

Fixes: 9710b84e2a6a ("drm/amd/pm: add overdrive support on smu v14.0.2/3")
Cc: stable@vger.kernel.org
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Merge tag 'drm-misc-next-fixes-2026-04-17' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

Short summary of fixes pull:

dma-buf:
- fix documentation formatting

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://patch.msgid.link/20260417061430.GA11880@linux.fritz.box

Merge tag 'drm-intel-next-fixes-2026-04-16' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

- Fix VESA backlight possible check condition [backlight] (Suraj Kandpal)
- Verify the correct plane DDB entry [wm] (Ville Syrjälä)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Tvrtko Ursulin <tursulin@igalia.com>
Link: https://patch.msgid.link/aeCGoL4FFwT66bF4@linux