git.ipfire.org Git - thirdparty/kernel/linux.git/log

drm/amd/display: Add kdoc params/returns in dc/link detection helpers

The link detection helpers in dc/link/link_detection.c were missing
kdoc annotations for parameters and return values.

Fixes the below with gcc W=1:
...link_detection.c:872 parameter 'edid_header' not described
...link_detection.c:890 parameter 'link' not described
...link_detection.c:914 parameter 'link' not described
...link_detection.c:1355 parameter 'link' not described
...link_detection.c:1355 parameter 'type' not described

Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix annotations for connector poll/detect parameters

Adds the missing @aconnector, @connector, and @force descriptions:

@aconnector – This is the DM (Display Manager) connector. It gives
access to the DRM connector, the DC link, and hotplug/poll state. The
code uses it to check the link, update the sink, and manage connector
state changes.

@connector – This is the main DRM connector given by the DRM core.
Inside the detect function, it is converted to amdgpu_dm_connector so we
can run DC link detection, either light or full.

@force – This flag tells the function whether to run a full detect
again. If false, we avoid heavy DAC load detect steps to prevent
flicker. If true, we force a re-detect even when we normally skip it.

Fixes the below with gcc W=1:
function param 'aconnector' not described in 'amdgpu_dm_connector_poll'
function param 'force' not described in 'amdgpu_dm_connector_poll'
function param 'connector' not described in 'amdgpu_dm_connector_detect'
function param 'force' not described in 'amdgpu_dm_connector_detect'

Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/amdgpu: Ensure isp_kernel_buffer_alloc() creates a new BO

When the BO pointer provided to amdgpu_bo_create_kernel() points to
non-NULL, amdgpu_bo_create_kernel() takes it as a hint to pin that address
rather than allocate a new BO.

This functionality is never desired for allocating ISP buffers. A new BO
should always be created when isp_kernel_buffer_alloc() is called, per the
description for isp_kernel_buffer_alloc().

Ensure this by zeroing *bo right before the amdgpu_bo_create_kernel() call.

Fixes: 55d42f616976 ("drm/amd/amdgpu: Add helper functions for isp buffers")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Pratap Nirujogi <pratap.nirujogi@amd.com>
Signed-off-by: Sultan Alsawaf <sultan@kerneltoast.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Promote DC to 3.2.358

Summary:

* Enable VRR when unsynced with the stream
* Refactor DSC cap calculation for dcn35
* Add debug log for power feature
* Fix fill latency issue
* Do not initialize LSDMA if it is not supported by DMU

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: [FW Promotion] Release 0.1.35.0

Summary for changes in firmware:
* Use panel_inst instead of otg_inst when getting fw state
* Contrast strength improves when HDR desktop mode
* Ensure pipes have no outstanding HUBP requests prior to IPS RCG entry
* Check for vm request and vm idle status in IPS1/2 entry sequence

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Refactor HDCP Status Log Format

Add missing part for
drm/amd/display: fw locality check refactors

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: dynamically clock gate before and after prefetch

[Why]
An invalidation request arriving during prefetch can potentially hang
the system if dynamic clock gating is enabled and memory power requests
are disabled.

[How]
• Disable clock gating and enable memory power requests for the duration
of the prefetch.
• Turn on clock gating and disable memory power requests again after
prefetch is complete.

Limit the scope for DCN35 and DCN42 only.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Revert in_transfer_func_change to MED

[Why]
Last commit accidentally changed handling of in_transfer_func_change
from MED to FAST.

[How]
* Revert the line.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Dominik Kaszewski <dominik.kaszewski@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: To support Replay frame skip mode

[Why & How]
The change is to optimize the Replay power saving by
reducing the refresh rate with frame skipping mode

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Chuntao Tso <chunttso@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Change lock descriptor values

[Why]
Review of usage scenarios requires dc_lock_descriptor modification.

[How]
Replace STATE/LINK/STREAM/PLANE with GLOBAL/STREAM/LINK, where
the first means all streams to be locked.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Dominik Kaszewski <dominik.kaszewski@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: refactor DSC cap calculation for dcn35

why:
dcn35 currently uses a hardcoded DSC display clock value which is incorrect
for some asic types. Newer DCN versions retrieve dsc display clock from
clk_mgr. The same can be done for dcn35.

how:
Refactor the DSC cap calculation using pre-existing logic.
Handle ODM combine requirements in dc_dsc.c.
Replace hardcoded display clock with actual value retrieved from clk_mgr.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Reviewed-by: Charlene Liu <charlene.liu@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Mohit Bawa <Mohit.Bawa@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add new SMART POWER OLED interfaces

[why && how]
To optimize power consumption on certain OLED LED panels
by sending MaxCLL per frame to TCON

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Ian Chen <ian.chen@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add interface to capture power feature status for debug logging

[Why]
The status of various power features is often important information when
debugging certain issues, such as underflow. This info helps to
narrow down the potential sources of errors.

[How]
Add dc interface to capture power feature enablement status.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: George Shen <george.shen@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Allow VRR params change if unsynced with the stream

[Why]
When changing resolution (e.g., 4K → FHD) in mirror/clone mode with
certain monitors, the monitor blanks and loses connection due to an early
exit in vrr_settings_require_update(). The function only checks if VRR
state, fixed refresh target, or min/max refresh rate range has changed.

During mode changes, if the calculated min/max refresh values remain the
same even though the stream's v_total changed, the function returns early
without updating vrr_params.adjust.v_total_min/max, leaving the monitor's
VRR timing parameters unsynced with the new mode, causing it to blank out.

[How]
Explicitly adjust VRR parameters to the stream's nominal v_total when VRR
is supported, but inactive.

Fixes: 6d31602a9f57 ("drm/amd/display: more liberal vmin/vmax update for freesync")
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix index bug for fill latency

[WHY&HOW]
This array should be indexed by pstate type followed by plane index.

Reviewed-by: Austin Zheng <austin.zheng@amd.com>
Signed-off-by: Dillon Varone <Dillon.Varone@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Only initialize LSDMA if it is supported in DMU

Need to check caps flag to determine whether LSDMA is supported in DMU

Reviewed-by: Rafal Ostrowski <rafal.ostrowski@amd.com>
Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Signed-off-by: Fangzhi Zuo <jerry.zuo@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: remove unnecessary prints for smu busy

smu busy is a normal case when calling SMU_MSG_GetBadPageCount, so no need
to print error status at each time.Instead, only print error status when
timeout given by user is reached.

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: optimize timeout implemention in ras_eeprom_update_record_num

The busy status returned by ras_eeprom_update_record_num may not be
an error, increase timeout to exclude false busy status. Also add more
comments to make the code readable.

v2: define a macro for the timeout value.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add RAS bad page threshold handling for PMFW manages eeprom

Check if bad page threshold is reached and take actions accordingly.

v2: remove rma message sent to smu when pmfw manages eeprom.
v3: add null pointer check for con.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix lock warning in amdgpu_userq_fence_driver_process

Fix a potential deadlock caused by inconsistent spinlock usage
between interrupt and process contexts in the userq fence driver.

The issue occurs when amdgpu_userq_fence_driver_process() is called
from both:
- Interrupt context: gfx_v11_0_eop_irq() -> amdgpu_userq_fence_driver_process()
- Process context: amdgpu_eviction_fence_suspend_worker() ->
  amdgpu_userq_fence_driver_force_completion() -> amdgpu_userq_fence_driver_process()

In interrupt context, the spinlock was acquired without disabling
interrupts, leaving it in {IN-HARDIRQ-W} state. When the same lock
is acquired in process context, the kernel detects inconsistent
locking since the process context acquisition would enable interrupts
while holding a lock previously acquired in interrupt context.

Kernel log shows:
[ 4039.310790] inconsistent {IN-HARDIRQ-W} -> {HARDIRQ-ON-W} usage.
[ 4039.310804] kworker/7:2/409 [HC0[0]:SC0[0]:HE1:SE1] takes:
[ 4039.310818] ffff9284e1bed000 (&fence_drv->fence_list_lock){?...}-{3:3},
[ 4039.310993] {IN-HARDIRQ-W} state was registered at:
[ 4039.311004]   lock_acquire+0xc6/0x300
[ 4039.311018]   _raw_spin_lock+0x39/0x80
[ 4039.311031]   amdgpu_userq_fence_driver_process.part.0+0x30/0x180 [amdgpu]
[ 4039.311146]   amdgpu_userq_fence_driver_process+0x17/0x30 [amdgpu]
[ 4039.311257]   gfx_v11_0_eop_irq+0x132/0x170 [amdgpu]

Fix by using spin_lock_irqsave()/spin_unlock_irqrestore() to properly
manage interrupt state regardless of calling context.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: try for more times if RAS bad page number is not updated

RAS info update in PMFW is time cost, wait for it.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: jump to the correct label on failure

drm_sched_entity_init wasn't called yet, so the only thing to
do is to release allocated memory.
This doesn't fix any bug since entity is zero allocated and
drm_sched_entity_fini does nothing in this case.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fixing the clang format

This patch fixes the formatting in the patch
"amdkfd: Do not wait for queue op response during reset"

Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add NULL check for power limit

Add NULL check for smu power limit pointer

v2: Update error code on failure (Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: get RAS bad page address from MCA address

Instead of from physical address.

v2: add comment to make the code more readable

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: Clarify that amdgpu.audio only works for non-DC

The comment already explains it but the module parameter help text
doesn't.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4684
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: disable peer-to-peer access for DCC-enabled GC12 VRAM surfaces

Certain multi-GPU configurations (especially GFX12) may hit
data corruption when a DCC-compressed VRAM surface is shared across GPUs
using peer-to-peer (P2P) DMA transfers.

Such surfaces rely on device-local metadata and cannot be safely accessed
through a remote GPU’s page tables. Attempting to import a DCC-enabled
surface through P2P leads to incorrect rendering or GPU faults.

This change disables P2P for DCC-enabled VRAM buffers that are contiguous
and allocated on GFX12+ hardware. In these cases, the importer falls back
to the standard system-memory path, avoiding invalid access to compressed
surfaces.

Future work could consider optional migration (VRAM→System→VRAM) if a
performance regression is observed when `attach->peer2peer = false`.

Tested on:
- Dual RX 9700 XT (Navi4x) setup
- GNOME and Wayland compositor scenarios
- Confirmed no corruption after disabling P2P under these conditions
v2: Remove check TTM_PL_VRAM & TTM_PL_FLAG_CONTIGUOUS.
v3: simplify for upsteam and fix ip version check (Alex)

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: add macros to simplify code

[Why & How]
Adding macros to simplify the process of adding new error codes.
Currently, to add an error code, the developer needs to add both the
enum and the string translation. This is error prone and can lead to
inconsistencies. The refactor adds a macro to automatically add the
string translation based on the enum.

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: load RAS bad page from PMFW in page retirement

In legacy way, bad page is queried from MCA registers, switch to
getting it from PMFW when PMFW manages eeprom data.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Merge tag 'amd-drm-next-6.19-2025-11-07' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.19-2025-11-07:

amdgpu:
- Misc fixes
- HMM cleanup
- HDP flush rework
- RAS updates
- SMU 13.x updates
- SI DPM cleanup
- Suspend rework
- UQ reset support
- Replay/PSR fixes
- HDCP updates
- DC PMO fixes
- DC pstate fixes
- DCN4 fixes
- GPUVM fixes
- SMU 13 parition metrics
- Fix possible fence leak in job cleanup
- Hibernation fix
- MST fix

amdkfd:
- HMM cleanup
- Process cleanup fix

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20251107145938.26669-1-alexander.deucher@amd.com

Merge tag 'drm-misc-next-2025-11-05-1' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for v6.19-rc1:

UAPI Changes:
- Add userptr support to ivpu.
- Add IOCTL's for resource and telemetry data in amdxdna.

Core Changes:
- Improve some atomic state checking handling.
- drm/client updates.
- Use forward declarations instead of including drm_print.h
- RUse allocation flags in ttm_pool/device_init and allow specifying max
  useful pool size and propagate ENOSPC.
- Updates and fixes to scheduler and bridge code.
- Add support for quirking DisplayID checksum errors.

Driver Changes:
- Assorted cleanups and fixes in rcar-du, accel/ivpu, panel/nv3052cf,
  sti, imxm, accel/qaic, accel/amdxdna, imagination, tidss, sti,
  panthor, vkms.
- Add Samsung S6E3FC2X01 DDIC/AMS641RW, Synaptics TDDI series DSI,
  TL121BVMS07-00 (IL79900A) panels.
- Add mali MediaTek MT8196 SoC gpu support.
- Add etnaviv GC8000 Nano Ultra VIP r6205 support.
- Document powervr ge7800 support in the devicetree.

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Link: https://patch.msgid.link/5afae707-c9aa-4a47-b726-5e1f1aa7a106@linux.intel.com

Merge tag 'drm-intel-next-2025-11-04' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

drm/i915 feature pull for v6.19:

Features and functionality:
- Enable LNL+ content adaptive sharpness filter (CASF) (Nemesa)
- Use optimized VRR guardband (Ankit, Ville)
- Enable Xe3p LT PHY (Suraj)
- Enable FBC support for Xe3p_LPD display (Sai Teja, Vinod)
- Specify DMC firmware for display version 30.02 (Dnyaneshwar)
- Report reason for disabling PSR to debugfs (Michał)
- Extend i915_display_info with Type-C port details (Khaled)
- Log DSI send packet sequence errors and contents

Refactoring and cleanups:
- Refactoring to prepare for VRR guardband optimization (Ankit)
- Abstract VRR live status wait (Ankit)
- Refactor VRR and DSB timing to handle Set Context Latency explicitly (Ankit)
- Helpers for prefill latency calculations (Ville)
- Refactor SKL+ watermark latency setup (Ville)
- VRR refactoring and cleanups (Ville)
- SKL+ universal plane cleanups (Ville)
- Decouple CDCLK from state->modeset refactor (Ville)
- Refactor VLV/CHV clock functions (Jani)
- Refactor fbdev handling (Jani)
- Call i915 and xe runtime PM from display via function pointers (Jouni)
- IRQ code refactoring (Jani)
- Drop display dependency on i915 feature check macros (Jani)
- Refactor and unify i915 and xe stolen memory interfaces towards display (Jani)
- Switch to driver agnostic drm to display pointer chase (Jani)
- Use display version over graphics version in display code (Matt A)
- GVT cleanups (Jonathan, Andi)
- Rename a VLV clock function to unify (Michał)
- Explicitly sanitize DMC package header num entries (Luca)
- Remove redundant port clock check from ALPM (Jouni)
- Use sysfs_emit() instead of sprintf() in PMU sysfs (Madhur Kumar)
- Clean up C20 PHY PLL register macros (Imre, Mika))
- Abstract "address in MMIO table" helper for general use (Matt A)
- Improve VRR platform abstractions (Ville)
- Move towards more standard PCI PM code usage (Ville)
- Framebuffer refactoring (Ville)
- Drop display dependency on i915_utils.h (Jani)
- Include cleanups (Jani)

Fixes:
- Workaround docking station DSC issues with high pixel clock and bpp (Imre)
- Fix Panel Replay in DSC mode (Imre)
- Disable tracepoints for PREEMPT_RT as a workaround (Maarten)
- Fix intel_crtc_get_vblank_counter() on PREEMPT_RT (Maarten)
- Fix C10 PHY identification on PTL/WCL (Dnyaneshwar)
- Take AS SDP into account with optimized guardband (Jouni)
- Fix panic structure allocation memory leak (Jani)
- Adjust an FBC workaround platforms (Vinod)
- Add fallback for CDCLK selection (Naladala)
- Avoid using invalid transcoder in MST transport select (Suraj)
- Don't use cursor size reduction on display version 14+ (Nemesa)
- Fix C20 PHY PLL register programming (Imre, Mika)
- Fix PSR frontbuffer flush handling (Jouni)
- Store ALPM parameters in crtc state (Jouni)
- Defeature DRRS on LNL+ (Ville)
- Fix the scope of the large DRAM DIMM workaround (Ville)
- Fix PICA vs. AUX power ordering issue (Gustavo)
- Fix pixel rate for computing watermark line time (Ville)
- Fix framebuffer set_tiling vs. addfb race (Ville)
- DMC event handler fixes (Ville)

DRM Core:
- CRTC sharpness strength property (Nemesa)
- DPCD DSC quirk for Synaptics Panamera devices (Imre)
- Helpers to query the branch DSC max throughput/line-width (Imre)

Merges:
- Backmerge drm-next for v6.18-rc and to sync with drm-xe-next (Jani)

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Jani Nikula <jani.nikula@intel.com>
Link: https://patch.msgid.link/ec5a05f2df6d597a62033ee2d57225cce707b320@intel.com

drm/amd/pm: Update default power1_cap

Update default power1_cap to max limit for smu_v13_0_6 and smu_v13_0_12

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: skip writing eeprom when PMFW manages RAS data

Only update bad page number in legacy eeprom write path.

v2: add null pointer check for con.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Enable mst when it's detected but yet to be initialized

[Why]
drm_dp_mst_topology_queue_probe() is used under the assumption that
mst is already initialized. If we connect system with SST first
then switch to the mst branch during suspend, we will fail probing
topology by calling the wrong API since the mst manager is yet to
be initialized.

[How]
At dm_resume(), once it's detected as mst branc connected, check if
the mst is initialized already. If not, call
dm_helpers_dp_mst_start_top_mgr() instead to initialize mst

V2: Adjust the commit msg a bit

Fixes: bc068194f548 ("drm/amd/display: Don't write DP_MSTM_CTRL after LT")
Cc: Fangzhi Zuo <jerry.zuo@amd.com>
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: support to load RAS bad pages from PMFW

PMFW manages eeprom bad page records, update bad page loading
accrodingly.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix wait after reset sequence in S3

For a mode-1 reset done at the end of S3 on PSPv11 dGPUs, only check if
TOS is unloaded.

Fixes: 32f73741d6ee ("drm/amdgpu: Wait for bootloader after PSPv11 reset")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4649
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add ras_eeprom_read_idx interface

PMFW will manage RAS eeprom data by itself, add new interface to read
eeprom data via PMFW, we can read part of records by setting index.

v2: use IPID parse interface.
pa is not used and set it to a fixed value.
v3: optimize the null pointer check for IPID parse interface.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: make MCA IPID parse global

So we can call it in other blocks.

v2: add a new IPID parse interface for umc and we can
implement it for each ASIC.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: Fix suspend failure with secure display TA

commit c760bcda83571 ("drm/amd: Check whether secure display TA loaded
successfully") attempted to fix extra messages, but failed to port the
cleanup that was in commit 5c6d52ff4b61e ("drm/amd: Don't try to enable
secure display TA multiple times") to prevent multiple tries.

Add that to the failure handling path even on a quick failure.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4679
Fixes: c760bcda8357 ("drm/amd: Check whether secure display TA loaded successfully")
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Fix the issue of incorrect function call

When amdgpu_device_health_check fails, amdgpu_ras_pre_reset
will not be called and therefore amdgpu_ras_post_reset
cannot be called either.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix gpu page fault after hibernation on PF passthrough

On PF passthrough environment, after hibernate and then resume, coralgemm
will cause gpu page fault.

Mode1 reset happens during hibernate, but partition mode is not restored
on resume, register mmCP_HYP_XCP_CTL and mmCP_PSP_XCP_CTL is not right
after resume. When CP access the MQD BO, wrong stride size is used,
this will cause out of bound access on the MQD BO, resulting page fault.

The fix is to ensure gfx_v9_4_3_switch_compute_partition() is called
when resume from a hibernation.
KFD resume is called separately during a reset recovery or resume from
suspend sequence. Hence it's not required to be called as part of
partition switch.

Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: ras supports i2c eeprom for mp1 v13_0_12

ras supports i2c eeprom for mp1 v13_0_12.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Do not wait for queue op response during reset

This patch adds the condition to not wait for
the queue response for unmap, if the gpu is in reset.

Signed-off-by: Ahmad Rehman <Ahmad.Rehman@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/userq: need to unref bo

unref bo after amdgpu_bo_reserve() failure as it has
called amdgpu_bo_ref() already

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: David (Ming Qiang) Wu <David.Wu3@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: initialize max record count after table reset

initialize max record count and record offset after table reset

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: check pmfw eeprom feature bit

get and check the pmfw eeprom feature bit to
decide if pmfw eeprom is supported

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add check function for pmfw eeprom

add check function for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add initialization function for pmfw eeprom

add initialization function for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: adapt reset function for pmfw eeprom

adapt reset function for pmfw eeprom

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

dt-bindings: gpu: img,powervr-rogue: Document GE7800 GPU in Renesas R-Car M3-N

Document Imagination Technologies PowerVR Rogue GE7800 BNVC 15.5.1.64
present in Renesas R-Car R8A77965 M3-N SoC.

Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Matt Coster <matt.coster@imgtec.com>
Link: https://patch.msgid.link/20251104135716.12497-2-marek.vasut+renesas@mailbox.org
Signed-off-by: Matt Coster <matt.coster@imgtec.com>

dt-bindings: gpu: img,powervr-rogue: Keep lists sorted alphabetically

Sort the enum: list alphabetically. No functional change.

Signed-off-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Reviewed-by: Matt Coster <matt.coster@imgtec.com>
Link: https://patch.msgid.link/20251104135716.12497-1-marek.vasut+renesas@mailbox.org
Signed-off-by: Matt Coster <matt.coster@imgtec.com>

drm: rcar-du: fix incorrect return in rcar_du_crtc_cleanup()

The rcar_du_crtc_cleanup() function has a void return type, but
incorrectly uses a return statement with a call to drm_crtc_cleanup(),
which also returns void.

Remove the return statement to ensure proper function semantics.
No functional change intended.

Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Reviewed-by: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com>
Reviewed-by: Kieran Bingham <kieran.bingham+renesas@ideasonboard.com>
Link: https://patch.msgid.link/20251017191634.1454201-1-alok.a.tiwari@oracle.com
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>

accel/ivpu: Improve debug and warning messages

Add IOCTL debug bit for logging user provided parameter validation
errors.

Refactor several warning and error messages to better reflect fault
reason. User generated faults should not flood kernel messages with
warnings or errors, so change those to ivpu_dbg(). Add additional debug
logs for parameter validation in IOCTLs.

Check size provided by in metric streamer start and return -EINVAL
together with a debug message print.

Reviewed-by: Jeff Hugo <jeff.hugo@oss.qualcomm.com>
Signed-off-by: Karol Wachowski <karol.wachowski@linux.intel.com>
Link: https://patch.msgid.link/20251104132418.970784-1-karol.wachowski@linux.intel.com

accel/amdxdna: Add IOCTL parameter for telemetry data

Extend DRM_IOCTL_AMDXDNA_GET_INFO to include additional parameters
that allow collection of telemetry data.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251104062546.833771-3-lizhi.hou@amd.com

accel/amdxdna: Add IOCTL parameter for resource data

Extend DRM_IOCTL_AMDXDNA_GET_INFO to include additional parameters
that allow collection of resource data.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251104062546.833771-2-lizhi.hou@amd.com

accel/amdxdna: Add hardware specific attributes

Add three hardware specific attributes to describe device capabilities:
  hwctx_limit: The maximum number of hardware context supported.
  max_tops: The maximum TOPS supported.
  curr_tops: The TOPS achievable with the current power and frequency
             configuration.

Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
Link: https://patch.msgid.link/20251104062546.833771-1-lizhi.hou@amd.com

drm/amdgpu: fix possible fence leaks from job structure

If we don't end up initializing the fences, free them when
we free the job. We can't set the hw_fence to NULL after
emitting it because we need it in the cleanup path for the
submit direct case.

v2: take a reference to the fences if we emit them
v3: handle non-job fence in error paths

Fixes: db36632ea51e ("drm/amdgpu: clean up and unify hw fence handling")
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com> (v1)
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: suspend ras module before gpu reset

During gpu reset, all GPU-related resources are
inaccessible. To avoid affecting ras functionality,
suspend ras module before gpu reset and resume
it after gpu reset is complete.

V2:
  Rename functions to avoid misunderstanding.

V3:
  Move flush_delayed_work to amdgpu_ras_process_pause,
  Move schedule_delayed_work to amdgpu_ras_process_unpause.

V4:
  Rename functions.

V5:
  Move the function to amdgpu_ras.c.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add wrapper functions for pmfw eeprom interface

add wrapper functions for pmfw eeprom interface, for these interfaces
to be easily and safely called

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add function to check if pmfw eeprom is supported

add function to check if pmfw is supported, skip eeprom
check and recover when pmfw eeprom is supported

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: add smu ras driver framework

add functions to get smu ras driver

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: implement ras_smu_drv interface for smu v13.0.12

implement ras_smu_drv interface for smu v13.0.12

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: add new message definitions for pmfw eeprom interface

Add new message definitions for pmfw eeprom interface

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Revert "drm/amdkfd: Improve signal event slow path"

To fix regression report on gfx8, which requires the exhaustive search
path for signaled event.

The high CPU usage of KFD interrupt wq issue is gone after HIP/ROCr add
option to reduce HW event interrupts, safe to revert this optimization
patch now.

This reverts commit de844846f72b152119faaef1b363448dc8ea368f.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix NULL deref in debugfs odm_combine_segments

When a connector is connected but inactive (e.g., disabled by desktop
environments), pipe_ctx->stream_res.tg will be destroyed. Then, reading
odm_combine_segments causes kernel NULL pointer dereference.

BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 0 P4D 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 16 UID: 0 PID: 26474 Comm: cat Not tainted 6.17.0+ #2 PREEMPT(lazy)  e6a17af9ee6db7c63e9d90dbe5b28ccab67520c6
Hardware name: LENOVO 21Q4/LNVNB161216, BIOS PXCN25WW 03/27/2025
RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu]
Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00>
RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286
RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8
RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0
R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08
R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001
FS:  00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0
PKRU: 55555554
Call Trace:
  <TASK>
  seq_read_iter+0x125/0x490
  ? __alloc_frozen_pages_noprof+0x18f/0x350
  seq_read+0x12c/0x170
  full_proxy_read+0x51/0x80
  vfs_read+0xbc/0x390
  ? __handle_mm_fault+0xa46/0xef0
  ? do_syscall_64+0x71/0x900
  ksys_read+0x73/0xf0
  do_syscall_64+0x71/0x900
  ? count_memcg_events+0xc2/0x190
  ? handle_mm_fault+0x1d7/0x2d0
  ? do_user_addr_fault+0x21a/0x690
  ? exc_page_fault+0x7e/0x1a0
  entry_SYSCALL_64_after_hwframe+0x6c/0x74
RIP: 0033:0x7f44d4031687
Code: 48 89 fa 4c 89 df e8 58 b3 00 00 8b 93 08 03 00 00 59 5e 48 83 f8 fc 74 1a 5b c3 0f 1f 84 00 00 00 00 00 48 8b 44 24 10 0f 05 <5b> c3 0f 1f 80 00 00 00 00>
RSP: 002b:00007ffdb4b5f0b0 EFLAGS: 00000202 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 00007f44d3f9f740 RCX: 00007f44d4031687
RDX: 0000000000040000 RSI: 00007f44d3f5e000 RDI: 0000000000000003
RBP: 0000000000040000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 00007f44d3f5e000
R13: 0000000000000003 R14: 0000000000000000 R15: 0000000000040000
  </TASK>
Modules linked in: tls tcp_diag inet_diag xt_mark ccm snd_hrtimer snd_seq_dummy snd_seq_midi snd_seq_oss snd_seq_midi_event snd_rawmidi snd_seq snd_seq_device x>
  snd_hda_codec_atihdmi snd_hda_codec_realtek_lib lenovo_wmi_helpers think_lmi snd_hda_codec_generic snd_hda_codec_hdmi snd_soc_core kvm snd_compress uvcvideo sn>
  platform_profile joydev amd_pmc mousedev mac_hid sch_fq_codel uinput i2c_dev parport_pc ppdev lp parport nvme_fabrics loop nfnetlink ip_tables x_tables dm_cryp>
CR2: 0000000000000000
---[ end trace 0000000000000000 ]---
RIP: 0010:odm_combine_segments_show+0x93/0xf0 [amdgpu]
Code: 41 83 b8 b0 00 00 00 01 75 6e 48 98 ba a1 ff ff ff 48 c1 e0 0c 48 8d 8c 07 d8 02 00 00 48 85 c9 74 2d 48 8b bc 07 f0 08 00 00 <48> 8b 07 48 8b 80 08 02 00>
RSP: 0018:ffffd1bf4b953c58 EFLAGS: 00010286
RAX: 0000000000005000 RBX: ffff8e35976b02d0 RCX: ffff8e3aeed052d8
RDX: 00000000ffffffa1 RSI: ffff8e35a3120800 RDI: 0000000000000000
RBP: 0000000000000000 R08: ffff8e3580eb0000 R09: ffff8e35976b02d0
R10: ffffd1bf4b953c78 R11: 0000000000000000 R12: ffffd1bf4b953d08
R13: 0000000000040000 R14: 0000000000000001 R15: 0000000000000001
FS:  00007f44d3f9f740(0000) GS:ffff8e3caa47f000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000000 CR3: 00000006485c2000 CR4: 0000000000f50ef0
PKRU: 55555554

Fix this by checking pipe_ctx->stream_res.tg before dereferencing.

Fixes: 07926ba8a44f ("drm/amd/display: Add debugfs interface for ODM combine info")
Signed-off-by: Rong Zhang <i@rong.moe>
Reviewed-by: Mario Limoncello <mario.limonciello@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Don't clear PT after process killed

If process is killed. the vm entity is stopped, submit pt update job
will trigger the error message "*ERROR* Trying to push to a killed
entity", job will not execute.

Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Add ras support for umc v12_5_0

Add ras support for umc v12_5_0.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Add ras support for nbio v7_9_1

Add ras support for nbio v7_9_1.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add ras ip block name

Add ras ip block name.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Increase ras switch control range

Increase ras switch control range.

Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/smu: Handle S0ix for vangogh

Fix the flows for S0ix. There is no need to stop
rlc or reintialize PMFW in S0ix.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4659
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reported-by: Antheas Kapenekakis <lkml@antheas.dev>
Tested-by: Antheas Kapenekakis <lkml@antheas.dev>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Update SMUv13.0.12 partition metrics

Update SMUv13.0.12 partition metrics to partition metrics v1.1 schema.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Update SMUv13.0.6 partition metrics

For SMU v13.0.6 SOCs, move to partition metrics v1.1 schema

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add schema v1.1 for parition metrics

Use a schema similar to gpu metrics v1.9 for partition metrics also. It
will have field type encoded followed by the field value(s). The
attribute ids used will be shared with gpu metrics. The structure
definition is only to distinguish between gpu metrics and partition
metrics though both gpu metrics v1.9 and partition metrics v1.1 follow
the same definition.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Use gpu metrics 1.9 for SMUv13.0.12

Fill and publish GPU metrics in v1.9 format for SMUv13.0.12 SOCs

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: validate the bo from done list for NULL

Make sure the bo is valid before using it.

Signed-off-by: Sunil Khatri <sunil.khatri@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: lock bo before calling amdgpu_vm_bo_update_shared

BO's reservation object must be locked before using
amdgpu_vm_bo_update_shared otherwise dma_resv_assert_held will
complain in amdgpu_vm_update_shared.

Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: grab a BO reference in vm_lock_done_list.

Otherwise it is possible that between dropping the status lock and
locking the BO that the BO is freed up.

Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Sunil Khatri <sunil.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Fix format truncation

../ras/rascore/ras_cper.c: In function ‘cper_generate_fatal_record.isra’:
../ras/rascore/ras_cper.c:75:36: error: ‘%llX’ directive output may be truncated writing between 1 and 14 bytes into a region of size between 0 and 7 [-Werror=format-truncation=]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                    ^~~~
../ras/rascore/ras_cper.c:75:32: note: directive argument in the range [0, 72057594037927935]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                ^~~~~~~~~
../ras/rascore/ras_cper.c:75:9: note: ‘snprintf’ output between 4 and 27 bytes into a destination of size 9
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   76 |                     RAS_LOG_SEQNO_TO_BATCH_IDX(trace->seqno));
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
../ras/rascore/ras_cper.c: In function ‘cper_generate_runtime_record.isra’:
../ras/rascore/ras_cper.c:75:36: error: ‘%llX’ directive output may be truncated writing between 1 and 14 bytes into a region of size between 0 and 7 [-Werror=format-truncation=]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                    ^~~~
../ras/rascore/ras_cper.c:75:32: note: directive argument in the range [0, 72057594037927935]
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |                                ^~~~~~~~~
../ras/rascore/ras_cper.c:75:9: note: ‘snprintf’ output between 4 and 27 bytes into a destination of size 9
   75 |         snprintf(record_id, 9, "%d:%llX", dev_info.socket_id,
      |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   76 |                     RAS_LOG_SEQNO_TO_BATCH_IDX(trace->seqno));
      |                     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Promote DC to 3.2.357

This version brings along following update:

- HDCP2 FW locality check refactors
- Fix black screen issue with HDMI output
- Increase IB mem size
- Revert max buffered cursor size to 64
- Extend inbox0 lock to run Replay / PSR
- Refactor VActive implementation
- Add Pstate viewport reduction
- Persist stream refcount through restore

Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: [FW Promotion] Release 0.1.34.0

Release hightlights

DCN35/36
* Dynamically clock gate before and after prefetch

Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix black screen with HDMI outputs

[Why & How]
This fixes the black screen issue on certain APUs with HDMI,
accompanied by the following messages:

amdgpu 0000:c4:00.0: amdgpu: [drm] Failed to setup vendor info
frame on connector DP-1: -22
amdgpu 0000:c4:00.0: [drm] Cannot find any crtc or sizes [drm]
Cannot find any crtc or sizes

Fixes: 489f0f600ce2 ("drm/amd/display: Fix DVI-D/HDMI adapters")
Suggested-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Increase IB mem size

[Why & How]
Increase IB mem size to match size of largest structure that will
use IB transfer between driver and DMU.

Reviewed-by: Oleh Kuzhylnyi <oleh.kuzhylnyi@amd.com>
Signed-off-by: Alvin Lee <Alvin.Lee2@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Revert DCN4 max buffered cursor size to 64

[Why & How]
The buffered cursor cap is expressed assuming a square cursor, and usage
of the cursor buffer is limited by the request size. For greater than 32
pixels, the request size is fixed at 256 bytes, so the maximum width
must be floored to the nearest 256th byte. At 4bpp this means even with
24kB DCN4 can only hold a 64x64 cursor in the buffer as even 65 pixels
would require 512 bytes per line instead of 256.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Dillon Varone <Dillon.Varone@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Persist stream refcount through restore

[Why & How]
Overwriting the refcount on stream restore can lead to double-free errors
or memory leaks if an unbalanced number of retains and releases occurs
between a backup and restore.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add Pstate viewport reduction

[Why/How]
Add struct to hold calculated reduced viewport pstate
recout reduction lines per plane

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Refactor VActive implementation

[Why & How]
Refactors VActive accounting in PMO, and breaks down fill time
requirement by P-State type as it can result in drasitcally different
bandwidth requirements depending on the blackout length.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Update P-state naming for clarity.

[Why & How]
P-state can refer to different things like UCLK P-state, PPT, or temp read
Update naming for clarity

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Remove old PMO options

[Why & How]
Removes deprecated or unused PMO options.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add pte_buffer_mode and force_one_row_for_frame in dchub reg

[Why & How]
Update structs for rq regs

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Extend inbox0 lock to run Replay/PSR

[Why]
The inbox1 infrastructure is deprecated, so to support display
power features requiring a DMUB interlock moving forward extend
the inbox0 locking conditions to also include Replay or PSR.

[How]
Implemented a series of changes to improve HW lock handling:
- Deprecated should_use_dmub_inbox1_lock() and guarded it with
  DCN401 flag.
- Migrated lock checks into inbox0 helpers and added PSR/Replay
  enablement checks to ensure correct behavior.
- Updated HWSS fast update path to acquire HW lock as needed
  using the new helpers.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Andrew Mazour <Andrew.Mazour@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: fw locality check refactors

[why]
There are some new changes for HDCP2 firmware locality check. The
implementation doesn't perfectly fit the intended design and clarity.

1. Clarify and consolidate variable responsibilities.
The previous implementation introduced the following variables:
- config.ddc.funcs.atomic_write_poll_read_i2c (optional pointer)
- hdcp->config.ddc.funcs.atomic_write_poll_read_aux (optional pointer)
- hdcp->connection.link.adjust.hdcp2.force_sw_locality_check (bool)
- hdcp->config.debug.lc_enable_sw_fallback (bool)
- use_fw (bool)
They will be used together to determine two operations:
- Whether to use FW locality check
- Whether to use SW fallback on FW locality check failure
The refactor streamlines this by introducing two variables in the hdcp2
link adjustment, while ensuring function pointers are always assigned
and remain independent from policy decisions:
- use_fw_locality_check (bool) -> true if fw locality should be used.
- use_sw_locality_fallback (bool) -> true to reset use_fw_locality_check
back to false and retry on fw locality check failure.

2. Mixed meanings of l_prime_read transition input
l_prime_read originally means if l_prime is read when sw locality check
is used. When FW locality check is used, l_prime_read means if lc init
write, l prime poll and l_prime read combo operation is successful. The
mix of meanings is confusing. The refactor introduces a new variable
l_prime_combo_read to isolate the second meaning into its own variable.

3. Missing specific error code on firmware locality error.
The original change reuses the generic DDC failure error code when
firmware fails to return locality check result. This is not ideal as
DDC failure indicates an error occurred during an I2C/AUX transaction.
FW locality failure could be caused by polling timeout in firmware or
failure to acquire firmware access. Which sits at a higher level of
abstraction above DDC hardware. An incorrect error code could mislead
the debug into a wrong direction.

4. Correcting misplaced comments. The previous implementation of the
firmware locality check resulted in some comments in hdcp2_transition
being incorrectly positioned. This refactor relocates those comments to
their appropriate locations for better clarity.

Reviewed-by: Aric Cyr <aric.cyr@amd.com>
Signed-off-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Implement user queue reset functionality

This patch adds robust reset handling for user queues (userq) to improve
recovery from queue failures. The key components include:

1. Queue detection and reset logic:
   - amdgpu_userq_detect_and_reset_queues() identifies failed queues
   - Per-IP detect_and_reset callbacks for targeted recovery
   - Falls back to full GPU reset when needed

2. Reset infrastructure:
   - Adds userq_reset_work workqueue for async reset handling
   - Implements pre/post reset handlers for queue state management
   - Integrates with existing GPU reset framework

3. Error handling improvements:
   - Enhanced state tracking with HUNG state
   - Automatic reset triggering on critical failures
   - VRAM loss handling during recovery

4. Integration points:
   - Added to device init/reset paths
   - Called during queue destroy, suspend, and isolation events
   - Handles both individual queue and full GPU resets

The reset functionality works with both gfx/compute and sdma queues,
providing better resilience against queue failures while minimizing
disruption to unaffected queues.

v2: add detection and reset calls when preemption/unmaped fails.
    add a per device userq counter for each user queue type.(Alex)
v3: make sure we hold the adev->userq_mutex when we call amdgpu_userq_detect_and_reset_queues. (Alex)
   warn if the adev->userq_mutex is not held.
v4: make sure we have all of the uqm->userq_mutex held.
   warn if the uqm->userq_mutex is not held.

v5: Use array for user queue type counters.(Alex)
    all of the uqm->userq_mutex need to be held when calling detect and reset.  (Alex)

v6: fix lock dep warning in amdgpu_userq_fence_dence_driver_process

v7: add the queue types in an array and use a loop in amdgpu_userq_detect_and_reset_queues (Lijo)
v8: remove atomic_set(&userq_mgr->userq_count[i], 0).
   it should already be 0 since we kzalloc the structure (Alex)
v9: For consistency with kernel queues, We may want something like:
   amdgpu_userq_is_reset_type_supported (Alex)

Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Don't stretch non-native images by default in eDP

commit 978fa2f6d0b12 ("drm/amd/display: Use scaling for non-native
resolutions on eDP") started using the GPU scaler hardware to scale
when a non-native resolution was picked on eDP. This scaling was done
to fill the screen instead of maintain aspect ratio.

The idea was supposed to be that if a different scaling behavior is
preferred then the compositor would request it. The not following
aspect ratio behavior however isn't desirable, so adjust it to follow
aspect ratio and still try to fill screen.

Note: This will lead to black bars in some cases for non-native
resolutions. Compositors can request the previous behavior if desired.

Fixes: 978fa2f6d0b1 ("drm/amd/display: Use scaling for non-native resolutions on eDP")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4538
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix Unchecked Return Values

Properly Check for return values from calls to debug functions in
runtime_disable().

v2: storing the last non zero returned value from the loop.

Signed-off-by: Sunday Clement <Sunday.Clement@amd.com>
Reviewed-by: Jonathan Kim <Jonathan.Kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: Unwind for failed device suspend

If device suspend has failed, add a recovery flow that will attempt
to unwind the suspend and get things back up and running.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4627
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: Add an unwind for failures in amdgpu_device_ip_suspend_phase2()

If any hardware IPs involved with the second phase of suspend fail, unwind
all steps to restore back to original state.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: Add an unwind for failures in amdgpu_device_ip_suspend_phase1()

If any hardware IPs involved with the first phase of suspend fail, unwind
all steps to restore back to original state.

Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Drop PMFW RLC notifier from amdgpu_device_suspend()

For S3 on vangogh, PMFW needs to be notified before the
driver powers down RLC. This already happens in smu_disable_dpms()
so drop the superfluous call in amdgpu_device_suspend().

Co-developed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>