git.ipfire.org Git - thirdparty/kernel/linux.git/log

drm/amdgpu/gfx11: adjust KGQ reset sequence

Kernel gfx queues do not need to be reinitialized or
remapped after a reset. This fixes queue reset failures
on APUs.

v2: preserve init and remap for MMIO case.

Fixes: b3e9bfd86658 ("drm/amdgpu/gfx11: add ring reset callbacks")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx12: fix wptr reset in KGQ init

wptr is a 64 bit value and we need to update the
full value, not just 32 bits. Align with what we
already do for KCQs.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx11: fix wptr reset in KGQ init

wptr is a 64 bit value and we need to update the
full value, not just 32 bits. Align with what we
already do for KCQs.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx10: fix wptr reset in KGQ init

wptr is a 64 bit value and we need to update the
full value, not just 32 bits. Align with what we
already do for KCQs.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Use AMDGPU_MQD_SIZE_ALIGN in gfx11+ kfd mqd manager

MES is enabled by default from gfx11+, use AMDGPU_MQD_SIZE_ALIGN
unconditionally for gfx11+.

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Adjust parameter of allocate_mqd

Make allocate_mqd consistent with other callbacks.
Prepare for next patch to use mqd_manager->mqd_size.

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Use AMDGPU_MQD_SIZE_ALIGN in KGD

Use AMDGPU_MQD_SIZE_ALIGN for both kernel and user queue.

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Initialize allowed feature list

Instead of returning feature bit mask of allowed features, initialize
the allowed features in the callback implementation itself.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Remove unused logic in SMUv14.0.2

Remove commented and redundant logic in get_allowed_feature_mask
implementation.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add smu feature interface functions

Instead of using bitmap operations, add wrapper interface functions to
operate on smu features.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add smu feature bits data struct

Add a bitmap struct to represent smu feature bits and functions to set/clear features.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add a helper macro to align mqd size

MES FW uses address(mqd_addr + sizeof(struct mqd) + 3*sizeof(uint32_t))
as fence address and writes a 32 bit fence value to this address. Driver
needs to allocate some extra memory(at least 4 DWs) in addition to
sizeof(struct mqd) as mqd memory(limited to gfx/compute/sdma queue).

For gfx11/12, sizeof(struct mqd) < PAGE_SIZE, KGD allocates mqd memory with
PAGE_SIZE aligned works. For gfx12.1, sizeof(struct mqd) == PAGE_SIZE,
it doesn't work.

KFD mqd manager hardcodes mqd size to PAGE_SIZE/MQD_SIZE across different
IP versions to solve this issue.

To avoid hardcoding in differnet places and across different IP versions.
Let's use AMDGPU_MQD_SIZE_ALIGN instead. It is used in two places.

1. mqd memory alloction
2. mqd stride handling for multi xcc config

v2: Use AMDGPU_GPU_PAGE_ALIGN. (Mukul)

Signed-off-by: Lang Yu <lang.yu@amd.com>
Reviewed-by: David Belanger <david.belanger@amd.com> (v1)
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Mukul Joshi <mukul.joshi@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: validate user queue size constraints

Add validation to ensure user queue sizes meet hardware requirements:
- Size must be a power of two for efficient ring buffer wrapping
- Size must be at least AMDGPU_GPU_PAGE_SIZE to prevent undersized allocations

This prevents invalid configurations that could lead to GPU faults or
unexpected behavior.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix cond_exec handling in amdgpu_ib_schedule()

The EXEC_COUNT field must be > 0. In the gfx shadow
handling we always emit a cond_exec packet after the gfx_shadow
packet, but the EXEC_COUNT never gets patched. This leads
to a hang when we try and reset queues on gfx11 APUs.

Fixes: c68cbbfd54c6 ("drm/amdgpu: cleanup conditional execution")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4789
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/soc21: fix xclk for APUs

The reference clock is supposed to be 100Mhz, but it
appears to actually be slightly lower (99.81Mhz).

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14451
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: gfx12.1 trap handler instruction fixup for VOP3PX

A trap may occur in the middle of VOP3PX instruction co-issue.
The PC would be restored incorrectly if left unmodified.

Identify this case by examining the instruction opcode and
rewind the PC 8 bytes if it occurs.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Reviewed-by: Vladimir Indic <vladimir.indic@amd.com>
Cc: Shweta Khatri <shweta.khatri@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Fix null pointer dereference issue

If SMU is disabled, during RAS initialization,
there will be null pointer dereference issue here.

Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: add extended capabilities to device snapshot

Add additional capabilities reporting.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: James Zhu <james.zhu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Send RMA CPER at bad page loading

Some older builds weren't sending RMA CPERs when the bad page threshold
was exceeded. Newer builds have resolved this, but there could be
systems out there with bad page numbers higher than the threshold, that
haven't sent out an RMA CPER. To be thorough and safe, send an RMA CPER
when we load the table, if the threshold is met or exceeded, instead of
waiting for the next UE to trigger the CPER.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix jpeg ring test order in vcn_v4_0_3

Fix the vcn reset sequence in vcn_v4_0_3_ring_reset() to restore
JPEG power state and unlock the JPEG powergating mutex before
running the JPEG ring post-reset helper.

Fixes: d25c67fd9d6f ("drm/amdgpu/vcn4.0.3: rework reset handling")
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: fix race in power state check before mutex lock

The power state check in amdgpu_dpm_set_powergating_by_smu() is done
before acquiring the pm mutex, leading to a race condition where:
1. Thread A checks state and thinks no change is needed
2. Thread B acquires mutex and modifies the state
3. Thread A returns without updating state, causing inconsistency

Fix this by moving the mutex lock before the power state check,
ensuring atomicity of the state check and modification.

Fixes: 6ee27ee27ba8 ("drm/amd/pm: avoid duplicate powergate/ungate setting")
Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Promote DC to 3.2.367

* Fw release 0.1.44.0
* Fixes for corruption on platforms older than DCN4x.
* Bug fixes related to USB4 link training
* Fixes related to FP guard
* Debug helpers and other stability fixes.
* Some refactors to improve code quality

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: [FW Promotion] Release 0.1.44.0

* Panel Replay related features/bugfixes
* BootCRC feature

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Migrate HUBBUB register access from hwseq to hubbub component.

[why]
Direct HUBBUB register access in the hwseq layer was creating register conflicts.

[how]
Migrated HUBBUB registers from hwseq to the hubbub component.

Reviewed-by: Martin Leung <martin.leung@amd.com>
Signed-off-by: Bhuvanachandra Pinninti <bpinnint@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: mouse event trigger to boost RR when idle

[WHY+HOW]
Add trigger event to boost refresh rate on mouse movement.

Reviewed-by: Jun Lei <jun.lei@amd.com>
Signed-off-by: Muaaz Nisar <muanisar@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add debug flag to override min dispclk

[WHY]
Enable dynamic ODM testing without needing a valid dispclk table

[HOW]
Create a debug flag to specify an override value for min dispclk

Reviewed-by: Dmytro Laktyushkin <dmytro.laktyushkin@amd.com>
Signed-off-by: Michael Strauss <michael.strauss@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: avoid dig reg access timeout on usb4 link training fail

[Why]
When usb4 link training fails, the dpia sym clock will be disabled and SYMCLK
source should be changed back to phy clock. In enable_streams, it is
assumed that link training succeeded and will switch from refclk to
phy clock. But phy clk here might not be on. Dig reg access timeout
will occur.

[How]
When enable_stream is hit, check if link training failed for usb4.
If it did, fall back to the ref clock to avoid reg access timeout.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Zhongwei <Zhongwei.Zhang@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Remove unnecessary DC FP guard

[Why & How]
For dcn2x_fast_validate_bw(), not only populate_dml_pipes needs FP guard
but also dml_get_voltage_level().

Remove unnecessary DC_FP_START/DC_FP_END guard in dcn20_fast_validate_bw
and dcn21_fast_validate_bw. FP guard is already there before calling
dcn2x_validate_bandwidth_fp().

Reviewed-by: ChiaHsuan (Tom) Chung <chiahsuan.chung@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: add setup_stereo for dcn4x or later

[why]
stereo_sync pin is removed, but we still support display stereo

Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: perform clear update flags for all DCN asics

Existing version check that limits the sequence to clear update flags
should be performed for all asics. Exclude DCE asics for now.

Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Enable bootcrc on FW side

[Why]
The bootcrc feature is controlled on the FW side.

[How]
Pass the control bits in boot options to FW.

Reviewed-by: ChiaHsuan (Tom) Chung <chiahsuan.chung@amd.com>
Signed-off-by: Wayne Lin <Wayne.Lin@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add FR skipping CTS functions

1. To check whether Sink reaches maximum skipping number

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Jack Chang <jack.chang@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix GFX12 family constant checks

Using >=, <= for checking the family is not always correct.

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Enable vstateup hook for DCN401 to be reused

Add the hook to the DCN401 header file so that it can be reused in other
files

Reviewed-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix NULL pointer dereference in amdgpu_gmc_filter_faults_remove

On APUs such as Raven and Renoir (GC 9.1.0, 9.2.2, 9.3.0), the ih1 and
ih2 interrupt ring buffers are not initialized. This is by design, as
these secondary IH rings are only available on discrete GPUs. See
vega10_ih_sw_init() which explicitly skips ih1/ih2 initialization when
AMD_IS_APU is set.

However, amdgpu_gmc_filter_faults_remove() unconditionally uses ih1 to
get the timestamp of the last interrupt entry. When retry faults are
enabled on APUs (noretry=0), this function is called from the SVM page
fault recovery path, resulting in a NULL pointer dereference when
amdgpu_ih_decode_iv_ts_helper() attempts to access ih->ring[].

The crash manifests as:

  BUG: kernel NULL pointer dereference, address: 0000000000000004
  RIP: 0010:amdgpu_ih_decode_iv_ts_helper+0x22/0x40 [amdgpu]
  Call Trace:
   amdgpu_gmc_filter_faults_remove+0x60/0x130 [amdgpu]
   svm_range_restore_pages+0xae5/0x11c0 [amdgpu]
   amdgpu_vm_handle_fault+0xc8/0x340 [amdgpu]
   gmc_v9_0_process_interrupt+0x191/0x220 [amdgpu]
   amdgpu_irq_dispatch+0xed/0x2c0 [amdgpu]
   amdgpu_ih_process+0x84/0x100 [amdgpu]

This issue was exposed by commit 1446226d32a4 ("drm/amdgpu: Remove GC HW
IP 9.3.0 from noretry=1") which changed the default for Renoir APU from
noretry=1 to noretry=0, enabling retry fault handling and thus
exercising the buggy code path.

Fix this by adding a check for ih1.ring_size before attempting to use
it. Also restore the soft_ih support from commit dd299441654f ("drm/amdgpu:
Rework retry fault removal").  This is needed if the hardware doesn't
support secondary HW IH rings.

v2: additional updates (Alex)

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3814
Fixes: dd299441654f ("drm/amdgpu: Rework retry fault removal")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Jon Doron <jond@wiz.io>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: fix smu v14 soft clock frequency setting issue

v1:
resolve the issue where some freq frequencies cannot be set correctly
due to insufficient floating-point precision.

v2:
patch this convert on 'max' value only.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: fix smu v13 soft clock frequency setting issue

v1:
resolve the issue where some freq frequencies cannot be set correctly
due to insufficient floating-point precision.

v2:
patch this convert on 'max' value only.

Signed-off-by: Yang Wang <kevinyang.wang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/include : Update MES v12 API header - SUSPEND

Update SUSPEND API to support sdma queues.
It's been supportted since 0x82 for gfx12

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Simplify sorting of the bo list

Sort function only cares about the sign so we can replace the conditionals
with a single subtraction.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mes: Remove idr leftovers v2

Commit
cb17fff3a254 ("drm/amdgpu/mes: remove unused functions")
removed most of the code using these IDRs but forgot to remove the struct
members and init/destroy paths.

There is also interrupt handling code in SDMA 5.0 and 5.2 which appears to
be using it, but is is unreachable since nothing ever allocates the
relevant IDR. We replace those with one time warnings just to avoid any
functional difference, but it is also possible they should be removed.

v2: also fix up gfx_v12_1.c and sdma_v7_1.c

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@igalia.com>
References: cb17fff3a254 ("drm/amdgpu/mes: remove unused functions")
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Merge tag 'amd-drm-next-6.20-2026-01-23' of https://gitlab.freedesktop.org/agd5f/linux into drm-next

amd-drm-next-6.20-2026-01-23:

amdgpu:
- GC 12 fix
- UserQ fixes
- Misc error path fixes
- IP discovery fixes
- Misc cleanups
- HDMI fixes
- Documentation update
- Panel replay fixes
- Panel type handling fixes
- DCN 3.1.x fixes
- DC analog display fix
- SMU 6 fixes
- VCN 4.0.3 queue reset fix
- VCN 5.0.1 queue reset fix
- GPUVM TLB flush fix
- RAS fixes
- DC idle optimization fix

amdkfd:
- MQD fix for GC 9.4.3 and 9.5
- GC 11 cooperative launch fix
- GC 12.1 trap handler fixes
- Misc cleanups

radeon:
- Use drm_* logging helpers for VCE

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://patch.msgid.link/20260123173537.17221-1-alexander.deucher@amd.com

Merge tag 'drm-misc-next-2026-01-22' of https://gitlab.freedesktop.org/drm/misc/kernel into drm-next

drm-misc-next for 6.20:

Core Changes:
- buddy: Fix free_trees memory leak, prevent a BUG_ON
- dma-buf: Start to introduce cgroup memory accounting in heaps, Remove
   sysfs stats, add new tracepoints
- hdmi: Limit infoframes exposure to userspace based on driver
   capabilities
- property: Account for property blobs in memcg

Driver Changes:
- atmel-hlcdc: Switch to drmm resources, Support nomodeset parameter,
   various patches to use newish helpers and fix memory safety bugs
- hisilicon: Fix various DisplayPort related bugs
- imagination: Introduce hardware version checks
- renesas: Fix kernel panic on reboot
- rockchip: Fix RK3576 HPD interrupt handling, Improve RK3588 HPD
   interrupt handling
- v3d: Convert to drm logging helpers

- bridge:
   - Continuation of the refcounting effort
   - new bridge: Algoltek AG6311

- panel:
   - new panel: Anbernic RG-DS

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20260122-antique-sexy-junglefowl-1bc5a8@houat

Merge tag 'drm-intel-next-fixes-2026-01-22' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next

- Don't enable Panel Replay on sink if globally disabled

Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Link: https://patch.msgid.link/aXHZjieG9PkFKjZ2@jlahtine-mobl

Revert "drm/amd/display: pause the workload setting in dm"

This reverts commit bc6d54ac7e7436721a19443265f971f890c13cc5.

The workload profile needs to be in the default state when
the dc idle optimizaion state is entered.  However, when
jobs come in for video or GFX or compute, the profile may
be set to a non-default profile resulting in the dc idle
optimizations not taking affect and resulting in higher
power usage.  As such we need to pause the workload profile
changes during this transition.  When this patch was originally
committed, it caused a regression with a Dell U3224KB display,
but no other problems were reported at the time.  When it
was reapplied (this patch) to address increased power usage, it
seems to have caused additional regressions.  This change seems
to have a number of side affects (audio issues, stuttering,
etc.).  I suspect the pause should only happen when all displays
are off or in static screen mode, but I think this call site
gets called more often than that which results in idle state
entry more often than intended.  For now revert.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4894
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4717
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4725
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4517
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4806
Cc: Yang Wang <kevinyang.wang@amd.com>
Cc: Kenneth Feng <kenneth.feng@amd.com>
Cc: Roman Li <Roman.Li@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion()

The function no longer signals the fence so rename it to
better match what it does.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix type for wptr in ring backup

Needs to be a u64.

Fixes: 77cc0da39c7c ("drm/amdgpu: track ring state associated with a fence")
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: mark invalid records with U64_MAX

set retired_page of invalid ras records to U64_MAX, and skip
them when reading ras records

Signed-off-by: Gangliang Xie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Avoid excessive dmesg log

KIQ access is not guaranteed to work reliably under all reset
situations. Avoid flooding dmesg with HDP flush failure messages.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix validating flush_gpu_tlb_pasid()

When a function holds a lock and we return without unlocking it,
it deadlocks the kernel. We should always unlock before returning.

This commit fixes suspend/resume on SI.
Tested on two Tahiti GPUs: FirePro W9000 and R9 280X.

Fixes: f4db9913e4d3 ("drm/amdgpu: validate the flush_gpu_tlb_pasid()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/r/202601190121.z9C0uml5-lkp@intel.com/
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: simplify svm_range_unmap_from_gpus()

The function calls bitmap_or() followed by for_each_set_bit().
Switch it to the dedicated for_each_or_bit() and drop the temporary
bitmap.

Signed-off-by: Yury Norov <ynorov@nvidia.com>
Signed-off-by: Felix Kuehling <felix.kuehling@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Do not include VGPR MSBs in saved PC during save

The current trap handler uses the top bits of ttmp1 to store a copy of
sq_wave_mode.*vgpr_msb (except for src2_vgpr_msb). This is so the
effective values in sq_wave_mode can be cleared to ensure correct
behavior of the trap handler.

When saving sq_wave_mode, the trap handler correctly rebuilds the
expected value (with *vgpr_msb restored), so the save area is correct.
However, the PC itself is copied from ttmp[0:1], which contains the
wave's PC as well as the saved MSBs.

The debugger reads the PC from the save area and is confused when non-0
values from VGPR_MSBs are present.

This patch fixes this by saving the PC in the save area's PC slot, not
the composite of the PC and VGPR_MSBs. On restore, the VGPR_MSBs are
restored from sq_wave_mode.

Signed-off-by: Lancelot Six <lancelot.six@amd.com>
Tested-by: Alexey Kondratiev <Alexey.Kondratiev@amd.com>
Reviewed-by: Jay Cornwall <jay.cornwall@amd.com>
Cc: Vladimir Indic <vladimir.indic@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Correct comment above power2_cap attributes

Previously only Van Gogh supported this, but that is not true
anymore since:
commit 12c958d1db36 ("drm/amd/pm: Expose ppt1 limit for gc_v9_5_0")

Update the comment to reflect that.

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Return -EOPNOTSUPP when can't read power limit

So that hwmon_attributes_visible() will see that the power2_cap
attributes should not be visible on GPUs that don't support
the get_power_limit() function.

This fixes an error when running the "sensors" command on SI.

Fixes: 12c958d1db36 ("drm/amd/pm: Expose ppt1 limit for gc_v9_5_0")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Workaround SI powertune issue on Radeon 430 (v2)

Radeon 430 and 520 are OEM GPUs from 2016~2017
They have the same device id: 0x6611 and revision: 0x87

On the Radeon 430, powertune is buggy and throttles the GPU,
never allowing it to reach its maximum SCLK. Work around this
bug by raising the TDP limits we program to the SMC from
24W (specified by the VBIOS on Radeon 430) to 32W.

Disabling powertune entirely is not a viable workaround,
because it causes the Radeon 520 to heat up above 100 C,
which I prefer to avoid.

Additionally, revise the maximum SCLK limit. Considering the
above issue, these GPUs never reached a high SCLK on Linux,
and the workarounds were added before the GPUs were released,
so the workaround likely didn't target these specifically.
Use 780 MHz (the maximum SCLK according to the VBIOS on the
Radeon 430). Note that the Radeon 520 VBIOS has a higher
maximum SCLK: 905 MHz, but in practice it doesn't seem to
perform better with the higher clock, only heats up more.

v2:
Move the workaround to si_populate_smc_tdp_limits.

Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Don't clear SI SMC table when setting power limit

There is no reason to clear the SMC table.
We also don't need to recalculate the power limit then.

Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: gfx12.1 trap handler support for expert scheduling mode

- Leave DEP_MODE unchanged as it is ignored in the trap handler
- Save/restore SCHED_MODE (gfx12.0 saves in ttmp11)

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Cc: Vladimir Indic <vladimir.indic@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: gfx12.1 cluster barrier context save workaround

Trap cluster barrier may not serialize with user cluster barrier
under some circumstances. Add a check for pending user cluster
barrier complete.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Tested-by: Gang Ba <Gang.Ba@amd.com>
Cc: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Cc: Vladimir Indic <vladimir.indic@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix scalar load ordering in gfx12.1 trap handler

Scalar loads may arrive out-of-order with respect to KMCNT.
The affected code expects the two loads to arrive in-order.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Cc: Joseph Greathouse <joseph.greathouse@amd.com>
Cc: Vladimir Indic <vladimir.indic@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Sync trap handler binary with source

Binary and source desynced during branch activity. Source merge
also introduced compile error.

Signed-off-by: Jay Cornwall <jay.cornwall@amd.com>
Reviewed-by: Lancelot Six <lancelot.six@amd.com>
Cc: Vladimir Indic <vladimir.indic@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn5.0.1: rework reset handling

Resetting VCN resets the entire tile, including jpeg.
When resetting the VCN, we need to ensure that JPEG data blocks are accessible and we also need to handle the JPEG queue.
Add a helper function to restore the JPEG queue during the VCN reset.

v2: split the jpeg helper in two, in the top helper we can stop the sched workqueues and attempt to wait for any outstanding fences.
Then in the bottom helper, we can force completion, re-init the rings, and restart the sched workqueues (Alex)

v3: merge patches 4 and 5 into one patch (Alex)

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn4.0.3: rework reset handling

Resetting VCN resets the entire tile, including jpeg.
When resetting the VCN, we need to ensure that JPEG data blocks are accessible and we also need to handle the JPEG queue.
Add a helper function to restore the JPEG queue during the VCN reset.

v2: split the jpeg helper in two, in the top helper we can stop the sched workqueues and attempt to wait for any outstanding fences.
Then in the bottom helper, we can force completion, re-init the rings, and restart the sched workqueues (Alex)

v3: merge patches 1 and 2 into one patch (Alex)

Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn4.0.3: implement DPG pause mode handling for VCN 4.0.3

For MI projects, when Dynamic Power Gating (DPG) is enabled,
VCN reset operations should be performed with DPG in pause mode.
Otherwise, the hardware may perform undesirable reset operations

Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Fix si_dpm mmCG_THERMAL_INT setting

Use WREG32 to write mmCG_THERMAL_INT.
This is a direct access register.

Fixes: 841686df9f7d ("drm/amdgpu: add SI DPM support (v4)")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/bridge: fix kdoc syntax

Use the correct kdoc syntax for bullet list.

Fixes kdoc error and warning:

Documentation/gpu/drm-kms-helpers:197: ./drivers/gpu/drm/drm_bridge.c:1519: ERROR: Unexpected indentation. [docutils]
Documentation/gpu/drm-kms-helpers:197: ./drivers/gpu/drm/drm_bridge.c:1521: WARNING: Block quote ends without a blank line; unexpected unindent. [docutils]

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512302319.1PGGt3CN-lkp@intel.com/
Fixes: 9da0e06abda8 ("drm/bridge: deprecate of_drm_find_bridge()")
Reviewed-by: Nicolas Frattaroli <nicolas.frattaroli@collabora.com>
Link: https://patch.msgid.link/20251231-drm-bridge-alloc-getput-drm_of_find_bridge-kdoc-fix-v1-1-193a03f0609c@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm: rcar-du: lvds: convert to of_drm_find_and_get_bridge()

of_drm_find_bridge() is deprecated. Move to its replacement
of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it
is put when done.

Since the companion bridge pointer is used by .atomic_enable, putting its
reference in the remove function would be dangerous. Use .destroy to put it
on final deallocation.

Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-3-v2-6-8d7a3dbacdf4@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/exynos: hdmi: convert to of_drm_find_and_get_bridge()

of_drm_find_bridge() is deprecated. Move to its replacement
of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it
is put when done.

Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-3-v2-5-8d7a3dbacdf4@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/mediatek: mtk_hdmi*: convert to of_drm_find_and_get_bridge()

of_drm_find_bridge() is deprecated. Move to its replacement
of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it
is put when done by using the drm_bridge::next_bridge pointer.

Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-3-v2-4-8d7a3dbacdf4@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/imx/dw-hdmi: convert to of_drm_find_and_get_bridge()

of_drm_find_bridge() is deprecated. Move to its replacement
of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it
is put when done.

Acked-by: Maxime Ripard <mripard@kernel.org>
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-3-v2-3-8d7a3dbacdf4@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/meson/dw-hdmi: convert to of_drm_find_and_get_bridge()

of_drm_find_bridge() is deprecated. Move to its replacement
of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it
is put when done.

dw_hdmi->bridge is used only in dw_hdmi_top_thread_irq(), so in order to
avoid potential use-after-free ensure the irq is freed before putting the
dw_hdmi->bridge reference.

Acked-by: Maxime Ripard <mripard@kernel.org>
Reviewed-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com>
Acked-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-3-v2-2-8d7a3dbacdf4@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

drm/bridge: dw-hdmi: convert to of_drm_find_and_get_bridge()

of_drm_find_bridge() is deprecated. Move to its replacement
of_drm_find_and_get_bridge() which gets a bridge reference, and ensure it
is put when done by using the drm_bridge::next_bridge pointer.

Acked-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20260109-drm-bridge-alloc-getput-drm_of_find_bridge-3-v2-1-8d7a3dbacdf4@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>

dma-buf: add some tracepoints to debug.

Since we can only inspect dmabuf by iterating over process FDs or the
dmabuf_list, we need to add our own tracepoints to track its status in
real time in production.

For example:
   binder:3016_1-3102    [006] ...1.   255.126521: dma_buf_export: exp_name=qcom,system size=12685312 ino=2738
   binder:3016_1-3102    [006] ...1.   255.126528: dma_buf_fd: exp_name=qcom,system size=12685312 ino=2738 fd=8
   binder:3016_1-3102    [006] ...1.   255.126642: dma_buf_mmap_internal: exp_name=qcom,system size=28672 ino=2739
     kworker/6:1-86      [006] ...1.   255.127194: dma_buf_put: exp_name=qcom,system size=12685312 ino=2738
    RenderThread-9293    [006] ...1.   316.618179: dma_buf_get: exp_name=qcom,system size=12771328 ino=2762 fd=176
    RenderThread-9293    [006] ...1.   316.618195: dma_buf_dynamic_attach: exp_name=qcom,system size=12771328 ino=2762 attachment:ffffff880a18dd00 is_dynamic=0 dev_name=kgsl-3d0
    RenderThread-9293    [006] ...1.   318.878220: dma_buf_detach: exp_name=qcom,system size=12771328 ino=2762 attachment:ffffff880a18dd00 is_dynamic=0 dev_name=kgsl-3d0

Signed-off-by: Xiang Gao <gaoxiang17@xiaomi.com>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20260109115411.115270-1-gxxa03070307@gmail.com

drm/bridge: simple: add the Algoltek AG6311 DP-to-HDMI bridge

The Algoltek AG6311 is a transparent DisplayPort to HDMI bridge.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Val Packett <val@packett.cool>
Link: https://patch.msgid.link/20260120234029.419825-8-val@packett.cool
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

dt-bindings: display: bridge: simple: document the Algoltek AG6311 DP-to-HDMI bridge

The Algoltek AG6311 is a transparent DisplayPort to HDMI bridge.

Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Acked-by: Rob Herring (Arm) <robh@kernel.org>
Signed-off-by: Val Packett <val@packett.cool>
Link: https://patch.msgid.link/20260120234029.419825-7-val@packett.cool
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

dt-bindings: vendor-prefixes: Add AlgolTek

AlgolTek is a Taiwanese chip manufacturer specialized in high-speed
signal and power transmission and conversion.

Signed-off-by: Val Packett <val@packett.cool>
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org>
Link: https://patch.msgid.link/20260120234029.419825-4-val@packett.cool
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

drm/bridge: anx7625: Fix invalid EDID size

DRM checks EDID block count against allocated size in drm_edid_valid
function. We have to allocate the right EDID size instead of the max
size to prevent the EDID to be reported as invalid.

Cc: stable@kernel.org
Fixes: 7c585f9a71aa ("drm/bridge: anx7625: use struct drm_edid more")
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Loic Poulain <loic.poulain@oss.qualcomm.com>
Link: https://patch.msgid.link/20251218151307.95491-1-loic.poulain@oss.qualcomm.com
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>

drm/tests/drm_buddy: Add tests for allocations exceeding max_order

Add kunit tests that exercise edge cases where allocation requests
exceed mm->max_order after rounding. This can happen with
non-power-of-two VRAM sizes when the allocator rounds up requests.

For example, with 10G VRAM (8G + 2G roots), mm->max_order represents
the 8G block. A 9G allocation can round up to 16G in multiple ways:
CONTIGUOUS allocation rounds to next power-of-two, or non-CONTIGUOUS
with 8G min_block_size rounds to next alignment boundary.

The test validates CONTIGUOUS and RANGE flag combinations, ensuring that
only CONTIGUOUS-alone allocations use try_harder fallback, while other
combinations return -EINVAL when rounded size exceeds memory, preventing
BUG_ON assertions.

Cc: Christian König <christian.koenig@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patch.msgid.link/20260108113227.2101872-6-sanjay.kumar.yadav@intel.com

drm/buddy: Prevent BUG_ON by validating rounded allocation

When DRM_BUDDY_CONTIGUOUS_ALLOCATION is set, the requested size is
rounded up to the next power-of-two via roundup_pow_of_two().
Similarly, for non-contiguous allocations with large min_block_size,
the size is aligned up via round_up(). Both operations can produce a
rounded size that exceeds mm->size, which later triggers
BUG_ON(order > mm->max_order).

Example scenarios:
- 9G CONTIGUOUS allocation on 10G VRAM memory:
roundup_pow_of_two(9G) = 16G > 10G
- 9G allocation with 8G min_block_size on 10G VRAM memory:
round_up(9G, 8G) = 16G > 10G

Fix this by checking the rounded size against mm->size. For
non-contiguous or range allocations where size > mm->size is invalid,
return -EINVAL immediately. For contiguous allocations without range
restrictions, allow the request to fall through to the existing
__alloc_contig_try_harder() fallback.

This ensures invalid user input returns an error or uses the fallback
path instead of hitting BUG_ON.

v2: (Matt A)
- Add Fixes, Cc stable, and Closes tags for context

Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6712
Fixes: 0a1844bf0b53 ("drm/buddy: Improve contiguous memory allocation")
Cc: <stable@vger.kernel.org> # v6.7+
Cc: Christian König <christian.koenig@amd.com>
Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Suggested-by: Matthew Auld <matthew.auld@intel.com>
Signed-off-by: Sanjay Yadav <sanjay.kumar.yadav@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com>
Link: https://patch.msgid.link/20260108113227.2101872-5-sanjay.kumar.yadav@intel.com

drm/atmel-hlcdc: don't reject the commit if the src rect has fractional parts

Don’t reject the commit when the source rectangle has fractional parts.
This can occur due to scaling: drm_atomic_helper_check_plane_state() calls
drm_rect_clip_scaled(), which may introduce fractional parts while
computing the clipped source rectangle. This does not imply the commit is
invalid, so we should accept it instead of discarding it.

Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Reviewed-by: Manikandan Muralidharan <manikandan.m@microchip.com>
Link: https://patch.msgid.link/20251120-lcd_scaling_fix-v1-1-5ffc98557923@microchip.com
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>

drm/atmel-hlcdc: fix use-after-free of drm_crtc_commit after release

The atmel_hlcdc_plane_atomic_duplicate_state() callback was copying
the atmel_hlcdc_plane state structure without properly duplicating the
drm_plane_state. In particular, state->commit remained set to the old
state commit, which can lead to a use-after-free in the next
drm_atomic_commit() call.

Fix this by calling
__drm_atomic_helper_duplicate_plane_state(), which correctly clones
the base drm_plane_state (including the ->commit pointer).

It has been seen when closing and re-opening the device node while
another DRM client (e.g. fbdev) is still attached:

=============================================================================
BUG kmalloc-64 (Not tainted): Poison overwritten
-----------------------------------------------------------------------------

0xc611b344-0xc611b344 @offset=836. First byte 0x6a instead of 0x6b
FIX kmalloc-64: Restoring Poison 0xc611b344-0xc611b344=0x6b
Allocated in drm_atomic_helper_setup_commit+0x1e8/0x7bc age=178 cpu=0
pid=29
drm_atomic_helper_setup_commit+0x1e8/0x7bc
drm_atomic_helper_commit+0x3c/0x15c
drm_atomic_commit+0xc0/0xf4
drm_framebuffer_remove+0x4cc/0x5a8
drm_mode_rmfb_work_fn+0x6c/0x80
process_one_work+0x12c/0x2cc
worker_thread+0x2a8/0x400
kthread+0xc0/0xdc
ret_from_fork+0x14/0x28
Freed in drm_atomic_helper_commit_hw_done+0x100/0x150 age=8 cpu=0
pid=169
drm_atomic_helper_commit_hw_done+0x100/0x150
drm_atomic_helper_commit_tail+0x64/0x8c
commit_tail+0x168/0x18c
drm_atomic_helper_commit+0x138/0x15c
drm_atomic_commit+0xc0/0xf4
drm_atomic_helper_set_config+0x84/0xb8
drm_mode_setcrtc+0x32c/0x810
drm_ioctl+0x20c/0x488
sys_ioctl+0x14c/0xc20
ret_fast_syscall+0x0/0x54
Slab 0xef8bc360 objects=21 used=16 fp=0xc611b7c0
flags=0x200(workingset|zone=0)
Object 0xc611b340 @offset=832 fp=0xc611b7c0

Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Reviewed-by: Manikandan Muralidharan <manikandan.m@microchip.com>
Link: https://patch.msgid.link/20251024-lcd_fixes_mainlining-v1-2-79b615130dc3@microchip.com
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>

drm/atmel-hlcdc: fix memory leak from the atomic_destroy_state callback

After several commits, the slab memory increases. Some drm_crtc_commit
objects are not freed. The atomic_destroy_state callback only put the
framebuffer. Use the __drm_atomic_helper_plane_destroy_state() function
to put all the objects that are no longer needed.

It has been seen after hours of usage of a graphics application or using
kmemleak:

unreferenced object 0xc63a6580 (size 64):
  comm "egt_basic", pid 171, jiffies 4294940784
  hex dump (first 32 bytes):
    40 50 34 c5 01 00 00 00 ff ff ff ff 8c 65 3a c6  @P4..........e:.
    8c 65 3a c6 ff ff ff ff 98 65 3a c6 98 65 3a c6  .e:......e:..e:.
  backtrace (crc c25aa925):
    kmemleak_alloc+0x34/0x3c
    __kmalloc_cache_noprof+0x150/0x1a4
    drm_atomic_helper_setup_commit+0x1e8/0x7bc
    drm_atomic_helper_commit+0x3c/0x15c
    drm_atomic_commit+0xc0/0xf4
    drm_atomic_helper_set_config+0x84/0xb8
    drm_mode_setcrtc+0x32c/0x810
    drm_ioctl+0x20c/0x488
    sys_ioctl+0x14c/0xc20
    ret_fast_syscall+0x0/0x54

Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com>
Reviewed-by: Manikandan Muralidharan <manikandan.m@microchip.com>
Link: https://patch.msgid.link/20251024-lcd_fixes_mainlining-v1-1-79b615130dc3@microchip.com
Signed-off-by: Manikandan Muralidharan <manikandan.m@microchip.com>

drm/amd/pm: Fix unneeded semicolon warning

Fix the warning reported.
drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_12_ppt.c:909:3-4: Unneeded semicolon

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202601182157.r1AfndME-lkp@intel.com/
Fixes: b480f573a8ab ("drm/amd/pm: Use gpu metrics 1.9 for SMUv13.0.12")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: convert VCE logging to drm_* helpers

Replace legacy DRM_INFO() logging in the VCE code with drm_info()
helper that takes a struct drm_device.

This provides proper device context in dmesg on multi-GPU systems and
aligns the radeon driver with current DRM logging practices.

Note that this change logs firmware version information at info level
and does not indicate a failure.

Signed-off-by: Mukesh Ogare <mukeshogare871@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Only poll analog connectors

Analog connectors may be hot-plugged unlike other connector
types that don't support HPD.

Stop DRM from polling other connector types that don't
support HPD, such as eDP, LVDS, etc. These were wrongly
polled when analog connector support was added,
causing issues with the seamless boot process.

Fixes: c4f3f114e73c ("drm/amd/display: Poll analog connectors (v3)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: fix misspelling of "minimum" in dc_dsc.c

Fix the typo "miniumum" → "minimum" in a comment in dc_dsc.c.

This typo is already listed in scripts/spelling.txt by commit
8c3200265787 ("scripts/spelling.txt: add several more common spelling
mistakes").

Suggested-by: Cryolitia PukNgae <cryolitia@uniontech.com>
Signed-off-by: Runrun Liu <liurunrun@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Promote DC to 3.2.366

This version brings along the following updates:

- Update memory QoS measurement interface.
- Panel inst for monitors.
- Disable FEC when powering down encoders.
- Detect panel type from VSDB.
- Check NULL before accessing a variable.
- Initialize a default to phyd32clk.
- Revert "init dispclk from bootup clock".
- Add IPS residency info to debugfs.
- Ensure link output is disabled in backend reset for PLL_ON.
- Remove unused code.
- Add DMU crash recovery callback to DM.
- Remove coverity comments.

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add new job ids

Use this for gfx, sdma, vpe IB tests and kernel shaders.
The end goal it to get rid of the direct IB submit without a
job structure.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix error handling in ib_schedule()

If fence emit fails, free the fence if necessary.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Remove coverity comments and fix spaces

[WHAT]
Coverity annotation is useless and thus is removed along with other fixes
for spacing errors.

Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Matthew Stewart <Matthew.Stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add DMU crash recovery callback to DM

[Why]
To improve reliability of the system in the case of infrequent or
potentially recoverable hangs when the DMU times out.

[How]
Attempt to recover the ASIC when DMU hangs by triggering a crash
recovery callback for the DM to forward to the base driver.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Remove unused code

[why]
two sets of dccg_init:
one in dc/dccg, one in dc/hwss.
remove hwss's dccg_init for asics not use it.

Reviewed-by: Chris Park <chris.park@amd.com>
Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Ilya Bakoulin <Ilya.Bakoulin@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Ensure link output is disabled in backend reset for PLL_ON

[Why]
We're missing the code to actually disable the link output when we have
to leave the SYMCLK_ON but the TX remains OFF.

[How]
Port the code from DCN401 that detects SYMCLK_ON_TX_OFF and disable
the link output when the backend is reset.

Reviewed-by: Ovidiu (Ovi) Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add IPS residency info to debugfs

[Why]
For debugging and testing purpose

[How]
Usage:
- echo 1 > /sys/kernel/debug/dri/0/amdgpu_dm_ips_residency_cntl
- echo 0 > /sys/kernel/debug/dri/0/amdgpu_dm_ips_residency_cntl
- cat /sys/kernel/debug/dri/0/amdgpu_dm_ips_residency

Reviewed-by: ChiaHsuan (Tom) Chung <chiahsuan.chung@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Revert "init dispclk from bootup clock for DCN315"

[Why&How]
This reverts commit 14bb17cc37e0.
Due to the change, the display shows garbage on startup.

We have an alternative solution for the original issue:
d24203bb629f ("drm/amd/display: Re-check seamless boot can be enabled or not")

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Wang, Sung-huai <Danny.Wang@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Revert "init dispclk from bootup clock for DCN314"

[Why&How]
This reverts commit f082daf08f2f.
Due to the change, the display shows garbage on startup.

We have an alternative solution for the original issue:
d24203bb629f ("drm/amd/display: Re-check seamless boot can be enabled or not")

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Wang, Sung-huai <Danny.Wang@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Initialize a default to phyd32clk

[WHAT]
phyd32clk may not be assigned before used. Give it a value based
on the default from get_phyd32clk_src().

This is reported as an UNINIT error by Coverity.

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Check NULL before accessing a variable

[WHAT]
Check NULL before accessing link, not after.

This is reported as a REVERSE_INULL error by Coverity.

Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Detect panel type from VSDB

[Why]
The AMD VSDB contains two bits that indicate the type of panel connected.
This can be useful for policy decisions based upon panel technology.

[How]
Read the bits for the panel type when parsing VSDB and store them in
the dc_link.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Disable FEC when powering down encoders

[why & how]
VBIOS DMCUB FW can enable FEC for capable eDPs, but S/W DC state is
only updated for link0 when transitioning into OS with driver loaded.
This causes issues when the eDP is immediately hidden and DIG0 is
assigned to another link that does not support FEC. Driver will
attempt to disable FEC but FEC enablement occurs based on the link
state, which does not have fec_state updated since it is a different
link. Thus, FEC disablement on DIG0 will get skipped and cause no
light up.

Reviewed-by: Karen Chen <karen.chen@amd.com>
Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: panel inst for monitors

[WHY]
To find proper panel replay panel inst for external monitors.

[HOW]
Use otg index as panel replay panel inst.
Both Replay features use the same logic to get panel inst.
Correct the bug that would overwrite panel inst in cmd

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Matthew Stewart <matthew.stewart2@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>