git.ipfire.org Git - thirdparty/kernel/linux.git/log

drm/amdgpu: Rework MES initialization on GFX 12.1

Currently, only SPX mode works on GFX 12.1. This patch reworks
the MES initialization to get other non-SPX modes working. For example,
for CPX mode, coop_enable bit needs to be set to 0. The shared command
buffer initialization is also not needed in CPX mode.
The shared command buffer initialization needs further improvements which
will be handled in later patches.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Michael Chen <michael.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Use correct MES pipe in non-SPX mode on GFX 12.1

On GFX 12.1, use the correct MES pipe instance for readiness before
sending MES commands on that pipe. Additionally, send the TLB requests
on the correct MES pipe in non-SPX modes.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Alex Sierra <alex.sierra@amd.com>
Reviewed-by: Michael Chen <michael.chen@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: adjust xcc_id program logic for sdma v7_1

Adjust program logic for sdam v7_1, only use physical xcc_id
when program register to support compute partition.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: adjust xcc logic for gfxhub v12_1

Adjust xcc_id logic to only use physical xcc_id when program
register, (use logic xcc_id by default), to fit for compute
partition.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: adjust xcc_cp_resume function for gfx_v12_1

Adjust gfx_v12_1_xcc_cp_resume function to program
cp resume per xcc_id (logic xcc number) to fix for
xcp_resume.
V2: Allocate compute microcode bo when sw init

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Add SDMA queue quantum support for GFX12.1

program SDMAx_QUEUEx_SCHEDULE_CNTL for context switch due to
quantum in KFD for GFX12.1

Signed-off-by: Gang Ba <Gang.Ba@amd.com>
Reviewed-by: Amber Lin <Amber.Lin@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Set SDMA_QUEUEx_IB_CNTL/SWITCH_INSIDE_IB

    When submitting MQD to CP, set SDMA_QUEUEx_IB_CNTL/SWITCH_INSIDE_IB bit
    so it'll allow SDMA preemption if there is a massive command buffer of
    long-running SDMA commands.

Signed-off-by: Gang Ba <Gang.Ba@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: disable burst for gfx v12_1

Disable burst in GL1A and GLARBA for gfx v12_1.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Setup Retry based thrashing prevention on GFX 12.1

Enable the new UTCL0 retry-based thrashing prevention on GFX 12.1.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Program IH_VMID_LUT_INDEX register on GFX 12.1

For querying VMID <-> PASID mapping on GFX 12.1, we need to first
program the IH_VMID_LUT_INDEX before fetching the LUT mapping. Without
this TLB flush may not work.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Michael Chen <michael.chen@amd.com>
Reviewed-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Support physical address convert

Support physical address convert to current NPS
pages in uniras.

Signed-off-by: Jinzhou Su <jinzhou.su@amd.com>
Reviewed-by: YiPeng Chai <YiPeng.Chai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx_v12_1: add mqd_stride_size input parameter

mqd_stride_size is used to calculate the next mqd offset
for cooperative dispatch.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix a couple of spelling mistakes

There are a couple of spelling mistakes, one in a pr_warn message
and one in a seq_printf message. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Describe @AMD_IP_BLOCK_TYPE_RAS in amd_ip_block_type enum

Sphinx reports kernel-doc warning:

WARNING: ./drivers/gpu/drm/amd/include/amd_shared.h:113 Enum value 'AMD_IP_BLOCK_TYPE_RAS' not described in enum 'amd_ip_block_type'

Describe the value to fix it.

Fixes: 7169e706c82d ("drm/amdgpu: Add ras module ip block to amdgpu discovery")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Don't use kernel-doc comment in dc_register_software_state struct

Sphinx reports kernel-doc warning:

WARNING: ./drivers/gpu/drm/amd/display/dc/dc.h:2796 This comment starts with '/**', but isn't a kernel-doc comment. Refer to Documentation/doc-guide/kernel-doc.rst
* Software state variables used to program register fields across the display pipeline

Don't use kernel-doc comment syntax to fix it.

Fixes: b0ff344fe70c ("drm/amd/display: Add interface to capture expected HW state from SW state")
Signed-off-by: Bagas Sanjaya <bagasdotme@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Reduce number of arguments of dcn30's CalculateWatermarksAndDRAMSpeedChangeSupport()

CalculateWatermarksAndDRAMSpeedChangeSupport() has a large number of
parameters, which must be passed on the stack. Most of the parameters
between the two callsites are the same, so they can be accessed through
the existing mode_lib pointer, instead of being passed as explicit
arguments. Doing this reduces the stack size of
dml30_ModeSupportAndSystemConfigurationFull() from 1912 bytes to 1840
bytes building for x86_64 with clang-22, helping stay under the 2048
byte limit for display_mode_vba_30.c.

Additionally, now that there is a pointer to mode_lib->vba available,
use 'v' consistently throughout the entire function.

Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Reduce number of arguments of dcn30's CalculatePrefetchSchedule()

After an innocuous optimization change in clang-22,
dml30_ModeSupportAndSystemConfigurationFull() is over the 2048 byte
stack limit for display_mode_vba_30.c.

  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_mode_vba_30.c:3529:6: warning: stack frame size (2096) exceeds limit (2048) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
   3529 | void dml30_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
        |      ^

With clang-21, this function was already close to the limit:

  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn30/display_mode_vba_30.c:3529:6: warning: stack frame size (1912) exceeds limit (1586) in 'dml30_ModeSupportAndSystemConfigurationFull' [-Wframe-larger-than]
   3529 | void dml30_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
        |      ^

CalculatePrefetchSchedule() has a large number of parameters, which must
be passed on the stack. Most of the parameters between the two callsites
are the same, so they can be accessed through the existing mode_lib
pointer, instead of being passed as explicit arguments. Doing this
reduces the stack size of dml30_ModeSupportAndSystemConfigurationFull()
from 2096 bytes to 1912 bytes with clang-22.

Closes: https://github.com/ClangBuiltLinux/linux/issues/2117
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Apply e4479aecf658 to dml

After an innocuous optimization change in clang-22, allmodconfig (which
enables CONFIG_KASAN and CONFIG_WERROR) breaks with:

  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:1724:6: error: stack frame size (3144) exceeds limit (3072) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
   1724 | void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
        |      ^

With clang-21, this function was already pretty close to the existing
limit of 3072 bytes.

  drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:1724:6: error: stack frame size (2904) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
   1724 | void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
        |      ^

A similar situation occurred in dml2, which was resolved by
commit e4479aecf658 ("drm/amd/display: Increase sanitizer frame larger
than limit when compile testing with clang") by increasing the limit for
clang when compile testing with certain sanitizer enabled, so that
allmodconfig (an easy testing target) continues to work.

Apply that same change to the dml folder to clear up the warning for
allmodconfig, unbreaking the build.

Closes: https://github.com/ClangBuiltLinux/linux/issues/2135
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon : Use devm_i2c_add_adapter instead of i2c_add_adapter

Replace i2c_add_adapter() with devm_i2c_add_adapter() and remove all
associated cleanup, as devm_i2c_add_adapter() handles adapter teardown
automatically.

Signed-off-by: Erick Karanja <karanja99erick@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Update AMDGPU_INFO_UQ_FW_AREAS query for sdma

Add a query for sdma queues. Userspace can use this to
query the size of the CSA buffers for sdma user queues.

Proposed userspace:
https://gitlab.freedesktop.org/yogeshmohan/mesa/-/commits/userq_query

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Update AMDGPU_INFO_UQ_FW_AREAS query for compute

Add a query for compute queues. Userspace can use this to
query the size of the EOP buffers for compute user queues.

Proposed userspace:
https://gitlab.freedesktop.org/yogeshmohan/mesa/-/commits/userq_query

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: Convert legacy DRM logging in evergreen.c to drm_* helpers

Replace DRM_DEBUG(), DRM_ERROR(), and DRM_INFO() calls with the
corresponding drm_dbg(), drm_err(), and drm_info() helpers in the
radeon driver.

The drm_*() logging helpers take a struct drm_device * argument,
allowing the DRM core to prefix log messages with the correct device
name and instance. This is required to correctly distinguish log
messages on systems with multiple GPUs.

This change aligns radeon with the DRM TODO item:
"Convert logging to drm_* functions with drm_device parameter".

Signed-off-by: Abhishek Rajput <abhiraj21put@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add gfx v12_1 interrupt source header

To acommandate specific interrupt source for gfx v12_1

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Override KFD SVM mappings for GFX 12.1

Override the local MTYPE mappings in KFD SVM code with mtype_local
modprobe param for GFX 12.1.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Alex Sierra <alex.sierra@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: correct rlc autoload for xcc harvest

If the number instances of firmware is RLC_NUM_INS_CODE0(Only 1 inst),
need to copy it directly for rlcautolad.
For the firmware which instances number bigger than 1, only copy for
enabled XCC to save copy time.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add gfx sysfs support for gfx_v12_1

Add gfx sysfs support for gfx_v12_1.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/mes_v12_1: fix mes access xcd register

Fix to use local register offset inside die for mes fw accessing
local/remote xcd register.

Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: normalize reg addr as local xcc for gfx v12_1

Normalize registers address to local xcc address for gfx v12_1.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: support xcc harvest for ih translate

Support xcc harvest for ih translate to logic xcc.
V2: Only check available instances

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Correct inst_id input from physical to logic

Correct inst_id input from physical to logic for sdma v7_1.
V2: Show real instance number on logic xcc.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: use physical xcc id to get rrmt

Use physical xcc_id to get rrmt on misc_op for mes v12_1.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: Convert logging in radeon_display.c to drm_* helpers

Replace DRM_ERROR() and DRM_INFO() calls in
drivers/gpu/drm/radeon/radeon_display.c with the corresponding
drm_err() and drm_info() helpers.

The drm_*() logging functions take a struct drm_device * argument,
allowing the DRM core to prefix log messages with the correct device
name and instance. This is required to correctly distinguish log
messages on systems with multiple GPUs.

This change aligns radeon with the DRM TODO item:
"Convert logging to drm_* functions with drm_device parameter".

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Mukesh Ogare <mukeshogare871@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Fix improper NULL termination of queue restore SMI event string

Pass character "0" rather than NULL terminator to properly format
queue restoration SMI events. Currently, the NULL terminator precedes
the newline character that is intended to delineate separate events
in the SMI event buffer, which can break userspace parsers.

Signed-off-by: Brian Kocoloski <brian.kocoloski@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Correct xcc_id input to GET_INST from physical to logic

Correct xcc_id input to GET_INST from physical to logic for
gfx_v12_1.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix CP_MEC_MDBASE in multi-xcc for gfx v12_1

Need to allocate memory for MEC FW data and program
registers CP_MEC_MDBASE for each XCC respectively.

Signed-off-by: Michael Chen <michael.chen@amd.com>
Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com>
Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Support 57bit fault address for GFX 12.1.0

The gmc fault virtual address is up to 57bit for 5 level page table,
this also works with 48bit virtual address for 4 level page table.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add pde3 table invalidation request for GFX 12.1.0

Set pde3 invalidation request bit during tlb flush for up to 5 level
page table.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Update LDS, Scratch base for 57bit address

For 5-level page tables, update compute vmid sh_mem_base LDS aperture
and Scratch aperture base address to above 57-bit, use the same setting
from gfx vmid, we can remove the duplicate macro.

Update queue pdd lds_base and scratch_base to the same value as
sh_mem_base setting. Then application get process apertures return the
correct value to access LDS and Scratch memory for 57bit address 5-level
page tables. This may pass to MES in future when mapping queue.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Enable 5-level page table for GFX 12.1.0

GFX 12.1.0 support 57bit virtual, 52bit physical address, set PDE
max_level to 4, min_vm_size to 128PB to enable GPU vm 5-level page
tables to support 57bit virtual address.

Signed-off-by: Philip Yang <Philip.Yang@amd.com>
Acked-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: init RS64_MEC_P2/P3_STACK for gfx12.1

Add GFX12.1 MEC P2/P3 STACK firmware init.

Signed-off-by: Feifei Xu <Feifei.Xu@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix CU info calculations for GFX 12.1

This patch fixes the CU info calculations for gfx 12.1.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Update CWSR area calculations for GFX 12.1

Update the SGPR, VGPR, HWREG size and number of waves supported
for GFX 12.1 CWSR memory limits. The CU calculation changed in
topology, as a result, the values need to be updated.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Feifei Xu <Feifei.Xu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add soc v1_0 ih client id table

To acommandate the specific ih client for soc v1_0

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Flush TLB on all XCCs on GFX 12.1

Currently, the driver code is flushing TLB on XCC 0 only.
Fix it by flushing on all XCCs within the partition.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: restore SCLK settings after S0ix resume

User-configured SCLK(GPU core clock)frequencies were not persisting
across S0ix suspend/resume cycles on smu v14 hardware.
The issue occurred because of the code resetting clock frequency
to zero during resume.

This patch addresses the problem by:
- Preserving user-configured values in driver and sets the
clock frequency across resume
- Preserved settings are sent to the hardware during resume

Signed-off-by: mythilam <mythilam@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: do not use amdgpu_bo_gpu_offset_no_check individually

This should not be used indiviually, use amdgpu_bo_gpu_offset
with bo reserved.

v3 - unpin bo in queue destroy (Christian)
v2 - pin bo so that offset returned won't change after unlock (Christian)

Signed-off-by: Saleemkhan Jamadar <saleemkhan083@gmail.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Change set ip clock/power gating param

It's not required to use generic void *, change to struct amdgpu_device *.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Use helper to get ip block

Replace individual searches with the utility function get_ip_block

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Move ip block related functions

Move ip block related functions to amdgpu_ip.c. No functional change
intended.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix a job->pasid access race in gpu recovery

Avoid a possible UAF in GPU recovery due to a race between
the sched timeout callback and the tdr work queue.

The gpu recovery function calls drm_sched_stop() and
later drm_sched_start().  drm_sched_start() restarts
the tdr queue which will eventually free the job.  If
the tdr queue frees the job before time out callback
completes, the job will be freed and we'll get a UAF
when accessing the pasid.  Cache it early to avoid the
UAF.

Example KASAN trace:
[  493.058141] BUG: KASAN: slab-use-after-free in amdgpu_device_gpu_recover+0x968/0x990 [amdgpu]
[  493.067530] Read of size 4 at addr ffff88b0ce3f794c by task kworker/u128:1/323
[  493.074892]
[  493.076485] CPU: 9 UID: 0 PID: 323 Comm: kworker/u128:1 Tainted: G            E       6.16.0-1289896.2.zuul.bf4f11df81c1410bbe901c4373305a31 #1 PREEMPT(voluntary)
[  493.076493] Tainted: [E]=UNSIGNED_MODULE
[  493.076495] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019
[  493.076500] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched]
[  493.076512] Call Trace:
[  493.076515]  <TASK>
[  493.076518]  dump_stack_lvl+0x64/0x80
[  493.076529]  print_report+0xce/0x630
[  493.076536]  ? _raw_spin_lock_irqsave+0x86/0xd0
[  493.076541]  ? __pfx__raw_spin_lock_irqsave+0x10/0x10
[  493.076545]  ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu]
[  493.077253]  kasan_report+0xb8/0xf0
[  493.077258]  ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu]
[  493.077965]  amdgpu_device_gpu_recover+0x968/0x990 [amdgpu]
[  493.078672]  ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu]
[  493.079378]  ? amdgpu_coredump+0x1fd/0x4c0 [amdgpu]
[  493.080111]  amdgpu_job_timedout+0x642/0x1400 [amdgpu]
[  493.080903]  ? pick_task_fair+0x24e/0x330
[  493.080910]  ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu]
[  493.081702]  ? _raw_spin_lock+0x75/0xc0
[  493.081708]  ? __pfx__raw_spin_lock+0x10/0x10
[  493.081712]  drm_sched_job_timedout+0x1b0/0x4b0 [gpu_sched]
[  493.081721]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  493.081725]  process_one_work+0x679/0xff0
[  493.081732]  worker_thread+0x6ce/0xfd0
[  493.081736]  ? __pfx_worker_thread+0x10/0x10
[  493.081739]  kthread+0x376/0x730
[  493.081744]  ? __pfx_kthread+0x10/0x10
[  493.081748]  ? __pfx__raw_spin_lock_irq+0x10/0x10
[  493.081751]  ? __pfx_kthread+0x10/0x10
[  493.081755]  ret_from_fork+0x247/0x330
[  493.081761]  ? __pfx_kthread+0x10/0x10
[  493.081764]  ret_from_fork_asm+0x1a/0x30
[  493.081771]  </TASK>

Fixes: a72002cb181f ("drm/amdgpu: Make use of drm_wedge_task_info")
Link: https://github.com/HansKristian-Work/vkd3d-proton/pull/2670
Cc: SRINIVASAN.SHANMUGAM@amd.com
Cc: vitaly.prosyak@amd.com
Cc: christian.koenig@amd.com
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Promote DC to 3.2.363

This version brings along the following updates:

- Replay Video Conferencing V2
- Fix scratch registers offsets for DCN35 and DCN351
- Fix DP no audio issue
- Add use_max_lsw parameter
- Fix presentation of Z8 efficiency
- Add USB-C DP Alt Mode lane limitation in DCN32
- Support DRR granularity
- Don't disable DPCD mst_en if sink connected
- Set enable_legacy_fast_update to false for DCN35/351
- Split update_planes_and_stream_v3 into parts (V2)

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: [FW Promotion] Release 0.1.40.0

Summary for changes in firmware:
* Update DCHVM restore sequence for dcn35
* Add 2 new debug polling methods for dchvm "busy" during IPS entry for DCN35

Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Acked-by: Wayne Lin <Wayne.Lin@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Split update_planes_and_stream_v3 into parts (V2)

[Why]
Currently all of the preparation and execution of plane update is done
under a DC lock, blocking other code from accessing DC for longer than
strictly necessary.

[How]
Break the v3 update flow into 3 parts:
    * prepare - locked, calculate update flow and modify DC state
    * execute - unlocked, program hardware
    * cleanup - locked, finalize DC state and free temp resources
Legacy v2 flow too compilicated to break down for now, link new API
with old by executing everything in slightly misnamed prepare stage.

V2:
Keep the new code structure, but point all users back at the old code,
until fully tested.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Dominik Kaszewski <dominik.kaszewski@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: DPP low mem pwr related adjustment -Part I

[why]
Default low pwr mem state get chagned.
SW needs to wake mem up first
also need to put back to LS again after use: will do in Part II.

Reviewed-by: Leo Chen <leo.chen@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Set enable_legacy_fast_update to false for DCN35/351

[Why]
Existing logic will treat color temperature update = full update, cause
user color temp adjustment goes wait for update logic and fsleep in that
cause the adjustment not smooth.

[How]
Let DCN35/351 to follow DCN401 to set default value to false.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Fudong Wang <fudong.wang@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Don't disable DPCD mst_en if sink connected

[WHY]
User may connect mst dock with multi monitors and do quick unplug
and plug in one of the monitor. This operatioin may create CSN from
dock to display driver. Then display driver would disable and then enable
mst link and also disable/enable DPCD mst_en bit in dock RX. However,
when mst_en bit being disabled, if dock has another CSN message to
transmit then the message would be removed because of the disabling of
mst_en. In this case, the message is missing and it ends up no display in
the replugged monitor.

[HOW]
Don't disable mst_en bit when link still has sink connected.

Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Support DRR granularity

[Why&How]
Support DRR granularity for coasting Vtotal calculation

Reviewed-by: Robin Chen <robin.chen@amd.com>
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com>
Signed-off-by: Weiguang Li <wei-guang.li@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add USB-C DP Alt Mode lane limitation in DCN32

[Why]
USB-C DisplayPort Alt Mode with concurrent USB data needs lane count
limitation to prevent incorrect 4-lane DP configuration when only 2 lanes
are available due to hardware lane sharing between DP and USB3.

[How]
Query DMUB for Alt Mode status (is_dp_alt_disable, is_usb, is_dp4) in
dcn32_link_encoder_get_max_link_cap() and cap DP to 2 lanes when USB is
active on USB-C port. Added inline documentation explaining the USB-C
lane sharing constraint.

Reviewed-by: PeiChen Huang <peichen.huang@amd.com>
Signed-off-by: LinCheng Ku <lincheng.ku@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix presentation of Z8 efficiency

[Why/How]
Should differentiate when vblank is or isn't included

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add use_max_lsw parameter

[WHY&HOW]
Add use_max_lsw parameter to make prefetch for linear surfaces similar to
tiled.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: Oleh Kuzhylnyi <okuzhyln@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix DP no audio issue

[why]
need to enable APG_CLOCK_ENABLE enable first
also need to wake up az from D3 before access az block

Reviewed-by: Swapnil Patel <swapnil.patel@amd.com>
Signed-off-by: Charlene Liu <Charlene.Liu@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix scratch registers offsets for DCN351

[Why]
Different platforms use different NBIO header files,
causing display code to use differnt offset and read
wrong accelerated status.

[How]
- Unified NBIO offset header file across platform.
- Correct scratch registers offsets to proper locations.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4667
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix scratch registers offsets for DCN35

[Why]
Different platforms use differnet NBIO header files,
causing display code to use differnt offset and read
wrong accelerated status.

[How]
- Unified NBIO offset header file across platform.
- Correct scratch registers offsets to proper locations.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4667
Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Ray Wu <ray.wu@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Replay Video Conferencing V2

[WHY&HOW]
Add new coasting vtotal type and an union to optimize
the video conference for more power saving.

Reviewed-by: Robin Chen <robin.chen@amd.com>
Signed-off-by: ChunTao Tso <chuntao.tso@amd.com>
Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd: Resume the device in thaw() callback when console suspend is disabled

If console suspend has been disabled using `no_console_suspend` also
wake up during thaw() so that some messages can be seen for debugging.

Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4191
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: allow debug subscription to lds violations on gfx 1250

GFX 1250 allows the debugger to subcribe to LDS out-of-range read/write
memory violations.
Bump IOCTL minor version and flag KFD capabilities for enablement
hint.

Signed-off-by: Jonathan Kim <jonathan.kim@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: enable gpu tlb flush for gfxhub

Enable gpu tlb flush for gfxhub without check gfx.is_poweron
as gfx is power on by default for gfx v12_1 ASIC.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/include : Update MES v12 API header

Add LDS out of range reporting support in mes API

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Jonathan Kim <jonathan.kim@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: flush tlb properly for GMC v12.1 in early phase

Flush tlb properly for GMC v12.1

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Use AMDGPU_IS_GFXHUB to screen out GFXHUB for GMC v12.1

There're multiple gfxhubs on GMC v12.1

Signed-off-by: Le Ma <le.ma@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: only copy ucode for enabled xcc

Only copy ucode for enabled xcc instead of copy for all 8 xcc
for rlc autoload on gfx v12_1 to save time.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix issue when switch NPS1 to NPSX

fix the function execution sequence after removing
kgd2kfd_init_zone_device out of gpu full access region.

Fixes: c71980a3fc1d ("drm/amdgpu: reduce the full gpu access time in amdgpu_device_init.")
Signed-off-by: chong li <chongli2@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix 64-bit state pointer passed as 32-bit GPINT response buffer

edp_pr_get_state() incorrectly casts a uint64_t * to uint32_t * when
calling dc_wake_and_execute_gpint(). The GPINT path writes only 32 bits,
leaving the upper 32 bits of the u64 output uninitialized. Replace the
cast with a u32 temporary and copy the result into the u64 pointer.

Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/dc/link/protocols/link_edp_panel_control.c
    1448 bool edp_pr_get_state(const struct dc_link *link, uint64_t *state)
                                                           ^^^^^^^^^^^^^^^
    1449 {

    ...

    1457         do {
    1458                 // Send gpint command and wait for ack
--> 1459                 if (!dc_wake_and_execute_gpint(dc->ctx, DMUB_GPINT__GET_REPLAY_STATE, panel_inst,
    1460                                                (uint32_t *)state, DM_DMUB_WAIT_TYPE_WAIT_WITH_REPLY)) {
                                                        ^^^^^^^^^^^^^^^^^

The dc_wake_and_execute_gpint() function doesn't take a u64, it takes a
u32.  It tries to initialize the state to zero at the start but that's
not going to work because of the type mismatch.  It suggests that
callers are allowed to pass uninitialized data to edp_pr_get_state() but
at present there are no callers so this is only a bug in the code but
doesn't affect runtime.

    1461                         // Return invalid state when GPINT times out
    1462                         *state = PR_STATE_INVALID;
    1463                 }

Fixes: 74ce00932e7e ("drm/amd/display: Refactor panel replay set dmub cmd flow")
Reported by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Robin Chen <robin.chen@amd.com>
Cc: Jack Chang <jack.chang@amd.com>
Cc: Leon Huang <Leon.Huang1@amd.com>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/include : Update MES v12 comments on RESET API

Added comments for the layout of contents that addressed by doorbell_offset_addr
in RESET API

Signed-off-by: Shaoyun Liu <shaoyun.liu@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Set xcp id for mes ring

Set xcp id for mes ring

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Init partition_mode and xcc_mask for GFX_IMU_PARTITION_SWITCH

Set partition_mode and physical xcc mask fields in
GFX_IMU_PARTITION_SWITCH register

v2: cleanup (Alex)

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Initialize vram_info for gmc v12_1

Initialize vram_info for gmc v12_1

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Init compute partition mode for gfx v12_1

Init compute partition mode for gfx v12_1

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Initialize memory ranges for gmc v12_1

Initialize memory ranges for gmc v12_1

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Initialize memory partition callbacks for gmc v12_1

Initialize memory partition callbacks for gmv v12_1

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: support rlc autoload for muti-xcc

Support rlc autload for muti-xcc on gfx v12_1.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Enable atomics for all the available xcc

Apply TCP_UTCL0_CNTL1 settings to all the available
xcc

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Update MES VM_CNTX_CNTL for XNACK off for GFX 12.1

Currently, we do not turn off retry faults in VM_CONTEXT_CNTL value
when passing it to MES if XNACK is off. This creates a situation where
XNACK is disabled in SQ but enabled in UTCL2, which is not recommended.
As a result, turn off/on retry faults in both SQ and UTCL2 when passing
vm_context_cntl value to MES if XNACK is disabled/enabled.

Suggested-by: Jay Cornwall <jay.cornwall@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Enable per-process XNACK for GFX 12.1.0

GFX 12.1.0 will support enabling/disabling XNACK on a per-
process basis. This change enables the per process XNACK feature.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Enable retry faults for GFX 12.1

Enable retry faults in both GCVM/MMVM Context1 Control
and L2_PROTECTION_FAULT_CNTL2 registers for GFX 12.1.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add IH node-id to XCC mapping

Add a generic function to map IH node-id to XCC instance.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add interrupt handler for GFX 12.1.0

Add a separate interrupt handler for handling interrupts,
both retry and no-retry, for GFX 12.1.0.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add UTCL2 Retry fault interrupt for GFX 12.1

Add the UTCL2 retry fault interrupt for both GCVM and MMVM for GFX 12.1.

Signed-off-by: Mukul Joshi <mukul.joshi@amd.com>
Reviewed-by: Philip Yang <Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/sdma: add query for CSA size and alignment

Needed to query the CSA size and alignment for SDMA
user queues.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix mes packet params issue when flush hdp.

v4:
use func "amdgpu_gfx_get_hdp_flush_mask" to get ref_and_mask for
gfx9 through gfx12.

v3:
Unify the get_ref_and_mask function in amdgpu_gfx_funcs,
to support both GFX11 and earlier generations

v2:
place "get_ref_and_mask" in amdgpu_gfx_funcs instead of amdgpu_ring,
since this function only assigns the cp entry.

v1:
both gfx ring and mes ring use cp0 to flush hdp, cause conflict.

use function get_ref_and_mask to assign the cp entry.
reassign mes to use cp8 instead.

Signed-off-by: chong li <chongli2@amd.com>
Acked-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx: add eop size and alignment to shadow info

This is used by firmware for compute user queues.

Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Add vram_type to ras_ta_init_flags

Add vram_type to ras_ta_init_flags.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: update sdma configuration for soc v1_0

Update SDMA instances/masks according to xcc num for
multi-xcc models on soc v1.0.

Signed-off-by: Likun Gao <Likun.Gao@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Initialize xcp manager for soc v1_0

Initialize xcp manager for soc v1_0

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add soc_v1_0_xcp_funcs

Implement xcp mgr callbacks for soc v1_0

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Export sdma_v7_1_xcp_funcs

To be used by soc v1_0 xcp manager

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Export gfx_v12_1_xcp_func

To be used by soc v1_0 xcp manager

Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add vram_type to ras init_flags

Add vram_type to ras init_flags.

Signed-off-by: Candice Li <candice.li@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/ras: Reduce stack usage in amdgpu_virt_ras_get_cper_records()

amdgpu_virt_ras_get_cper_records() was using a large stack array
of ras_log_info pointers. This contributed to the frame size
warning on this function.

Replace the fixed-size stack array:

struct ras_log_info *trace[MAX_RECORD_PER_BATCH];

with a heap-allocated array using kcalloc().

We free the trace buffer together with out_buf on all exit paths.
If allocation of trace or out_buf fails, we return a generic RAS
error code.

This reduces stack usage and keeps the runtime behaviour
unchanged.

Fixes:
stack frame size: 1112 bytes (limit: 1024)

Cc: Tao Zhou <tao.zhou1@amd.com>
Cc: Hawking Zhang <Hawking.Zhang@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>