git.ipfire.org Git - thirdparty/kernel/linux.git/log

drm/amdgpu/gfx11: set MQD as appriopriate for queue types

Set the MQD as appropriate for the kernel vs user queues.

Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix DP audio DTO1 clock source on DCE 6.

On DCE 6, DP audio was not working. However, it worked when an
HDMI monitor was also plugged in.

Looking at dce_aud_wall_dto_setup it seems that the main
difference is that we use DTO1 when only DP is plugged in.

When programming DTO1, it uses audio_dto_source_clock_in_khz
which is set from get_dp_ref_freq_khz

The dce60_get_dp_ref_freq_khz implementation looks incorrect,
because DENTIST_DISPCLK_CNTL seems to be always zero on DCE 6,
so it isn't usable.
I compared dce60_get_dp_ref_freq_khz to the legacy display code,
specifically dce_v6_0_audio_set_dto, and it turns out that in
case of DCE 6, it needs to use the display clock. With that,
DP audio started working on Pitcairn, Oland and Cape Verde.

However, it still didn't work on Tahiti. Despite having the
same DCE version, Tahiti seems to have a different audio device.
After some trial and error I realized that it works with the
default display clock as reported by the VBIOS, not the current
display clock.

The patch was tested on all four SI GPUs:

* Pitcairn (DCE 6.0)
* Oland (DCE 6.4)
* Cape Verde (DCE 6.0)
* Tahiti (DCE 6.0 but different)

The testing was done on Samsung Odyssey G7 LS28BG700EPXEN on
each of the above GPUs, at the following settings:

* 4K 60 Hz
* 1080p 60 Hz
* 1080p 144 Hz

Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: Use vmalloc_array and vcalloc to simplify code

Use vcalloc() and vmalloc_array() to simplify the functions
radeon_gart_init().

vmalloc_array() is also optimized better, resulting in less instructions
being used.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn: Remove unnecessary check

The function amdgpu_vcn_sysfs_reset_mask_init already returns 0, which
makes the check of the result unnecessary in the vcn_v4_0_3_sw_init().
Just return the amdgpu_vcn_sysfs_reset_mask_init directly.

Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix fractional fb divider in set_pixel_clock_v3

For later VBIOS versions, the fractional feedback divider is
calculated as the remainder of dividing the feedback divider by
a factor, which is set to 1000000. For reference, see:
- calculate_fb_and_fractional_fb_divider
- calc_pll_max_vco_construct

However, in case of old VBIOS versions that have
set_pixel_clock_v3, they only have 1 byte available for the
fractional feedback divider, and it's expected to be set to the
remainder from dividing the feedback divider by 10.
For reference see the legacy display code:
- amdgpu_pll_compute
- amdgpu_atombios_crtc_program_pll

This commit fixes set_pixel_clock_v3 by dividing the fractional
feedback divider passed to the function by 100000.

Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Don't print errors for nonexistent connectors

When getting the number of connectors, the VBIOS reports
the number of valid indices, but it doesn't say which indices
are valid, and not every valid index has an actual connector.
If we don't find a connector on an index, that is not an error.

Considering these are not actual errors, don't litter the logs.

Fixes: 60df5628144b ("drm/amd/display: handle invalid connector indices")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Don't warn when missing DCE encoder caps

On some GPUs the VBIOS just doesn't have encoder caps,
or maybe not for every encoder.

This isn't really a problem and it's handled well,
so let's not litter the logs with it.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fill display clock and vblank time in dce110_fill_display_configs

Also needed by DCE 6.
This way the code that gathers this info can be shared between
different DCE versions and doesn't have to be repeated.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Find first CRTC and its line time in dce110_fill_display_configs

dce110_fill_display_configs is shared between DCE 6-11, and
finding the first CRTC and its line time is relevant to DCE 6 too.
Move the code to find it from DCE 11 specific code.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Adjust DCE 8-10 clock, don't overclock by 15%

Adjust the nominal (and performance) clocks for DCE 8-10,
and set them to 625 MHz, which is the value used by the legacy
display code in amdgpu_atombios_get_clock_info.

This was tested with Hawaii, Tonga and Fiji.
These GPUs can output 4K 60Hz (10-bit depth) at 625 MHz.

The extra 15% clock was added as a workaround for a Polaris issue
which uses DCE 11, and should not have been used on DCE 8-10 which
are already hardcoded to the highest possible display clock.
Unfortunately, the extra 15% was mistakenly copied and kept
even on code paths which don't affect Polaris.

This commit fixes that and also adds a check to make sure
not to exceed the maximum DCE 8-10 display clock.

Fixes: 8cd61c313d8b ("drm/amd/display: Raise dispclk value for Polaris")
Fixes: dc88b4a684d2 ("drm/amd/display: make clk mgr soc specific")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Don't overclock DCE 6 by 15%

The extra 15% clock was added as a workaround for a Polaris issue
which uses DCE 11, and should not have been used on DCE 6 which
is already hardcoded to the highest possible display clock.
Unfortunately, the extra 15% was mistakenly copied and kept
even on code paths which don't affect Polaris.

This commit fixes that and also adds a check to make sure
not to exceed the maximum DCE 6 display clock.

Fixes: 8cd61c313d8b ("drm/amd/display: Raise dispclk value for Polaris")
Fixes: dc88b4a684d2 ("drm/amd/display: make clk mgr soc specific")
Fixes: 3ecb3b794e2c ("drm/amd/display: dc/clk_mgr: add support for SI parts (v2)")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Rodrigo Siqueira <siqueira@igalia.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: replace min/max nesting with clamp()

The clamp() macro explicitly expresses the intent of constraining
a value within bounds.Therefore, replacing min(max(a, b), c) with
clamp(val, lo, hi) can improve code readability.

Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Use swap() to simplify code

Replace the original swapping logic with swap() to improve readability and
remove temporary variables

Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Clean up coding style

Adjust whitespace around operators to improve code readability
and comply with kernel coding style guidelines.

These changes are purely stylistic and introduce no
functional modifications.

Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add null pointer check in mod_hdcp_hdcp1_create_session()

The function mod_hdcp_hdcp1_create_session() calls the function
get_first_active_display(), but does not check its return value.
The return value is a null pointer if the display list is empty.
This will lead to a null pointer dereference.

Add a null pointer check for get_first_active_display() and return
MOD_HDCP_STATUS_DISPLAY_NOT_FOUND if the function return null.

This is similar to the commit c3e9826a2202
("drm/amd/display: Add null pointer check for get_first_active_display()").

Fixes: 2deade5ede56 ("drm/amd/display: Remove hdcp display state with mst fix")
Signed-off-by: Chenyuan Yang <chenyuan0y@gmail.com>
Reviewed-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Promote DC to 3.2.346

This version brings along following updates:

- Fix Xorg desktop unresponsive on Replay panel
- [FW Promotion] Release 0.1.23.0
- Avoid a NULL pointer dereference
- Attach privacy screen to DRM connector
- Setup Second Stutter Watermark Implementation
- Align LSDMA commands fields
- Delete unused functions
- Optimize amdgpu_dm_atomic_commit_tail()
- Add primary plane to commits for correct VRR handling
- Refactor DPP enum for backwards compatibility.
- Add LSDMA Linear Sub Window Copy support

Acked-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix Xorg desktop unresponsive on Replay panel

[WHY & HOW]
IPS & self-fresh feature can cause vblank counter resets between
vblank disable and enable.
It may cause system stuck due to wait the vblank counter.

Call the drm_crtc_vblank_restore() during vblank enable to estimate
missed vblanks by using timestamps and update the vblank counter in
DRM.

It can make the vblank counter increase smoothly and resolve this issue.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: [FW Promotion] Release 0.1.23.0

1. Fix loop counter.
2. Check whether rb->capacity is 0.

Acked-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Avoid a NULL pointer dereference

[WHY]
Although unlikely drm_atomic_get_new_connector_state() or
drm_atomic_get_old_connector_state() can return NULL.

[HOW]
Check returns before dereference.

Cc: Mario Limonciello <mario.limonciello@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Attach privacy screen to DRM connector

[WHY]
If a system has a privacy screen advertised by a driver it should
be included in the DRM connector for the eDP panel.

[HOW]
Detect statically declared privacy screens when creating eDP connector
and attach privacy screen DRM properties.

Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Setup Second Stutter Watermark Implementation

[WHY & HOW]
Setup initial changes required to program another set of watermarks
for a 2nd stutter mode. The 2nd stutter mode will be lower power but
have higher enter/exit latencies.

PMFW to choose which stutter mode to use based on stutter efficiences
to see if original stutter (LP1) or low power stutter (LP2) will result
in better power savings.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Austin Zheng <Austin.Zheng@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Align LSDMA commands fields

[WHY]
DC LSDMA functions had to remember to extract 1 from several fields
to be compliant with DMUB LSDMA commands interface.
Now this logic is moved to DMUB.

[HOW]
Moved extraction by 1 in several fields of LSDMA commands to DMUB.
Changed DC to not do it.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Rafal Ostrowski <rostrows@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Delete unused functions

[WHAT]
Removing unused code

Reviewed-by: Joshua Aberback <joshua.aberback@amd.com>
Signed-off-by: Clay King <clayking@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Optimize amdgpu_dm_atomic_commit_tail()

[WHY]
The first two loops of for_each_oldnew_connector_in_state() both operate
on an HDCP queue. If one isn't setup then each connector is iterated but
skipped TWICE. This is wasteful for the majority of cases.

[HOW]
Combine the two HDCP related loops of for_each_oldnew_connector_in_state()
and check for the HDCP workqueue before even running either of them. This
should avoid running the functions in most cases, and if HDCP is setup only
run once.

Reviewed-by: Alex Hung <alex.hung@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Refactor DPP enum for backwards compatibility

[WHY]
Conflict for enum type in DPP source files.

[HOW]
Refactor DPP source files to resolve the enum conflicts.

Reviewed-by: Ilya Bakoulin <ilya.bakoulin@amd.com>
Reviewed-by: Martin Leung <martin.leung@amd.com>
Signed-off-by: Lohita Mudimela <lohita.mudimela@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add LSDMA Linear Sub Window Copy support

[WHAT]
Add support for LSDMA Linear Sub Window Copy command.

Reviewed-by: Alvin Lee <alvin.lee2@amd.com>
Signed-off-by: Rafal Ostrowski <rostrows@amd.com>
Signed-off-by: Alex Hung <alex.hung@amd.com>
Tested-by: Dan Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: refactor bad_page_work for corner case handling

When a poison is consumed on the guest before the guest receives the host's poison creation msg, a corner case may occur to have poison_handler complete processing earlier than it should to cause the guest to hang waiting for the req_bad_pages reply during a VF FLR, resulting in the VM becoming inaccessible in stress tests.

To fix this issue, this patch refactored the mailbox sequence by seperating the bad_page_work into two parts req_bad_pages_work and handle_bad_pages_work.
Old sequence:
  1.Stop data exchange work
  2.Guest sends MB_REQ_RAS_BAD_PAGES to host and keep polling for IDH_RAS_BAD_PAGES_READY
  3.If the IDH_RAS_BAD_PAGES_READY arrives within timeout limit, re-init the data exchange region for updated bad page info
    else timeout with error message
New sequence:
req_bad_pages_work:
  1.Stop data exhange work
  2.Guest sends MB_REQ_RAS_BAD_PAGES to host
Once Guest receives IDH_RAS_BAD_PAGES_READY event
handle_bad_pages_work:
  3.re-init the data exchange region for updated bad page info

Signed-off-by: Chenglei Xie <Chenglei.Xie@amd.com>
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: remove duplicated argument wptr_va

The duplicate judgment of wptr_va could be removed to simplify the logic

Signed-off-by: Qiang Liu <liuqiang@kylinos.cn>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add NULL pointer checks in dc_stream cursor attribute functions

The function dc_stream_set_cursor_attributes() currently dereferences
the `stream` pointer and nested members `stream->ctx->dc->current_state`
without checking for NULL.

All callers of these functions, such as in
`dcn30_apply_idle_power_optimizations()` and
`amdgpu_dm_plane_handle_cursor_update()`, already perform NULL checks
before calling these functions.

Fixes below:
drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c:336 dc_stream_program_cursor_attributes()
error: we previously assumed 'stream' could be null (see line 334)

drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c
    327 bool dc_stream_program_cursor_attributes(
    328         struct dc_stream_state *stream,
    329         const struct dc_cursor_attributes *attributes)
    330 {
    331         struct dc  *dc;
    332         bool reset_idle_optimizations = false;
    333
    334         dc = stream ? stream->ctx->dc : NULL;
                     ^^^^^^
The old code assumed stream could be NULL.

    335
--> 336         if (dc_stream_set_cursor_attributes(stream, attributes)) {
                                                    ^^^^^^
The refactor added an unchecked dereference.

drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_stream.c
   313  bool dc_stream_set_cursor_attributes(
   314          struct dc_stream_state *stream,
   315          const struct dc_cursor_attributes *attributes)
   316  {
   317          bool result = false;
   318
   319          if (dc_stream_check_cursor_attributes(stream, stream->ctx->dc->current_state, attributes)) {
                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Here.
This function used to check for if stream as NULL and return false at
the start. Probably we should add that back.

Fixes: 4465dd0e41e8 ("drm/amd/display: Refactor SubVP cursor limiting logic")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Alvin Lee <alvin.lee2@amd.com>
Cc: Ray Wu <ray.wu@amd.com>
Cc: Dillon Varone <dillon.varone@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Daniel Wheeler <daniel.wheeler@amd.com>
Cc: Tom Chung <chiahsuan.chung@amd.com>
Cc: Wenjing Liu <wenjing.liu@amd.com>
Cc: Jun Lei <Jun.Lei@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Dillon Varone <Dillon.varone@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: fix typos

Various small typos found around.

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/swm14: Update power limit logic

Take into account the limits from the vbios. Ported
from the SMU13 code.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4352
Reviewed-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Kenneth Feng <kenneth.feng@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Free SMUv13.0.6 resources on failure

Free the resources allocated if smu_v13_0_12_tables_init fails.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Fixes: 5bf93e1d6efd ("drm/amd/pm: Add caching to SMUv13.0.12 temp metric")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/vcn: Add late_init callback for VCN v4.0.3 reset handling

This change reorganizes VCN reset capability detection by:

1. Moving reset mask configuration from sw_init to new late_init phase
2. Adding vcn_v4_0_3_late_init() to properly check for per-queue reset support
3. Only setting soft full reset mask as fallback when per-queue reset isn't supported
4. Removing TODO comment now that queue reset support is implemented

V2: Removed unrelated changes. Keep amdgpu_get_soft_full_reset_mask in place
and remove TODO comment. (Alex)
v3: set the flags at one place (all in late_init) (Lijo)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Handle lack of READ permissions in SVM mapping

HMM assumes that pages have READ permissions by default. Inside
svm_range_validate_and_map, we add READ permissions then add WRITE
permissions if the VMA isn't read-only. This will conflict with regions
that only have PROT_WRITE or have PROT_NONE. When that happens,
svm_range_restore_work will continue to retry, silently, giving the
impression of a hang if pr_debug isn't enabled to show the retries..

If pages don't have READ permissions, simply unmap them and continue. If
they weren't mapped in the first place, this would be a no-op. Since x86
doesn't support write-only, and PROT_NONE doesn't allow reads or writes
anyways, this will allow the svm range validation to continue without
getting stuck in a loop forever on mappings we can't use with HMM.

Signed-off-by: Kent Russell <kent.russell@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add VCN reset support for SMU v13.0.6

This commit implements VCN reset capability for SMU v13.0.6 with the following changes:

1. Added new PPSMC message ID (0x5B) for VCN reset in SMU firmware interface
2. Extended SMU capabilities to include VCN_RESET support
3. Implemented VCN reset support check:
- Added smu_v13_0_6_reset_vcn_is_supported() function
4. Updated SMU v13.0.6 PPT functions to include VCN reset operations

v2: clean up debug info (Alex)
v3: remove unsupported message and split smu v13.0.6 changes to a separate patch (Lijo)
v4: simply the function (smu_v13_0_6_reset_vcn_is_supported) (Lijo)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add VCN reset support check capability

This change introduces infrastructure to check whether VCN reset
is supported by the SMU firmware. Key changes include:

1. Added new functions to query VCN reset support:
   - amdgpu_dpm_reset_vcn_is_supported()
   - smu_reset_vcn_is_supported()
   - pptable_funcs.reset_vcn_is_supported callback

2. Implemented proper locking in the DPM layer with mutex protection

3. Maintained consistency with existing SDMA reset support checks

The new capability allows callers to check for VCN reset support
before attempting the operation, preventing unnecessary attempts
on unsupported platforms.

v2: clean up debug info(Alex)

Suggested-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Ruili Ji <ruiliji2@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix nullptr err of vm_handle_moved

If a amdgpu_bo_va is fpriv->prt_va, the bo of this one is always NULL.
So, such kind of amdgpu_bo_va should be updated separately before
amdgpu_vm_handle_moved.

Signed-off-by: Heng Zhou <Heng.Zhou@amd.com>
Reviewed-by: Kasiviswanathan, Harish <Harish.Kasiviswanathan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: set uuid for each partition in topology

Currently each kfd compute partition/node is sharing
the same uuid of AID, which doen't meet the CUDA spec
for visible device, so corresponding XCD id for each
partition in smu has been assigned to xcp, and exposed
to kfd topology.

v2: add NULL check (Lijo)

Signed-off-by: Eric Huang <jinhuieric.huang@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Use boolean context for pointer null checks

Replace "out == 0" with "!out" for pointer comparison to improve code
readability and conform to coding style.

Signed-off-by: Qianfeng Rong <rongqianfeng@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Remove redundant semicolons

Remove unnecessary semicolons.

Fixes: dda4fb85e433 ("drm/amd/display: DML changes for DCN32/321")
Signed-off-by: Liao Yuanhong <liaoyuanhong@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon: replace min/max nesting with clamp()

The clamp() macro explicitly expresses the intent of constraining
a value within bounds.Therefore, replacing min(max(a, b), c) and
max(min(a,b),c) with clamp(val, lo, hi) can improve code readability.

Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Xichao Zhao <zhao.xichao@vivo.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix task hang from failed job submission during process kill

During process kill, drm_sched_entity_flush() will kill the vm
entities. The following job submissions of this process will fail, and
the resources of these jobs have not been released, nor have the fences
been signalled, causing tasks to hang and timeout.

Fix by check entity status in amdgpu_vm_ready() and avoid submit jobs to
stopped entity.

v2: add amdgpu_vm_ready() check before amdgpu_vm_clear_freed() in
function amdgpu_cs_vm_handling().

Signed-off-by: Liu01 Tong <Tong.Liu01@amd.com>
Signed-off-by: Lin.Cao <lincao12@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix incorrect vm flags to map bo

It should use vm flags instead of pte flags
to specify bo vm attributes.

Fixes: 7946340fa389 ("drm/amdgpu: Move csa related code to separate file")
Signed-off-by: Jack Xiao <Jack.Xiao@amd.com>
Reviewed-by: Likun Gao <Likun.Gao@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix vram reservation issue

The vram block allocation flag must be cleared
before making vram reservation, otherwise reserving
addresses within the currently freed memory range
will always fail.

Fixes: c9cad937c0c5 ("drm/amdgpu: add drm buddy support to amdgpu")
Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: return -ENOTTY for unsupported IOCTLs

Some kfd ioctls may not be available depending on the kernel version the
user is running, as such we need to report -ENOTTY so userland can
determine the cause of the ioctl failure.

Signed-off-by: Geoffrey McRae <geoffrey.mcrae@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add PSP fw version check for fw reserve GFX command

The fw reserved GFX command is only supported starting from PSP fw
version 0x3a0e14 and 0x3b0e0d. Older versions do not support this command.

Add a version guard to ensure the command is only used when the running
PSP fw meets the minimum version requirement.

This ensures backward compatibility and safe operation across fw
revisions.

Fixes: a3b7f9c306e1 ("drm/amdgpu: reclaim psp fw reservation memory region")
Signed-off-by: Frank Min <Frank.Min@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add description for partition commands

Add string description for partition commands.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/radeon/r600_cs: clean up of dead code in r600_cs

GCC 16 enables -Werror=unused-but-set-variable= which results in build
error with the following message.

drivers/gpu/drm/radeon/r600_cs.c: In function ‘r600_texture_size’:
drivers/gpu/drm/radeon/r600_cs.c:1411:29: error: variable ‘level’ set but not used [-Werror=unused-but-set-variable=]
1411 | unsigned offset, i, level;
| ^~~~~
cc1: all warnings being treated as errors
make[6]: *** [scripts/Makefile.build:287: drivers/gpu/drm/radeon/r600_cs.o] Error 1

level although is set, but in never used in the function
r600_texture_size. Thus resulting in dead code and this error getting
triggered.

Fixes: 60b212f8ddcd ("drm/radeon: overhaul texture checking. (v3)")
Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Brahmajit Das <listout@listout.xyz>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: fix incorrect comment format

Comments should not have a leading plus sign.

Acked-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Cryolitia PukNgae <cryolitia@uniontech.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Promote DC to 3.2.345

This version brings along following update:
-Fix close and open lid may cause eDP remaining blank
-Fix frequently disabling/enabling OTG may cause incorrect
configuration of OTG

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: [FW Promotion] Release 0.1.22.0

Add a new command for Panel Replay.

Acked-by: Wayne Lin <wayne.lin@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Reset apply_eamless_boot_optimization when dpms_off

[WHY&HOW]
The user closed the lid while the system was powering on and opened it
again before the “apply_seamless_boot_optimization” was set to false,
resulting in the eDP remaining blank.
Reset the “apply_seamless_boot_optimization” to false when dpms off.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Danny Wang <Danny.Wang@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Wait until OTG enable state is cleared

[Why]
Customer reported an issue that OS starts and stops device multiple times
during driver installation. Frequently disabling and enabling OTG may
prevent OTG from being safely disabled and cause incorrect configuration
upon the next enablement.

[How]
Add a wait until OTG_CURRENT_MASTER_EN_STATE is cleared as a short term
solution.

Reviewed-by: Dillon Varone <dillon.varone@amd.com>
Signed-off-by: TungYu Lu <tungyu.lu@amd.com>
Signed-off-by: Tom Chung <chiahsuan.chung@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add to custom amdgpu_drm_release drm_dev_enter/exit

User queues are disabled before GEM objects are released
(protecting against user app crashes).
No races with PCI hot-unplug (because drm_dev_enter prevents cleanup
if iewdevice is being removed).

Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Save and restore switch state

During a DPC error kernel waits for the link to be active before
notifying downstream devices. On certain platforms with Broadcom switch
in synthetiic mode, switch responds with values even though the link is
not fully ready. The config space restoration done by pcie port driver
for SWUS/DS of dGPU is thus not effective as the switch is still doing
internal enumeration.

As a workaround, save state of SWUS/DS device in driver. Add additional
check to see if link is active and restore the values during DPC error
callbacks.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn: Hold pg_lock before vcn power off

Acquire vcn_pg_lock before changes to vcn power state
and release it after power off in idle work handler.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/jpeg: Hold pg_lock before jpeg poweroff

Acquire jpeg_pg_lock before changes to jpeg power state
and release it after power off from idle work handler.

Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Reviewed-by: Leo Liu <leo.liu@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Assign unique id to compute partition

Assign unique id to compute partition. This is the unique id of the
first XCD instance belonging to the partition.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add unique ids for SMUv13.0.12 SOCs

Fetch and store the unique ids for AIDs/XCDs in SMUv13.0.12 SOCs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: add missing vram lost check for LEGACY RESET

Legacy resets reset the memory controllers so VRAM contents
may be unreliable after reset.

Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/discovery: fix fw based ip discovery

We only need the fw based discovery table for sysfs. No
need to parse it. Additionally parsing some of the board
specific tables may result in incorrect data on some boards.
just load the binary and don't parse it on those boards.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4441
Fixes: 80a0e8282933 ("drm/amdgpu/discovery: optionally use fw based ip discovery")
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Add NULL check for stream before dereference in 'dm_vupdate_high_irq'

Add a NULL check for acrtc->dm_irq_params.stream before
accessing its members.

Fixes below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:623
dm_vupdate_high_irq() warn: variable dereferenced before check
'acrtc->dm_irq_params.stream' (see line 615)

614 if (vrr_active) {
615 bool replay_en = acrtc->dm_irq_params.stream->link->replay_settings.replay_feature_enabled;
^^^^^^^^^^^^^^^^^^^^^^^^^^^
616 bool psr_en = acrtc->dm_irq_params.stream->link->psr_settings.psr_feature_enabled;
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^ New dereferences

617 bool fs_active_var_en = acrtc->dm_irq_params.freesync_config.state
618 == VRR_STATE_ACTIVE_VARIABLE;
619
620 amdgpu_dm_crtc_handle_vblank(acrtc);
621
622 /* BTR processing for pre-DCE12 ASICs */
623 if (acrtc->dm_irq_params.stream &&
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^ But the existing code assumed it could be NULL. Someone is wrong.

624     adev->family < AMDGPU_FAMILY_AI) {
625 spin_lock_irqsave(&adev_to_drm(adev)->event_lock, flags);

Fixes: 6d31602a9f57 ("drm/amd/display: more liberal vmin/vmax update for freesync")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Cc: Alex Hung <alex.hung@amd.com>
Cc: Aurabindo Pillai <aurabindo.pillai@amd.com>
Cc: Roman Li <roman.li@amd.com>
Cc: ChiaHsuan Chung <chiahsuan.chung@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Ray Wu <ray.wu@amd.com>
Cc: Daniel Wheeler <daniel.wheeler@amd.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com>
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add caching to SMUv13.0.12 temp metric

Add table caching logic to temperature metrics tables in SMUv13.0.12

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add cache logic for temperature metric

Add caching logic for baseboard and gpuboard temperature metrics tables.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Remove cache logic from SMUv13.0.12

Remove caching logic of temperature metrics from SMUv13.0.12. The
caching logic needs to be moved to a higher level.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add unique ids for SMUv13.0.6 SOCs

Fetch and store the unique ids for AIDs/XCDs in SMUv13.0.6 SOCs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add helpers to set/get unique ids

Add a struct to store unique id information for each type. Add helper
to fetch the unique id.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Prevent hardware access in dpc state

Don't allow hardware access while in dpc state.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/vcn: Fix double-free of vcn dump buffer

The buffer is already freed as part of amdgpu_vcn_reg_dump_fini(). The
issue is introduced by below patch series.

Fixes: de55cbff5ce9 ("drm/amdgpu/vcn: Add regdump helper functions")
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Sathishkumar S <sathishkumar.sundararaju@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Log reset source during recovery

To get more context, add reset source to identify the source of gpu
recovery - job timeout, RAS, HWS hang etc.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Generate BP threshold exceed CPER once threshold exceeded

The bad pages threshold exceed CPER should be generated once threshold
exceeded, no matter the bad_page_threshold setted or not.

Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Enable temperature metrics caps

Enable temperature metrics caps for smu_v13_0_12

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add temperature metrics sysfs entry

Add temperature metrics sysfs entry to expose gpuboard/baseboard
temperature metrics

v2: Removed unused function, rename functions(Lijo)

v3: Remove unnecessary initialization

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Fetch and fill temperature metrics

Fetch system metrics table to fill gpuboard/baseboard temperature
metrics data for smu_v13_0_12

v2: Remove unnecessary checks, used separate metrics time for
temperature metrics table(Lijo)

v3: Use cached values for back to back system metrics query(Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Update pmfw header for smu_v13_0_12

Update pmfw header for smu_v13_0_12 with system temperature metrics
table

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add smu interface for temp metrics

Add smu interface to get baseboard/gpuboard temperature metrics

v2: Rename is_support to is_supported(Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add dpm interface for temp metrics

Add dpm interface to get gpuboard/baseboard temperature metrics

v2: Add temperature metrics support check(Lijo)

v3: Return error code in case of operation not supported(Lijo)

Signed-off-by: Asad Kamal <asad.kamal@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Fix vupdate_offload_work doc

Fix the following warning in struct documentation:

drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h:168: warning: expecting prototype for struct dm_vupdate_work. Prototype was for struct vupdate_offload_work instead

Fixes: c210b757b400 ("drm/amd/display: fix dmub access race condition")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: return migration pages from copy function

dst MIGRATE_PFN_VALID bit and src MIGRATE_PFN_MIGRATE bit
should always be set when migration success. cpage includes
src MIGRATE_PFN_MIGRATE bit set and MIGRATE_PFN_VALID bit
unset pages for both ram and vram when memory is only allocated
without being populated before migration, those ram pages should
be counted as migrate pages and those vram pages should not be
counted as migrate pages. Here migration pages refer to how many
vram pages involved.

-v2 use dst to check MIGRATE_PFN_VALID bit (suggested-by Philip)
-v3 add warning when vram pages is less than migration pages
return migration pages directly from copy function
-v4 correct comments and copy function return mpage (suggested-by Felix)

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Felix Kuehling <felix.kuehling@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: remove unused code

upages is assigned under cpages = 0, so it isn't really used in this function.

Signed-off-by: James Zhu <James.Zhu@amd.com>
Reviewed-by: Philip.Yang<Philip.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Add priority messages for SMU v13.0.6

Certain messages will processed with high priority by PMFW even if it
hasn't responded to a previous message. Send the priority message
regardless of the success/fail status of the previous message. Add
support on SMUv13.0.6 and SMUv13.0.12

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Set dpc status appropriately

Set the dpc status based on hardware state. Also, clear the status before
reinitialization after a successful reset.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdkfd: Destroy KFD debugfs after destroy KFD wq

Since KFD proc content was moved to kernel debugfs, we can't destroy KFD
debugfs before kfd_process_destroy_wq. Move kfd_process_destroy_wq prior
to kfd_debugfs_fini to fix a kernel NULL pointer problem. It happens
when /sys/kernel/debug/kfd was already destroyed in kfd_debugfs_fini but
kfd_process_destroy_wq calls kfd_debugfs_remove_process. This line
debugfs_remove_recursive(entry->proc_dentry);
tries to remove /sys/kernel/debug/kfd/proc/<pid> while
/sys/kernel/debug/kfd is already gone. It hangs the kernel by kernel
NULL pointer.

Signed-off-by: Amber Lin <Amber.Lin@amd.com>
Reviewed-by: Eric Huang <jinhuieric.huang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Wait for bootloader after PSPv11 reset

Some PSPv11 SOCs take a longer time for PSP based mode-1 reset. Instead
of checking for C2PMSG_33 status, add the callback wait_for_bootloader.
Wait for bootloader to be back to steady state is already part of the
generic mode-1 reset flow. Increase the retry count for bootloader wait
and also fix the mask to prevent fake pass.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx9.4.3: remove redundant repeated nested 0 check

The repeated checks on grbm_soft_reset are unnecessary. Remove them.

Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx9: remove redundant repeated nested 0 check

The repeated checks on grbm_soft_reset are unnecessary. Remove them.

Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu/gfx10: remove redundant repeated nested 0 check

The repeated checks on grbm_soft_reset are unnecessary. Remove them.

Signed-off-by: Ethan Carter Edwards <ethan@ethancedwards.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

amdgpu/amdgpu_discovery: increase timeout limit for IFWI init

With a timeout of only 1 second, my rx 5700XT fails to initialize,
so this increases the timeout to 2s.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3697
Signed-off-by: Xaver Hugl <xaver.hugl@kde.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

Documentation: Remove VCE support from OLAND's features

OLAND doesn't support VCE at all, but it does support UVD (3 or 4,
depending of the sources).

Signed-off-by: Alexandre Demers <alexandre.f.demers@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Make static table support conditional

Add PMFW version check for static table support on SMU v13.0.6 VFs.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Yang Wang <kevinyang.wang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix vcn v4.0.3 poison irq call trace on sriov guest

Sriov guest side doesn't init ras feature hence the poison irq shouldn't
be put during hw fini.

[25209.468816] Call Trace:
[25209.468817]  <TASK>
[25209.468818]  ? srso_alias_return_thunk+0x5/0x7f
[25209.468820]  ? show_trace_log_lvl+0x28e/0x2ea
[25209.468822]  ? show_trace_log_lvl+0x28e/0x2ea
[25209.468825]  ? vcn_v4_0_3_hw_fini+0xaf/0xe0 [amdgpu]
[25209.468936]  ? show_regs.part.0+0x23/0x29
[25209.468939]  ? show_regs.cold+0x8/0xd
[25209.468940]  ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.469038]  ? __warn+0x8c/0x100
[25209.469040]  ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.469135]  ? report_bug+0xa4/0xd0
[25209.469138]  ? handle_bug+0x39/0x90
[25209.469140]  ? exc_invalid_op+0x19/0x70
[25209.469142]  ? asm_exc_invalid_op+0x1b/0x20
[25209.469146]  ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.469241]  vcn_v4_0_3_hw_fini+0xaf/0xe0 [amdgpu]
[25209.469343]  amdgpu_ip_block_hw_fini+0x34/0x61 [amdgpu]
[25209.469511]  amdgpu_device_fini_hw+0x3b3/0x467 [amdgpu]

Fixes: 4c4a89149608 ("drm/amdgpu: Register aqua vanjaram vcn poison irq")
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Fix jpeg v4.0.3 poison irq call trace on sriov guest

Sriov guest side doesn't init ras feature hence the poison irq shouldn't
be put during hw fini.

[25209.467154] Call Trace:
[25209.467156]  <TASK>
[25209.467158]  ? srso_alias_return_thunk+0x5/0x7f
[25209.467162]  ? show_trace_log_lvl+0x28e/0x2ea
[25209.467166]  ? show_trace_log_lvl+0x28e/0x2ea
[25209.467171]  ? jpeg_v4_0_3_hw_fini+0x6f/0x90 [amdgpu]
[25209.467300]  ? show_regs.part.0+0x23/0x29
[25209.467303]  ? show_regs.cold+0x8/0xd
[25209.467304]  ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.467403]  ? __warn+0x8c/0x100
[25209.467407]  ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.467503]  ? report_bug+0xa4/0xd0
[25209.467508]  ? handle_bug+0x39/0x90
[25209.467511]  ? exc_invalid_op+0x19/0x70
[25209.467513]  ? asm_exc_invalid_op+0x1b/0x20
[25209.467518]  ? amdgpu_irq_put+0x9e/0xc0 [amdgpu]
[25209.467613]  ? amdgpu_irq_put+0x5f/0xc0 [amdgpu]
[25209.467709]  jpeg_v4_0_3_hw_fini+0x6f/0x90 [amdgpu]
[25209.467805]  amdgpu_ip_block_hw_fini+0x34/0x61 [amdgpu]
[25209.467971]  amdgpu_device_fini_hw+0x3b3/0x467 [amdgpu]

Fixes: 1b2231de4163 ("drm/amdgpu: Register aqua vanjaram jpeg poison irq")
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Stanley.Yang <Stanley.Yang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add wrapper function for dpc state

Use wrapper functions to set/indicate dpc status.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Ce Sun <cesun102@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Allow static metrics table query in VF

Allow statics metrics table to be queried on SMUv13.0.6 SOCs in VF mode.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Update SDMA firmware version check for user queue support

This commit fixes a firmware version check for enabling user queue
support in SDMA v7.0. The previous version check (7836028) was
incorrect and could lead to issues with PROTECTED_FENCE_SIGNAL
commands causing register conflicts between MCU_DBG0 and MCU_DBG1.

Fixes: 8c011408ed84 ("drm/amdgpu/sdma7: add ucode version checks for userq support")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Use cached metrics data on arcturus

Cached metrics data validity is 1ms on arcturus. It's not reasonable for
any client to query gpu_metrics at a faster rate and constantly
interrupt PMFW.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/pm: Use cached metrics data on aldebaran

Cached metrics data validity is 1ms on aldebaran. It's not reasonable
for any client to query gpu_metrics at a faster rate and constantly
interrupt PMFW.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Asad Kamal <asad.kamal@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amdgpu: Add NULL check for asic_funcs

If driver load fails too early, asic_funcs pointer remains unassigned.
Add NULL check to sanitize unwind path.

Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Promote DC to 3.2.344

Summary:
* Add interface to log hw state when underflow happens
* Fix hubp programming of 3dlut fast load
* Avoid Read Remote DPCD Many Times
* More liberal vmin/vmax update for freesync
* Fix dmub access race condition

Acked-by: Sun peng (Leo) Li <sunpeng.li@amd.com>
Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>

drm/amd/display: Adding interface to log hw state when underflow happens

[why]
Will help us better debug underflow issues.

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Signed-off-by: Muhammad Ahmed <Muhammad.Ahmed@amd.com>
Signed-off-by: Roman Li <roman.li@amd.com>
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>