Fixes the below with gcc W=1:
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_1_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c:935: warning: Excess function parameter 'handle' description in 'vcn_v5_0_1_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c:1972: warning: Excess function parameter 'handle' description in 'vcn_v4_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_3_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c:1583: warning: Excess function parameter 'handle' description in 'vcn_v4_0_3_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v5_0_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c:1200: warning: Excess function parameter 'handle' description in 'vcn_v5_0_0_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Function parameter or struct member 'ip_block' not described in 'vcn_v4_0_5_is_idle'
drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c:1460: warning: Excess function parameter 'handle' description in 'vcn_v4_0_5_is_idle'
Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: update SDMA sysfs reset mask in late_init
- Added `sdma_v4_4_2_update_reset_mask` function to update the reset mask.
- update the sysfs reset mask to the `late_init` stage to ensure that the SMU initialization
and capability setup are completed before checking the SDMA reset capability.
- For IP versions 9.4.3 and 9.4.4, enable per-queue reset if the MEC firmware version is at least 0xb0 and PMFW supports queue reset.
- Add a TODO comment for future support of per-queue reset for IP version 9.5.0.
This change ensures that per-queue reset is only enabled when the MEC and PMFW support it.
v2: fix ip version (9.5.4 -> 9.5.0)(Lijo)
Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jonathan Kim [Mon, 10 Feb 2025 18:15:48 +0000 (13:15 -0500)]
drm/amdgpu: simplify xgmi peer info calls
Deprecate KFD XGMI peer info calls in favour of calling directly from
simplified XGMI peer info functions.
Signed-off-by: Jonathan Kim <jonathan.kim@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sun, 16 Feb 2025 21:27:26 +0000 (16:27 -0500)]
drm/amd/display: Promote DAL to 3.2.322
- Disable PSR-SU on eDP panels
- Fix HPD after GPU reset
- Fixes on dcn4x init, DML2 state policy on DCN36
- Various minor logic fixes
Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Sun, 16 Feb 2025 20:06:06 +0000 (15:06 -0500)]
drm/amd/display: [FW Promotion] Release 0.0.255.0
Reviewed-by: Leo Li <sunpeng.li@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Roman Li [Wed, 12 Feb 2025 19:49:36 +0000 (14:49 -0500)]
drm/amd/display: Fix HPD after gpu reset
[Why]
DC is not using amdgpu_irq_get/put to manage the HPD interrupt refcounts.
So when amdgpu_irq_gpu_reset_resume_helper() reprograms all of the IRQs,
HPD gets disabled.
[How]
Use amdgpu_irq_get/put() for HPD init/fini in DM in order to sync refcounts
Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Roman Li <Roman.Li@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Mike Katsnelson [Thu, 13 Feb 2025 16:52:32 +0000 (11:52 -0500)]
drm/amd/display: stop DML2 from removing pipes based on planes
[Why]
Transitioning from low to high resolutions at high refresh rates caused grey corruption.
During the transition state, there is a period where plane size is based on low resultion
state and ODM slices are based on high resoultion state, causing the entire plane to be
contained in one ODM slice. DML2 would turn off the pipe for the ODM slice with no plane,
causing an underflow since the pixel rate for the higher resolution cannot be supported on
one pipe. This change stops DML2 from turning off pipes that are mapped to an ODM slice
with no plane. This is possible to do without negative consequences because pipes can now
take the minimum viewport and draw with zero recout size, removing the need to have the
pipe turned off.
[How]
In map_pipes_from_plane(), remove "check" that skips ODM slices that are not covered by
the plane. This prevents the pipes for those ODM slices from being freed.
Reviewed-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Mike Katsnelson <mike.katsnelson@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Increase halt timeout for DMCUB to 1s
[Why]
If we soft reset before halt finishes and there are outstanding
memory transactions then the memory interface may produce unexpected
results, such as out of order transactions when the firmware next runs.
These can manifest as random or unexpected load/store violations.
[How]
Increase the timeout before soft reset to ensure the DMCUB has quiesced.
Reviewed-by: Dillon Varone <dillon.varone@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
[WHY]
If max_downscale_src_width check fails, we exit early from TAP calculation and left a NULL
value to the scaling data structure to cause the zero divide in the DML validation.
[HOW]
Call set default TAP calculation before early exit in get_optimal_number_of_taps due to
max downscale limit exceed.
Reviewed-by: Samson Tam <samson.tam@amd.com> Signed-off-by: Yihan Zhu <Yihan.Zhu@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michael Strauss [Fri, 24 Jan 2025 20:02:27 +0000 (15:02 -0500)]
drm/amd/display: Update FIXED_VS Link Rate Toggle Workaround Usage
[WHY]
Previously the 128b/132b LTTPR support DPCD field was used to decide if
FIXED_VS training sequence required a rate toggle before initiating LT.
When running DP2.1 4.9.x.x compliance tests, emulated LTTPRs can report
no-128b/132b support which is then forwarded by the FIXED_VS retimer.
As a result this test exposes the rate toggle again, erroneously causing
failures as certain compliance sinks don't expect this behaviour.
[HOW]
Add new DPCD register defines/reads to read LTTPR IEEE OUI and device ID.
Decide whether to perform the rate toggle based on the LTTPR's IEEE OUI
which guarantees that we only perform the toggle on affected retimers.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Michael Strauss <michael.strauss@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Charlene Liu [Thu, 13 Feb 2025 17:37:10 +0000 (12:37 -0500)]
drm/amd/display: fix dcn4x init failed
[why]
failed due to cmdtable not created.
switch atombios cmdtable as default.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Aurabindo Pillai [Mon, 20 Jan 2025 20:27:23 +0000 (15:27 -0500)]
drm/amd/display: Temporarily disable hostvm on DCN31
With HostVM enabled, DCN31 fails to pass validation for 3x4k60. Some Linux
userspace does not downgrade one of the monitors to 4k30, and the result
is that the monitor does not light up. Disable it until the bandwidth
calculation failure is resolved.
Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rafal Ostrowski [Wed, 12 Feb 2025 07:08:07 +0000 (08:08 +0100)]
drm/amd/display: ACPI Re-timer Programming
[Why]
We must implement an ACPI re-timer programming interface and notify
ACPI driver whenever a PHY transition is about to take place.
Because some trace lengths on certain platforms are very long,
then a re-timer may need to be programmed whenever a PHY transition
takes place. The implementation of this re-timer programming interface
will notify ACPI driver that PHY transition is taking place and it
will trigger the re-timer as needed.
First we need to gather retimer information from ACPI interface.
Then, in the PRE case, the re-timer interface needs to be called before we call
transmitter ENABLE.
In the POST case, it has to be called after we call transmitter DISABLE.
Yilin Chen [Fri, 7 Feb 2025 20:26:19 +0000 (15:26 -0500)]
drm/amd/display: add a quirk to enable eDP0 on DP1
[why]
some board designs have eDP0 connected to DP1, need a way to enable
support_edp0_on_dp1 flag, otherwise edp related features cannot work
[how]
do a dmi check during dm initialization to identify systems that
require support_edp0_on_dp1. Optimize quirk table with callback
functions to set quirk entries, retrieve_dmi_info can set quirks
according to quirk entries
Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Yilin Chen <Yilin.Chen@amd.com> Signed-off-by: Zaeem Mohamed <zaeem.mohamed@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Tue, 7 Jan 2025 19:17:15 +0000 (14:17 -0500)]
drm/amd/display: Fix unit test failure
[Why]
Some of unit tests use large scaling ratio such that when we
calculate optimal number of taps, max_taps is negative.
Then in recent change, we changed max_taps to uint instead
of int so now max_taps wraps and is positive. This change
changed the behaviour from returning back false to return
true and breaks unit test check
[How]
Add check to prevent max_taps from wrapping and set to 0
instead
Signed-off-by: Samson Tam <Samson.Tam@amd.com> Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Tue, 21 Jan 2025 16:01:47 +0000 (11:01 -0500)]
drm/amd/display: fix check for identity ratio
[Why]
IDENTITY_RATIO check uses 2 bits for integer, which only allows
checking downscale ratios up to 3. But we support up to 6x
downscale
[How]
Update IDENTITY_RATIO to check 3 bits for integer
Add ASSERT to catch if we downscale more than 6x
Signed-off-by: Samson Tam <Samson.Tam@amd.com> Reviewed-by: Jun Lei <jun.lei@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Assadian, Navid [Thu, 19 Dec 2024 22:19:09 +0000 (17:19 -0500)]
drm/amd/display: Fix mismatch type comparison
The mismatch type comparison/assignment may cause data loss. Since the
values are always non-negative, it is safe to use unsigned variables to
resolve the mismatch.
Signed-off-by: Navid Assadian <navid.assadian@amd.com> Reviewed-by: Joshua Aberback <joshua.aberback@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Samson Tam [Tue, 7 Jan 2025 19:16:04 +0000 (14:16 -0500)]
drm/amd/display: Fix mismatch type comparison in custom_float
[Why & How]
Passing uint into uchar function param. Pass uint instead
Signed-off-by: Samson Tam <Samson.Tam@amd.com> Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom Chung [Thu, 6 Feb 2025 03:31:23 +0000 (11:31 +0800)]
drm/amd/display: Disable PSR-SU on eDP panels
[Why]
PSR-SU may cause some glitching randomly on several panels.
[How]
Temporarily disable the PSR-SU and fallback to PSR1 for
all eDP panels.
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/3388 Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
We planning to disable the PSR-SU and fallback to PSR1 for
all eDP panels not only for specific eDP panel temporarily.
Reviewed-by: Sun peng Li <sunpeng.li@amd.com> Signed-off-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Melissa Wen [Sat, 15 Feb 2025 21:15:47 +0000 (18:15 -0300)]
drm/amd/display: restore edid reading from a given i2c adapter
When switching to drm_edid, we slightly changed how to get edid by
removing the possibility of getting them from dc_link when in aux
transaction mode. As MST doesn't initialize the connector with
`drm_connector_init_with_ddc()`, restore the original behavior to avoid
functional changes.
v2:
- Fix build warning of unchecked dereference (kernel test bot)
CC: Alex Hung <alex.hung@amd.com> CC: Mario Limonciello <mario.limonciello@amd.com> CC: Roman Li <Roman.Li@amd.com> CC: Aurabindo Pillai <Aurabindo.Pillai@amd.com> Fixes: 48edb2a4256e ("drm/amd/display: switch amdgpu_dm_connector to use struct drm_edid") Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Melissa Wen <mwen@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The nbif_v6_3_1_sriov_funcs instance of amdgpu_nbio_funcs was added in
commit 894c6d3522d1 ("drm/amdgpu: Add nbif v6_3_1 ip block support")
but has remained unused.
Alex has confirmed it wasn't needed.
Remove it, together with the four unused stub functions:
nbif_v6_3_1_sriov_ih_doorbell_range
nbif_v6_3_1_sriov_gc_doorbell_init
nbif_v6_3_1_sriov_vcn_doorbell_range
nbif_v6_3_1_sriov_sdma_doorbell_range
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Tue, 18 Feb 2025 18:06:48 +0000 (23:36 +0530)]
drm/amdgpu: Add ring reset callback for JPEG5_0_1
Add ring reset function callback for JPEG5_0_1 to
recover from job timeouts without a full gpu reset.
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
André Almeida [Thu, 20 Feb 2025 16:27:49 +0000 (13:27 -0300)]
drm/amdgpu: Log after a successful ring reset
When a ring reset happens, the kernel log shows only "amdgpu: Starting
<ring name> ring reset", but when it finishes nothing appears in the
log. Explicitly write in the log that the reset has finished correctly.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
André Almeida [Thu, 20 Feb 2025 16:27:48 +0000 (13:27 -0300)]
drm/amdgpu: Log the creation of a coredump file
After a GPU reset happens, the driver creates a coredump file. However,
the user might not be aware of it. Log the file creation the user can
find more information about the device and add the file to bug reports.
This is similar to what the xe driver does.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: André Almeida <andrealmeid@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Tue, 18 Feb 2025 18:05:42 +0000 (23:35 +0530)]
drm/amdgpu: Add core reset registers for JPEG5_0_1
Add core reset control register definitions and align
all prior register definitions to end at 100 column
length for uniformity.
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Tue, 18 Feb 2025 17:56:37 +0000 (23:26 +0530)]
drm/amdgpu: Per-instance init func for JPEG5_0_1
Add helper functions to handle per-instance and per-core
initialization and deinitialization in JPEG5_0_1.
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Mon, 17 Feb 2025 15:55:05 +0000 (10:55 -0500)]
drm/amdgpu: disable BAR resize on Dell G5 SE
There was a quirk added to add a workaround for a Sapphire
RX 5600 XT Pulse that didn't allow BAR resizing. However,
the quirk caused a regression with runtime pm on Dell laptops
using those chips, rather than narrowing the scope of the
resizing quirk, add a quirk to prevent amdgpu from resizing
the BAR on those Dell platforms unless runtime pm is disabled.
v2: update commit message, add runpm check
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/1707 Fixes: 907830b0fc9e ("PCI: Add a REBAR size quirk for Sapphire RX 5600 XT Pulse") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Asad Kamal [Mon, 10 Feb 2025 07:56:51 +0000 (15:56 +0800)]
drm/amd/pm: Update pmfw headers for smu_v13_0_12
Update pmfw headers for smu_v13_0_12 new messages & metrics table.
Static metrics table for frequency added, Separate metrics table
for smu_v13_0_12 added.
Signed-off-by: Asad Kamal <asad.kamal@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Update amdgpu_job_timedout to check if the ring is guilty
This patch updates the `amdgpu_job_timedout` function to check if
the ring is actually guilty of causing the timeout. If not, it
skips error handling and fence completion.
v2: move the is_guilty check down into the queue reset area (Alex)
v3: need to call is_guilty before reset (Alex)
v4: squash in is_guilty logic fixes (Alex)
Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/pm: add support for checking SDMA reset capability
This patch introduces a new function to check if the SMU supports resetting the SDMA engine.
This capability check ensures that the driver does not attempt to reset the SDMA engine
on hardware that does not support it.
The following changes are included:
- New function `amdgpu_dpm_reset_sdma_is_supported` to check SDMA reset
support at the AMDGPU driver level.
- New function `smu_reset_sdma_is_supported` to check SDMA reset support
at the SMU level.
- Implementation of `smu_v13_0_6_reset_sdma_is_supported` for the specific
SMU version v13.0.6.
- Updated `smu_v13_0_6_reset_sdma` to use the new capability check before
attempting to reset the SDMA engine.
v2: change smu_reset_sdma_is_supported type to bool (Tim)
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Tim Huang <tim.huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Add reset function pointer for SDMA v4.4.2 page ring
This patch adds a reset function pointer to the SDMA v4.4.2 page ring
functionality. The new function pointer `reset` is set to
`sdma_v4_4_2_reset_queue`, which is responsible for resetting the SDMA queue.
Changes:
- Add `reset` function pointer to `sdma_v4_4_2_page_ring_funcs`.
Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Improve SDMA reset logic with guilty queue tracking
This patch includes the remaining improvements to the SDMA reset logic:
- Added `gfx_guilty` and `page_guilty` flags to track guilty queues.
- Updated the reset and resume functions to handle the guilty state.
- Cached the `rptr` before reset.
v2:
1.replace the caller with a guilty bool.
If the queue is the guilty one, set the rptr and wptr to the saved wptr value,
else, set the rptr and wptr to the saved rptr value. (Alex)
2. cache the rptr before the reset. (Alex)
v3: Keeping intermediate variables like u64 rwptr simplifies resotre rptr/wptr.(Lijo)
Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Introduce cached_rptr and is_guilty callback in amdgpu_ring
This patch introduces the following changes:
- Add `cached_rptr` to the `amdgpu_ring` structure to store the read pointer before a reset.
- Add `is_guilty` callback to the `amdgpu_ring_funcs` structure to check if a ring is guilty of causing a timeout.
Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Introduce conditional user queue suspension for SDMA resets
- Modify the `amdgpu_sdma_reset_engine` function to accept a `suspend_user_queues` parameter.
- This parameter allows the function to conditionally suspend and resume user queues during SDMA resets.
- Ensure that user queues are suspended only when necessary to avoid unnecessary overhead and potential deadlocks.
- Restart the scheduler's work queue for the GFX and page rings after the reset to allow new tasks to be submitted.
This change improves synchronization between the KGD and the KFD during SDMA resets,
ensuring proper handling of user queues and avoiding race conditions.
V2: replace the ring_lock with the existed the scheduler
locks for the queues (ring->sched) on the sdma engine.(Alex)
v3: call drm_sched_wqueue_stop() rather than job_list_lock.
If a GPU ring reset was already initiated for one ring at amdgpu_job_timedout,
skip resetting that ring and call drm_sched_wqueue_stop()
for the other rings (Alex)
replace the common lock (sdma_reset_lock) with DQM lock to
to resolve reset races between the two driver sections during KFD eviction.(Jon)
Rename the caller to Reset_src and
Change AMDGPU_RESET_SRC_SDMA_KGD/KFD to AMDGPU_RESET_SRC_SDMA_HWS/RING (Jon)
v4: restart the wqueue if the reset was successful,
or fall back to a full adapter reset. (Alex)
move definition of reset source to enumeration AMDGPU_RESET_SRCS, and
check reset src in amdgpu_sdma_reset_instance (Jon)
v5: Call amdgpu_amdkfd_suspend/resume at the start/end of reset function respectively under !SRC_HWS
conditions only (Jon)
v6: replace the paramter src with a bool suspend_user_queues,
remove the paramter src in pre/post func. (Jon)
Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Suggested-by: Jonathan Kim <Jonathan.Kim@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Acked-by: Jonathan Kim <jonathan.kim@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Wed, 19 Feb 2025 04:29:03 +0000 (09:59 +0530)]
drm/amdgpu: Do not poweroff UVDJ in JPEG4_0_3
Update power gate setting to not poweroff UVDJ in JPEG4_0_3.
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Rodrigo Siqueira [Wed, 19 Feb 2025 18:53:45 +0000 (11:53 -0700)]
Documentation/gpu: Add acronyms for some firmware components
Users can check the file "/sys/kernel/debug/dri/0/amdgpu_firmware_info"
to get information on the firmware loaded in the system. This file has
multiple acronyms that are not documented in the glossary. This commit
introduces some missing acronyms to the AMD glossary documentation. The
meaning of each acronym in this commit was extracted from code
documentation available in the following files:
drm/amdgpu/sdma: Refactor SDMA reset functionality and add callback support
This patch refactors the SDMA reset functionality in the `sdma_v4_4_2` driver
to improve modularity and support shared usage between AMDGPU and KFD. The
changes include:
1. **Refactored SDMA Reset Logic**:
- Split the `sdma_v4_4_2_reset_queue` function into two separate functions:
- `sdma_v4_4_2_stop_queue`: Stops the SDMA queue before reset.
- `sdma_v4_4_2_restore_queue`: Restores the SDMA queue after reset.
- These functions are now used as callbacks for the shared reset mechanism.
2. **Added Callback Support**:
- Introduced a new structure `sdma_v4_4_2_reset_funcs` to hold the stop and
restore callbacks.
- Added `sdma_v4_4_2_set_reset_funcs` to register these callbacks with the
shared reset mechanism using `amdgpu_set_on_reset_callbacks`.
3. **Fixed Reset Queue Function**:
- Modified `sdma_v4_4_2_reset_queue` to use the shared `amdgpu_sdma_reset_queue`
function, ensuring consistency across the driver.
This patch ensures that SDMA reset functionality is more modular, reusable, and
aligned with the shared reset mechanism between AMDGPU and KFD.
v2: Renamed sdma_v4_4_2_set_reset_funcs to sdma_v4_4_2_set_engine_reset_funcs.
Renamed sdma_v4_4_2_reset_funcs to sdma_v4_4_2_engine_reset_funcs.(Alex)
Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu/kfd: Add shared SDMA reset functionality with callback support
This patch introduces shared SDMA reset functionality between AMDGPU and KFD.
The implementation includes the following key changes:
1. Added `amdgpu_sdma_reset_queue`:
- Resets a specific SDMA queue by instance ID.
- Invokes registered pre-reset and post-reset callbacks to allow KFD and AMDGPU
to save/restore their state during the reset process.
2. Added `amdgpu_set_on_reset_callbacks`:
- Allows KFD and AMDGPU to register callback functions for pre-reset and
post-reset operations.
- Callbacks are stored in a global linked list and invoked in the correct order
during SDMA reset.
This patch ensures that both AMDGPU and KFD can handle SDMA reset events
gracefully, with proper state saving and restoration. It also provides a flexible
callback mechanism for future extensions.
v2: fix CamelCase and put the SDMA helper into amdgpu_sdma.c (Alex)
v3: rename the `amdgpu_register_on_reset_callbacks` function to
`amdgpu_sdma_register_on_reset_callbacks`
move global reset_callback_list to struct amdgpu_sdma (Alex)
v4: Update the reset callback function description and
rename the reset function to amdgpu_sdma_reset_engine (Alex)
Suggested-by: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Signed-off-by: Jesse Zhang <jesse.zhang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
David Yat Sin [Wed, 19 Feb 2025 22:34:38 +0000 (17:34 -0500)]
drm/amdkfd: Preserve cp_hqd_pq_control on update_mqd
When userspace applications call AMDKFD_IOC_UPDATE_QUEUE. Preserve
bitfields that do not need to be modified as they contain flags to
track queue states that are used by CP FW.
Signed-off-by: David Yat Sin <David.YatSin@amd.com> Reviewed-by: Jay Cornwall <jay.cornwall@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2524 Fixes: 3712e7a49459 ("drm/amd/pm: unified lock protections in amdgpu_dpm.c") Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Tested-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> # on Oland PRO Signed-off-by: chr[] <chris@rudorff.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Wed, 29 Jan 2025 15:28:49 +0000 (16:28 +0100)]
drm/amdgpu: remove all KFD fences from the BO on release
Remove all KFD BOs from the private dma_resv object.
This prevents the KFD from being evict unecessarily when an exported BO
is released.
Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: James Zhu <James.Zhu@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Reviewed-and-tested-by: James Zhu <James.Zhu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Add clear DCC and Tiling callback for DCE
Introduce the DCC and Tiling reset callback to all DCE versions that can
call it.
Reviewed-by: Alvin Lee <alvin.lee2@amd.com> Signed-off-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdkfd: Fix error handling for missing PASID in 'kfd_process_device_init_vm'
In the kfd_process_device_init_vm function, a valid error code is now
returned when the associated Process Address Space ID (PASID) is not
present.
If the address space virtual memory (avm) does not have an associated
PASID, the function sets the ret variable to -EINVAL before proceeding
to the error handling section. This ensures that the calling function,
such as kfd_ioctl_acquire_vm, can appropriately handle the error,
thereby preventing any issues during virtual memory initialization.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c:1694 kfd_process_device_init_vm()
warn: missing error code 'ret'
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_process.c
1647 int kfd_process_device_init_vm(struct kfd_process_device *pdd,
1648 struct file *drm_file)
1649 {
...
1690
1691 if (unlikely(!avm->pasid)) {
1692 dev_warn(pdd->dev->adev->dev, "WARN: vm %p has no pasid associated",
1693 avm);
--> 1694 goto err_get_pasid;
ret = -EINVAL?
1695 }
Fixes: 8544374c0f82 ("drm/amdkfd: Have kfd driver use same PASID values from graphic driver")
Reported by: Dan Carpenter <dan.carpenter@linaro.org> Cc: Xiaogang Chen <xiaogang.chen@amd.com> Cc: Felix Kuehling <felix.kuehling@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Wed, 29 Jan 2025 13:31:25 +0000 (19:01 +0530)]
drm/amdgpu: Add ring reset callback for JPEG4_0_3
Add ring reset function callback for JPEG4_0_3 to
recover from job timeouts without a full gpu reset.
V2:
- sched->ready flag shouldn't be modified by HW backend (Christian)
V3:
- Dont modifying sched/job-submission state from HW backend (Christian)
- Implement per-core reset sequence
V4:
- Dont create reset_mask sysfs and return -EOPNOTSUPP on VFs (Lijo)
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Sathishkumar S [Wed, 12 Feb 2025 04:51:40 +0000 (10:21 +0530)]
drm/amdgpu: Add JPEG4_0_3 core reset control reg
Add core reset control registers for JPEG4_0_3
Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV
RLCG Register Access is a way for virtual functions to safely access GPU
registers in a virtualized environment., including TLB flushes and
register reads. When multiple threads or VFs try to access the same
registers simultaneously, it can lead to race conditions. By using the
RLCG interface, the driver can serialize access to the registers. This
means that only one thread can access the registers at a time,
preventing conflicts and ensuring that operations are performed
correctly. Additionally, when a low-priority task holds a mutex that a
high-priority task needs, ie., If a thread holding a spinlock tries to
acquire a mutex, it can lead to priority inversion. register access in
amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.
The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
called, which attempts to acquire the mutex. This function is invoked
from amdgpu_sriov_wreg, which in turn is called from
gmc_v11_0_flush_gpu_tlb.
The [ BUG: Invalid wait context ] indicates that a thread is trying to
acquire a mutex while it is in a context that does not allow it to sleep
(like holding a spinlock).
v2: s/spin_trylock/spin_lock_irqsave to be safe (Christian).
Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface") Cc: lin cao <lin.cao@amd.com> Cc: Jingwen Chen <Jingwen.Chen2@amd.com> Cc: Victor Skvortsov <victor.skvortsov@amd.com> Cc: Zhigang Luo <zhigang.luo@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Taimur Hassan [Mon, 10 Feb 2025 00:10:14 +0000 (19:10 -0500)]
drm/amd/display: 3.2.321
Summary:
* Add support for disconnected eDP streams
* Add log for MALL entry on DCN32x
* Add DCC/Tiling reset helper for DCN and DCE
* Guard against setting dispclk low when active
* Other minor fixes
Reviewed-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Add support for disconnected eDP streams
[Why]
eDP may not be connected to the GPU on driver start causing
fail enumeration.
[How]
Move the virtual signal type check before the eDP connector
signal check.
Reviewed-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Harry VanZyllDeJong <hvanzyll@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Peichen Huang [Tue, 4 Feb 2025 08:00:40 +0000 (16:00 +0800)]
drm/amd/display: dpia should avoid encoder used by dp2
[WHY]
In current HPO DP2 implementation, driver would enable/disable DIG
encoder when configuring HPO DP2. Therefore, usb4 dp tunnelling should
not use the DIG encoder if the corresponded phy is used by a HPO DP2
stream.
[HOW]
A DP2 stream is treated as a dig stream.
Reviewed-by: Meenakshikumar Somasundaram <meenakshikumar.somasundaram@amd.com> Signed-off-by: Peichen Huang <PeiChen.Huang@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>