Ville Syrjälä [Fri, 29 Nov 2024 06:50:11 +0000 (08:50 +0200)]
drm/i915/fb: Relax clear color alignment to 64 bytes
Mesa changed its clear color alignment from 4k to 64 bytes
without informing the kernel side about the change. This
is now likely to cause framebuffer creation to fail.
The only thing we do with the clear color buffer in i915 is:
1. map a single page
2. read out bytes 16-23 from said page
3. unmap the page
So the only requirement we really have is that those 8 bytes
are all contained within one page. Thus we can deal with the
Mesa regression by reducing the alignment requiment from 4k
to the same 64 bytes in the kernel. We could even go as low as
32 bytes, but IIRC 64 bytes is the hardware requirement on
the 3D engine side so matching that seems sensible.
Note that the Mesa alignment chages were partially undone
so the regression itself was already fixed on userspace
side.
Rodrigo Vivi [Fri, 10 Jan 2025 14:46:39 +0000 (09:46 -0500)]
drm/i915/guc/slpc: Allow GuC SLPC default strategies on MTL+
The Balancer and DCC strategies were left off on a fear that
these strategies would conflict with the i915's waitboost.
However, on MTL and Beyond these strategies are only active in
certain conditions where the system is TDP limited.
So, they don't conflict, but help the
waitboost by guaranteeing a bit more of GT frequency.
Without these strategies we were likely leaving some performance
behind on some scenarios.
With this change in place, the enabling/disabling of DCC and Balancer
will now be chosen by GuC, on a platform/GT basis.
v2: - Fix typos and be clear on GuC decision on platform basis (Vinay)
- Limit change to MTL and beyond, where GuC started to take
TDP limit into consideration.
v3: Fix compilation. Actually amend the changes...
The last use of intel_gvt_in_force_nonpriv_whitelist() was
removed in 2020 by
commit 02dd2b12a685 ("drm/i915/gvt: unify lri cmd handler and mmio
handlers")
intel_gvt_ggtt_h2g_index() and intel_gvt_ggtt_index_g2h() were
added in 2016 by
commit 2707e4446688 ("drm/i915/gvt: vGPU graphics memory virtualization")
but haven't been used.
Remove them.
They were the only users of intel_gvt_ggtt_gmadr_g2h() and
intel_gvt_ggtt_gmadr_h2g().
drm/i915/display: Update DBUF_TRACKER_STATE_SERVICE only on appropriate platforms
The bspec only asks the driver to reprogram the DBUF_CTL's
DBUF_TRACKER_STATE_SERVICE field to 0x8 on DG2 and platforms with
display version 12. All other platforms should avoid reprogramming
this register at driver init.
Although we've been accidentally reprogramming DBUF_CTL on platforms
where the spec does not ask us to, that mistake has been harmless so
far because the value being programmed by the driver happened to
match the hardware's default settings.
So, update DBUF_TRACKER_STATE_SERVICE field to 0x8 only for
1. display version 12
2. DG2.
Other platforms unless stated should use their default value.
Bspec: 49213 Signed-off-by: Ravi Kumar Vodapalli <ravi.kumar.vodapalli@intel.com> Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20250108200210.1815229-1-ravi.kumar.vodapalli@intel.com
[mattrope: Tweaked patch subject to accurately reflect content] Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
Gustavo Sousa [Thu, 19 Dec 2024 22:14:16 +0000 (19:14 -0300)]
drm/i915/dmc_wl: Allow enable_dmc_wl=3 to mean "always locked"
When debugging issues that might be related to the DMC wakelock code, it
might be useful to compare runs with the lock acquired while DC states
are enabled vs the regular case. If issues disappear with the former, it
might be a symptom of something wrong in our code. Support having this
"always locked" behavior with enable_dmc_wl=3.
Gustavo Sousa [Thu, 19 Dec 2024 22:14:15 +0000 (19:14 -0300)]
drm/i915/dmc_wl: Allow enable_dmc_wl=2 to mean "match any register"
When debugging issues that might be related to the DMC wakelock code, it
is sometimes useful to compare runs when we match any register offset vs
the regular case. If issues disappear when we take the wakelock for any
register, it might indicate that we are missing some offset to be
tracked. Support matching any register offset with enable_dmc_wl=2.
Gustavo Sousa [Thu, 19 Dec 2024 22:14:14 +0000 (19:14 -0300)]
drm/i915/dmc_wl: Show description string for enable_dmc_wl
We already provide the value resulting from sanitization of
enable_dmc_wl in dmesg, however the reader will need to either have the
meanings memorized or look them up in the parameter's documentation.
Let's make things easier by providing a short human-readable name for
the parameter in dmesg.
Gustavo Sousa [Thu, 19 Dec 2024 22:14:13 +0000 (19:14 -0300)]
drm/i915/dmc_wl: Use enum values for enable_dmc_wl
Currently, after sanitization, enable_dmc_wl will behave like a boolean
parameter (enabled vs disabled). However, in upcoming changes, we will
allow more values for debugging purposes. For that, let's make the
sanitized value an enumeration.
Jani Nikula [Tue, 31 Dec 2024 16:27:38 +0000 (18:27 +0200)]
drm/i915/pmdemand: make struct intel_pmdemand_state opaque
Only intel_pmdemand.c should look inside the struct
intel_pmdemand_state. Indeed, this is already the case. Finish the job
and make struct intel_pmdemand_state opaque, preventing any direct pokes
at the guts of it in the future.
Jani Nikula [Fri, 3 Jan 2025 13:52:39 +0000 (15:52 +0200)]
drm/i915/dp: compute config for 128b/132b SST w/o DSC
Enable basic 128b/132b SST functionality without compression. Reuse
intel_dp_mtp_tu_compute_config() to figure out the TU after we've
determined we need to use an UHBR rate.
It's slightly complicated as the M/N computation is done in different
places in MST and SST paths, so we need to avoid trashing the values
later for UHBR.
If uncompressed UHBR fails, we drop to compressed non-UHBR, which is
quite likely to fail as well. We still lack 128b/132b SST+DSC.
We need mst_master_transcoder also for 128b/132b SST. Use cpu_transcoder
directly. Enhanced framing is "don't care" for 128b/132b link.
Jani Nikula [Fri, 3 Jan 2025 13:52:36 +0000 (15:52 +0200)]
drm/i915/ddi: start distinguishing 128b/132b SST and MST at state readout
We'll want to distinguish 128b/132b SST and MST modes at state
readout. There's a catch, though. From the hardware perspective,
128b/132b SST and MST programming are pretty much the same. And we can't
really ask the sink at this point.
If we have more than one transcoder in 128b/132b mode associated with
the port, we can safely assume it's MST. But for MST with only a single
stream enabled, we are pretty much out of luck. Let's fall back to
looking at the software state, i.e. intel_dp->is_mst. It should be fine
for the state checker, but for hardware takeover at probe, we'll have to
trust the GOP has only enabled SST.
TODO: Not sure how this *or* our current code handles 128b/132b enabled
by GOP.
Jani Nikula [Fri, 3 Jan 2025 13:52:30 +0000 (15:52 +0200)]
drm/i915/mst: adapt intel_dp_mtp_tu_compute_config() for 128b/132b SST
Handle 128b/132b SST in intel_dp_mtp_tu_compute_config(). The remote
bandwidth overhead and time slot allocation are only relevant for MST;
SST only needs the local bandwidth and a check that 64 slots isn't
exceeded.
Jani Nikula [Fri, 3 Jan 2025 13:52:29 +0000 (15:52 +0200)]
drm/i915/mst: split out a helper for figuring out the TU
Extract intel_dp_mtp_tu_compute_config() for figuring out the TU. Move
the link configuration and mst state access to the callers. This will be
easier to adapt to 128b/132b SST.
Jani Nikula [Fri, 3 Jan 2025 13:52:27 +0000 (15:52 +0200)]
drm/i915/mst: change return value of mst_stream_find_vcpi_slots_for_bpp()
The callers of mst_stream_find_vcpi_slots_for_bpp() don't need the
returned slots for anything. On the contrary, they need to jump through
hoops to just distinguish between success and failure. Just return 0
instead of slots from mst_stream_find_vcpi_slots_for_bpp() for success,
and simplify the callers.
There's a pointless ret local variable that we can drop in the process.
Jani Nikula [Fri, 3 Jan 2025 13:52:24 +0000 (15:52 +0200)]
drm/mst: remove mgr parameter and debug logging from drm_dp_get_vc_payload_bw()
The struct drm_dp_mst_topology_mgr *mgr parameter is only used for debug
logging in case the passed in link rate or lane count are zero. There's
no further error checking as such, and the function returns 0.
There should be no case where the parameters are zero. The returned
value is generally used as a divisor, and if we were hitting this, we'd
be seeing division by zero.
Just remove the debug logging altogether, along with the mgr parameter,
so that the function can be used in non-MST contexts without the
topology manager.
v2: Also remove drm_dp_mst_helper_tests_init as unnecessary (Imre)
Animesh Manna [Mon, 6 Jan 2025 09:44:08 +0000 (15:14 +0530)]
drm/i915/display: Adjust Added Wake Time with PKG_C_LATENCY
Increase the PKG_C_LATENCY Pkg C Latency field by the added wake time.
v1: Initial version.
v2: Rebase and cosmetic changes.
v3:
- Place latency adjustment early to accommodate round-up. [Suraj]
- Modify commit description and cosmetic change. [Suraj]
Ankit Nautiyal [Fri, 3 Jan 2025 03:14:24 +0000 (08:44 +0530)]
drm/i915/dp: Return early if dsc is required but not supported
Currently, when bandwidth is insufficient for a given mode, we attempt
to use DSC. This is indicated by a debug print, followed by a check for
DSC support.
The debug message states that we are trying DSC, but DSC might not be
supported, which can give an incorrect picture in the logs if we bail
out later.
Correct the order for both DP and DP MST to:
- Check if DSC is required and supported, and return early if DSC is
not supported.
- Print a debug message to indicate that DSC will be tried next.
Suraj Kandpal [Fri, 3 Jan 2025 08:45:17 +0000 (14:15 +0530)]
Revert "drm/i915/hdcp: Don't enable HDCP1.4 directly from check_link"
This reverts commit 483f7d94a0453564ad9295288c0242136c5f36a0.
This needs to be reverted since HDCP even after updating the connector
state HDCP property we don't reenable HDCP until the next commit
in which the CP Property is set causing compliance to fail.
Jani Nikula [Mon, 30 Dec 2024 14:14:45 +0000 (16:14 +0200)]
drm/i915/ddi: only call shutdown hooks for valid encoders
DDI might be HDMI or DP only, leaving the other encoder
uninitialized. Calling the shutdown hook on an uninitialized encoder may
lead to a NULL pointer dereference. Check the encoder types (and thus
validity via the DP output_reg or HDMI hdmi_reg checks) before calling
the hooks.
Jani Nikula [Mon, 30 Dec 2024 14:14:43 +0000 (16:14 +0200)]
drm/i915/ddi: gracefully handle errors from intel_ddi_init_hdmi_connector()
Errors from intel_ddi_init_hdmi_connector() can just mean "there's no
HDMI" while we'll still want to continue with DP only. Handle the errors
gracefully, but don't propagate. Clear the hdmi_reg which is used as a
proxy to indicate the HDMI is initialized.
Jani Nikula [Mon, 30 Dec 2024 14:14:40 +0000 (16:14 +0200)]
drm/i915/ddi: change intel_ddi_init_{dp, hdmi}_connector() return type
The caller doesn't actually need the returned struct intel_connector;
it's stored in the ->attached_connector of intel_dp and
intel_hdmi. Switch to returning an int with 0 for success and negative
errors codes to be able to indicate success even when we don't have a
connector.
Ankit Nautiyal [Tue, 17 Dec 2024 09:32:43 +0000 (15:02 +0530)]
drm/i915/dp: Set the DSC link limits in intel_dp_compute_config_link_bpp_limits
The helper intel_dp_compute_config_link_bpp_limits is the correct place
to set the DSC link limits. Move the code to this function and remove
the #TODO item.
v2: Add argument intel_connector to the helper to get correct connector
for DP MST. (Imre)
v3: Remove redundant calls to intel_dp_dsc_sink_max_compressed_bpp as
its already accounted while setting link bpp limits.
Ankit Nautiyal [Tue, 17 Dec 2024 09:32:42 +0000 (15:02 +0530)]
drm/i915/dp: Make dsc helpers accept const crtc_state pointers
Modify the dsc helpers to get max/min compressed bpp to accept
`const struct intel_crtc_state *` pointers instead of
`struct intel_crtc_state *`.
These helpers are not supposed to modify `crtc_state`.
Accepting const pointers will allow these helpers to be called from
functions that have const pointer to crtc_state.
Ankit Nautiyal [Tue, 17 Dec 2024 09:32:41 +0000 (15:02 +0530)]
drm/i915/dp: Use clamp for pipe_bpp limits with DSC
Currently to get the max pipe_bpp with dsc we take the min of
limits->pipe.max_bpp and dsc max bpp (dsc max bpc * 3). This can result
in problems when limits->pipe.max_bpp is less than the computed dsc min bpp
(dsc min bpc * 3).
Replace the min/max functions with clamp while computing
limits->pipe.max/min_bpp to ensure that the pipe_bpp limits are constrained
within the DSC-defined minimum and maximum values.
Ankit Nautiyal [Tue, 17 Dec 2024 09:32:39 +0000 (15:02 +0530)]
drm/i915/dp: Refactor pipe_bpp limits with dsc
With DSC there are additional limits for pipe_bpp. Currently these are
scattered in different places.
Instead set the limits->pipe.max/min_bpp in one place and use them
wherever required.
Ankit Nautiyal [Tue, 17 Dec 2024 09:32:38 +0000 (15:02 +0530)]
drm/i915/dp: Drop max_requested_bpc for dsc pipe_min/max bpp
Currently we are including both max_requested_bpc and
limits->pipe.bpp_max while computing maximum possible pipe bpp with dsc.
However, while setting limits->pipe.max_bpp, the max_requested_bpc is
already taken into account.
Drop the redundant check for max_requested_bpc and use only
limits->pipe.bpp_max. This will also result in dropping conn_state
argument in functions where it was used only to get max_requested_bpc.
Ankit Nautiyal [Tue, 17 Dec 2024 09:32:36 +0000 (15:02 +0530)]
drm/i915/dp: Return int from dsc_max/min_src_input_bpc helpers
Use ints for dsc_max/min_bpc instead of u8 in
dsc_max/min_src_input_bpc helpers and their callers.
This will also help replace min_t/max_t macros with min/max ones.
Ankit Nautiyal [Tue, 17 Dec 2024 09:32:31 +0000 (15:02 +0530)]
drm/i915/dp: Refactor FEC support check in intel_dp_supports_dsc
Forward Error Correction is required for DP if we are using DSC but
is optional for eDP.
Currently the helper intel_dp_supports_dsc checks if fec_enable is set for
DP or not. The helper is called after fec_enable is set in crtc_state.
Instead of this a better approach would be to:
first, call intel_dp_supports_dsc to check for DSC support
(along with FEC requirement for DP) and then set fec_enable for DP
(if not already set) in crtc_state.
To achieve this, remove the check for fec_enable in the helper and instead
check for FEC support for DP. With this change the helper
intel_dp_supports_dsc can be called earlier and return early if DSC is
not supported. The structure intel_dp is added to the helper to get the
FEC support for DP.
v2: Pass intel_dp to adjust_limits_for_dsc_hblank_expansion_quirk
instead of deriving it from connector. (Jani)
drm/i915/selftests: Use preemption timeout on cleanup
Many selftests call igt_flush_test() on cleanup. With default preemption
timeout of compute engines raised to 7.5 seconds, hardcoded flush timeout
of 3 seconds is too short. That results in GPU forcibly wedged and kernel
taineted, then IGT abort triggered. CI BAT runs loose a part of their
expected coverage.
Calculate the flush timeout based on the longest preemption timeout
currently configured for any engine. That way, selftest can still report
detected issues as non-critical, and the GPU gets a chance to recover from
preemptible hangs and prepare for fluent execution of next test cases.
Add a optional properties to change LVDS output voltage. This should not
be static as this depends mainly on the connected display voltage
requirement. We have three properties:
- "ti,lvds-termination-ohms", which sets near end termination,
- "ti,lvds-vod-swing-data-microvolt" and
- "ti,lvds-vod-swing-clock-microvolt" which both set LVDS differential
output voltage for data and clock lanes. They are defined as an array
with min and max values. The appropriate bitfield will be set if
selected constraints can be met.
If "ti,lvds-termination-ohms" is not defined the default of 200 Ohm near
end termination will be used. Selecting only one:
"ti,lvds-vod-swing-data-microvolt" or
"ti,lvds-vod-swing-clock-microvolt" can be done, but the output voltage
constraint for only data/clock lanes will be met. Setting both is
recommended.
Andrej Picej [Mon, 16 Dec 2024 08:54:08 +0000 (09:54 +0100)]
dt-bindings: drm/bridge: ti-sn65dsi83: Add properties for ti,lvds-vod-swing
Add properties which can be used to specify LVDS differential output
voltage. Since this also depends on near-end signal termination also
include property which sets this. LVDS differential output voltage is
specified with an array (min, max), which should match the one from
connected device.
Tomi Valkeinen [Wed, 23 Oct 2024 11:52:43 +0000 (14:52 +0300)]
drm: xlnx: zynqmp_dpsub: Add DP audio support
Add basic DisplayPort audio support.
Support non-live audio playback from two PCMs (DMA channels), and the
volume control in the audio mixer.
As older dtb files may not have the audio DMA channels defined, the
driver will just mark the audio support as disabled if the audio DMA is
missing, and will continue with only display support.
Note: Reset doesn't seem to work (ZYNQMP_DISP_AUD_SOFT_RESET). If we do
a reset, audio playback won't start again even if, afaics, we do set up
all the necessary registers. So, at the moment, resetting the audio
block in dp_dai_hw_free() is commented out.
The DP subsystem for ZynqMP supports audio via two channels, and the DP
DMA has dma-engines for those channels. For some reason the DT binding
has not specified those channels, even if the picture included in
xlnx,zynqmp-dpsub.yaml shows "2 x aud" DMAs.
This hasn't caused any issues as the drivers have not supported audio,
and has thus gone unnoticed.
To make it possible to add the audio support to the driver, add the two
audio DMAs to the binding. While strictly speaking this is an ABI break,
there should be no regressions caused by this as we're adding new
entries at the end of the dmas list, and, after the audio support has
been added in "arm64: dts: zynqmp: Add DMA for DP audio", the driver
will treat the audio DMAs as optional to also support the old bindings.
Dave Airlie [Wed, 18 Dec 2024 21:59:20 +0000 (07:59 +1000)]
Merge tag 'drm-intel-gt-next-2024-12-18' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Driver Changes:
- More accurate engine busyness metrics with GuC submission (Umesh)
- Ensure partial BO segment offset never exceeds allowed max (Krzysztof)
- Flush GuC CT receive tasklet during reset preparation (Zhanjun)
Michel Dänzer [Tue, 17 Dec 2024 17:22:56 +0000 (18:22 +0100)]
drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_update
Third time's the charm, I hope?
Fixes: d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3837 Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Replacing kmalloc() + memcpy() with kmemdump() doesn't change semantics.
Original code works without fault, so this is not a bug fix but proposed improvement.
Link: https://lwn.net/Articles/198928/ Fixes: 84a2947ecc85 ("drm/amdgpu: Implement virt req_ras_err_count") Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: Xinhui Pan <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Zhigang Luo <Zhigang.Luo@amd.com> Cc: Victor Skvortsov <victor.skvortsov@amd.com> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: Lijo Lazar <lijo.lazar@amd.com> Cc: Yunxiang Li <Yunxiang.Li@amd.com> Cc: Jack Xiao <Jack.Xiao@amd.com> Cc: Vignesh Chander <Vignesh.Chander@amd.com> Cc: Danijel Slivka <danijel.slivka@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Fix NULL pointer dereference in dmub_tracebuffer_show
It corrects the issue by checking if 'adev->dm.dmub_srv' is NULL before
accessing its 'meta_info' member. This ensures that we do not
dereference a NULL pointer.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_debugfs.c:917 dmub_tracebuffer_show()
warn: address of 'adev->dm.dmub_srv->meta_info' is non-NULL
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm_debugfs.c
901 static int dmub_tracebuffer_show(struct seq_file *m, void *data)
902 {
903 struct amdgpu_device *adev = m->private;
904 struct dmub_srv_fb_info *fb_info = adev->dm.dmub_fb_info;
905 struct dmub_fw_meta_info *fw_meta_info = &adev->dm.dmub_srv->meta_info;
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Even if adev->dm.dmub_srv is NULL, the address of ->meta_info can't be NULL
v2: Initialize struct dmub_fw_meta_info *fw_meta_info to NULL (Dan Carpenter)
Fixes: 5a498172c8d0 ("drm/amd/display: Make DMCUB tracebuffer debugfs chronological") Cc: Leo Li <sunpeng.li@amd.com> Cc: Tom Chung <chiahsuan.chung@amd.com> Cc: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Cc: Roman Li <roman.li@amd.com> Cc: Alex Hung <alex.hung@amd.com> Cc: Aurabindo Pillai <aurabindo.pillai@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Cc: Hamza Mahfooz <hamza.mahfooz@amd.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 3 Dec 2024 15:00:25 +0000 (10:00 -0500)]
drm/amdgpu: Show warning message if IH ring overflow
If IH primary ring and KFD ih fifo overflows, we may miss CP, SDMA
interrupts and cause application soft hang. Show warning message with
ring name if overflow happens.
Add function to get ih ring name to avoid duplicating it. To keep
warning message consistent between GPU generations, change all
*_ih.c except ASICs older than Vega which has only one ih ring.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Wed, 4 Dec 2024 22:49:08 +0000 (17:49 -0500)]
drm/amdkfd: Improve signal event slow path
If event slot is not signaled, kfd_signal_event_interrupt goes to slow
path to scan all event slots to find the signaled event, this is needed
for old ASICs that don't have the event ID or the event IDs are
incorrect in the IH payload.
There is case that GPU signal the same event twice, then driver process
the first event interrupt, set_event and event slot is auto-reset, then
for the second event interrupt, KFD goes to slow path as event is not
signaled, just drop the second event interrupt because the application
only need wakeup once.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Tue, 26 Nov 2024 16:33:15 +0000 (11:33 -0500)]
drm/amdkfd: Queue interrupt work to different CPU
For CPX mode, each KFD node has interrupt worker to process ih_fifo to
send events to user space. Currently all interrupt workers of same adev
queue to same CPU, all workers execution are actually serialized and
this cause KFD ih_fifo overflow when CPU usage is high.
Use per-GPU unbounded highpri queue with number of workers equals to
number of partitions, let queue_work select the next CPU round robin
among the local CPUs of same NUMA.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
After GPU page fault, there are lots of page fault interrupts generated
at short period even with CAM filter enabled because the fault address
is different. Each page fault copy to KFD ih fifo to send event to user
space by KFD interrupt worker, this could cause KFD ih fifo overflow
while other processes generate events at same time.
KFD process is aborted after GPU page fault, we only need one GPU page
fault interrupt sent to KFD ih fifo to send memory exception event to
user space.
Incease KFD ih fifo size to 2 times of IH primary ring size, to handle
the burst events case.
This patch handle the gfx v9 path, cover retry on/off and CAM filter
on/off cases.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Philip Yang [Fri, 22 Nov 2024 22:36:15 +0000 (17:36 -0500)]
drm/amdkfd: KFD interrupt access ih_fifo data in-place
To handle 40000 to 80000 interrupts per second running CPX mode with 4
streams/queues per KFD node, KFD interrupt handler becomes the
performance bottleneck.
Remove the kfifo_out memcpy overhead by accessing ih_fifo data in-place
and updating rptr with kfifo_skip_count.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This commit introduced a new state variable into adev without even
remotely worrying about CPU barriers.
Since we already have the amdgpu_in_reset() function exactly for this
use case partially revert that.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 12 Dec 2024 15:43:45 +0000 (16:43 +0100)]
drm/amdgpu: set the VM pointer to NULL in amdgpu_job_prepare
As soon as the prepare phase is completed the VM might be released,
better set it to NULL.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Christian König [Thu, 12 Dec 2024 15:29:18 +0000 (16:29 +0100)]
drm/amdgpu: fix amdgpu_coredump
The VM pointer might already be outdated when that function is called.
Use the PASID instead to gather the information instead.
Signed-off-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The sdma context empty interrupt is dropped in amdgpu_irq_dispatch
as unregistered interrupt src_id 243, this interrupt accounts to 1/3 of
total interrupts and causes IH primary ring overflow when running
stressful benchmark application. Disable this interrupt has no side
effect found.
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Karol Przybylski [Sun, 15 Dec 2024 12:28:57 +0000 (13:28 +0100)]
drm/amdgpu: Fix potential integer overflow in scheduler mask calculations
The use of 1 << i in scheduler mask calculations can result in an
unintentional integer overflow due to the expression being
evaluated as a 32-bit signed integer.
This patch replaces 1 << i with 1ULL << i to ensure the operation
is performed as a 64-bit unsigned integer, preventing overflow