Thomas Hellström [Thu, 27 Jan 2022 11:56:22 +0000 (12:56 +0100)]
drm/i915: Fix a race between vma / object destruction and unbinding
The vma destruction code was using an unlocked advisory check for
drm_mm_node_allocated() to avoid racing with eviction code unbinding
the vma.
This is very fragile and prohibits the dereference of non-refcounted
pointers of dying vmas after a call to __i915_vma_unbind(). It also
prohibits the dereference of vma->obj of refcounted pointers of
dying vmas after a call to __i915_vma_unbind(), since even if a
refcount is held on the vma, that won't guarantee that its backing
object doesn't get destroyed.
So introduce an unbind under the vm mutex at object destroy time,
removing all weak references of the vma and its object from the
object vma list and from the vm bound list.
drm/i915/pmu: Fix KMD and GuC race on accessing busyness
GuC updates shared memory and KMD reads it. Since this is not
synchronized, we run into a race where the value read is inconsistent.
Sometimes the inconsistency is in reading the upper MSB bytes of the
last_switch_in value. 2 types of cases are seen - upper 8 bits are zero
and upper 24 bits are zero. Since these are non-zero values, it is
not trivial to determine validity of these values. Instead we read the
values multiple times until they are consistent. In test runs, 3
attempts results in consistent values. The upper bound is set to 6
attempts and may need to be tuned as per any new occurences.
Since the duration that gt is parked can vary, the patch also updates
the gt timestamp on unpark before starting the worker.
v2:
- Initialize i
- Use READ_ONCE to access engine record
drm/i915/guc: Update guc shim control programming on newer platforms
Starting from xehpsdv, bit 0 of the GuC shim control register has
been repurposed, while bit 2 is now reserved, so we need to avoid
setting those for their old meaning on newer platforms.
Starting from DG2, some of the programming previously done by i915 and
the GuC has been moved to the GSC and the relevant registers are no
longer writable by either CPU or GuC. This is also referred to as GuC
deprivilege.
On the i915 side, this affects the WOPCM registers: these are no longer
programmed by the driver and we do instead expect to find them already
set. This can lead to verification failures because in i915 we cheat a bit
with the WOPCM size defines, to keep the code common across platforms, by
sometimes using a smaller WOPCM size that the actual HW support (which isn't
a problem because the extra size is not needed if the FW fits in the smaller
chunk), while the pre-programmed values can use the actual size.
Given tha the new programming entity is trusted, relax the amount of the
checks done on the pre-programmed values by not limiting the max
programmed size. In the extremely unlikely scenario that the registers
have been misprogrammed, we will still fail later at DMA time.
Anusha Srivatsa [Tue, 25 Jan 2022 22:36:25 +0000 (14:36 -0800)]
drm/i915/rpl-s: Add stepping info
Add stepping-substepping info in
accordance to BSpec changes.
Though it looks weird, the revision ID
for the newer stepping is indeed backwards
and is in accordance to the spec.
v2: Rearrange the platforms in logical order (Matt)
drm/i915/guc: Use struct_size() helper in kmalloc()
Make use of the struct_size() helper instead of an open-coded version,
in order to avoid any potential type mistakes or integer overflows that,
in the worst scenario, could lead to heap overflows.
Also, address the following sparse warnings:
drivers/gpu/drm/i915/gt/uc/intel_guc_ct.c:792:23: warning: using sizeof on a flexible structure
drm/i915/pmu: Use PM timestamp instead of RING TIMESTAMP for reference
All timestamps returned by GuC for GuC PMU busyness are captured from
GUC PM TIMESTAMP. Since this timestamp does not tick when GuC goes idle,
kmd uses RING_TIMESTAMP to measure busyness of an engine with an active
context. In further stress testing, the MMIO read of the RING_TIMESTAMP
is seen to cause a rare hang. Resolve the issue by using gt specific
timestamp from PM which is in sync with the GuC PM timestamp.
Matthew Brost [Sat, 22 Jan 2022 00:08:22 +0000 (16:08 -0800)]
drm/i915/selftests: Use less in contexts steal guc id test
Using more guc_ids in the stealing guc id test has no real benefit.
Tearing down lots of contexts all at the same time takes a bit of time
due to the H2G / G2H ping-pong with the GuC. On some slower platforms
this can cause timeous when flushing the test as the GT isn't idle when
this ping-pong is happening. Reduce the number of guc ids to speed up
the flushing of the test.
Matthew Brost [Wed, 19 Jan 2022 21:06:39 +0000 (13:06 -0800)]
drm/i915/guc: Ensure multi-lrc fini breadcrumb math is correct
Realized that the GuC multi-lrc fini breadcrumb emit code is very
delicate as the math this code does relies on functions it calls to emit
a certain number of DWs. Add a few GEM_BUG_ONs to assert the math is
correct.
v2:
- Rebase + resend for CI
(Checkpatch)
- Fix blank line warning
Andi Shyti [Mon, 24 Jan 2022 09:44:18 +0000 (11:44 +0200)]
drm/i915: fix header file inclusion for might_alloc()
Replace "linux/slab.h" with "linux/sched/mm.h" header inclusion
as the first is not required, while the second, if not included,
prodouces the following error:
drivers/gpu/drm/i915/i915_vma_resource.c: In function ‘i915_vma_resource_bind_dep_await’:
drivers/gpu/drm/i915/i915_vma_resource.c:381:9: error: implicit declaration of function ‘might_alloc’; did you mean ‘might_lock’? [-Werror=implicit-function-declaration]
381 | might_alloc(gfp);
| ^~~~~~~~~~~
| might_lock
Fixes: 2f6b90da9192 ("drm/i915: Use vma resources for async unbinding") Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220124094418.2661-1-andi.shyti@linux.intel.com
Matthew Brost [Fri, 21 Jan 2022 04:31:18 +0000 (20:31 -0800)]
drm/i915/guc: Flush G2H handler during a GT reset
Now that the error capture is fully decoupled from fence signalling
(request retirement to free memory, which in turn depends on resets) we
can safely flush the G2H handler during a GT reset. This eliminates
corner cases where GuC generated G2H (e.g. engine resets) race with a GT
reset.
v2:
(John Harrison)
- Fix typo in commit message (s/is/in)
Matthew Brost [Fri, 21 Jan 2022 04:31:17 +0000 (20:31 -0800)]
drm/i915/guc: Add work queue to trigger a GT reset
The G2H handler needs to be flushed during a GT reset but a G2H
indicating engine reset failure can trigger a GT reset. Add a worker to
trigger the GT rest when an engine reset failure is received to break
this circular dependency.
v2:
(John Harrison)
- Store engine reset mask
- Fix typo in commit message
v3:
(John Harrison)
- Fix another typo in commit message
- s/reset_*/reset_fail_*/
Matthew Brost [Thu, 13 Jan 2022 18:13:51 +0000 (10:13 -0800)]
drm/i915/guc: Remove hacks for reset and schedule disable G2H being received out of order
In the i915 there are several hacks in place to make request cancellation
work with an old version of the GuC which delivered the G2H indicating
schedule disable is done before G2H indicating a context reset. Version
69 fixes this, so we can remove these hacks.
Matthew Brost [Thu, 13 Jan 2022 18:13:50 +0000 (10:13 -0800)]
drm/i915/selftests: Add a cancel request selftest that triggers a reset
Add a cancel request selftest that results in an engine reset to cancel
the request as it is non-preemptable. Also insert a NOP request after
the cancelled request and confirm that it completes successfully.
v2:
(Tvrtko)
- Skip test if preemption timeout compiled out
- Skip test if engine reset isn't supported
- Update debug prints to be more descriptive
v3:
- Add comment explaining test
v4:
(John Harrison)
- Fix typos in comment explaining test
- goto out_rq is NOP creation fails
drm/i915: Remove short-term pins from execbuf, v6.
Add a flag PIN_VALIDATE, to indicate we don't need to pin and only
protected by the object lock.
This removes the need to unpin, which is done by just releasing the
lock.
eb_reserve is slightly reworked for readability, but the same steps
are still done:
- First pass pins with NONBLOCK.
- Second pass unbinds all objects first, then pins.
- Third pass is only called when not all objects are softpinned, and
unbinds all objects, then calls i915_gem_evict_vm(), then pins.
Changes since v1:
- Split out eb_reserve() into separate functions for readability.
Changes since v2:
- Make batch buffer mappable on platforms where only GGTT is available,
to prevent moving the batch buffer during relocations.
Changes since v3:
- Preserve current behavior for batch buffer, instead be cautious when
calling i915_gem_object_ggtt_pin_ww, and re-use the current batch vma
if it's inside ggtt and map-and-fenceable.
- Remove impossible condition check from eb_reserve. (Matt)
Changes since v5:
- Do not even temporarily pin, just call i915_gem_evict_vm() and mark
all vma's as unpinned.
drm/i915: Add i915_vma_unbind_unlocked, and take obj lock for i915_vma_unbind, v2.
We want to remove more members of i915_vma, which requires the locking to
be held more often.
Start requiring gem object lock for i915_vma_unbind, as it's one of the
callers that may unpin pages.
Some special care is needed when evicting, because the last reference to
the object may be held by the VMA, so after __i915_vma_unbind, vma may be
garbage, and we need to cache vma->obj before unlocking.
Changes since v1:
- Make trylock failing a WARN. (Matt)
- Remove double i915_vma_wait_for_bind() (Matt)
- Move atomic_set to right before mutex_unlock(), to make it more clear
they belong together. (Matt)
i915_gem_evict_vm will need to be able to evict objects that are
locked by the current ctx. By testing if the current context already
locked the object, we can do this correctly. This allows us to
evict the entire vm even if we already hold some objects' locks.
Previously, this was spread over several commits, but it makes
more sense to commit the changes to i915_gem_evict_vm separately
from the changes to i915_gem_evict_something() and
i915_gem_evict_for_node().
Changes since v1:
- Handle evicting dead objects better.
Changes since v2:
- Use for_i915_gem_ww in igt_evict_vm. (Thomas)
drm/i915: Call i915_gem_evict_vm in vm_fault_gtt to prevent new ENOSPC errors, v2.
Now that we cannot unbind kill the currently locked object directly
because we're removing short term pinning, we may have to unbind the
object from gtt manually, using a i915_gem_evict_vm() call.
Changes since v1:
- Remove -ENOSPC warning, can still happen with concurrent mmaps
where we can't unbind the other mmap because of the lock held.
This fixes the gem_mmap_gtt@cpuset tests.
Daniel Vetter [Fri, 14 Jan 2022 14:15:56 +0000 (15:15 +0100)]
Merge tag 'drm-misc-fixes-2022-01-14' of git://anongit.freedesktop.org/drm/drm-misc into drm-next
Two DT bindings fixes for meson, a device refcounting fix for sun4i, a
probe fix for vga16fb, a locking fix for the CMA dma-buf heap and a
compilation fix for ttm.
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
[danvet: I made sure I have exactly the same conflict resolution as
Linus in 8d0749b4f83b ("Merge tag 'drm-next-2022-01-07' of
git://anongit.freedesktop.org/drm/drm") to avoid further conflict fun.
From: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20220114125454.zs46ny52lrxk3ljz@houat
Claudio Suarez [Thu, 2 Dec 2021 09:51:12 +0000 (10:51 +0100)]
drm: fix error found in some cases after the patch d1af5cd86997
The patch d1af5cd86997 ("drm: get rid of DRM_DEBUG_* log
calls in drm core, files drm_a*.c") fails when the drm_device
cannot be found in the parameter plane_state->crtc.
Fix it using plane_state->plane.
Daniel Vetter [Fri, 14 Jan 2022 12:34:39 +0000 (13:34 +0100)]
Merge tag 'drm-intel-next-fixes-2022-01-13' of git://anongit.freedesktop.org/drm/drm-intel into drm-next
- Hold runtime PM wakelock during PXP unbind (Juston Li)
- Three fixes for the TTM backend fault handling (Matthew Auld)
- Make sure to unmap when purging in the TTM backend (Matthew Auld)
Johannes Berg [Mon, 20 Dec 2021 10:15:22 +0000 (11:15 +0100)]
drm/ttm: fix compilation on ARCH=um
Even if it's probably not really useful, it can get selected
by e.g. randconfig builds, and then failing to compile is an
annoyance. Unfortunately, it's hard to fix in Kconfig, since
DRM_TTM is selected by many things that don't really depend
on any specific architecture, and just depend on PCI (which
is indeed now available in ARCH=um via simulation/emulation).
Fix this in the code instead by just ifdef'ing the relevant
two lines that depend on "real X86".
video: vga16fb: Only probe for EGA and VGA 16 color graphic cards
The vga16fb framebuffer driver only supports Enhanced Graphics Adapter
(EGA) and Video Graphics Array (VGA) 16 color graphic cards.
But it doesn't check if the adapter is one of those or if a VGA16 mode
is used. This means that the driver will be probed even if a VESA BIOS
Extensions (VBE) or Graphics Output Protocol (GOP) interface is used.
This issue has been present for a long time but it was only exposed by
commit d391c5827107 ("drivers/firmware: move x86 Generic System
Framebuffers support") since the platform device registration to match
the {vesa,efi}fb drivers is done later as a consequence of that change.
All non-x86 architectures though treat orig_video_isVGA as a boolean so
only do the supported video mode check for x86 and not for other arches.
Kent Russell [Tue, 11 Jan 2022 17:28:27 +0000 (12:28 -0500)]
drm/amdkfd: Fix ASIC name typos
Three misspelled ASICs in comments here, so fix the spelling
Signed-off-by: Kent Russell <kent.russell@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Wed, 8 Dec 2021 03:03:52 +0000 (22:03 -0500)]
drm/amdkfd: Fix DQM asserts on Hawaii
start_nocpsch would never set dqm->sched_running on Hawaii due to an
early return statement. This would trigger asserts in other functions
and end up in inconsistent states.
Bug: https://github.com/RadeonOpenCompute/ROCm/issues/1624 Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Kent Russell <kent.russell@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Tue, 4 Jan 2022 15:45:41 +0000 (10:45 -0500)]
drm/amdgpu: Use correct VIEWPORT_DIMENSION for DCN2
For some reason this file isn't using the appropriate register
headers for DCN headers, which means that on DCN2 we're getting
the VIEWPORT_DIMENSION offset wrong.
This means that we're not correctly carving out the framebuffer
memory correctly for a framebuffer allocated by EFI and
therefore see corruption when loading amdgpu before the display
driver takes over control of the framebuffer scanout.
Fix this by checking the DCE_HWIP and picking the correct offset
accordingly.
Long-term we should expose this info from DC as GMC shouldn't
need to know about DCN registers.
Cc: stable@vger.kernel.org Signed-off-by: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Huang Rui <ray.huang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou [Mon, 10 Jan 2022 07:12:38 +0000 (15:12 +0800)]
drm/amd/pm: only send GmiPwrDnControl msg on master die (v3)
PMFW only returns 0 on master die and sends NACK back on other dies for
the message.
v2: only send GmiPwrDnControl msg on master die instead of all
dies.
v3: remove the pointer check for get_socket_id and get_die_id as they
should be present on Aldebaran.
Signed-off-by: Tao Zhou <tao.zhou1@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Wed, 8 Dec 2021 22:51:43 +0000 (17:51 -0500)]
drm/amdkfd: Use prange->update_list head for remove_list
The remove_list head was only used for keeping track of existing ranges
that are to be removed from the svms->list. The update_list was used for
new or existing ranges that need updated attributes. These two cases are
mutually exclusive (i.e. the same range will never be on both lists).
Therefore we can use the update_list head to track the remove_list and
save another 16 bytes in the svm_range struct.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Felix Kuehling [Wed, 8 Dec 2021 22:33:48 +0000 (17:33 -0500)]
drm/amdkfd: Use prange->list head for insert_list
There are seven list_heads in struct svm_range: list, update_list,
remove_list, insert_list, svm_bo_list, deferred_list, child_list. This
patch and the next one remove two of them that are redundant.
The insert_list head was only used for new ranges that are not on the
svms->list yet. So we can use that list head for keeping track of
new ranges before they get added, and use list_move_tail to move them
to the svms->list when ready.
Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Lukas Bulwahn [Thu, 16 Dec 2021 09:45:03 +0000 (10:45 +0100)]
drm/amdkfd: make SPDX License expression more sound
Commit b5f57384805a ("drm/amdkfd: Add sysfs bitfields and enums to uAPI")
adds include/uapi/linux/kfd_sysfs.h with the "GPL-2.0 OR MIT WITH
Linux-syscall-note" SPDX-License expression.
The command ./scripts/spdxcheck.py warns:
include/uapi/linux/kfd_sysfs.h: 1:48 Exception not valid for license MIT: Linux-syscall-note
For a uapi header, the file under GPLv2 License must be combined with the
Linux-syscall-note, but combining the MIT License with the
Linux-syscall-note makes no sense, as the note provides an exception for
GPL-licensed code, not for permissively licensed code.
So, reorganize the SPDX expression to only combine the note with the GPL
License condition. This makes spdxcheck happy again.
Fixes: b5f57384805a ("drm/amdkfd: Add sysfs bitfields and enums to uAPI") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: kstewart@linuxfoundation.org Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Jiasheng Jiang [Wed, 5 Jan 2022 09:09:43 +0000 (17:09 +0800)]
drm/amdkfd: Check for null pointer after calling kmemdup
As the possible failure of the allocation, kmemdup() may return NULL
pointer.
Therefore, it should be better to check the 'props2' in order to prevent
the dereference of NULL pointer.
Fixes: 3a87177eb141 ("drm/amdkfd: Add topology support for dGPUs") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: reset dcn31 SMU mailbox on failures
Otherwise future commands may fail as well leading to downstream
problems that look like they stemmed from a timeout the first time
but really didn't.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the amdkfd sysfs code to use default_groups field which has
been the preferred way since aa30f47cf666 ("kobject: Add support for
default attribute groups to kobj_type") so that we can soon get rid of
the obsolete default_attrs field.
Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There are currently 2 ways to create a set of sysfs files for a
kobj_type, through the default_attrs field, and the default_groups
field. Move the amdgpu sysfs code to use default_groups field which has
been the preferred way since aa30f47cf666 ("kobject: Add support for
default attribute groups to kobj_type") so that we can soon get rid of
the obsolete default_attrs field.
Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Cc: "Pan, Xinhui" <Xinhui.Pan@amd.com> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Hawking Zhang <Hawking.Zhang@amd.com> Cc: John Clements <john.clements@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Jonathan Kim <jonathan.kim@amd.com> Cc: Kevin Wang <kevin1.wang@amd.com> Cc: shaoyunl <shaoyun.liu@amd.com> Cc: Tao Zhou <tao.zhou1@amd.com> Cc: amd-gfx@lists.freedesktop.org Cc: dri-devel@lists.freedesktop.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tom St Denis [Fri, 7 Jan 2022 11:57:41 +0000 (06:57 -0500)]
drm/amd/amdgpu: Add pcie indirect support to amdgpu_mm_wreg_mmio_rlc()
The function amdgpu_mm_wreg_mmio_rlc() is used by debugfs to write to
MMIO registers. It didn't support registers beyond the BAR mapped MMIO
space. This adds pcie indirect write support.
Signed-off-by: Tom St Denis <tom.stdenis@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nirmoy Das [Fri, 7 Jan 2022 08:51:15 +0000 (09:51 +0100)]
drm/amdgpu: recover gart table at resume
Get rid off pin/unpin of gart BO at resume/suspend and
instead pin only once and try to recover gart content
at resume time. This is much more stable in case there
is OOM situation at 2nd call to amdgpu_device_evict_resources()
while evicting GART table.
v3: remove gart recovery from other places
v2: pin gart at amdgpu_gart_table_vram_alloc()
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nirmoy Das [Fri, 7 Jan 2022 08:51:14 +0000 (09:51 +0100)]
drm/amdgpu: do not pass ttm_resource_manager to vram_mgr
Do not allow exported amdgpu_vram_mgr_*() to accept
any ttm_resource_manager pointer. Also there is no need
to force other module to call a ttm function just to
eventually call vram_mgr functions.
v2: pass adev's vram_mgr instead of adev
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Nirmoy Das [Fri, 7 Jan 2022 08:51:13 +0000 (09:51 +0100)]
drm/amdkfd: remove unused function
Remove unused amdgpu_amdkfd_get_vram_usage()
CC: Felix.Kuehling@amd.com Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Fixes: dfcbe6d5f4a340 ("drm/amdgpu: Remove unused function pointers")
Nirmoy Das [Fri, 7 Jan 2022 22:42:28 +0000 (17:42 -0500)]
drm/amdgpu: do not pass ttm_resource_manager to gtt_mgr
Do not allow exported amdgpu_gtt_mgr_*() to accept
any ttm_resource_manager pointer. Also there is no need
to force other module to call a ttm function just to
eventually call gtt_mgr functions.
v4: remove unused adev.
v3: upcast mgr from ttm resopurce manager instead of
getting it from adev.
v2: pass adev's gtt_mgr instead of adev.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Leslie Shi [Wed, 5 Jan 2022 02:57:20 +0000 (10:57 +0800)]
drm/amdgpu: Unmap MMIO mappings when device is not unplugged
Patch: 3efb17ae7e92 ("drm/amdgpu: Call amdgpu_device_unmap_mmio() if device
is unplugged to prevent crash in GPU initialization failure") makes call to
amdgpu_device_unmap_mmio() conditioned on device unplugged. This patch unmaps
MMIO mappings even when device is not unplugged.
v2: Add condition of drm_dev_enter() to deleted unmaps in patch
"drm/amdgpu: Unmap all MMIO mappings"
Signed-off-by: Leslie Shi <Yuliang.Shi@amd.com> Reviewed-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: explicitly set is_dsc_supported to false before use
When UBSAN is enabled a case is shown on unplugging the display that
this variable hasn't been initialized by `update_dsc_caps`, presumably
when the display was unplugged it wasn't copied from the DPCD.
John Harrison [Fri, 7 Jan 2022 00:06:22 +0000 (16:06 -0800)]
drm/i915/guc: Improve GuC loading status check/error reports
If the GuC fails to load, it is useful to know what firmware file /
version was attempted. So move the version info report to before the
load attempt rather than only after a successful load.
If the GuC does fail to load, then make the error messages visible
rather than being 'debug' prints that do not appears in dmesg output
by default.
When waiting for the GuC to load, it used to be necessary to check for
two different states - READY and (LAPIC_DONE | MIA_CORE). Apparently
the second signified init complete on RC6 exit. However, in more
recent GuC versions the RC6 exit sequence now finishes with status
READY as well. So the test can be simplified.
Also, add an enum giving all the current status codes that GuC loading
can report as a reference without having to pull and search through
the GuC source files.
John Harrison [Fri, 7 Jan 2022 00:06:21 +0000 (16:06 -0800)]
drm/i915/guc: Update to GuC version 69.0.3
Update to the latest GuC release.
The latest GuC firmware introduces a number of interface changes:
GuC may return NO_RESPONSE_RETRY message for requests sent over CTB.
Add support for this reply and try resending the request again as a
new CTB message.
A KLV (key-length-value) mechanism is now used for passing
configuration data such as CTB management.
With the new KLV scheme, the old CTB management actions are no longer
used and are removed.
Register capture on hang is now supported by GuC. Full i915 support
for this will be added by a later patch. A minimum support of
providing capture memory and register lists is required though, so add
that in.
The device id of the current platform needs to be provided at init time.
The 'poll CS' w/a (Wa_22012773006) was blanket enabled by previous
versions of GuC. It must now be explicitly requested by the KMD. So,
add in the code to turn it on when relevant.
The GuC log entry format has changed. This requires adding a new field
to the log header structure to mark the wrap point at the end of the
buffer (as the buffer size is no longer a multiple of the log entry
size).
New CTB notification messages are now sent for some things that were
previously only sent via MMIO notifications.
Of these, the crash dump notification was not really being handled by
i915. It called the log flush code but that only flushed the regular
debug log and then only if relay logging was enabled. So just report
an error message instead.
The 'exception' notification was just being ignored completely. So add
an error message for that as well.
Note that in either the crash dump or the exception case, the GuC is
basically dead. The KMD will detect this via the heartbeat and trigger
both an error log (which will include the crash dump as part of the
GuC log) and a GT reset. So no other processing is really required.
John Harrison [Fri, 7 Jan 2022 00:06:20 +0000 (16:06 -0800)]
drm/i915/guc: Temporarily bump the GuC load timeout
There is a known (but exceedingly unlikely) race condition where the
asynchronous frequency management code could reduce the GT clock while
a GuC reload is in progress (during a full GT reset). A fix is in
progress but there are complex locking issues to be resolved. In the
meantime bump the timeout to 200ms. Even at slowest clock, this
should be sufficient. And in the working case, a larger timeout makes
no difference.
Liu Ying [Thu, 30 Dec 2021 04:06:26 +0000 (12:06 +0800)]
drm/atomic: Check new_crtc_state->active to determine if CRTC needs disable in self refresh mode
Actual hardware state of CRTC is controlled by the member 'active' in
struct drm_crtc_state instead of the member 'enable', according to the
kernel doc of the member 'enable'. In fact, the drm client modeset
and atomic helpers are using the member 'active' to do the control.
Referencing the member 'enable' of new_crtc_state, the function
crtc_needs_disable() may fail to reflect if CRTC needs disable in
self refresh mode, e.g., when the framebuffer emulation will be blanked
through the client modeset helper with the next commit, the member
'enable' of new_crtc_state is still true while the member 'active' is
false, hence the relevant potential encoder and bridges won't be disabled.
So, let's check new_crtc_state->active to determine if CRTC needs disable
in self refresh mode instead of new_crtc_state->enable.
Fixes: 1452c25b0e60 ("drm: Add helpers to kick off self refresh mode in drivers") Cc: Sean Paul <seanpaul@chromium.org> Cc: Rob Clark <robdclark@chromium.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Liu Ying <victor.liu@nxp.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211230040626.646807-1-victor.liu@nxp.com
Thomas Hellström [Mon, 10 Jan 2022 17:22:18 +0000 (18:22 +0100)]
drm/i915: Asynchronous migration selftest
Add a selftest to exercise asynchronous migration and -unbining.
Extend the gem_migrate selftest to perform the migrations while
depending on a spinner and a bound vma set up on the migrated
buffer object.
Thomas Hellström [Mon, 10 Jan 2022 17:22:17 +0000 (18:22 +0100)]
drm/i915: Use vma resources for async unbinding
Implement async (non-blocking) unbinding by not syncing the vma before
calling unbind on the vma_resource.
Add the resulting unbind fence to the object's dma_resv from where it is
picked up by the ttm migration code.
Ideally these unbind fences should be coalesced with the migration blit
fence to avoid stalling the migration blit waiting for unbind, as they
can certainly go on in parallel, but since we don't yet have a
reasonable data structure to use to coalesce fences and attach the
resulting fence to a timeline, we defer that for now.
Note that with async unbinding, even while the unbind waits for the
preceding bind to complete before unbinding, the vma itself might have been
destroyed in the process, clearing the vma pages. Therefore we can
only allow async unbinding if we have a refcounted sg-list and keep a
refcount on that for the vma resource pages to stay intact until
binding occurs. If this condition is not met, a request for an async
unbind is diverted to a sync unbind.
v2:
- Use a separate kmem_cache for vma resources for now to isolate their
memory allocation and aid debugging.
- Move the check for vm closed to the actual unbinding thread. Regardless
of whether the vm is closed, we need the unbind fence to properly wait
for capture.
- Clear vma_res::vm on unbind and update its documentation.
v4:
- Take cache coloring into account when searching for vma resources
pending unbind. (Matthew Auld)
v5:
- Fix timeout and error check in i915_vma_resource_bind_dep_await().
- Avoid taking a reference on the object for async binding if
async unbind capable.
- Fix braces around a single-line if statement.
v6:
- Fix up the cache coloring adjustment. (Kernel test robot <lkp@intel.com>)
- Don't allow async unbinding if the vma_res pages are not the same as
the object pages. (Matthew Auld)
v7:
- s/unsigned long/u64/ in a number of places (Matthew Auld)
Thomas Hellström [Mon, 10 Jan 2022 17:22:15 +0000 (18:22 +0100)]
drm/i915: Use the vma resource as argument for gtt binding / unbinding
When introducing asynchronous unbinding, the vma itself may no longer
be alive when the actual binding or unbinding takes place.
Update the gtt i915_vma_ops accordingly to take a struct i915_vma_resource
instead of a struct i915_vma for the bind_vma() and unbind_vma() ops.
Similarly change the insert_entries() op for struct i915_address_space.
Replace a couple of i915_vma_snapshot members with their newly introduced
i915_vma_resource counterparts, since they have the same lifetime.
Also make sure to avoid changing the struct i915_vma_flags (in particular
the bind flags) async. That should now only be done sync under the
vm mutex.
v2:
- Update the vma_res::bound_flags when binding to the aliased ggtt
v6:
- Remove I915_VMA_ALLOC_BIT (Matthew Auld)
- Change some members of struct i915_vma_resource from unsigned long to u64
(Matthew Auld)
v7:
- Fix vma resource size parameters to be u64 rather than unsigned long
(Matthew Auld)
Miaoqian Lin [Fri, 7 Jan 2022 08:36:32 +0000 (08:36 +0000)]
drm/sun4i: dw-hdmi: Fix missing put_device() call in sun8i_hdmi_phy_get
The reference taken by 'of_find_device_by_node()' must be released when
not needed anymore.
Add the corresponding 'put_device()' in the error handling path.
Thomas Hellström [Mon, 10 Jan 2022 17:22:14 +0000 (18:22 +0100)]
drm/i915: Initial introduction of vma resources
Introduce vma resources, sort of similar to TTM resources, needed for
asynchronous bind management. Initially we will use them to hold
completion of unbinding when we capture data from a vma, but they will
be used extensively in upcoming patches for asynchronous vma unbinding.
Matthew Auld [Thu, 6 Jan 2022 17:49:10 +0000 (17:49 +0000)]
drm/i915/ttm: ensure we unmap when purging
Purging can happen during swapping out, or directly invoked with the
madvise ioctl. In such cases this doesn't involve a ttm move, which
skips umapping the object.
v2(Thomas):
- add ttm_truncate helper, and just call into i915_ttm_move_notify() to
handle the unmapping step
Matthew Auld [Thu, 6 Jan 2022 17:49:09 +0000 (17:49 +0000)]
drm/i915/ttm: add unmap_virtual callback
Ensure we call ttm_bo_unmap_virtual when releasing the pages.
Importantly this should now handle the ttm swapping case, and all other
places that already call into i915_ttm_move_notify().
Matthew Auld [Thu, 6 Jan 2022 17:49:07 +0000 (17:49 +0000)]
drm/i915: don't call free_mmap_offset when purging
The TTM backend is in theory the only user here(also purge should only
be called once we have dropped the pages), where it is setup at object
creation and is only removed once the object is destroyed. Also
resetting the node here might be iffy since the ttm fault handler
uses the stored fake offset to determine the page offset within the pages
array.
This also blows up in the dontneed-before-mmap test, since the
expectation is that the vma_node will live on, until the object is
destroyed:
Matthew Auld [Thu, 6 Jan 2022 17:49:10 +0000 (17:49 +0000)]
drm/i915/ttm: ensure we unmap when purging
Purging can happen during swapping out, or directly invoked with the
madvise ioctl. In such cases this doesn't involve a ttm move, which
skips umapping the object.
v2(Thomas):
- add ttm_truncate helper, and just call into i915_ttm_move_notify() to
handle the unmapping step
Matthew Auld [Thu, 6 Jan 2022 17:49:09 +0000 (17:49 +0000)]
drm/i915/ttm: add unmap_virtual callback
Ensure we call ttm_bo_unmap_virtual when releasing the pages.
Importantly this should now handle the ttm swapping case, and all other
places that already call into i915_ttm_move_notify().
Matthew Auld [Thu, 6 Jan 2022 17:49:07 +0000 (17:49 +0000)]
drm/i915: don't call free_mmap_offset when purging
The TTM backend is in theory the only user here(also purge should only
be called once we have dropped the pages), where it is setup at object
creation and is only removed once the object is destroyed. Also
resetting the node here might be iffy since the ttm fault handler
uses the stored fake offset to determine the page offset within the pages
array.
This also blows up in the dontneed-before-mmap test, since the
expectation is that the vma_node will live on, until the object is
destroyed:
Matthew Auld [Wed, 15 Dec 2021 11:07:45 +0000 (11:07 +0000)]
drm/i915: remove writeback hook
Ditch the writeback hook and drop i915_gem_object_writeback(). We
already support the shrinker_release_pages hook which can just call
shmem_writeback directly.
Fixes: 0cfab4cb3c4e ("drm/i915/pxp: Enable PXP power management") Suggested-by: Lee Shawn C <shawn.c.lee@intel.com> Signed-off-by: Juston Li <juston.li@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: John Harrison <John.C.Harrison@Intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220106200236.489656-2-juston.li@intel.com
Mikita Lipski [Wed, 15 Dec 2021 16:01:45 +0000 (11:01 -0500)]
drm/amd/display: introduce mpo detection flags
[why]
We want to know if new crtc state is enabling MPO configuration before
enabling it.
[how]
Detect if both primary and overlay planes are enabled on the same CRTC.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Bhawanpreet Lakha <Bhawanpreet.Lakha@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Mikita Lipski <mikita.lipski@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amd/display: Don't reinitialize DMCUB on s0ix resume
[Why]
PSP will suspend and resume DMCUB. Driver should just wait for DMCUB to
finish the auto load before continuining instead of placing it into
reset, wiping its firmware state and reinitializing.
If we don't let DMCUB fully finish initializing for S0ix then some state
will be lost and screen corruption can occur due to incorrect address
translation.
[How]
Use dmub_srv callbacks to determine in DMCUB is running and wait for
auto-load to complete before continuining.
In S0ix DMCUB will be running and DAL fw so initialize will skip.
In S3 DMCUB will not be running and we will do a full hardware init.
In S3 DMCUB will be running but will not be DAL fw so we will also do
a full hardware init.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Mikita Lipski <Mikita.Lipski@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Wenjing Liu [Mon, 13 Dec 2021 23:29:27 +0000 (18:29 -0500)]
drm/amd/display: unhard code link to phy idx mapping in dc link and clean up
[why]
1. Current code hard codes link to PHY mapping in dc link level per asic
per revision.
This is not scalable. In long term the mapping will be obatined from
DMUB and store in dc resource.
2. Depending on DCN revision and endpoint type, the definition of
dio_output_idx dio_output_type and phy_idx are not consistent. We need
to unify the meaning of these hardware indices across different system
configuration.
[how]
1. Temporarly move the hardcoded mapping to dc_resource level, which
should have full awareness of asic specific configuration and add a TODO
comment to move the mapping to DMUB.
2. populate dio_output_idx/phy_idx for all configuration, define
usb4_enabled bit instead of dio_output_type as an external enum.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Eric Yang <Eric.Yang2@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Wenjing Liu <wenjing.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Yi-Ling Chen [Mon, 13 Dec 2021 08:13:26 +0000 (16:13 +0800)]
drm/amd/display: Fix underflow for fused display pipes case
[Why]
Depend on res_pool->res_cap->num_timing_generator to query timing
gernerator information, it would case underflow at the fused display
pipes case.
Due to the res_pool->res_cap->num_timing_generator records default
timing generator resource built in driver, not the current chip.
[How]
Some ASICs would be fused display pipes less than the default setting.
In dcnxx_resource_construct function, driver would obatin real timing
generator count and store it into res_pool->timing_generator_count.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Yi-Ling Chen <Yi-Ling.Chen2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: don't set s3 and s0ix at the same time
This makes it clearer which codepaths are in use specifically in
one state or the other.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: explicitly check for s0ix when evicting resources
This codepath should be running in both s0ix and s3, but only does
currently because s3 and s0ix are both set in the s0ix case.
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Yao [Wed, 29 Dec 2021 10:10:32 +0000 (18:10 +0800)]
drm/amdgpu: add dummy event6 for vega10
[why]
Malicious mailbox event1 fails driver loading on vega10.
A dummy event6 prevent driver from taking response from malicious event1 as its own.
[how]
On vega10, send a mailbox event6 before sending event1.
Signed-off-by: James Yao <yiqing.yao@amd.com> Reviewed-by: Jingwen Chen <Jingwen.Chen2@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Michał Winiarski [Sun, 19 Dec 2021 21:24:59 +0000 (23:24 +0200)]
drm/i915/selftests: Use to_gt() helper for GGTT accesses
GGTT is currently available both through i915->ggtt and gt->ggtt, and we
eventually want to get rid of the i915->ggtt one.
Use to_gt() for all i915->ggtt accesses to help with the future
refactoring.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211219212500.61432-6-andi.shyti@linux.intel.com
Michał Winiarski [Sun, 19 Dec 2021 21:24:58 +0000 (23:24 +0200)]
drm/i915/display: Use to_gt() helper for GGTT accesses
GGTT is currently available both through i915->ggtt and gt->ggtt, and we
eventually want to get rid of the i915->ggtt one.
Use to_gt() for all i915->ggtt accesses to help with the future
refactoring.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211219212500.61432-5-andi.shyti@linux.intel.com
Michał Winiarski [Sun, 19 Dec 2021 21:24:57 +0000 (23:24 +0200)]
drm/i915/gem: Use to_gt() helper for GGTT accesses
GGTT is currently available both through i915->ggtt and gt->ggtt, and we
eventually want to get rid of the i915->ggtt one.
Use to_gt() for all i915->ggtt accesses to help with the future
refactoring.
Signed-off-by: Michał Winiarski <michal.winiarski@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Reviewed-by: Sujaritha Sundaresan <sujaritha.sundaresan@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211219212500.61432-4-andi.shyti@linux.intel.com
GGTT is currently available both through i915->ggtt and gt->ggtt, and we
eventually want to get rid of the i915->ggtt one.
Use to_gt() for all i915->ggtt accesses to help with the future
refactoring.