Specifying VM during lrc->bo creation requires VM's reference
to be held for the lifetime of lrc->bo as it will use VM's dma
reservation object. Using VM's dma reservation object for
lrc->bo doesn't provide any advantage. Hence do not pass VM
while creating lrc->bo.
v2: Use xe_bo_unpin_map_no_vm (Matthew Brost)
Fixes: 264eecdba211 ("drm/xe: Decouple xe_exec_queue and xe_lrc") Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250529052031.2429120-2-niranjana.vishwanathapura@intel.com
(cherry picked from commit fbeaad071a98fef87deccee81d564de1c8e8e16d) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Matthew Auld [Tue, 3 Jun 2025 17:42:14 +0000 (18:42 +0100)]
drm/xe/guc_submit: add back fix
Daniele noticed that the fix in commit 2d2be279f1ca ("drm/xe: fix UAF
around queue destruction") looks to have been unintentionally removed as
part of handling a conflict in some past merge commit. Add it back.
Fixes: ac44ff7cec33 ("Merge tag 'drm-xe-fixes-2024-10-10' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes") Reported-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.12+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250603174213.1543579-2-matthew.auld@intel.com
(cherry picked from commit 9d9fca62dc49d96f97045b6d8e7402a95f8cf92a) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
drm/xe/pxp: Clarify PXP queue creation behavior if PXP is not ready
The expected flow of operations when using PXP is to query the PXP
status and wait for it to transition to "ready" before attempting to
create an exec_queue. This flow is followed by the Mesa driver, but
there is no guarantee that an incorrectly coded (or malicious) app
will not attempt to create the queue first without querying the status.
Therefore, we need to clarify what the expected behavior of the queue
creation ioctl is in this scenario.
Currently, the ioctl always fails with an -EBUSY code no matter the
error, but for consistency it is better to distinguish between "failed
to init" (-EIO) and "not ready" (-EBUSY), the same way the query ioctl
does. Note that, while this is a change in the return code of an ioctl,
the behavior of the ioctl in this particular corner case was not clearly
spec'd, so no one should have been relying on it (and we know that Mesa,
which is the only known userspace for this, didn't).
v2: Minor rework of the doc (Rodrigo)
Fixes: 72d479601d67 ("drm/xe/pxp/uapi: Add userspace and LRC support for PXP-using queues") Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Reviewed-by: John Harrison <John.C.Harrison@Intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250522225401.3953243-7-daniele.ceraolospurio@intel.com
(cherry picked from commit 21784ca96025b62d95b670b7639ad70ddafa69b8) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
drm/xe/pxp: Use the correct define in the set_property_funcs array
The define of the extension type was accidentally used instead of the
one of the property itself. They're both zero, so no functional issue,
but we should use the correct define for code correctness.
Matthew Auld [Wed, 28 May 2025 11:33:29 +0000 (12:33 +0100)]
drm/xe/sched: stop re-submitting signalled jobs
Customer is reporting a really subtle issue where we get random DMAR
faults, hangs and other nasties for kernel migration jobs when stressing
stuff like s2idle/s3/s4. The explosions seems to happen somewhere
after resuming the system with splats looking something like:
PM: suspend exit
rfkill: input handler disabled
xe 0000:00:02.0: [drm] GT0: Engine reset: engine_class=bcs, logical_mask: 0x2, guc_id=0
xe 0000:00:02.0: [drm] GT0: Timedout job: seqno=24496, lrc_seqno=24496, guc_id=0, flags=0x13 in no process [-1]
xe 0000:00:02.0: [drm] GT0: Kernel-submitted job timed out
The likely cause appears to be a race between suspend cancelling the
worker that processes the free_job()'s, such that we still have pending
jobs to be freed after the cancel. Following from this, on resume the
pending_list will now contain at least one already complete job, but it
looks like we call drm_sched_resubmit_jobs(), which will then call
run_job() on everything still on the pending_list. But if the job was
already complete, then all the resources tied to the job, like the bb
itself, any memory that is being accessed, the iommu mappings etc. might
be long gone since those are usually tied to the fence signalling.
This scenario can be seen in ftrace when running a slightly modified
xe_pm IGT (kernel was only modified to inject artificial latency into
free_job to make the race easier to hit):
So the job_run() is clearly triggered twice for the same job, even
though the first must have already signalled to completion during
suspend. We can also see a CAT error after the re-submit.
To prevent this only resubmit jobs on the pending_list that have not yet
signalled.
v2:
- Make sure to re-arm the fence callbacks with sched_start().
v3 (Matt B):
- Stop using drm_sched_resubmit_jobs(), which appears to be deprecated
and just open-code a simple loop such that we skip calling run_job()
on anything already signalled.
Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4856 Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: William Tseng <william.tseng@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Link: https://lore.kernel.org/r/20250528113328.289392-2-matthew.auld@intel.com
(cherry picked from commit 38fafa9f392f3110d2de431432d43f4eef99cd1b) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Thomas Hellström [Wed, 28 May 2025 16:41:05 +0000 (18:41 +0200)]
drm/xe: Rework eviction rejection of bound external bos
For preempt_fence mode VM's we're rejecting eviction of
shared bos during VM_BIND. However, since we do this in the
move() callback, we're getting an eviction failure warning from
TTM. The TTM callback intended for these things is
eviction_valuable().
However, the latter doesn't pass in the struct ttm_operation_ctx
needed to determine whether the caller needs this.
Instead, attach the needed information to the vm under the
vm->resv, until we've been able to update TTM to provide the
needed information. And add sufficient lockdep checks to prevent
misuse and races.
v2:
- Fix a copy-paste error in xe_vm_clear_validating()
v3:
- Fix kerneldoc errors.
Arnd Bergmann [Thu, 29 May 2025 17:23:56 +0000 (10:23 -0700)]
drm/xe/vsec: fix CONFIG_INTEL_VSEC dependency
The XE driver can be built with or without VSEC support, but fails to link as
built-in if vsec is in a loadable module:
x86_64-linux-ld: vmlinux.o: in function `xe_vsec_init':
(.text+0x1e83e16): undefined reference to `intel_vsec_register'
The normal fix for this is to add a 'depends on INTEL_VSEC || !INTEL_VSEC',
forcing XE to be a loadable module as well, but that causes a circular
dependency:
symbol DRM_XE depends on INTEL_VSEC
symbol INTEL_VSEC depends on X86_PLATFORM_DEVICES
symbol X86_PLATFORM_DEVICES is selected by DRM_XE
The problem here is selecting a symbol from another subsystem, so change
that as well and rephrase the 'select' into the corresponding dependency.
Since X86_PLATFORM_DEVICES is 'default y', there is no change to
defconfig builds here.
Karthik Poosa [Thu, 29 May 2025 16:34:54 +0000 (22:04 +0530)]
drm/xe/hwmon: Move card reactive critical power under channel card
Move power2/curr2_crit to channel 1 i.e power1/curr1_crit as this
represents the entire card critical power/current.
v2: Update the date of curr1_crit also in hwmon documentation.
Signed-off-by: Karthik Poosa <karthik.poosa@intel.com> Fixes: 345dadc4f68b ("drm/xe/hwmon: Add infra to support card power and energy attributes") Reviewed-by: Badal Nilawar <badal.nilawar@intel.com> Link: https://lore.kernel.org/r/20250529163458.2354509-3-karthik.poosa@intel.com Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
(cherry picked from commit 25e963a09e059ffdb15c09cc79cfded855b43668) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Karthik Poosa [Thu, 29 May 2025 16:34:53 +0000 (22:04 +0530)]
drm/xe/hwmon: Add support to manage power limits though mailbox
Add support to manage power limits using pcode mailbox commands
for supported platforms.
v2:
- Address review comments. (Badal)
- Use mailbox commands instead of registers to manage power limits
for BMG.
- Clamp the maximum power limit to GPU firmware default value.
v3:
- Clamp power limit in write also for platforms with mailbox support.
v4:
- Remove unnecessary debug prints. (Badal)
v5:
- Update description of variable pl1_on_boot to fix kernel-doc error.
v6:
- Improve commit message, refer to BIOS as GPU firmware.
- Change macro READ_PL_FROM_BIOS to READ_PL_FROM_FW.
- Rectify drm_warn to drm_info.
Matthew Auld [Wed, 14 May 2025 15:24:26 +0000 (16:24 +0100)]
drm/xe/vm: move xe_svm_init() earlier
In xe_vm_close_and_put() we need to be able to call xe_svm_fini(),
however during vm creation we can call this on the error path, before
having actually initialised the svm state, leading to various splats
followed by a fatal NPD.
Matthew Auld [Wed, 14 May 2025 15:24:25 +0000 (16:24 +0100)]
drm/xe/vm: move rebind_work init earlier
In xe_vm_close_and_put() we need to be able to call
flush_work(rebind_work), however during vm creation we can call this on
the error path, before having actually set up the worker, leading to a
splat from flush_work().
It looks like we can simply move the worker init step earlier to fix
this.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Matthew Auld <matthew.auld@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250514152424.149591-3-matthew.auld@intel.com
(cherry picked from commit 96af397aa1a2d1032a6e28ff3f4bc0ab4be40e1d) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Aradhya Bhatia [Fri, 16 May 2025 14:19:02 +0000 (14:19 +0000)]
drm/xe/guc: Make creation of SLPC debugfs files conditional
Platforms that do not support SLPC are exempted from the GuC PC support.
The GuC PC does not get initialized, and neither do its BOs get created.
This causes a problem because the GuC PC debugfs file is still being
created. Whenever the file is attempted to read, it causes a NULL
pointer dereference on the supposed BO of the GuC PC.
So, make the creation of SLPC debugfs files conditional to when SLPC
features are supported.
Fixes: aaab5404b16f ("drm/xe: Introduce GuC PC debugfs") Suggested-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Tejas Upadhyay <tejas.upadhyay@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Signed-off-by: Aradhya Bhatia <aradhya.bhatia@intel.com> Link: https://lore.kernel.org/r/20250516141902.5614-1-aradhya.bhatia@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
(cherry picked from commit 17486cf3df5320752cc67ee8bcb2379d1b9de76c) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Dave Airlie [Fri, 9 May 2025 20:10:04 +0000 (06:10 +1000)]
Merge tag 'drm-intel-next-2025-05-08' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Non-display related:
- Fix undefined reference to `intel_pxp_gsccs_is_ready_for_sessions'
Display related:
- More work towards display separation (Jani)
- Stop writing VRR_CTL_IGN_MAX_SHIFT for MTL onwards (Jouni)
- DSC checks for 3 engines (Ankit)
- Add link rate and lane count to i915_display_info (Khaled)
- PSR fixes and workaround for underrun on idle (Jouni)
- LOBF enablement and ALMP fixes (Animesh)
- Clean up VGA plane handling (Ville)
- Use an intel_connector pointer everywhere (Imre)
- Fix warning for coffeelake on SunrisePoint PCH (Jiajia)
- Rework/Correction on minimum hblank calculation (Arun)
- Dmesg clean up (Jani)
- Add a couple of simple display workarounds (Ankit, Vinod)
- Refactor HDCP GSC (Jani)
Dave Airlie [Fri, 9 May 2025 01:39:27 +0000 (11:39 +1000)]
Merge tag 'drm-intel-gt-next-2025-05-08-1' of https://gitlab.freedesktop.org/drm/i915/kernel into drm-next
Driver Changes:
- Fix SLPC wait boosting reference counting to avoid getting stuck on non-boost
frequency on power saving profile on DG1/DG2 (Vinay)
- Add 20ms delay to engine reset for robustness on HSW (Nitin)
- Use proper sleeping functions for timeouts shorter than 20ms (Andi)
- Fix fence not released on early probe errors for HuC (Janusz)
- Remove const from struct i915_wa list allocation (Kees)
- Apply SPDX license format where missing and use single-line format (Andi)
- Whitespace fixes (Dan, Andi)
- Selftest improvements (Mikolaj, Badal, Sk,
APU doesn't have second IH ring, so re-routing action here is a no-op.
It will take a lot of time to wait timeout from PSP during the
initialization. So remove the function in psp v12.
Signed-off-by: Huang Rui <ray.huang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
There is a problem occurring on VCN 4.0.5 where in some situations a job
is timing out. This triggers a job timeout which then causes a GPU
reset for recovery. That has exposed a number of issues with GPU reset
that have since been fixed. But also a GPU reset isn't actually needed
for this circumstance. Just restarting the ring is enough.
Add a reset callback for the ring which will stop and start VCN if the
issue happens.
smu_v13_0_display_clock_voltage_request() and
smu_v13_0_set_min_deep_sleep_dcefclk() were added in 2020 by
commit c05d1c401572 ("drm/amd/swsmu: add aldebaran smu13 ip support (v3)")
but have remained unused.
Remove them.
smu_v13_0_display_clock_voltage_request() was the only user
of smu_v13_0_set_hard_freq_limited_range(). Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The last use of smu_v11_0_get_dpm_level_range() was removed in 2020 by
commit 46a301e14e8a ("drm/amd/powerplay: drop unnecessary Navi1x specific
APIs")
Remove it.
Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
amd/amdkfd: Trigger segfault for early userptr unmmapping
If applications unmap the memory before destroying the userptr, it needs
trigger a segfault to notify user space to correct the free sequence in
VM debug mode.
v2: Send gpu access fault to user space
v3: Report gpu address to user space, remove unnecessary params
v4: update pr_err into one line, remove userptr log info
Signed-off-by: Shane Xiao <shane.xiao@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
In VM debug mode, it is desirable to notify the application
to correct the freeing sequence by unmapping the memory before
destroying the userptr in the old userptr path. Add a bitmask
to decide whether to send gpu vm fault to the applition.
Signed-off-by: Shane Xiao <shane.xiao@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Prike Liang [Tue, 6 May 2025 08:32:23 +0000 (16:32 +0800)]
drm/amdgpu: unreserve the gem BO before returning from attach error
It requires unlocking the reserved gem BO before returning from
attaching the eviction fence error.
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: promote the implicit sync to the dependent read fences
The driver doesn't want to implicitly sync on the DMA_RESV_USAGE_BOOKKEEP
usage fences, and the BOOKEEP fences should be synced explicitly. So, as
the VM implicit syncing only need to return and sync the dependent read
fences.
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Tue, 29 Apr 2025 13:45:03 +0000 (09:45 -0400)]
drm/amdgpu/psp: mark securedisplay TA as optional
This is an optional TA which is only available on
certain embedded systems. Mark it as optional to avoid
user confusion. This mirrors what we already do for
other optional TAs.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4181 Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Alex Deucher [Thu, 1 May 2025 17:46:46 +0000 (13:46 -0400)]
drm/amdgpu: fix pm notifier handling
Set the s3/s0ix and s4 flags in the pm notifier so that we can skip
the resource evictions properly in pm prepare based on whether
we are suspending or hibernating. Drop the eviction as processes
are not frozen at this time, we we can end up getting stuck trying
to evict VRAM while applications continue to submit work which
causes the buffers to get pulled back into VRAM.
v2: Move suspend flags out of pm notifier (Mario)
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178 Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support") Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ellen Pan [Tue, 29 Apr 2025 21:18:44 +0000 (17:18 -0400)]
drm/amdgpu: Implement unrecoverable error message handling for VFs
This notification may arrive in VF mailbox while polling for response from
another event.
This patches covers the following scenarios:
- If VF is already in RMA state, then do not attempt to contact the host.
Host will ignore the VF after sending the notification.
- If the notification is detected during polling, then set the RMA status,
and return error to caller.
- If the notification arrives by interrupt, then set the RMA status and
queue a reset. This reset will fail and VF will stop runtime services.
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ellen Pan [Tue, 29 Apr 2025 21:00:50 +0000 (17:00 -0400)]
drm/amdgpu: Add unrecoverable error message definitions for VFs
Host may stop runtime services after reaching a bad page threshold.
This notification will indicate to the VF that it no longer has
access to the GPU.
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
This breaks S4 because we end up setting the s3/s0ix flags
even when we are entering s4 since prepare is used by both
flows. The causes both the S3/s0ix and s4 flags to be set
which breaks several checks in the driver which assume they
are mutually exclusive.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3634 Cc: Mario Limonciello <mario.limonciello@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
The dma_resv_add_fence() already refers to the added fence.
So when attaching the evciton fence to the gem bo, it needn't
refer to it anymore.
Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ellen Pan [Tue, 29 Apr 2025 20:48:09 +0000 (16:48 -0400)]
drm/amdgpu: Implement Runtime Bad Page query for VFs
Host will send a notification when new bad pages are available.
Uopn guest request, the first 256 bad page addresses
will be placed into the PF2VF region.
Guest should pause the PF2VF worker thread while
the copy is in progress.
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ellen Pan [Tue, 29 Apr 2025 20:22:53 +0000 (16:22 -0400)]
drm/amdgpu: Add Runtime Bad Page message definitions for VFs
Currently VFs rely on poison consumption interrupt from HW
to kick off the bad page retirement process. Part of this process
includes a VF reset.
This patch adds the following:
1) Host Bad Pages notification message.
2) Guest request bad pages message.
When combined, VFs are able to reserve the pages early, and potentially
avoid future poison consumption that will disrupt user services
from consequent FLR.
Reviewed-by: Shravan Kumar Gande <Shravankumar.Gande@amd.com> Signed-off-by: Victor Skvortsov <victor.skvortsov@amd.com> Signed-off-by: Ellen Pan <yunru.pan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Harry Wentland [Mon, 28 Apr 2025 20:26:20 +0000 (16:26 -0400)]
drm/amd/display: Don't check for NULL divisor in fixpt code
[Why]
We check for a NULL divisor but don't act on it.
This check does nothing other than throw a warning.
It does confuse static checkers though:
See https://lkml.org/lkml/2025/4/26/371
[How]
Drop the ASSERTs in both DC and SPL variants.
Fixes: 4562236b3bc0 ("drm/amd/dc: Add dc display driver (v2)") Fixes: 6efc0ab3b05d ("drm/amd/display: add back quality EASF and ISHARP and dc dependency changes") Signed-off-by: Harry Wentland <harry.wentland@amd.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Leo Li <sunpeng.li@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Ivan Shamliev [Thu, 24 Apr 2025 15:14:53 +0000 (18:14 +0300)]
drm/amd/display: Use true/false for boolean variables in DML2 core files
Replace 0 and 1 with false and true for boolean variables in
dml2_core_dcn4_calcs.c and dml2_core_utils.c to align with the Linux
kernel coding style guidelines, which recommend using C99 bool type
with true/false values.
Signed-off-by: Ivan Shamliev <ivan.shamliev.dev@abv.bg> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
James Flowers [Sat, 3 May 2025 21:18:51 +0000 (14:18 -0700)]
drm/amd/display: adds kernel-doc comment for dc_stream_remove_writeback()
Adds kernel-doc for externally linked dc_stream_remove_writeback function.
Signed-off-by: James Flowers <bold.zone2373@fastmail.com> Reviewed-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Shuicheng Lin [Wed, 7 May 2025 02:23:02 +0000 (02:23 +0000)]
drm/xe: Release force wake first then runtime power
xe_force_wake_get() is dependent on xe_pm_runtime_get(), so for
the release path, xe_force_wake_put() should be called first then
xe_pm_runtime_put().
Combine the error path and normal path together with goto.
v2 (Matt):
refine commit message to have more details
add Fixes tag
move the code to xe_svm.h which already have the config
remove a blank line per codestyle suggestion
Fixes: 63f6e480d115 ("drm/xe: Add SVM garbage collector") Cc: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Shuicheng Lin <shuicheng.lin@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250502170052.1787973-1-shuicheng.lin@intel.com
Qiu-ji Chen [Wed, 6 Nov 2024 09:59:06 +0000 (17:59 +0800)]
drm/tegra: Fix a possible null pointer dereference
In tegra_crtc_reset(), new memory is allocated with kzalloc(), but
no check is performed. Before calling __drm_atomic_helper_crtc_reset,
state should be checked to prevent possible null pointer dereference.
Fixes: b7e0b04ae450 ("drm/tegra: Convert to using __drm_atomic_helper_crtc_reset() for reset.") Cc: stable@vger.kernel.org Signed-off-by: Qiu-ji Chen <chenqiuji666@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20241106095906.15247-1-chenqiuji666@gmail.com
Biju Das [Wed, 5 Feb 2025 11:21:35 +0000 (11:21 +0000)]
drm/tegra: rgb: Fix the unbound reference count
The of_get_child_by_name() increments the refcount in tegra_dc_rgb_probe,
but the driver does not decrement the refcount during unbind. Fix the
unbound reference count using devm_add_action_or_reset() helper.
Mikko Perttunen [Tue, 4 Feb 2025 02:45:46 +0000 (02:45 +0000)]
gpu: host1x: Remove mid-job CDMA flushes
The current code can issue CDMA flushes (DMAPUT bumps) in the middle
of a job, before all opcodes have been written into the pushbuffer.
This can happen when pushbuffer fills up. Presumably this made sense
at some point in the past, but it doesn't anymore, as it cannot lead
to more space appearing in the pushbuffer as it is only cleaned full
jobs at a time.
Mid-job flushes can also cause problems, as in an extreme situation
(seen in practice), the hardware can run through the entire pushbuffer
including the prefix of a partially written job without the driver
being able to process any CDMA updates. This can cause the engine
MLOCK to be taken and held for extended periods as the tail of the
job is not yet available to hardware.
Mikko Perttunen [Wed, 5 Feb 2025 06:10:27 +0000 (06:10 +0000)]
drm/tegra: falcon: Pipeline firmware copy
The Falcon DMA engine allows queueing multiple operations for
improved performance. Do this to optimize firmware loading.
A performance improvement of 4x to 6x is observed.
Co-developed-by: Ivan Raul Guadarrama <iguadarrama@nvidia.com> Signed-off-by: Ivan Raul Guadarrama <iguadarrama@nvidia.com> Signed-off-by: Mikko Perttunen <mperttunen@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Link: https://lore.kernel.org/r/20250205061027.1205748-1-mperttunen@nvidia.com
Jon Hunter [Mon, 28 Apr 2025 15:34:35 +0000 (16:34 +0100)]
drm/tegra: Remove unneeded include
The header file 'tegra_drm.h' is included in gem.c, but this file is
already include 'drm.h'. Although there is no harm in including this
file again, it is also not necessary. Furthermore, the header file is
located under 'include/uapi/drm' so ideally the full path would be
used to be explicit. For now, just remove from gem.c.
Changes to a plane's type after it has been registered aren't propagated
to userspace automatically. This could possibly be achieved by updating
the property, but since we can already determine which type this should
be before the registration, passing in the right type from the start is
a much better solution.
Jani Nikula [Wed, 7 May 2025 08:31:36 +0000 (11:31 +0300)]
drm/i915/rps: fix stale reference to i915->irq_lock
The RPS code has been switched from using i915->irq_lock to gt->irq_lock
years ago in commit d762043f7ab1 ("drm/i915: Extract GT powermanagement
interrupt handling"). Fix the stale comment referencing i915->irq_lock,
which has also ceased to exist.
drm/i915/gt: Remove const from struct i915_wa list allocation
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)
The assigned type is "struct i915_wa *". The returned type, while
technically matching, will be const qualified. As there is no general
way to remove const qualifiers, adjust the allocation type to match
the assignment.
Jani Nikula [Tue, 6 May 2025 13:06:50 +0000 (16:06 +0300)]
drm/i915/irq: move i915->irq_lock to display->irq.lock
Observe that i915->irq_lock is no longer used to protect anything
outside of display. Make it a display thing.
This allows us to remove the ugly #define irq_lock irq.lock hack from xe
compat header.
Note that this is slightly more subtle than it first looks. For i915,
there's no functional change here. The lock is moved. However, for xe,
we'll now have *two* locks, xe->irq.lock and display->irq.lock. These
should protect different things, though. Indeed, nesting in the past
would've lead to a deadlock because they were the same lock.
With the i915 references gone, we can make a handful more files
independent of i915_drv.h.
Jani Nikula [Tue, 6 May 2025 13:06:49 +0000 (16:06 +0300)]
drm/i915/rps: refactor display rps support
Make the gt rps code and display irq code interact via
intel_display_rps.[ch], instead of direct access. Add no-op static
inline stubs for xe instead of having a separate build unit doing
nothing. All of this clarifies the interfaces between i915 core and
display.
All users of vlv_display_irq_postinstall() outside of
intel_display_irq.c have a lock/unlock pair. Move the locking inside the
function. Add an unlocked variant for internal use, similar to the
_vlv_display_irq_reset() and vlv_display_irq_reset() functions.
drm/vkms: Adjust vkms_state->active_planes allocation type
In preparation for making the kmalloc family of allocators type aware,
we need to make sure that the returned type from the allocation matches
the type of the variable being assigned. (Before, the allocator would
always return "void *", which can be implicitly cast to any pointer type.)
The assigned type is "struct vkms_plane_state **", but the returned type
will be "struct drm_plane **". These are the same size (pointer size), but
the types don't match. Adjust the allocation type to match the assignment.
Signed-off-by: Kees Cook <kees@kernel.org> Reviewed-by: Louis Chauvet <louis.chauvet@bootlin.com> Fixes: 8b1865873651 ("drm/vkms: totally reworked crc data tracking") Link: https://lore.kernel.org/r/20250426061431.work.304-kees@kernel.org Signed-off-by: Louis Chauvet <contact@louischauvet.fr>
drm/xe/gsc: do not flush the GSC worker from the reset path
The workqueue used for the reset worker is marked as WQ_MEM_RECLAIM,
while the GSC one isn't (and can't be as we need to do memory
allocations in the gsc worker). Therefore, we can't flush the latter
from the former.
The reason why we had such a flush was to avoid interrupting either
the GSC FW load or in progress GSC proxy operations. GSC proxy
operations fall into 2 categories:
1) GSC proxy init: this only happens once immediately after GSC FW load
and does not support being interrupted. The only way to recover from
an interruption of the proxy init is to do an FLR and re-load the GSC.
2) GSC proxy request: this can happen in response to a request that
the driver sends to the GSC. If this is interrupted, the GSC FW will
timeout and the driver request will be failed, but overall the GSC
will keep working fine.
Flushing the work allowed us to avoid interruption in both cases (unless
the hang came from the GSC engine itself, in which case we're toast
anyway). However, a failure on a proxy request is tolerable if we're in
a scenario where we're triggering a GT reset (i.e., something is already
gone pretty wrong), so what we really need to avoid is interrupting
the init flow, which we can do by polling on the register that reports
when the proxy init is complete (as that ensure us that all the load and
init operations have been completed).
Note that during suspend we still want to do a flush of the worker to
make sure it completes any operations involving the HW before the power
is cut.
v2: fix spelling in commit msg, rename waiter function (Julia)
Fixes: dd0e89e5edc2 ("drm/xe/gsc: GSC FW load") Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4830 Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: John Harrison <John.C.Harrison@Intel.com> Cc: Alan Previn <alan.previn.teres.alexis@intel.com> Cc: <stable@vger.kernel.org> # v6.8+ Reviewed-by: Julia Filipchuk <julia.filipchuk@intel.com> Link: https://lore.kernel.org/r/20250502155104.2201469-1-daniele.ceraolospurio@intel.com
Currently userspace software systemd treats `brightness` and
`actual_brightness` identically due to a bug found in an out of tree
driver.
This however causes problems for in-tree drivers that use brightness
to report user requested `brightness` and `actual_brightness` to report
what the hardware actually has programmed.
Clarify the documentation to match the behavior described in commit 6ca017658b1f9 ("[PATCH] backlight: Backlight Class Improvements").
drm/amdgpu: only keep most recent fence for each context
Keep only the latest fences to reduce the number of values
given back to userspace
v2: - Export this code from dma-fence-unwrap.c(by Christian).
v3: - To split this in a dma_buf patch and amd userq patch(by Sunil).
- No need to add a new function just re-use existing(by Christian).
v4: Export dma_fence_dedub_array function and used it(by Christian).
Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
drm/amdgpu: Add Support for enforcing isolation without Cleaner Shader
Adjusted the enforce isolation setting handling to include the ability
to disable the cleaner shader without affecting isolation between tasks.
v2: Updated enforce isolation documentation and parameters. (Alex)
Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
dma-fence: Add helper to sort and deduplicate dma_fence arrays
Export a new helper function `dma_fence_dedup_array()` that sorts
an array of dma_fence pointers by context, then deduplicates the array
by retaining only the most recent fence per context.
This utility is useful when merging or optimizing sets of fences where
redundant entries from the same context can be pruned. The operation is
performed in-place and releases references to dropped fences using
dma_fence_put().
v2: - Export this code from dma-fence-unwrap.c(by Christian).
v3: - To split this in a dma_buf patch and amd userq patch(by Sunil).
- No need to add a new function just re-use existing(by Christian).
v4: - Export dma_fence_dedub_array and use it(by Christian).
Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Sunil Khatri <sunil.khatri@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Arvind Yadav <Arvind.Yadav@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>