drm/format-helper: Cache buffers with struct drm_format_conv_state
Hold temporary memory for format conversion in an instance of struct
drm_format_conv_state. Update internal helpers of DRM's format-conversion
code accordingly. Drivers will later be able to maintain this cache by
themselves.
Besides caching, struct drm_format_conv_state will be useful to hold
additional information for format conversion, such as palette data or
foreground/background colors. This will enable conversion from indexed
color formats to component-based formats.
v5:
* improve documentation (Javier, Noralf)
v3:
* rename struct drm_xfrm_buf to struct drm_format_conv_state
(Javier)
* remove managed cleanup
* add drm_format_conv_state_copy() for shadow-plane support
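As a rough sketch of the intended driver-side usage pattern (the helper
signatures below, drm_format_conv_state_init()/_reserve()/_release(), are
assumed from this series rather than quoted verbatim):

  /* Hedged sketch, not verbatim kernel code. */
  struct drm_format_conv_state conv_state;
  void *buf;

  drm_format_conv_state_init(&conv_state);

  /* Grow or reuse the cached conversion buffer for this update. */
  buf = drm_format_conv_state_reserve(&conv_state, dst_size, GFP_KERNEL);
  if (!buf)
          return -ENOMEM;

  /* ... pass &conv_state to the drm_fb_*() conversion helper ... */

  /* Free the cached memory once the plane state / driver is done with it. */
  drm_format_conv_state_release(&conv_state);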
Luben Tuikov [Sat, 11 Nov 2023 01:01:57 +0000 (20:01 -0500)]
Revert "drm/sched: Define pr_fmt() for DRM using pr_*()"
From Jani:
The drm_print.[ch] facilities use very few pr_*() calls directly. The
users of pr_*() calls do not necessarily include <drm/drm_print.h> at
all, and really don't have to.
Even the ones that do include it, usually have <linux/...> includes
first, and <drm/...> includes next. Notably, <linux/kernel.h> includes
<linux/printk.h>.
And, of course, <linux/printk.h> defines pr_fmt() itself if not already
defined.
Moreover, it's encouraged not to use pr_*() at all; prefer drm device
based logging, or device based logging.
Currently the DRM GPUVM offers common infrastructure to track GPU VA
allocations and mappings, generically connect GPU VA mappings to their
backing buffers and perform more complex mapping operations on the GPU VA
space.
However, there are more design patterns commonly used by drivers, which
can potentially be generalized in order to make the DRM GPUVM represent
a basis for GPU-VM implementations. In this context, this patch aims
at generalizing the following elements.
1) Provide a common dma-resv for GEM objects not being used outside of
this GPU-VM.
2) Provide tracking of external GEM objects (GEM objects which are
shared with other GPU-VMs).
3) Provide functions to efficiently lock all GEM objects dma-resv the
GPU-VM contains mappings of.
4) Provide tracking of evicted GEM objects the GPU-VM contains mappings
of, such that validation of evicted GEM objects is accelerated.
5) Provide some convenience functions for common patterns.
Big thanks to Boris Brezillon for his help to figure out locking for
drivers updating the GPU VA space within the fence signalling path.
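As a very loose sketch of the lock-all pattern from point 3 (helper names
such as drm_gpuvm_exec_lock() and drm_gpuvm_validate() are taken from this
series as summarized above and should be treated as assumptions):

  /* Hedged sketch, not verbatim kernel code. */
  struct drm_gpuvm_exec vm_exec = {
          .vm = gpuvm,
          .flags = DRM_EXEC_INTERRUPTIBLE_WAIT,
  };
  int ret;

  /* Lock the VM's common dma-resv plus all external GEM objects. */
  ret = drm_gpuvm_exec_lock(&vm_exec);
  if (ret)
          return ret;

  /* Re-validate evicted GEM objects the VM has mappings of. */
  ret = drm_gpuvm_validate(gpuvm, &vm_exec.exec);

  drm_gpuvm_exec_unlock(&vm_exec);
  return ret;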
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Suggested-by: Matthew Brost <matthew.brost@intel.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231108001259.15123-12-dakr@redhat.com
drm/gpuvm: add an abstraction for a VM / BO combination
Add an abstraction layer between the drm_gpuva mappings of a particular
drm_gem_object and this GEM object itself. The abstraction represents a
combination of a drm_gem_object and drm_gpuvm. The drm_gem_object holds
a list of drm_gpuvm_bo structures (the structure representing this
abstraction), while each drm_gpuvm_bo contains a list of mappings of this
GEM object.
This has multiple advantages:
1) We can use the drm_gpuvm_bo structure to attach it to various lists
of the drm_gpuvm. This is useful for tracking external and evicted
objects per VM, which is introduced in subsequent patches.
2) Finding mappings of a certain drm_gem_object mapped in a certain
drm_gpuvm becomes much cheaper.
3) Drivers can derive and extend the structure to easily represent
driver specific states of a BO for a certain GPUVM.
The idea of this abstraction was taken from amdgpu, hence the credit for
this idea goes to the developers of amdgpu.
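For illustration, a driver mapping a GEM object into a VM might use the
abstraction roughly as follows (a hedged sketch; the drm_gpuvm_bo_obtain()/
_put() and drm_gpuvm_bo_for_each_va() names follow this series and are
assumptions):

  /* Hedged sketch, not verbatim kernel code. */
  struct drm_gpuvm_bo *vm_bo;
  struct drm_gpuva *va;

  /* Look up or create the combination of this VM and GEM object. */
  vm_bo = drm_gpuvm_bo_obtain(gpuvm, obj);
  if (IS_ERR(vm_bo))
          return PTR_ERR(vm_bo);

  /* All mappings of obj in this VM are reachable through vm_bo. */
  drm_gpuvm_bo_for_each_va(va, vm_bo) {
          /* e.g. invalidate or rebind the mapping */
  }

  drm_gpuvm_bo_put(vm_bo);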
Cc: Christian König <christian.koenig@amd.com>
Acked-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Signed-off-by: Danilo Krummrich <dakr@redhat.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231108001259.15123-11-dakr@redhat.com
drm/nouveau: make use of the GPUVM's shared dma-resv
DRM GEM objects private to a single GPUVM can use a shared dma-resv.
Make use of the shared dma-resv of GPUVM rather than a driver specific
one.
The shared dma-resv originates from a "root" GEM object serving as
container for the dma-resv to make it compatible with drm_exec.
In order to make sure the object providing the shared dma-resv can't be
freed up before the objects making use of it, let every such GEM object
take a reference on it.
drm/gpuvm: add common dma-resv per struct drm_gpuvm
Provide a common dma-resv for GEM objects not being used outside of this
GPU-VM. This is used in a subsequent patch to generalize dma-resv,
external and evicted object handling and GEM validation.
drm/gpuvm: don't always WARN in drm_gpuvm_check_overflow()
Don't always WARN in drm_gpuvm_check_overflow() and separate it into a
drm_gpuvm_check_overflow() and a dedicated
drm_gpuvm_warn_check_overflow() variant.
This avoids printing warnings due to invalid userspace requests.
Gurchetan Singh [Wed, 18 Oct 2023 18:17:27 +0000 (11:17 -0700)]
drm/uapi: add explicit virtgpu context debug name
There are two problems with the current method of determining the
virtio-gpu debug name.
1) TASK_COMM_LEN is defined to be 16 bytes only, and this is a
Linux kernel idiom (see PR_SET_NAME + PR_GET_NAME). However,
Android/FreeBSD get around this via setprogname(..)/getprogname(..)
in libc.
On Android, names longer than 16 bytes are common. For example,
one often encounters a program like "com.android.systemui".
The virtio-gpu spec allows the debug name to be up to 64 bytes, so
ideally userspace should be able to set debug names up to 64 bytes.
2) The current implementation determines the debug name using whatever
task initiated virtgpu. This could be a "RenderThread" of a
larger program, when we actually want to propagate the debug name
of the program.
To fix these issues, add a new CONTEXT_INIT param that allows userspace
to set the debug name when creating a context.
It takes a null-terminated C-string as the param value. The length of the
string (excluding the terminator) **should** be <= 64 bytes. Otherwise,
the debug_name will be truncated to 64 bytes.
Link to open-source userspace:
https://android-review.googlesource.com/c/platform/hardware/google/gfxstream/+/2787176
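A hedged userspace sketch of the new param (assuming it is exposed as
VIRTGPU_CONTEXT_PARAM_DEBUG_NAME and takes a pointer to the string as its
value):

  /* Hedged sketch, not verbatim userspace code. */
  #include <stdint.h>
  #include <xf86drm.h>
  #include <drm/virtgpu_drm.h>

  static int set_ctx_debug_name(int fd, const char *name)
  {
          struct drm_virtgpu_context_set_param param = {
                  .param = VIRTGPU_CONTEXT_PARAM_DEBUG_NAME, /* assumed name */
                  .value = (uintptr_t)name, /* null-terminated, <= 64 bytes */
          };
          struct drm_virtgpu_context_init init = {
                  .num_params = 1,
                  .ctx_set_params = (uintptr_t)&param,
          };

          return drmIoctl(fd, DRM_IOCTL_VIRTGPU_CONTEXT_INIT, &init);
  }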
drm/panfrost: Set regulators on/off during system sleep on MediaTek SoCs
All of the MediaTek SoCs supported by Panfrost can completely cut power
to the GPU during full system sleep without any user-noticeable delay
in the resume operation, as shown by measurements taken on multiple
MediaTek SoCs (MT8183/86/92/95).
As an example, for MT8195 - a "before" with only runtime PM operations
(so, without turning on/off regulators), and an "after" executing both
the system sleep .resume() handler and .runtime_resume() (so the time
refers to T_Resume + T_Runtime_Resume):
Average Panfrost-only system sleep resume time, before: ~33500ns
Average Panfrost-only system sleep resume time, after: ~336200ns
Keep in mind that this additional ~308200 nanoseconds delay happens only
in resume from a full system suspend, and not in runtime PM operations,
hence it is acceptable.
Measurements were also taken on MT8186, showing a delay of ~312000 ns.
Testing of this happened on all of the aforementioned MediaTek SoCs, but:
- MT8183 got tested only by KernelCI with <=10 suspend/resume cycles
- MT8186, MT8192, MT8195 were tested manually with over 100 suspend/resume
  cycles with GNOME DE (Mutter + Wayland).
drm/panfrost: Implement ability to turn on/off regulators in suspend
Some platforms/SoCs can power off the GPU entirely by completely cutting
off power, greatly enhancing battery time during system suspend: add a
new pm_feature GPU_PM_VREG_OFF to allow turning off the GPU regulators
during full suspend only on selected platforms.
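In the driver this boils down to feature-bit checks in the system-sleep
path, roughly like the sketch below (the pfdev field names and the BIT()
usage are assumptions based on this series):

  /* Hedged sketch of the system-sleep suspend path. */
  if (pfdev->comp->pm_features & BIT(GPU_PM_CLK_DIS))
          clk_disable_unprepare(pfdev->clock);
  if (pfdev->comp->pm_features & BIT(GPU_PM_VREG_OFF))
          regulator_bulk_disable(pfdev->comp->num_supplies, pfdev->regulators);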
drm/panfrost: Set clocks on/off during system sleep on MediaTek SoCs
All of the MediaTek SoCs supported by Panfrost can switch the clocks
off and on during system sleep to save some power without any user
experience penalty.
Measurements taken on multiple MediaTek SoCs (MT8183/8186/8192/8195)
show that adding this will not prolong the time that is required to
resume the system in any meaningful way.
As an example, for MT8195 - a "before" with only runtime PM operations
(so, without turning on/off GPU clocks), and an "after" executing both
the system sleep .resume() handler and .runtime_resume() (so the time
refers to T_Resume + T_Runtime_Resume):
Average Panfrost-only system sleep resume time, before: ~28000ns
Average Panfrost-only system sleep resume time, after: ~33500ns
drm/panfrost: Implement ability to turn on/off GPU clocks in suspend
Currently, the GPU is being internally powered off for runtime suspend
and turned back on for runtime resume through commands sent to it, but
note that the GPU doesn't need to be clocked during the poweroff state,
hence it is possible to save some power on selected platforms.
Add suspend and resume handlers for full system sleep and then add
a new panfrost_gpu_pm enumeration and a pm_features variable in the
panfrost_compatible structure: BIT(GPU_PM_CLK_DIS) will be used to
enable this power saving technique only on SoCs that are able to
safely use it.
Note that this was implemented only for the system sleep case and not
for runtime PM because testing on one of my MediaTek platforms showed
issues when turning on and off clocks aggressively (in PM runtime)
resulting in a full system lockup.
Doing this only for full system sleep never showed issues during my
testing by suspending and resuming the system continuously for more
than 100 cycles.
drm/panfrost: Tighten polling for soft reset and power on
In many cases, soft reset takes more than 1 microsecond, but definitely
less than 10; moreover, in the poweron flow, tilers, shaders and l2 will
each become ready in less than 10 microseconds as well.
Even in the cases (at least on my platforms, rarely) in which those take
more than 10 microseconds, it's very unlikely for both soft reset and
poweron to take more than 70 microseconds.
Shorten the polling delay to 10 microseconds to consistently reduce the
runtime resume time of the GPU.
As an indicative example, here are measurements taken on a MediaTek MT8195
SoC. Average runtime resume time in nanoseconds before this commit:
GDM, user selection up/down: 88435ns
GDM, Text Entry (typing user/password): 91489ns
GNOME Desktop, idling, GKRELLM running: 73200ns
After this commit:
GDM: user selection up/down: 26690ns
GDM: Text Entry (typing user/password): 27917ns
GNOME Desktop, idling, GKRELLM running: 25304ns
drm/panfrost: Perform hard reset to recover GPU if soft reset fails
Even though soft reset should ideally never fail, during development of
some power management features I managed to get some bits wrong: this
resulted in GPU soft reset failures, where the GPU was never able to
recover, not even after suspend/resume cycles, meaning that the only
way to get functionality back was to reboot the machine.
Perform a hard reset after a soft reset failure to be able to recover
the GPU during runtime (so, without any machine reboot).
drm/panfrost: Really power off GPU cores in panfrost_gpu_power_off()
The layout of the registers {TILER,SHADER,L2}_PWROFF_LO, used to request
powering off cores, is the same as the {TILER,SHADER,L2}_PWRON_LO ones:
this means that in order to request poweroff of cores, we are supposed
to write a bitmask of cores that should be powered off!
This means that the panfrost_gpu_power_off() function has always been
doing nothing.
Fix powering off the GPU by writing a bitmask of the cores to poweroff
to the relevant PWROFF_LO registers and then check that the transition
(from ON to OFF) has finished by polling the relevant PWRTRANS_LO
registers.
While at it, in order to avoid code duplication, move the core mask
logic from panfrost_gpu_power_on() to a new panfrost_get_core_mask()
function, used in both poweron and poweroff.
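A hedged sketch of the corrected sequence for the shader cores (the register
names follow the existing panfrost register naming; the exact polling
parameters are assumptions):

  /* Hedged sketch, not the exact driver hunk. */
  u64 core_mask = panfrost_get_core_mask(pfdev);
  u32 val;

  /* Request poweroff of the present shader cores... */
  gpu_write(pfdev, SHADER_PWROFF_LO, pfdev->features.shader_present & core_mask);

  /* ...and poll until the ON -> OFF power transition has finished. */
  readl_relaxed_poll_timeout(pfdev->iomem + SHADER_PWRTRANS_LO,
                             val, !val, 1, 1000);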
Danilo Krummrich [Fri, 10 Nov 2023 00:16:33 +0000 (01:16 +0100)]
drm/sched: implement dynamic job-flow control
Currently, job flow control is implemented simply by limiting the number
of jobs in flight. Therefore, a scheduler is initialized with a credit
limit that corresponds to the number of jobs which can be sent to the
hardware.
This implies that for each job, drivers need to account for the maximum
job size possible in order to not overflow the ring buffer.
However, there are drivers, such as Nouveau, where the job size has a
rather large range. For such drivers it can easily happen that job
submissions not even filling the ring by 1% can block subsequent
submissions, which, in the worst case, can lead to the ring running dry.
In order to overcome this issue, allow for tracking the actual job size
instead of the number of jobs. Therefore, add a field to track a job's
credit count, which represents the number of credits a job contributes
to the scheduler's credit limit.
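On the driver side this looks roughly like the sketch below, where each job
reports the ring space it actually consumes instead of counting as a single
unit (the post-change drm_sched_job_init() signature shown here and the
ring_dwords field are assumptions):

  /* Hedged sketch: credits are in driver-chosen units, e.g. ring dwords. */
  ret = drm_sched_job_init(&job->base, entity, job->ring_dwords, owner);
  if (ret)
          return ret;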
Luben Tuikov [Fri, 10 Nov 2023 00:11:46 +0000 (19:11 -0500)]
drm/sched: Define pr_fmt() for DRM using pr_*()
Define pr_fmt() as "[drm] " for DRM code using pr_*() facilities, especially
when no devices are available. This makes it easier to browse kernel logs.
Hsin-Yi Wang [Tue, 7 Nov 2023 20:41:52 +0000 (12:41 -0800)]
drm/panel-edp: drm/panel-edp: Fix AUO B116XTN02 name
Rename AUO 0x235c B116XTN02 to B116XTN02.3 according to decoding edid.
Fixes: 3db2420422a5 ("drm/panel-edp: Add AUO B116XTN02, BOE NT116WHM-N21,836X2, NV116WHM-N49 V8.0")
Cc: stable@vger.kernel.org
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20231107204611.3082200-3-hsinyi@chromium.org
Hsin-Yi Wang [Tue, 7 Nov 2023 20:41:51 +0000 (12:41 -0800)]
drm/panel-edp: drm/panel-edp: Fix AUO B116XAK01 name and timing
Rename AUO 0x405c B116XAK01 to B116XAK01.0 and adjust the timing of
auo_b116xak01: T3=200, T12=500, T7_max = 50 according to decoding edid
and datasheet.
Fixes: da458286a5e2 ("drm/panel: Add support for AUO B116XAK01 panel")
Cc: stable@vger.kernel.org
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
Reviewed-by: Douglas Anderson <dianders@chromium.org>
Acked-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20231107204611.3082200-2-hsinyi@chromium.org
Luben Tuikov [Thu, 9 Nov 2023 23:53:26 +0000 (18:53 -0500)]
drm/sched: Qualify drm_sched_wakeup() by drm_sched_entity_is_ready()
Don't "wake up" the GPU scheduler unless the entity is ready, as well as we
can queue to the scheduler, i.e. there is no point in waking up the scheduler
for the entity unless the entity is ready.
Jani Nikula [Tue, 31 Oct 2023 10:16:41 +0000 (12:16 +0200)]
drm/edid: use a temp variable for sads to drop one level of dereferences
Use a temporary variable of type struct cea_sad *, instead of using struct
cea_sad ** directly with the double dereferences. It's arguably easier
on the eyes, and drops a set of parentheses too.
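Schematically (an illustrative shape of the change, not the exact hunk):

  /* before: double dereference on every access */
  (*sads)[i].format = format;

  /* after: one temporary pointer, single dereference */
  struct cea_sad *sad = &(*sads)[i];

  sad->format = format;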
Jani Nikula [Tue, 31 Oct 2023 10:16:38 +0000 (12:16 +0200)]
drm/edid: split out drm_eld.h from drm_edid.h
The drm_edid.[ch] files are starting to be a bit crowded, and with plans
to add more ELD related functionality, it's perhaps cleanest to split
the ELD code out to a header of its own.
Include drm_eld.h from drm_edid.h for starters, and leave it to
follow-up work to only include drm_eld.h where needed.
Maxime Ripard [Wed, 25 Oct 2023 13:24:28 +0000 (15:24 +0200)]
drm/todo: Add entry to clean up former seltests suites
Most of those suites are undocumented and aren't really clear about what
they are testing. Let's add a TODO entry as a future task for getting
started with KUnit and DRM.
Maxime Ripard [Wed, 25 Oct 2023 13:24:27 +0000 (15:24 +0200)]
drm/tests: Remove slow tests
Both the drm_buddy and drm_mm tests have been converted from selftest to
kunit recently.
However, a significant portion of them are trying to exercise some part of
their API over a huge number of iterations and with random variations of
their parameters. They are thus more a way to discover new bugs than
actual unit tests.
This is fine in itself but leads to very slow runtime (up to 25s for
some drm_test_mm_reserve and drm_test_mm_insert on a Ryzen 7950x while
running the tests in qemu) which make them a poor fit for kunit.
Let's remove those tests from the drm_mm and drm_buddy test suites for
now, and if it's ever needed we can always create proper unit tests for
them later on.
This brought the entire DRM tests execution time (as of v6.6-rc1) down from
65s to less than .5s on a Ryzen 7950x system when running under qemu,
and from 9 minutes to about 4s on a RaspberryPi4.
Usages of DRM_IVPU_BO_UNCACHED should be replaced by DRM_IVPU_BO_WC.
There is no functional benefit from DRM_IVPU_BO_UNCACHED if these
buffers are never mapped to the host VM.
This allows cutting the buffer handling code in the kernel driver
in half.
Usage of DRM_IVPU_BO_UNCACHED buffers was removed from user-space
driver and will not be part of first UMD release.
accel/ivpu: Allocate vpu_addr in gem->open() callback
Use the gem->open() callback to simplify the code and prepare for the
gem_shmem conversion. It is called during handle creation for a gem object,
during prime import and in the BO_CREATE ioctl, hence it can be used for
vpu_addr allocation. Along the way, remove the unused bo->user_ptr field.
Luben Tuikov [Thu, 2 Nov 2023 22:17:20 +0000 (18:17 -0400)]
drm/sched: Don't disturb the entity when in RR-mode scheduling
Don't call drm_sched_select_entity() in drm_sched_run_job_queue(). In fact,
rename __drm_sched_run_job_queue() to just drm_sched_run_job_queue(), and let
it do just that, schedule the work item for execution.
The problem is that drm_sched_run_job_queue() calls drm_sched_select_entity()
to determine if the scheduler has an entity ready in one of its run-queues,
and in the case of the Round-Robin (RR) scheduling, the function
drm_sched_rq_select_entity_rr() does just that, selects the _next_ entity
which is ready, sets up the run-queue and completion and returns that
entity. The FIFO scheduling algorithm is unaffected.
Now, since drm_sched_run_job_work() also calls drm_sched_select_entity(), then
in the case of RR scheduling, that would result in drm_sched_select_entity()
having been called twice, which may result in skipping a ready entity if more
than one entity is ready. This commit fixes this by eliminating the call to
drm_sched_select_entity() from drm_sched_run_job_queue(), and leaves it only
in drm_sched_run_job_work().
v2: Rebased on top of Tvrtko's renames series of patches. (Luben)
Add fixes-tag. (Tvrtko)
drm/v3d: Expose the total GPU usage stats on sysfs
The previous patch exposed the accumulated amount of active time per
client for each V3D queue. But this doesn't provide a global notion of
the GPU usage.
Therefore, provide the accumulated amount of active time for each V3D
queue (BIN, RENDER, CSD, TFU and CACHE_CLEAN), considering all the jobs
submitted to the queue, independent of the client.
This data is exposed through the sysfs interface, so that if the
interface is queried at two different points of time the usage percentage
of each of the queues can be calculated.
Co-developed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Acked-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230905213416.1290219-3-mcanal@igalia.com
drm/v3d: Implement show_fdinfo() callback for GPU usage stats
This patch exposes the accumulated amount of active time per client
through the fdinfo infrastructure. The amount of active time is exposed
for each V3D queue: BIN, RENDER, CSD, TFU and CACHE_CLEAN.
In order to calculate the amount of active time per client, a CPU clock
is used through the function local_clock(). The point where the job has
started is marked and is finally compared with the time when the job
finished.
Moreover, the number of jobs submitted to each queue is also exposed on
fdinfo through the identifier "v3d-jobs-<queue>".
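The bookkeeping is conceptually a start/stop timestamp pair per queue,
roughly as sketched below (the field names are illustrative, not the
driver's actual ones):

  /* Hedged sketch of the accounting. */
  #include <linux/sched/clock.h>

  /* When a job is handed to the queue: */
  queue->start_ns = local_clock();

  /* When the job's done fence signals: */
  queue->enabled_ns += local_clock() - queue->start_ns;
  queue->jobs_sent++;

  /* show_fdinfo() then prints the accumulated time and the per-queue
   * "v3d-jobs-<queue>" counter mentioned above. */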
Co-developed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Acked-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Melissa Wen <mwen@igalia.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20230905213416.1290219-3-mcanal@igalia.com
drm: Do not round to megabytes for greater than 1MiB sizes in fdinfo stats
It is better not to lose precision and not revert to 1 MiB size
granularity for every size greater than 1 MiB.
Sizes in KiB should not be so troublesome to read (and in fact machine
parsing is, I expect, the norm here), they align with other APIs like
/proc/meminfo, and they allow writing tests for the interface without
having to embed drm.ko implementation knowledge into them. (Like knowing
that minimum buffer size one can use for successful verification has to be
1MiB aligned, and on top account for any pre-existing memory utilisation
outside of driver's control.)
But probably even more importantly, I think that it is just better to show
the accurate sizes and not arbitrarily lose precision for a little bit of a
stretched use case of eyeballing fdinfo text directly.
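Schematically, the printing switches from whole-MiB rounding to KiB
granularity, along these lines (a sketch, not the exact drm_file.c code):

  /* old (sketch): everything >= 1 MiB collapsed to whole MiB */
  drm_printf(p, "drm-total-%s: %llu MiB\n", region, size >> 20);

  /* new (sketch): keep KiB precision for everything >= 1 KiB */
  drm_printf(p, "drm-total-%s: %llu KiB\n", region, size >> 10);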
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Adrián Larumbe <adrian.larumbe@collabora.com>
Cc: steven.price@arm.com
Reviewed-by: Steven Price <steven.price@arm.com>
Link: https://lore.kernel.org/r/20230927133843.247957-2-tvrtko.ursulin@linux.intel.com
Signed-off-by: Maxime Ripard <mripard@kernel.org>
Tvrtko Ursulin [Thu, 2 Nov 2023 10:55:38 +0000 (10:55 +0000)]
drm/sched: Drop suffix from drm_sched_wakeup_if_can_queue
Because a) the helper is exported to other parts of the scheduler and
b) there isn't a plain drm_sched_wakeup to begin with, I think we can
drop the suffix and by doing so separate the intimate knowledge
between the scheduler components a bit better.
Tvrtko Ursulin [Thu, 2 Nov 2023 10:55:37 +0000 (10:55 +0000)]
drm/sched: Rename drm_sched_run_job_queue_if_ready and clarify kerneldoc
"If ready" is not immediately clear what it means - is the scheduler
ready or something else? Drop the suffix, clarify kerneldoc, and employ
the same naming scheme as in drm_sched_run_free_queue:
- drm_sched_run_job_queue - enqueues if there is something to enqueue
*and* scheduler is ready (can queue)
- __drm_sched_run_job_queue - low-level helper to simply queue the job
Tvrtko Ursulin [Thu, 2 Nov 2023 10:55:36 +0000 (10:55 +0000)]
drm/sched: Rename drm_sched_free_job_queue to be more descriptive
The current name makes it sound like the helper will free a queue, while
what it actually does is enqueue the free job worker.
Rename it to drm_sched_run_free_queue to align with existing
drm_sched_run_job_queue.
Despite creating the illusion that there are two queues, while in reality
there is only one, at least it creates a consistent naming for the two
enqueuing helpers.
At the same time, simplify the "if done" helper by dropping the suffix and
adding a double underscore prefix to the one which just enqueues.
Tvrtko Ursulin [Thu, 2 Nov 2023 10:55:35 +0000 (10:55 +0000)]
drm/sched: Move free worker re-queuing out of the if block
Whether or not there are more jobs to clean up does not depend on the
existence of the current job, given that both drm_sched_get_finished_job and
drm_sched_free_job_queue_if_done take and drop the job list lock.
Therefore it is confusing to make it read like there is a dependency.
Tvrtko Ursulin [Thu, 2 Nov 2023 10:55:34 +0000 (10:55 +0000)]
drm/sched: Rename drm_sched_get_cleanup_job to be more descriptive
"Get cleanup job" makes it sound like helper is returning a job which will
execute some cleanup, or something, while the kerneldoc itself accurately
says "fetch the next _finished_ job". So lets rename the helper to be self
documenting.
accel/qaic: Support for 0 resize slice execution in BO
Add support to partially execute a slice which is resized to zero.
Executing a zero-size slice in a BO should mean that there are no DMA
transfers involved, but the doorbell and semaphores should still be configured.
For example, consider a BO of size 18K which is sliced into three 6K slices,
and the user calls the partial execute ioctl with a resize of 10K.
slice 0 - size is 6k and offset is 0, so a resize of 10K will not cut this
          slice short, hence we send the entire slice for execution.
slice 1 - size is 6k and offset is 6k, so a resize of 10K will cut this slice
          short and only the first 4k should be DMA'd, along with configuring
          the doorbell and semaphores.
slice 2 - size is 6k and offset is 12k, so a resize of 10k will cut this slice
          short and no DMA transfer would be involved, but we should still
          configure the doorbell and semaphores.
This change also requires changing the behavior of a 0 resize. Currently, a
0 resize partial execute ioctl behaves exactly like the execute ioctl, i.e.
no resize. After this patch all the slices in the BO should behave exactly
like slice 2 in the above example.
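The per-slice arithmetic can be summarized by the illustrative helper below
(not driver code); for the 18K BO with 6K slices and a 10K resize it yields
6K, 4K and 0 for slices 0, 1 and 2 respectively:

  static u64 slice_dma_len(u64 slice_offset, u64 slice_size, u64 resize)
  {
          if (resize <= slice_offset)
                  return 0;              /* slice 2: no DMA, doorbell/semaphores only */
          if (resize >= slice_offset + slice_size)
                  return slice_size;     /* slice 0: whole slice is transferred */
          return resize - slice_offset;  /* slice 1: only the leading part is DMA'd */
  }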
Refactor copy_partial_exec_reqs() to make it more readable and less
complex.
Carl Vanderlip [Fri, 27 Oct 2023 18:08:10 +0000 (12:08 -0600)]
accel/qaic: Quiet array bounds check on DMA abort message
Current wrapper is right-sized to the message being transferred;
however, this is smaller than the structure defining message wrappers
since the trailing element is a union of message/transfer headers of
various sizes (8 and 32 bytes on 32-bit system where issue was
reported). Using the smaller header with a small message
(wire_trans_dma_xfer is 24 bytes including header) ends up being smaller
than a wrapper with the larger header. There are no accesses outside of
the defined size, however they are possible if the larger union member
is referenced.
Abort messages are outside of hot-path and changing the wrapper struct
would require a larger rewrite, so having the memory allocated to the
message be 8 bytes too big is acceptable.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310182253.bcb9JcyJ-lkp@intel.com/
Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231027180810.4873-1-quic_jhugo@quicinc.com
Update a number of register addresses that have changed in Raspberry Pi 5
(V3D 7.1) and make the code use the corresponding registers and addresses
based on the actual V3D version.
Thierry Reding [Wed, 1 Nov 2023 17:20:17 +0000 (18:20 +0100)]
fbdev/simplefb: Add support for generic power-domains
The simple-framebuffer device tree bindings document the power-domains
property, so make sure that simplefb supports it. This ensures that the
power domains remain enabled as long as simplefb is active.
v2: - remove unnecessary call to simplefb_detach_genpds() since that's
already done automatically by devres
- fix crash if power-domains property is missing in DT
Thierry Reding [Wed, 1 Nov 2023 17:20:16 +0000 (18:20 +0100)]
fbdev/simplefb: Support memory-region property
The simple-framebuffer bindings specify that the "memory-region"
property can be used as an alternative to the "reg" property to define
the framebuffer memory used by the display hardware. Implement support
for this in the simplefb driver.
Matthew Brost [Tue, 31 Oct 2023 03:24:37 +0000 (20:24 -0700)]
drm/sched: Split free_job into own work item
Rather than call free_job and run_job in same work item have a dedicated
work item for each. This aligns with the design and intended use of work
queues.
v2:
- Test for DMA_FENCE_FLAG_TIMESTAMP_BIT before setting
timestamp in free_job() work item (Danilo)
v3:
- Drop forward dec of drm_sched_select_entity (Boris)
- Return in drm_sched_run_job_work if entity NULL (Boris)
v4:
- Replace dequeue with peek and invert logic (Luben)
- Wrap to 100 lines (Luben)
- Update comments for *_queue / *_queue_if_ready functions (Luben)
v5:
- Drop peek argument, blindly reinit idle (Luben)
- s/drm_sched_free_job_queue_if_ready/drm_sched_free_job_queue_if_done (Luben)
- Update work_run_job & work_free_job kernel doc (Luben)
v6:
- Do not move drm_sched_select_entity in file (Luben)
Matthew Brost [Tue, 31 Oct 2023 03:24:36 +0000 (20:24 -0700)]
drm/sched: Convert drm scheduler to use a work queue rather than kthread
In Xe, the new Intel GPU driver, a choice has been made to have a 1 to 1
mapping between a drm_gpu_scheduler and drm_sched_entity. At first this
seems a bit odd, but let us explain the reasoning below.
1. In Xe the submission order from multiple drm_sched_entity is not
guaranteed to match the completion order, even if targeting the same
hardware engine. This is because in Xe we have a firmware scheduler, the
GuC, which is allowed to reorder, timeslice, and preempt submissions. If
a shared drm_gpu_scheduler is used across multiple drm_sched_entity, the
TDR falls apart as the TDR expects submission order == completion order.
Using a dedicated drm_gpu_scheduler per drm_sched_entity solves this
problem.
2. In Xe submissions are done via programming a ring buffer (circular
buffer), and a drm_gpu_scheduler provides a limit on the number of jobs;
if the limit on the number of jobs is set to RING_SIZE / MAX_SIZE_PER_JOB
we get flow control on the ring for free.
A problem with this design is that a drm_gpu_scheduler currently uses a
kthread for submission / job cleanup. This doesn't scale if a large
number of drm_gpu_schedulers are used. To work around the scaling issue,
use a worker rather than a kthread for submission / job cleanup.
v2:
- (Rob Clark) Fix msm build
- Pass in run work queue
v3:
- (Boris) don't have loop in worker
v4:
- (Tvrtko) break out submit ready, stop, start helpers into own patch
v5:
- (Boris) default to ordered work queue
v6:
- (Luben / checkpatch) fix alignment in msm_ringbuffer.c
- (Luben) s/drm_sched_submit_queue/drm_sched_wqueue_enqueue
- (Luben) Update comment for drm_sched_wqueue_enqueue
- (Luben) Positive check for submit_wq in drm_sched_init
- (Luben) s/alloc_submit_wq/own_submit_wq
v7:
- (Luben) s/drm_sched_wqueue_enqueue/drm_sched_run_job_queue
v8:
- (Luben) Adjust var names / comments
Matthew Brost [Tue, 31 Oct 2023 03:24:35 +0000 (20:24 -0700)]
drm/sched: Add drm_sched_wqueue_* helpers
Add scheduler wqueue ready, stop, and start helpers to hide the
implementation details of the scheduler from the drivers.
v2:
- s/sched_wqueue/sched_wqueue (Luben)
- Remove the extra white line after the return-statement (Luben)
- update drm_sched_wqueue_ready comment (Luben)
Christian König [Fri, 8 Sep 2023 08:27:23 +0000 (10:27 +0200)]
dma-buf: add dma_fence_timestamp helper
When a fence signals there is a very small race window where the timestamp
isn't updated yet. sync_file solves this by busy waiting for the
timestamp to appear, but other code paths didn't handle this
correctly.
Provide a dma_fence_timestamp() helper function for this and use it in
all appropriate cases.
Another alternative would be to grab the spinlock when that happens.
v2 by teddy: add a wait parameter to wait for the timestamp to show up, in case
the accurate timestamp is needed and/or the timestamp is not based on
ktime (e.g. hw timestamp)
v3 chk: drop the parameter again for unified handling
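The resulting helper is roughly of this shape (a sketch of the logic
described above, not a verbatim copy of the dma-buf code):

  static inline ktime_t dma_fence_timestamp(struct dma_fence *fence)
  {
          if (WARN_ON(!test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)))
                  return ktime_get();

          /* Close the race: wait for the signaler to publish the timestamp. */
          while (!test_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags))
                  cpu_relax();

          return fence->timestamp;
  }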
VPU was rebranded as NPU (Neural Processing Unit), so user-facing
strings have to be updated, but the code remains as-is and the module
is still called intel_vpu.ko.
Karol Wachowski [Sat, 28 Oct 2023 15:59:34 +0000 (17:59 +0200)]
accel/ivpu: Make DMA allocations for MMU600 write combined
Previously the driver created cache coherent (mapped as write-back)
DMA mappings.
Because we disable MMU600 snooping it was required to do costly
page walk and cache flushes after each page table modification.
With write-combined buffers it's possible to do a single write memory
barrier to flush the write-combined buffer to memory, which simplifies the
driver and significantly reduces the time of map/unmap operations.
Mapping time of 255 MB is reduced from 2.5 ms to 500 us.
A process waiting for a particular condition will go back to sleep after
wake_up() if the condition is not met. Add an abort flag to wake up IPC
receivers, which will then finish with an -ECANCELED error.
This is only needed for reset; runtime power management prevents
suspending the VPU when there is pending IPC processing or a pending job.
Stop the job_done thread when going to suspend. Use kthread_park() instead
of kthread_stop() to avoid memory allocation and potential failure
on resume.
Use a separate function as the thread wake-up condition. Use a spin lock to
ensure rx_msg_list is properly protected against concurrent access. This
avoids a race condition where the rx_msg_list list is modified and read in
ivpu_ipc_receive() at the same time.
accel/ivpu/40xx: Allow to change profiling frequency
Profiling freq is a debug firmware feature. It switches default clock
to higher resolution for fine-grained and more accurate firmware task
profiling. We already configure it during boot up of VPU4.
Add a debugfs knob and per-HW-generation helpers that allow changing it.
For vpu37xx the implementation is empty, as the profiling frequency can only
be changed on VPU4 or newer.
The DMA-fence annotations cause a lockdep warning (see below). As per
https://patchwork.freedesktop.org/patch/462170/ it sounds like the
annotations don't work correctly.
======================================================
WARNING: possible circular locking dependency detected
6.5.0-rc2+ #2 Not tainted
------------------------------------------------------
kmstest/219 is trying to acquire lock: c4705838 (&hdmi->lock){+.+.}-{3:3}, at: hdmi5_bridge_mode_set+0x1c/0x50
but task is already holding lock: c11e1128 (dma_fence_map){++++}-{0:0}, at: omap_atomic_commit_tail+0x14/0xbc
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
3 locks held by kmstest/219:
#0: f1011de4 (crtc_ww_class_acquire){+.+.}-{0:0}, at: drm_mode_atomic_ioctl+0xf0/0xc38
#1: c47059c8 (crtc_ww_class_mutex){+.+.}-{3:3}, at: modeset_lock+0xf8/0x230
#2: c11e1128 (dma_fence_map){++++}-{0:0}, at: omap_atomic_commit_tail+0x14/0xbc
stack backtrace:
CPU: 1 PID: 219 Comm: kmstest Not tainted 6.5.0-rc2+ #2
Hardware name: Generic DRA74X (Flattened Device Tree)
unwind_backtrace from show_stack+0x10/0x14
show_stack from dump_stack_lvl+0x58/0x70
dump_stack_lvl from check_noncircular+0x164/0x198
check_noncircular from __lock_acquire+0x145c/0x29cc
__lock_acquire from lock_acquire.part.0+0xb4/0x258
lock_acquire.part.0 from __mutex_lock+0x90/0x950
__mutex_lock from mutex_lock_nested+0x1c/0x24
mutex_lock_nested from hdmi5_bridge_mode_set+0x1c/0x50
hdmi5_bridge_mode_set from drm_bridge_chain_mode_set+0x48/0x5c
drm_bridge_chain_mode_set from crtc_set_mode+0x188/0x1d0
crtc_set_mode from omap_atomic_commit_tail+0x2c/0xbc
omap_atomic_commit_tail from commit_tail+0x9c/0x188
commit_tail from drm_atomic_helper_commit+0x158/0x18c
drm_atomic_helper_commit from drm_atomic_commit+0xa4/0xe8
drm_atomic_commit from drm_mode_atomic_ioctl+0x9a4/0xc38
drm_mode_atomic_ioctl from drm_ioctl+0x210/0x4a8
drm_ioctl from sys_ioctl+0x138/0xf00
sys_ioctl from ret_fast_syscall+0x0/0x1c
Exception stack(0xf1011fa8 to 0xf1011ff0)
1fa0: 00466d58 be9ab510 00000003 c03864bc be9ab510 be9ab4e0
1fc0: 00466d58 be9ab510 c03864bc 00000036 00466ef0 00466fc0 00467020 00466f20
1fe0: b6bc7ef4 be9ab4d0 b6bbbb00 b6cb2cc0
The DMA-fence annotations cause a lockdep warning (see below). As per
https://patchwork.freedesktop.org/patch/462170/ it sounds like the
annotations don't work correctly.
======================================================
WARNING: possible circular locking dependency detected
6.6.0-rc2+ #1 Not tainted
------------------------------------------------------
kmstest/733 is trying to acquire lock: ffff8000819377f0 (fs_reclaim){+.+.}-{0:0}, at: __kmem_cache_alloc_node+0x58/0x2d4
but task is already holding lock: ffff800081a06aa0 (dma_fence_map){++++}-{0:0}, at: tidss_atomic_commit_tail+0x20/0xc0 [tidss]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
Steven Price [Fri, 20 Oct 2023 10:44:05 +0000 (11:44 +0100)]
drm/panfrost: Remove incorrect IS_ERR() check
sg_page_iter_page() doesn't return an error code, so the IS_ERR() check
is wrong and the error path will never be executed. This also allows
simplifying the code to remove the local variable 'page'.
Maíra Canal [Mon, 23 Oct 2023 10:58:33 +0000 (07:58 -0300)]
drm/v3d: wait for all jobs to finish before unregistering
Currently, we are only warning the user if the BIN or RENDER jobs don't
finish before we unregister V3D. We must wait for all jobs to finish
before unregistering. Therefore, warn the user if TFU or CSD jobs
are not done by the time the driver is unregistered.
accel/ivpu: Add support for delayed D0i3 entry message
Currently the VPU firmware prepares for D0i3 every time the VPU
is entering D0i2 Idle state. This is not optimal as we might not
enter D0i3 every time we enter D0i2 Idle and this preparation
is quite costly.
This optimization moves D0i3 preparation to a dedicated
message sent from the host driver only when the driver is about
to enter D0i3 - this reduces power consumption and latency for
certain workloads, for example audio workloads that submit
inference every 10 ms.
The VPU needs non zero time to enter IDLE state after responding to
D0i3 entry message. If the driver does not wait for the VPU to enter
IDLE state it could cause warm boot failures.
Split ivpu_ipc_send_receive() implementation to have a version
that does not call pm_runtime_resume_and_get(). That implementation
can be invoked when device is up and runtime resume is prohibited
(for example at the end of boot sequence).
The new function will be used for D0i3 entry IPC message addition
in the separate change.
accel/ivpu: Pass D0i3 residency time to the VPU firmware
The firmware needs to know the time spent in D0i3/D3 to
calculate telemetry data. The D0i3/D3 residency time is
calculated by the driver and passed to the firmware
in the boot parameters.
The driver also passes VPU perf counter value captured
right before entering D0i3 - this allows the VPU firmware
to generate monotonic timestamps for the logs.
accel/ivpu: Add support for VPU_JOB_FLAGS_NULL_SUBMISSION_MASK
Add test_mode = 3, which adds the VPU_JOB_FLAGS_NULL_SUBMISSION_MASK
flag to jobs sent to the VPU device. The VPU will then process
the job but won't execute commands (except the command to signal
the fence).
This can be used to estimate job processing overhead in the host
software and VPU firmware.
Unlike the null hardware mode, the null submission mode will
still work even if UMD uses VPU fences to track job completion.
Arnd Bergmann [Fri, 27 Oct 2023 15:26:23 +0000 (17:26 +0200)]
accel/ivpu: avoid build failure with CONFIG_PM=n
The usage count of struct dev_pm_info is an implementation detail that
is only available if CONFIG_PM is enabled, so printing it in a debug message
causes a build failure in configurations without PM:
In file included from include/linux/device.h:15,
from include/linux/pci.h:37,
from drivers/accel/ivpu/ivpu_pm.c:8:
drivers/accel/ivpu/ivpu_pm.c: In function 'ivpu_rpm_get_if_active':
drivers/accel/ivpu/ivpu_pm.c:254:51: error: 'struct dev_pm_info' has no member named 'usage_count'
254 | atomic_read(&vdev->drm.dev->power.usage_count));
| ^
include/linux/dev_printk.h:129:48: note: in definition of macro 'dev_printk'
129 | _dev_printk(level, dev, fmt, ##__VA_ARGS__); \
| ^~~~~~~~~~~
drivers/accel/ivpu/ivpu_drv.h:75:17: note: in expansion of macro 'dev_dbg'
75 | dev_dbg((vdev)->drm.dev, "[%s] " fmt, #type, ##args); \
| ^~~~~~~
drivers/accel/ivpu/ivpu_pm.c:253:9: note: in expansion of macro 'ivpu_dbg'
253 | ivpu_dbg(vdev, RPM, "rpm_get_if_active count %d\n",
| ^~~~~~~~
The print message does not seem essential, so the easiest workaround is
to just remove it.
Fixes: c39dc15191c4 ("accel/ivpu: Read clock rate only if device is up")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Signed-off-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231027152633.528490-1-arnd@kernel.org
Use QAIC_TIMESYNC MHI channel to send UTC time to device in SBL
environment. Remove support for QAIC_TIMESYNC MHI channel in AMSS
environment as it is not used in that environment.
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231016170114.5446-3-quic_jhugo@quicinc.com
Ajit Pal Singh [Mon, 16 Oct 2023 17:01:13 +0000 (11:01 -0600)]
accel/qaic: Add support for periodic timesync
Device and Host have a time synchronization mechanism that happens once
during boot when device is in SBL mode. After that, in mission-mode there
is no timesync. In an experiment after continuous operation, device time
drifted w.r.t. host by approximately 3 seconds per day. This drift leads
to a mismatch in the timestamps of device and host logs. To correct this,
implement periodic timesync in the driver. This timesync is carried out via
the QAIC_TIMESYNC_PERIODIC MHI channel.
Signed-off-by: Ajit Pal Singh <quic_ajitpals@quicinc.com>
Signed-off-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231016170114.5446-2-quic_jhugo@quicinc.com
Carl Vanderlip [Mon, 16 Oct 2023 17:00:36 +0000 (11:00 -0600)]
accel/qaic: Enable 1 MSI fallback mode
Several virtualization use-cases either don't support 32 MultiMSIs
(Xen/VMware) or have significant drawbacks to their use (KVM's vIOMMU,
which is required to support 32 MSI, needs to allocate an alternate
system memory space for each device using vIOMMU (e.g. 8GB VM mem and
2 cards => 8 + 2 * 8 = 24GB host memory required)). Support these
cases by enabling a 1 MSI fallback mode.
Whenever all 32 MSIs requested are not available, a second request for
a single MSI is made. Its success is the initiator of single MSI mode.
This mode causes all interrupts generated by the device to be directed
to the 0th MSI (firmware >=v1.10 will do this as a response to the PCIe
MSI capability configuration). Likewise, all interrupt handlers for the
device are registered to the 0th MSI.
Since the DBC interrupt handler checks if the DBC is in use or if
there is any pending changes, the 'spurious' interrupts are
disregarded. If there is work to be done, the standard threaded IRQ
handler is dispatched.
On every interrupt, the MHI handler wakes up its threaded interrupt
handler, and attempts to wake any waiters for MHI state events.
Performance is within +-0.6% for test cases that typify real world
use. Larger differences ([-4,+132]%, avg +47%) exist for very simple
tasks (e.g. addition) compiled for single NSPs. It is assumed that the
small work and many interrupts typically cause contention (e.g. 16 NSPs
vs 4 CPUs), as evidenced by the standard deviation between runs also
decreasing (r=-0.48 between delta(Performance_test) and
delta(StdDev_test/Avg_test))
Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Reviewed-by: Stanislaw Gruszka <stanislaw.gruszka@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231016170036.5409-1-quic_jhugo@quicinc.com
Simon Ser [Mon, 23 Oct 2023 20:36:39 +0000 (20:36 +0000)]
drm/doc: describe PATH format for DP MST
This is already uAPI, xserver parses it. It's useful to document
since user-space might want to lookup the parent connector.
Additionally, people (me included) have misunderstood the PATH
property as being stable across reboots, but since a KMS object
ID is baked in there, that's not the case. So PATH shouldn't be
used as-is in config files and such.
Simon Ser [Fri, 20 Oct 2023 10:19:45 +0000 (10:19 +0000)]
drm: introduce CLOSEFB IOCTL
This new IOCTL allows callers to close a framebuffer without
disabling planes or CRTCs. This takes inspiration from Rob Clark's
unref_fb IOCTL [1] and DRM_MODE_FB_PERSIST [2].
User-space patch for wlroots available at [3]. IGT test available
at [4].
v2: add an extra pad field just in case we want to extend this IOCTL
in the future (Pekka, Sima).
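A hedged userspace sketch (assuming the IOCTL lands as DRM_IOCTL_MODE_CLOSEFB
with an fb_id plus the pad field mentioned in v2):

  /* Hedged sketch, not verbatim userspace code. */
  #include <stdint.h>
  #include <xf86drm.h>
  #include <drm/drm_mode.h>

  static int closefb(int fd, uint32_t fb_id)
  {
          struct drm_mode_closefb arg = { .fb_id = fb_id };

          /* Drops this client's reference without disabling planes or CRTCs. */
          return drmIoctl(fd, DRM_IOCTL_MODE_CLOSEFB, &arg);
  }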
Signed-off-by: Simon Ser <contact@emersion.fr>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Dennis Filder <d.filder@web.de>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <seanpaul@chromium.org>
Link: https://patchwork.freedesktop.org/patch/msgid/20231020101926.145327-2-contact@emersion.fr
Simon Ser [Fri, 20 Oct 2023 10:19:38 +0000 (10:19 +0000)]
drm: extract closefb logic in separate function
drm_mode_rmfb performs two operations: drop the FB from the
file_priv->fbs list, and make sure the FB is no longer used on a
plane.
In the next commit an IOCTL which only does so former will be
introduced, so let's split it into a separate function.
No functional change, only refactoring.
v2: no change
Signed-off-by: Simon Ser <contact@emersion.fr>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Dennis Filder <d.filder@web.de>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Pekka Paalanen <ppaalanen@gmail.com>
Cc: Rob Clark <robdclark@gmail.com>
Cc: Sean Paul <seanpaul@chromium.org>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20231020101926.145327-1-contact@emersion.fr