When xa_alloc_cyclic() failed in etnaviv_sched_push_job(), the error
path skipped drm_sched_entity_push_job(). This is a violation of the DRM
scheduler contract, as once a job has been armed with drm_sched_job_arm(),
it must be pushed with drm_sched_entity_push_job(). From the DRM
scheduler documentation,
"""
drm_sched_job_arm() is a point of no return since it initializes the
fences and their sequence number etc. Once that function has been called,
you *must* submit it with drm_sched_entity_push_job() and cannot simply
abort it by calling drm_sched_job_cleanup().
"""
Fix this by splitting the fence ID allocation into two phases: first,
alloc an xarray slot before arming the job (which can fail), then fill in
the actual fence with xa_store() after arming. This way, allocation
failures are handled before the job is armed, and once armed, the job is
always pushed to the scheduler.
This also fixes a double call to drm_sched_job_cleanup(), as both
etnaviv_sched_push_job() and its caller would call it on failure.
Fixes: 764be12345c3 ("drm/etnaviv: convert user fence tracking to XArray")
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Link: https://patch.msgid.link/20260402193424.2023318-1-mcanal@igalia.com
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
*/
mutex_lock(&gpu->sched_lock);
+ ret = xa_alloc_cyclic(&gpu->user_fences, &submit->out_fence_id,
+ NULL, xa_limit_32b, &gpu->next_user_fence,
+ GFP_KERNEL);
+ if (ret < 0)
+ goto out_unlock;
+
drm_sched_job_arm(&submit->sched_job);
submit->out_fence = dma_fence_get(&submit->sched_job.s_fence->finished);
- ret = xa_alloc_cyclic(&gpu->user_fences, &submit->out_fence_id,
- submit->out_fence, xa_limit_32b,
- &gpu->next_user_fence, GFP_KERNEL);
- if (ret < 0) {
- drm_sched_job_cleanup(&submit->sched_job);
- goto out_unlock;
- }
+
+ xa_store(&gpu->user_fences, submit->out_fence_id,
+ submit->out_fence, GFP_KERNEL);
/* the scheduler holds on to the job now */
kref_get(&submit->refcount);