From: Philipp Stanner Date: Thu, 8 Jan 2026 08:30:20 +0000 (+0100) Subject: drm/sched: Remove racy hack from drm_sched_fini() X-Git-Url: http://git.ipfire.org/gitweb.cgi?a=commitdiff_plain;h=4827d6d83f079f7d58d6ec0ed49f85dd2b90837f;p=thirdparty%2Flinux.git drm/sched: Remove racy hack from drm_sched_fini() drm_sched_fini() contained a hack to work around a race in amdgpu. According to AMD, the hack should not be necessary anymore. In case there should have been undetected users, commit 975ca62a014c ("drm/sched: Add warning for removing hack in drm_sched_fini()") had added a warning one release cycle ago. Thus, it can be derived that the hack can be savely removed by now. Remove the hack. Acked-by: Danilo Krummrich Signed-off-by: Philipp Stanner Link: https://patch.msgid.link/20260108083019.63532-2-phasta@kernel.org --- diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c index e6ee35406165a..13fa55aed3daa 100644 --- a/drivers/gpu/drm/scheduler/sched_main.c +++ b/drivers/gpu/drm/scheduler/sched_main.c @@ -1418,48 +1418,12 @@ static void drm_sched_cancel_remaining_jobs(struct drm_gpu_scheduler *sched) */ void drm_sched_fini(struct drm_gpu_scheduler *sched) { - struct drm_sched_entity *s_entity; int i; drm_sched_wqueue_stop(sched); - for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) { - struct drm_sched_rq *rq = sched->sched_rq[i]; - - spin_lock(&rq->lock); - list_for_each_entry(s_entity, &rq->entities, list) { - /* - * Prevents reinsertion and marks job_queue as idle, - * it will be removed from the rq in drm_sched_entity_fini() - * eventually - * - * FIXME: - * This lacks the proper spin_lock(&s_entity->lock) and - * is, therefore, a race condition. Most notably, it - * can race with drm_sched_entity_push_job(). The lock - * cannot be taken here, however, because this would - * lead to lock inversion -> deadlock. - * - * The best solution probably is to enforce the life - * time rule of all entities having to be torn down - * before their scheduler. Then, however, locking could - * be dropped alltogether from this function. - * - * For now, this remains a potential race in all - * drivers that keep entities alive for longer than - * the scheduler. - * - * The READ_ONCE() is there to make the lockless read - * (warning about the lockless write below) slightly - * less broken... - */ - if (!READ_ONCE(s_entity->stopped)) - dev_warn(sched->dev, "Tearing down scheduler with active entities!\n"); - s_entity->stopped = true; - } - spin_unlock(&rq->lock); + for (i = DRM_SCHED_PRIORITY_KERNEL; i < sched->num_rqs; i++) kfree(sched->sched_rq[i]); - } /* Wakeup everyone stuck in drm_sched_entity_flush for this scheduler */ wake_up_all(&sched->job_scheduled);