The DRM scheduler tracks who last uses an entity and when that process
is killed blocks all further submissions to that entity.
The problem is that we didn't track who initially created an entity, so
when a process accidently leaked its file descriptor to a child and
that child got killed, we killed the parent's entities.
Avoid that and instead initialize the entities last user on entity
creation. This also allows to drop the extra NULL check.
Signed-off-by: David Rosca <david.rosca@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4568
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
CC: stable@vger.kernel.org
Acked-by: Philipp Stanner <phasta@kernel.org>
Link: https://lore.kernel.org/r/20251015140128.1470-1-christian.koenig@amd.com
Signed-off-by: Philipp Stanner <phasta@kernel.org>
Link: https://patch.msgid.link/20251015140128.1470-1-christian.koenig@amd.com
entity->guilty = guilty;
entity->num_sched_list = num_sched_list;
entity->priority = priority;
+ entity->last_user = current->group_leader;
/*
* It's perfectly valid to initialize an entity without having a valid
* scheduler attached. It's just not valid to use the scheduler before it
/* For a killed process disallow further enqueueing of jobs. */
last_user = cmpxchg(&entity->last_user, current->group_leader, NULL);
- if ((!last_user || last_user == current->group_leader) &&
+ if (last_user == current->group_leader &&
(current->flags & PF_EXITING) && (current->exit_code == SIGKILL))
drm_sched_entity_kill(entity);