From: Greg Kroah-Hartman Date: Sat, 12 Jul 2025 14:36:48 +0000 (+0200) Subject: 5.15-stable patches X-Git-Tag: v5.15.188~55 X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=9bff8e040bee654edb6bf47b166c3efbee048cec;p=thirdparty%2Fkernel%2Fstable-queue.git 5.15-stable patches added patches: drm-gem-fix-race-in-drm_gem_handle_create_tail.patch drm-sched-increment-job-count-before-swapping-tail-spsc-queue.patch --- diff --git a/queue-5.15/drm-gem-fix-race-in-drm_gem_handle_create_tail.patch b/queue-5.15/drm-gem-fix-race-in-drm_gem_handle_create_tail.patch new file mode 100644 index 0000000000..40bb85d217 --- /dev/null +++ b/queue-5.15/drm-gem-fix-race-in-drm_gem_handle_create_tail.patch @@ -0,0 +1,140 @@ +From bd46cece51a36ef088f22ef0416ac13b0a46d5b0 Mon Sep 17 00:00:00 2001 +From: Simona Vetter +Date: Mon, 7 Jul 2025 17:18:13 +0200 +Subject: drm/gem: Fix race in drm_gem_handle_create_tail() + +From: Simona Vetter + +commit bd46cece51a36ef088f22ef0416ac13b0a46d5b0 upstream. + +Object creation is a careful dance where we must guarantee that the +object is fully constructed before it is visible to other threads, and +GEM buffer objects are no difference. + +Final publishing happens by calling drm_gem_handle_create(). After +that the only allowed thing to do is call drm_gem_object_put() because +a concurrent call to the GEM_CLOSE ioctl with a correctly guessed id +(which is trivial since we have a linear allocator) can already tear +down the object again. + +Luckily most drivers get this right, the very few exceptions I've +pinged the relevant maintainers for. Unfortunately we also need +drm_gem_handle_create() when creating additional handles for an +already existing object (e.g. GETFB ioctl or the various bo import +ioctl), and hence we cannot have a drm_gem_handle_create_and_put() as +the only exported function to stop these issues from happening. + +Now unfortunately the implementation of drm_gem_handle_create() isn't +living up to standards: It does correctly finishe object +initialization at the global level, and hence is safe against a +concurrent tear down. But it also sets up the file-private aspects of +the handle, and that part goes wrong: We fully register the object in +the drm_file.object_idr before calling drm_vma_node_allow() or +obj->funcs->open, which opens up races against concurrent removal of +that handle in drm_gem_handle_delete(). + +Fix this with the usual two-stage approach of first reserving the +handle id, and then only registering the object after we've completed +the file-private setup. + +Jacek reported this with a testcase of concurrently calling GEM_CLOSE +on a freshly-created object (which also destroys the object), but it +should be possible to hit this with just additional handles created +through import or GETFB without completed destroying the underlying +object with the concurrent GEM_CLOSE ioctl calls. + +Note that the close-side of this race was fixed in f6cd7daecff5 ("drm: +Release driver references to handle before making it available +again"), which means a cool 9 years have passed until someone noticed +that we need to make this symmetry or there's still gaps left :-/ +Without the 2-stage close approach we'd still have a race, therefore +that's an integral part of this bugfix. + +More importantly, this means we can have NULL pointers behind +allocated id in our drm_file.object_idr. We need to check for that +now: + +- drm_gem_handle_delete() checks for ERR_OR_NULL already + +- drm_gem.c:object_lookup() also chekcs for NULL + +- drm_gem_release() should never be called if there's another thread + still existing that could call into an IOCTL that creates a new + handle, so cannot race. For paranoia I added a NULL check to + drm_gem_object_release_handle() though. + +- most drivers (etnaviv, i915, msm) are find because they use + idr_find(), which maps both ENOENT and NULL to NULL. + +- drivers using idr_for_each_entry() should also be fine, because + idr_get_next does filter out NULL entries and continues the + iteration. + +- The same holds for drm_show_memory_stats(). + +v2: Use drm_WARN_ON (Thomas) + +Reported-by: Jacek Lawrynowicz +Tested-by: Jacek Lawrynowicz +Reviewed-by: Thomas Zimmermann +Cc: stable@vger.kernel.org +Cc: Jacek Lawrynowicz +Cc: Maarten Lankhorst +Cc: Maxime Ripard +Cc: Thomas Zimmermann +Cc: David Airlie +Cc: Simona Vetter +Signed-off-by: Simona Vetter +Signed-off-by: Simona Vetter +Link: https://patchwork.freedesktop.org/patch/msgid/20250707151814.603897-1-simona.vetter@ffwll.ch +Signed-off-by: Greg Kroah-Hartman +--- + drivers/gpu/drm/drm_gem.c | 10 +++++++++- + include/drm/drm_file.h | 3 +++ + 2 files changed, 12 insertions(+), 1 deletion(-) + +--- a/drivers/gpu/drm/drm_gem.c ++++ b/drivers/gpu/drm/drm_gem.c +@@ -234,6 +234,9 @@ drm_gem_object_release_handle(int id, vo + struct drm_file *file_priv = data; + struct drm_gem_object *obj = ptr; + ++ if (drm_WARN_ON(obj->dev, !data)) ++ return 0; ++ + if (obj->funcs->close) + obj->funcs->close(obj, file_priv); + +@@ -361,7 +364,7 @@ drm_gem_handle_create_tail(struct drm_fi + idr_preload(GFP_KERNEL); + spin_lock(&file_priv->table_lock); + +- ret = idr_alloc(&file_priv->object_idr, obj, 1, 0, GFP_NOWAIT); ++ ret = idr_alloc(&file_priv->object_idr, NULL, 1, 0, GFP_NOWAIT); + + spin_unlock(&file_priv->table_lock); + idr_preload_end(); +@@ -382,6 +385,11 @@ drm_gem_handle_create_tail(struct drm_fi + goto err_revoke; + } + ++ /* mirrors drm_gem_handle_delete to avoid races */ ++ spin_lock(&file_priv->table_lock); ++ obj = idr_replace(&file_priv->object_idr, obj, handle); ++ WARN_ON(obj != NULL); ++ spin_unlock(&file_priv->table_lock); + *handlep = handle; + return 0; + +--- a/include/drm/drm_file.h ++++ b/include/drm/drm_file.h +@@ -273,6 +273,9 @@ struct drm_file { + * + * Mapping of mm object handles to object pointers. Used by the GEM + * subsystem. Protected by @table_lock. ++ * ++ * Note that allocated entries might be NULL as a transient state when ++ * creating or deleting a handle. + */ + struct idr object_idr; + diff --git a/queue-5.15/drm-sched-increment-job-count-before-swapping-tail-spsc-queue.patch b/queue-5.15/drm-sched-increment-job-count-before-swapping-tail-spsc-queue.patch new file mode 100644 index 0000000000..36ea90986e --- /dev/null +++ b/queue-5.15/drm-sched-increment-job-count-before-swapping-tail-spsc-queue.patch @@ -0,0 +1,50 @@ +From 8af39ec5cf2be522c8eb43a3d8005ed59e4daaee Mon Sep 17 00:00:00 2001 +From: Matthew Brost +Date: Fri, 13 Jun 2025 14:20:13 -0700 +Subject: drm/sched: Increment job count before swapping tail spsc queue +MIME-Version: 1.0 +Content-Type: text/plain; charset=UTF-8 +Content-Transfer-Encoding: 8bit + +From: Matthew Brost + +commit 8af39ec5cf2be522c8eb43a3d8005ed59e4daaee upstream. + +A small race exists between spsc_queue_push and the run-job worker, in +which spsc_queue_push may return not-first while the run-job worker has +already idled due to the job count being zero. If this race occurs, job +scheduling stops, leading to hangs while waiting on the job’s DMA +fences. + +Seal this race by incrementing the job count before appending to the +SPSC queue. + +This race was observed on a drm-tip 6.16-rc1 build with the Xe driver in +an SVM test case. + +Fixes: 1b1f42d8fde4 ("drm: move amd_gpu_scheduler into common location") +Fixes: 27105db6c63a ("drm/amdgpu: Add SPSC queue to scheduler.") +Cc: stable@vger.kernel.org +Signed-off-by: Matthew Brost +Reviewed-by: Jonathan Cavitt +Link: https://lore.kernel.org/r/20250613212013.719312-1-matthew.brost@intel.com +Signed-off-by: Greg Kroah-Hartman +--- + include/drm/spsc_queue.h | 4 +++- + 1 file changed, 3 insertions(+), 1 deletion(-) + +--- a/include/drm/spsc_queue.h ++++ b/include/drm/spsc_queue.h +@@ -70,9 +70,11 @@ static inline bool spsc_queue_push(struc + + preempt_disable(); + ++ atomic_inc(&queue->job_count); ++ smp_mb__after_atomic(); ++ + tail = (struct spsc_node **)atomic_long_xchg(&queue->tail, (long)&node->next); + WRITE_ONCE(*tail, node); +- atomic_inc(&queue->job_count); + + /* + * In case of first element verify new node will be visible to the consumer diff --git a/queue-5.15/series b/queue-5.15/series index f03b110a89..8fea5b07ca 100644 --- a/queue-5.15/series +++ b/queue-5.15/series @@ -29,3 +29,5 @@ x86-mce-don-t-remove-sysfs-if-thresholding-sysfs-init-fails.patch x86-mce-make-sure-cmci-banks-are-cleared-during-shutdown-on-intel.patch gre-fix-ipv6-multicast-route-creation.patch pinctrl-qcom-msm-mark-certain-pins-as-invalid-for-interrupts.patch +drm-sched-increment-job-count-before-swapping-tail-spsc-queue.patch +drm-gem-fix-race-in-drm_gem_handle_create_tail.patch