From: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Date: Fri, 19 Jun 2026 09:53:59 +0000 (+0200)
Subject: 7.0-stable patches
X-Git-Tag: v5.10.259~1
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=348c97fa4404919fd5098d07ac17852fd6b64afd;p=thirdparty%2Fkernel%2Fstable-queue.git

7.0-stable patches

added patches:
	drm-amdgpu-drop-retry-loop-in-amdgpu_hmm_range_get_pages.patch
---

diff --git a/queue-7.0/drm-amdgpu-drop-retry-loop-in-amdgpu_hmm_range_get_pages.patch b/queue-7.0/drm-amdgpu-drop-retry-loop-in-amdgpu_hmm_range_get_pages.patch
new file mode 100644
index 0000000000..0f2ff44840
--- /dev/null
+++ b/queue-7.0/drm-amdgpu-drop-retry-loop-in-amdgpu_hmm_range_get_pages.patch
@@ -0,0 +1,74 @@
+From 342981fff32802a819d6fc7cf3c9fedf9f3d9d60 Mon Sep 17 00:00:00 2001
+From: Honglei Huang <honghuan@amd.com>
+Date: Fri, 29 May 2026 10:23:17 +0800
+Subject: drm/amdgpu: drop retry loop in amdgpu_hmm_range_get_pages
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+From: Honglei Huang <honghuan@amd.com>
+
+commit 342981fff32802a819d6fc7cf3c9fedf9f3d9d60 upstream.
+
+Since commit c08972f55594 ("drm/amdgpu: fix amdgpu_hmm_range_get_pages")
+moved mmu_interval_read_begin() out of the per-chunk loop, the
+captured notifier_seq is no longer refreshed across retries. As a
+result, the existing -EBUSY retry path can never make progress:
+
+  hmm_range_fault() returns -EBUSY only when
+  mmu_interval_check_retry(notifier, notifier_seq) reports that the
+  sequence is stale. Once the sequence has advanced, the stored seq
+  will never match again, so every subsequent call within the same
+  invocation returns -EBUSY immediately.
+
+The "goto retry" therefore degenerates into a busy spin that simply
+burns CPU for the full HMM_RANGE_DEFAULT_TIMEOUT (~1s) window before
+finally bailing out with -EAGAIN. This is pure latency with no chance
+of recovery, and it actively hurts the KFD userptr stack: the caller
+ends up blocked for a second while holding mmap_lock, only to return
+-EAGAIN to the restore worker (or to userspace) which would have
+re-driven the operation immediately anyway.
+
+Drop the retry/timeout entirely and let -EBUSY propagate straight to
+out_free_pfns, where it is already translated to -EAGAIN. Recovery is
+handled at a higher level: the KFD restore_userptr_worker reschedules
+itself, and the userptr ioctl path returns -EAGAIN to userspace.
+
+No functional regression: the previous behaviour on -EBUSY was already
+to fail with -EAGAIN after a 1s stall; we just skip the stall.
+
+Reviewed-by: Christian König <christian.koenig@amd.com>
+Signed-off-by: Honglei Huang <honghuan@amd.com>
+Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
+Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
+---
+ drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c |    9 +--------
+ 1 file changed, 1 insertion(+), 8 deletions(-)
+
+--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
++++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c
+@@ -174,7 +174,6 @@ int amdgpu_hmm_range_get_pages(struct mm
+ 	const u64 max_bytes = SZ_2G;
+ 
+ 	struct hmm_range *hmm_range = &range->hmm_range;
+-	unsigned long timeout;
+ 	unsigned long *pfns;
+ 	unsigned long end;
+ 	int r;
+@@ -201,15 +200,9 @@ int amdgpu_hmm_range_get_pages(struct mm
+ 		pr_debug("hmm range: start = 0x%lx, end = 0x%lx",
+ 			hmm_range->start, hmm_range->end);
+ 
+-		timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
+-
+-retry:
+ 		r = hmm_range_fault(hmm_range);
+-		if (unlikely(r)) {
+-			if (r == -EBUSY && !time_after(jiffies, timeout))
+-				goto retry;
++		if (unlikely(r))
+ 			goto out_free_pfns;
+-		}
+ 
+ 		if (hmm_range->end == end)
+ 			break;
diff --git a/queue-7.0/series b/queue-7.0/series
index 0d640aa056..c416088f1e 100644
--- a/queue-7.0/series
+++ b/queue-7.0/series
@@ -378,3 +378,4 @@ arm64-errata-mitigate-tlbi-errata-on-nvidia-olympus-cpu.patch
 arm64-errata-mitigate-tlbi-errata-on-microsoft-azure-cobalt-100-cpu.patch
 vsock-virtio-fix-skb-overhead-overflow-on-32-bit-builds.patch
 netfilter-require-ethernet-mac-header-before-using-e.patch
+drm-amdgpu-drop-retry-loop-in-amdgpu_hmm_range_get_pages.patch