]> git.ipfire.org Git - thirdparty/kernel/linux.git/commit
Revert "svcrdma: Use contiguous pages for RDMA Read sink buffers"
authorChuck Lever <chuck.lever@oracle.com>
Sun, 7 Jun 2026 02:24:56 +0000 (22:24 -0400)
committerChuck Lever <cel@kernel.org>
Tue, 9 Jun 2026 20:32:59 +0000 (16:32 -0400)
commita39f0ce0c9da20986b429e2db3e4e8739035d61b
treebbc502744c6c785bff82d850edaf4caf3640a74e
parent1b8d52328e0d37a51d1a4b11cf0f118a59dc4239
Revert "svcrdma: Use contiguous pages for RDMA Read sink buffers"

Jonathan Flynn reports that commit 18755b8c2f24 ("svcrdma: Use
contiguous pages for RDMA Read sink buffers") regresses NFS/RDMA
WRITE throughput from 73.9 GiB/s to 30.3 GiB/s on a 128-core
single-NUMA-node server driving dual 400Gb/s links with 640 nfsd
threads. Server CPU utilization rises from 8.5% to 76%, with
roughly three quarters of all cycles spent spinning on zone->lock.

The sink buffers are allocated as high-order page blocks, split
into single pages so each sub-page carries an independent refcount,
and later released one page at a time through folio batches. The
per-CPU page caches cannot satisfy an allocation stream whose alloc
order differs from its free order, so every sink buffer page makes
a round trip through the buddy allocator's free lists, serialized
on the zone lock of the single NUMA node. The rq_pages entries that
the split pages displace, bulk-allocated moments earlier by
svc_alloc_arg(), are freed without ever being used, doubling the
allocator traffic.

The regression cannot be addressed trivially. Revert the commit
now; a reworked approach can return in an upcoming merge window.

Reported-by: Jonathan Flynn <jonathan.flynn@hammerspace.com>
Reported-by: Mike Snitzer <snitzer@kernel.org>
Closes: https://lore.kernel.org/linux-nfs/aiHlPmeZq3WgMwoJ@kernel.org/
Closes: https://lore.kernel.org/linux-nfs/3cb119b4b2a8aada30c0c60286778a54@mail.gmail.com/
Fixes: 18755b8c2f24 ("svcrdma: Use contiguous pages for RDMA Read sink buffers")
Cc: stable@vger.kernel.org
Tested-by: Jonathan Flynn <jonathan.flynn@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
net/sunrpc/xprtrdma/svc_rdma_rw.c