Each RDMA Send completion triggers a cascade of work items on the
svcrdma_wq unbound workqueue:
ib_cq_poll_work (on ib_comp_wq, per-CPU)
-> svc_rdma_send_ctxt_put -> queue_work [work item 1]
-> svc_rdma_write_info_free -> queue_work [work item 2]
Every transition through queue_work contends on the unbound
pool's spinlock. Profiling an 8KB NFSv3 read/write workload
over RDMA shows about 4% of total CPU cycles spent on this
lock, with the cascading re-queue of write_info release
contributing roughly 1%.
The initial queue_work in svc_rdma_send_ctxt_put is needed to
move release work off the CQ completion context (which runs on
a per-CPU bound workqueue). However, once executing on
svcrdma_wq, there is no need to re-queue for each write_info
structure. svc_rdma_reply_chunk_release already calls
svc_rdma_cc_release inline from the same svcrdma_wq context,
and svc_rdma_recv_ctxt_put does the same from nfsd thread
context.
Release write chunk resources inline in
svc_rdma_write_info_free, removing the intermediate
svc_rdma_write_info_free_async work item and the wi_work
field from struct svc_rdma_write_info.
Reviewed-by: Mike Snitzer <snitzer@kernel.org>
Tested-by: Jonathan Flynn <jonathan.flynn@hammerspace.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
unsigned int wi_next_off;
struct svc_rdma_chunk_ctxt wi_cc;
- struct work_struct wi_work;
};
struct svc_rdma_send_ctxt {
return info;
}
-static void svc_rdma_write_info_free_async(struct work_struct *work)
+static void svc_rdma_write_info_free(struct svc_rdma_write_info *info)
{
- struct svc_rdma_write_info *info;
-
- info = container_of(work, struct svc_rdma_write_info, wi_work);
svc_rdma_cc_release(info->wi_rdma, &info->wi_cc, DMA_TO_DEVICE);
kfree(info);
}
-static void svc_rdma_write_info_free(struct svc_rdma_write_info *info)
-{
- INIT_WORK(&info->wi_work, svc_rdma_write_info_free_async);
- queue_work(svcrdma_wq, &info->wi_work);
-}
-
/**
* svc_rdma_write_chunk_release - Release Write chunk I/O resources
* @rdma: controlling transport