svc_xprt_received() unconditionally calls
svc_xprt_enqueue() after clearing XPT_BUSY. When no
work flags are pending, the enqueue traverses
svc_xprt_ready() -- executing an smp_rmb(), READ_ONCE(),
and tracepoint -- before returning false.
Trace data from a 256KB NFSv3 workload over RDMA shows
85% of svc_xprt_received() invocations reach
svc_xprt_enqueue() with no pending work flags. In the
WRITE phase, 167,335 of 196,420 calls find no work; in
the READ phase, 97,165 of 98,276. Each unnecessary call
executes a memory barrier, a flags read, and (when
tracing is active) fires the svc_xprt_enqueue
tracepoint.
Add a flags pre-check between clear_bit(XPT_BUSY) and
svc_xprt_enqueue(). Both the clear and the subsequent
READ_ONCE operate on the same xpt_flags word, so
cache-line serialization of the atomic bitops ensures
the read observes any flag set by a concurrent producer
before the line was acquired for the clear. If a
producer's set_bit occurs after the clear_bit, that
producer's own svc_xprt_enqueue() call observes
!XPT_BUSY and dispatches the transport.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
svc_xprt_get(xprt);
smp_mb__before_atomic();
clear_bit(XPT_BUSY, &xprt->xpt_flags);
- svc_xprt_enqueue(xprt);
+
+ /*
+ * Skip the enqueue when no actionable flags are set.
+ * Each producer both sets its flag (XPT_DATA, XPT_CLOSE,
+ * etc.) and calls svc_xprt_enqueue(); if a set_bit races
+ * with this check, the producer's own enqueue observes
+ * !XPT_BUSY and dispatches the transport.
+ */
+ if (READ_ONCE(xprt->xpt_flags) &
+ (BIT(XPT_CONN) | BIT(XPT_CLOSE) | BIT(XPT_HANDSHAKE) |
+ BIT(XPT_DATA) | BIT(XPT_DEFERRED)))
+ svc_xprt_enqueue(xprt);
+
svc_xprt_put(xprt);
}
EXPORT_SYMBOL_GPL(svc_xprt_received);