This patch adds autocorking support for SMC which could improve
throughput for small message by x3+.
The main idea is borrowed from TCP autocorking with some RDMA
specific modification:
1. The first message should never cork to make sure we won't
bring extra latency
2. If we have posted any Tx WRs to the NIC that have not
completed, cork the new messages until:
a) Receive CQE for the last Tx WR
b) We have corked enough message on the connection
3. Try to push the corked data out when we receive CQE of
the last Tx WR to prevent the corked messages hang in
the send queue.
Both SMC autocorking and TCP autocorking check the TX completion
to decide whether we should cork or not. The difference is
when we got a SMC Tx WR completion, the data have been confirmed
by the RNIC while TCP TX completion just tells us the data
have been sent out by the local NIC.
Add an atomic variable tx_pushing in smc_connection to make
sure only one can send to let it cork more and save CDC slot.
SMC autocorking should not bring extra latency since the first
message will always been sent out immediately.
The qperf tcp_bw test shows more than x4 increase under small
message size with Mellanox connectX4-Lx, same result with other
throughput benchmarks like sockperf/netperf.
The qperf tcp_lat test shows SMC autocorking has not increase any
ping-pong latency.