ibmvnic: Only record tx completed bytes once per handler
Byte Queue Limits depends on dql_completed being called once per tx
completion round in order to adjust its algorithm appropriately. The
dql->limit value is an approximation of the amount of bytes that the NIC
can consume per irq interval. If this approximation is too high then the
NIC will become over-saturated. Too low and the NIC will starve.
The dql->limit depends on dql->prev-* stats to calculate an optimal
value. If dql_completed() is called more than once per irq handler then
those prev-* values become unreliable (because they are not an accurate
representation of the previous state of the NIC) resulting in a
sub-optimal limit value.
Therefore, move the call to netdev_tx_completed_queue() to the end of
ibmvnic_complete_tx().
When performing 150 sessions of TCP rr (request-response 1 byte packets)
workloads, one could observe:
PREVIOUSLY: - limit and inflight values hovering around 130
- transaction rate of around 750k pps.
NOW: - limit rises and falls in response to inflight (130-900)
- transaction rate of around 1M pps (33% improvement)
Signed-off-by: Nick Child <nnac123@linux.ibm.com>
Link: https://patch.msgid.link/20240807211809.1259563-7-nnac123@linux.ibm.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>