net: add a fast path in __netif_schedule()
CPUs serving NIC interrupts, and specifically TX completions, are often
stuck also restarting a busy qdisc (because the qdisc was stopped by BQL
or the driver's own flow control).
When they call netdev_tx_completed_queue() or netif_tx_wake_queue(),
they call __netif_schedule() so that the queue can be run
later from net_tx_action() (involving NET_TX_SOFTIRQ).
Quite often, by the time the CPU reaches net_tx_action(), another CPU
has grabbed the qdisc spinlock from __dev_xmit_skb(), and we spend too
much time spinning on this lock.
We can detect in __netif_schedule() whether a CPU is already at a
specific point in __dev_xmit_skb() where the queue is guaranteed to
be run.
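
As a rough illustration of the idea only, here is a user-space sketch in
C11. Every name in it (model_qdisc, xmit_waiters, sched_pending,
softirq_raised, model_schedule) is hypothetical and is not a kernel API;
the counter merely stands in for "a sender has reached the point in the
transmit path where it will run the queue", and the actual patch may
detect this differently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
struct model_qdisc {
	atomic_int  xmit_waiters;   /* senders committed to running the queue */
	atomic_bool sched_pending;  /* stands in for a "run already requested" bit */
	int         softirq_raised; /* number of deferred runs requested */
};

/* Returns true if a deferred run was requested, false if it was skipped. */
static bool model_schedule(struct model_qdisc *q)
{
	assert(q != NULL);

	/* Fast path: a sender is already committed to running the queue,
	 * so requesting a deferred run would only add lock contention.
	 */
	if (atomic_load(&q->xmit_waiters) > 0)
		return false;

	/* Slow path: request a deferred run, at most once until serviced. */
	if (!atomic_exchange(&q->sched_pending, true)) {
		q->softirq_raised++;
		return true;
	}
	return false;
}
```

In this model, a completion handler calling model_schedule() while a
sender is counted in xmit_waiters returns immediately and never reaches
the deferred-run path.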
This patch gives a 13% increase in throughput on an IDPF NIC (200Gbit),
32 TX queues, sending UDP packets of 120 bytes.
This also helps __qdisc_run() avoid forcing a NET_TX_SOFTIRQ
if another thread is waiting in __dev_xmit_skb().
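
The quota case can be pictured with a similar user-space sketch. Again
all names (model_queue, sender_waiting, deferred_runs, model_run) are
hypothetical, and the loop is a simplification rather than the real
__qdisc_run(): when the quota runs out with packets still queued, a
deferred run is requested only if no sender is waiting to take over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model, not kernel code. */
struct model_queue {
	int  backlog;        /* packets still queued */
	bool sender_waiting; /* a sender is committed to running the queue */
	int  deferred_runs;  /* deferred runs requested so far */
};

/* Send up to `quota` packets and return how many were sent. */
static int model_run(struct model_queue *q, int quota)
{
	int sent = 0;

	assert(q != NULL && quota >= 0);

	while (q->backlog > 0 && quota > 0) {
		q->backlog--;
		quota--;
		sent++;
	}

	/* Work is left over: let the waiting sender continue if there is
	 * one, otherwise fall back to requesting a deferred run.
	 */
	if (q->backlog > 0 && !q->sender_waiting)
		q->deferred_runs++;

	return sent;
}
```

With backlog = 10 and quota = 4, the model requests a deferred run only
when sender_waiting is false.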
Before:
sar -n DEV 5 5|grep eth1|grep Average
Average:         eth1   1496.44 52191462.56    210.00 13369396.90      0.00      0.00      0.00     54.76
After:
sar -n DEV 5 5|grep eth1|grep Average
Average:         eth1   1457.88 59363099.96    205.08 15206384.35      0.00      0.00      0.00     62.29
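
The quoted gain can be cross-checked against the sar figures above. The
sketch below assumes the standard "sar -n DEV" column order, in which
the second numeric field is txpck/s; percent_gain() is a hypothetical
helper written for this check.

```c
#include <assert.h>

/* Relative change from `before` to `after`, in percent. */
static double percent_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before * 100.0;
}
```

Calling percent_gain(52191462.56, 59363099.96) on the two txpck/s values
gives a gain between 13% and 14%, consistent with the figure quoted
above.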
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20251017145334.3016097-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>