From: Jakub Kicinski
Date: Sat, 7 Mar 2026 03:15:24 +0000 (-0800)
Subject: Merge branch 'net-ntb_netdev-add-multi-queue-support'

Merge branch 'net-ntb_netdev-add-multi-queue-support'

Koichiro Den says:

====================
net: ntb_netdev: Add Multi-queue support

ntb_netdev currently hard-codes a single NTB transport queue pair, so
the datapath effectively runs as a single-queue netdev regardless of
the available CPUs and parallel flows.

The longer-term motivation here is throughput scale-out: allow
ntb_netdev to grow beyond the single-QP bottleneck and make it possible
to spread TX/RX work across multiple queue pairs as link speeds and
core counts keep increasing.

Multi-queue also unlocks the standard networking knobs on top of it. In
particular, once the device exposes multiple TX queues, qdisc/tc can
steer flows/traffic classes into different queues (via
skb->queue_mapping), enabling per-flow/per-class scheduling and QoS in
a familiar way.

Usage
=====

1. Ensure the NTB device you want to use has multiple Memory Windows.
2. modprobe ntb_transport on both sides, if it's not built-in.
3. modprobe ntb_netdev on both sides, if it's not built-in.
4. Use ethtool -L to configure the desired number of queues. The
   default number of real (combined) queues is 1, e.g.:

     ethtool -L eth0 combined 2   # increase to 2 queues
     ethtool -L eth0 combined 1   # reduce back to 1

Note:
 * If the NTB device has only a single Memory Window,
   "ethtool -L eth0 combined N" (N > 1) fails with:
   "netlink error: No space left on device".
 * ethtool -L can be executed while the net_device is up.

Compatibility
=============

The default remains a single queue, so behavior is unchanged unless the
user explicitly increases the number of queues.
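As a rough illustration of the qdisc steering mentioned above, software
mqprio can set skb->queue_mapping per traffic class once the device
exposes two combined queues. This is only a sketch, not part of the
series: the device name eth0, the 2-queue split, and the iperf3 port
5201 flow used for classification are all assumptions.

```shell
# Assumes the device was already switched to 2 queues:
#   ethtool -L eth0 combined 2
# Software mqprio (hw 0): 2 traffic classes, skb priority 0 -> TC 0
# (queue 0), priorities 1-15 -> TC 1 (queue 1). The map has 16 entries,
# one per skb priority value.
tc qdisc add dev eth0 root handle 1: mqprio \
    num_tc 2 map 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 \
    queues 1@0 1@1 hw 0

# One (hypothetical) way to push a flow into TC 1: set skb->priority
# with the iptables CLASSIFY target, here for iperf3's default port.
iptables -t mangle -A POSTROUTING -p tcp --dport 5201 \
    -j CLASSIFY --set-class 0:1
```

With a mapping like this, each class gets its own TX queue and hence its
own NTB transport queue pair, which is what lets multiple flows scale
past the single-QP bottleneck.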
Kernel base
===========

ntb-next (latest as of 2026-03-06):
commit 7b3302c687ca ("ntb_hw_amd: Fix incorrect debug message in link
disable path")

Testing / Results
=================

Environment / command line:
- 2x R-Car S4 Spider boards
- "Kernel base" (see above) + this series

TCP:
  [RC] $ sudo iperf3 -s
  [EP] $ sudo iperf3 -Z -c ${SERVER_IP} -l 65480 -w 512M -P 4

UDP:
  [RC] $ sudo iperf3 -s
  [EP] $ sudo iperf3 -u -b 0 -c ${SERVER_IP} -l 65480 -w 512M -P 4

Results:
- Without this series:
    TCP / UDP : 589 Mbps / 580 Mbps
- With this series (default single queue):
    TCP / UDP : 583 Mbps / 583 Mbps
- With this series + "ethtool -L eth0 combined 2":
    TCP / UDP : 576 Mbps / 584 Mbps
- With this series + "ethtool -L eth0 combined 2" + [1], where flows
  are properly distributed across queues:
    TCP / UDP : 1.13 Gbps / 1.16 Gbps (re-measured with v3)

The 575-590 Mbps spread is run-to-run variance, i.e. no measurable
regression or improvement is observed with a single queue. The key
point is the scaling from ~600 Mbps to ~1.15 Gbps once flows are
distributed across multiple queues.

Note: On R-Car S4 Spider, only BAR2 is usable for the ntb_transport MW.
For testing, BAR2 was expanded from 1 MiB to 2 MiB and split into two
Memory Windows. A follow-up series is planned to add split-BAR support
for vNTB. On platforms where multiple BARs can be used for the
datapath, this series should allow >=2 queues without additional
changes.

[1] [PATCH v2 00/10] NTB: epf: Enable per-doorbell bit handling while
    keeping legacy offset
    https://lore.kernel.org/linux-pci/20260227084955.3184017-1-den@valinux.co.jp/
    (subject was accidentally incorrect in the original posting)
====================

Link: https://patch.msgid.link/20260305155639.1885517-1-den@valinux.co.jp
Signed-off-by: Jakub Kicinski