lib/group_cpus: make group CPU cluster aware
As CPU core counts increase, the number of NVMe IRQs may be smaller than
the total number of CPUs. This forces multiple CPUs to share the same
IRQ. If the IRQ affinity and the CPU's cluster do not align, a
performance penalty can be observed on some platforms.
This patch improves IRQ affinity by grouping CPUs by cluster within each
NUMA domain, ensuring better locality between CPUs and their assigned NVMe
IRQs.
Details:
Intel Xeon E platform packs 4 CPU cores as 1 module (cluster) and share
the L2 cache. Let's say, if there are 40 CPUs in 1 NUMA domain and 11
IRQs to dispatch. The existing algorithm will map first 7 IRQs each with
4 CPUs and remained 4 IRQs each with 3 CPUs. The last 4 IRQs may have
cross cluster issue. For example, the 9th IRQ which pinned to CPU32, then
for CPU31, it will have cross L2 memory access.
CPU |28 29 30 31|32 33 34 35|36 ...
-------- -------- --------
IRQ 8 9 10
If this patch applied, then first 2 IRQs each mapped with 2 CPUs and rest
9 IRQs each mapped with 4 CPUs, which avoids the cross cluster memory
access.
CPU |00 01 02 03|04 05 06 07|08 09 10 11| ...
----- ----- ----------- -----------
IRQ 1 2 3 4
As a result, 15%+ performance difference is observed in FIO
libaio/randread/bs=8k.
Changes since V1:
- Add more performance details in commit messages.
- Fix endless loop when topology_cluster_cpumask return invalid mask.
History:
v1: https://lore.kernel.org/all/
20251024023038.872616-1-wangyang.guo@intel.com/
v1 [RESEND]: https://lore.kernel.org/all/
20251111020608.
1501543-1-wangyang.guo@intel.com/
Link: https://lkml.kernel.org/r/20260113022958.3379650-1-wangyang.guo@intel.com
Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
Reviewed-by: Tianyou Li <tianyou.li@intel.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Tested-by: Dan Liang <dan.liang@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@fb.com>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Radu Rendec <rrendec@redhat.com>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>