From: Willy Tarreau
Date: Fri, 14 Mar 2025 16:58:27 +0000 (+0100)
Subject: MINOR: cpu-topo: add a new "efficiency" cpu-policy
X-Git-Tag: v3.2-dev8~38
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=ad3650c3542078550dd6950bead0cc986875b2a1;p=thirdparty%2Fhaproxy.git

MINOR: cpu-topo: add a new "efficiency" cpu-policy

This cpu policy tries to evict performance core clusters and only
focuses on efficiency-oriented ones. On an intel i9-14900k, we can get
525k rps using the 8 performance cores, versus 405k when using all 16
efficiency cores. In some cases the power savings might be more
desirable (e.g. scalability tests on a developer's laptop), or the
performance cores might be better suited for another component
(application or security component).
---

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 9c7ed987a..c51101389 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -1972,6 +1972,18 @@ cpu-policy
                       systems, per thread-group. The number of thread-groups,
                       if not set, will be set to 1.
 
+    - efficiency      exactly like group-by-cluster below, except that CPU
+                      clusters whose performance is more than twice that of
+                      the next less performant one are evicted. These are
+                      typically "big" or "performance" cores. This means that
+                      if more than one type of CPU core is detected, only
+                      the efficient ones will be used. This can make sense for
+                      use with moderate loads when the most powerful cores
+                      need to be available to the application or a security
+                      component. Some modern CPUs have a large number of such
+                      efficient CPU cores which can collectively deliver a
+                      decent level of performance while using less power.
+
     - first-usable-node  if the CPUs were not previously restricted at boot
                       (for example using the "taskset" utility), and if the
                       "nbthread" directive was not set, then the first NUMA
diff --git a/src/cpu_topo.c b/src/cpu_topo.c
index 154763164..7611e0ccc 100644
--- a/src/cpu_topo.c
+++ b/src/cpu_topo.c
@@ -54,12 +54,14 @@ static int cpu_policy = 1; // "first-usable-node"
 static int cpu_policy_first_usable_node(int policy, int tmin, int tmax, int gmin, int gmax, char **err);
 static int cpu_policy_group_by_cluster(int policy, int tmin, int tmax, int gmin, int gmax, char **err);
 static int cpu_policy_performance(int policy, int tmin, int tmax, int gmin, int gmax, char **err);
+static int cpu_policy_efficiency(int policy, int tmin, int tmax, int gmin, int gmax, char **err);
 
 static struct ha_cpu_policy ha_cpu_policy[] = {
 	{ .name = "none",              .desc = "use all available CPUs",                     .fct = NULL   },
 	{ .name = "first-usable-node", .desc = "use only first usable node if nbthreads not set", .fct = cpu_policy_first_usable_node },
 	{ .name = "group-by-cluster",  .desc = "make one thread group per core cluster",     .fct = cpu_policy_group_by_cluster  },
 	{ .name = "performance",       .desc = "make one thread group per perf. core cluster", .fct = cpu_policy_performance },
+	{ .name = "efficiency",        .desc = "make one thread group per eff. core cluster",  .fct = cpu_policy_efficiency },
core cluster", .fct = cpu_policy_efficiency }, { 0 } /* end */ }; @@ -1135,6 +1137,45 @@ static int cpu_policy_performance(int policy, int tmin, int tmax, int gmin, int return cpu_policy_group_by_cluster(policy, tmin, tmax, gmin, gmax, err); } +/* the "efficiency" cpu-policy: + * - does nothing if nbthread or thread-groups are set + * - eliminates clusters whose total capacity is above half of others + * - tries to create one thread-group per cluster, with as many + * threads as CPUs in the cluster, and bind all the threads of + * this group to all the CPUs of the cluster. + */ +static int cpu_policy_efficiency(int policy, int tmin, int tmax, int gmin, int gmax, char **err) +{ + int cpu, cluster; + int capa; + + if (global.nbthread || global.nbtgroups) + return 0; + + /* sort clusters by reverse capacity */ + cpu_cluster_reorder_by_capa(ha_cpu_clusters, cpu_topo_maxcpus); + + capa = 0; + for (cluster = cpu_topo_maxcpus - 1; cluster >= 0; cluster--) { + if (capa && ha_cpu_clusters[cluster].capa > capa * 2) { + /* This cluster is more than twice as fast as the + * previous one, we're not interested in using it. + */ + for (cpu = 0; cpu <= cpu_topo_lastcpu; cpu++) { + if (ha_cpu_topo[cpu].cl_gid == ha_cpu_clusters[cluster].idx) + ha_cpu_topo[cpu].st |= HA_CPU_F_IGNORED; + } + } + else + capa = ha_cpu_clusters[cluster].capa; + } + + cpu_cluster_reorder_by_index(ha_cpu_clusters, cpu_topo_maxcpus); + + /* and finish using the group-by-cluster strategy */ + return cpu_policy_group_by_cluster(policy, tmin, tmax, gmin, gmax, err); +} + /* apply the chosen CPU policy if no cpu-map was forced. Returns < 0 on failure * with a message in *err that must be freed by the caller if non-null. */