From: Willy Tarreau Date: Tue, 13 May 2025 09:40:44 +0000 (+0200) Subject: BUG/MINOR: cpu-topo: fix group-by-cluster policy for disordered clusters X-Git-Tag: v3.2-dev16~29 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=33d8b006d4bc9d233ed71d5b26fd3c8b499bba82;p=thirdparty%2Fhaproxy.git BUG/MINOR: cpu-topo: fix group-by-cluster policy for disordered clusters Some (rare) boards have their clusters in an erratic order. This is the case for the Radxa Orion O6 where one of the big cores appears as CPU0 due to booting from it, then followed by the small cores, then the medium cores, then the remaining big cores. This results in clusters appearing in this order: 0,2,1,0. The code in cpu_policy_group_by_cluster() expects ordered clusters, and performs ordered comparisons to decide whether a CPU's cluster has already been taken care of. On the board above this doesn't work, only clusters 0 and 2 appear and 1 is skipped. Let's replace the cluster number comparison with a cpuset to record which clusters have been taken care of. Now the groups properly appear like this: Tgrp/Thr Tid CPU set 1/1-2 1-2 2: 0,11 2/1-4 3-6 4: 1-4 3/1-6 7-12 6: 5-10 No backport is needed, this is purely 3.2. 
--- diff --git a/src/cpu_topo.c b/src/cpu_topo.c index 2e64f21ea..42e44ecb2 100644 --- a/src/cpu_topo.c +++ b/src/cpu_topo.c @@ -1074,10 +1074,11 @@ static int cpu_policy_first_usable_node(int policy, int tmin, int tmax, int gmin */ static int cpu_policy_group_by_cluster(int policy, int tmin, int tmax, int gmin, int gmax, char **err) { + struct hap_cpuset visited_cl_set; struct hap_cpuset node_cpu_set; int cpu, cpu_start; int cpu_count; - int cid, lcid; + int cid; int thr_per_grp, nb_grp; int thr; int div; @@ -1088,8 +1089,9 @@ static int cpu_policy_group_by_cluster(int policy, int tmin, int tmax, int gmin, if (global.nbtgroups) return 0; + ha_cpuset_zero(&visited_cl_set); + /* iterate over each new cluster */ - lcid = -1; cpu_start = 0; /* used as a divisor of clusters*/ @@ -1104,7 +1106,8 @@ static int cpu_policy_group_by_cluster(int policy, int tmin, int tmax, int gmin, /* skip disabled and already visited CPUs */ if (ha_cpu_topo[cpu].st & HA_CPU_F_EXCL_MASK) continue; - if ((ha_cpu_topo[cpu].cl_gid / div) <= lcid) + + if (ha_cpuset_isset(&visited_cl_set, ha_cpu_topo[cpu].cl_gid / div)) continue; if (cid < 0) { @@ -1118,6 +1121,7 @@ static int cpu_policy_group_by_cluster(int policy, int tmin, int tmax, int gmin, ha_cpuset_set(&node_cpu_set, ha_cpu_topo[cpu].idx); cpu_count++; } + /* now cid = next cluster_id or -1 if none; cpu_count is the * number of CPUs in this cluster, and cpu_start is the next * cpu to restart from to scan for new clusters. @@ -1125,6 +1129,8 @@ static int cpu_policy_group_by_cluster(int policy, int tmin, int tmax, int gmin, if (cid < 0 || !cpu_count) break; + ha_cpuset_set(&visited_cl_set, cid); + /* check that we're still within limits. If there are too many * CPUs but enough groups left, we'll try to make more smaller * groups, of the closest size each. 
@@ -1163,8 +1169,6 @@ static int cpu_policy_group_by_cluster(int policy, int tmin, int tmax, int gmin, if (global.nbtgroups >= MAX_TGROUPS || global.nbthread >= MAX_THREADS) break; } - - lcid = cid; // last cluster_id } if (global.nbthread)