Dave Martin reported inconsistent CMT test failures. In one experiment
the first run of the CMT test failed because the difference between measured
and achievable cache occupancy was too large (24%), while the second run
passed with an acceptable 4% difference.
The CMT test is susceptible to interference from the rest of the system.
This can be demonstrated with a utility like stress-ng by running the CMT
test while introducing cache misses using:
stress-ng --matrix-3d 0 --matrix-3d-zyx
The following example shows the CMT test failing because of a significant
difference between measured and achievable cache occupancy when run with
interference:
# Starting CMT test ...
# Mounting resctrl to "/sys/fs/resctrl"
# Cache size :335544320
# Writing benchmark parameters to resctrl FS
# Benchmark PID: 7011
# Checking for pass/fail
# Fail: Check cache miss rate within 15%
# Percent diff=99
# Number of bits: 5
# Average LLC val: 235929
# Cache span (bytes): 83886080
not ok 1 CMT: test
The CMT test creates a new control group that is also capable of monitoring
and assigns the workload to it. The workload allocates a buffer that by
default fills a portion of the L3 and keeps reading from the buffer,
measuring the L3 occupancy at intervals. The test passes if the workload's
L3 occupancy is within 15% of the buffer size.
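The pass criterion above can be sketched as follows. This is a minimal illustration, not the selftest's actual code: the helper names are hypothetical, integer truncation of the percentage is assumed, and the boundary handling (a difference of exactly 15% passes) is also an assumption.

```c
#include <stdlib.h>

/* Threshold quoted in the text: occupancy within 15% of the buffer size. */
#define MAX_DIFF_PERCENT 15

/*
 * Hypothetical helper: percentage by which the averaged L3 occupancy
 * deviates from the buffer span, truncated to an integer.
 * Assumes span > 0.
 */
static long long occupancy_diff_percent(long long avg_occupancy, long long span)
{
	return llabs(span - avg_occupancy) * 100 / span;
}

/*
 * Hypothetical helper: returns 1 when the occupancy is close enough to the
 * buffer span for the test to pass, 0 otherwise.
 */
static int occupancy_within_limit(long long avg_occupancy, long long span)
{
	return occupancy_diff_percent(avg_occupancy, span) <= MAX_DIFF_PERCENT;
}
```

With the values from the failing log above (average occupancy 235929, cache span 83886080) this sketch yields a difference of 99, which is well outside the 15% limit.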
Because no capacity bitmasks are adjusted, the workload shares the cache
with the rest of the system. Any other task that may be running could evict
the workload's data from the cache, causing it to have low cache occupancy.
Reduce interference from the rest of the system by ensuring that the
workload's control group uses the capacity bitmask found in the user
parameters for L3 and that the rest of the system can only allocate into
the inverse of the workload's L3 cache portion. Other tasks can thus no
longer evict the workload's data from L3.
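The mask arithmetic behind this split can be sketched as below. The helper names are hypothetical and only illustrate the idea. The full mask value 0xfffff used in the note that follows is an assumption, inferred from the two schemata values that appear in the passing log.

```c
#include <stdio.h>

/*
 * Hypothetical helper: capacity bitmask for the rest of the system, that is
 * every bit of the full mask that the workload's mask does not use.
 */
static unsigned long inverse_cbm(unsigned long mask, unsigned long full_mask)
{
	return ~mask & full_mask;
}

/*
 * Hypothetical helper: format an L3 schemata entry for one cache domain,
 * matching the "L3:<domain>=<hex mask>" form shown in the test output.
 */
static int format_l3_schema(char *buf, size_t len, int domain_id,
			    unsigned long cbm)
{
	return snprintf(buf, len, "L3:%d=%lx", domain_id, cbm);
}
```

For a workload mask of 0x1f and an assumed full mask of 0xfffff, inverse_cbm() returns 0xfffe0, so the two groups receive non-overlapping portions of L3.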
With the above adjustments the CMT test is more consistent. Repeating the
CMT test while generating interference with stress-ng on a sample
system after applying the fixes shows significant improvement in test
accuracy:
# Starting CMT test ...
# Mounting resctrl to "/sys/fs/resctrl"
# Cache size :335544320
# Writing benchmark parameters to resctrl FS
# Write schema "L3:0=fffe0" to resctrl FS
# Write schema "L3:0=1f" to resctrl FS
# Benchmark PID: 7089
# Checking for pass/fail
# Pass: Check cache miss rate within 15%
# Percent diff=12
# Number of bits: 5
# Average LLC val: 73269248
# Cache span (bytes): 83886080
ok 1 CMT: test
Link: https://lore.kernel.org/r/b160592179f88069cdc679563e152007998a0d76.1775266384.git.reinette.chatre@intel.com
Reported-by: Dave Martin <Dave.Martin@arm.com>
Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
Tested-by: Chen Yu <yu.c.chen@intel.com>
Reviewed-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/lkml/aO+7MeSMV29VdbQs@e133380.arm.com/
Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
 #define CON_MON_LCC_OCCUP_PATH \
 	"%s/%s/mon_data/mon_L3_%02d/llc_occupancy"
-static int cmt_init(const struct resctrl_val_param *param, int domain_id)
+/*
+ * Initialize capacity bitmasks (CBMs) of:
+ * - control group being tested per test parameters,
+ * - default resource group as inverse of control group being tested to prevent
+ *   other tasks from interfering with test.
+ */
+static int cmt_init(const struct resctrl_test *test,
+		    const struct user_params *uparams,
+		    const struct resctrl_val_param *param, int domain_id)
 {
+	unsigned long full_mask;
+	char schemata[64];
+	int ret;
+
 	sprintf(llc_occup_path, CON_MON_LCC_OCCUP_PATH, RESCTRL_PATH,
 		param->ctrlgrp, domain_id);
-	return 0;
+	ret = get_full_cbm(test->resource, &full_mask);
+	if (ret)
+		return ret;
+
+	snprintf(schemata, sizeof(schemata), "%lx", ~param->mask & full_mask);
+	ret = write_schemata("", schemata, uparams->cpu, test->resource);
+	if (ret)
+		return ret;
+
+	snprintf(schemata, sizeof(schemata), "%lx", param->mask);
+	return write_schemata(param->ctrlgrp, schemata, uparams->cpu, test->resource);
 }
 static int cmt_setup(const struct resctrl_test *test,
 #define ALLOCATION_MIN 10
 #define ALLOCATION_STEP 10
-static int mba_init(const struct resctrl_val_param *param, int domain_id)
+static int mba_init(const struct resctrl_test *test,
+		    const struct user_params *uparams,
+		    const struct resctrl_val_param *param, int domain_id)
 {
 	int ret;
 	return ret;
 }
-static int mbm_init(const struct resctrl_val_param *param, int domain_id)
+static int mbm_init(const struct resctrl_test *test,
+		    const struct user_params *uparams,
+		    const struct resctrl_val_param *param, int domain_id)
 {
 	int ret;
 	char filename[64];
 	unsigned long mask;
 	int num_of_runs;
-	int (*init)(const struct resctrl_val_param *param,
+	int (*init)(const struct resctrl_test *test,
+		    const struct user_params *uparams,
+		    const struct resctrl_val_param *param,
 		    int domain_id);
 	int (*setup)(const struct resctrl_test *test,
 		     const struct user_params *uparams,
 		goto reset_affinity;
 	if (param->init) {
-		ret = param->init(param, domain_id);
+		ret = param->init(test, uparams, param, domain_id);
 		if (ret)
 			goto reset_affinity;
 	}