Benchmarks on Hygon processors show that the default non-temporal
threshold is higher than ideal for large copy workloads. As a result,
memcpy and memmove may continue to use the temporal copy path for
longer than is beneficial, increasing cache pollution and reducing
throughput for large copies.

Lower the copy non-temporal threshold to 3/8 of the shared cache size
per thread on Hygon. This allows the non-temporal copy path to be
selected earlier while leaving the memset non-temporal threshold
unchanged.
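
For illustration only, the arithmetic can be sketched as below. This is
not the glibc code: copy_nt_threshold and is_hygon are hypothetical
names, and the 3/4 fraction used for the non-Hygon case is an assumption
about the default, which this message does not state.

```c
/* Hypothetical helper, not part of glibc: return the copy
   (memcpy/memmove) non-temporal threshold in bytes for a given
   shared cache size per thread.  */
static unsigned long int
copy_nt_threshold (unsigned long int shared_per_thread, int is_hygon)
{
  /* 3/8 of the shared cache size per thread, as described above.  */
  if (is_hygon)
    return shared_per_thread * 3 / 8;

  /* Assumed default fraction (3/4); not stated in this message.  */
  return shared_per_thread * 3 / 4;
}
```

With 4 MiB of shared cache per thread, this yields 1572864 bytes
(1.5 MiB) on Hygon, against 3145728 bytes (3 MiB) under the assumed
default, so copies between those two sizes would move to the
non-temporal path.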

Signed-off-by: xiejiamei <xiejiamei@hygon.cn>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>

   if (tunable_size > minimum_non_temporal_threshold
       && tunable_size <= maximum_non_temporal_threshold)
     non_temporal_threshold = tunable_size;
+  else if (cpu_features->basic.kind == arch_kind_hygon)
+    {
+      /* Hygon benefits from entering the non-temporal copy path earlier.
+         Use 3/8 of the shared cache size per thread for memcpy and
+         memmove. The memset threshold has already been initialized
+         above and is intentionally left unchanged. */
+      non_temporal_threshold = shared_per_thread * 3 / 8;
+    }
   tunable_size = TUNABLE_GET (x86_memset_non_temporal_threshold, long int, NULL);
   if (tunable_size > minimum_non_temporal_threshold