Benchmarks on Hygon processors show that the default non-temporal
threshold is higher than ideal for large copy workloads. As a result,
memcpy and memmove may continue to use the temporal copy path for
longer than is beneficial, increasing cache pollution and reducing
throughput for large copies.

Lower the copy non-temporal threshold to 3/8 of the shared cache size
per thread on Hygon. This allows the non-temporal copy path to be
selected earlier while leaving the memset non-temporal threshold
unchanged.

Signed-off-by: xiejiamei <xiejiamei@hygon.cn>
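For reviewers, the selection order described above can be sketched in
isolation. This is only an illustration, not the glibc code: the struct
nt_thresholds type, the select_nt_thresholds helper and the is_hygon
flag are hypothetical names, and the 3/4 generic default is an
assumption about the surrounding code that this patch does not show.
The sketch also ignores the Avoid_Non_Temporal_Memset check and the
tunable override.

```c
#include <stddef.h>

/* Hypothetical container for the two thresholds discussed in the
   commit message; glibc keeps them in separate variables.  */
struct nt_thresholds
{
  size_t copy;        /* Used by memcpy/memmove.  */
  size_t memset_nt;   /* Used by memset.  */
};

/* Hypothetical helper mirroring the order of operations in the hunk:
   the memset threshold is taken from the copy threshold first, and
   only afterwards is the copy threshold lowered on Hygon.  */
static struct nt_thresholds
select_nt_thresholds (size_t shared_per_thread, int is_hygon)
{
  struct nt_thresholds t;

  /* Assumption: generic default of 3/4 of the per-thread share.  */
  t.copy = shared_per_thread * 3 / 4;

  /* memset inherits the value before any Hygon adjustment.  */
  t.memset_nt = t.copy;

  /* The change proposed here: 3/8 for the copy path on Hygon.  */
  if (is_hygon)
    t.copy = shared_per_thread * 3 / 8;

  return t;
}
```

Under these assumptions, a 4 MiB per-thread share gives a 1.5 MiB copy
threshold on Hygon, while the memset threshold stays at 3 MiB.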
   if (!CPU_FEATURES_ARCH_P (cpu_features, Avoid_Non_Temporal_Memset))
     memset_non_temporal_threshold = non_temporal_threshold;
+  /* Hygon benefits from entering the non-temporal copy path earlier.
+     Use 3/8 of the shared cache size per thread to reduce cache
+     pollution and improve throughput for large copies. Keep the memset
+     non-temporal threshold unchanged. */
+  if (cpu_features->basic.kind == arch_kind_hygon)
+    non_temporal_threshold = shared_per_thread * 3 / 8;
+
   tunable_size = TUNABLE_GET (x86_non_temporal_threshold, long int, NULL);
   if (tunable_size > minimum_non_temporal_threshold
       && tunable_size <= maximum_non_temporal_threshold)