- Performance testing revealed significant memcpy performance degradation
when bit_arch_AVX_Fast_Unaligned_Load is enabled on Hygon 3.
- Hygon confirmed AVX performance issues in certain memory functions.
- Glibc benchmarks show SSE outperforms AVX for
memcpy/memmove/memset/strcmp/strcpy/strlen and so on.
- Hardware differences primarily in floating-point operations don't justify
AVX usage for memory operations.
Reviewed-by: gaoxiang <gaoxiang@kylinos.cn>
Signed-off-by: litenglong <litenglong@kylinos.cn>
Reviewed-by: H.J. Lu <hjl.tools@gmail.com>
hardware. */
cpu_features->preferred[index_arch_Avoid_Non_Temporal_Memset]
&= ~bit_arch_Avoid_Non_Temporal_Memset;
+ if (model < 0x4)
+ {
+ /* Unaligned AVX loads are slower. */
+ cpu_features->preferred[index_arch_AVX_Fast_Unaligned_Load]
+ &= ~bit_arch_AVX_Fast_Unaligned_Load;
+ }
}
else
{