git.ipfire.org Git - thirdparty/gcc.git/commit
Generate vmovsh instead of vpblendw for specific vec_merge.
author liuhongt <hongtao.liu@intel.com>
Mon, 4 Sep 2023 05:16:11 +0000 (13:16 +0800)
committer liuhongt <hongtao.liu@intel.com>
Tue, 5 Sep 2023 03:11:14 +0000 (11:11 +0800)
commit 33066c903a614f948a2657c7aa3090067f5984a5
tree a0b82086028251021e2cb063694947d5e0d289b1
parent 6f94ef6c86074a8348ec21d8aade04ce67b4e292

On SPR, vmovsh can be executed on 3 ports, while vpblendw can only be
executed on 2 ports.
On znver4, vpblendw can be executed on 4 ports; if vmovsh behaves like
vmovss, then it can also be executed on 4 ports.
So there should be no difference for znver, but vmovsh is the better
choice on SPR.
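
To illustrate why the two instructions are interchangeable for this
specific merge, here is a minimal scalar sketch in C. It models the
register forms of vpblendw (VEX, 128-bit) and vmovsh (no write mask) on
eight 16-bit lanes. The function names and the array representation are
hypothetical and exist only for this illustration; they are not taken
from GCC.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 8  /* eight 16-bit lanes in a 128-bit register */

/* Hypothetical scalar model of VEX vpblendw on 128 bits:
   bit i of imm8 set -> lane i comes from src2, otherwise from src1.  */
static void model_vpblendw(const uint16_t src1[LANES],
                           const uint16_t src2[LANES],
                           uint8_t imm8, uint16_t dst[LANES])
{
    for (int i = 0; i < LANES; i++)
        dst[i] = ((imm8 >> i) & 1) ? src2[i] : src1[i];
}

/* Hypothetical scalar model of register-form vmovsh without masking:
   lane 0 comes from src2, lanes 1..7 come from src1.  */
static void model_vmovsh(const uint16_t src1[LANES],
                         const uint16_t src2[LANES],
                         uint16_t dst[LANES])
{
    dst[0] = src2[0];
    for (int i = 1; i < LANES; i++)
        dst[i] = src1[i];
}

/* Self-check: a blend with immediate 0x01 gives the same lanes as the
   scalar move.  */
static void check_blend_lane0_matches_vmovsh(void)
{
    const uint16_t a[LANES] = {10, 11, 12, 13, 14, 15, 16, 17};
    const uint16_t b[LANES] = {20, 21, 22, 23, 24, 25, 26, 27};
    uint16_t blended[LANES], moved[LANES];

    model_vpblendw(a, b, 0x01, blended);
    model_vmovsh(a, b, moved);
    assert(memcmp(blended, moved, sizeof blended) == 0);
}
```

In this model, only a blend that takes lane 0 from one input and all
other lanes from the other matches the scalar move; any other immediate
still needs a real blend.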

gcc/ChangeLog:

* config/i386/sse.md (V8BFH_128): Renamed to ..
(VHFBF_128): .. this.
(V16BFH_256): Renamed to ..
(VHFBF_256): .. this.
(avx512f_mov<mode>): Extend to V_128.
(vcvtnee<bf16_ph>2ps_<mode>): Changed to VHFBF_128.
(vcvtneo<bf16_ph>2ps_<mode>): Ditto.
(vcvtnee<bf16_ph>2ps_<mode>): Changed to VHFBF_256.
(vcvtneo<bf16_ph>2ps_<mode>): Ditto.
* config/i386/i386-expand.cc (expand_vec_perm_blend):
Canonicalize vec_merge.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vmovsh-1a.c: Remove xfail.
gcc/config/i386/i386-expand.cc
gcc/config/i386/sse.md
gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1a.c
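
The ChangeLog entry for expand_vec_perm_blend only says "Canonicalize
vec_merge" and does not describe the canonical form. As a sketch under
that caveat, the C model below shows the identity such a
canonicalization can rely on: swapping the two operands of a vec_merge
and complementing the mask selects exactly the same lanes. It follows
the RTL convention that a set mask bit picks the lane from the first
operand. All names are hypothetical, and this is not the code from
i386-expand.cc.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NELTS 8  /* eight 16-bit elements, as in a 128-bit HF/BF vector */

/* Scalar model of RTL vec_merge: bit i of mask set -> element i comes
   from op0, otherwise from op1.  */
static void model_vec_merge(const uint16_t op0[NELTS],
                            const uint16_t op1[NELTS],
                            unsigned mask, uint16_t out[NELTS])
{
    for (int i = 0; i < NELTS; i++)
        out[i] = ((mask >> i) & 1) ? op0[i] : op1[i];
}

/* Complement the mask within the NELTS valid bits.  */
static unsigned complement_merge_mask(unsigned mask)
{
    unsigned all = (1u << NELTS) - 1;
    return ~mask & all;
}

/* Self-check: for every mask, merging (op0, op1, mask) equals merging
   (op1, op0, complemented mask).  */
static void check_swap_and_complement_identity(void)
{
    const uint16_t x[NELTS] = {1, 2, 3, 4, 5, 6, 7, 8};
    const uint16_t y[NELTS] = {101, 102, 103, 104, 105, 106, 107, 108};
    uint16_t direct[NELTS], swapped[NELTS];

    for (unsigned mask = 0; mask < (1u << NELTS); mask++) {
        model_vec_merge(x, y, mask, direct);
        model_vec_merge(y, x, complement_merge_mask(mask), swapped);
        assert(memcmp(direct, swapped, sizeof direct) == 0);
    }
}
```

For example, a merge with mask 0xFE (element 0 from the second operand,
the rest from the first) can be rewritten with swapped operands and
mask 0x01, so only one of the two equivalent forms has to be matched by
a move-scalar pattern.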