i386: Use 2x-wider modes when emulating QImode vector instructions
Rewrite ix86_expand_vecop_qihi2 to expand fo 2x-wider (e.g. V16QI -> V16HImode)
instructions when available. Currently, the compiler generates following
assembly for V16QImode multiplication (-mavx2):
Patched compiler generates more optimized code involving multiplication
in 2x-wider mode in cases where missing truncate instruction has to be
emulated with a permutation (-mavx2):
The patch also adjusts cost calculation of V*QImode emulations to account
for generation of 2x-wider mode instructions.
gcc/ChangeLog:
* config/i386/i386-expand.cc (ix86_expand_vecop_qihi2):
Rewrite to expand to 2x-wider (e.g. V16QI -> V16HImode)
instructions when available. Emulate truncation via
ix86_expand_vec_perm_const_1 when native truncate insn
is not available.
(ix86_expand_vecop_qihi_partial) <case MULT>: Use pmovzx
when available. Trivially rename some variables.
(ix86_expand_vecop_qihi): Unconditionally call ix86_expand_vecop_qihi2.
* config/i386/i386.cc (ix86_multiplication_cost): Rewrite cost
calculation of V*QImode emulations to account for generation of
2x-wider mode instructions.
(ix86_shift_rotate_cost): Update cost calculation of V*QImode
emulations to account for generation of 2x-wider mode instructions.