Without this pattern, the vectorizer is happy to unpack the v4si
operands and use the full mulv2di3, which results in more element
shuffling than is required.
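
For illustration only, here is a scalar C model of what a signed
widening multiply of the even v4si elements computes, and of how a
signed 32x32->64 product can be recovered from an unsigned one, which
is the only widening multiply SSE2 offers. The function names are
hypothetical and the fix-up is the standard arithmetic identity; this
is a sketch, not the code the expander emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model: multiply elements 0 and 2 of two
   4 x int32 vectors, producing 2 x int64.  */
static void
widen_smult_even (const int32_t a[4], const int32_t b[4], int64_t out[2])
{
  for (int i = 0; i < 2; i++)
    out[i] = (int64_t) a[2 * i] * (int64_t) b[2 * i];
}

/* Signed product built from an unsigned 32x32->64 multiply.
   Reading a negative A as unsigned adds 2^32 to it, which adds
   B << 32 to the product (mod 2^64); subtract that back out, and
   likewise for a negative B.  */
static int64_t
smul_from_umul (int32_t a, int32_t b)
{
  uint64_t p = (uint64_t) (uint32_t) a * (uint32_t) b;
  uint64_t fix = 0;

  if (a < 0)
    fix += (uint32_t) b;
  if (b < 0)
    fix += (uint32_t) a;
  p -= fix << 32;

  /* Assumes the usual two's complement conversion, as GCC provides.  */
  return (int64_t) p;
}

/* Check that both routes agree on the even elements.  */
static void
check_example (void)
{
  const int32_t a[4] = { -3, 7, 5, 9 };
  const int32_t b[4] = { 4, 1, -6, 2 };
  int64_t out[2];

  widen_smult_even (a, b, out);
  assert (out[0] == smul_from_umul (a[0], b[0]));
  assert (out[1] == smul_from_umul (a[2], b[2]));
}
```

The even elements are consumed in place, so no unpacking of the input
vector is needed before the multiply.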
* config/i386/i386.c (bdesc_args): Update. Change
IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V4SI to OPTION_MASK_ISA_SSE2.
(IX86_BUILTIN_VEC_WIDEN_SMUL_EVEN_V4SI): New.
(ix86_builtin_mul_widen_even): Use it.
(ix86_builtin_mul_widen_odd): Relax SMUL_ODD from sse4 to sse2.
(ix86_expand_mul_widen_evenodd): Handle signed for sse2.
* config/i386/sse.md (vec_widen_<s>mult_hi_<V124_AVX2>): Allow
for all SSE2.
(vec_widen_<s>mult_lo_<V124_AVX2>): Likewise.
(vec_widen_<s>mult_odd_<VI4_AVX2>): Likewise. Relax from V124_AVX2.
(vec_widen_smult_even_v4si): New.