Improved V1TI (and V2DI) mode equality/inequality on x86_64.
This patch improves support for vector equality and inequality of
V1TImode vectors, and V2DImode vectors with sse2 but not sse4.
Consider the three functions below:
typedef unsigned int uv4si __attribute__ ((__vector_size__ (16)));
typedef unsigned long long uv2di __attribute__ ((__vector_size__ (16)));
typedef unsigned __int128 uv1ti __attribute__ ((__vector_size__ (16)));
These all perform vector comparisons of 128bit SSE2 registers, generating
the result as a vector, where ~0 (all 1 bits) represents true and a zero
represents false. eq_v4si is trivially implemented by x86_64's pcmpeqd
instruction. This patch improves the other two cases:
where the results of a V4SI comparison are shuffled and bit-wise ANDed
to produce the desired result. There's no change in the code generated
for "-O2 -msse4" where the compiler generates a single "pcmpeqq" insn.
For V1TI mode, the results are equally dramatic, where the current -O2
output looks like:
2022-05-13 Roger Sayle <roger@nextmovesoftware.com>
Uroš Bizjak <ubizjak@gmail.com>
gcc/ChangeLog
* config/i386/sse.md (vec_cmpeqv2div2di): Enable for TARGET_SSE2.
For !TARGET_SSE4_1, expand as a V4SI vector comparison, followed
by a pshufd and pand.
(vec_cmpeqv1tiv1ti): New define_expand implementing V1TImode
vector equality as a V2DImode vector comparison (see above),
followed by a pshufd and pand.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse2-v1ti-veq.c: New test case.
* gcc.target/i386/sse2-v1ti-vne.c: New test case.