SLP load permutation fails if any individual permutation requires more
than two vector inputs. For 128-bit vectors, it's possible to permute
3 contiguous loads of 32-bit and 8-bit elements, but not 16-bit elements
or 64-bit elements. The results are reversed for 256-bit vectors,
and so on for wider vectors.
This patch adds a routine that tests whether a permute will require
three vectors for a given vector count and element size, then adds
vect_perm3_* target selectors for the cases that we currently use.
2017-11-09 Richard Sandiford <richard.sandiford@linaro.org>
Alan Hayward <alan.hayward@arm.com>
David Sherwood <david.sherwood@arm.com>