arm: Auto-vectorization for MVE: add pack/unpack patterns
This patch adds the vec_unpack<US>_hi_<mode>, vec_unpack<US>_lo_<mode>
and vec_pack_trunc_<mode> patterns for MVE.
It does so by moving the unpack patterns from neon.md to
vec-common.md, while adding MVE support to them. The pack expander is
derived from the Neon one (which in turn is renamed to
neon_quad_vec_pack_trunc_<mode>).
The patch introduces mve_vec_unpack<US>_lo_<mode> and
mve_vec_unpack<US>_hi_<mode> which are similar to their Neon
counterparts, except for the assembly syntax.
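The patch also introduces mve_vec_pack_trunc_lo_<mode> to avoid the
zero-initialized temporary that would be needed if the
vec_pack_trunc_<mode> expander called @mve_vmovn[bt]q_<supf><mode>
instead (vmovnb only writes the even-numbered lanes of its destination,
so that pattern takes the previous destination value as an input).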
With this patch, we can now vectorize the 16- and 8-bit versions of
vclz and vshl, although the generated code could still be improved.
For test_clz_s16, we now generate
vldrh.16 q3, [r1]
vmovlb.s16 q2, q3
vmovlt.s16 q3, q3
vclz.i32 q2, q2
vclz.i32 q3, q3
vmovnb.i32 q1, q2
vmovnt.i32 q1, q3
vstrh.16 q1, [r0]
which could be improved to
vldrh.16 q3, [r1]
vclz.i16 q1, q3
vstrh.16 q1, [r0]
if we could avoid the unpack/pack steps.