Add aarch64 vec_widen_lshift_lo/hi patterns and fix bug it triggers in
mid-end. This pattern takes one vector with N elements of size S, shifts
each element left by the element width and stores the results as N
elements of size 2*s (in 2 result vectors). The aarch64 backend
implements this with the shll,shll2 instruction pair.