Like the logical operations, expand all shifts early rather than only
in some cases. The Neon shift expansions are never emitted (not even
with -mneon-for-64bits), so they serve no purpose. Remove all the late
expansions and the Neon shift patterns; shifts are optimized better as
a result. Since some extend patterns use Neon DImode shifts, also remove
the Neon extend variants and the related splits.

With this patch, the simple example below generates the same efficient
code with -mfpu=neon and -mfpu=vfp. Previously, merely having Neon
enabled resulted in inefficient code for no reason:

unsigned long long f(unsigned long long x, unsigned long long y)
{ return x & (y >> 33); }
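
For illustration only, here is a rough sketch of the 32-bit decomposition
that expanding the shift early makes possible for this example. It is
hand-written C, not compiler output; the name f_split and the use of
uint64_t/uint32_t (in place of unsigned long long) are assumptions made
for the sketch.

```c
#include <stdint.h>

/* The example function, written with a fixed-width 64-bit type. */
uint64_t f(uint64_t x, uint64_t y)
{
    return x & (y >> 33);
}

/* Hypothetical equivalent using only 32-bit operations (illustrative). */
uint64_t f_split(uint64_t x, uint64_t y)
{
    uint32_t x_lo = (uint32_t)x;          /* low word of x */
    uint32_t y_hi = (uint32_t)(y >> 32);  /* high word of y */

    /* y >> 33 has a low word of y_hi >> 1 and a high word of zero,
       so the high word of the AND is zero as well. */
    uint32_t res_lo = x_lo & (y_hi >> 1);
    uint32_t res_hi = 0;

    return ((uint64_t)res_hi << 32) | res_lo;
}
```

Only one 32-bit shift and one 32-bit AND contribute to the result here,
which is why there is nothing to gain from moving the operands into
Neon registers for this kind of code.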