Vector registers perform much better for moves compared to pairs of
registers on falkor, so use them instead. This results in a time
reduction of up to 50% (i.e. 2x improvement) for a lot of the smaller
sizes, i.e. up to 1K in memmove-walk. Improvements for larger sizes are
smaller, at about 1%-2%.
* sysdeps/aarch64/multiarch/memmove_falkor.S
(__memcpy_falkor): Use vector registers.