1. Skip aligning memcpy when dist >= len.
The aligning memcpy is redundant when dist >= len, and it costs
extra, very slow load & store instructions. By adding a printf to
chunkcopy_rvv and running apt install, I noticed that dist is far
larger than len in most cases (a very narrow scenario, but it
makes sense). So I move the comparison ahead of the aligning
memcpy, since the aligning memcpy is only needed in the overlap
case.
2. Make the largest copy while len > dist.
chunkcopy_rvv makes the largest possible copy only once after the
aligning memcpy, then finishes the rest of the copy in
sizeof(chunk_t) steps. Instead, we should keep making the largest
copy as long as len > dist.
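
The two changes above can be sketched roughly as follows. This is a
simplified illustration, not the actual patch: chunkcopy_sketch is a
hypothetical name, plain memcpy stands in for the RVV chunk loads and
stores, and the aligning memcpy and chunk_t rounding are left out. It
assumes out > from and that both point into the same buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified stand-in for chunkcopy_rvv.
 * Assumption: out > from, both inside one buffer (LZ77-style copy). */
static uint8_t *chunkcopy_sketch(uint8_t *out, const uint8_t *from, size_t len) {
    assert(out > from);
    size_t dist = (size_t)(out - from);

    /* 1. No overlap: check first and finish with a single memcpy. */
    if (dist >= len) {
        memcpy(out, from, len);
        return out + len;
    }

    /* 2. Overlap: keep making the largest safe copy (dist bytes)
     * while len > dist. Source and destination are adjacent here,
     * so each memcpy is well defined. */
    while (len > dist) {
        memcpy(out, from, dist);
        out += dist;
        from += dist;
        len -= dist;
    }

    /* Tail: len <= dist now, so the ranges no longer overlap. */
    memcpy(out, from, len);
    return out + len;
}
```

With a 3-byte pattern "abc" at the start of a buffer, calling
chunkcopy_sketch(buf + 3, buf, 8) takes the overlap path and repeats
the pattern to give "abcabcabcab".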