rs6000: Fix regression cases caused 16-byte by pieces move
The previous patch enables 16-byte by pieces move. Originally 16-byte
move is implemented via pattern. expand_block_move does an optimization
on P8 LE to leverage V2DI reversed load/store for memory to memory move.
Now 16-byte move is implemented via by pieces move and finally split to
two DI load/store. This patch creates an insn_and_split pattern to
retake the optimization.