Simplify the alignment steps for SZREG and BLOCK_SIZE multiples. The previous
three-instruction sequences
addi a7, a2, -SZREG
andi a7, a7, -SZREG
addi a7, a7, SZREG
and
addi a7, a2, -BLOCK_SIZE
andi a7, a7, -BLOCK_SIZE
addi a7, a7, BLOCK_SIZE
are each equivalent to the single instruction
andi a7, a2, -SZREG
andi a7, a2, -BLOCK_SIZE
respectively, because SZREG and BLOCK_SIZE are powers of two in this context.
For a power-of-two S, "andi x, x, -S" rounds x down to a multiple of S, so the
preceding addi of -S and the trailing addi of +S merely shift the result down
and back up by exactly one multiple of S and cancel out:
((x - S) & -S) + S == x & -S. Folding each sequence into one instruction
reduces code size with identical semantics.
No functional change.
sysdeps/riscv/multiarch/memcpy_noalignment.S: Remove the redundant addi
instructions around the alignment masks; keep a single andi for the
SZREG/BLOCK_SIZE round-down.
Signed-off-by: Yao Zihong <zihong.plct@isrc.iscas.ac.cn>
Reviewed-by: Peter Bergner <bergner@tenstorrent.com>
add a5, a0, a4
add a1, a1, a4
bleu a2, a3, L(word_copy_adjust)
- addi a7, a2, -BLOCK_SIZE
- andi a7, a7, -BLOCK_SIZE
- addi a7, a7, BLOCK_SIZE
+ andi a7, a2, -BLOCK_SIZE
add a3, a5, a7
mv a4, a1
L(block_copy):
li a5, SZREG-1
/* if LEN < SZREG jump to tail handling. */
bleu a2, a5, L(tail_adjust)
- addi a7, a2, -SZREG
- andi a7, a7, -SZREG
- addi a7, a7, SZREG
+ andi a7, a2, -SZREG
add a6, a3, a7
mv a5, a1
L(word_copy_loop):