AArch64: Add implementation for pow2 bitmask division.
This adds an implementation for the new optab for unsigned pow2 bitmask for
AArch64.
The implementation rewrites:
x = y / (2 ^ (sizeof (y)/2)-1
into e.g. (for bytes)
(x + ((x + 257) >> 8)) >> 8
where it's required that the additions be done in double the precision of x
such that we don't lose any bits during an overflow.
Essentially the sequence decomposes the division into doing two smaller
divisions, one for the top and bottom parts of the number and adding the results
back together.
To account for the fact that shift by 8 would be division by 256 we add 1 to
both parts of x such that when 255 we still get 1 as the answer.
Because the amount we shift are half the original datatype we can use the
halfing instructions the ISA provides to do the operation instead of using
actual shifts.
For AArch64 this means we generate for:
void draw_bitmap1(uint8_t* restrict pixel, uint8_t level, int n)
{
for (int i = 0; i < (n & -16); i+=1)
pixel[i] = (pixel[i] * level) / 0xff;
}