AArch64: Implement widen_[us]sum using [US]ADDW[TB] for SVE2 [PR122069]
SVE2 adds [US]ADDW[TB], which we can use when we have to do a single-step
widening addition. This is useful, for instance, when the value to be widened
does not come from a load. For example, for
int foo2_int(unsigned short *x, unsigned short * restrict y) {
  int sum = 0;
  for (int i = 0; i < 8000; i++)
    {
      x[i] = x[i] + y[i];
      sum += x[i];
    }
  return sum;
}