The helper functions for performing the udot/sdot operations against
a scalar were not using an address-swizzling macro when converting
the index of the scalar element into a pointer into the vm array.
This had no effect on little-endian hosts but meant we generated
incorrect results on big-endian hosts.
For these insns, the index selects a group of four 8-bit values,
so 32 bits per indexed entity, and H4() is therefore the swizzle we want.
(For Neon the only possible input indexes are 0 and 1.)
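As an illustration only (this is not QEMU code; HOST_WORDS_BIGENDIAN and the
values used are assumptions for the sketch), the standalone program below
shows why the swizzle matters: the guest vector is held as a host-endian
uint64_t, so the byte offset of a 32-bit lane differs between hosts, and
H4() hides that difference when an index is turned into a byte pointer.

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Roughly how the swizzle macro is defined: a no-op on little-endian
 * hosts, XOR with 1 (swap the two 32-bit halves of a 64-bit word) on
 * big-endian hosts.
 */
#ifdef HOST_WORDS_BIGENDIAN
#define H4(x)  ((x) ^ 1)
#else
#define H4(x)  (x)
#endif

int main(void)
{
    /* Architectural view: 32-bit element 0 = 0x11111111, element 1 = 0x22222222 */
    uint64_t vreg = ((uint64_t)0x22222222 << 32) | 0x11111111;
    uint8_t *bytes = (uint8_t *)&vreg;
    uint32_t elt;

    /* Convert architectural index 1 into a byte offset, as the dot-product
     * helpers do, then read back the 4-byte group it names.
     */
    memcpy(&elt, bytes + H4(1) * 4, sizeof(elt));
    printf("element 1 = 0x%08" PRIx32 "\n", elt);  /* 0x22222222 on either kind of host */
    return 0;
}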
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Message-id: 20201028191712.4910-3-peter.maydell@linaro.org
intptr_t index = simd_data(desc);
uint32_t *d = vd;
int8_t *n = vn;
- int8_t *m_indexed = (int8_t *)vm + index * 4;
+ int8_t *m_indexed = (int8_t *)vm + H4(index) * 4;
/* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
* Otherwise opr_sz is a multiple of 16.
intptr_t index = simd_data(desc);
uint32_t *d = vd;
uint8_t *n = vn;
- uint8_t *m_indexed = (uint8_t *)vm + index * 4;
+ uint8_t *m_indexed = (uint8_t *)vm + H4(index) * 4;
/* Notice the special case of opr_sz == 8, from aa64/aa32 advsimd.
* Otherwise opr_sz is a multiple of 16.