Fix right shift in rshift64_m128 fallback path (ARM NEON) (#396)
* Add test case for issue #326, verified to work on Linux
* Fix cppcheck warnings
* Add comment pointing to the GitHub issue
* simd: use unsigned shift intrinsics in ARM NEON rshift fallback paths
The fallback paths in rshift_m128 and rshift64_m128 (taken when
HAVE__BUILTIN_CONSTANT_P is not defined, i.e. clang) used
vshlq_s32/vshlq_s64 with a negative shift count. These are signed
shifts, so the resulting right shift is arithmetic and sign-extends
each lane. This caused incorrect nibble extraction in shufti validation
when input bytes >= 0x80 landed at byte positions 7 or 15 within a
128-bit register, i.e. in the most significant byte of a 64-bit lane.
Change all four shift helpers (lshift_m128, rshift_m128, lshift64_m128,
rshift64_m128) to use unsigned shifts.
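
The difference between the two shift kinds can be illustrated with a
portable scalar sketch of a single 64-bit lane. This is not the NEON
code from the patch: the function names below are hypothetical, and the
arithmetic shift is emulated so that the example does not depend on
compiler-specific behaviour for signed right shifts. It assumes the low
nibble of every byte has already been cleared before the shift.

```c
#include <stdint.h>

/* Logical right shift: vacated high bits are zero-filled.
 * This models the unsigned shift behaviour. */
static uint64_t shift_right_logical(uint64_t lane, unsigned n)
{
    return lane >> n;
}

/* Arithmetic right shift, emulated portably for 0 < n < 64:
 * vacated high bits are filled with copies of the sign bit.
 * This models the signed shift behaviour. */
static uint64_t shift_right_arithmetic(uint64_t lane, unsigned n)
{
    uint64_t fill = (lane >> 63) ? (~UINT64_C(0) << (64 - n)) : 0;
    return (lane >> n) | fill;
}

/* Hypothetical helper: byte `idx` of a lane, 0 = least significant. */
static unsigned byte_at(uint64_t lane, unsigned idx)
{
    return (unsigned)((lane >> (8 * idx)) & 0xff);
}
```

For a lane whose byte 7 is 0x80, shifting right by 4 gives 0x08 in
byte 7 with the logical shift and 0xf8 with the arithmetic shift, so
the byte no longer holds a clean nibble value. Bytes below 0x80 in that
position give the same result either way.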