This improves codegen for x264 sum of absolute difference routines.
The insn count is the same, but we avoid double widening ops and the
ensuing whole-register moves.
Also, for more general applicability, we chose to implement the abs diff
pattern rather than the sum of abs diff variant.
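
For illustration, a minimal scalar sketch of the kind of loop this is
about. The names abs_diff_u8 and sad_u8 are hypothetical and are not taken
from the x264 sources; whether the vectorizer matches this exact form is an
assumption. The point is that the abs diff step is a building block on its
own, and the sum is just an accumulation on top of it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: unsigned absolute difference of two pixels.
   The larger operand is always the minuend, so the result fits in 8 bits. */
static inline uint8_t abs_diff_u8(uint8_t a, uint8_t b)
{
    return (uint8_t)(a > b ? a - b : b - a);
}

/* Hypothetical SAD routine: accumulate abs diffs into a wider type. */
uint32_t sad_u8(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += abs_diff_u8(a[i], b[i]);
    return sum;
}
```
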
Suggested-by: Robin Dapp <rdapp@ventanamicro.com>
Co-authored-by: Pan Li <pan2.li@intel.com>
Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
PR target/117722