]> git.ipfire.org Git - thirdparty/gcc.git/commit
RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations
authorJuzhe-Zhong <juzhe.zhong@rivai.ai>
Thu, 1 Jun 2023 08:32:12 +0000 (16:32 +0800)
committerPan Li <pan2.li@intel.com>
Sat, 3 Jun 2023 01:42:44 +0000 (09:42 +0800)
commit2e3401bd71b59ca0e03f051c5db286c32299b940
tree57c8c8b278cfe613b46c250f6b504bfc69a82d62
parent829d597548549709fcbfdba03ad6374174d11ec6
RISC-V: Add pseudo vwmul.wv pattern to enhance vwmul.vv instruction optimizations

This patch is to enhance vwmul.vv combine optimizations.
Consider this following code:
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
      int16_t *__restrict dst3, int16_t *__restrict dst4,
      int8_t *__restrict a, int8_t *__restrict b,
      int8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
    {
      dst[i] = (int16_t) a[i] * (int16_t) b[i];
      dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
      dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
      dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
    }
}

In such complicate case, the operand is not single used, used by multiple statements.
GCC combine optimization will iterate the combination of the operands.

Also, we add another pattern of vwmulsu.vv to enhance the vwmulsu.vv optimization.
Currently, we have format:

(mult: (sign_extend) (zero_extend)) in vector.md for intrinsics calling.
Now, we add a new vwmulsu.ww with this format:
(mult: (zero_extend) (sign_extend))

To handle this following cases (sign and unsigned widening multiplication mixing codes):
void
vwadd_int16_t_int8_t (int16_t *__restrict dst, int16_t *__restrict dst2,
      int16_t *__restrict dst3, int16_t *__restrict dst4,
      int8_t *__restrict a, uint8_t *__restrict b,
      uint8_t *__restrict a2, int8_t *__restrict b2, int n)
{
  for (int i = 0; i < n; i++)
    {
      dst[i] = (int16_t) a[i] * (int16_t) b[i];
      dst2[i] = (int16_t) a2[i] * (int16_t) b[i];
      dst3[i] = (int16_t) a2[i] * (int16_t) a[i];
      dst4[i] = (int16_t) a[i] * (int16_t) b2[i];
    }
}

Before this patch:

...
        vsext.vf2       v6,v1
        add     t0,a0,t4
        vzext.vf2       v4,v1
        vmul.vv v2,v4,v6
        add     t0,a1,t4
        vzext.vf2       v2,v1
        vmul.vv v4,v2,v4
        add     t0,a2,t4
        vmul.vv v2,v2,v6
        add     t0,a3,t4
        sub     t6,t6,t1
        vsext.vf2       v2,v1
        vmul.vv v2,v2,v6
...

After this patch:
...
        add     t0,a0,t3
        vwmulsu.vv      v2,v1,v3
        add     t0,a1,t3
        vwmulu.vv       v4,v3,v2
        add     t0,a2,t3
        vwmulsu.vv      v3,v1,v2
        add     t0,a3,t3
        sub     t4,t4,t1
        vwmul.vv        v2,v1,v3
...

gcc/ChangeLog:

* config/riscv/vector.md: Add vector-opt.md.
* config/riscv/autovec-opt.md: New file.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/widen/widen-7.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c: New test.
* gcc.target/riscv/rvv/autovec/widen/widen_run-7.c: New test.
gcc/config/riscv/autovec-opt.md [new file with mode: 0644]
gcc/config/riscv/vector.md
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-7.c [new file with mode: 0644]
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-3.c [new file with mode: 0644]
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen-complicate-4.c [new file with mode: 0644]
gcc/testsuite/gcc.target/riscv/rvv/autovec/widen/widen_run-7.c [new file with mode: 0644]