git.ipfire.org Git - thirdparty/gcc.git/commit
author Kyrylo Tkachov <ktkachov@nvidia.com>
Thu, 27 Feb 2025 17:00:25 +0000 (09:00 -0800)
committer Kyrylo Tkachov <ktkachov@nvidia.com>
Wed, 5 Mar 2025 15:21:36 +0000 (16:21 +0100)
commit db76482175c4e76db273d7fb3a00ae0f932529a6
tree a2da58e5af8ca44309e36a3b2b980044d9a122f9
parent 54da358ff51ded726fe7c026fa59c8db0a1b72ed
PR rtl-optimization/119046: Don't mark PARALLEL RTXes with floating-point mode as trapping

In this testcase late-combine was failing to merge:
        dup     v31.4s, v31.s[3]
        fmla    v30.4s, v31.4s, v29.4s
into the lane-wise fmla form.
This is because late-combine checks may_trap_p under the hood on the dup insn.
This ended up returning true for the insn:
(set (reg:V4SF 152 [ _32 ])
        (vec_duplicate:V4SF (vec_select:SF (reg:V4SF 111 [ rhs_panel.8_31 ])
                (parallel:V4SF [
                        (const_int 3 [0x3])]))))

Although may_trap_p correctly reasoned that vec_duplicate and vec_select of
floating-point modes can't trap, it assumed that the V4SF parallel can trap.
The correct behaviour is to recurse into the vector inside the PARALLEL and
check the sub-expressions.  This patch adjusts may_trap_p_1 to do just that.
With this change the above insn is no longer deemed to be trapping and is
propagated into the FMLA, giving:
        fmla    vD.4s, vA.4s, vB.s[3]
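
As an illustration only, the reasoning above can be sketched with a toy model
of the check.  This is not the code in rtlanal.cc: the Rtx struct, the Code and
Mode enums, may_trap, make_dup_lane3 and the recurse_into_parallel switch are
all hypothetical names invented for this sketch, and it assumes trapping math
is enabled so that any other floating-point-mode expression is conservatively
treated as trapping.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for RTL codes and machine modes (hypothetical, not GCC's).
enum class Code { Reg, ConstInt, VecSelect, VecDuplicate, Parallel, Plus };
enum class Mode { None, SF, V4SF };

struct Rtx {
  Code code;
  Mode mode;
  std::vector<Rtx> ops;  // operands, or the vector held by a Parallel
};

static bool is_float_mode(Mode m) { return m == Mode::SF || m == Mode::V4SF; }

// recurse_into_parallel == false models the behaviour described before the
// patch; true models the behaviour described after it.
static bool may_trap(const Rtx &x, bool recurse_into_parallel) {
  // Registers and integer constants never trap in this model.
  if (x.code == Code::Reg || x.code == Code::ConstInt)
    return false;

  // Pure data-movement codes do not trap by themselves, even in FP modes.
  bool known_safe = x.code == Code::VecSelect
                    || x.code == Code::VecDuplicate
                    || (x.code == Code::Parallel && recurse_into_parallel);

  // Anything else with a floating-point mode is assumed to trap.
  if (!known_safe && is_float_mode(x.mode))
    return true;

  // Otherwise the answer depends only on the sub-expressions.
  for (const Rtx &op : x.ops)
    if (may_trap(op, recurse_into_parallel))
      return true;
  return false;
}

// Builds a model of:
//   (vec_duplicate:V4SF (vec_select:SF (reg:V4SF)
//                          (parallel:V4SF [(const_int 3)])))
static Rtx make_dup_lane3() {
  Rtx lane{Code::ConstInt, Mode::None, {}};
  Rtx selector{Code::Parallel, Mode::V4SF, {lane}};
  Rtx src{Code::Reg, Mode::V4SF, {}};
  Rtx select{Code::VecSelect, Mode::SF, {src, selector}};
  return Rtx{Code::VecDuplicate, Mode::V4SF, {select}};
}
```

In this model may_trap (make_dup_lane3 (), false) is true because the V4SF
parallel hits the floating-point fallback, while may_trap (make_dup_lane3 (),
true) is false because only the const_int inside the parallel is examined.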

Bootstrapped and tested on aarch64-none-linux-gnu.
Apparently this also fixes a regression in
gcc.target/aarch64/vmul_element_cost.c that I observed.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>

gcc/

PR rtl-optimization/119046
* rtlanal.cc (may_trap_p_1): Don't mark FP-mode PARALLELs as trapping.

gcc/testsuite/

PR rtl-optimization/119046
* gcc.target/aarch64/pr119046.c: New test.

gcc/rtlanal.cc
gcc/testsuite/gcc.target/aarch64/pr119046.c [new file with mode: 0644]