git.ipfire.org Git - thirdparty/gcc.git/commit

author	Richard Biener <rguenther@suse.de>
	Wed, 12 Jul 2023 13:01:47 +0000 (15:01 +0200)
committer	Richard Biener <rguenther@suse.de>
	Tue, 22 Aug 2023 09:32:50 +0000 (11:32 +0200)
commit	27de9aa152141e7f3ee66372647d0f2cd94c4b90
tree	22a249ae4c755579a1866f56f3549bcf06c39879
parent	d3b5a1bccc219680dc19281b6fd6cc798bb679eb

tree-optimization/94864 - vector insert of vector extract simplification

The PRs ask for optimizing

  _1 = BIT_FIELD_REF <b_3(D), 64, 64>;
  result_4 = BIT_INSERT_EXPR <a_2(D), _1, 64>;

to a vector permutation.  The following implements this as a
match.pd pattern, improving code generation on x86_64.
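
As an illustration, here is a minimal C sketch of source code that is
expected to produce GIMPLE of the shape shown above, next to the
equivalent two-input permute.  It assumes GCC's vector extensions
(__builtin_shuffle is GCC-specific); the function names are made up
for this example and are not the testcases added by the patch.

```c
/* Sketch only: relies on GCC vector extensions.  */
typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Extract-plus-insert form: copy element 1 of B (bits 64..127)
   into element 1 of A, leaving element 0 of A untouched.  */
v2df
insert_of_extract (v2df a, v2df b)
{
  v2df result = a;
  result[1] = b[1];
  return result;
}

/* The same operation written as a two-input permute.  Selector
   indices 0..1 pick from A and 2..3 pick from B, so { 0, 3 }
   means "A[0], B[1]".  */
v2df
as_permute (v2df a, v2df b)
{
  v2di sel = { 0, 3 };
  return __builtin_shuffle (a, b, sel);
}
```

Both functions compute the same value; the pattern lets the compiler
see the first form as the second.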

On the RTL level we face the issue that backend patterns inconsistently
use vec_merge and vec_select of vec_concat to represent permutes.

I think using a (supported) permute is almost always better
than an extract plus insert, possibly excluding the case where we
extract element zero and that element is aliased to a register that
can be used directly for insertion (not sure how to query that).

The patch FAILs one case in gcc.target/i386/avx512fp16-vmovsh-1a.c
where we now expand from

  __A_28 = VEC_PERM_EXPR <x2.8_9, x1.9_10, { 0, 9, 10, 11, 12, 13, 14, 15 }>;

instead of

  _28 = BIT_FIELD_REF <x2.8_9, 16, 0>;
  __A_29 = BIT_INSERT_EXPR <x1.9_10, _28, 0>;

producing a vpblendw instruction instead of the expected vmovsh.  That's
either a missed vec_perm_const expansion optimization or, even better,
an improvement: Zen4, for example, has 4 ports to execute vpblendw
but only 3 for executing vmovsh, and both instructions have the same size.
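
To make the selector in the VEC_PERM_EXPR above easier to read, here is
a small C sketch that derives it.  The helper name and signature are
hypothetical and this is not the code in match.pd; it only encodes the
VEC_PERM_EXPR convention that indices 0..N-1 select from the first
operand and N..2N-1 from the second, for the operand order
<extract source, insert destination>.

```c
/* Hypothetical helper, for illustration only.  Fill SEL (NUNITS
   entries) with the selector of a two-input permute whose first
   operand is the vector extracted from and whose second operand is
   the vector inserted into: lane IELT takes element RELT of the
   first operand, every other lane I keeps element I of the second.  */
void
build_insert_selector (unsigned *sel, unsigned nunits,
                       unsigned relt, unsigned ielt)
{
  for (unsigned i = 0; i < nunits; ++i)
    sel[i] = (i == ielt) ? relt : nunits + i;
}
```

With nunits = 8, relt = 0 and ielt = 0 this yields
{ 0, 9, 10, 11, 12, 13, 14, 15 }, the selector quoted above.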

The patch XFAILs the sub-testcase.

PR tree-optimization/94864
PR tree-optimization/94865
PR tree-optimization/93080
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..): New pattern
for vector insertion from vector extraction.

* gcc.target/i386/pr94864.c: New testcase.
* gcc.target/i386/pr94865.c: Likewise.
* gcc.target/i386/avx512fp16-vmovsh-1a.c: XFAIL.
* gcc.dg/tree-ssa/forwprop-40.c: New testcase.
* gcc.dg/tree-ssa/forwprop-41.c: Likewise.

gcc/match.pd
gcc/testsuite/gcc.dg/tree-ssa/forwprop-40.c	[new file with mode: 0644]
gcc/testsuite/gcc.dg/tree-ssa/forwprop-41.c	[new file with mode: 0644]
gcc/testsuite/gcc.target/i386/avx512fp16-vmovsh-1a.c
gcc/testsuite/gcc.target/i386/pr94864.c	[new file with mode: 0644]
gcc/testsuite/gcc.target/i386/pr94865.c	[new file with mode: 0644]