
stage 2: implementation of k-arity promotion/reduction in the series "Improving
effectiveness and generality of autovectorization using unified representation".

The permute nodes within the primitive reorder tree (PRT) generated from the input
program can have any arity, depending on the stride of the accesses. However, a
target cannot provide instructions for every arity. Hence, we need to promote or
reduce the arity of the PRT to enable successful tree tiling.

In classic autovectorization, if the vectorization stride is greater than 2, arity
reduction is performed by generating cascaded extract and interleave instructions,
as described in "Auto-vectorization of Interleaved Data for SIMD" by D. Nuzman,
I. Rosen and A. Zaks.

Moreover, to enable SLP across loop iterations, "Loop-aware SLP in GCC" by
D. Nuzman, I. Rosen and A. Zaks unrolls the loop until the stride equals the
vector size.

The k-arity reduction/promotion algorithm uses modulo arithmetic to generate a
PRT of the desired arity for both of the above cases.

A single ILV node of arity k can be reduced into cascaded ILV nodes: a new parent
node of arity m whose children are ILV nodes of arity k/m, such that the i-th child
of the original ILV node becomes the floor(i/m)-th child of the (i % m)-th child of
the new parent.
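As a sanity check of this index mapping, the following C sketch builds the cascade directly from the stated rule and compares it with the original ILV node at every output position. It is an illustrative model only: vectors are encoded as integers rather than GCC trees, ILV_k is assumed to place element j of child i at position j*k + i, and the names elem, ilv_direct, ilv_reduction_ok and MAXK are hypothetical, not taken from the commit.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXK 32  /* Hypothetical upper bound on arity for this sketch.  */

/* Illustrative model: encode element j of source vector s as an integer.  */
static int elem (int s, int j)
{
  return s * 1000 + j;
}

/* ILV_k (S_0, ..., S_k-1): element j of child i lands at position j*k + i,
   so position p holds element p / k of child p % k.  */
static int ilv_direct (int k, int p)
{
  return elem (p % k, p / k);
}

/* Build the cascade with the rule "child i of ILV_k becomes slot floor(i/m)
   of child (i % m) of the new ILV_m", then compare it with ILV_k at every
   output position, for sources of LEN elements each.  */
static bool ilv_reduction_ok (int k, int m, int len)
{
  int slot[MAXK][MAXK];  /* slot[c][q]: original child at slot q of child c.  */
  assert (m > 0 && k % m == 0 && k <= MAXK && len < 1000);
  int km = k / m;

  for (int i = 0; i < k; i++)
    slot[i % m][i / m] = i;

  for (int p = 0; p < k * len; p++)
    {
      /* Parent ILV_m: position p is element p / m of child p % m.  */
      int c = p % m, t = p / m;
      /* Child ILV_k/m: position t is element t / km of slot t % km.  */
      int cascaded = elem (slot[c][t % km], t / km);
      if (cascaded != ilv_direct (k, p))
        return false;
    }
  return true;
}
```

For example, ilv_reduction_ok (8, 2, 4) checks the stride-8 to arity-2 case on sources of four elements each.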

A single EXTR node with k parts and selector i can be reduced into cascaded EXTR
nodes: a parent EXTR node with m parts and selector i/(k/m), applied to a child
EXTR node with k/m parts and selector i % (k/m).
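This rule can be checked the same way. The minimal C sketch below assumes that an EXTR with p parts and selector s reads element j*p + s of its operand at output position j; it composes the two cascaded extractions and compares the resulting source index with that of the original EXTR. The names extr_index and extr_reduction_ok are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed EXTR semantics for this sketch: with PARTS parts and selector
   SEL, output position j reads element j * PARTS + SEL of the operand.  */
static int extr_index (int parts, int sel, int j)
{
  return j * parts + sel;
}

/* Check that EXTR_k,i (S) equals EXTR_m,i/(k/m) (EXTR_k/m,i%(k/m) (S))
   for every selector i and the first LEN output positions.  */
static bool extr_reduction_ok (int k, int m, int len)
{
  assert (m > 0 && k % m == 0);
  int km = k / m;

  for (int i = 0; i < k; i++)
    for (int j = 0; j < len; j++)
      {
        int direct = extr_index (k, i, j);
        /* The parent reads position extr_index (m, i / km, j) of the
           child's output; the child maps that position back into S.  */
        int cascaded = extr_index (km, i % km, extr_index (m, i / km, j));
        if (direct != cascaded)
          return false;
      }
  return true;
}
```

The check succeeds because (j*m + i/(k/m)) * (k/m) + i % (k/m) simplifies to j*k + i.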

Similarly, loop unrolling to obtain the desired arity m can be represented as
arity promotion from k to m.

A single ILV node of arity k can be promoted to a single ILV node of arity m by
taking, as the i-th child of the new ILV node, an EXTR with m/k parts and selector
i/k applied to the (i % k)-th child of the original node.
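A minimal C sketch of the promotion rule follows, assuming that ILV_k places element j of child i at position j*k + i, that an EXTR with p parts and selector s keeps element j*p + s, and that m is a multiple of k. It compares the promoted ILV_m with the original ILV_k position by position; elem and ilv_promotion_ok are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative model: encode element j of source vector s as an integer.  */
static int elem (int s, int j)
{
  return s * 1000 + j;
}

/* Check that ILV_k (S_0, ..., S_k-1) equals ILV_m whose i-th child is
   EXTR_m/k,i/k (S_i%k), for sources of LEN elements each.  */
static bool ilv_promotion_ok (int k, int m, int len)
{
  assert (k > 0 && m > 0 && m % k == 0 && len % (m / k) == 0 && len < 1000);
  int mk = m / k;

  for (int p = 0; p < k * len; p++)
    {
      /* Original ILV_k: position p is element p / k of child p % k.  */
      int direct = elem (p % k, p / k);
      /* Promoted ILV_m: position p is element p / m of new child p % m.  */
      int i = p % m, t = p / m;
      /* New child i = EXTR_m/k,i/k (S_i%k): its element t is element
         t * (m/k) + i/k of original child i % k.  */
      int promoted = elem (i % k, t * mk + i / k);
      if (direct != promoted)
        return false;
    }
  return true;
}
```

With k = 1 the promoted node is exactly the ILV_m (EXTR_m,0 (S), ..., EXTR_m,m-1 (S)) pattern, which the unity reduction described next folds back to S.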

To enable loop-aware SLP, we first promote the arity of the input PRT to the
maximum vector size permissible on the architecture. This can increase vector code
size, though performance is unaffected. To eliminate redundant ILV and EXTR
operations, thereby undoing unnecessary unrolling, we can perform the unity
reduction optimization:
- EXTR_m,x (ILV_m (S0, S1, ..., Sm-1)) => Sx
- ILV_m (EXTR_m,0 (S), EXTR_m,1 (S), ..., EXTR_m,m-1 (S)) => S
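The two rewrite rules above can be sketched as a pattern match on a toy node type. The struct node layout, the limit of 8 children, and the use of pointer equality to decide that all EXTR operands are the same S are assumptions for illustration; this is not GCC's PRT representation, and the function only rewrites the root, whereas a real pass would presumably apply it throughout the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical PRT node for illustration; not GCC's actual representation.  */
enum kind { LEAF, ILV, EXTR };

struct node
{
  enum kind kind;
  int arity;              /* ILV: number of children; EXTR: number of parts.  */
  int sel;                /* EXTR: selector.  */
  struct node *kids[8];   /* ILV: the children; EXTR: operand in kids[0].  */
};

/* Apply one of the unity-reduction rules at the root of N if it matches,
   otherwise return N unchanged.  Operands count as the same S only if
   they are the same node (pointer equality), an assumption of this sketch.  */
static struct node *
unity_reduce (struct node *n)
{
  assert (n != NULL && n->arity <= 8);

  /* EXTR_m,x (ILV_m (S0, ..., Sm-1)) => Sx  */
  if (n->kind == EXTR && n->kids[0]->kind == ILV
      && n->kids[0]->arity == n->arity
      && n->sel >= 0 && n->sel < n->arity)
    return n->kids[0]->kids[n->sel];

  /* ILV_m (EXTR_m,0 (S), ..., EXTR_m,m-1 (S)) => S  */
  if (n->kind == ILV && n->arity > 0)
    {
      struct node *s = n->kids[0]->kind == EXTR ? n->kids[0]->kids[0] : NULL;
      for (int i = 0; i < n->arity; i++)
        {
          struct node *c = n->kids[i];
          if (c->kind != EXTR || c->arity != n->arity || c->sel != i
              || c->kids[0] != s)
            return n;
        }
      return s;
    }
  return n;
}
```

An EXTR whose part count differs from the arity of the ILV beneath it is left alone, since the first rule only holds when the two match.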

We then apply the arity promotion/reduction algorithm to the resulting tree to
obtain a tree of the desired arity. For now, only a target arity of 2 is supported,
as most architectures provide instructions for it. However, the code can be
extended to support additional arities.
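For a target arity of 2, one way to reach an all-binary tree is to apply the ILV reduction with m = 2 repeatedly, which works when k is a power of two. The C sketch below does this recursively and checks the result against the original ILV_k; it is an illustration under those assumptions, not the commit's implementation, and eval_binary_ilv, binary_reduction_ok, elem and MAXK are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXK 64  /* Hypothetical upper bound on arity for this sketch.  */

/* Illustrative model: encode element j of source vector s as an integer.  */
static int elem (int s, int j)
{
  return s * 1000 + j;
}

/* Value at position P of the tree obtained by reducing ILV_k (SRC[0], ...,
   SRC[k-1]) to arity 2 and recursively reducing each arity-k/2 child.
   K must be a power of two.  */
static int eval_binary_ilv (const int *src, int k, int p)
{
  if (k == 1)
    return elem (src[0], p);    /* ILV_1 is the source itself.  */

  int half[MAXK];
  int c = p % 2;                /* Child of the arity-2 parent.  */
  for (int i = c; i < k; i += 2)
    half[i / 2] = src[i];       /* Child i -> slot i/2 of child i%2.  */
  return eval_binary_ilv (half, k / 2, p / 2);
}

/* Compare the binary cascade with ILV_k at every position, for sources
   of LEN elements each.  */
static bool binary_reduction_ok (int k, int len)
{
  int src[MAXK];
  assert (k > 0 && k <= MAXK && (k & (k - 1)) == 0 && len < 1000);
  for (int i = 0; i < k; i++)
    src[i] = i;
  for (int p = 0; p < k * len; p++)
    if (eval_binary_ilv (src, k, p) != elem (p % k, p / k))
      return false;
  return true;
}
```

Each recursion level corresponds to one layer of arity-2 ILV nodes, so an arity-k node becomes log2(k) layers.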

We have also implemented the unity reduction optimization, which eliminates
redundant ILV and EXTR nodes, thereby undoing unnecessary unrolling that would
otherwise bloat the code size.

From-SVN: r246610

author	Sameera Deshpande <sameerad@gcc.gnu.org>
	Fri, 31 Mar 2017 09:09:57 +0000 (14:39 +0530)
committer	Sameera Deshpande <sameerad@gcc.gnu.org>
	Fri, 31 Mar 2017 09:09:57 +0000 (14:39 +0530)
commit	066fa0ff72450cd7b6489af28893ae7eb35a59ce
tree	571a410686dd9d3018b1bff95ed142beccbb7466
parent	08a23481b346fa7ad671a6b5ec8351ac04117fcb

gcc/Makefile.in
gcc/config/mips/mips.h
gcc/tree-vect-unified-common.c	[new file with mode: 0644]
gcc/tree-vect-unified-opts.c	[new file with mode: 0644]
gcc/tree-vect-unified.c
gcc/tree-vect-unified.h
gcc/tree-vectorizer.h
gcc/tree.c
gcc/tree.h