git.ipfire.org Git - thirdparty/gcc.git/commit
openacc: Move pass_oacc_device_lower after pass_graphite
author: Frederik Harwath <frederik@codesourcery.com>
Tue, 16 Nov 2021 15:07:34 +0000 (16:07 +0100)
committer: Kwok Cheung Yeung <kcy@codesourcery.com>
Tue, 21 Jun 2022 13:11:43 +0000 (14:11 +0100)
commit: 3e8b51d143e8818228cf813681ed1f5074484cbd
tree: 589db728dea38e45f483bd93150e1375f75ed199
parent: 39a8c371fda6136cf77c74895a00b136409e0ba3

The OpenACC device lowering pass must run after the Graphite pass so
that Graphite can be used for automatic parallelization of kernels
regions in the future.  Experimentation has shown that, for
performance, it is best to run pass_oacc_device_lower together with
the related passes pass_oacc_loop_designation and
pass_oacc_gimple_workers early after pass_graphite in pass_tree_loop,
at least as long as the other tree loop passes are not adjusted.  In
particular, to enable vectorization, which is crucial for GCN
offloading, device lowering should happen before pass_vectorize.  To
bring the loops contained in the offloading functions into the shape
expected by the loop vectorizer, some passes that were previously
executed only once before pass_tree_loop must also be executed on the
offloading functions.  To ensure that pass_oacc_device_lower still
executes when pass_tree_loop does not (no loops, no optimizations),
we add two further copies of the pass to the pipeline: one that runs
if there are no loops and one that runs if no optimization is
performed.
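
The placement described above can be illustrated with a small toy
model.  This is only a sketch: FunctionState, run_pipeline, count_of
and index_of are hypothetical names, the gate conditions are
simplified assumptions, and nothing here uses GCC's real pass manager.
It shows that the OpenACC passes run exactly once on each of the three
paths, and that on the loop path they sit between Graphite and the
vectorizer.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy model only: the names mirror the commit message, not GCC internals.
struct FunctionState {
  bool optimize;   // assumed stand-in for "optimization is performed"
  bool has_loops;  // whether the function contains any loops
};

// Returns the names of the toy passes that would run, in order.
std::vector<std::string> run_pipeline(const FunctionState &fn) {
  std::vector<std::string> trace;
  auto oacc_passes = [&trace]() {
    trace.push_back("oacc_loop_designation");
    trace.push_back("oacc_gimple_workers");
    trace.push_back("oacc_device_lower");
  };

  if (!fn.optimize) {
    // Assumed role of the new pass_no_loop_optimizations container.
    oacc_passes();
  } else if (fn.has_loops) {
    // pass_tree_loop: Graphite, then the OpenACC passes, then vectorizer.
    trace.push_back("graphite");
    oacc_passes();
    trace.push_back("vectorize");
  } else {
    // pass_tree_no_loop: copies of the OpenACC passes.
    oacc_passes();
  }
  return trace;
}

// Number of times `name` occurs in the trace.
int count_of(const std::vector<std::string> &trace, const std::string &name) {
  return static_cast<int>(std::count(trace.begin(), trace.end(), name));
}

// Position of the first occurrence of `name`, or -1 if it is absent.
int index_of(const std::vector<std::string> &trace, const std::string &name) {
  auto it = std::find(trace.begin(), trace.end(), name);
  return it == trace.end() ? -1 : static_cast<int>(it - trace.begin());
}
```

For example, run_pipeline({true, true}) yields a trace in which
"oacc_device_lower" appears after "graphite" and before "vectorize",
while the other two inputs yield only the three OpenACC passes.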

gcc/ChangeLog:

* omp-general.cc (oacc_get_fn_dim_size): Return 0 on
missing "dims".
* omp-offload.cc (pass_oacc_loop_designation::clone): New
member function.
(pass_oacc_gimple_workers::clone): Likewise.
(pass_oacc_device_lower::clone): Likewise.
* passes.cc (pass_data_no_loop_optimizations): New pass_data.
(class pass_no_loop_optimizations): New pass.
(make_pass_no_loop_optimizations): New function.
* passes.def: Move pass_oacc_{loop_designation,
gimple_workers, device_lower} into tree_loop, and add
copies to pass_tree_no_loop and to new
pass_no_loop_optimizations.  Add copies of passes pass_ccp,
pass_ipa_warn, pass_complete_unrolli, pass_backprop,
pass_phiprop, pass_fix_loops after the OpenACC passes
in pass_tree_loop.
* tree-ssa-loop-ivcanon.cc (pass_complete_unroll::clone):
New member function.
(pass_complete_unrolli::clone): Likewise.
* tree-ssa-loop.cc (pass_fix_loops::clone): Likewise.
(pass_tree_loop_init::clone): Likewise.
(pass_tree_loop_done::clone): Likewise.
* tree-ssa-phiprop.cc (pass_phiprop::clone): Likewise.
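
Many of the entries above add a clone member function.  The reason, as
far as the message suggests, is that a pass which appears more than
once in the pipeline needs a way to create further instances of
itself.  The following is a minimal sketch of that pattern under
stated assumptions: ToyPass, ToyPhiprop, ToyNoClone and schedule are
hypothetical names, and the base class is not GCC's opt_pass.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a pass base class; not GCC's opt_pass.
class ToyPass {
public:
  explicit ToyPass(std::string name) : name_(std::move(name)) {}
  virtual ~ToyPass() = default;
  // By default a pass cannot be duplicated (signalled here by nullptr).
  virtual std::unique_ptr<ToyPass> clone() const { return nullptr; }
  const std::string &name() const { return name_; }

private:
  std::string name_;
};

// A pass that supports a second instance by overriding clone().
class ToyPhiprop : public ToyPass {
public:
  ToyPhiprop() : ToyPass("phiprop") {}
  std::unique_ptr<ToyPass> clone() const override {
    return std::make_unique<ToyPhiprop>();
  }
};

// A pass without clone(): it can be scheduled only once.
class ToyNoClone : public ToyPass {
public:
  ToyNoClone() : ToyPass("noclone") {}
};

// Schedules `instances` copies of `first`.  Returns an empty pipeline if a
// further copy is needed but the pass does not support cloning.
std::vector<std::unique_ptr<ToyPass>> schedule(std::unique_ptr<ToyPass> first,
                                               int instances) {
  std::vector<std::unique_ptr<ToyPass>> pipeline;
  for (int i = 1; i < instances; ++i) {
    std::unique_ptr<ToyPass> copy = first->clone();
    if (!copy)
      return {};  // cloning unsupported: refuse to schedule
    pipeline.push_back(std::move(copy));
  }
  pipeline.push_back(std::move(first));
  return pipeline;
}
```

In this sketch, scheduling ToyPhiprop twice succeeds, whereas
scheduling ToyNoClone twice yields an empty pipeline; a single
instance works for both.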

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust
expected output to pass name changes due to the pass
reordering and cloning.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/loop-processing-1.c: Adjust expected output
to pass name changes due to the pass reordering and cloning.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/routine-nohost-1.c: Likewise.
* c-c++-common/unroll-1.c: Likewise.
* c-c++-common/unroll-4.c: Likewise.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-2.c: Likewise.
* gcc.dg/tree-ssa/backprop-3.c: Likewise.
* gcc.dg/tree-ssa/backprop-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-5.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.
* gcc.dg/tree-ssa/cunroll-1.c: Likewise.
* gcc.dg/tree-ssa/cunroll-3.c: Likewise.
* gcc.dg/tree-ssa/cunroll-9.c: Likewise.
* gcc.dg/tree-ssa/ldist-17.c: Likewise.
* gcc.dg/tree-ssa/loop-38.c: Likewise.
* gcc.dg/tree-ssa/pr21463.c: Likewise.
* gcc.dg/tree-ssa/pr45427.c: Likewise.
* gcc.dg/tree-ssa/pr61743-1.c: Likewise.
* gcc.dg/unroll-2.c: Likewise.
* gcc.dg/unroll-3.c: Likewise.
* gcc.dg/unroll-4.c: Likewise.
* gcc.dg/unroll-5.c: Likewise.
* gcc.dg/vect/vect-profile-1.c: Likewise.
* c-c++-common/goacc/device-lowering-debug-optimization.c: New test.
* c-c++-common/goacc/device-lowering-no-loops.c: New test.
* c-c++-common/goacc/device-lowering-no-optimization.c: New test.
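
Many of the adjustments above are attributed to pass name changes
caused by cloning.  One plausible mechanism, stated here as an
assumption rather than taken from the commit, is that a pass with a
single instance is dumped under its plain name, while a pass with
several instances gets a numeric suffix per instance.  The sketch
below models only that naming rule; dump_names is a hypothetical
helper and the pass names are used purely for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model: derive dump names from the list of scheduled pass names.
// A name that occurs once is kept as-is; a name that occurs several
// times gets a 1-based instance number appended (assumed scheme).
std::vector<std::string> dump_names(const std::vector<std::string> &scheduled) {
  std::map<std::string, int> total;
  for (const std::string &name : scheduled)
    ++total[name];

  std::map<std::string, int> seen;
  std::vector<std::string> out;
  for (const std::string &name : scheduled) {
    int instance = ++seen[name];
    out.push_back(total[name] > 1 ? name + std::to_string(instance) : name);
  }
  return out;
}
```

Under this model, adding a second instance of a pass renames the dump
of the first instance as well, which would explain why tests that scan
dump files by pass name need their expected names updated.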

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
52 files changed:
gcc/ChangeLog.omp
gcc/omp-general.cc
gcc/omp-oacc-neuter-broadcast.cc
gcc/omp-offload.cc
gcc/passes.cc
gcc/passes.def
gcc/testsuite/ChangeLog.omp
gcc/testsuite/c-c++-common/goacc/classify-kernels-unparallelized.c
gcc/testsuite/c-c++-common/goacc/classify-kernels.c
gcc/testsuite/c-c++-common/goacc/classify-parallel.c
gcc/testsuite/c-c++-common/goacc/classify-routine.c
gcc/testsuite/c-c++-common/goacc/device-lowering-debug-optimization.c [new file with mode: 0644]
gcc/testsuite/c-c++-common/goacc/device-lowering-no-loops.c [new file with mode: 0644]
gcc/testsuite/c-c++-common/goacc/device-lowering-no-optimization.c [new file with mode: 0644]
gcc/testsuite/c-c++-common/goacc/routine-nohost-1.c
gcc/testsuite/c-c++-common/unroll-1.c
gcc/testsuite/c-c++-common/unroll-4.c
gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
gcc/testsuite/gcc.dg/tree-ssa/backprop-1.c
gcc/testsuite/gcc.dg/tree-ssa/backprop-2.c
gcc/testsuite/gcc.dg/tree-ssa/backprop-3.c
gcc/testsuite/gcc.dg/tree-ssa/backprop-4.c
gcc/testsuite/gcc.dg/tree-ssa/backprop-5.c
gcc/testsuite/gcc.dg/tree-ssa/backprop-6.c
gcc/testsuite/gcc.dg/tree-ssa/cunroll-1.c
gcc/testsuite/gcc.dg/tree-ssa/cunroll-3.c
gcc/testsuite/gcc.dg/tree-ssa/cunroll-9.c
gcc/testsuite/gcc.dg/tree-ssa/ldist-17.c
gcc/testsuite/gcc.dg/tree-ssa/loop-38.c
gcc/testsuite/gcc.dg/tree-ssa/loopclosedphi.c
gcc/testsuite/gcc.dg/tree-ssa/pr21463.c
gcc/testsuite/gcc.dg/tree-ssa/pr45427.c
gcc/testsuite/gcc.dg/tree-ssa/pr61743-1.c
gcc/testsuite/gcc.dg/unroll-2.c
gcc/testsuite/gcc.dg/unroll-3.c
gcc/testsuite/gcc.dg/unroll-4.c
gcc/testsuite/gcc.dg/unroll-5.c
gcc/testsuite/gcc.dg/vect/bb-slp-59.c
gcc/testsuite/gcc.dg/vect/vect-profile-1.c
gcc/tree-pass.h
gcc/tree-ssa-loop-ivcanon.cc
gcc/tree-ssa-loop.cc
gcc/tree-ssa-phiprop.cc
libgomp/ChangeLog.omp
libgomp/testsuite/libgomp.oacc-c-c++-common/pr85486-2.c
libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c
libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c
libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c
libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c
libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c
libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c
libgomp/testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c