git.ipfire.org Git - thirdparty/gcc.git/log

openacc: Use Graphite for dependence analysis in "kernels" regions

This commit changes the handling of OpenACC "kernels" to use Graphite
for dependence analysis. To this end, it first introduces a new
internal representation for "kernels" regions which should be analyzed
by Graphite in pass_omp_oacc_kernels_decompose.  This is now the
default for all "kernels" regions, but the old handling is still
available through the command line parameter
"--param=openacc_kernels=decompose-parloops".  The handling of this
new region type in the omp lowering and omp offloading passes follows
the existing handling for "parallel" regions.  This replaces the
specialized handling for "kernels" regions that was previously used
and which was in limited in many ways.

Graphite is adjusted to be able to analyze the OpenACC functions that
get outlined from the "kernels" regions. It is enabled to handle the
internal function calls that contain information about OpenACC
constructs. In some places where function calls would be rejected by
Graphite, those calls need to be ignored. In other places, information
about the loop step, bounds etc. needs to be extracted from the
calls. The goal is to enable an analysis of the original loop
parameters although the omp lowering and expansion steps have already
modified the loop structure.  Some parallelization-enabling constructs
such as OpenACC "reduction" and "private"/"firstprivate" clauses must
be recognized and the data-dependences must be adjusted to reflect the
semantics of those constructs.  The data-dependence analysis step in
Graphite has so far been tied to the code generation step.  This
commit introduces a separate data-dependence analysis step that avoids
the code generation.  This is necessary because adjusting the code
generation to create a correct OpenACC loop structure would require
very considerable effort and the goal of this commit is to implement
the dependence analysis only. The ability to use Graphite for
dependence analysis without its code generation might be of
independent interest, but it is so far used for OpenACC purposes
only. In general, all changes to Graphite try to avoid affecting other
uses of Graphite as much as possible.

gcc/ChangeLog:

* Makefile.in: Add graphite-oacc.o
* cfgloop.cc (alloc_loop): Set can_be_parallel_valid_p to false.
* cfgloop.h: Add can_be_parallel_valid_p field.
* cfgloopmanip.cc (copy_loop_info): Add assert.
* config/nvptx/nvptx.cc (nvptx_goacc_reduction_setup):
* doc/invoke.texi: Adjust param openacc-kernels description.
* doc/passes.texi: Adjust pass_ipa_oacc_kernels description.
* flag-types.h (enum openacc_kernels):Add
OPENACC_KERNELS_DECOMPOSE_PARLOOPS.
* gimple-pretty-print.cc (dump_gimple_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
* gimple.h (enum gf_mask): Add
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE and
widen GF_OMP_TARGET_KIND_MASK.
(is_gimple_omp_oacc): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(is_gimple_omp_offloaded): Likewise.
* gimplify.cc (gimplify_omp_for): Enable reduction localization
for "kernels" regions.
(gimplify_omp_workshare): Likewise.
* graphite-dependences.cc (scop_get_reads_and_writes): Handle
"kills" and "reduction" PDRs.
(apply_schedule_on_deps): Add dump output for intermediate
steps of the dependence computation to enable understanding
of unexpected dependences.
(carries_deps): Likewise.
(scop_get_dependences): Handle "kill" operations and add dump
output.
* graphite-isl-ast-to-gimple.cc (visit_schedule_loop_node): New function.
(graphite_oacc_analyze_scop): New function.
* graphite-optimize-isl.cc (optimize_isl): Remove "static" and
add argument to identify OpenACC use; don't fail on unchanged
schedule in this case.
* graphite-poly.cc (new_poly_dr): Handle "kills".
(print_pdr): Likewise.
(new_gimple_poly_bb): Likewise.
(free_gimple_poly_bb): Likewise.
(new_scop): Handle "reduction", "private", and "firstprivate"
hash sets.
(free_scop): Likewise.
(print_isl_space): New function.
(debug_isl_space): New function.
* graphite-scop-detection.cc (scop_detection::can_represent_loop):
Don't fail if niter is 0 in OpenACC functions.
(scop_detection::add_scop): Don't reject regions with only one
loop in OpenACC functions.
(ignored_oacc_internal_call_p): New function.
(scan_tree_for_params): Handle VIEW_CONVERT_EXPR.
(stmt_has_side_effects): Ignore internal OpenACC function calls.
(add_write): Likewise.
(add_read): Likewise.
(add_kill): New function.
(add_kills): New function.
(add_oacc_kills): New function.
(try_generate_gimple_bb): Kill false dependences for OpenACC
"private"/"firstprivate" vars.
(gather_bbs::gather_bbs): Determin OpenACC
"private"/"firstprivate" vars in region.
(gather_bbs::before_dom_children): Add assert.
(determine_openacc_reductions): New function.
(build_scops): Determine OpenACC "reduction" vars in SCoP.
* graphite-sese-to-poly.cc (oacc_ifn_call_extract): New declaration.
(oacc_internal_call_p): New function.
(build_poly_dr): Ignore internal OpenACC function calls,
handle "reduction" refs.
(build_poly_sr): Likewise; handle "kill" operations.
* graphite.cc (graphite_transform_loops): Accept functions with
only a single loop.
(oacc_enable_graphite_p): New function.
(gate_graphite_transforms): Enable pass on OpenACC functions.
* graphite.h (enum poly_dr_type): Add PDR_KILL.
(struct poly_dr): Add "is_reduction" field.
(new_poly_dr): Add argument to declaration.
(pdr_kill_p): New function.
(print_isl_space): New declaration.
(debug_isl_space): New declaration.
(struct scop): Add fields "reductions_vars",
"oacc_firstprivate_vars", and "oacc_private_scalars".
(optimize_isl): New declaration.
(graphite_oacc_analyze_scop): New declaration.
* internal-fn.cc (expand_UNIQUE): Handle
IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE
* internal-fn.h: Add OACC_PRIVATE_SCALAR and OACC_FIRSTPRIVATE
* omp-expand.cc (struct omp_region): Adjust comment.
(expand_omp_taskloop_for_inner):
(expand_omp_for): Add asserts about expected "kernels" region types.
(mark_loops_in_oacc_kernels_region): Likewise.
(expand_omp_target): Likewise; handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(build_omp_regions_1): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
Likewise.
(omp_make_gimple_edges): Likewise.
* omp-general.cc (oacc_get_kernels_attrib): New function.
(oacc_get_fn_dim_size): Allow argument to be NULL.
* omp-general.h (oacc_get_kernels_attrib): New declaration.
* omp-low.cc (struct omp_context): Add fields
"oacc_firstprivate_vars" and "oacc_private_scalars".
(was_originally_oacc_kernels): New function.
(is_oacc_kernels):
(is_oacc_kernels_decomposed_graphite_part): New function.
(new_omp_context): Allocate "oacc_first_private_vars" and
"oacc_private_scalars" ...
(delete_omp_context): ... and free from here.
(oacc_record_firstprivate_var_clauses): New function.
(oacc_record_private_scalars): New function.
(scan_sharing_clauses): Call functions to record "private"
scalars and "firstprivate" variables.
(check_oacc_kernel_gwv): Add assert.
(ctx_in_oacc_kernels_region): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(scan_omp_for): Likewise.
(check_omp_nesting_restrictions): Likewise.
(lower_oacc_head_mark): Likewise.
(lower_omp_for): Likewise.
(lower_omp_target): Create "private" and "firstprivate" marker
call statements.
(lower_oacc_head_tail): Adjust "private" and "firstprivate"
marker calls.
(lower_oacc_reductions): Emit "private" and "firstprivate"
marker call statements.
(make_oacc_firstprivate_vars_marker): New function.
(make_oacc_private_scalars_marker): New function.
* omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn):
Assign GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE to
region using the new "kernels" handling.
(make_region_seq): Adjust default region type for new
"kernels" handling; no more exceptions, let Graphite handle everything.
(make_region_loop_nest): Likewise; add dump output and assert.
(adjust_nested_loop_clauses): Stop creating "auto" clauses if
loop has "independent", "gang" etc.
(transform_kernels_loop_clauses): Likewise.
* omp-offload.cc (oacc_extract_loop_call): New function.
(oacc_loop_get_cfg_loop): New function.
(can_be_parallel_str): New function.
(oacc_loop_can_be_parallel_p): New function.
(oacc_parallel_kernels_graphite_fun_p): New function.
(oacc_parallel_fun_p): New function.
(oacc_loop_transform_auto_into_independent): New function, ...
(oacc_loop_fixed_partitions): ... called from here to transfer
the result of Graphite's analysis to the loop.
(execute_oacc_loop_designation): Handle "oacc
functions with "parallel_kernels_graphite" attribute.
(execute_oacc_device_lower): Handle
IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
* omp-offload.h (oacc_extract_loop_call): Add declaration.
* params.opt: Add "param=openacc-kernels" value "decompose-parloops".
* sese.cc (scalar_evolution_in_region): "Redirect" SCEV
analysis to outer loop for IFN_GOACC_LOOP calls.
* sese.h: Add field "kill_scalar_refs".
* tree-chrec.cc (chrec_fold_plus_1): Handle VIEW_CONVERT_EXPR
like CASE_CONVERT.
* tree-data-ref.cc (dump_data_reference): Include
DR_BASE_ADDRESS and DR_OFFSET in dump output.
(get_references_in_stmt): Don't reject OpenACC internal function
calls.
(graphite_find_data_references_in_stmt): Remove unused variable.
* tree-parloops.cc (pass_parallelize_loops::execute): Disable
pass with the new kernels handling, enable if requested explicitly.
* tree-scalar-evolution.cc (set_scev_analyze_openacc_calls):
Set flag to enable the analysis of internal OpenACC function
calls (use for Graphite only).
(oacc_call_analyzable_p): New function.
(oacc_ifn_call_extract): New function.
(oacc_simplify): New function.
(add_to_evolution): Simplify OpenACC internal function calls
if applicable.
(follow_ssa_edge_binary): Likewise.
(follow_ssa_edge_expr): Likewise.
(follow_copies_to_constant): Likewise.
(analyze_initial_condition): Likewise.
(interpret_loop_phi): Likewise.
(interpret_gimple_call): New function.
(interpret_rhs_expr): Likewise.
(instantiate_scev_name): Likewise.
(analyze_scalar_evolution_1): Handle GIMPLE_CALL, handle default definitions.
(expression_expensive_p): Consider internal OpenACC calls to
be cheap.
* tree-scalar-evolution.h (set_scev_analyze_openacc_calls):
New declaration.
(oacc_call_analyzable_p): New declaration.
* tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Mark
lhs of internal OpenACC function calls necessary.
* tree-ssa-ifcombine.c (recognize_if_then_else):
* tree-ssa-loop-niter.cc (oacc_call_analyzable_p):
(oacc_ifn_call_extract): New declaration.
(interpret_gimple_call): New delcaration.
(expand_simple_operations): Handle internal OpenACC function calls.
* tree-ssa-loop.cc (gate_oacc_kernels): Disable for new
"kernels" handling.
* graphite-oacc.cc: New file.
* graphite-oacc.h: New file.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Adjust.
* testsuite/libgomp.oacc-fortran/kernels-independent.f90: Adjust.
* testsuite/libgomp.oacc-fortran/kernels-loop-1.f90: Adjust.
* testsuite/libgomp.oacc-fortran/pr94358-1.f90: Adjust.

gcc/testsuite/ChangeLog:

* c-c++-common/goacc/classify-kernels.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c: Adjust.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Adjust.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Adjust.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Removed.
* c-c++-common/goacc/kernels-reduction.c: Removed.
* gfortran.dg/goacc/loop-auto-transfer-2.f90: New test.
* gfortran.dg/goacc/loop-auto-transfer-3.f90: New test.
* gfortran.dg/goacc/loop-auto-transfer-4.f90: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>

graphite: Add runtime alias checking

Graphite rejects a SCoP if it contains a pair of data references for
which it cannot determine statically if they may alias. This happens
very often, for instance in C code which does not use explicit
"restrict".  This commit adds the possibility to analyze a SCoP
nevertheless and perform an alias check at runtime.  Then, if aliasing
is detected, the execution will fall back to the unoptimized SCoP.

TODO This needs more testing on non-OpenACC code.

gcc/ChangeLog:

        * common.opt: Add fgraphite-runtime-alias-checks.
        * graphite-isl-ast-to-gimple.cc
        (generate_alias_cond): New function.
        (graphite_regenerate_ast_isl): Use from here.
        * graphite-poly.cc (new_scop): Create unhandled_alias_ddrs vec ...
(free_scop): and release here.
        * graphite-scop-detection.cc (dr_defs_outside_region): New function.
        (dr_well_analyzed_for_runtime_alias_check_p): New function.
        (graphite_runtime_alias_check_p): New function.
        (build_alias_set): Record unhandled alias ddrs for later alias check
        creation if flag_graphite_runtime_alias_checks is true instead
        of failing.
        * graphite.h (struct scop): Add field unhandled_alias_ddrs.
        * sese.h (has_operands_from_region_p): New function.

gcc/testsuite/ChangeLog:

        * gcc.dg/graphite/alias-1.c: New test.

Move compute_alias_check_pairs to tree-data-ref.c

Move this function from tree-loop-distribution.c to tree-data-ref.c
and make it non-static to enable its use from other parts of GCC.

gcc/ChangeLog:
* tree-loop-distribution.cc (data_ref_segment_size): Remove function.
(latch_dominated_by_data_ref): Likewise.
(compute_alias_check_pairs): Likewise.

* tree-data-ref.cc (data_ref_segment_size): New function,
copied from tree-loop-distribution.c
(compute_alias_check_pairs): Likewise.
(latch_dominated_by_data_ref): Likewise.

* tree-data-ref.h (compute_alias_check_pairs): New declaration.

Fix branch prediction dump message

Instead of, for instance, "Loop got predicted 1 to iterate 10 times"
the message should be "Loop 1 got predicted to iterate 10 times".

gcc/ChangeLog:

* predict.cc (pass_profile::execute): Fix dump message.

openacc: Move pass_oacc_device_lower after pass_graphite

The OpenACC device lowering pass must run after the Graphite pass to
allow for the use of Graphite for automatic parallelization of kernels
regions in the future. Experimentation has shown that it is best,
performancewise, to run pass_oacc_device_lower together with the
related passes pass_oacc_loop_designation and pass_oacc_gimple_workers
early after pass_graphite in pass_tree_loop, at least if the other
tree loop passes are not adjusted. In particular, to enable
vectorization which is crucial for GCN offloading, device lowering
should happen before pass_vectorize. To bring the loops contained in
the offloading functions into the shape expected by the loop
vectorizer, we have to make sure that some passes that previously were
executed only once before pass_tree_loop are also executed on the
offloading functions. To ensure the execution of
pass_oacc_device_lower if pass_tree_loop does not execute (no loops,
no optimizations), we introduce two further copies of the pass to the
pipeline that run if there are no loops or if no optimization is
performed.

gcc/ChangeLog:

* omp-general.cc (oacc_get_fn_dim_size): Return 0 on
missing "dims".
* omp-offload.cc (pass_oacc_loop_designation::clone): New
member function.
(pass_oacc_gimple_workers::clone): Likewise.
(pass_oacc_gimple_device_lower::clone): Likewise.
* passes.cc (pass_data_no_loop_optimizations): New pass_data.
(class pass_no_loop_optimizations): New pass.
(make_pass_no_loop_optimizations): New function.
* passes.def: Move pass_oacc_{loop_designation,
gimple_workers, device_lower} into tree_loop, and add
copies to pass_tree_no_loop and to new
pass_no_loop_optimizations. Add copies of passes pass_ccp,
pass_ipa_warn, pass_complete_unrolli, pass_backprop,
pass_phiprop, pass_fix_loops after the OpenACC passes
in pass_tree_loop.
* tree-ssa-loop-ivcanon.cc (pass_complete_unroll::clone):
New member function.
(pass_complete_unrolli::clone): Likewise.
* tree-ssa-loop.cc (pass_fix_loops::clone): Likewise.
(pass_tree_loop_init::clone): Likewise.
(pass_tree_loop_done::clone): Likewise.
* tree-ssa-phiprop.cc (pass_phiprop::clone): Likewise.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust
expected output to pass name changes due to the pass
reordering and cloning.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/goacc/loop-processing-1.c: Adjust expected output
to pass name changes due to the pass reordering and cloning.
* c-c++-common/goacc/classify-kernels-unparallelized.c: Likewise.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/routine-nohost-1.c: Likewise.
* c-c++-common/unroll-1.c: Likewise.
* c-c++-common/unroll-4.c: Likewise.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-1.c: Likewise.
* gcc.dg/tree-ssa/backprop-2.c: Likewise.
* gcc.dg/tree-ssa/backprop-3.c: Likewise.
* gcc.dg/tree-ssa/backprop-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-5.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.
* gcc.dg/tree-ssa/cunroll-1.c: Likewise.
* gcc.dg/tree-ssa/cunroll-3.c: Likewise.
* gcc.dg/tree-ssa/cunroll-9.c: Likewise.
* gcc.dg/tree-ssa/ldist-17.c: Likewise.
* gcc.dg/tree-ssa/loop-38.c: Likewise.
* gcc.dg/tree-ssa/pr21463.c: Likewise.
* gcc.dg/tree-ssa/pr45427.c: Likewise.
* gcc.dg/tree-ssa/pr61743-1.c: Likewise.
* gcc.dg/unroll-2.c: Likewise.
* gcc.dg/unroll-3.c: Likewise.
* gcc.dg/unroll-4.c: Likewise.
* gcc.dg/unroll-5.c: Likewise.
* gcc.dg/vect/vect-profile-1.c: Likewise.
* c-c++-common/goacc/device-lowering-debug-optimization.c: New test.
* c-c++-common/goacc/device-lowering-no-loops.c: New test.
* c-c++-common/goacc/device-lowering-no-optimization.c: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>

Fortran: delinearize multi-dimensional array accesses

The Fortran front end presently linearizes accesses to
multi-dimensional arrays by combining the indices for the various
dimensions into a series of explicit multiplies and adds with
refactoring to allow CSE of invariant parts of the computation.
Unfortunately this representation interferes with Graphite-based loop
optimizations.  It is difficult to recover the original
multi-dimensional form of the access by the time loop optimizations
run because parts of it have already been optimized away or into a
form that is not easily recognizable, so it seems better to have the
Fortran front end produce delinearized accesses to begin with, a set
of nested ARRAY_REFs similar to the existing behavior of the C and C++
front ends.  This is a long-standing problem that has previously been
discussed e.g. in PR 14741 and PR61000.

This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.

        gcc/
        * expr.cc (get_inner_reference): Handle NOP_EXPR like
        VIEW_CONVERT_EXPR.

        gcc/fortran/
        * lang.opt (-param=delinearize=): New.
        * trans-array.cc (get_class_array_vptr): New, split from...
        (build_array_ref): ...here.
        (get_array_lbound, get_array_ubound): New, split from...
        (gfc_conv_array_ref): ...here.  Additional code refactoring
        plus support for delinearization of the array access.

        gcc/testsuite/
        * gfortran.dg/assumed_type_2.f90: Adjust patterns.
        * gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
        * gfortran.dg/graphite/block-3.f90: Remove xfails.
        * gfortran.dg/graphite/block-4.f90: Likewise.
        * gfortran.dg/inline_matmul_24.f90: Adjust patterns.
        * gfortran.dg/no_arg_check_2.f90: Likewise.
        * gfortran.dg/pr32921.f: Likewise.
        * gfortran.dg/reassoc_4.f: Disable delinearization for this test.

Co-Authored-By: Tobias Burnus <tobias@codesourcery.com>

Fix gimple_debug_cfg declaration

Silence a warning. The argument type did not match the definition.

gcc/ChangeLog:

* tree-cfg.h (gimple_debug_cfg): Change argument type from int
to dump_flags_t.

testsuite/libgomp.oacc-fortran/: Add -Wopenacc-parallelism

The following testcases expect the -Wopenacc-parallelism warning output
but did fail as not compiled with that warning; solution: add it.

libgomp/ChangeLog:

* testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90: Compile
with -Wopenacc-parallelism.
* testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90: Likewise.

openacc: fix ICE for non-decl expression in non-contiguous array base-pointer

Currently, we do not support cases like struct-members as the base-pointer
for an OpenACC non-contiguous array. Mark such cases as unsupported in the
C/C++ front-ends, instead of ICEing on them.

gcc/c/ChangeLog:

* c-typeck.cc (handle_omp_array_sections_1): Robustify non-contiguous
array check and reject non-DECL base-pointer cases as unsupported.

gcc/cp/ChangeLog:

* semantics.cc (handle_omp_array_sections_1): Robustify non-contiguous
array check and reject non-DECL base-pointer cases as unsupported.

libgomp amdgcn: Fix issues with dynamic OpenMP thread scaling

libgomp/ChangeLog:

* config/gcn/bar.h (gomp_barrier_init): Limit thread count to the
actual physical number.
* config/gcn/team.c (gomp_team_start): Don't attempt to set up
threads that do not exist.

Add -Wopenacc-parallelism to tests only in OG11

2021-04-30 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/testsuite/
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c: Add
-Wopenacc-parallelism option.

Update expected messages in data-clause-1 tests

The patch 'Merge non-contiguous array support patches' handles one of the
non-contiguous cases such that it is no longer an error.

2021-04-29 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/testsuite/
* c-c++-common/goacc/data-clause-1.c (foo): Remove expected message.
* g++.dg/goacc/data-clause-1.C (foo): Remove expected message.

Update expected messages in kernels-decompose-2 tests

This changes expected messages that differ between mainline and OG11. On
OG10, these messages were added in the patch:

081a01963ca8 Update expected messages, errors and warnings for "kernels" tests

2021-04-29 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-2.c (main): Update expected
messages.
* gfortran.dg/goacc/kernels-decompose-2.f95 (main): Update expected
messages.

Fix is_oacc_parallel_or_serial for kernel regions

2021-04-07 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/
* omp-low.cc (is_oacc_parallel_or_serial): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED and
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE.

Update expected messages in OpenACC tests

This updates the types of messages expected in the test, and the '-fopt-info'
option used to request them. The phrasing of the expected messages has also
changed somewhat and has been adjusted to match.

2021-04-07 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/testsuite/
* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c:
Update additional options and expected messages.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
Likewise.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c:
Likewise.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loop-auto.c: Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.

DWARF: late code range fixup

Ensure that the parent DWARF subprograms of offload kernel functions have a
code range, and are therefore not discarded by GDB. This is only necessary
when the parent function does not actually exist in the final binary, which is
commonly the case within the offload device's binary.

This patch replaces 808bdf1bb29 and fdcb23540a2. It should be squashed with
those before being posted upstream.

gcc/

* dwarf2out.cc (notional_parents_list): New file variable.
(gen_subprogram_die): Record offload kernel functions in
notional_parents_list.
(fixup_notional_parents): New function.
(dwarf2out_finish): Call fixup_notional_parents.
(dwarf2out_c_finalize): Reset notional_parents_list.

openmp: Scale type precision of collapsed iterator variable

This sets the type precision of the collapsed iterator variable to the
sum of the precision of the collapsed loop variables, up to a maximum of
sizeof(long long) (i.e. 64-bits).

2021-03-01 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/
* omp-expand.cc (expand_oacc_for): Convert .tile variable to
diff_type before multiplying.
* omp-general.cc (omp_extract_for_data): Use accumulated precision
of all collapsed for-loops as precision of iteration variable, up
to the precision of a long long.

libgomp/
* testsuite/libgomp.c-c++-common/collapse-4.c: New.
* testsuite/libgomp.fortran/collapse5.f90: New.

nvptx: remove erroneous stack deletion

The stacks are not supposed to be deleted every time memory is allocated, only
when there is insufficient memory. The unconditional call here seems to be in
error, and is causing a costly reallocation of the stacks before every launch.

libgomp/

* plugin/plugin-nvptx.c (GOMP_OFFLOAD_alloc): Remove early call to
nvptx_stacks_free.

DWARF: fix ICE caused by offload debug fix

This should be squashed with 808bdf1bb29 and fdcb23540a2 to go to mainline.

gcc/

* dwarf2out.cc (gen_subprogram_die): Replace existing low/high PC
attributes, rather than ICE.

OpenMP 5.0: Allow multiple clauses mapping same variable

This is a merge of:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562081.html

This patch now allows multiple clauses on the same construct to map
the same variable, which was not valid in OpenMP 4.5, but now allowed
in 5.0.

This may possibly reverted/updated when a final patch is approved
for mainline.

2021-02-01 Chung-Lin Tang <cltang@codesourcery.com>

gcc/cp/ChangeLog:

* semantics.cc (finish_omp_clauses): Adjust to allow duplicate
mapped variables for OpenMP.

gcc/ChangeLog:

* omp-low.cc (install_var_field): Add new 'tree key_expr = NULL_TREE'
default parameter. Set splay-tree lookup key to key_expr instead of
var if key_expr is non-NULL. Adjust call to install_parm_decl.
Update comments.
(scan_sharing_clauses): Use clause tree expression as splay-tree key
for map/to/from and OpenACC firstprivate cases when installing the
variable field into the send/receive record type.
(maybe_lookup_field_in_outer_ctx): Add code to search through
construct clauses instead of entirely based on splay-tree lookup.
(lower_oacc_reductions): Adjust to find map-clause of reduction
variable, then create receiver-ref.
(lower_omp_target): Adjust to lookup var field using clause expression.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/clauses-2.c: Adjust testcase.

Correct fix offload dwarf info

The previous patch wasn't quite right, apparently. Somehow the behaviour
changed after another clean build? This tweak fixes it.

This patch should be squashed with fdcb23540a2 to go to mainline.

gcc/ChangeLog:

* dwarf2out.cc (gen_subprogram_die): Check offload attributes only.

DWARF address space for variables

Add DWARF address class attributes for variables that exist outside the
generic address space. In particular, this is the case for gang-private
variables in OpenACC offload kernels.

gcc/ChangeLog:

* dwarf2out.cc (add_location_or_const_value_attribute): Set
DW_AT_address_class, if appropriate.

Fix offload dwarf info

Add a notional code range to the notional parent function of offload kernel
functions. This is enough to prevent GDB discarding them.

gcc/ChangeLog:

* dwarf2out.cc (gen_subprogram_die): Add high/low_pc attributes for
parents of offload kernels.

[og10] vect: Add target hook to prefer gather/scatter instructions

For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity whereby later
analyses in the vectorizer try to use a very wide array type which is
not available on this target, and thus it bails out.

The attached patch adds a target hook to override the "single_element_p"
heuristic in the function as a target hook, and activates it for GCN. This
allows much better code to be generated for affected loops.

2021-01-13 Julian Brown <julian@codesourcery.com>

gcc/
* doc/tm.texi.in (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Add
documentation hook.
* doc/tm.texi: Regenerate.
* target.def (prefer_gather_scatter): Add target hook under vectorizer.
* tree-vect-stmts.cc (get_group_load_store_type): Optionally prefer
gather/scatter instructions to scalar/elementwise fallback.
* config/gcn/gcn.cc (TARGET_VECTORIZE_PREFER_GATHER_SCATTER): Define
hook.

[og10] openacc: Adjust loop lowering for AMD GCN

This patch adjusts OpenACC loop lowering in the AMD GCN target compiler
in such a way that the autovectorizer can vectorize the "vector" dimension
of those loops in more cases.

Rather than generating "SIMT" code that executes a scalar instruction
stream for each lane of a vector in lockstep, for GCN we model the GPU
like a typical CPU, with separate instructions to operate on scalar and
vector data. That means that unlike other offload targets, we rely on
the autovectorizer to handle the innermost OpenACC parallelism level,
which is "vector".

Because of this, the OpenACC builtin functions to return the current
vector lane and the vector width return 0 and 1 respectively, despite
the native vector width being 64 elements wide.

This allows generated code to work with our chosen compilation model,
but the way loops are lowered in omp-offload.c:oacc_xform_loop does not
understand the discrepancy between logical (OpenACC) and physical vector
sizes correctly. That means that if a loop is partitioned over e.g. the
worker AND vector dimensions, we actually lower with unit vector size --
meaning that if we then autovectorize, we end up trying to vectorize
over the "worker" dimension rather than the vector one! Then, because
the number of workers is not fixed at compile time, that means the
autovectorizer has a hard time analysing the loop and thus vectorization
often fails entirely.

We can fix this by deducing the true vector width in oacc_xform_loop,
and using that when we are on a "non-SIMT" offload target. We can then
rearrange how loops are lowered in that function so that the loop form
fed to the autovectorizer is more amenable to vectorization -- namely,
the innermost step is set to process each loop iteration sequentially.

For some benchmarks, allowing vectorization to succeed leads to quite
impressive performance improvements -- I've observed between 2.5x and
40x on one machine/GPU combination.

The low-level builtins available to user code (__builtin_goacc_parlevel_id
and __builtin_goacc_parlevel_size) continue to return 0/1 respectively
for the vector dimension for AMD GCN, even if their containing loop is
vectorized -- that's a quirk that we might possibly want to address at
some later date.

Only non-"chunking" loops are handled at present. "Chunking" loops are
still lowered as before.

2021-01-13 Julian Brown <julian@codesourcery.com>

gcc/
* omp-offload.cc (oacc_thread_numbers): Add VF_BY_VECTORIZER parameter.
Add overloaded wrapper for previous arguments & behaviour.
(oacc_xform_loop): Lower vector loops to iterate a multiple of
omp_max_vf times over contiguous steps on non-SIMT targets.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-1.c: Adjust for loop
lowering changes.
* testsuite/libgomp.oacc-c-c++-common/loop-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/loop-red-wv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-gwv-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/routine-wv-1.c: Likewise.

Relax some restrictions on the loop bound in kernels loop annotation.

OpenACC loop semantics require that the loop bound be computable
before entering the loop, rather than the C/C++ semantics where the
end test is evaluated on every iteration.  Formerly the kernels loop
annotater permitted only constants and variables not modified in the
loop body in the loop bound expression.  This patch relaxes those
restrictions somewhat to allow many forms of expressions involving
such constants and variables, including calls to constant functions.

2020-08-30  Sandra Loosemore  <sandra@codesourcery.com>

gcc/c-family/
* c-omp.cc (end_test_ok_for_annotation_r): New.
(end_test_ok_for_annotation): New.
(check_and_annotate_for_loop): Use the new helper function.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-21.c: New.
* c-c++-common/goacc/kernels-loop-annotation-22.c: New.

Clean up loop variable extraction in OpenACC kernels loop annotation.

The code for identifying annotatable loops in OpenACC kernels regions
previously looked for the loop variable as the left-hand side of the
comparison in the loop end test.  However, front end optimizations
sometimes switch the sense of the comparison, making this method
unreliable.  In particular, it's ambiguous when both operands to the
end test comparison are local variables.

This patch reorders the loop processing to identify the loop variable
from the initializer, rather than the end test. The processing of the
end test then just checks that one of the operands to the comparison
matches the variable appearing in the initializer.  Much of the patch
is code refactoring, moving the initializer analysis out of
annotate_for_loop to check_and_annotate_for_loop so it can be
performed earlier.

2020-08-30  Sandra Loosemore  <sandra@codesourcery.com>

gcc/c-family/
* c-omp.cc (annotate_for_loop): Move initializer processing...
(check_and_annotate_for_loop): ... to here.  Allow the loop
variable as either operand to the condition.

Fix patterns in Fortran tests for kernels loop annotation.

Several of the Fortran tests for kernels loop annotation were failing
due to changes in the formatting of "acc loop" constructs in the dump
file. Now the "auto" clause appears first, instead of after "private".

2020-08-23 Sandra Loosemore <sandra@codesourcery.com>

gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: Update
expected output.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: Likewise.

Permit calls to builtins and intrinsics in kernels loops.

This tweak to the OpenACC kernels loop annotation relaxes the
restrictions on function calls in the loop body.  Normally calls to
functions not explicitly marked with a parallelism attribute are not
permitted, but C/C++ builtins and Fortran intrinsics have known
semantics so we can generally permit those without restriction.  If
any turn out to be problematical, we can add on here to recognize
them, or in the processing of the "auto" annotations.

2020-08-22  Sandra Loosemore  <sandra@codesourcery.com>

gcc/c-family/
* c-omp.cc (annotate_loops_in_kernels_regions): Test for
calls to builtins.

gcc/fortran/
* openmp.cc (check_expr_for_invalid_calls): Check for intrinsic
functions.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-20.c: New.
* gfortran.dg/goacc/kernels-loop-annotation-20.f95: New.

Update dg-* in gfortran.dg/gomp/pr67500.f90

Contrary to GCC 11, OG10 uses an error instead of a warning,
cf. commit 271c7fef548a86676d304b1eb2be5c0d47280bd6.

gcc/testsuite/
* gfortran.dg/gomp/pr67500.f90: Change dg-warning to
dg-error.

Annotate inner loops in "acc kernels loop" directives (Fortran).

Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior in Fortran.

2020-08-19  Sandra Loosemore  <sandra@codesourcery.com>

gcc/fortran/
* openmp.cc (annotate_do_loops_in_kernels): Handle
EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner
loops in a combined "acc kernels loop" directive.

gcc/testsuite/
* gfortran.dg/goacc/kernels-loop-annotation-18.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-19.f95: New.
* gfortran.dg/goacc/combined-directives.f90: Adjust expected
patterns.
* gfortran.dg/goacc/private-explicit-kernels-1.f95: Likewise.
* gfortran.dg/goacc/private-predetermined-kernels-1.f95:
Likewise.

Annotate inner loops in "acc kernels loop" directives (C/C++).

Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code.  However there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops.  This patch implements this behavior for C and C++.

2020-08-19  Sandra Loosemore  <sandra@codesourcery.com>

gcc/c-family/
* c-omp.cc (annotate_loops_in_kernels_regions): Process inner
loops in combined "acc kernels loop" directives.

gcc/testsuite/
* c-c++-common/goacc/kernels-loop-annotation-18.c: New.
* c-c++-common/goacc/kernels-loop-annotation-19.c: New.
* c-c++-common/goacc/combined-directives.c: Adjust expected
patterns.

Add a "combined" flag for "acc kernels loop" etc directives.

2020-08-19 Sandra Loosemore <sandra@codesourcery.com>

gcc/
* tree.h (OACC_LOOP_COMBINED): New.

gcc/c/
* c-parser.cc (c_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/cp/
* parser.cc (cp_parser_oacc_loop): Set OACC_LOOP_COMBINED.

gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_do): Add combined parameter,
use it to set OACC_LOOP_COMBINED. Update all call sites.

Fix gfortran.dg/goacc/pr70828.f90 testcase

Array mapping was changed by the patch '[OpenMP, Fortran] Add
structure/derived-type element mapping'.

2020-08-19 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/testsuite/
* gfortran.dg/goacc/pr70828.f90: Update expected output in Gimple
dump.

XFAIL tests in gfortran.dg/goacc/loop-2-kernels.f95

The C-equivalent version of the test (c-c++-common/goacc/loop-2-kernels.c)
has these tests XFAILed in the commit 'Make new OpenACC kernels conversion
the default; adjust and add tests' (commit
757f56ddc43fd80bb8740222ec352111b26d66e9), so the Fortran version should
be XFAILed too.

2020-07-24 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/testsuite/
* gfortran.dg/goacc/loop-2-kernels.f95: Add XFAILs.

Fix failures in c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c testcase

This should have been part of 'Update expected messages, errors and warnings
for "kernels" tests' (commit 081a01963ca8db7ddaaf5871d281321454fd3246).

2020-07-24 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/testsuite/
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c
(main): Remove obsolete expected messages.

libgomp.oacc-fortran/firstprivate-int.f90 fix for nonexisting kind-16 int

libgomp/
* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: Use
highest available integer kind instead of assuming that kind=16 exists.

Fix goacc/noncontig_array-1.c testcase

2020-06-02 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/testsuite/
* c-c++-common/goacc/noncontig_array-1.c: Dump Gimple pass.

OpenACC: fix privatization of by-reference arrays

Replacing of a by-reference variable in a private clause by a local variable
makes sense; however, for arrays, the size is not directly known by the type.
This causes an ICE via create_tmp_var which indirectly invokes
force_constant_size in this case - but the latter only handled Ada.

gcc/ChangeLog:

* gimplify.cc (localize_reductions): Do not create local
variable for privatized arrays.

Fix bug in processing of array dimensions in data clauses.

The g++ front end wraps the array length and low_bound values in
NON_LVALUE_EXPR, causing the subsequent tests for INTEGER_CST to fail.
The test case c-c++-common/goacc/kernels-loop-annotation-1.c was
tickling this bug and giving bogus errors in g++ because it was falling
through to dynamic array code instead of recognizing the constant bounds.

This patch was posted upstream here
https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542694.html
but not yet committed. It may be that some other fix for this problem
is implemented on mainline instead; check before merging this patch.

2020-03-31 Sandra Loosemore <sandra@codesourcery.com>

gcc/cp/
* semantics.cc (handle_omp_array_sections_1): Call STRIP_NOPS
on length and low_bound;
(handle_omp_array_sections): Likewise.

Additional Fortran testsuite fixes for kernels loops annotation pass.

These testsuite fixes are specific to the og10 branch, so are being
segregated from the ones that apply to mainline in a separate commit
from the main Fortran kernels loop annotation patch.

2020-03-27 Sandra Loosemore <sandra@codesourcery.com>

gcc/testsuite/
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Adjust
line numbering.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-2.f95: Add
-fno-openacc-kernels-annotate-loops.

Kernels loops annotation: Fortran.

This patch implements the Fortran support for adding "#pragma acc loop auto"
annotations to loops in OpenACC kernels regions.  It implements the same
-fopenacc-kernels-annotate-loops and -Wopenacc-kernels-annotate-loops options
that were previously added (and documented) for the C/C++ front ends.

2020-03-27  Sandra Loosemore  <sandra@codesourcery.com>
    Gergö Barany <gergo@codesourcery.com>

gcc/fortran/
* gfortran.h (gfc_oacc_annotate_loops_in_kernels_regions): Declare.
* lang.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.
* openmp.cc: Include options.h.
(enum annotation_state, enum annotation_result): New.
(check_code_for_invalid_calls): New.
(check_expr_for_invalid_calls): New.
(check_for_invalid_calls): New.
(annotate_do_loop): New.
(annotate_do_loops_in_kernels): New.
(compute_goto_targets): New.
(gfc_oacc_annotate_loops_in_kernels_regions): New.
* parse.cc (gfc_parse_file): Handle -fopenacc-kernels-annotate-loops.

gcc/testsuite/
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add
-fno-openacc-kernels-annotate-loops option.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/common-block-3.f90: Likewise.
* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
* gfortran.dg/goacc/kernels-loop.f95: Likewise.
* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95:
Likewise.
* gfortran.dg/goacc/kernels-loop-annotation-1.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-2.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-3.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-4.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-5.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-6.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-7.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-8.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-9.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-10.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-11.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-12.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-13.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-14.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-15.f95: New.
* gfortran.dg/goacc/kernels-loop-annotation-16.f95: New.

Add -fno-openacc-kernels-annotate-loops option to more testcases.

The testcases being tweaked here are present on the og10 branch but not on
FSF mainline.

2020-03-27 Sandra Loosemore <sandra@codesourcery.com>

gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-2.c: Add
-fno-openacc-kernels-annotate-loops.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.

Kernels loops annotation: C and C++.

This patch detects loops in kernels regions that are candidates for
parallelization, and adds "#pragma acc loop auto" annotations to them.
This annotation is controlled by the -fopenacc-kernels-annotate-loops
option, which is enabled by default.  -Wopenacc-kernels-annotate-loops
can be used to produce diagnostics about loops that cannot be annotated.

2020-03-27  Sandra Loosemore  <sandra@codesourcery.com>

Kernels loops annotation: C and C++.

gcc/c-family/
* c-common.h (c_oacc_annotate_loops_in_kernels_regions): Declare.
* c-omp.cc: Include tree-iterator.h
(enum annotation_state): New.
(struct annotation_info): New.
(do_not_annotate_loop): New.
(do_not_annotate_loop_nest): New.
(annotation_error): New.
(c_finish_omp_for_internal): Split from c_finish_omp_for.  Use
annotation_error function.  Code refactoring to avoid destructive
changes that cannot be undone in case of error.
(is_local_var): New.
(lang_specific_unwrap_initializer): New.
(annotate_for_loop): New.
(check_and_annotate_for_loop): New.
(annotate_loops_in_kernels_regions): New.
(c_oacc_annotate_loops_in_kernels_regions): New.
* c.opt (Wopenacc-kernels-annotate-loops): New.
(fopenacc-kernels-annotate-loops): New.

gcc/c/
* c-decl.cc (c_unwrap_for_init): New.
(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.

gcc/cp/
* decl.cc (cp_unwrap_for_init): New.
(finish_function): Call c_oacc_annotate_loops_in_kernels_regions.

gcc/
* doc/invoke.texi (Option Summary): Add entries for
-Wopenacc-kernels-annotate-loops and
-fno-openacc-kernels-annotate-loops.
(Warning Options): Document -Wopenacc-kernels-annotate-loops.
(Optimization Options): Document -fno-openacc-kernels-annotate-loops.

gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Add
-fno-openacc-kernels-annotate-loops option.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/kernels-counter-var-redundant-load.c: Likewise.
* c-c++-common/goacc/kernels-counter-vars-function-scope.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
* c-c++-common/goacc/kernels-loop-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-3.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
* c-c++-common/goacc/kernels-loop-data.c: Likewise.
* c-c++-common/goacc/kernels-loop-g.c: Likewise.
* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
* c-c++-common/goacc/kernels-loop-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
* c-c++-common/goacc/kernels-loop.c: Likewise.
* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c:
Likewise.
* c-c++-common/goacc/kernels-reduction.c: Likewise.
* c-c++-common/goacc/kernels-loop-annotation-1.c: New.
* c-c++-common/goacc/kernels-loop-annotation-2.c: New.
* c-c++-common/goacc/kernels-loop-annotation-3.c: New.
* c-c++-common/goacc/kernels-loop-annotation-4.c: New.
* c-c++-common/goacc/kernels-loop-annotation-5.c: New.
* c-c++-common/goacc/kernels-loop-annotation-6.c: New.
* c-c++-common/goacc/kernels-loop-annotation-7.c: New.
* c-c++-common/goacc/kernels-loop-annotation-8.c: New.
* c-c++-common/goacc/kernels-loop-annotation-9.c: New.
* c-c++-common/goacc/kernels-loop-annotation-10.c: New.
* c-c++-common/goacc/kernels-loop-annotation-11.c: New.
* c-c++-common/goacc/kernels-loop-annotation-12.c: New.
* c-c++-common/goacc/kernels-loop-annotation-13.c: New.
* c-c++-common/goacc/kernels-loop-annotation-14.c: New.
* c-c++-common/goacc/kernels-loop-annotation-15.c: New.
* c-c++-common/goacc/kernels-loop-annotation-16.c: New.
* c-c++-common/goacc/kernels-loop-annotation-17.c: New.

Add XFAIL for libgomp.oacc-c-c++-common/data-firstprivate-1.c

The firstprivate_int optimization changes the semantics of firstprivate
in this test, so XFAIL it until the correct semantics for firstprivate
have been decided (PR92036).

2020-03-24 Kwok Cheung Yeung <kcy@codesourcery.com>

libgomp/
* testsuite/libgomp.oacc-c-c++-common/data-firstprivate-1.c: XFAIL
execution test.

Fix vector handling for firstprivate of <= pointer size

Test case is the existing libgomp.oacc-c++/firstprivate-mappings-1.C.

* omp-low.c (convert_from_firstprivate_int):
Use VIEW_CONVERT also for vectors.

Fix OpenACC "ephemeral" asynchronous host-to-device copies

libgomp/
* testsuite/libgomp.oacc-c-c++-common/deep-copy-10.c (main): Fix
async-safety issue. Increase number of iterations.

Fortran "declare create"/allocate support for OpenACC

2018-10-04  Cesar Philippidis  <cesar@codesourcery.com>
            Julian Brown  <julian@codesourcery.com>

gcc/
* omp-low.cc (scan_sharing_clauses): Update handling of OpenACC declare
create, declare copyin and declare deviceptr to have local lifetimes.
(convert_to_firstprivate_int): Handle pointer types.
(convert_from_firstprivate_int): Likewise.  Create local storage for
the values being pointed to.  Add new orig_type argument.
(lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
Add orig_type argument to convert_from_firstprivate_int call.
Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT.  Don't privatize
firstprivate VLAs.
* tree-pretty-print.cc (dump_omp_clause): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.

gcc/fortran/
* gfortran.h (enum gfc_omp_map_op): Add OMP_MAP_DECLARE_ALLOCATE,
OMP_MAP_DECLARE_DEALLOCATE.
(gfc_omp_clauses): Add update_allocatable.
* trans-array.cc (gfc_array_allocate): Call
gfc_trans_oacc_declare_allocate for decls that have oacc_declare_create
attribute set.
* trans-decl.cc (find_module_oacc_declare_clauses): Relax
oacc_declare_create to OMP_MAP_ALLOC, and oacc_declare_copyin to
OMP_MAP_TO, in order to match OpenACC 2.5 semantics.
* trans-openmp.cc (gfc_trans_omp_clauses): Use GOMP_MAP_ALWAYS_POINTER
(for update directive) or GOMP_MAP_FIRSTPRIVATE_POINTER (otherwise) for
allocatable scalar decls.  Handle OMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}
clauses.
(gfc_trans_oacc_executable_directive): Use GOMP_MAP_ALWAYS_POINTER
for allocatable scalar data clauses inside acc update directives.
(gfc_trans_oacc_declare_allocate): New function.
* trans-stmt.cc (gfc_trans_allocate): Call
gfc_trans_oacc_declare_allocate for decls with oacc_declare_create
attribute set.
(gfc_trans_deallocate): Likewise.
* trans.h (gfc_trans_oacc_declare_allocate): Declare.

gcc/testsuite/
* gfortran.dg/goacc/declare-allocatable-1.f90: New test.

include/
* gomp-constants.h (enum gomp_map_kind): Define
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE} and GOMP_MAP_FLAG_SPECIAL_4.

libgomp/
* oacc-mem.c (gomp_acc_declare_allocate): New function.
* oacc-parallel.c (GOACC_enter_exit_data): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
* testsuite/libgomp.oacc-fortran/allocatable-scalar.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-2.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-4.f90: New test.

2020-02-19  Julian Brown  <julian@codesourcery.com>

gcc/fortran/
* trans-openmp.cc (gfc_omp_check_optional_argument): Handle non-decl
case.

gcc/
* gimplify.cc (gimplify_scan_omp_clauses): Handle
GOMP_MAP_DECLARE_ALLOCATE and GOMP_MAP_DECLARE_DEALLOCATE.

libgomp/
* libgomp.h (gomp_acc_declare_allocate): Remove prototype.
* oacc-mem.c (gomp_acc_declare_allocate): Make static.  Add POINTER
argument. Use acc_delete instead of acc_free.  Handle scalar
mappings.
(find_group_last): Handle GOMP_MAP_DECLARE_ALLOCATE and
GOMP_MAP_DECLARE_DEALLOCATE groupings.
(goacc_enter_data_internal): Fix kind check for
GOMP_MAP_DECLARE_ALLOCATE. Pass new pointer argument to
gomp_acc_declare_allocate.
(goacc_exit_data_internal): Unlock device mutex around
gomp_acc_declare_allocate call. Pass new pointer argument. Handle
group pointer mapping for deallocate.

2021-04-07  Kwok Cheung Yeung  <kcy@codesourcery.com>

libgomp/
* oacc-mem.c (goacc_enter_data_internal): Unlock mutex before calling
gomp_acc_declare_allocate and relock it afterwards.

Handle references in OpenACC "private" clauses

gcc/
* gimplify.cc (localize_reductions): Rewrite references for
OMP_CLAUSE_PRIVATE also.

Silence compiler warnings

gcc/
2019-09-17 Tobias Burnus <tobias@codesourcery.com>

* gimplify.cc (gomp_oacc_needs_data_present): Remove unused variable.

Update expected messages, errors and warnings for "kernels" tests

gcc/testsuite/
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c: Update
expected message/warning/error output.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
Likewise.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loop-auto.c: Likewise.
* c-c++-common/goacc/routine-1.c: Likewise.

Use more appropriate var in localize_reductions call

gcc/
* gimplify.cc (gimplify_omp_for): Use for_stmt in call to
localize_reductions.

Fix tree check failure with reduction localization

gcc/
* gimplify.cc (gimplify_omp_workshare): Use OMP_CLAUSES, OMP_BODY
instead of OMP_TARGET_CLAUSES, OMP_TARGET_BODY.

Reference reduction localization

gcc/
* gimplify.cc (privatize_reduction): New struct.
(localize_reductions_r, localize_reductions): New functions.
(gimplify_omp_for): Call localize_reductions.
(gimplify_omp_workshare): Likewise.
* omp-low.cc (lower_oacc_reductions): Handle localized reductions.
Create fewer temp vars.
* tree-core.h (omp_clause_code): Add OMP_CLAUSE_REDUCTION_PRIVATE_DECL
documentation.
* tree.cc (omp_clause_num_ops): Bump number of ops for
OMP_CLAUSE_REDUCTION to 6.
(walk_tree_1): Adjust accordingly.
* tree.h (OMP_CLAUSE_REDUCTION_PRIVATE_DECL): Add macro.

Fix libgomp.oacc-fortran/lib-13.f90 async bug

libgomp/
* testsuite/libgomp.oacc-fortran/lib-13.f90: End data region after
wait API calls.

Add missing exec_params libgomp plugin entry points

libgomp/
* plugin/plugin-gcn.c (GOMP_OFFLOAD_openacc_exec_params,
GOMP_OFFLOAD_openacc_async_exec_params): New functions.

NVPTX GOMP_OFFLOAD_openacc_async_construct arg fix and gomp_print_* support

libgomp/
* config/nvptx/gomp_print.c (gomp_print_string, gomp_print_integer,
gomp_print_double): New.

Add changes to profiling interface from OG8 branch

This bundles up the parts of the profiling code from the OG8 branch that were
not included in the upstream patch.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Update.

libgomp/
* oacc-init.c (get_property_any): Add profiling code.

libgomp/
* Makefile.am (libgomp_la_SOURCES): Add
oacc-profiling-acc_register_library.c.
* Makefile.in: Regenerate.
* libgomp.texi: Remove paragraph about acc_register_library.
* oacc-parallel.c (GOACC_parallel_keyed_internal): Set device_api for
profiling.
* oacc-profiling-acc_register_library.c: New file.
* oacc-profiling.c (goacc_profiling_initialize): Call
acc_register_library. Avoid duplicate registration.
(acc_register_library): Remove.
* config/nvptx/oacc-profiling-acc_register_library.c:
New empty file.
* config/nvptx/oacc-profiling.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-dispatch-1.c: Remove
call to acc_register_library.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-init-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-kernels-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-parallel-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-valid_bytes-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/acc_prof-version-1.c: Likewise.

Assumed-size arrays with non-lexical data mappings

gcc/
* gimplify.cc (gimplify_adjust_omp_clauses_1): Raise error for
assumed-size arrays in map clauses for Fortran/OpenMP.
* omp-low.cc (lower_omp_target): Set the size of assumed-size Fortran
arrays to one to allow use of data already mapped on the offload device.

gcc/fortran/
* trans-openmp.cc (gfc_omp_finish_clause): Change clauses mapping
assumed-size arrays to use the GOMP_MAP_FORCE_PRESENT map type.

Fix expected messages in goacc tests

The expected messages in the OpenACC kernel-related tests should be prefixed
with 'optimized:' rather than 'note:'.

2019-05-31 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-1.c: Change 'note:' to
'optimized:'. Fix typo.
* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c:
Change 'note:' to 'optimized:'.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
Likewise.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c:
Likewise.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loop-auto.c: Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.

Fix lexically-nested data mappings for no_alloc or optional arguments

gcc/
* gimplify.cc (gimplify_adjust_omp_clauses_1): Support implied no_alloc
and optional arguments based on mappings in enclosing data regions.

Fix references declared in lexically-enclosing OpenACC data region

gcc/fortran/
* trans-openmp.cc (gfc_omp_finish_clause): Guard addition of clauses for
pointers with DECL_P.

gcc/
* gimplify.cc (oacc_array_mapping_info): Add REF field.
(gimplify_scan_omp_clauses): Initialise above field for data blocks
passed by reference.
(gomp_oacc_needs_data_present): Handle references.
(gimplify_adjust_omp_clauses_1): Handle references and optional
arguments for variables declared in lexically-enclosing OpenACC data
region.

Add kernels for-index reuse testcase.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-for-index-reuse-1.c: New
test.

Update OpenACC version to 2.6

libgomp/
* libgomp.texi: Update mentions of OpenACC version to 2.6. Update
section numbers to match version 2.6 of the spec.

Make new OpenACC kernels conversion the default; adjust and add tests

gcc/
* params.opt (openacc_kernels): Default to decompose.
gcc/testsuite/
* c-c++-common/goacc/classify-kernels-unparallelized.c: Update.
* c-c++-common/goacc/classify-kernels.c: Likewise.
* c-c++-common/goacc/classify-parallel.c: Likewise.
* c-c++-common/goacc/classify-routine.c: Likewise.
* c-c++-common/goacc/loop-2-kernels.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-conditional-loop-independent_seq.c:
New file.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto.c:
Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism-1-kernels-straight-line.c:
Likewise.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-auto.c:
Likewise.
* c-c++-common/goacc/note-parallelism-combined-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-conditional-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loop-auto.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loop-independent_seq.c:
Likewise.
* c-c++-common/goacc/note-parallelism-kernels-loops.c: Likewise.
* c-c++-common/goacc/note-parallelism.c: Likewise.
* c-c++-common/goacc/routine-1.c: Likewise.
* c-c++-common/goacc/uninit-dim-clause.c: Likewise.

Use "--param=openacc-kernels=parloops" to document "parloops" test cases

gcc/testsuite/
* c-c++-common/goacc/kernels-1.c: Add
"--param=openacc-kernels=parloops".
* c-c++-common/goacc/kernels-acc-loop-reduction.c: Likewise.
* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: Likewise.
* c-c++-common/goacc/kernels-alias-2.c: Likewise.
* c-c++-common/goacc/kernels-alias-3.c: Likewise.
* c-c++-common/goacc/kernels-alias-4.c: Likewise.
* c-c++-common/goacc/kernels-alias-5.c: Likewise.
* c-c++-common/goacc/kernels-alias-6.c: Likewise.
* c-c++-common/goacc/kernels-alias-7.c: Likewise.
* c-c++-common/goacc/kernels-alias-8.c: Likewise.
* c-c++-common/goacc/kernels-alias-ipa-pta-2.c: Likewise.
* c-c++-common/goacc/kernels-alias-ipa-pta-3.c: Likewise.
* c-c++-common/goacc/kernels-alias-ipa-pta-4.c: Likewise.
* c-c++-common/goacc/kernels-alias-ipa-pta.c: Likewise.
* c-c++-common/goacc/kernels-alias.c: Likewise.
* c-c++-common/goacc/kernels-counter-var-redundant-load.c:
Likewise.
* c-c++-common/goacc/kernels-counter-vars-function-scope.c:
Likewise.
* c-c++-common/goacc/kernels-double-reduction-n.c: Likewise.
* c-c++-common/goacc/kernels-double-reduction.c: Likewise.
* c-c++-common/goacc/kernels-loop-2-acc-loop.c: Likewise.
* c-c++-common/goacc/kernels-loop-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-3-acc-loop.c: Likewise.
* c-c++-common/goacc/kernels-loop-3.c: Likewise.
* c-c++-common/goacc/kernels-loop-acc-loop.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit-2.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-enter-exit.c: Likewise.
* c-c++-common/goacc/kernels-loop-data-update.c: Likewise.
* c-c++-common/goacc/kernels-loop-data.c: Likewise.
* c-c++-common/goacc/kernels-loop-g.c: Likewise.
* c-c++-common/goacc/kernels-loop-mod-not-zero.c: Likewise.
* c-c++-common/goacc/kernels-loop-n-acc-loop.c: Likewise.
* c-c++-common/goacc/kernels-loop-n.c: Likewise.
* c-c++-common/goacc/kernels-loop-nest.c: Likewise.
* c-c++-common/goacc/kernels-loop.c: Likewise.
* c-c++-common/goacc/kernels-one-counter-var.c: Likewise.
* c-c++-common/goacc/kernels-parallel-loop-data-enter-exit.c:
Likewise.
* c-c++-common/goacc/kernels-reduction.c: Likewise.
* gfortran.dg/goacc/kernels-alias-2.f95: Likewise.
* gfortran.dg/goacc/kernels-alias-3.f95: Likewise.
* gfortran.dg/goacc/kernels-alias-4.f95: Likewise.
* gfortran.dg/goacc/kernels-alias.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit-2.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-enter-exit.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data-update.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-data.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-inner.f95: Likewise.
* gfortran.dg/goacc/kernels-loop-n.f95: Likewise.
* gfortran.dg/goacc/kernels-loop.f95: Likewise.
* gfortran.dg/goacc/kernels-loops-adjacent.f95: Likewise.
* gfortran.dg/goacc/kernels-parallel-loop-data-enter-exit.f95:
Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-2.c:
Add "--param=openacc-kernels=parloops".
* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-alias-ipa-pta.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-empty.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-5.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-6.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-collapse.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit-2.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-enter-exit.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data-update.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-data.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-g.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-mod-not-zero.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-n.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop-nest.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-loop.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-parallel-loop-data-enter-exit.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-reduction-1.c:
Likewise.
* testsuite/libgomp.oacc-c-c++-common/kernels-reduction.c:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-loop-2.f95: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-loop-data-2.f95:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit-2.f95:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-loop-data-enter-exit.f95:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-loop-data-update.f95:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-loop-data.f95: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-loop.f95: Likewise.
* testsuite/libgomp.oacc-fortran/kernels-parallel-loop-data-enter-exit.f95:
Likewise.
* testsuite/libgomp.oacc-fortran/kernels-reduction-1.f90:
Likewise.

Disable libstdc++ dependency for libffi

Disable AC_PROG_CXX and consequently a libstdc++ dependency for libffi,
introduced with upstream libffi commit 7d698125b1f0 ("Use the proper C++
compiler to run C++ tests"). This is only needed for the libffi test
suite, which we don't have to support in the GCC tree, as libffi is
maintained as a separate project. The dependency causes a build failure
with the `powerpc64le-linux-gnu' target due to a circular dependency:

make[1]: Circular configure-target-libffi <- maybe-all-target-libstdc++-v3 dependency dropped.
make[1]: *** [configure-target-libffi] Error 1
make: *** [all] Error 2

due to a libgomp dependency for libstdc++ and then a libffi dependency
for libgomp, introduced with commit 998eb38b265d ("Use functional
parameters for data mappings in OpenACC child functions").

/
* Makefile.def (lang_env_dependencies): Disable `cxx' dependency
for `libffi'.
* Makefile.in: Regenerate.

libffi/
* configure.ac: Disable AC_PROG_CXX.
* configure: Regenerate.
* Makefile.in: Regenerate.
* include/Makefile.in: Regenerate.
* man/Makefile.in: Regenerate.
* testsuite/Makefile.in: Regenerate.

Tweak target selector for libgomp.oacc-c-c++-common/lib-93.c.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/lib-93.c: Adjust target selector.

Enable firstprivate OpenACC reductions

2018-09-05 Cesar Philippidis <cesar@codesourcery.com>
Chung-Lin Tang <cltang@codesourcery.com>

gcc/
* gimplify.cc (omp_add_variable): Enable firstprivate reduction
variables.

gcc/testsuite/
* c-c++-common/goacc/reduction-10.c: New test.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/privatize-reduction-1.c: New
test.
* testsuite/libgomp.oacc-c-c++-common/privatize-reduction-2.c: New
test.

Don't mark OpenACC auto loops as independent inside acc parallel regions

2018-09-20 Cesar Philippidis <cesar@codesourcery.com>

gcc/
* omp-low.cc (lower_oacc_head_mark): Don't mark OpenACC auto
loops as independent inside acc parallel regions.

gcc/testsuite/
* c-c++-common/goacc/loop-auto-1.c: Adjust test case to conform to
the new behavior of the auto clause in OpenACC 2.5.
* c-c++-common/goacc/loop-auto-2.c: Likewise.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
* c-c++-common/goacc/loop-auto-3.c: New test.
* gfortran.dg/goacc/loop-auto-1.f90: New test.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust test case
to conform to the new behavior of the auto clause in OpenACC 2.5.

Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

2018-12-22  Cesar Philippidis  <cesar@codesourcery.com>
            Julian Brown  <julian@codesourcery.com>
            Tobias Burnus  <tobias@codesourcery.com>

gcc/
* omp-low.cc (maybe_lookup_field_in_outer_ctx): New function.
(convert_to_firstprivate_int): New function.
(convert_from_firstprivate_int): New function.
(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.
Remove unused variable.

libgomp/
* oacc-parallel.c (GOACC_parallel_keyed): Handle
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* plugin/plugin-nvptx.c (nvptx_exec): Handle
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New
test.
* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.

Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)

2018-08-28 Julian Brown <julian@codesourcery.com>
Cesar Philippidis <cesar@codesourcery.com>

gcc/
* gimplify.cc (oacc_array_mapping_info): New struct.
(gimplify_omp_ctx): Add decl_data_clause hash map.
(new_omp_context): Zero-initialise above.
(delete_omp_context): Delete above if allocated.
(gimplify_scan_omp_clauses): Scan for array mappings on data constructs,
and record in above map.
(gomp_oacc_needs_data_present): New function.
(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g. array
slices) declared in lexically-enclosing data constructs.
* omp-low.cc (lower_omp_target): Allow decl for bias not to be present
in OpenACC context.

gcc/testsuite/
* c-c++-common/goacc/acc-data-chain.c: New test.
* gfortran.dg/goacc/pr70828.f90: New test.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
* testsuite/libgomp.oacc-fortran/implicit_copy.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-2.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-3.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-4.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-5.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-6.f90: New test.

Default compute dimensions (compile time)

Typo fix relative to last posted version.

2018-10-05  Nathan Sidwell  <nathan@acm.org>
    Tom de Vries  <tdevries@suse.de>
    Thomas Schwinge  <thomas@codesourcery.com>
    Julian Brown  <julian@codesourcery.com>

gcc/
* doc/invoke.texi (fopenacc-dim): Update.
* omp-offload.cc (oacc_parse_default_dims): Update.

gcc/testsuite/
* c-c++-common/goacc/acc-icf.c: Update.
* c-c++-common/goacc/parallel-dims-1.c: Likewise.
* gfortran.dg/goacc/routine-4.f90: Likewise.
* gfortran.dg/goacc/routine-multiple-directives-1.f90: Likewise.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-default-compile.c: New.
* testsuite/libgomp.oacc-c-c++-common/loop-warn-1.c: New.
* testsuite/libgomp.oacc-c-c++-common/parallel-dims.c: Update.

Generate sequential loop for OpenACC loop directive inside kernels

2019-09-20 Chung-Lin Tang <cltang@codesourcery.com>
Cesar Philippidis <cesar@codesourcery.com>

gcc/
* omp-expand.cc (struct omp_region): Add inside_kernels_p field.
(expand_omp_for_generic): Adjust to generate a 'sequential' loop
when GOMP builtin arguments are BUILT_IN_NONE.
(expand_omp_for): Use expand_omp_for_generic to generate a
non-parallelized loop for OMP_FORs inside OpenACC kernels regions.
(expand_omp): Mark inside_kernels_p field true for regions
nested inside OpenACC kernels constructs.
gcc/testsuite/
* c-c++-common/goacc/kernels-loop-acc-loop.c: New test.
* c-c++-common/goacc/kernels-loop-2-acc-loop.c: New test.
* c-c++-common/goacc/kernels-loop-3-acc-loop.c: New test.
* c-c++-common/goacc/kernels-loop-n-acc-loop.c: New test.
* c-c++-common/goacc/kernels-acc-loop-reduction.c: New test.
* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: New test.

Reinstate kernels-restrict behaviour

This patch contains a small fix for upstream churn relative to the last version
posted.

2018-09-20 Cesar Philippidis <cesar@codesourcery.com>
Julian Brown <julian@codesourcery.com>

* omp-low.c (install_var_field): New base_pointer_restrict
argument.
(scan_sharing_clauses): Update call to install_var_field.
(omp_target_base_pointers_restrict_p): New function.
(scan_omp_target): Update call to install_var_field.

Adjustments and additions to testcases

Some additions of redundant "present" clauses dropped.

2018-10-22  Cesar Philippidis  <cesar@codesourcery.com>

gcc/testsuite/
* g++.dg/goacc/loop-1.c: New test.
* g++.dg/goacc/loop-2.c: New test.
* g++.dg/goacc/loop-3.c: New test.

2018-10-22  James Norris  <jnorris@codesourcery.com>
    Cesar Philippidis  <cesar@codesourcery.com>
    Tom de Vries  <tom@codesourcery.com>

libgomp/
* testsuite/libgomp.oacc-fortran/data-3.f90: Update parallel
regions to denote variables copyied in via acc enter data as
present.
* testsuite/libgomp.oacc-c-c++-common/subr.h: Reimplement.
* testsuite/libgomp.oacc-c-c++-common/subr.ptx: Regenerated PTX.
* testsuite/libgomp.oacc-c-c++-common/timer.h: Removed.
* testsuite/libgomp.oacc-c-c++-common/lib-69.c: Change async checks.
* testsuite/libgomp.oacc-c-c++-common/lib-70.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-72.c: Rework kernel i/f and
change async checks.
* testsuite/libgomp.oacc-c-c++-common/lib-73.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-74.c: Rework kernel i/f and
timing checks.
* testsuite/libgomp.oacc-c-c++-common/lib-75.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-76.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-78.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-79.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-81.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-82.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-93.c: New test.

Various OpenACC reduction enhancements - test cases

2018-12-13  Cesar Philippidis  <cesar@codesourcery.com>
            Nathan Sidwell  <nathan@acm.org>
            Julian Brown  <julian@codesourcery.com>

        gcc/testsuite/
        * c-c++-common/goacc/orphan-reductions-1.c: New test.
        * c-c++-common/goacc/reduction-9.c: New test.
        * c-c++-common/goacc/routine-4.c: Update.
        * g++.dg/goacc/reductions-1.C: New test.
        * gcc.dg/goacc/loop-processing-1.c: Update.
        * gfortran.dg/goacc/orphan-reductions-1.f90: New test.

        libgomp/
        * testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c: New test.
        * testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c: New test.
        * testsuite/libgomp.oacc-fortran/reduction-9.f90: New test.

Various OpenACC reduction enhancements - ME and nvptx changes

Parts of the first posting got lost in the second posting, above.
This version hopefully contains everything.

2018-10-30 Cesar Philippidis <cesar@codesourcery.com>

gcc/
* config/nvptx/nvptx.cc (nvptx_propagate_unified): New.
(nvptx_split_blocks): Call it for cond_uni insn.
(nvptx_expand_cond_uni): New.
(enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
(nvptx_init_builtins): Initialize it.
(nvptx_expand_builtin):
(nvptx_generate_vector_shuffle): Change integral SHIFT operand to
tree BITS operand.
(nvptx_vector_reduction): New.
(nvptx_adjust_reduction_type): New.
(nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
(nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
(nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
Use it to adjust the type of ref_to_res.
(nvptx_goacc_reduction_teardown):
* config/nvptx/nvptx.md (cond_uni): New pattern.

Various OpenACC reduction enhancements - FE changes

This version differs somewhat from the last version posted upstream
(and addresses some of Jakub's review comments).

2018-12-13  Cesar Philippidis  <cesar@codesourcery.com>
    Nathan Sidwell  <nathan@acm.org>
    Julian Brown  <julian@codesourcery.com>

gcc/c/
* c-parser.cc (c_parser_omp_variable_list): New c_omp_region_type
argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
OpenACC.
(c_parser_oacc_data_clause): Add region-type argument.
(c_parser_oacc_data_clause_deviceptr): Likewise.
(c_parser_omp_clause_reduction): Change is_omp boolean parameter to
c_omp_region_type.  Update call to c_parser_omp_variable_list.
(c_parser_oacc_all_clauses): Update calls to
c_parser_omp_clause_reduction.
(c_parser_omp_all_clauses): Likewise.
(c_parser_oacc_cache): Update call to c_parser_omp_var_list_parens.
* c-typeck.cc (c_finish_omp_clauses): Emit an error on orphan OpenACC
gang reductions.  Suppress user-defined reduction error for OpenACC.

gcc/cp/
* parser.cc (cp_parser_omp_var_list_no_open):  New c_omp_region_type
argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
OpenACC.
(cp_parser_omp_var_list): Add c_omp_region_type argument. Update call
to cp_parser_omp_var_list_parens.
(cp_parser_oacc_data_clause): Update call to cp_parser_omp_var_list.
(cp_parser_omp_clause_reduction): Change is_omp boolean parameter to
c_omp_region_type.  Update call to cp_parser_omp_var_list_no_open.
(cp_parser_oacc_all_clauses): Update call to
cp_parser_omp_clause_reduction.
(cp_parser_omp_all_clauses): Likewise.
* semantics.cc (finish_omp_reduction_clause): Add c_omp_region_type
argument.  Suppress user-defined reduction error for OpenACC.
(finish_omp_clauses): Emit an error on orphan OpenACC gang reductions.

gcc/fortran/
* openmp.cc (oacc_is_parallel): New.
(resolve_oacc_loop_blocks): Emit an error on orphan OpenACC
gang reductions.
* trans-openmp.cc (gfc_omp_clause_copy_ctor): Permit reductions.

2022-02-03  Kwok Cheung Yeung  <kcy@codesourcery.com>

gcc/c/
* c-parser.cc (c_parser_omp_clause_map): Update call to
c_parser_omp_variable_list.
(c_parser_omp_clause_to): Update call to c_parser_omp_var_list_parens.
(c_parser_omp_clause_from): Likewise.

gcc/cp/
* parser.cc (cp_parser_omp_clause_map): Update call to
cp_parser_omp_var_list_no_open.
(cp_parser_omp_all_clauses): Update calls to cp_parser_omp_var_list.

Add OpenACC Fortran support for deviceptr and variable in common blocks

2018-06-29  Cesar Philippidis  <cesar@codesourcery.com>
    James Norris  <jnorris@codesourcery.com>

gcc/fortran/
* openmp.cc (gfc_match_omp_map_clause): Re-write handling of the
deviceptr clause.  Add new common_blocks argument.  Propagate it to
gfc_match_omp_variable_list.
(gfc_match_omp_clauses): Update calls to gfc_match_omp_map_clauses.
(resolve_positive_int_expr): Promote the warning to an error.
(check_array_not_assumed): Remove pointer check.
(resolve_oacc_nested_loops): Error on do concurrent loops.
* trans-openmp.cc (gfc_omp_finish_clause): Don't create pointer data
mappings for deviceptr clauses.
(gfc_trans_omp_clauses): Likewise.

gcc/
* gimplify.cc (enum gimplify_omp_var_data): Add GOVD_DEVICETPR.
(oacc_default_clause): Privatize fortran common blocks.
(omp_notice_variable): Add GOVD_DEVICEPTR attribute when appropriate.
Defer the expansion of DECL_VALUE_EXPR for common block decls.
(gimplify_scan_omp_clauses): Add GOVD_DEVICEPTR attribute when
appropriate.
(gimplify_adjust_omp_clauses_1): Set GOMP_MAP_FORCE_DEVICEPTR for
implicit deviceptr mappings.

gcc/testsuite/
* c-c++-common/goacc/deviceptr-4.c: Update.
* gfortran.dg/goacc/loop-2-kernels-tile.f95: Update.
* gfortran.dg/goacc/loop-2-parallel-tile.f95: Update.
* gfortran.dg/goacc/sie.f95: Update.
* gfortran.dg/goacc/tile-1.f90: Update.
* gfortran.dg/gomp/pr77516.f90: Update.

libgomp/
* oacc-parallel.c (GOACC_parallel_keyed): Handle Fortran deviceptr
clause.
(GOACC_data_start): Likewise.
* testsuite/libgomp.oacc-fortran/deviceptr-1.f90: New test.

Tweak error return value for acc_set_cuda_stream.

The return value of acc_set_cuda_stream is unspecified in OpenACC 2.6.
The testsuite changes might be unnecessary with the current async code.

libgomp/
* oacc-cuda.c (acc_set_cuda_stream): Return 0 on error/invalid
arguments.
* testsuite/libgomp.oacc-c-c++-common/lib-84.c: Handle unnumbered
async stream being an alias for a numbered async stream.
* testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise.

Merge non-contiguous array support patches.

This version is based from v4, posted upstream here:
https://gcc.gnu.org/pipermail/gcc-patches/2020-April/543437.html

2020-04-19 Chung-Lin Tang <cltang@codesourcery.com>

PR other/76739

gcc/c/
* c-typeck.cc (handle_omp_array_sections_1): Add 'bool &non_contiguous'
parameter, adjust recursive call site, add cases for allowing
pointer based multi-dimensional arrays for OpenACC.
(handle_omp_array_sections): Adjust handle_omp_array_sections_1 call,
handle non-contiguous case to create dynamic array map.

gcc/cp/
* semantics.cc (handle_omp_array_sections_1): Add 'bool &non_contiguous'
parameter, adjust recursive call site, add cases for allowing
pointer based multi-dimensional arrays for OpenACC.
(handle_omp_array_sections): Adjust handle_omp_array_sections_1 call,
handle non-contiguous case to create dynamic array map.

gcc/fortran/
* f95-lang.cc (DEF_FUNCTION_TYPE_VAR_5): New symbol.
* types.def (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_VAR): New type.

gcc/
* builtin-types.def (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_VAR): New type.
* omp-builtins.def (BUILT_IN_GOACC_DATA_START): Adjust function type
to new BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_VAR.
* gimplify.cc (gimplify_scan_omp_clauses): Skip gimplification of
OMP_CLAUSE_SIZE of non-contiguous array maps (which is a TREE_LIST).
* omp-expand.cc (expand_omp_target): Add non-contiguous array descriptor
pointers to variadic arguments.
* omp-low.cc (append_field_to_record_type): New function.
(create_noncontig_array_descr_type): Likewise.
(create_noncontig_array_descr_init_code): Likewise.
(scan_sharing_clauses): For non-contiguous array map kinds, check for
supported dimension structure, and install non-contiguous array
variable into current omp_context.
(reorder_noncontig_array_clauses): New function.
(scan_omp_target): Call reorder_noncontig_array_clauses to place
non-contiguous array map clauses at beginning of clause sequence.
(lower_omp_target): Add handling for non-contiguous array map kinds,
add all created non-contiguous array descriptors to
gimple_omp_target_data_arg.

gcc/testsuite/
* c-c++-common/goacc/noncontig_array-1.c: New test.

libgomp/
* libgomp_g.h (GOACC_data_start): Add variadic '...' to declaration.
* libgomp.h (gomp_map_vars_openacc): New function declaration.
* oacc-int.h (struct goacc_ncarray_dim): New struct declaration.
(struct goacc_ncarray_descr_type): Likewise.
(struct goacc_ncarray): Likewise.
(struct goacc_ncarray_info): Likewise.
(goacc_noncontig_array_create_ptrblock): New function declaration.
* oacc-parallel.c (goacc_noncontig_array_count_rows): New function.
(goacc_noncontig_array_compute_sizes): Likewise.
(goacc_noncontig_array_fill_rows_1): Likewise.
(goacc_noncontig_array_fill_rows): Likewise.
(goacc_process_noncontiguous_arrays): Likewise.
(goacc_noncontig_array_create_ptrblock): Likewise.
(GOACC_parallel_keyed): Use goacc_process_noncontiguous_arrays to
handle non-contiguous array descriptors at end of varargs, adjust
to use gomp_map_vars_openacc.
(GOACC_data_start): Likewise. Adjust function type to accept varargs.
* target.c (gomp_map_vars_internal): Add struct goacc_ncarray_info *
nca_info parameter, add handling code for non-contiguous arrays.
(gomp_map_vars_openacc): Add new function for specialization of
gomp_map_vars_internal for OpenACC structured region usage.
* testsuite/libgomp.oacc-c-c++-common/noncontig_array-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/noncontig_array-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/noncontig_array-3.c: New test.
* testsuite/libgomp.oacc-c-c++-common/noncontig_array-4.c: New test.
* testsuite/libgomp.oacc-c-c++-common/noncontig_array-utils.h: Support
header for new tests.

include/
* gomp-constants.h (GOMP_MAP_FLAG_SPECIAL_3): Define.
(enum gomp_map_kind): Add GOMP_MAP_NONCONTIG_ARRAY,
GOMP_MAP_NONCONTIG_ARRAY_TO, GOMP_MAP_NONCONTIG_ARRAY_FROM,
GOMP_MAP_NONCONTIG_ARRAY_TOFROM, GOMP_MAP_NONCONTIG_ARRAY_FORCE_TO,
GOMP_MAP_NONCONTIG_ARRAY_FORCE_FROM, GOMP_MAP_NONCONTIG_ARRAY_FORCE_TOFROM,
GOMP_MAP_NONCONTIG_ARRAY_ALLOC, GOMP_MAP_NONCONTIG_ARRAY_FORCE_ALLOC,
GOMP_MAP_NONCONTIG_ARRAY_FORCE_PRESENT.
(GOMP_MAP_NONCONTIG_ARRAY_P): Define.

Identify OMP development branch in output of 'gcc --version'

2022-06-20 Kwok Cheung Yeung <kcy@codesourcery.com>

gcc/
* Makefile.in (REVISION_s): Change default message.

ipa: Fix double reference-count decrements for the same edge (PR 107769, PR 109318)

It turns out that since addition of the code that can identify globals
which are only read from, the code that keeps track of the references
can decrement their count for the same calls, once during IPA-CP and
then again during inlining.  Fixed by adding a special flag to the
pass-through variant and simply wiping out the reference to the
refdesc structure from the constant ones.

Moreover, during debugging of the issue I have discovered that the
code removing references could remove a reference associated with the
same statement but of a wrong type.  In all cases it wanted to remove
an IPA_REF_ADDR reference so removing a lesser one instead should do
no harm in practice, but we should try to be consistent and so this
patch extends symtab_node::find_reference so that it searches for a
reference of a given type only.

gcc/ChangeLog:

2023-04-14  Martin Jambor  <mjambor@suse.cz>

PR ipa/107769
PR ipa/109318
* cgraph.h (symtab_node::find_reference): Add parameter use_type.
* ipa-prop.h (ipa_pass_through_data): New flag refdesc_decremented.
(ipa_zap_jf_refdesc): New function.
(ipa_get_jf_pass_through_refdesc_decremented): Likewise.
(ipa_set_jf_pass_through_refdesc_decremented): Likewise.
* ipa-cp.cc (ipcp_discover_new_direct_edges): Provide a value for
the new parameter of find_reference.
(adjust_references_in_caller): Likewise. Make sure the constant jump
function is not used to decrement a refdec counter again.  Only
decrement refdesc counters when the pass_through jump function allows
it.  Added a detailed dump when decrementing refdesc counters.
* ipa-prop.cc (ipa_print_node_jump_functions_for_edge): Dump new flag.
(ipa_set_jf_simple_pass_through): Initialize the new flag.
(ipa_set_jf_unary_pass_through): Likewise.
(ipa_set_jf_arith_pass_through): Likewise.
(remove_described_reference): Provide a value for the new parameter of
find_reference.
(update_jump_functions_after_inlining): Zap refdesc of new jfunc if
the previous pass_through had a flag mandating that we do so.
(propagate_controlled_uses): Likewise.  Only decrement refdesc
counters when the pass_through jump function allows it.
(ipa_edge_args_sum_t::duplicate): Provide a value for the new
parameter of find_reference.
(ipa_write_jump_function): Assert the new flag does not have to be
streamed.
* symtab.cc (symtab_node::find_reference): Add parameter use_type, use
it in searching.

gcc/testsuite/ChangeLog:

2023-04-06  Martin Jambor  <mjambor@suse.cz>

PR ipa/107769
PR ipa/109318
* gcc.dg/ipa/pr109318.c: New test.
* gcc.dg/lto/pr107769_0.c: Likewise.

aarch64: disable LDP via tuning structure for -mcpu=ampere1

AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
isntructions through the tuning structure from instruction combining
(such as in peephole2).

Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.

This commit:
* adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
* allows -moverride=tune=... to override this

These changes are benchmark-driven, yielding the following changes
(with a net-overall improvement):
   503.bwaves_r.      -0.88%
   507.cactuBSSN_r     0.35%
   508.namd_r          3.09%
   510.parest_r       -2.99%
   511.povray_r        5.54%
   519.lbm_r          15.83%
   521.wrf_r           0.56%
   526.blender_r       2.47%
   527.cam4_r          0.70%
   538.imagick_r       0.00%
   544.nab_r          -0.33%
   549.fotonik3d_r.   -0.42%
   554.roms_r          0.00%
   -------------------------
   = total             1.79%

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Co-Authored-By: Di Zhao <di.zhao@amperecomputing.com>
gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
Check for the above tuning option when processing loads.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.

testsuite: Fix up vect-simd-clone-1[678]f.c tests some more

With
make check-gcc check-g++ -j32 -k RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mavx,-m32/-mavx512f,-m32/-march=cascadelake,-m64,-m64/-mavx,-m64/-mavx512f,-m64/-march=cascadelake\}
+vect.exp=vect-simd-clone*'
the vect-simd-clone-1[678]f.c tests fail with -m32/-mavx512f and -m32/-march=cascadelake,
in that case there are zero matches rather than the 4 expected for ia32.
-m64/-mavx512f and -m64/-march=cascadelake works fine though (2 expected
matches).

So, the following patch just adds -mno-avx512f for x86 non-lp64.

2023-04-17 Jakub Jelinek <jakub@redhat.com>

* gcc.dg/vect/vect-simd-clone-16f.c: Add -mno-avx512f for non-lp64 x86.
* gcc.dg/vect/vect-simd-clone-17f.c: Likewise.
* gcc.dg/vect/vect-simd-clone-18f.c: Likewise.

tree-optimization/109524 - ICE with VRP edge removal

VRP queues edges to process late for updating global ranges for
__builtin_unreachable. But this interferes with edge removal
from substitute_and_fold. The following deals with this by
looking up the edge with source/dest block indices which do not
become stale.

PR tree-optimization/109524
* tree-vrp.cc (remove_unreachable::m_list): Change to a
vector of pairs of block indices.
(remove_unreachable::maybe_register_block): Adjust.
(remove_unreachable::remove_and_update_globals): Likewise.
Deal with removed blocks.

* g++.dg/pr109524.C: New testcase.

testsuite: update builtins-5-p9-runnable.c for BE

Hi,

As PR108809 mentioned, vec_xl_len_r and vec_xst_len_r are tested
in gcc.target/powerpc/builtins-5-p9-runnable.c.
The vector operand of these two bifs are different from the view
of v16_int8 between BE and LE, even it is same from the view of
128bits(uint128/V1TI).

The test case gcc.target/powerpc/builtins-5-p9-runnable.c was
written for LE environment, this patch updates it for BE.

Tested on ppc64 BE and LE.
Is this ok for trunk?

BR,
Jeff (Jiufu)

gcc/testsuite/ChangeLog:

PR testsuite/108809
* gcc.target/powerpc/builtins-5-p9-runnable.c: Update for BE.

RISC-V: Fix testsuite fail on RV32

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/scalar_move-2.c: Adjust include way
for riscv_vector.h
* gcc.target/riscv/rvv/base/spill-sp-adjust.c: Add missing
-mabi.

RISC-V: Add test cases for the RVV mask insn shortcut.

There are sorts of shortcut codegen for the RVV mask insn. For
example.

vmxor vd, va, va => vmclr vd.

We would like to add more optimization like this but first of all
we must add the tests for the existing shortcut optimization, to
ensure we don't break existing optimization from underlying shortcut
optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/mask_insn_shortcut.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

Daily bump.

[committed] [PR target/109508] Adjust conditional move expansion for SFB

Recently the conditional move expander's predicates were loosened for the
benefit of the THEAD processors. In particular one operand that was
previously "register_operand" is now "reg_or_0_operand". That's fine for
THEAD, but breaks for SFB which requires a register for that operand.

This results in an ICE when compiling the testcase an SFB target such as
the sifive s76.

This change adjusts the expansion code slightly to copy the value into
a register for SFB.

Bootstrapped and regression tested (c,c++,fortran only) with a toolchain
configured to enable SFB by default.

PR target/109508
gcc/

* config/riscv/riscv.cc (riscv_expand_conditional_move): For
TARGET_SFB_ALU, force the true arm into a register.

gcc/testsuite
* gcc.target/riscv/pr109508.c: New test.

[Committed] New test case gcc.target/avr/pr54816.c

PR target/54816 is now fixed on mainline.  This adds a test case to
check that it doesn't regress in future.  Tested with a cross compiler
to avr-elf.  Committed as obvious.

2023-04-16  Roger Sayle  <roger@nextmovesoftware.com>

gcc/testsuite/ChangeLog
PR target/54816
* gcc.target/avr/pr54816.c: New test case.

Daily bump.

Fix fallout of previous change on x86/Linux

gcc/ada/
PR bootstrap/109510
* gcc-interface/decl.cc (gnat_to_gnu_entity) <types>: Do not reset
align to zero in any case. Set TYPE_USER_ALIGN on the type only if
it is an aggregate type, or else a type whose default alignment is
specifically capped on selected platforms.

Fix handling of large arguments passed by value.

2023-04-15  John David Anglin  <danglin@gcc.gnu.org>

gcc/ChangeLog:

PR target/104989
* config/pa/pa-protos.h (pa_function_arg_size): Update prototype.
* config/pa/pa.cc (pa_function_arg): Return NULL_RTX if argument
size is zero.
(pa_arg_partial_bytes): Don't call pa_function_arg_size twice.
(pa_function_arg_size): Change return type to int.  Return zero
for arguments larger than 1 GB.  Update comments.

c++: constexpr aggregate destruction [PR109357]

We were assuming that the result of evaluation of TARGET_EXPR_INITIAL would
always be the new value of the temporary, but that's not necessarily true
when the initializer is complex (i.e. target_expr_needs_replace). In that
case evaluating the initializer initializes the temporary as a side-effect.

PR c++/109357

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_constant_expression) [TARGET_EXPR]:
Check for complex initializer.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-dtor15.C: New test.

if-conv: Small improvement for expansion of complex PHIs [PR109154]

The following patch is just a dumb improvement, gets rid of 2 unnecessary
instructions on both the PR's original testcase and on the two reduced ones,
both on -mcpu=neoverse-v1 and -mavx512f.

The thing is, if we have args_len (args_len >= 2) unique PHI arguments,
we need only args_len - 1 COND_EXPRs to expand the PHI, because first
COND_EXPR can merge 2 unique arguments and all the following ones merge
another unique argument with the previously merged arguments,
while the code for mysterious reasons was always emitting args_len
COND_EXPRs, where the first COND_EXPR merged the first and second unique
arguments, the second COND_EXPR merged the second unique argument with
result of merging the first and second unique arguments and the rest was
already expectable, nth COND_EXPR for n > 2 merged the nth unique argument
with result of merging the previous unique arguments.
Now, in my understanding, the bb_predicate for bb's predecessor need to
form a disjunct set which together creates the successor's bb_predicate,
so I don't see why we'd need to check all the bb_predicates, if we check
all but one then when all those other ones are false the last bb_predicate
is necessarily true.  Given that the code attempts to sort argument with
most occurrences (so likely most complex combined predicate) last, I chose
not to test that last argument's predicate.
So e.g. on the testcase from comment 47 in the PR:
void
foo (int *f, int d, int e)
{
  for (int i = 0; i < 1024; i++)
    {
      int a = f[i];
      int t;
      if (a < 0)
        t = 1;
      else if (a < e)
        t = 1 - a * d;
      else
        t = 0;
      f[i] = t;
    }
}
we used to emit:
  _7 = a_10 < 0;
  _21 = a_10 >= 0;
  _22 = a_10 < e_11(D);
  _23 = _21 & _22;
  _26 = a_10 >= e_11(D);
  _27 = _21 & _26;
  _ifc__42 = _7 ? 1 : t_13;
  _ifc__43 = _23 ? t_13 : _ifc__42;
  t_6 = _27 ? 0 : _ifc__43;
while the following patch changes it to:
  _7 = a_10 < 0;
  _21 = a_10 >= 0;
  _22 = a_10 < e_11(D);
  _23 = _21 & _22;
  _ifc__42 = _23 ? t_13 : 0;
  t_6 = _7 ? 1 : _ifc__42;
which I believe should be sufficient for a PHI <1, t_13, 0>.

I've gathered some statistics and on x86_64-linux and i686-linux
bootstraps/regtests, this code triggers:
     92 4 4
    112 2 4
    141 3 4
   4046 3 3
(where 2nd number is args_len and 3rd argument EDGE_COUNT (bb->preds)
and first argument count of those from sort | uniq -c | sort -n).
In all these cases the patch should squeze one extra COND_EXPR and
its associated predicate (the latter only if it wasn't used elsewhere).

Incrementally, I think we should try to perform some analysis on which
predicates depend on inverses of other predicates and if possible try
to sort the arguments better and omit testing unnecessary predicates.
So essentially for the above testcase deconstruct it back to:
  _7 = a_10 < 0;
  _22 = a_10 < e_11(D);
  _ifc__42 = _22 ? t_13 : 0;
  t_6 = _7 ? 1 : _ifc__42;
which is like what this patch produces, but with the & a_10 >= 0 part
removed, because the last predicate is a_10 < 0 and so testing a_10 >= 0
on what appears on the false branch doesn't make sense.
But I'm afraid that will take more work than is doable in stage4 right now.

2023-04-15  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/109154
* tree-if-conv.cc (predicate_scalar_phi): For complex PHIs, emit just
args_len - 1 COND_EXPRs rather than args_len.  Formatting fix.

rs6000: don't expect __ibm128 with 64-bit long double [PR99708]

When long double is 64-bit wide, as on vxworks, the rs6000 backend
defines neither the __ibm128 type nor the __SIZEOF_IBM128__ macro, but
pr99708.c expected both to be always defined. Adjust the test to
match the implementation.

Co-Authored-By: Kewen Lin <linkw@linux.ibm.com>
for gcc/testsuite/ChangeLog

PR target/99708
* gcc.target/powerpc/pr99708.c: Accept lack of
__SIZEOF_IBM128__ when long double is 64-bit wide.

-Wdangling-pointer: fix MEM_REF handling [PR109514]

Here we hit the MEM_REF case, with its arg an ADDR_EXPR, but had no handling
for that and wrongly assumed it would be a reference to a local variable.
This patch overhauls the logic for deciding whether the target is something
to warn about so that we only warn if we specifically recognize the target
as non-local. None of the existing tests regress as a result.

PR c++/109514

gcc/ChangeLog:

* gimple-ssa-warn-access.cc (pass_waccess::check_dangling_stores):
Overhaul lhs_ref.ref analysis.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wdangling-pointer-6.C: New test.