Julian Brown [Tue, 26 Feb 2019 23:48:00 +0000 (15:48 -0800)]
Fortran "declare create"/allocate support for OpenACC
2018-10-04 Cesar Philippidis <cesar@codesourcery.com>
Julian Brown <julian@codesourcery.com>
gcc/
* omp-low.c (scan_sharing_clauses): Update handling of OpenACC declare
create, declare copyin and declare deviceptr to have local lifetimes.
(convert_to_firstprivate_int): Handle pointer types.
(convert_from_firstprivate_int): Likewise. Create local storage for
the values being pointed to. Add new orig_type argument.
(lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
Add orig_type argument to convert_from_firstprivate_int call.
Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT. Don't privatize
firstprivate VLAs.
* tree-pretty-print.c (dump_omp_clause): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
gcc/fortran/
* gfortran.h (enum gfc_omp_map_op): Add OMP_MAP_DECLARE_ALLOCATE,
OMP_MAP_DECLARE_DEALLOCATE.
(gfc_omp_clauses): Add update_allocatable.
* trans-array.c (gfc_array_allocate): Call
gfc_trans_oacc_declare_allocate for decls that have oacc_declare_create
attribute set.
* trans-decl.c (find_module_oacc_declare_clauses): Relax
oacc_declare_create to OMP_MAP_ALLOC, and oacc_declare_copyin to
OMP_MAP_TO, in order to match OpenACC 2.5 semantics.
* trans-openmp.c (gfc_trans_omp_clauses): Use GOMP_MAP_ALWAYS_POINTER
(for update directive) or GOMP_MAP_FIRSTPRIVATE_POINTER (otherwise) for
allocatable scalar decls. Handle OMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}
clauses.
(gfc_trans_oacc_executable_directive): Use GOMP_MAP_ALWAYS_POINTER
for allocatable scalar data clauses inside acc update directives.
(gfc_trans_oacc_declare_allocate): New function.
* trans-stmt.c (gfc_trans_allocate): Call
gfc_trans_oacc_declare_allocate for decls with oacc_declare_create
attribute set.
(gfc_trans_deallocate): Likewise.
* trans.h (gfc_trans_oacc_declare_allocate): Declare.
gcc/testsuite/
* gfortran.dg/goacc/declare-allocatable-1.f90: New test.
include/
* gomp-constants.h (enum gomp_map_kind): Define
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE} and GOMP_MAP_FLAG_SPECIAL_4.
libgomp/
* oacc-mem.c (gomp_acc_declare_allocate): New function.
* oacc-parallel.c (GOACC_enter_exit_data): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
* testsuite/libgomp.oacc-fortran/allocatable-scalar.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-2.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-4.f90: New test.
Julian Brown [Mon, 14 Oct 2019 20:12:39 +0000 (13:12 -0700)]
Re-do OpenACC private variable resolution
gcc/
* config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl): Rename
to...
(gcn_goacc_adjust_private_decl): ...this.
* config/gcn/gcn-tree.c (diagnostic-core.h): Include.
(gcn_goacc_adjust_gangprivate_decl): Rename to...
(gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter.
* config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename to...
(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...this.
* config/nvptx/nvptx.c (tree-pretty-print.h): Include.
(nvptx_goacc_adjust_private_decl): New function.
(TARGET_GOACC_ADJUST_PRIVATE_DECL): Define hook using above function.
* doc/tm.texi.in (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename to...
(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...this.
* doc/tm.texi: Regenerated.
* internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE.
* internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE.
* omp-low.c (omp_context): Remove oacc_partitioning_levels field.
(lower_oacc_reductions): Add PRIVATE_MARKER parameter. Insert before
fork.
(lower_oacc_head_tail): Add PRIVATE_MARKER parameter. Modify its
gimple call arguments as appropriate. Don't set
oacc_partitioning_levels in omp_context. Pass private_marker to
lower_oacc_reductions.
(oacc_record_private_var_clauses): Don't check for NULL ctx.
(mark_oacc_gangprivate): Remove unused function.
(make_oacc_private_marker): New function.
(lower_omp_for): Only call oacc_record_vars_in_bind for
OpenACC contexts. Create private marker and pass to
lower_oacc_head_tail.
(lower_omp_target): Remove unnecessary call to
oacc_record_private_var_clauses. Remove call to mark_oacc_gangprivate.
Create private marker and pass to lower_oacc_reductions.
(process_oacc_gangprivate_1): Remove.
(lower_omp_1): Only call oacc_record_vars_in_bind for OpenACC. Don't
iterate over contexts calling process_oacc_gangprivate_1.
(omp-offload.c (oacc_loop_xform_head_tail): Treat
private-variable markers like fork/join when transforming head/tail
sequences.
(execute_oacc_device_lower): Use IFN_UNIQUE_OACC_PRIVATE instead of
"oacc gangprivate" attributes to determine partitioning level of
variables. Remove unused variables.
* omp-sese.c (find_gangprivate_vars): New function.
(find_local_vars_to_propagate): Use GANGPRIVATE_VARS parameter instead
of "oacc gangprivate" attribute to determine which variables are
gang-private.
(oacc_do_neutering): Use find_gangprivate_vars.
* target.def (adjust_gangprivate_decl): Rename to...
(adjust_private_decl): ...this. Update documentation (briefly).
libgomp/
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: Use
oaccdevlow dump and update scanned output.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: Likewise.
Add missing atomic to force worker partitioning for test variable.
Julian Brown [Fri, 20 Sep 2019 20:53:10 +0000 (13:53 -0700)]
Handle references in OpenACC "private" clauses
gcc/
* gimplify.c (localize_reductions): Rewrite references for
OMP_CLAUSE_PRIVATE also.
libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: New test.
* testsuite/libgomp.oacc-c++/privatized-ref-2.C: New test.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: New test.
* gfortran.dg/goacc/classify-kernels-unparallelized.f95: Add
one dg-message for additional -fopt-info-optimized-omp output.
* gfortran.dg/goacc/classify-kernels.f95: Likewise.
* gfortran.dg/goacc/kernels-decompose-1.f95: Change 'note' to
'optimized' in dg-message.
Julian Brown [Fri, 9 Aug 2019 20:01:33 +0000 (13:01 -0700)]
Wait at end of OpenACC asynchronous kernels regions
gcc/
* omp-oacc-kernels.c (add_wait): New function, split out of...
(add_async_clauses_and_wait): ...here. Call new outlined function.
(decompose_kernels_region_body): Add wait at the end of
explicitly-asynchronous kernels regions.
Julian Brown [Fri, 5 Jul 2019 01:14:41 +0000 (18:14 -0700)]
Assumed-size arrays with non-lexical data mappings
gcc/
* gimplify.c (gimplify_adjust_omp_clauses_1): Raise error for
assumed-size arrays in map clauses for Fortran/OpenMP.
* omp-low.c (lower_omp_target): Set the size of assumed-size Fortran
arrays to one to allow use of data already mapped on the offload device.
gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Change clauses mapping
assumed-size arrays to use the GOMP_MAP_FORCE_PRESENT map type.
Julian Brown [Tue, 28 May 2019 15:42:10 +0000 (08:42 -0700)]
Apply gangprivate attribute to innermost decl
...and fix parallelism-level calculation when applying the attribute.
gcc/
* omp-low.c (mark_oacc_gangprivate): Add CTX parameter. Use to look up
correct decl to add attribute to.
(lower_omp_for): Move "oacc gangprivate" processing from here...
(process_oacc_gangprivate_1): ...to here. New function.
(lower_omp_target): Update call to mark_oacc_gangprivate.
(execute_lower_omp): Call process_oacc_gangprivate_1 for each OMP
context.
libgomp/
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: New test.
Julian Brown [Sun, 19 May 2019 17:42:20 +0000 (10:42 -0700)]
Fix references declared in lexically-enclosing OpenACC data region
gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Guard addition of clauses for
pointers with DECL_P.
gcc/
* gimplify.c (oacc_array_mapping_info): Add REF field.
(gimplify_scan_omp_clauses): Initialise above field for data blocks
passed by reference.
(gomp_oacc_needs_data_present): Handle references.
(gimplify_adjust_omp_clauses_1): Handle references and optional
arguments for variables declared in lexically-enclosing OpenACC data
region.
Julian Brown [Thu, 16 May 2019 12:45:35 +0000 (05:45 -0700)]
Avoid introducing 'create' mapping clauses for loop index variables in kernels regions
gcc/
* omp-oacc-kernels.c (find_omp_for_index_vars_1,
find_omp_for_index_vars): New functions.
(maybe_build_inner_data_region): Add IDX_VARS argument. Don't add
CREATE mapping clauses for loop index variables. Set TREE_ADDRESSABLE
flag on newly-mapped declarations as a side effect.
(decompose_kernels_region_body): Call find_omp_for_index_vars. Don't
create PRESENT clause for loop index variables. Pass index variable
set to maybe_build_inner_data_region.
Julian Brown [Wed, 9 Jan 2019 11:41:04 +0000 (03:41 -0800)]
Update OpenACC version to 2.6
gcc/c-family/
* c-cppbuiltin.c (c_cpp_builtins): Update _OPENACC define to 201711.
gcc/doc/
* invoke.texi: Update mention of OpenACC version to 2.6.
gcc/fortran/
* cpp.c (cpp_define_builtins): Update _OPENACC define to 201711.
* gfortran.texi: Update mentions of OpenACC version to 2.6.
* intrinsic.texi: Likewise.
gcc/testsuite/
* c-c++-common/cpp/openacc-define-3.c: Update expected value for
_OPENACC define.
* gfortran.dg/openacc-define-3.f90: Likewise.
libgomp/
* libgomp.texi: Update mentions of OpenACC version to 2.6. Update
section numbers to match version 2.6 of the spec.
* openacc.f90 (openacc_version): Update to 201711.
* openacc_lib.h (openacc_version): Update to 201711.
* testsuite/libgomp.oacc-fortran/openacc_version-1.f: Update expected
openacc_version to 201711.
* testsuite/libgomp.oacc-fortran/openacc_version-2.f90: Likewise.
Thomas Schwinge [Fri, 1 Feb 2019 17:12:05 +0000 (18:12 +0100)]
Adjust parallelism of loops in gang-single parts of OpenACC kernels regions: "struct adjust_nested_loop_clauses_wi_info"
The current code apparently is too freaky at least for for GCC 4.6:
[...]/gcc/omp-oacc-kernels.c: In function 'tree_node* transform_kernels_loop_clauses(gimple*, tree, tree, tree, tree)':
[...]/gcc/omp-oacc-kernels.c:584:10: error: expected identifier before numeric constant
[...]/gcc/omp-oacc-kernels.c: In lambda function:
[...]/gcc/omp-oacc-kernels.c:584:25: error: expected '{' before '=' token
[...]/gcc/omp-oacc-kernels.c: In function 'tree_node* transform_kernels_loop_clauses(gimple*, tree, tree, tree, tree)':
[...]/gcc/omp-oacc-kernels.c:584:25: warning: lambda expressions only available with -std=c++0x or -std=gnu++0x [enabled by default]
[...]/gcc/omp-oacc-kernels.c:584:28: error: no match for 'operator=' in '{} = & loop_gang_clause'
[...]
gcc/
* omp-oacc-kernels.c (struct adjust_nested_loop_clauses_wi_info): New.
(adjust_nested_loop_clauses, transform_kernels_loop_clauses): Use it.
Thomas Schwinge [Thu, 24 Jan 2019 16:40:03 +0000 (08:40 -0800)]
New OpenACC kernels region decompose algorithm
Previously, OpenACC kernels region bodies were decomposed into a sequence of
alternating gang-single and gang-parallel "parallel" regions. The new
algorithm in this patch introduces a third possibility: Loops that look like
they might benefit from the parloops pass are converted into old "kernels"
regions, exposing them to the parloops pass later on. This has the benefit
that loops that cannot be parallelized are not offloaded to the GPU.
gcc/
* omp-oacc-kernels.c (adjust_region_code_walk_stmt_fn)
(adjust_region_code): New functions.
(make_loops_gang_single): Update.
(make_gang_single_region): Rename to...
(make_region_seq): ... this, and update.
(make_gang_parallel_loop_region): Rename to...
(make_region_loop_nest): ... this, and update.
(is_unconditional_oacc_for_loop): Remove stmt parameter and check.
(decompose_kernels_region_body): Update.
gcc/testsuite/
* c-c++-common/goacc/kernels-conversion.c: Adjust test.
* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
* c-c++-common/goacc/kernels-decompose-1.c: New file.
* gfortran.dg/goacc/kernels-decompose-1.f95: Likewise.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/kernels-decompose-1.c: New
file.
Gergö Barany [Mon, 21 Jan 2019 20:50:14 +0000 (12:50 -0800)]
Launch kernels asynchronously in OpenACC kernels regions
Kernels regions are decomposed into one or more smaller regions that are to
be executed in sequence. With this patch, all of these regions are launched
asynchronously, and a wait directive is added after them. This means that
the host only waits once for the kernels to complete, not once per kernel.
If the original kernels region was marked async, that asynchronous behavior
is preserved, and no wait is added.
gcc/
* omp-oacc-kernels.c (add_async_clauses_and_wait): New function...
(decompose_kernels_region_body): ... called from here.
Gergö Barany [Thu, 24 Jan 2019 06:11:11 +0000 (22:11 -0800)]
Adjust parallelism of loops in gang-single parts of OpenACC kernels regions
Loops in gang-single parts of kernels regions cannot be executed in
gang-redundant mode. If the user specified gang clauses on such loops, emit
an error and remove these clauses. Adjust automatic partitioning to exclude
gang partitioning in gang-single regions.
gcc/
* omp-oacc-kernels.c (add_parent_or_loop_num_clause): New function.
(adjust_nested_loop_clauses): Likewise.
(transform_kernels_loop_clauses, make_gang_parallel_loop_region):
Add worker and vector clause parameters, emit error on illegal
nesting.
(visit_loops_in_gang_single_region): Emit warning on conditionally
executed code with a gang clause.
(make_loops_gang_single): New function.
(decompose_kernels_region_body): Separate out gang/worker/vector clauses
for separate handling; add call to make_loops_gang_single.
* omp-offload.c (oacc_loop_auto_partitions): Add and propagate
is_oacc_gang_single parameter.
(oacc_loop_partition): Likewise.
(execute_oacc_device_lower): Adjust call to oacc_loop_partition.
Gergö Barany [Wed, 23 Jan 2019 22:32:57 +0000 (14:32 -0800)]
Handle conditional execution of loops in OpenACC kernels regions
Any OpenACC loop controlled by an if statement or a non-OpenACC loop must be
executed in a gang-single region. Detecting such loops is not trivial as
OpenACC kernels expansion is done on GIMPLE but before computation of the
control flow graph. This patch adds an auxiliary analysis for determining
whether a statement is inside a conditionally executed region (relative to
the kernels region's entry).
gcc/
* omp-oacc-kernels.c (control_flow_regions): New class.
(control_flow_regions::control_flow_regions): New constructor.
(control_flow_regions::is_unconditional_oacc_for_loop): New method.
(control_flow_regions::find_rep): Likewise.
(control_flow_regions::union_reps): Likewise.
(control_flow_regions::compute_regions): Likewise.
(decompose_kernels_region_body): Use test for conditional execution.
gcc/testsuite/
* c-c++-common/goacc/kernels-conversion.c: Add test for conditionally
executed code.
* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
Gergö Barany [Mon, 21 Jan 2019 15:16:06 +0000 (07:16 -0800)]
Turn OpenACC kernels regions into a sequence of parallel regions
This patch decomposes each OpenACC kernels region into a sequence of
parallel regions. Each OpenACC loop nest turns into its own region; any code
between such loop nests is gathered up into a region as well. The loop
regions can be distributed across gangs if the original kernels region had a
num_gangs clause, while the other regions are executed in "gang-single"
mode. The implied default "auto" clause on kernels loops is made explicit
unless there is a conflicting clause.
gcc/
* omp-oacc-kernels.c (top_level_omp_for_in_stmt): New function.
(make_gang_single_region): Likewise.
(transform_kernels_loop_clauses, make_gang_parallel_loop_region):
Likewise.
(flatten_binds): Likewise.
(make_data_region_try_statement): Likewise.
(maybe_build_inner_data_region): Likewise.
(decompose_kernels_region_body): Likewise.
(transform_kernels_region): Delegate to decompose_kernels_region_body
and make_data_region_try_statement.
gcc/testsuite/
* c-c++-common/goacc/kernels-conversion.c: Test for a gang-single
region.
* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
Gergö Barany [Mon, 21 Jan 2019 13:28:20 +0000 (05:28 -0800)]
Separate OpenACC kernels regions in data and parallel parts
This is the first in a series of patches that completely rework the handling
of the OpenACC "kernels" directive. In the future, kernels regions will be
transformed into data regions containing a sequence of serial and parallel
offloaded regions. This first patch sets up a new pass that is responsible
for this transformation, and in a first step constructs the new data region
containing a parallel region with the original kernels region's body.
gcc/
* Makefile.in: Add...
* omp-oacc-kernels.c: ... this new file for the kernels conversion
pass.
* flag-types.h (enum openacc_kernels): Add "split" style. Adjust
all users.
* doc/invoke.texi (-fopenacc-kernels): Update.
* passes.def: Add pass_convert_oacc_kernels to pipeline.
* tree-pass.h (make_pass_convert_oacc_kernels): Add declaration.
gcc/testsuite/
* c-c++-common/goacc/kernels-conversion.c: New test.
* gfortran.dg/goacc/kernels-conversion.f95: Likewise.
* c-c++-common/goacc/if-clause-2.c: Update.
* gfortran.dg/goacc/kernels-tree.f95: Likewise.
Thomas Schwinge [Wed, 23 Jan 2019 14:56:52 +0000 (06:56 -0800)]
Add OpenACC target kinds for decomposed kernels regions
This patch is in preparation for changes that will cut up OpenACC kernels
regions into individual parts. For the new sub-regions that will be
generated, this adds the following new kinds of OpenACC regions for internal
use:
- GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED for parts of kernels
regions to be executed in gang-redundant mode
- GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE for parts of kernels
regions to be executed in gang-single mode
- GF_OMP_TARGET_KIND_OACC_DATA_KERNELS for data regions generated around the
body of a kernels region
gcc/
* gimple.h (enum gf_mask): Add new target kinds
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_PARALLELIZED,
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GANG_SINGLE, and
GF_OMP_TARGET_KIND_OACC_DATA_KERNELS.
(is_gimple_omp_oacc): Handle new target kinds.
(is_gimple_omp_offloaded): Likewise.
* gimple-pretty-print.c (dump_gimple_omp_target): Likewise.
* omp-expand.c (expand_omp_target): Likewise.
(build_omp_regions_1): Likewise.
(omp_make_gimple_edges): Likewise.
* omp-low.c (is_oacc_parallel_or_serial): Likewise.
(was_originally_oacc_kernels): New function.
(scan_omp_for): Update check for illegal nesting.
(check_omp_nesting_restrictions): Handle new target kinds.
(lower_oacc_reductions): Likewise.
(lower_omp_target): Likewise.
* omp-offload.c (execute_oacc_device_lower): Likewise.
Disable AC_PROG_CXX and consequently a libstdc++ dependency for libffi,
introduced with upstream libffi commit 7d698125b1f0 ("Use the proper C++
compiler to run C++ tests"). This is only needed for the libffi test
suite, which we don't have to support in the GCC tree, as libffi is
maintained as a separate project. The dependency causes a build failure
with the `powerpc64le-linux-gnu' target due to a circular dependency:
due to a libgomp dependency for libstdc++ and then a libffi dependency
for libgomp, introduced with commit 998eb38b265d ("Use functional
parameters for data mappings in OpenACC child functions").
Julian Brown [Thu, 21 Mar 2019 22:09:24 +0000 (15:09 -0700)]
Add support for gang local storage allocation in shared memory
2018-12-11 Julian Brown <julian@codesourcery.com>
Chung-Lin Tang <cltang@codesourcery.com>
gcc/
* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
(gangprivate_shared_size): New global variable.
(gangprivate_shared_align): Likewise.
(gangprivate_shared_sym): Likewise.
(gangprivate_shared_hmap): Likewise.
(nvptx_option_override): Initialize gangprivate_shared_sym,
gangprivate_shared_align.
(nvptx_file_end): Output gangprivate_shared_sym.
(nvptx_goacc_expand_accel_var): New function.
(nvptx_set_current_function): New function.
(TARGET_SET_CURRENT_FUNCTION): Define hook.
(TARGET_GOACC_EXPAND_ACCEL): Likewise.
* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
* expr.c (expand_expr_real_1): Remap decls marked with the
"oacc gangprivate" attribute.
* omp-low.c (omp_context): Add oacc_partitioning_level and
oacc_addressable_var_decls fields.
(new_omp_context): Initialize oacc_addressable_var_decls in new
omp_context.
(delete_omp_context): Delete oacc_addressable_var_decls in old
omp_context.
(lower_oacc_head_tail): Record partitioning-level count in omp context.
(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
(mark_oacc_gangprivate): New functions.
(lower_omp_for): Call oacc_record_private_var_clauses with "for"
clauses. Call mark_oacc_gangprivate for gang-partitioned loops.
(lower_omp_target): Call oacc_record_private_var_clauses with "target"
clauses.
Call mark_oacc_gangprivate for offloaded target regions.
(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
* target.def (expand_accel_var): New hook.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
* testsuite/libgomp.oacc-c/pr85465.c: New test.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.kk
Julian Brown [Wed, 27 Feb 2019 00:00:54 +0000 (16:00 -0800)]
Fix hang when running oacc exec with CUDA 9.0 nvprof
2018-09-20 Tom de Vries <tdevries@suse.de>
Cesar Philippidis <cesar@codesourcery.com>
libgomp/
* oacc-init.c (acc_init_state_lock, acc_init_state, acc_init_thread):
New variable.
(acc_init_1): Set acc_init_thread to pthread_self (). Set
acc_init_state to initializing at the start, and to initialized at the
end.
(self_initializing_p): New function.
(acc_get_device_type): Return acc_device_none if called by thread that
is currently executing acc_init_1.
gcc/testsuite/
* c-c++-common/goacc/reduction-8.c: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/privatize-reduction-1.c: New
test.
* testsuite/libgomp.oacc-c-c++-common/privatize-reduction-2.c: New
test.
Julian Brown [Tue, 26 Feb 2019 23:55:23 +0000 (15:55 -0800)]
Don't mark OpenACC auto loops as independent inside acc parallel regions
2018-09-20 Cesar Philippidis <cesar@codesourcery.com>
gcc/
* omp-low.c (lower_oacc_head_mark): Don't mark OpenACC auto
loops as independent inside acc parallel regions.
gcc/testsuite/
* c-c++-common/goacc/loop-auto-1.c: Adjust test case to conform to
the new behavior of the auto clause in OpenACC 2.5.
* c-c++-common/goacc/loop-auto-2.c: Likewise.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
* c-c++-common/goacc/loop-auto-3.c: New test.
* gfortran.dg/goacc/loop-auto-1.f90: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust test case
to conform to the new behavior of the auto clause in OpenACC 2.5.
Julian Brown [Tue, 26 Feb 2019 23:10:21 +0000 (15:10 -0800)]
Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC
2018-12-22 Cesar Philippidis <cesar@codesourcery.com>
Julian Brown <julian@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
gcc/
* omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
(convert_to_firstprivate_int): New function.
(convert_from_firstprivate_int): New function.
(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.
Remove unused variable.
libgomp/
* oacc-parallel.c (GOACC_parallel_keyed): Handle
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* plugin/plugin-nvptx.c (nvptx_exec): Handle
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New
test.
* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.
Julian Brown [Tue, 26 Feb 2019 22:22:41 +0000 (14:22 -0800)]
Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)
2018-08-28 Julian Brown <julian@codesourcery.com>
Cesar Philippidis <cesar@codesourcery.com>
gcc/
* gimplify.c (oacc_array_mapping_info): New struct.
(gimplify_omp_ctx): Add decl_data_clause hash map.
(new_omp_context): Zero-initialise above.
(delete_omp_context): Delete above if allocated.
(gimplify_scan_omp_clauses): Scan for array mappings on data constructs,
and record in above map.
(gomp_oacc_needs_data_present): New function.
(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g. array
slices) declared in lexically-enclosing data constructs.
* omp-low.c (lower_omp_target): Allow decl for bias not to be present
in OpenACC context.
gcc/testsuite/
* c-c++-common/goacc/acc-data-chain.c: New test.
* gfortran.dg/goacc/pr70828.f90: New test.
* gfortran.dg/goacc/pr70828-2.f90: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
* testsuite/libgomp.oacc-fortran/implicit_copy.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-2.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-3.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-4.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-5.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-6.f90: New test.
Julian Brown [Tue, 26 Feb 2019 22:12:06 +0000 (14:12 -0800)]
Default compute dimensions (compile time)
Typo fix relative to last posted version.
2018-10-05 Nathan Sidwell <nathan@acm.org>
Tom de Vries <tdevries@suse.de>
Thomas Schwinge <thomas@codesourcery.com>
Julian Brown <julian@codesourcery.com>
Julian Brown [Tue, 26 Feb 2019 21:39:03 +0000 (13:39 -0800)]
Generate sequential loop for OpenACC loop directive inside kernels
2019-09-20 Chung-Lin Tang <cltang@codesourcery.com>
Cesar Philippidis <cesar@codesourcery.com>
gcc/
* omp-expand.c (struct omp_region): Add inside_kernels_p field.
(expand_omp_for_generic): Adjust to generate a 'sequential' loop
when GOMP builtin arguments are BUILT_IN_NONE.
(expand_omp_for): Use expand_omp_for_generic to generate a
non-parallelized loop for OMP_FORs inside OpenACC kernels regions.
(expand_omp): Mark inside_kernels_p field true for regions
nested inside OpenACC kernels constructs.
gcc/testsuite/
* c-c++-common/goacc/kernels-loop-acc-loop.c: New test.
* c-c++-common/goacc/kernels-loop-2-acc-loop.c: New test.
* c-c++-common/goacc/kernels-loop-3-acc-loop.c: New test.
* c-c++-common/goacc/kernels-loop-n-acc-loop.c: New test.
* c-c++-common/goacc/kernels-acc-loop-reduction.c: New test.
* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: New test.
Julian Brown [Wed, 6 Mar 2019 22:44:56 +0000 (14:44 -0800)]
Reinstate kernels-restrict behaviour
This patch contains a small fix for upstream churn relative to the last version
posted.
2018-09-20 Cesar Philippidis <cesar@codesourcery.com>
Julian Brown <julian@codesourcery.com>
* omp-low.c (install_var_field): New base_pointer_restrict
argument.
(scan_sharing_clauses): Update call to install_var_field.
(omp_target_base_pointers_restrict_p): New function.
(scan_omp_target): Update call to install_var_field.
Julian Brown [Tue, 12 Feb 2019 23:14:22 +0000 (15:14 -0800)]
Various OpenACC reduction enhancements - test cases
2018-12-13 Cesar Philippidis <cesar@codesourcery.com>
Nathan Sidwell <nathan@acm.org>
Julian Brown <julian@codesourcery.com>
gcc/testsuite/
* c-c++-common/goacc/orphan-reductions-1.c: New test.
* c-c++-common/goacc/reduction-7.c: New test.
* c-c++-common/goacc/routine-4.c: Update.
* g++.dg/goacc/reductions-1.C: New test.
* gcc.dg/goacc/loop-processing-1.c: Update.
* gfortran.dg/goacc/orphan-reductions-1.f90: New test.
libgomp/
* libgomp.oacc-c-c++-common/par-reduction-3.c: New test.
* libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c: New test.
* libgomp.oacc-fortran/reduction-9.f90: New test.
Julian Brown [Tue, 12 Feb 2019 23:06:55 +0000 (15:06 -0800)]
Various OpenACC reduction enhancements - ME and nvptx changes
Parts of the first posting got lost in the second posting, above.
This version hopefully contains everything.
2018-10-30 Cesar Philippidis <cesar@codesourcery.com>
gcc/
* config/nvptx/nvptx.c (nvptx_propagate_unified): New.
(nvptx_split_blocks): Call it for cond_uni insn.
(nvptx_expand_cond_uni): New.
(enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
(nvptx_init_builtins): Initialize it.
(nvptx_expand_builtin):
(nvptx_generate_vector_shuffle): Change integral SHIFT operand to
tree BITS operand.
(nvptx_vector_reduction): New.
(nvptx_adjust_reduction_type): New.
(nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
(nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
(nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
Use it to adjust the type of ref_to_res.
(nvptx_goacc_reduction_teardown):
* config/nvptx/nvptx.md (cond_uni): New pattern.
Julian Brown [Tue, 12 Feb 2019 22:56:12 +0000 (14:56 -0800)]
Various OpenACC reduction enhancements - FE changes
This version differs somewhat from the last version posted upstream
(and addresses some of Jakub's review comments).
2018-12-13 Cesar Philippidis <cesar@codesourcery.com>
Nathan Sidwell <nathan@acm.org>
Julian Brown <julian@codesourcery.com>
gcc/c/
* c-parser.c (c_parser_omp_variable_list): New c_omp_region_type
argument. Use it to specialize handling of OMP_CLAUSE_REDUCTION for
OpenACC.
(c_parser_oacc_data_clause): Add region-type argument.
(c_parser_oacc_data_clause_deviceptr): Likewise.
(c_parser_omp_clause_reduction): Change is_omp boolean parameter to
c_omp_region_type. Update call to c_parser_omp_variable_list.
(c_parser_oacc_all_clauses): Update calls to
c_parser_omp_clause_reduction.
(c_parser_omp_all_clauses): Likewise.
(c_parser_oacc_cache): Update call to c_parser_omp_var_list_parens.
* c-typeck.c (c_finish_omp_clauses): Emit an error on orphan OpenACC
gang reductions. Suppress user-defined reduction error for OpenACC.
gcc/cp/
* parser.c (cp_parser_omp_var_list_no_open): New c_omp_region_type
argument. Use it to specialize handling of OMP_CLAUSE_REDUCTION for
OpenACC.
(cp_parser_omp_var_list): Add c_omp_region_type argument. Update call
to cp_parser_omp_var_list_parens.
(cp_parser_oacc_data_clause): Update call to cp_parser_omp_var_list.
(cp_parser_omp_clause_reduction): Change is_omp boolean parameter to
c_omp_region_type. Update call to cp_parser_omp_var_list_no_open.
(cp_parser_oacc_all_clauses): Update call to
cp_parser_omp_clause_reduction.
(cp_parser_omp_all_clauses): Likewise.
* semantics.c (finish_omp_reduction_clause): Add c_omp_region_type
argument. Suppress user-defined reduction error for OpenACC.
(finish_omp_clauses): Emit an error on orphan OpenACC gang reductions.
gcc/fortran/
* openmp.c (resolve_oacc_loop_blocks): Emit an error on orphan OpenACC
gang reductions.
* trans-openmp.c (gfc_omp_clause_copy_ctor): Permit reductions.
Julian Brown [Tue, 12 Feb 2019 22:32:34 +0000 (14:32 -0800)]
Add OpenACC Fortran support for deviceptr and variable in common blocks
2018-06-29 Cesar Philippidis <cesar@codesourcery.com>
James Norris <jnorris@codesourcery.com>
gcc/fortran/
* openmp.c (gfc_match_omp_map_clause): Re-write handling of the
deviceptr clause. Add new common_blocks argument. Propagate it to
gfc_match_omp_variable_list.
(gfc_match_omp_clauses): Update calls to gfc_match_omp_map_clauses.
(resolve_positive_int_expr): Promote the warning to an error.
(check_array_not_assumed): Remove pointer check.
(resolve_oacc_nested_loops): Error on do concurrent loops.
* trans-openmp.c (gfc_omp_finish_clause): Don't create pointer data
mappings for deviceptr clauses.
(gfc_trans_omp_clauses): Likewise.
gcc/
* gimplify.c (enum gimplify_omp_var_data): Add GOVD_DEVICETPR.
(oacc_default_clause): Privatize fortran common blocks.
(omp_notice_variable): Add GOVD_DEVICEPTR attribute when appropriate.
Defer the expansion of DECL_VALUE_EXPR for common block decls.
(gimplify_scan_omp_clauses): Add GOVD_DEVICEPTR attribute when
appropriate.
(gimplify_adjust_omp_clauses_1): Set GOMP_MAP_FORCE_DEVICEPTR for
implicit deviceptr mappings.
Julian Brown [Tue, 12 Feb 2019 14:36:03 +0000 (06:36 -0800)]
Tweak error return value for acc_set_cuda_stream.
The return value of acc_set_cuda_stream is unspecified in OpenACC 2.6.
The testsuite changes might be unnecessary with the current async code.
libgomp/
* oacc-cuda.c (acc_set_cuda_stream): Return 0 on error/invalid
arguments.
* testsuite/libgomp.oacc-c-c++-common/lib-84.c: Handle unnumbered
async stream being an alias for a numbered async stream.
* testsuite/libgomp.oacc-c-c++-common/lib-85.c: Likewise.
gcc/c/
* c-typeck.c (handle_omp_array_sections_1): Add 'bool &non_contiguous'
parameter, adjust recursive call site, add cases for allowing
pointer based multi-dimensional arrays for OpenACC.
(handle_omp_array_sections): Adjust handle_omp_array_sections_1 call,
handle non-contiguous case to create dynamic array map.
gcc/cp/
* semantics.c (handle_omp_array_sections_1): Add 'bool &non_contiguous'
parameter, adjust recursive call site, add cases for allowing
pointer based multi-dimensional arrays for OpenACC.
(handle_omp_array_sections): Adjust handle_omp_array_sections_1 call,
handle non-contiguous case to create dynamic array map.
gcc/fortran/
* f95-lang.c (DEF_FUNCTION_TYPE_VAR_5): New symbol.
* types.def (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_VAR): New type.
gcc/
* builtin-types.def (BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_VAR): New type.
* omp-builtins.def (BUILT_IN_GOACC_DATA_START): Adjust function type
to new BT_FN_VOID_INT_SIZE_PTR_PTR_PTR_VAR.
* gimplify.c (gimplify_scan_omp_clauses): Skip gimplification of
OMP_CLAUSE_SIZE of non-contiguous array maps (which is a TREE_LIST).
* omp-expand.c (expand_omp_target): Add non-contiguous array descriptor
pointers to variadic arguments.
* omp-low.c (append_field_to_record_type): New function.
(create_noncontig_array_descr_type): Likewise.
(create_noncontig_array_descr_init_code): Likewise.
(scan_sharing_clauses): For non-contiguous array map kinds, check for
supported dimension structure, and install non-contiguous array
variable into current omp_context.
(reorder_noncontig_array_clauses): New function.
(scan_omp_target): Call reorder_noncontig_array_clauses to place
non-contiguous array map clauses at beginning of clause sequence.
(lower_omp_target): Add handling for non-contiguous array map kinds,
add all created non-contiguous array descriptors to
gimple_omp_target_data_arg.
gcc/testsuite/
* c-c++-common/goacc/noncontig_array-1.c: New test.
libgomp/
* libgomp_g.h (GOACC_data_start): Add variadic '...' to declaration.
* libgomp.h (gomp_map_vars_openacc): New function declaration.
* oacc-int.h (struct goacc_ncarray_dim): New struct declaration.
(struct goacc_ncarray_descr_type): Likewise.
(struct goacc_ncarray): Likewise.
(struct goacc_ncarray_info): Likewise.
(goacc_noncontig_array_create_ptrblock): New function declaration.
* oacc-parallel.c (goacc_noncontig_array_count_rows): New function.
(goacc_noncontig_array_compute_sizes): Likewise.
(goacc_noncontig_array_fill_rows_1): Likewise.
(goacc_noncontig_array_fill_rows): Likewise.
(goacc_process_noncontiguous_arrays): Likewise.
(goacc_noncontig_array_create_ptrblock): Likewise.
(GOACC_parallel_keyed): Use goacc_process_noncontiguous_arrays to
handle non-contiguous array descriptors at end of varargs, adjust
to use gomp_map_vars_openacc.
(GOACC_data_start): Likewise. Adjust function type to accept varargs.
* target.c (gomp_map_vars_internal): Add struct goacc_ncarray_info *
nca_info parameter, add handling code for non-contiguous arrays.
(gomp_map_vars_openacc): Add new function for specialization of
gomp_map_vars_internal for OpenACC structured region usage.
* testsuite/libgomp.oacc-c-c++-common/noncontig_array-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/noncontig_array-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/noncontig_array-3.c: New test.
* testsuite/libgomp.oacc-c-c++-common/noncontig_array-4.c: New test.
* testsuite/libgomp.oacc-c-c++-common/noncontig_array-utils.h: Support
header for new tests.
c++ ICE with nested requirement as default tpl parm[PR94827]
Template headers are not incrementally updated as we parse its parameters.
We maintain a dummy level until the closing > when we replace the dummy with
a real parameter set. requires processing was expecting a properly populated
arg_vec in current_template_parms, and then creates a self-mapping of parameters
from that. But we don't need to do that, just teach map_arguments to look at
TREE_VALUE when args is NULL.
* constraint.cc (map_arguments): If ARGS is null, it's a
self-mapping of parms.
(finish_nested_requirement): Do not pass argified
current_template_parms to normalization.
(tsubst_nested_requirement): Don't assert no template parms.
Jonathan Wakely [Thu, 30 Apr 2020 14:47:52 +0000 (15:47 +0100)]
libstdc++: Avoid errors in allocator's noexcept-specifier (PR 89510)
This fixes a regression due to the conditional noexcept-specifier on
std::allocator::construct and std::allocator::destroy, as well as the
corresponding members of new_allocator, malloc_allocator, and
allocator_traits. Those noexcept-specifiers were using expressions which
might be ill-formed, which caused errors outside the immediate context
when checking for the presence of construct and destroy in SFINAE
contexts.
The fix is to use the is_nothrow_constructible and
is_nothrow_destructible type traits instead, because those traits are
safe to use even when the construction/destruction itself is not valid.
The is_nothrow_constructible trait will be false for a type that is not
also nothrow-destructible, even if the new-expression used in the
construct function body is actually noexcept. That's not the correct
answer, but isn't a problem because providing a noexcept-specifier on
these functions is not required by the standard anyway. If the answer is
false when it should be true, that's suboptimal but OK (unlike giving
errors for valid code, or giving a true answer when it should be false).
PR libstdc++/89510
* include/bits/alloc_traits.h (allocator_traits::_S_construct)
(allocator_traits::_S_destroy)
(allocator_traits<allocator<T>>::construct): Use traits in
noexcept-specifiers.
* include/bits/allocator.h (allocator<void>::construct)
(allocator<void>::destroy): Likewise.
* include/ext/malloc_allocator.h (malloc_allocator::construct)
(malloc_allocator::destroy): Likewise.
* include/ext/new_allocator.h (new_allocator::construct)
(new_allocator::destroy): Likewise.
* testsuite/20_util/allocator/89510.cc: New test.
* testsuite/ext/malloc_allocator/89510.cc: New test.
* testsuite/ext/new_allocator/89510.cc: New test.
coroutines: Fix handling of artificial vars [PR94886]
The testcase ICEs because the range-based for generates three
artificial variables that need to be allocated to the coroutine
frame but, when walking the BIND_EXR that contains these, the
DECL_INITIAL for one of them refers to an entry appearing later,
which means that the frame entry hasn't been allocated when that
INITIAL is walked.
The solution is to defer walking the DECL_INITIAL/SIZE etc. until
all the BIND_EXPR vars have been processed.
gcc/cp/ChangeLog:
2020-04-30 Iain Sandoe <iain@sandoe.co.uk>
PR c++/94886
* coroutines.cc (transform_local_var_uses): Defer walking
the DECL_INITIALs of BIND_EXPR vars until all the frame
allocations have been made.
gcc/testsuite/ChangeLog:
2020-04-30 Iain Sandoe <iain@sandoe.co.uk>
PR c++/94886
* g++.dg/coroutines/pr94886-folly-3.C: New test.
coroutines: Fix handling of target cleanup exprs [PR94883]
The problem here is that target cleanup expressions have been
added to the initialisers for the awaitable (and returns of
non-trivial values from await_suspend() calls. This is because
the expansion of the co_await into its control flow is not
apparent to the machinery adding the target cleanup expressions.
The solution being tested is simply to recreate target expressions
as the co_awaits are lowered. Teaching the machinery to handle
walking co_await expressions in different ways at different points
(outside the coroutine transformation) seems overly complex.
gcc/cp/ChangeLog:
2020-04-30 Iain Sandoe <iain@sandoe.co.uk>
PR c++/94883
* coroutines.cc (register_awaits): Update target
expressions for awaitable and suspend handle
initializers.
gcc/testsuite/ChangeLog:
2020-04-30 Iain Sandoe <iain@sandoe.co.uk>
PR c++/94883
* g++.dg/coroutines/pr94883-folly-2.C: New test.
coroutines: Fix cases where proxy variables are used [PR94879]
There are several places where the handling of a variable
declaration depends on whether it corresponds to a compiler
temporary, or to some other entity. We were testing that var
decls were artificial in determining this. However, proxy vars
are also artificial so that this is not sufficient. The solution
is to exclude variables with a DECL_VALUE_EXPR as well, since
the value variable will not be a temporary.
gcc/cp/ChangeLog:
2020-04-30 Iain Sandoe <iain@sandoe.co.uk>
PR c++/94879
* coroutines.cc (build_co_await): Account for variables
with DECL_VALUE_EXPRs.
(captures_temporary): Likewise.
(register_awaits): Likewise.
gcc/testsuite/ChangeLog:
2020-04-30 Iain Sandoe <iain@sandoe.co.uk>
PR c++/94879
* g++.dg/coroutines/pr94879-folly-1.C: New test.
Marek Polacek [Wed, 29 Apr 2020 19:36:35 +0000 (15:36 -0400)]
tree: Don't reuse types if TYPE_USER_ALIGN differ [PR94775]
Here we trip on the TYPE_USER_ALIGN (t) assert in strip_typedefs: it
gets "const d[0]" with TYPE_USER_ALIGN=0 but the result built by
build_cplus_array_type is "const char[0]" with TYPE_USER_ALIGN=1.
When we strip_typedefs the element of the array "const d", we see it's
a typedef_variant_p, so we look at its DECL_ORIGINAL_TYPE, which is
char, but we need to add the const qualifier, so we call
cp_build_qualified_type -> build_qualified_type
where get_qualified_type checks to see if we already have such a type
by walking the variants list, which in this case is:
char -> c -> const char -> const char -> d -> const d
Because check_base_type only checks TYPE_ALIGN and not TYPE_USER_ALIGN,
we choose the first const char, which has TYPE_USER_ALIGN set. If the
element type of an array has TYPE_USER_ALIGN, the array type gets it too.
So we can make check_base_type stricter. I was afraid that it might make
us reuse types less often, but measuring showed that we build the same
amount of types with and without the patch, while bootstrapping.
PR c++/94775
* tree.c (check_base_type): Return true only if TYPE_USER_ALIGN match.
(check_aligned_type): Check if TYPE_USER_ALIGN match.
* config/aarch64/aarch64.h (TARGET_OUTLINE_ATOMICS): Define.
* config/aarch64/aarch64.opt (moutline-atomics): Change to Int variable.
* doc/invoke.texi (moutline-atomics): Document as on by default.
Szabolcs Nagy [Fri, 24 Apr 2020 16:36:02 +0000 (17:36 +0100)]
aarch64: don't emit bti j after NOTE_INSN_DELETED_LABEL [PR94748]
It was previously discussed that indirect branches cannot go to
NOTE_INSN_DELETED_LABEL so inserting a landing pad is unnecessary.
See https://gcc.gnu.org/pipermail/gcc-patches/2019-May/522625.html
Before the patch a bti j was inserted after the label in
__attribute__((target("branch-protection=bti")))
int foo (void)
{
label:
return 0;
}
This is not necessary and weakens the security protection.
gcc/ChangeLog:
PR target/94748
* config/aarch64/aarch64-bti-insert.c (rest_of_insert_bti): Remove
the check for NOTE_INSN_DELETED_LABEL.
gcc/testsuite/ChangeLog:
PR target/94748
* gcc.target/aarch64/pr94748.c: New test.
Corrects a previous change made to the SPARC stdc bindings, and
backports PPC-related fixes. The library and language testsuite now
passes fully on powerpc64le-linux-gnu.
PR d/94825
* libdruntime/Makefile.am (DRUNTIME_SOURCES_CONFIGURED): Remove
config/powerpc/switchcontext.S
* libdruntime/Makefile.in: Regenerate.
* libdruntime/config/powerpc/callwithstack.S: Remove.
* libdruntime/config/powerpc/switchcontext.S: Fix symbol name of
fiber_switchContext.
* libdruntime/core/thread.d: Disable fiber migration tests on PPC.
* testsuite/libphobos.thread/fiber_guard_page.d: Set guardPageSize
same as stackSize.
Jakub Jelinek [Thu, 30 Apr 2020 09:49:40 +0000 (11:49 +0200)]
--with-{documentation,changes}-root-url tweaks
> , CHANGES_URL ("gcc-10/changes.html#empty_base");
>
> where the macro would just use preprocessor string concatenation?
Ok, the following patch implements it (doesn't introduce a separate
macro and just uses CHANGES_ROOT_URL "gcc-10/changes.html#empty_base"),
in addition adds the documentation Joseph requested.
2020-04-30 Jakub Jelinek <jakub@redhat.com>
* configure.ac (--with-documentation-root-url,
--with-changes-root-url): Diagnose URL not ending with /,
use AC_DEFINE_UNQUOTED instead of AC_SUBST.
* opts.h (get_changes_url): Remove.
* opts.c (get_changes_url): Remove.
* Makefile.in (CFLAGS-opts.o): Don't add -DDOCUMENTATION_ROOT_URL
or -DCHANGES_ROOT_URL.
* doc/install.texi (--with-documentation-root-url,
--with-changes-root-url): Document.
* config/arm/arm.c (aapcs_vfp_is_call_or_return_candidate): Don't call
get_changes_url and free, change url variable type to const char * and
set it to CHANGES_ROOT_URL "gcc-10/changes.html#empty_base".
* config/s390/s390.c (s390_function_arg_vector,
s390_function_arg_float): Likewise.
* config/aarch64/aarch64.c (aarch64_vfp_is_call_or_return_candidate):
Likewise.
* config/rs6000/rs6000-call.c (rs6000_discover_homogeneous_aggregate):
Likewise.
* config.in: Regenerate.
* configure: Regenerate.
Andreas Krebbel [Thu, 30 Apr 2020 06:29:26 +0000 (08:29 +0200)]
IBM Z: vec_store_len_r/vec_load_len_r fix
This fixes a problem with the vec_store_len_r intrinsic. The macros
mapping the intrinsic to a GCC builtin had the wrong signature.
With the patch an immediate length operand of vlrl/vstrl is handled
the same way as if it was passed in a register to vlrlr/vstrlr.
Values bigger than 15 always load the full vector. If it can be
recognized that it is in effect a full vector register load or store
it is now implemented with vl/vst instead.
gcc/ChangeLog:
2020-04-30 Andreas Krebbel <krebbel@linux.ibm.com>
* config/s390/constraints.md ("j>f", "jb4"): New constraints.
* config/s390/vecintrin.h (vec_load_len_r, vec_store_len_r): Fix
macro definitions.
* config/s390/vx-builtins.md ("vlrlrv16qi", "vstrlrv16qi"): Add a
separate expander.
("*vlrlrv16qi", "*vstrlrv16qi"): Add alternative for vl/vst.
Change constraint for vlrl/vstrl to jb4.
gcc/testsuite/ChangeLog:
2020-04-30 Andreas Krebbel <krebbel@linux.ibm.com>
* gcc.target/s390/zvector/vec_load_len_r.c: New test.
* gcc.target/s390/zvector/vec_store_len_r.c: New test.
var-tracking.c: Fix possible use of uninitialized variable pre
While bootstrapping GCC on S/390 the following warning/error is raised:
gcc/var-tracking.c:10239:34: error: 'pre' may be used uninitialized in this function [-Werror=maybe-uninitialized]
10239 | VTI (bb)->out.stack_adjust += pre;
| ^
The lines of interest are:
HOST_WIDE_INT pre, post = 0;
// ...
if (!frame_pointer_needed)
{
insn_stack_adjust_offset_pre_post (insn, &pre, &post);
// ...
}
// ...
adjust_insn (bb, insn);
if (!frame_pointer_needed && pre)
VTI (bb)->out.stack_adjust += pre;
Both if statements depend on global variable frame_pointer_needed. In function
insn_stack_adjust_offset_pre_post local variable pre is initialized. The
problematic part is the function call between both if statements. Since
adjust_insn also calls functions which are defined in a different compilation
unit, we are not able to prove that global variable frame_pointer_needed is not
altered by adjust_insn and its siblings. Thus we must assume that
frame_pointer_needed may be true before the call and false afterwards which
renders the warning true (admitted the location hint is not totally perfect).
By initialising pre we silence the warning.
gcc/ChangeLog:
2020-04-30 Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
* var-tracking.c (vt_initialize): Move variables pre and post
into inner block and initialize both in order to fix warning
about uninitialized use. Remove unnecessary checks for
frame_pointer_needed.
Jakub Jelinek [Wed, 29 Apr 2020 20:41:47 +0000 (22:41 +0200)]
diagnostics: Add %{...%} pretty-format support for URLs and use it in -Wpsabi diagnostics
The following patch attempts to use the diagnostics URL support if available
to provide more information about the C++17 empty base and C++20
[[no_unique_address]] empty class ABI changes in -Wpsabi diagnostics.
in GCC 10.1 at the end of the diagnostics is then in some terminals
underlined with a dotted line and points to a (to be written) anchor in
gcc-10/changes.html which we need to write anyway.
2020-04-29 Jakub Jelinek <jakub@redhat.com>
* configure.ac (-with-changes-root-url): New configure option,
defaulting to https://gcc.gnu.org/.
* Makefile.in (CFLAGS-opts.o): Define CHANGES_ROOT_URL for
opts.c.
* pretty-print.c (get_end_url_string): New function.
(pp_format): Handle %{ and %} for URLs.
(pp_begin_url): Use pp_string instead of pp_printf.
(pp_end_url): Use get_end_url_string.
* opts.h (get_changes_url): Declare.
* opts.c (get_changes_url): New function.
* config/rs6000/rs6000-call.c: Include opts.h.
(rs6000_discover_homogeneous_aggregate): Use %{in GCC 10.1%} instead
of just in GCC 10.1 in diagnostics and add URL.
* config/arm/arm.c (aapcs_vfp_is_call_or_return_candidate): Likewise.
* config/aarch64/aarch64.c (aarch64_vfp_is_call_or_return_candidate):
Likewise.
* config/s390/s390.c (s390_function_arg_vector,
s390_function_arg_float): Likewise.
* configure: Regenerated.
* c-format.c (PP_FORMAT_CHAR_TABLE): Add %{ and %}.
Jakub Jelinek [Wed, 29 Apr 2020 20:38:01 +0000 (22:38 +0200)]
s390: Fix up -Wpsabi diagnostics + [[no_unique_address]] empty member fix [PR94704]
So, based on the yesterday's discussions, similarly to powerpc64le-linux
I've done some testing for s390x-linux too.
First of all, I found a bug in my patch from yesterday, it was printing
the wrong type like 'double' etc. rather than the class that contained such
the element. Fix below.
For s390x-linux, I was using
struct X { };
struct Y { int : 0; };
struct Z { int : 0; Y y; };
struct U : public X { X q; };
struct A { double a; };
struct B : public X { double a; };
struct C : public Y { double a; };
struct D : public Z { double a; };
struct E : public U { double a; };
struct F { [[no_unique_address]] X x; double a; };
struct G { [[no_unique_address]] Y y; double a; };
struct H { [[no_unique_address]] Z z; double a; };
struct I { [[no_unique_address]] U u; double a; };
struct J { double a; [[no_unique_address]] X x; };
struct K { double a; [[no_unique_address]] Y y; };
struct L { double a; [[no_unique_address]] Z z; };
struct M { double a; [[no_unique_address]] U u; };
#define T(S, s) extern S s; extern void foo##s (S); int bar##s () { foo##s (s); return 0; }
T (A, a)
T (B, b)
T (C, c)
T (D, d)
T (E, e)
T (F, f)
T (G, g)
T (H, h)
T (I, i)
T (J, j)
T (K, k)
T (L, l)
T (M, m)
as testcase and looking for "\tld\t%f0,".
While g++ 9 with -std=c++17 used to pass in fpr just
A, g++ 9 -std=c++14, as well as current trunk -std=c++14 & 17
and clang++ from today -std=c++14 & 17 all pass A, B, C
in fpr and nothing else. The intent stated by Jason seems to be
that A, B, C, F, G, J, K should all be passed in fpr.
Attached are two (updated) versions of the patch on top of the
powerpc+middle-end patch just posted.
The first one emits two separate -Wpsabi warnings like powerpc, one for
the -std=c++14 vs. -std=c++17 ABI difference and one for GCC 9 vs. 10
[[no_unique_address]] passing changes, the other one is silent about the
second case.
2020-04-29 Jakub Jelinek <jakub@redhat.com>
PR target/94704
* config/s390/s390.c (s390_function_arg_vector,
s390_function_arg_float): Use DECL_FIELD_ABI_IGNORED instead of
cxx17_empty_base_field_p. In -Wpsabi diagnostics use the type
passed to the function rather than the type of the single element.
Rename cxx17_empty_base_seen variable to empty_base_seen, change
type to int, and adjust diagnostics depending on if the field
has [[no_unique_attribute]] or not.
* g++.target/s390/s390.exp: New file.
* g++.target/s390/pr94704-1.C: New test.
* g++.target/s390/pr94704-2.C: New test.
* g++.target/s390/pr94704-3.C: New test.
* g++.target/s390/pr94704-4.C: New test.
Jakub Jelinek [Wed, 29 Apr 2020 15:31:26 +0000 (17:31 +0200)]
x86: Fix -O0 remaining intrinsic macros [PR94832]
A few other macros seem to suffer from the same issue. What I've done was:
cat gcc/config/i386/*intrin.h | sed -e ':x /\\$/ { N; s/\\\n//g ; bx }' \
| grep '^[[:blank:]]*#[[:blank:]]*define[[:blank:]].*(' | sed 's/[ ]\+/ /g' \
> /tmp/macros
and then looking for regexps:
)[a-zA-Z]
) [a-zA-Z]
[a-zA-Z][-+*/%]
[a-zA-Z] [-+*/%]
[-+*/%][a-zA-Z]
[-+*/%] [a-zA-Z]
in the resulting file.
As reported in the PR, while most intrinsic -O0 macro argument uses
are properly wrapped in ()s or used in context where having a complex
expression passed as the argument doesn't pose a problem (e.g. when
macro argument use is in between commas, or between ( and comma, or
between comma and ) etc.), especially the gather/scatter macros don't do
this and if one passes to some macro e.g. x + y as argument, the
corresponding inline function would do cast on the argument, but
the macro does (int) ARG, then it is (int) x + y rather than (int) (x + y).
The following patch fixes those issues in *gather/*scatter*; additionally,
the AVX2 macros were passing incorrect mask of e.g.
(__v2df)_mm_set1_pd((double)(long long int) -1)
which is IMHO equivalent to
(__v2df){-1.0, -1.0}
when it really wants to pass __v2df vector with all bits set.
I've used what the inline functions use for those cases.
fortran/io.c: Fix use of uninitialized variable num [PR94769]
While bootstrapping GCC on S/390 the following warning occurs:
gcc/fortran/io.c: In function 'bool gfc_resolve_dt(gfc_code*, gfc_dt*, locus*)':
gcc/fortran/io.c:3857:7: error: 'num' may be used uninitialized in this function [-Werror=maybe-uninitialized]
3857 | if (num == 0)
| ^~
gcc/fortran/io.c:3843:11: note: 'num' was declared here
3843 | int num;
Since gfc_resolve_dt is a non-static function we cannot assume anything about
argument DT. Argument DT gets passed to function check_io_constraints which
passes values depending on DT, namely dt->asynchronous->value.character.string
to function compare_to_allowed_values as well as argument warn which is true as
soon as DT->dterr is true. Thus both arguments depend on DT.
If function compare_to_allowed_values is called with
dt->asynchronous->value.character.string not being an allowed value, and
ALLOWED_F2003 as well as ALLOWED_GNU being NULL (which is the case at the
particular call side), and WARN equals true, then the function returns with a
non-zero value and leaves num uninitialized which renders the warning true.
Initialized num to -1 and added an assert statement.
gcc/fortran/ChangeLog:
2020-04-29 Stefan Schulze Frielinghaus <stefansf@linux.ibm.com>
PR fortran/94769
* io.c (check_io_constraints): Initialize local variable num to
-1 and assert that it receives a meaningful value by function
compare_to_allowed_values.
This is the rs6000 version of the earlier committed x86, aarch64 and arm
fixes, as create_tmp_var_raw is used because the C FE can call this outside
of function context, we need to make sure the first references to those
VAR_DECLs are through a TARGET_EXPR, so that it gets gimple_add_tmp_var
marked in whatever function it gets expanded in. Without that DECL_CONTEXT
is NULL and the vars aren't added as local decls of the containing function.
2020-04-29 Jakub Jelinek <jakub@redhat.com>
PR target/94826
* config/rs6000/rs6000.c (rs6000_atomic_assign_expand_fenv): Use
TARGET_EXPR instead of MODIFY_EXPR for first assignment to
fenv_var, fenv_clear and old_fenv variables. For fenv_addr
take address of TARGET_EXPR of fenv_var with void_node initializer.
Formatting fixes.
This patch makes the order in which template parameters appear in the
TREE_LIST returned by find_template_parameters deterministic between
runs.
The current nondeterminism is semantically harmless, but it has the
undesirable effect of causing some concepts diagnostics which print a
constraint's parameter mapping via pp_cxx_parameter_mapping to also be
nondeterministic, as in the testcases below.
gcc/cp/ChangeLog:
PR c++/94830
* pt.c (find_template_parameter_info::parm_list): New field.
(keep_template_parm): Use the new field to build up the
parameter list here instead of ...
(find_template_parameters): ... here. Return ftpi.parm_list.
gcc/testsuite/ChangeLog:
PR c++/94830
* g++.dg/concepts/diagnostics12.C: Clarify the dg-message now
that the corresponding diagnostic is deterministic.
* g++.dg/concepts/diagnostics13.C: New test.
2020-04-29 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* calls.h (cxx17_empty_base_field_p): Turn into a function declaration.
* calls.c (cxx17_empty_base_field_p): New function. Check
DECL_ARTIFICIAL and RECORD_OR_UNION_TYPE_P in addition to the
previous checks.
H.J. Lu [Wed, 29 Apr 2020 11:52:46 +0000 (04:52 -0700)]
x86: Allow -fcf-protection with external thunk
Allow -fcf-protection with external thunk since the external thunk can be
made compatible with -fcf-protection.
gcc/
PR target/93654
* config/i386/i386-options.c (ix86_set_indirect_branch_type):
Allow -fcf-protection with -mindirect-branch=thunk-extern and
-mfunction-return=thunk-extern.
* doc/invoke.texi: Update notes for -fcf-protection=branch with
-mindirect-branch=thunk-extern and -mindirect-return=thunk-extern.
gcc/testsuite/
PR target/93654
* gcc.target/i386/pr93654.c: New test.