Sandra Loosemore [Sun, 30 Aug 2020 19:15:23 +0000 (12:15 -0700)]
Clean up loop variable extraction in OpenACC kernels loop annotation.
The code for identifying annotatable loops in OpenACC kernels regions
previously looked for the loop variable as the left-hand side of the
comparison in the loop end test. However, front end optimizations
sometimes switch the sense of the comparison, making this method
unreliable. In particular, it's ambiguous when both operands to the
end test comparison are local variables.
This patch reorders the loop processing to identify the loop variable
from the initializer, rather than the end test. The processing of the
end test then just checks that one of the operands to the comparison
matches the variable appearing in the initializer. Much of the patch
is code refactoring, moving the initializer analysis out of
annotate_for_loop to check_and_annotate_for_loop so it can be
performed earlier.
gcc/c-family/
* c-omp.c (annotate_for_loop): Move initializer processing...
(check_and_annotate_for_loop): ... to here. Allow the loop
variable as either operand to the condition.
Sandra Loosemore [Sun, 23 Aug 2020 05:43:57 +0000 (22:43 -0700)]
Fix patterns in Fortran tests for kernels loop annotation.
Several of the Fortran tests for kernels loop annotation were failing
due to changes in the formatting of "acc loop" constructs in the dump
file. Now the "auto" clause appears first, instead of after "private".
Sandra Loosemore [Sun, 23 Aug 2020 01:23:26 +0000 (18:23 -0700)]
Permit calls to builtins and intrinsics in kernels loops.
This tweak to the OpenACC kernels loop annotation relaxes the
restrictions on function calls in the loop body. Normally calls to
functions not explicitly marked with a parallelism attribute are not
permitted, but C/C++ builtins and Fortran intrinsics have known
semantics so we can generally permit those without restriction. If
any turn out to be problematical, we can add on here to recognize
them, or in the processing of the "auto" annotations.
Sandra Loosemore [Thu, 20 Aug 2020 02:24:43 +0000 (19:24 -0700)]
Annotate inner loops in "acc kernels loop" directives (Fortran).
Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code. However there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops. This patch implements this behavior in Fortran.
gcc/fortran/
* openmp.c (annotate_do_loops_in_kernels): Handle
EXEC_OACC_KERNELS_LOOP separately to permit annotation of inner
loops in a combined "acc kernels loop" directive.
Sandra Loosemore [Thu, 20 Aug 2020 02:18:57 +0000 (19:18 -0700)]
Annotate inner loops in "acc kernels loop" directives (C/C++).
Normally explicit loop directives in a kernels region inhibit
automatic annotation of other loops in the same nest, on the theory
that users have indicated they want manual control over that section
of code. However there seems to be an expectation in user code that
the combined "kernels loop" directive should still allow annotation of
inner loops. This patch implements this behavior for C and C++.
Julian Brown [Tue, 28 Jul 2020 13:02:50 +0000 (06:02 -0700)]
[og10] openacc: Unshare reduction temporaries for GCN
The GCN backend uses tree nodes like MEM((__lds TYPE *) <constant>)
for reduction temporaries. Unlike e.g. var decls and SSA names, these
nodes cannot be shared during gimplification, but are so in some
circumstances. This is detected when appropriate --enable-checking
options are used. This patch unshares such nodes when they are reused
more than once.
2020-07-30 Julian Brown <julian@codesourcery.com>
gcc/
* config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Do not
cache/share decls for reduction temporaries between invocations.
(gcn_goacc_reduction_teardown): Unshare VAR on second use.
* config/gcn/gcn.c (gcn_init_machine_status): Do not initialise
reduc_decls.
* config/gcn/gcn.h (machine_function): Remove reduc_decls cache.
Fix c-c++-common/goacc/routine-4.c and c-c++-common/goacc/routine-4-extern.c testcases
'Various OpenACC reduction enhancements - FE changes' (commit 6b3e1f7f05cd360bbd356b3f78511aa2ec3f40c3) introduced checks for gang
reductions on orphan loops. The checks triggered in the routine-4.c
and routine-4-extern.c testcases, requiring changes that effectively
rendered them useless as test cases.
This patch restores the original intent of the test cases, by restoring
the original tests and removing the orphan loop reductions that were
triggering the new check.
XFAIL tests in gfortran.dg/goacc/loop-2-kernels.f95
The C-equivalent version of the test (c-c++-common/goacc/loop-2-kernels.c)
has these tests XFAILed in the commit 'Make new OpenACC kernels conversion
the default; adjust and add tests' (commit 757f56ddc43fd80bb8740222ec352111b26d66e9), so the Fortran version should
be XFAILed too.
Frederik Harwath [Mon, 20 Jul 2020 09:24:21 +0000 (11:24 +0200)]
libgomp, Fortran: Fix OpenACC "gang reduction on an orphan loop" error message
According to the OpenACC standard version 2.5 and later, reductions on
orphaned gang loops are explicitly disallowed (cf. section "Changes
from Version 2.0 to 2.5"). A loop is "orphaned" if it is not
lexically contained in a compute construct (cf. section "Loop
construct" of the OpenACC standard), i.e. in either a "parallel", a
"serial", or a "kernels" construct.
This commit fixes the check for reductions on orphaned gang loops in
the Fortran frontend which (in contrast to the C, C++ frontends)
erroneously rejects reductions on gang loops that are contained in
"kernels" constructs.
2020-07-20 Frederik Harwath <frederik@codesourcery.com>
gcc/fortran/
* openmp.c (oacc_is_parallel_or_serial): Removed function.
(oacc_is_kernels): New function.
(oacc_is_compute_construct): New function.
(resolve_oacc_loop_blocks): Use "oacc_is_compute_construct"
instead of "oacc_is_parallel_or_serial" for checking that a
loop is not orphaned.
gcc/testsuite/
* gfortran.dg/goacc/orphan-reductions-2.f90: New test
verifying that the "gang reduction on an orphan loop" error message
is not emitted for non-orphaned loops.
* c-c++-common/goacc/orphan-reductions-2.c: Likewise for C and C++.
libgomp/
* plugin/plugin-gcn.c (parse_target_attributes): Automatically set
the number of teams and threads if necessary.
(gcn_exec): Automatically set the number of gangs and workers if
necessary.
Co-Authored-By: Andrew Stubbs <ams@codesourcery.com>
This is an optimisation for middle-end worker-partitioning support (used
to support multiple workers on AMD GCN). At present, barriers may be
emitted in cases where they aren't needed and cannot be optimised away.
This patch stops the extraneous barriers from being emitted in the
first place.
One exception to the above (where the barrier is still needed) is for
predicated blocks of code that perform a write to gang-private shared
memory from one worker. We must execute a barrier before other workers
read that shared memory location.
2020-07-15 Julian Brown <julian@codesourcery.com>
gcc/
* config/gcn/gcn.c (gimple.h): Include.
(gcn_fork_join): Emit barrier for worker-level joins.
* omp-sese.c (find_local_vars_to_propagate): Add writes_gangprivate
bitmap parameter. Set bit for blocks containing gang-private variable
writes.
(worker_single_simple): Don't emit barrier after predicated block.
(worker_single_copy): Don't emit barrier if we're not broadcasting
anything and the block contains no gang-private writes.
(neuter_worker_single): Don't predicate blocks that only contain NOPs
or internal marker functions. Pass has_gangprivate_write argument to
worker_single_copy.
(oacc_do_neutering): Add writes_gangprivate bitmap handling.
Julian Brown [Mon, 10 Feb 2020 20:26:57 +0000 (12:26 -0800)]
amdgcn: Add waitcnt after LDS write instructions
Data-share write (ds_write) instructions do not necessarily complete
the write to LDS immediately. When a write completes, LGKM_CNT is
decremented. For now, we wait until LGKM_CNT reaches zero after each
ds_write instruction.
This fixes a race condition in the case where LDS is read immediately
after being written. This can happen with broadcast operations.
Julian Brown [Mon, 27 Jan 2020 14:14:05 +0000 (06:14 -0800)]
openacc: Shared memory layout optimisation
This patch implements an algorithm to lay out local data-share (LDS)
space. It currently works for AMD GCN. At the moment, LDS is used for
three things:
After the patch is applied, (2) and (3) are placed at preallocated
locations in LDS, and (1) continues to be handled by the backend (as it
is at present prior to this patch being applied). LDS now looks like this:
+--------------+ (gang local size + 1024, = 1536)
| free space |
| ... |
| - - - - - - -|
| worker bcast |
+--------------+
| reductions |
+--------------+ <<< -mgang-local-size=<number> (def. 512)
| gang private |
| vars |
+--------------+ (32)
| low LDS vars |
+--------------+ LDS base
So, gang-private space is fixed at a constant amount at compile time
(which can be increased with a command-line switch if necessary
for some given code). The layout algorithm takes out a slice of the
remainder of usable space for reduction vars, and uses the rest for
worker partitioning.
The partitioning algorithm works as follows.
1. An "adjacency" set is built up for each basic block that might
do a broadcast. This is calculated by starting at each such block,
and doing a recursive DFS walk over successors to find the next
block (or blocks) that *also* does a broadcast
(dfs_broadcast_reachable_1).
2. The adjacency set is inverted to get adjacent predecessor blocks also.
3. Blocks that will perform a broadcast are sorted by size of that
broadcast: the biggest blocks are handled first.
4. A splay tree structure is used to calculate the spans of LDS memory
that are already allocated by the blocks adjacent to this one
(merge_ranges{,_1}.
5. The current block's broadcast space is allocated from the first free
span not allocated in the splay tree structure calculated above
(first_fit_range). This seems to work quite nicely and efficiently
with the splay tree structure.
6. Continue with the next-biggest broadcast block until we're done.
In this way, "adjacent" broadcasts will not use the same piece of
LDS memory.
2020-07-15 Julian Brown <julian@codesourcery.com>
gcc/
* config/gcn/gcn-protos.h (gcn_goacc_adjust_private_decl): Update
prototype.
* config/gcn/gcn-tree.c (gcn_goacc_get_worker_red_decl): Use
preallocated block of LDS memory.
(gcn_goacc_create_propagation_record): Add OFFSET parameter, and return
temporary LDS space at that offset. Return pointer in "sender" case.
(gcn_goacc_adjust_private_decl): Return var.
* config/gcn/gcn.c (acc_lds_size, gangprivate_hwm, lds_allocs): New
global vars.
(ACC_LDS_SIZE): Define as acc_lds_size.
(gcn_init_machine_status): Don't initialise lds_allocated and
lds_allocs fields of machine function struct.
(gcn_option_override): Handle default size for gang-private variables
and -mgang-local-size option.
(gcn_expand_prologue): Use LDS_SIZE instead of LDS_SIZE-1 when
initialising M0_REG.
(gcn_shared_mem_layout): New function.
(gcn_print_lds_decl): Update comment. Use global lds_allocs map and
gangprivate_hwm variable.
(TARGET_GOACC_SHARED_MEM_LAYOUT): Define target hook.
* config/gcn/gcn.h (machine_function): Remove lds_allocated,
lds_allocs. Add reduction_base, reduction_limit.
* config/gcn/gcn.opt (gang_local_size_opt): New global.
(mgang-local-size=): New option.
* config/nvptx/nvptx.c (nvptx_goacc_adjust_private_decl): Return var.
* doc/tm.texi.in (TARGET_GOACC_SHARED_MEM_LAYOUT): Place documentation
hook.
* doc/tm.texi: Regenerate.
* omp-offload.c (addr_expr_rewrite_info): Change adjusted_vars to a
hash_map.
(rewrite_addr_expr): Rewrite VAR_DECLs also.
(default_goacc_create_propagation_record): Add OFFSET parameter.
(execute_oacc_gimple_workers): Calculate per-function reduction
temporary and private-variable size. Call OpenACC shared_mem_layout
hook. Move num_workers==1 handling here.
(execute_oacc_device_lower): Fix for adjusted_vars being a hash_map
rather than a hash_set.
(pass_oacc_gimple_workers::gate): Remove num_workers==1 handling from
here. Enable pass for all OpenACC routines in order to call shared
memory-layout hook.
* omp-sese.c (targhooks.h, diagnostic-core.h): Add includes.
(build_sender_ref): Handle sender_decl being pointer.
(worker_single_copy): Add PLACEMENT and ISOLATE_BROADCASTS parameters.
Pass placement argument to create_propagation_record hook invocations.
Handle sender_decl being pointer and isolate_broadcasts inserting extra
barriers.
(blk_offset_map_t): Add typedef.
(neuter_worker_single): Add BLK_OFFSET_MAP parameter. Pass
preallocated range to worker_single_copy call.
(dfs_broadcast_reachable_1): New function.
(idx_decl_pair_t, used_range_vec_t): New typedefs.
(sort_size_descending): New function.
(addr_range): New class.
(splay_tree_compare_addr_range, splay_tree_free_key, first_fit_range,
merge_ranges_1, merge_ranges): New functions.
(oacc_do_neutering): Add BOUNDS_LO, BOUNDS_HI parameters. Arrange
layout of shared memory for broadcast operations.
* omp-sese.h (oacc_do_neutering): Update prototype.
* target.def (adjust_private_decl): Change return type to tree.
(create_propagation_record): Add OFFSET parameter.
(shared_mem_layout): New hook.
* targhooks.h (default_goacc_create_propagation_record): Update
prototype.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/broadcast-many.c: New test.
Julian Brown [Mon, 20 Jan 2020 19:42:28 +0000 (11:42 -0800)]
openacc: Turn off worker partitioning if num_workers==1
This patch turns off the middle-end worker-partitioning support if the
number of workers for an outlined offload function is one. In that case,
we do not need to perform the broadcasting/neutering code transformation.
2020-07-15 Julian Brown <julian@codesourcery.com>
gcc/
* omp-offload.c (pass_oacc_gimple_workers::gate): Disable worker
partitioning if num_workers is 1.
Julian Brown [Thu, 25 Jun 2020 14:40:53 +0000 (07:40 -0700)]
openacc: Fix race condition in Fortran loop collapse tests
The gangs participating in a gang-partitioned loop are not all guaranteed
to complete before some given gang continues to execute beyond that loop.
This means that two existing test cases contain a race condition,
because a loop that may be gang-partitioned is followed immediately by
another loop. The fix is to place the loops in separate parallel regions.
Tobias Burnus [Wed, 3 Jun 2020 13:35:12 +0000 (15:35 +0200)]
OpenACC: fix privatization of by-reference arrays
Replacing of a by-reference variable in a private clause by a local variable
makes sense; however, for arrays, the size is not directly known by the type.
This causes an ICE via create_tmp_var which indirectly invokes
force_constant_size in this case - but the latter only handled Ada.
gcc/ChangeLog:
* gimplify.c (localize_reductions): Do not create local
variable for privatized arrays.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-fortran/privatized-ref-2.f90: New test.
libgomp/
* testsuite/libgomp.oacc-fortran/parallel-dims-aux.c (aux_gang):
Handle case where AMD GCN is used.
(acc_worker): Likewise.
(acc_vector): Likewise.
Sandra Loosemore [Tue, 31 Mar 2020 21:29:09 +0000 (14:29 -0700)]
Fix bug in processing of array dimensions in data clauses.
The g++ front end wraps the array length and low_bound values in
NON_LVALUE_EXPR, causing the subsequent tests for INTEGER_CST to fail.
The test case c-c++-common/goacc/kernels-loop-annotation-1.c was
tickling this bug and giving bogus errors in g++ because it was falling
through to dynamic array code instead of recognizing the constant bounds.
This patch was posted upstream here
https://gcc.gnu.org/pipermail/gcc-patches/2020-March/542694.html
but not yet committed. It may be that some other fix for this problem
is implemented on mainline instead; check before merging this patch.
Sandra Loosemore [Thu, 19 Mar 2020 15:32:24 +0000 (08:32 -0700)]
Additional Fortran testsuite fixes for kernels loops annotation pass.
These testsuite fixes are specific to the og10 branch, so are being
segregated from the ones that apply to mainline in a separate commit
from the main Fortran kernels loop annotation patch.
Sandra Loosemore [Tue, 17 Mar 2020 01:08:01 +0000 (18:08 -0700)]
Kernels loops annotation: Fortran.
This patch implements the Fortran support for adding "#pragma acc loop auto"
annotations to loops in OpenACC kernels regions. It implements the same
-fopenacc-kernels-annotate-loops and -Wopenacc-kernels-annotate-loops options
that were previously added (and documented) for the C/C++ front ends.
Sandra Loosemore [Sun, 15 Mar 2020 22:13:46 +0000 (15:13 -0700)]
Kernels loops annotation: C and C++.
This patch detects loops in kernels regions that are candidates for
parallelization, and adds "#pragma acc loop auto" annotations to them.
This annotation is controlled by the -fopenacc-kernels-annotate-loops
option, which is enabled by default. -Wopenacc-kernels-annotate-loops
can be used to produce diagnostics about loops that cannot be annotated.
Add XFAIL for libgomp.oacc-c-c++-common/data-firstprivate-1.c
The firstprivate_int optimization changes the semantics of firstprivate
in this test, so XFAIL it until the correct semantics for firstprivate
have been decided (PR92036).
Fix test failure in routine-level-of-parallelism-2.c testcase
c-c++-common/goacc/routine-level-of-parallelism-2.c is supposed to be
equivalent to gfortran.dg/goacc/routine-level-of-parallelism-1.f90,
but is missing some test directives present in the latter.
Tobias Burnus [Mon, 16 Mar 2020 15:22:57 +0000 (16:22 +0100)]
Fix for is_gimple_reg vars to 'data kernels'
Nearly all variable mapping is moved from 'kernels' to a surrounding
'data kernels' and then 'force_present' mapped for the 'kernels'. However, as
libgomp.oacc-c-c++-common/declare-vla.c shows, moving 'int i, N' will fail as
there is a special case for is_gimple_reg in mapping and that fails badly if
outside a target region (e.g. offloading = false). As those are transferred by
value and not as a pointer, it makes more sense to only map them at
'kernels' and ignore them for 'data kernels'.
Additionally, as e.g. libgomp.oacc-c-c++-common/kernels-decompose-1.c shows,
one still additionally to handle 'kernels'-declared variables which now are
declared in 'kernels data' and and can be handled as is_gimple_reg.
gcc/
* omp-oacc-kernels-decompose.cc (maybe_build_inner_data_region):
is_gimple_reg vars are not yet mapped, fall through to map is as
before the transformation.
(omp_oacc_kernels_decompose_1): Don't map is_gimple_reg vars.
(decompose_kernels_region_body): Use tofrom for is_gimple_reg vars.
(omp_oacc_kernels_decompose_1): Handle is_gimple_reg vars as without
data kernels.
Julian Brown [Tue, 22 Oct 2019 00:22:31 +0000 (17:22 -0700)]
Run all kernels regions with GOMP_MAP_FORCE_TOFROM mappings synchronously
gcc/
* omp-oacc-kernels-decompose.cc (decompose_kernels_region_body): Add
inhibit_async parameter. Force asynchronous kernel launches to
run synchronously if they have problematic variable mappings.
Don't add explicit wait for decomposed kernels regions we forced
synchronous.
(omp_oacc_kernels_decompose_1): Detect problematic variable mappings,
and inhibit asynchronous execution if we find any.
Julian Brown [Tue, 26 Feb 2019 23:48:00 +0000 (15:48 -0800)]
Fortran "declare create"/allocate support for OpenACC
2018-10-04 Cesar Philippidis <cesar@codesourcery.com>
Julian Brown <julian@codesourcery.com>
gcc/
* omp-low.c (scan_sharing_clauses): Update handling of OpenACC declare
create, declare copyin and declare deviceptr to have local lifetimes.
(convert_to_firstprivate_int): Handle pointer types.
(convert_from_firstprivate_int): Likewise. Create local storage for
the values being pointed to. Add new orig_type argument.
(lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
Add orig_type argument to convert_from_firstprivate_int call.
Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT. Don't privatize
firstprivate VLAs.
* tree-pretty-print.c (dump_omp_clause): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
gcc/fortran/
* gfortran.h (enum gfc_omp_map_op): Add OMP_MAP_DECLARE_ALLOCATE,
OMP_MAP_DECLARE_DEALLOCATE.
(gfc_omp_clauses): Add update_allocatable.
* trans-array.c (gfc_array_allocate): Call
gfc_trans_oacc_declare_allocate for decls that have oacc_declare_create
attribute set.
* trans-decl.c (find_module_oacc_declare_clauses): Relax
oacc_declare_create to OMP_MAP_ALLOC, and oacc_declare_copyin to
OMP_MAP_TO, in order to match OpenACC 2.5 semantics.
* trans-openmp.c (gfc_trans_omp_clauses): Use GOMP_MAP_ALWAYS_POINTER
(for update directive) or GOMP_MAP_FIRSTPRIVATE_POINTER (otherwise) for
allocatable scalar decls. Handle OMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}
clauses.
(gfc_trans_oacc_executable_directive): Use GOMP_MAP_ALWAYS_POINTER
for allocatable scalar data clauses inside acc update directives.
(gfc_trans_oacc_declare_allocate): New function.
* trans-stmt.c (gfc_trans_allocate): Call
gfc_trans_oacc_declare_allocate for decls with oacc_declare_create
attribute set.
(gfc_trans_deallocate): Likewise.
* trans.h (gfc_trans_oacc_declare_allocate): Declare.
gcc/testsuite/
* gfortran.dg/goacc/declare-allocatable-1.f90: New test.
include/
* gomp-constants.h (enum gomp_map_kind): Define
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE} and GOMP_MAP_FLAG_SPECIAL_4.
libgomp/
* oacc-mem.c (gomp_acc_declare_allocate): New function.
* oacc-parallel.c (GOACC_enter_exit_data): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
* testsuite/libgomp.oacc-fortran/allocatable-scalar.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-2.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-4.f90: New test.
Julian Brown [Mon, 14 Oct 2019 20:12:39 +0000 (13:12 -0700)]
Re-do OpenACC private variable resolution
gcc/
* config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl): Rename
to...
(gcn_goacc_adjust_private_decl): ...this.
* config/gcn/gcn-tree.c (diagnostic-core.h): Include.
(gcn_goacc_adjust_gangprivate_decl): Rename to...
(gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter.
* config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename to...
(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...this.
* config/nvptx/nvptx.c (tree-pretty-print.h): Include.
(nvptx_goacc_adjust_private_decl): New function.
(TARGET_GOACC_ADJUST_PRIVATE_DECL): Define hook using above function.
* doc/tm.texi.in (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename to...
(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...this.
* doc/tm.texi: Regenerated.
* internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE.
* internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE.
* omp-low.c (omp_context): Remove oacc_partitioning_levels field.
(lower_oacc_reductions): Add PRIVATE_MARKER parameter. Insert before
fork.
(lower_oacc_head_tail): Add PRIVATE_MARKER parameter. Modify its
gimple call arguments as appropriate. Don't set
oacc_partitioning_levels in omp_context. Pass private_marker to
lower_oacc_reductions.
(oacc_record_private_var_clauses): Don't check for NULL ctx.
(mark_oacc_gangprivate): Remove unused function.
(make_oacc_private_marker): New function.
(lower_omp_for): Only call oacc_record_vars_in_bind for
OpenACC contexts. Create private marker and pass to
lower_oacc_head_tail.
(lower_omp_target): Remove unnecessary call to
oacc_record_private_var_clauses. Remove call to mark_oacc_gangprivate.
Create private marker and pass to lower_oacc_reductions.
(process_oacc_gangprivate_1): Remove.
(lower_omp_1): Only call oacc_record_vars_in_bind for OpenACC. Don't
iterate over contexts calling process_oacc_gangprivate_1.
(omp-offload.c (oacc_loop_xform_head_tail): Treat
private-variable markers like fork/join when transforming head/tail
sequences.
(execute_oacc_device_lower): Use IFN_UNIQUE_OACC_PRIVATE instead of
"oacc gangprivate" attributes to determine partitioning level of
variables. Remove unused variables.
* omp-sese.c (find_gangprivate_vars): New function.
(find_local_vars_to_propagate): Use GANGPRIVATE_VARS parameter instead
of "oacc gangprivate" attribute to determine which variables are
gang-private.
(oacc_do_neutering): Use find_gangprivate_vars.
* target.def (adjust_gangprivate_decl): Rename to...
(adjust_private_decl): ...this. Update documentation (briefly).
libgomp/
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: Use
oaccdevlow dump and update scanned output.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: Likewise.
Add missing atomic to force worker partitioning for test variable.
Julian Brown [Fri, 20 Sep 2019 20:53:10 +0000 (13:53 -0700)]
Handle references in OpenACC "private" clauses
gcc/
* gimplify.c (localize_reductions): Rewrite references for
OMP_CLAUSE_PRIVATE also.
libgomp/
* testsuite/libgomp.oacc-fortran/privatized-ref-1.f95: New test.
* testsuite/libgomp.oacc-c++/privatized-ref-2.C: New test.
* testsuite/libgomp.oacc-c++/privatized-ref-3.C: New test.
Julian Brown [Fri, 9 Aug 2019 20:01:33 +0000 (13:01 -0700)]
Wait at end of OpenACC asynchronous kernels regions
gcc/
* omp-oacc-kernels-decompose.cc (add_wait): New function, split out
of...
(add_async_clauses_and_wait): ...here. Call new outlined function.
(decompose_kernels_region_body): Add wait at the end of
explicitly-asynchronous kernels regions.
Julian Brown [Fri, 5 Jul 2019 01:14:41 +0000 (18:14 -0700)]
Assumed-size arrays with non-lexical data mappings
gcc/
* gimplify.c (gimplify_adjust_omp_clauses_1): Raise error for
assumed-size arrays in map clauses for Fortran/OpenMP.
* omp-low.c (lower_omp_target): Set the size of assumed-size Fortran
arrays to one to allow use of data already mapped on the offload device.
gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Change clauses mapping
assumed-size arrays to use the GOMP_MAP_FORCE_PRESENT map type.
Julian Brown [Tue, 28 May 2019 15:42:10 +0000 (08:42 -0700)]
Apply gangprivate attribute to innermost decl
...and fix parallelism-level calculation when applying the attribute.
gcc/
* omp-low.c (mark_oacc_gangprivate): Add CTX parameter. Use to look up
correct decl to add attribute to.
(lower_omp_for): Move "oacc gangprivate" processing from here...
(process_oacc_gangprivate_1): ...to here. New function.
(lower_omp_target): Update call to mark_oacc_gangprivate.
(execute_lower_omp): Call process_oacc_gangprivate_1 for each OMP
context.
libgomp/
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-2.f90: New test.
Julian Brown [Sun, 19 May 2019 17:42:20 +0000 (10:42 -0700)]
Fix references declared in lexically-enclosing OpenACC data region
gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Guard addition of clauses for
pointers with DECL_P.
gcc/
* gimplify.c (oacc_array_mapping_info): Add REF field.
(gimplify_scan_omp_clauses): Initialise above field for data blocks
passed by reference.
(gomp_oacc_needs_data_present): Handle references.
(gimplify_adjust_omp_clauses_1): Handle references and optional
arguments for variables declared in lexically-enclosing OpenACC data
region.
Julian Brown [Thu, 16 May 2019 12:45:35 +0000 (05:45 -0700)]
Avoid introducing 'create' mapping clauses for loop index variables in kernels regions
gcc/
* omp-oacc-kernels-decompose.cc (find_omp_for_index_vars_1,
find_omp_for_index_vars): New functions.
(maybe_build_inner_data_region): Add IDX_VARS argument. Don't add
CREATE mapping clauses for loop index variables. Set TREE_ADDRESSABLE
flag on newly-mapped declarations as a side effect.
(decompose_kernels_region_body): Call find_omp_for_index_vars. Don't
create PRESENT clause for loop index variables. Pass index variable
set to maybe_build_inner_data_region.
Julian Brown [Wed, 9 Jan 2019 11:41:04 +0000 (03:41 -0800)]
Update OpenACC version to 2.6
gcc/c-family/
* c-cppbuiltin.c (c_cpp_builtins): Update _OPENACC define to 201711.
gcc/doc/
* invoke.texi: Update mention of OpenACC version to 2.6.
gcc/fortran/
* cpp.c (cpp_define_builtins): Update _OPENACC define to 201711.
* gfortran.texi: Update mentions of OpenACC version to 2.6.
* intrinsic.texi: Likewise.
gcc/testsuite/
* c-c++-common/cpp/openacc-define-3.c: Update expected value for
_OPENACC define.
* gfortran.dg/openacc-define-3.f90: Likewise.
libgomp/
* libgomp.texi: Update mentions of OpenACC version to 2.6. Update
section numbers to match version 2.6 of the spec.
* openacc.f90 (openacc_version): Update to 201711.
* openacc_lib.h (openacc_version): Update to 201711.
* testsuite/libgomp.oacc-fortran/openacc_version-1.f: Update expected
openacc_version to 201711.
* testsuite/libgomp.oacc-fortran/openacc_version-2.f90: Likewise.
[nvptx] Expand OpenACC child function arguments to use CUDA params space
This patch replaces the patches "Use functional parameters for data mappings
in OpenACC child functions" and "Make OpenACC function-parameter explosion
optional".
* config/nvptx/nvptx.c (nvptx_expand_to_rtl_hook): New function
implementing CUDA .params space transformation.
(TARGET_EXPAND_TO_RTL_HOOK): implement hook with
nvptx_expand_to_rtl_hook.
Disable AC_PROG_CXX and consequently a libstdc++ dependency for libffi,
introduced with upstream libffi commit 7d698125b1f0 ("Use the proper C++
compiler to run C++ tests"). This is only needed for the libffi test
suite, which we don't have to support in the GCC tree, as libffi is
maintained as a separate project. The dependency causes a build failure
with the `powerpc64le-linux-gnu' target due to a circular dependency:
due to a libgomp dependency for libstdc++ and then a libffi dependency
for libgomp, introduced with commit 998eb38b265d ("Use functional
parameters for data mappings in OpenACC child functions").
Julian Brown [Thu, 21 Mar 2019 22:09:24 +0000 (15:09 -0700)]
Add support for gang local storage allocation in shared memory
2018-12-11 Julian Brown <julian@codesourcery.com>
Chung-Lin Tang <cltang@codesourcery.com>
gcc/
* config/nvptx/nvptx.c (tree-hash-traits.h): Include.
(gangprivate_shared_size): New global variable.
(gangprivate_shared_align): Likewise.
(gangprivate_shared_sym): Likewise.
(gangprivate_shared_hmap): Likewise.
(nvptx_option_override): Initialize gangprivate_shared_sym,
gangprivate_shared_align.
(nvptx_file_end): Output gangprivate_shared_sym.
(nvptx_goacc_expand_accel_var): New function.
(nvptx_set_current_function): New function.
(TARGET_SET_CURRENT_FUNCTION): Define hook.
(TARGET_GOACC_EXPAND_ACCEL): Likewise.
* doc/tm.texi (TARGET_GOACC_EXPAND_ACCEL_VAR): Document new hook.
* doc/tm.texi.in (TARGET_GOACC_EXPAND_ACCEL_VAR): Likewise.
* expr.c (expand_expr_real_1): Remap decls marked with the
"oacc gangprivate" attribute.
* omp-low.c (omp_context): Add oacc_partitioning_level and
oacc_addressable_var_decls fields.
(new_omp_context): Initialize oacc_addressable_var_decls in new
omp_context.
(delete_omp_context): Delete oacc_addressable_var_decls in old
omp_context.
(lower_oacc_head_tail): Record partitioning-level count in omp context.
(oacc_record_private_var_clauses, oacc_record_vars_in_bind)
(mark_oacc_gangprivate): New functions.
(lower_omp_for): Call oacc_record_private_var_clauses with "for"
clauses. Call mark_oacc_gangprivate for gang-partitioned loops.
(lower_omp_target): Call oacc_record_private_var_clauses with "target"
clauses.
Call mark_oacc_gangprivate for offloaded target regions.
(lower_omp_1): Call vars_in_bind for GIMPLE_BIND within OMP regions.
* target.def (expand_accel_var): New hook.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/gang-private-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/loop-gwv-2.c: New test.
* testsuite/libgomp.oacc-c/pr85465.c: New test.
* testsuite/libgomp.oacc-fortran/gangprivate-attrib-1.f90: New test.kk
gcc/testsuite/
* c-c++-common/goacc/reduction-8.c: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/privatize-reduction-1.c: New
test.
* testsuite/libgomp.oacc-c-c++-common/privatize-reduction-2.c: New
test.
Julian Brown [Tue, 26 Feb 2019 23:55:23 +0000 (15:55 -0800)]
Don't mark OpenACC auto loops as independent inside acc parallel regions
2018-09-20 Cesar Philippidis <cesar@codesourcery.com>
gcc/
* omp-low.c (lower_oacc_head_mark): Don't mark OpenACC auto
loops as independent inside acc parallel regions.
gcc/testsuite/
* c-c++-common/goacc/loop-auto-1.c: Adjust test case to conform to
the new behavior of the auto clause in OpenACC 2.5.
* c-c++-common/goacc/loop-auto-2.c: Likewise.
* gcc.dg/goacc/loop-processing-1.c: Likewise.
* c-c++-common/goacc/loop-auto-3.c: New test.
* gfortran.dg/goacc/loop-auto-1.f90: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust test case
to conform to the new behavior of the auto clause in OpenACC 2.5.
Julian Brown [Tue, 26 Feb 2019 23:10:21 +0000 (15:10 -0800)]
Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC
2018-12-22 Cesar Philippidis <cesar@codesourcery.com>
Julian Brown <julian@codesourcery.com>
Tobias Burnus <tobias@codesourcery.com>
gcc/
* omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
(convert_to_firstprivate_int): New function.
(convert_from_firstprivate_int): New function.
(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.
Remove unused variable.
libgomp/
* oacc-parallel.c (GOACC_parallel_keyed): Handle
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* plugin/plugin-nvptx.c (nvptx_exec): Handle
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New
test.
* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.
Julian Brown [Tue, 26 Feb 2019 22:22:41 +0000 (14:22 -0800)]
Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)
2018-08-28 Julian Brown <julian@codesourcery.com>
Cesar Philippidis <cesar@codesourcery.com>
gcc/
* gimplify.c (oacc_array_mapping_info): New struct.
(gimplify_omp_ctx): Add decl_data_clause hash map.
(new_omp_context): Zero-initialise above.
(delete_omp_context): Delete above if allocated.
(gimplify_scan_omp_clauses): Scan for array mappings on data constructs,
and record in above map.
(gomp_oacc_needs_data_present): New function.
(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g. array
slices) declared in lexically-enclosing data constructs.
* omp-low.c (lower_omp_target): Allow decl for bias not to be present
in OpenACC context.
gcc/testsuite/
* c-c++-common/goacc/acc-data-chain.c: New test.
* gfortran.dg/goacc/pr70828.f90: New test.
* gfortran.dg/goacc/pr70828-2.f90: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
* testsuite/libgomp.oacc-fortran/implicit_copy.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-2.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-3.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-4.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-5.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-6.f90: New test.
Julian Brown [Tue, 26 Feb 2019 22:12:06 +0000 (14:12 -0800)]
Default compute dimensions (compile time)
Typo fix relative to last posted version.
2018-10-05 Nathan Sidwell <nathan@acm.org>
Tom de Vries <tdevries@suse.de>
Thomas Schwinge <thomas@codesourcery.com>
Julian Brown <julian@codesourcery.com>
Julian Brown [Tue, 26 Feb 2019 21:39:03 +0000 (13:39 -0800)]
Generate sequential loop for OpenACC loop directive inside kernels
2019-09-20 Chung-Lin Tang <cltang@codesourcery.com>
Cesar Philippidis <cesar@codesourcery.com>
gcc/
* omp-expand.c (struct omp_region): Add inside_kernels_p field.
(expand_omp_for_generic): Adjust to generate a 'sequential' loop
when GOMP builtin arguments are BUILT_IN_NONE.
(expand_omp_for): Use expand_omp_for_generic to generate a
non-parallelized loop for OMP_FORs inside OpenACC kernels regions.
(expand_omp): Mark inside_kernels_p field true for regions
nested inside OpenACC kernels constructs.
gcc/testsuite/
* c-c++-common/goacc/kernels-loop-acc-loop.c: New test.
* c-c++-common/goacc/kernels-loop-2-acc-loop.c: New test.
* c-c++-common/goacc/kernels-loop-3-acc-loop.c: New test.
* c-c++-common/goacc/kernels-loop-n-acc-loop.c: New test.
* c-c++-common/goacc/kernels-acc-loop-reduction.c: New test.
* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: New test.
Julian Brown [Wed, 6 Mar 2019 22:44:56 +0000 (14:44 -0800)]
Reinstate kernels-restrict behaviour
This patch contains a small fix for upstream churn relative to the last version
posted.
2018-09-20 Cesar Philippidis <cesar@codesourcery.com>
Julian Brown <julian@codesourcery.com>
* omp-low.c (install_var_field): New base_pointer_restrict
argument.
(scan_sharing_clauses): Update call to install_var_field.
(omp_target_base_pointers_restrict_p): New function.
(scan_omp_target): Update call to install_var_field.