The ICE occurs during Gimple verification after the ompexp stage because
one of the arguments to the generated builtin call is of a Gimple reg type,
but isn't a Gimple value (because it is marked addressable).
This appears to be fallout from the commit "OpenACC 'kernels' decomposition:
Mark variables used in synthesized data clauses as addressable [PR100280]".
The launch dimensions have been added to the arguments of a builtin call
by oacc_set_fn_attrib, but one of the dimensions has been marked addressable.
Fixed by forcing the added arguments to be re-gimplified.
The final tail mark has no LHS, causing code that assumes its presence to
segfault. The LHS and the assignment appear to have been removed as dead
code by the cddce1 stage.
Fixed by checking for the presence of the LHS before using it.
A change that was present in the OG11 version of
'openmp: in_reduction clause support on target construct' but
not in the mainline version resulted in non-contiguous
arrays being accepted in cache clauses, only to ICE later.
Thomas Schwinge [Thu, 5 May 2022 21:01:36 +0000 (23:01 +0200)]
Refactor '-ldl' handling for libgomp proper and plugins
Instead of implicit global 'LIBS="-ldl $LIBS"' via 'AC_CHECK_LIB', make
'-ldl' explicit for libgomp proper, and clean up 'PLUGIN_GCN_LIBS',
'PLUGIN_NVPTX_LIBS' accordingly.
Jakub Jelinek [Thu, 12 May 2022 06:31:20 +0000 (08:31 +0200)]
openmp: Add omp_all_memory support (C/C++ only so far)
The ugly part is that OpenMP 5.1 made omp_all_memory a reserved identifier
which isn't allowed to be used anywhere but in the depend clause, this is
against how everything else has been handled in OpenMP so far (where
some identifiers could have special meaning in some OpenMP clauses or
pragmas but not elsewhere).
The patch handles it by making it a conditional keyword (for -fopenmp
only) and emitting a better diagnostics when it is used in a primary
expression. Having a nicer diagnostics when e.g. trying to do
int omp_all_memory;
or
int *omp_all_memory[10];
etc. would mean changing too many spots and hooking into name lookups
to reject declaring any such symbols would be too ugly and I'm afraid
there are way too many spots where one can introduce a name
(variables, functions, namespaces, struct, enum, enumerators, template
arguments, ...).
Otherwise, the handling is quite simple, normal depend clauses lower
into addresses of variables being handed over to the library, for
omp_all_memory I'm using NULL pointers. omp_all_memory can only be
used with inout or out depend kinds and means that a task is dependent
on all previously created sibling tasks that have any dependency (of
any depend kind) and that any later created sibling tasks will be
dependent on it if they have any dependency.
2022-05-12 Jakub Jelinek <jakub@redhat.com>
gcc/
* gimplify.cc (gimplify_omp_depend): Don't build_fold_addr_expr
if null_pointer_node.
(gimplify_scan_omp_clauses): Likewise.
* tree-pretty-print.cc (dump_omp_clause): Print null_pointer_node
as omp_all_memory.
gcc/c-family/
* c-common.h (enum rid): Add RID_OMP_ALL_MEMORY.
* c-omp.cc (c_finish_omp_depobj): Don't build_fold_addr_expr
if null_pointer_node.
gcc/c/
* c-parser.cc (c_parse_init): Register omp_all_memory as keyword
if flag_openmp.
(c_parser_postfix_expression): Diagnose uses of omp_all_memory
in postfix expressions.
(c_parser_omp_variable_list): Handle omp_all_memory in depend
clause.
* c-typeck.cc (c_finish_omp_clauses): Handle omp_all_memory
keyword in depend clause as null_pointer_node, diagnose invalid
uses.
gcc/cp/
* lex.cc (init_reswords): Register omp_all_memory as keyword
if flag_openmp.
* parser.cc (cp_parser_primary_expression): Diagnose uses of
omp_all_memory in postfix expressions.
(cp_parser_omp_var_list_no_open): Handle omp_all_memory in depend
clause.
* semantics.cc (finish_omp_clauses): Handle omp_all_memory
keyword in depend clause as null_pointer_node, diagnose invalid
uses.
* pt.cc (tsubst_omp_clause_decl): Pass through omp_all_memory.
gcc/testsuite/
* c-c++-common/gomp/all-memory-1.c: New test.
* c-c++-common/gomp/all-memory-2.c: New test.
* c-c++-common/gomp/all-memory-3.c: New test.
* g++.dg/gomp/all-memory-1.C: New test.
* g++.dg/gomp/all-memory-2.C: New test.
libgomp/
* libgomp.h (struct gomp_task): Add depend_all_memory member.
* task.c (gomp_init_task): Initialize depend_all_memory.
(gomp_task_handle_depend): Handle omp_all_memory.
(gomp_task_run_post_handle_depend_hash): Clear
parent->depend_all_memory if equal to current task.
(gomp_task_maybe_wait_for_dependencies): Handle omp_all_memory.
* testsuite/libgomp.c-c++-common/depend-1.c: New test.
* testsuite/libgomp.c-c++-common/depend-2.c: New test.
* testsuite/libgomp.c-c++-common/depend-3.c: New test.
With recent commit 2e309a4eff80e55b53d32d26926a2a94eabfea21 "libgomp testsuite:
Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library",
and commit d6adba307508c75f1ccb2121eb1a43c9ab1d4056 "libgomp GCN plugin:
Clean up unused references to system-provided HSA Runtime library", the last
uses of '--with-hsa-runtime' etc. are gone.
Thomas Schwinge [Wed, 6 Apr 2022 10:15:28 +0000 (12:15 +0200)]
libgomp GCN plugin: Clean up always-empty 'PLUGIN_GCN_CPPFLAGS', 'PLUGIN_GCN_LDFLAGS'
After recent commit d6adba307508c75f1ccb2121eb1a43c9ab1d4056
"libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime
library", these aren't set anymore.
Thomas Schwinge [Wed, 6 Apr 2022 09:31:45 +0000 (11:31 +0200)]
libgomp GCN plugin: Clean up unused references to system-provided HSA Runtime library
This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-include=[...]', '--with-hsa-runtime-lib=[...]' -- which
nobody really is doing, as far as I can tell.
Originally changed for the libgomp HSA plugin in
commit b8d89b03db5f212919e4571671ebb4f5f8b1e19d (r242749)
"Remove build dependence on HSA run-time", and later propagated into the GCN
plugin, these are no longer built against system-provided HSA Runtime library.
Instead, unconditionally built against the GCC-shipped 'include/hsa*.h' header
files, and at run time does 'dlopen("libhsa-runtime64.so.1")'. It thus doesn't
make sense to consider references to system-provided HSA Runtime library during
libgomp GCN plugin build.
Thomas Schwinge [Wed, 6 Apr 2022 08:39:56 +0000 (10:39 +0200)]
libgomp testsuite: Don't amend 'LD_LIBRARY_PATH' for system-provided HSA Runtime library
This is only active if GCC is 'configure'd with '--with-hsa-runtime=[...]' or
'--with-hsa-runtime-lib=[...]' -- which nobody really is doing, as far as I can
tell.
# For build-tree testing, also consider the library paths used for builing.
# For installed testing, we assume all that to be provided in the sysroot.
if { $blddir != "" } {
[...]
global hsa_runtime_lib
if { $hsa_runtime_lib != "" } {
append always_ld_library_path ":$hsa_runtime_lib"
}
}
However, the libgomp GCN plugin is unconditionally built against the
GCC-shipped 'include/hsa*.h' header files, and at run time does
'dlopen("libhsa-runtime64.so.1")', so there is no system-provided HSA Runtime
library "used for builing". It thus doesn't make sense to amend
'LD_LIBRARY_PATH' for system-provided HSA Runtime library.
Fortran: Add support for OMP non-rectangular loops.
This patch adds support for OMP 5.1 "canonical loop nest form" to the
Fortran front end, marks non-rectangular loops for processing
by the middle end, and implements missing checks in the gimplifier
for additional prohibitions on non-rectangular loops.
Note that the OMP spec also prohibits non-rectangular loops with the TILE
construct; that construct hasn't been implemented yet, so that error will
need to be filled in later.
gcc/fortran/
* gfortran.h (struct gfc_omp_clauses): Add non_rectangular bit.
* openmp.cc (is_outer_iteration_variable): New function.
(expr_is_invariant): New function.
(bound_expr_is_canonical): New function.
(resolve_omp_do): Replace existing non-rectangularity error with
check for canonical form and setting non_rectangular bit.
* trans-openmp.cc (gfc_trans_omp_do): Transfer non_rectangular
flag to generated tree structure.
gcc/
* gimplify.cc (gimplify_omp_for): Update messages for SCHEDULED
and ORDERED clause conflict errors. Add check for GRAINSIZE and
NUM_TASKS on TASKLOOP.
gcc/testsuite/
* c-c++-common/gomp/loop-6.c (f3): New function to test TASKLOOP
diagnostics.
* gfortran.dg/gomp/collapse1.f90: Update expected messages.
* gfortran.dg/gomp/pr85313.f90: Remove dg-error on non-rectangular
loops that are now accepted.
* gfortran.dg/gomp/non-rectangular-loop.f90: New file.
* gfortran.dg/gomp/canonical-loop-1.f90: New file.
* gfortran.dg/gomp/canonical-loop-2.f90: New file.
Chung-Lin Tang [Fri, 17 Jun 2022 14:22:25 +0000 (22:22 +0800)]
openmp: Implement uses_allocators clause
This is a merge of:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596412.html
For user defined allocator handles, this allows target regions to assign
memory space and traits to allocators, and automatically calls
omp_init/destroy_allocator() in the beginning/end of the target region.
For pre-defined allocators (i.e. omp_..._mem_alloc names), this is a no-op,
such clauses are not created.
Asides from the front-end portions, the target region transforms are
done in gimplify_omp_workshare.
This patch also includes added changes to enforce the "allocate allocator
must be also in a uses_allocator clause". This is done during
gimplify_scan_omp_clauses.
* c-parser.cc (c_parser_omp_clause_name): Add case for uses_allocators
clause.
(c_parser_omp_clause_uses_allocators): New function.
(c_parser_omp_all_clauses): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS case.
(OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS to mask.
* c-typeck.cc (c_finish_omp_clauses): Add case handling for
OMP_CLAUSE_USES_ALLOCATORS.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_clause_name): Add case for uses_allocators
clause.
(cp_parser_omp_clause_uses_allocators): New function.
(cp_parser_omp_all_clauses): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS case.
(OMP_TARGET_CLAUSE_MASK): Add PRAGMA_OMP_CLAUSE_USES_ALLOCATORS to mask.
* semantics.cc (finish_omp_clauses): Add case handling for
OMP_CLAUSE_USES_ALLOCATORS.
fortran/ChangeLog:
* gfortran.h (struct gfc_omp_namelist): Add memspace_sym, traits_sym
fields.
(OMP_LIST_USES_ALLOCATORS): New list enum.
* openmp.cc (enum omp_mask2): Add OMP_CLAUSE_USES_ALLOCATORS.
(gfc_match_omp_clause_uses_allocators): New function.
(gfc_match_omp_clauses): Add case to handle OMP_CLAUSE_USES_ALLOCATORS.
(OMP_TARGET_CLAUSES): Add OMP_CLAUSE_USES_ALLOCATORS.
(resolve_omp_clauses): Add "USES_ALLOCATORS" to clause_names[].
* dump-parse-tree.cc (show_omp_namelist): Handle OMP_LIST_USES_ALLOCATORS.
(show_omp_clauses): Likewise.
* trans-array.cc (gfc_conv_array_initializer): Adjust array index
to always be a created tree expression instead of NULL_TREE when zero.
* trans-openmp.cc (gfc_trans_omp_clauses): For ALLOCATE clause, handle
using gfc_trans_omp_variable for EXPR_VARIABLE exprs.
Add handling of OMP_LIST_USES_ALLOCATORS case.
* types.def (BT_FN_VOID_PTRMODE): Define.
(BT_FN_PTRMODE_PTRMODE_INT_PTR): Define.
* gimplify.cc (gimplify_scan_omp_clauses): Add checking of OpenMP target
region allocate clauses, to require a uses_allocators clause to exist
for allocators.
(gimplify_omp_workshare): Add handling of OMP_CLAUSE_USES_ALLOCATORS
for OpenMP target regions; create calls of omp_init/destroy_allocator
around target region body.
* omp-low.cc (lower_private_allocate): Adjust receiving of allocator.
(lower_rec_input_clauses): Likewise.
(create_task_copyfn): Add dereference for allocator if needed.
* system.h (startswith): New function.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/uses_allocators-1.c: New test.
* c-c++-common/gomp/uses_allocators-2.c: New test.
* c-c++-common/gomp/uses_allocators-3.c: New test.
* gfortran.dg/gomp/allocate-1.f90: Adjust testcase.
* gfortran.dg/gomp/uses_allocators-1.f90: New test.
* gfortran.dg/gomp/uses_allocators-2.f90: New test.
* gfortran.dg/gomp/uses_allocators-3.f90: New test.
Andrew Stubbs [Thu, 24 Feb 2022 17:16:13 +0000 (17:16 +0000)]
amdgcn: Add gfx90a support
This adds architecture options and multilibs for the AMD GFX90a GPUs.
It also tidies up some of the ISA selection code, and corrects a few small
mistake in the gfx908 naming.
gcc/ChangeLog:
* config.gcc (amdgcn): Accept --with-arch=gfx908 and gfx90a.
* config/gcn/gcn-opts.h (enum gcn_isa): New.
(TARGET_GCN3): Use enum gcn_isa.
(TARGET_GCN3_PLUS): Likewise.
(TARGET_GCN5): Likewise.
(TARGET_GCN5_PLUS): Likewise.
(TARGET_CDNA1): New.
(TARGET_CDNA1_PLUS): New.
(TARGET_CDNA2): New.
(TARGET_CDNA2_PLUS): New.
(TARGET_M0_LDS_LIMIT): New.
(TARGET_PACKED_WORK_ITEMS): New.
* config/gcn/gcn.cc (gcn_isa): Change to enum gcn_isa.
(gcn_option_override): Recognise CDNA ISA variants.
(gcn_omp_device_kind_arch_isa): Support gfx90a.
(gcn_expand_prologue): Make m0 init optional.
Add support for packed work items.
(output_file_start): Support gfx90a.
(gcn_hsa_declare_function_name): Support gfx90a metadata.
* config/gcn/gcn.h (TARGET_CPU_CPP_BUILTINS):Add __CDNA1__ and
__CDNA2__.
* config/gcn/gcn.md (<su>mulsi3_highpart): Use TARGET_GCN5_PLUS.
(<su>mulsi3_highpart_imm): Likewise.
(<su>mulsidi3): Likewise.
(<su>mulsidi3_imm): Likewise.
* config/gcn/gcn.opt (gpu_type): Add gfx90a.
* config/gcn/mkoffload.cc (EF_AMDGPU_MACH_AMDGCN_GFX90a): New.
(main): Support gfx90a.
* config/gcn/t-gcn-hsa: Add gfx90a multilib.
* config/gcn/t-omp-device: Add gfx90a isa.
Tobias Burnus [Mon, 23 May 2022 08:54:32 +0000 (10:54 +0200)]
OpenMP: Handle descriptors in target's firstprivate [PR104949]
For allocatable/pointer arrays, a firstprivate to a device
not only needs to privatize the descriptor but also the actual
data. This is implemented as:
firstprivate(x) firstprivate(x.data) attach(x [bias: &x.data-&x)
where the address of x in device memory is saved in hostaddrs[i]
by libgomp and the middle end actually passes hostaddrs[i]' to
attach.
As side effect, has_device_addr(array_desc) had to be changed:
before, it was converted to firstprivate in the front end; now
it is handled in omp-low.cc as has_device_addr requires a shallow
firstprivate (not touching the data pointer) while the normal
firstprivate requires (now) a deep firstprivate.
gcc/fortran/ChangeLog:
PR fortran/104949
* f95-lang.cc (LANG_HOOKS_OMP_ARRAY_SIZE): Redefine.
* trans-openmp.cc (gfc_omp_array_size): New.
(gfc_trans_omp_variable_list): Never turn has_device_addr
to firstprivate.
* trans.h (gfc_omp_array_size): New.
PR fortran/104949
* target.c (gomp_map_vars_internal, copy_firstprivate_data):
Support attach for GOMP_MAP_FIRSTPRIVATE.
* testsuite/libgomp.fortran/target-firstprivate-1.f90: New test.
* testsuite/libgomp.fortran/target-firstprivate-2.f90: New test.
* testsuite/libgomp.fortran/target-firstprivate-3.f90: New test.
Tobias Burnus [Thu, 12 May 2022 08:39:58 +0000 (10:39 +0200)]
Fortran: Fix proc pointer as elemental arg handling
The vtab's _callback function calls the elemental 'cb'
cb (var(:)%comp, comp_types_vtable._callback);
which gets called in a scalarization loop as 'var' might be a
nonscalar. Without the patch, that got translated as:
D.1234 = &comp_types_vtable._callback
...
cb (&(*D.4060)[S.3 + D.4071], &D.1234);
where 'D.1234' is function_type. With the patch, it remains a pointer;
i.e. D.1234 = comp... and 'cb (..., D.1234)', avoiding ME ICE.
Note: Fortran (F2018, C15100) requires that dummy arguments are
dummy data objects, which rules out dummy procs/proc-pointer dummies,
which is enforced in resolve_fl_procedure.
Thus, this change only affects the internally generated code.
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_scalar_elemental_arg_saved_as_reference):
Return true for attr.proc_pointer expressions.
gcc/testsuite/ChangeLog:
* gfortran.dg/finalize_38.f90: Compile with -Ofast.
* gfortran.dg/abstract_type_6.f03: Remove dg-error as
now hidden by other errors; copy to ...
* gfortran.dg/abstract_type_6a.f03: ... here; remove
some error to diagnose the error.
* gfortran.dg/finalize_39.f90: New test.
Fortran: Fix finalization resolution with deep copy
Follow-up patch to
"Fortran/OpenMP: Support mapping of DT with allocatable components"
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591144.html
gcc/fortran/ChangeLog:
* resolve.cc (gfc_resolve_finalizers): Also resolve allocatable comps.
Tobias Burnus [Wed, 4 May 2022 16:18:44 +0000 (18:18 +0200)]
OpenMP: Fix use_device_{addr,ptr} with in-data-sharing arg
For array-descriptor vars, the descriptor is assigned to a temporary. However,
this failed when the clause's argument was in turn in a data-sharing clause
as the outer context's VALUE_EXPR wasn't used.
gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Fix use_device_{addr,ptr} with list
item that is in an outer data-sharing clause.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/use_device_addr-5.f90: New test.
Andrew Stubbs [Wed, 13 Apr 2022 15:55:47 +0000 (16:55 +0100)]
openmp: unified_address support
This makes "requires unified_address" work by making it eqivalent to
"requires unified_shared_memory". This is more than is strictly necessary,
but should be standard compliant.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_requires): Check requires unified_address
for conflict with -foffload-memory=shared.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_requires): Check requires unified_address
for conflict with -foffload-memory=shared.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_match_omp_requires): Check requires unified_address
for conflict with -foffload-memory=shared.
gcc/ChangeLog:
* omp-low.cc: Do USM transformations for "unified_address".
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/usm-4.c: New test.
* gfortran.dg/gomp/usm-4.f90: New test.
Fix a crash due to mismatch of free and GOMP_alloc.
With allocate directive, we replace the malloc calls to GOMP_alloc if
it is associated with the allocate statement. The memory was supposed
to be free-d by the implicitely generated free calls which also get
replaced. But if user explicitely deallocated the memory using the
deallocate statement, it can cause a mismatch. This commit handles
that case and also replaces the free call generated for deallocate
clause.
Also added deallocate in the testcase and tidied it up a bit.
gcc/ChangeLog:
* omp-low.cc (lower_omp_allocate): Move allocate declaration
inside loop. Set it to false at the end of condition.
Andrew Stubbs [Fri, 11 Mar 2022 12:58:38 +0000 (12:58 +0000)]
openmp: -foffload-memory=pinned
Implement the -foffload-memory=pinned option such that libgomp is
instructed to enable fully-pinned memory at start-up. The option is
intended to provide a performance boost to certain offload programs without
modifying the code.
This feature only works on Linux, at present, and simply calls mlockall to
enable always-on memory pinning. It requires that the ulimit feature is
set high enough to accommodate all the program's memory usage.
In this mode the ompx_pinned_memory_alloc feature is disabled as it is not
needed and may conflict.
Backport of the patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591354.html
gcc/ChangeLog:
* omp-low.cc (omp_enable_pinned_mode): New function.
(execute_lower_omp): Call omp_enable_pinned_mode.
libgomp/ChangeLog:
* config/linux/allocator.c (always_pinned_mode): New variable.
(GOMP_enable_pinned_mode): New function.
(linux_memspace_alloc): Disable pinning when always_pinned_mode set.
(linux_memspace_calloc): Likewise.
(linux_memspace_free): Likewise.
(linux_memspace_realloc): Likewise.
* libgomp.map (GOMP_5.1.1): New version space with
GOMP_enable_pinned_mode.
* testsuite/libgomp.c/alloc-pinned-7.c: New test.
openmp: Use libgomp memory allocation functions with unified shared memory.
This patches changes calls to malloc/free/calloc/realloc/aligned_alloc and
operator new to memory allocation functions in libgomp with
allocator=ompx_unified_shared_mem_alloc. This helps existing code to benefit
from the unified shared memory. The libgomp does the correct thing with all
the mapping constructs and there is no memory copies if the pointer is pointing
to unified shared memory.
We only replace replacable new operator and not the class member or placement new.
Backport of a patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591353.html
gcc/ChangeLog:
* omp-low.cc (usm_transform): New function.
(make_pass_usm_transform): Likewise.
(class pass_usm_transform): New.
* passes.def: Add pass_usm_transform.
* tree-pass.h (make_pass_usm_transform): New declaration.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/usm-2.c: New test.
* c-c++-common/gomp/usm-3.c: New test.
* g++.dg/gomp/usm-1.C: New test.
* g++.dg/gomp/usm-2.C: New test.
* g++.dg/gomp/usm-3.C: New test.
* gfortran.dg/gomp/usm-2.f90: New test.
* gfortran.dg/gomp/usm-3.f90: New test.
libgomp/ChangeLog:
* testsuite/libgomp.c/usm-6.c: New test.
* testsuite/libgomp.c++/usm-1.C: Likewise.
Andrew Stubbs [Fri, 11 Mar 2022 12:37:58 +0000 (12:37 +0000)]
openmp, nvptx: ompx_unified_shared_mem_alloc
This adds support for using Cuda Managed Memory with omp_alloc. It will be
used as the underpinnings for "requires unified_shared_memory" in a later
patch.
There are two new predefined allocators, ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc, plus corresponding memory spaces, which can be used to
allocate memory in the "managed" space and explicitly on the host (it is
intended that "malloc" will be intercepted by the compiler).
The nvptx plugin is modified to make the necessary Cuda calls, and libgomp
is modified to switch to shared-memory mode for USM allocated mappings.
Backport of the patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591352.html
libgomp/ChangeLog:
* allocator.c (omp_max_predefined_alloc): Update.
(omp_aligned_alloc): Don't fallback ompx_host_mem_alloc.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* config/linux/allocator.c (linux_memspace_alloc): Handle USM.
(linux_memspace_calloc): Handle USM.
(linux_memspace_free): Handle USM.
(linux_memspace_realloc): Handle USM.
* config/nvptx/allocator.c (nvptx_memspace_alloc): Reject
ompx_host_mem_alloc.
(nvptx_memspace_calloc): Likewise.
(nvptx_memspace_realloc): Likewise.
* libgomp-plugin.h (GOMP_OFFLOAD_usm_alloc): New prototype.
(GOMP_OFFLOAD_usm_free): New prototype.
(GOMP_OFFLOAD_is_usm_ptr): New prototype.
* libgomp.h (gomp_usm_alloc): New prototype.
(gomp_usm_free): New prototype.
(gomp_is_usm_ptr): New prototype.
(struct gomp_device_descr): Add USM functions.
* omp.h.in (omp_memspace_handle_t): Add ompx_unified_shared_mem_space
and ompx_host_mem_space.
(omp_allocator_handle_t): Add ompx_unified_shared_mem_alloc and
ompx_host_mem_alloc.
* omp_lib.f90.in: Likewise.
* plugin/plugin-nvptx.c (nvptx_alloc): Add "usm" parameter.
Call cuMemAllocManaged as appropriate.
(GOMP_OFFLOAD_alloc): Move internals to ...
(GOMP_OFFLOAD_alloc_1): ... this, and add usm parameter.
(GOMP_OFFLOAD_usm_alloc): New function.
(GOMP_OFFLOAD_usm_free): New function.
(GOMP_OFFLOAD_is_usm_ptr): New function.
* target.c (gomp_map_vars_internal): Add USM support.
(gomp_usm_alloc): New function.
(gomp_usm_free): New function.
(gomp_load_plugin_for_device): New function.
* testsuite/libgomp.c/usm-1.c: New test.
* testsuite/libgomp.c/usm-2.c: New test.
* testsuite/libgomp.c/usm-3.c: New test.
* testsuite/libgomp.c/usm-4.c: New test.
* testsuite/libgomp.c/usm-5.c: New test.
Andrew Stubbs [Fri, 11 Mar 2022 14:33:11 +0000 (14:33 +0000)]
openmp, nvptx: low-lat memory access traits
The NVPTX low latency memory is not accessible outside the team that allocates
it, and therefore should be unavailable for allocators with the access trait
"all". This change means that the omp_low_lat_mem_alloc predefined
allocator now implicitly implies the "pteam" trait.
Backport of a patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/589355.html
libgomp/ChangeLog:
* allocator.c (MEMSPACE_VALIDATE): New macro.
(omp_aligned_alloc): Use MEMSPACE_VALIDATE.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* config/nvptx/allocator.c (nvptx_memspace_validate): New function.
(MEMSPACE_VALIDATE): New macro.
* testsuite/libgomp.c/allocators-4.c (main): Add access trait.
* testsuite/libgomp.c/allocators-6.c (main): Add access trait.
* testsuite/libgomp.c/allocators-7.c: New test.
Andrew Stubbs [Fri, 11 Mar 2022 12:33:06 +0000 (12:33 +0000)]
libgomp, openmp: Add ompx_pinned_mem_alloc
This creates a new predefined allocator as a shortcut for using pinned
memory with OpenMP. The name uses the OpenMP extension space and is
intended to be consistent with other OpenMP implementations currently in
development.
The allocator is equivalent to using a custom allocator with the pinned
trait and the null fallback trait.
Backport of a patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588951.html
* allocator.c (omp_max_predefined_alloc): Update.
(omp_aligned_alloc): Support ompx_pinned_mem_alloc.
(omp_free): Likewise.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
* omp.h.in (omp_allocator_handle_t): Add ompx_pinned_mem_alloc.
* omp_lib.f90.in: Add ompx_pinned_mem_alloc.
* testsuite/libgomp.c/alloc-pinned-5.c: New test.
* testsuite/libgomp.c/alloc-pinned-6.c: New test.
* testsuite/libgomp.fortran/alloc-pinned-1.f90: New test.
Andrew Stubbs [Fri, 11 Mar 2022 12:12:39 +0000 (12:12 +0000)]
libgomp: pinned memory
Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.
Backport of a patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588360.html
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(xmlock): New function.
(omp_init_allocator): Don't disallow the pinned trait.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.
Backport of a patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588372.html
This patch looks for malloc/free calls that were generated by allocate statement
that is associated with allocate directive and replaces them with GOMP_alloc
and GOMP_free.
gcc/ChangeLog:
* omp-low.cc (scan_sharing_clauses): Handle OMP_CLAUSE_ALLOCATOR.
(scan_omp_allocate): New.
(scan_omp_1_stmt): Call it.
(lower_omp_allocate): New function.
(lower_omp_1): Call it.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/allocate-6.f90: Add tests.
* gfortran.dg/gomp/allocate-7.f90: New test.
* gfortran.dg/gomp/allocate-8.f90: New test.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/allocate-2.f90: New test.
Handle cleanup of omp allocated variables (OpenMP 5.0).
Currently we are only handling omp allocate directive that is associated
with an allocate statement. This statement results in malloc and free calls.
The malloc calls are easy to get to as they are in the same block as allocate
directive. But the free calls come in a separate cleanup block. To help any
later passes finding them, an allocate directive is generated in the
cleanup block with kind=free. The normal allocate directive is given
kind=allocate.
Backport of a patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588370.html
gcc/fortran/ChangeLog:
* gfortran.h (struct access_ref): Declare new members
omp_allocated and omp_allocated_end.
* openmp.cc (gfc_match_omp_allocate): Set new_st.resolved_sym to
NULL.
(prepare_omp_allocated_var_list_for_cleanup): New function.
(gfc_resolve_omp_allocate): Call it.
* trans-decl.cc (gfc_trans_deferred_vars): Process omp_allocated.
* trans-openmp.cc (gfc_trans_omp_allocate): Set kind for the stmt
generated for allocate directive.
This is backport of a patch posted in
https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590597.html
An allocate clause in target region must specify an allocator
unless the compilation unit has requires construct with
dynamic_allocators clause. Current implementation of the allocate
clause did not check for this restriction. This patch fills that
gap.
gcc/ChangeLog:
* omp-low.cc (omp_maybe_offloaded_ctx): New prototype.
(scan_sharing_clauses): Check a restriction on allocate clause.
Tobias Burnus [Tue, 1 Mar 2022 15:35:08 +0000 (16:35 +0100)]
Fortran/OpenMP: Support mapping of DT with allocatable components
gcc/fortran/ChangeLog:
* class.cc (finalization_scalarizer): Mark syms as artificial.
(generate_callback_wrapper): New.
(gfc_find_derived_vtab): Call it, add _callback comp.
* f95-lang.cc (LANG_HOOKS_OMP_DEEP_MAPPING,
LANG_HOOKS_OMP_DEEP_MAPPING_P,
LANG_HOOKS_OMP_DEEP_MAPPING_CNT): Redeinfe
* gfortran.h (gfc_import_iso_c_binding_module,
GFC_CLASS_CALLBACK_DEFAULT_FLAG, GFC_CLASS_CALLBACK_VTABLE_FLAG,
GFC_CLASS_CB_ALLOCATABLE, GFC_CLASS_CB_POINTER,
GFC_CLASS_CB_PROC_POINTER, GFC_CLASS_CB_VTABLE,
GFC_CLASS_CB_VPTR): New.
* match.cc (select_type_set_tmp): Propagate allocatable property.
* module.cc (MOD_VERSION): Bump due to vtab change.
(import_iso_c_binding_module): New import_all arg.
(gfc_import_iso_c_binding_module): New.
(gfc_use_module): Update call.
* openmp.cc (resolve_omp_clauses): Accept DT with alloc comps.
* resolve.cc (gfc_resolve_formal_arglist, gfc_resolve_intrinsic,
resolve_fl_procedure, resolve_types): Permit some violations
for internal code.
* trans-array.cc (gfc_conv_descriptor_stride_get,
gfc_tree_array_size, gfc_full_array_size): Update
for GFC_TYPE_ARRAY_AKIND change.
(gfc_conv_expr_descriptor): Likewise; permit calling with tree code.
* trans-expr.cc (VTABLE_CALLBACK_FIELD): Add.
(VTAB_GET_FIELD_GEN): Use it.
(VTABLE_DEALLOCATE_FIELD): Undef at the end.
(gfc_conv_expr_reference): Fixes; avoid unneccessary temp var.
* trans-intrinsic.cc (gfc_conv_intrinsic_sizeof,
gfc_conv_associated): Fix class and comp-ref handling.
(conv_isocbinding_function): Remove buggy code.
* trans-openmp.cc (gfc_has_alloc_comps): Add ptr_ok arg.
(gfc_omp_private_outer_ref, gfc_walk_alloc_comps,
gfc_omp_clause_default_ctor, gfc_omp_clause_copy_ctor,
gfc_omp_clause_assign_op, gfc_omp_clause_dtor,
(gfc_omp_finish_clause): Update call.
(GFC_MAP_TOKEN_DATA, GFC_MAP_TOKEN_SIZES, GFC_MAP_TOKEN_KINDS,
GFC_MAP_TOKEN_DATA_OFFSET, GFC_MAP_TOKEN_OFFSET,
GFC_MAP_TOKEN_FLAGS, GFC_MAP_TOKEN_DETACH): Define.
(gfc_omp_get_token_data, gfc_omp_get_token_sizes,
gfc_omp_get_token_kinds, gfc_omp_get_token_offset_data,
gfc_omp_get_token_offset, gfc_omp_get_token_flags,
gfc_omp_get_token_detach, gfc_omp_get_map_token_type,
gfc_omp_get_cb_type, gfc_omp_gen_deep_map_fn,
gfc_omp_deep_mapping_map, gfc_omp_deep_mapping_item,
gfc_omp_deep_mapping_comps, gfc_omp_gen_simple_loop,
gfc_omp_get_array_size, gfc_omp_elmental_loop,
gfc_omp_deep_map_kind_p, gfc_omp_deep_mapping_int_p,
gfc_omp_deep_mapping_p, gfc_omp_deep_mapping_do),
gfc_omp_deep_mapping_cnt, gfc_omp_deep_mapping): New.
(gfc_trans_omp_array_section): Save clause decl to survive gimplifying.
(gfc_trans_omp_clauses): Likewise; fixes.
* trans-types.cc (gfc_build_array_type, gfc_get_derived_type,
gfc_get_array_descr_info): Update array kind to distinguish
different assumed-rank arrays.
* trans.h (gfc_class_vtab_callback_get, gfc_omp_deep_mapping_p,
gfc_omp_deep_mapping_cnt, gfc_omp_deep_mapping): New prototypes.
(enum gfc_array_kind): Additional GFC_ARRAY_ASSUMED_RANK_* entries.
gcc/ChangeLog:
* langhooks-def.h (lhd_omp_deep_mapping_p,
lhd_omp_deep_mapping_cnt, lhd_omp_deep_mapping): New.
(LANG_HOOKS_OMP_DEEP_MAPPING_P, LANG_HOOKS_OMP_DEEP_MAPPING_CNT,
LANG_HOOKS_OMP_DEEP_MAPPING): Define.
(LANG_HOOKS_DECLS): Use it.
* langhooks.cc (lhd_omp_deep_mapping_p, lhd_omp_deep_mapping_cnt,
lhd_omp_deep_mapping): New stubs.
* langhooks.h (struct lang_hooks_for_decls): Add new hooks
* omp-expand.cc (expand_omp_target): Handle dynamic-size
addr/sizes/kinds arrays.
* omp-low.cc (build_sender_ref, fixup_child_record_type,
scan_sharing_clauses, lower_omp_target): Update to handle
new hooks and dynamic-size addr/sizes/kinds arrays.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/allocatable-comp.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-3.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-4.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-5.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-6.f90: New test.
* testsuite/libgomp.fortran/map-alloc-comp-7.f90: New test.
Chung-Lin Tang [Thu, 24 Feb 2022 09:07:48 +0000 (01:07 -0800)]
openmp: Handle C/C++ array reference base-pointers in array sections
In cases where a program constructs its own deep-copying for arrays-of-pointers,
e.g:
#pragma omp target enter data map(to:level->vectors[:N])
for (i = 0; i < N; i++)
#pragma omp target enter data map(to:level->vectors[i][:N])
We need to treat the part of the array reference before the array section
as a base-pointer (here 'level->vectors[i]'), providing pointer-attachment
behavior.
This patch adds this inside handle_omp_array_sections(), tracing the whole
sequence of array dimensions, creating a whole base-pointer reference
iteratively using build_array_ref(). The conditions are that each of the
"absorbed" dimensions must be length==1, and the final reference must be
of pointer-type (so that pointer attachment makes sense).
openmp: Improve handling of nested OpenMP metadirectives in C and C++
This patch fixes a misparsing issue when encountering code like:
#pragma omp metadirective when {<selector_set>={...}: A)
#pragma omp metadirective when (<selector_set>={...}: B)
When called for the first metadirective, analyze_metadirective_body would
stop just before the colon in the second metadirective because it naively
assumes that the '}' marks the end of a code block.
The assertion for clauses to end parsing at the same point is now disabled
if a parse error has occurred during the parsing of the clause, since some
tokens may not be consumed if a parse error cuts parsing short.
gcc/c/
* c-parser.cc (c_parser_omp_construct): Move handling of
PRAGMA_OMP_METADIRECTIVE from here...
(c_parser_pragma): ...to here.
(analyze_metadirective_body): Check that the bracket nesting level
is also zero before stopping the adding of tokens on encountering a
close brace.
(c_parser_omp_metadirective): Modify function signature and update.
Do not assert on remaining tokens if there has been a parse error.
gcc/cp/
* parser.cc (cp_parser_omp_construct): Move handling of
PRAGMA_OMP_METADIRECTIVE from here...
(cp_parser_pragma): ...to here.
(analyze_metadirective_body): Check that the bracket
nesting level is also zero before stopping the adding of tokens on
encountering a close brace.
(cp_parser_omp_metadirective): Modify function signature and update.
Do not assert on remaining tokens if there has been a parse error.
gcc/testsuite/
* c-c++-common/gomp/metadirective-1.c (f): Add test for
improperly nested metadirectives.
openmp: More Fortran front-end fixes for metadirectives
This adds a check for declarative OpenMP directives in metadirective
variants (already present in the C/C++ front-ends), and fixes an
ICE when an empty metadirective (i.e. just '!$omp metadirective')
is presented.
gcc/fortran/
* gfortran.h (is_omp_declarative_stmt): New.
* openmp.cc (match_omp_metadirective): Reject declarative OpenMP
directives with 'sorry'.
* parse.cc (parse_omp_metadirective_body): Check that state stack head
is non-null before dereferencing.
(is_omp_declarative_stmt): New.
gcc/testsuite/
* gfortran.dg/gomp/metadirective-2.f90 (main): Test empty
metadirective.
openmp: Eliminate non-matching metadirective variants early in Fortran front-end
This patch checks during parsing if a metadirective selector is both
resolvable and non-matching - if so, it is removed from further
consideration. This is both more efficient, and avoids spurious
syntax errors caused by considering combinations of selectors that
lead to invalid combinations of OpenMP directives, when that
combination would never arise in the first place.
This exposes another bug - when metadirectives that are not of the
begin-end variety are nested, we might have to drill up through
multiple layers of the state stack to reach the state for the
next statement. This is now fixed.
gcc/
* omp-general.cc (DELAY_METADIRECTIVES_AFTER_LTO): Check that cfun is
non-null before derefencing.
gcc/fortran/
* decl.cc (gfc_match_end): Search for first previous state that is not
COMP_OMP_METADIRECTIVE.
* gfortran.h (gfc_skip_omp_metadirective_clause): Add prototype.
* openmp.cc (match_omp_metadirective): Skip clause if
result of gfc_skip_omp_metadirective_clause is true.
* trans-openmp.cc (gfc_trans_omp_set_selector): Add argument and
disable expression conversion if false.
(gfc_skip_omp_metadirective_clause): New.
With the OpenMP 5.0/5.1 specifications, we would expect A to be selected in
the metadirective when the target call is made, but B when f is called
directly outside of a target context. However, since GCC does not have
separate copies of f for local and target calls, and the construct selector
is static, it must be resolved one way or the other at compile-time (currently
in the favour of selecting A), which may be unexpected behaviour.
This patch attempts to detect the above situation, and will emit a warning
if found.
gcc/
* gimplify.cc (gimplify_omp_metadirective): Mark offloadable functions
containing metadirectives with 'construct={target}' in the selector.
* omp-general.cc (omp_has_target_constructor_p): New.
* omp-general.h (omp_has_target_constructor_p): New prototype.
* omp-low.cc (lower_omp_1): Emit warning if marked functions called
outside of a target context.
openmp: Add support for streaming metadirectives and resolving them after LTO
This patch adds support for streaming metadirective Gimple statements during
LTO, and adds a metadirective expansion pass that runs after LTO. This is
required for metadirectives with selectors that can only be resolved from
within the accel compiler.
openmp: Add support for resolving metadirectives during parsing and Gimplification
This adds support for resolving metadirectives according to the OpenMP 5.1
specification. The variants are sorted by score, then gathered into a list
of dynamic replacement candidates. The metadirective is then expanded into
a sequence of 'if..else' statements to test the dynamic selector and execute
the variant if the selector is satisfied.
If any of the selectors in the list are unresolvable, GCC will give up on
resolving the metadirective and try again later.
This adds a new Gimple statement type GIMPLE_OMP_METADIRECTIVE, which
represents the metadirective in Gimple. In high Gimple, the statement
contains the body of the directive variants, whereas in low Gimple, it
only contains labels to the bodies.
This patch adds support for converting metadirectives from tree to Gimple
form, and handling of the Gimple form (Gimple lowering, OpenMP lowering
and expansion, inlining, SSA handling etc).
Metadirectives should be resolved before they reach the back-end, otherwise
the compiler will crash as GCC does not know how to convert metadirective
Gimple statements to RTX.
This patch implements parsing for the OpenMP metadirective introduced in
OpenMP 5.0. Metadirectives are parsed into an OMP_METADIRECTIVE node,
with the variant clauses forming a chain accessible via
OMP_METADIRECTIVE_CLAUSES. Each clause contains the context selector
and tree for the variant.
User conditions in the selector are now permitted to be non-constant when
used in metadirectives as specified in OpenMP 5.1.
gcc/
* omp-general.cc (omp_context_selector_matches): Add extra argument.
(omp_resolve_metadirective): New stub function.
* omp-general.h (struct omp_metadirective_variant): New.
(omp_context_selector_matches): Add extra argument.
(omp_resolve_metadirective): New prototype.
* tree.def (OMP_METADIRECTIVE): New.
* tree.h (OMP_METADIRECTIVE_CLAUSES): New macro.
gcc/c/
* c-parser.cc (c_parser_skip_to_end_of_block_or_statement): Handle
parentheses in statement.
(c_parser_omp_metadirective): New prototype.
(c_parser_omp_context_selector): Add extra argument. Allow
non-constant expressions.
(c_parser_omp_context_selector_specification): Add extra argument and
propagate it to c_parser_omp_context_selector.
(analyze_metadirective_body): New.
(c_parser_omp_metadirective): New.
(c_parser_omp_construct): Handle PRAGMA_OMP_METADIRECTIVE.
gcc/c-family/
* c-common.h (enum c_omp_directive_kind): Add C_OMP_DIR_META.
(c_omp_expand_metadirective): New prototype.
* c-gimplify.cc (genericize_omp_metadirective_stmt): New.
(c_genericize_control_stmt): Handle OMP_METADIRECTIVE tree nodes.
* c-omp.cc (omp_directives): Classify metadirectives as C_OMP_DIR_META.
(c_omp_expand_metadirective): New stub function.
* c-pragma.cc (omp_pragmas): Add entry for metadirective.
* c-pragma.h (enum pragma_kind): Add PRAGMA_OMP_METADIRECTIVE.
libgomp/
* config/nvptx/team.c (gomp_nvptx_main): Initialize shared_pool_size
to zero. Do not use dynamic_smem_size register if PTX version lower
than 4.1.
Andrew Stubbs [Fri, 3 Dec 2021 17:46:41 +0000 (17:46 +0000)]
libgomp, nvptx: low-latency memory allocator
This patch adds support for allocating low-latency ".shared" memory on
NVPTX GPU device, via the omp_low_lat_mem_space and omp_alloc. The memory
can be allocated, reallocated, and freed using a basic but fast algorithm,
is thread safe and the size of the low-latency heap can be configured using
the GOMP_NVPTX_LOWLAT_POOL environment variable.
The use of the PTX dynamic_smem_size feature means that the minimum version
requirement is now bumped to 4.1 (still old at this point).
libgomp/ChangeLog:
* allocator.c (MEMSPACE_ALLOC): New macro.
(MEMSPACE_CALLOC): New macro.
(MEMSPACE_REALLOC): New macro.
(MEMSPACE_FREE): New macro.
(dynamic_smem_size): New constants.
(omp_alloc): Use MEMSPACE_ALLOC.
Implement fall-backs for predefined allocators.
(omp_free): Use MEMSPACE_FREE.
(omp_calloc): Use MEMSPACE_CALLOC.
Implement fall-backs for predefined allocators.
(omp_realloc): Use MEMSPACE_REALLOC.
Implement fall-backs for predefined allocators.
* config/nvptx/team.c (__nvptx_lowlat_heap_root): New variable.
(__nvptx_lowlat_pool): New asm varaible.
(gomp_nvptx_main): Initialize the low-latency heap.
* plugin/plugin-nvptx.c (lowlat_pool_size): New variable.
(GOMP_OFFLOAD_init_device): Read the GOMP_NVPTX_LOWLAT_POOL envvar.
(GOMP_OFFLOAD_run): Apply lowlat_pool_size.
* config/nvptx/allocator.c: New file.
* testsuite/libgomp.c/allocators-1.c: New test.
* testsuite/libgomp.c/allocators-2.c: New test.
* testsuite/libgomp.c/allocators-3.c: New test.
* testsuite/libgomp.c/allocators-4.c: New test.
* testsuite/libgomp.c/allocators-5.c: New test.
* testsuite/libgomp.c/allocators-6.c: New test.
This updates the bundled cuda.h header file to include some new API calls and
constants that are now used in the code.
This patch should be included when the "libgomp, nvptx: low-latency memory
allocator" or "openmp: Add support for 'target_device' context selector set"
patches are upstreamed.
include/
* cuda/cuda.h (CUdevice_attribute): Add definitions for
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR and
CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR.
(CUmemAttach_flags): New.
(CUpointer_attribute): New.
(cuMemAllocManaged): New prototype.
(cuPointerGetAttribute): New prototype.
libgomp/
* plugin/cuda-lib.def (cuMemAllocManaged): Add new call.
(cuPointerGetAttribute): Likewise.
Frederik Harwath [Tue, 16 Nov 2021 15:22:29 +0000 (16:22 +0100)]
graphite: Accept loops without data references
It seems that the check that rejects loops without data references is
only included to avoid handling non-profitable loops. Including those
loops in Graphite's analysis enables more consistent diagnostic
messages in OpenACC "kernels" code and does not introduce any
testsuite regressions. If executing Graphite on loops without
data references leads to noticeable compile time slow-downs for
non-OpenACC users of Graphite, the check can be re-introduced but
restricted to non-OpenACC functions.
gcc/ChangeLog:
* graphite-scop-detection.cc (scop_detection::harmful_loop_in_region):
Remove check for loops without data references.
Frederik Harwath [Tue, 16 Nov 2021 15:21:57 +0000 (16:21 +0100)]
graphite: Adjust scop loop-nest choice
The find_common_loop function is used in Graphite to obtain a common
super-loop of all loops inside a SCoP. The function is applied to the
loop of the destination block of the edge that leads into the SESE
region and the loop of the source block of the edge that exits the
region. The exit block is usually introduced by the canonicalization
of the loop structure that Graphite does to support its code
generation. If it is empty, it may happen that it belongs to the outer
fake loop. This way, build_alias_set may end up analysing
data-references with respect to this loop although there may exist a
proper super-loop of the SCoP loops. This does not seem to be correct
in general and it leads to problems with runtime alias check creation
which fails if executed on a loop without niter information.
gcc/ChangeLog:
* graphite-scop-detection.cc (scop_context_loop): New function.
(build_alias_set): Use scop_context_loop instead of find_common_loop.
* graphite-isl-ast-to-gimple.cc (graphite_regenerate_ast_isl): Likewise.
* graphite.h (scop_context_loop): New declaration.
Frederik Harwath [Tue, 16 Nov 2021 15:21:42 +0000 (16:21 +0100)]
graphite: Tune parameters for OpenACC use
The default values of some parameters that restrict Graphite's
resource usage are too low for many OpenACC codes. Furthermore,
exceeding the limits does not alwas lead to user-visible diagnostic
messages.
This commit increases the parameter values on OpenACC functions. The
values were chosen to allow for the analysis of all "kernels" regions
in the SPEC ACCEL v1.3 benchmark suite. Warnings about exceeded
Graphite-related limits are added to the -fopt-info-missed
output. Those warnings are phrased in a uniform way that intentionally
refers to the "data-dependence analysis" of "OpenACC loops" instead of
"a failure in Graphite" to make them easier to understand for users.
gcc/ChangeLog:
* graphite-optimize-isl.cc (optimize_isl): Adjust
param_max_isl_operations value for OpenACC functions and add
special warnings if value gets exceeded.
* graphite-scop-detection.cc (build_scops): Likewise for
param_graphite_max_arrays_per_scop.
gcc/testsuite/ChangeLog:
* gcc.dg/goacc/graphite-parameter-1.c: New test.
* gcc.dg/goacc/graphite-parameter-2.c: New test.
Frederik Harwath [Tue, 16 Nov 2021 15:20:56 +0000 (16:20 +0100)]
openacc: Disable pass_pre on outlined functions analyzed by Graphite
The additional dependences introduced by partial redundancy
elimination proper and by the code hoisting step of the pass very
often cause Graphite to fail on OpenACC functions. On the other hand,
the pass can also enable the analysis of OpenACC loops (cf. e.g. the
loop-auto-transfer-4.f90 testcase), for instance, because full
redundancy elimination removes definitions that would otherwise
prevent the creation of runtime alias checks outside of the SCoP.
This commit disables the actual partial redundancy elimination step as
well as the code hoisting step of pass_pre on OpenACC functions that
might be handled by Graphite.
gcc/ChangeLog:
* tree-ssa-pre.cc (insert): Skip any insertions in OpenACC
functions that might be processed by Graphite.
Frederik Harwath [Tue, 16 Nov 2021 15:20:41 +0000 (16:20 +0100)]
openacc: Handle internal function calls in pass_lim
The loop invariant motion pass correctly refuses to move statements
out of a loop if any other statement in the loop is unanalyzable. The
pass does not know how to handle the OpenACC internal function calls
which was not necessary until recently when the OpenACC device
lowering pass was moved to a later position in the pass pipeline.
This commit changes pass_lim to ignore the OpenACC internal function
calls which do not contain any memory references. The hoisting enabled
by this change can be useful for the data-dependence analysis in
Graphite; for instance, in the outlined functions for OpenACC regions,
all invariant accesses to the ".omp_data_i" struct should be hoisted
out of the OpenACC loop. This is particularly important for variables
that were scalars in the original loop and which have been turned into
accesses to the struct by the outlining process. Not hoisting those
can prevent scalar evolution analysis which is crucial for Graphite.
Since any hoisting that introduces intermediate names - and hence,
"fake" dependences - inside the analyzed nest can be harmful to
data-dependence analysis, a flag to restrict the hoisting in OpenACC
functions is added to the pass. The pass instance that executes before
Graphite now runs with this flag set to true and the pass instance
after Graphite runs unrestricted.
A more precise way of selecting the statements for which hoisting
should be enabled is left for a future improvement.
gcc/ChangeLog:
* passes.def: Set restrict_oacc_hoisting to true for the early
pass_lim instance.
* tree-ssa-loop-im.cc (movement_possibility): Add
restrict_oacc_hoisting flag to function; restrict movement if set.
(compute_invariantness): Add restrict_oacc_hoisting flag and pass it on.
(gather_mem_refs_stmt): Skip IFN_GOACC_LOOP and IFN_UNIQUE
calls.
(loop_invariant_motion_in_fun): Add restrict_oacc_hoisting flag and
pass it on.
(pass_lim::execute): Pass on new flags.
* tree-ssa-loop-manip.h (loop_invariant_motion_in_fun): Adjust declaration.
* gimple-loop-interchange.cc (pass_linterchange::execute): Adjust call to
loop_invariant_motion_in_fun.
Frederik Harwath [Tue, 16 Nov 2021 15:20:15 +0000 (16:20 +0100)]
openacc: Warn about "independent" "kernels" loops with data-dependences
This commit concerns loops in OpenACC "kernels" region that have been marked
up with an explicit "independent" clause by the user, but for which Graphite
found data dependences. A discussion on the private internal OpenACC mailing
list suggested that warning the user about the dependences woud be a more
acceptable solution than reverting the user's decision. This behavior is
implemented by the present commit.
gcc/ChangeLog:
* common.opt: Add flag Wopenacc-false-independent.
* omp-offload.cc (oacc_loop_warn_if_false_independent): New function.
(oacc_loop_fixed_partitions): Call from here.
Andrew Stubbs [Tue, 16 Nov 2021 15:19:53 +0000 (16:19 +0100)]
openacc: Add runtime alias checking for OpenACC kernels
This commit adds the code generation for the runtime alias checks for
OpenACC loops that have been analyzed by Graphite. The runtime alias
check condition gets generated in Graphite. It is evaluated by the
code generated for the IFN_GOACC_LOOP internal function calls. If
aliasing is detected at runtime, the execution dimensions get adjusted
to execute the affected loops sequentially.
gcc/ChangeLog:
* graphite-isl-ast-to-gimple.cc: Include internal-fn.h.
(graphite_oacc_analyze_scop): Implement runtime alias checks.
* omp-expand.cc (expand_oacc_for): Add an additional "noalias" parameter
to GOACC_LOOP internal calls, and initialise it to integer_one_node.
* omp-offload.cc (oacc_xform_loop): Integrate the runtime alias check
into the GOACC_LOOP expansion.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/runtime-alias-check-2.c: New test.
Andrew Stubbs [Tue, 16 Nov 2021 15:19:23 +0000 (16:19 +0100)]
openacc: Add data optimization pass
Address PR90591 "Avoid unnecessary data transfer out of OMP
construct", for simple (but common) cases.
This commit adds a pass that optimizes data mapping clauses.
Currently, it can optimize copy/map(tofrom) clauses involving scalars
to copyin/map(to) and further to "private". The pass is restricted
"kernels" regions but could be extended to other types of regions.
gcc/ChangeLog:
* Makefile.in: Add pass.
* doc/gimple.texi: TODO.
* gimple-walk.cc (walk_gimple_seq_mod): Adjust for backward walking.
* gimple-walk.h (struct walk_stmt_info): Add field.
* passes.def: Add new pass.
* tree-pass.h (make_pass_omp_data_optimize): New declaration.
* omp-data-optimize.cc: New file.
Frederik Harwath [Tue, 16 Nov 2021 15:18:02 +0000 (16:18 +0100)]
Add function for printing a single OMP_CLAUSE
Commit 89f4f339130c ("For 'OMP_CLAUSE' in 'dump_generic_node', dump
the whole OMP clause chain") changed the dumping behavior for
OMP_CLAUSEs. The old behavior is required for a follow-up
commit ("openacc: Add data optimization pass") that optimizes single
OMP_CLAUSEs.
Frederik Harwath [Tue, 16 Nov 2021 15:17:48 +0000 (16:17 +0100)]
openacc: Remove unused partitioning in "kernels" regions
With the old "kernels" handling, unparallelized regions would
get executed with 1x1x1 partitioning even if the user provided
explicit num_gangs, num_workers clauses etc.
This commit restores this behavior by removing unused partitioning
after assigning the parallelism dimensions to loops.
gcc/ChangeLog:
* omp-offload.cc (oacc_remove_unused_partitioning): New function
for removing partitioning that is not used by any loop.
(oacc_validate_dims): Call oacc_remove_unused_partitioning and
enable warnings about unused partitioning.
Frederik Harwath [Tue, 16 Nov 2021 15:17:15 +0000 (16:17 +0100)]
openacc: Add further kernels tests
Add some copies of tests to continue covering the old "parloops"-based
"kernels" implementation - until it gets removed from GCC - and
add further tests for the new Graphite-based implementation.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-fortran/parallel-loop-auto-reduction-2.f90:
New test.
gcc/testsuite/ChangeLog:
* c-c++-common/goacc/classify-kernels-unparallelized-graphite.c:
New test.
* c-c++-common/goacc/classify-kernels-unparallelized-parloops.c:
New test.
* c-c++-common/goacc/kernels-decompose-1-parloops.c: New test.
* c-c++-common/goacc/kernels-reduction-parloops.c: New test.
* c-c++-common/goacc/loop-auto-reductions.c: New test.
* c-c++-common/goacc/note-parallelism-1-kernels-loop-auto-parloops.c:
New test.
* c-c++-common/goacc/note-parallelism-kernels-loops-1.c: New test.
* c-c++-common/goacc/note-parallelism-kernels-loops-parloops.c:
New test.
* gfortran.dg/goacc/classify-kernels-unparallelized-parloops.f95:
New test.
* gfortran.dg/goacc/kernels-conversion.f95: New test.
* gfortran.dg/goacc/kernels-decompose-1-parloops.f95: New test.
* gfortran.dg/goacc/kernels-decompose-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-data-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-parloops-2.f95: New test.
* gfortran.dg/goacc/kernels-loop-parloops.f95: New test.
* gfortran.dg/goacc/kernels-reductions.f90: New test.
Frederik Harwath [Tue, 16 Nov 2021 15:16:47 +0000 (16:16 +0100)]
openacc: Add "can_be_parallel" flag info to "graph" dumps
gcc/ChangeLog:
* graph.cc (oacc_get_fn_attrib): New declaration.
(find_loop_location): New declaration.
(draw_cfg_nodes_for_loop): Print value of the
can_be_parallel flag at the top of loops in OpenACC
functions.
Frederik Harwath [Tue, 16 Nov 2021 15:16:22 +0000 (16:16 +0100)]
openacc: Use Graphite for dependence analysis in "kernels" regions
This commit changes the handling of OpenACC "kernels" to use Graphite
for dependence analysis. To this end, it first introduces a new
internal representation for "kernels" regions which should be analyzed
by Graphite in pass_omp_oacc_kernels_decompose. This is now the
default for all "kernels" regions, but the old handling is still
available through the command line parameter
"--param=openacc_kernels=decompose-parloops". The handling of this
new region type in the omp lowering and omp offloading passes follows
the existing handling for "parallel" regions. This replaces the
specialized handling for "kernels" regions that was previously used
and which was in limited in many ways.
Graphite is adjusted to be able to analyze the OpenACC functions that
get outlined from the "kernels" regions. It is enabled to handle the
internal function calls that contain information about OpenACC
constructs. In some places where function calls would be rejected by
Graphite, those calls need to be ignored. In other places, information
about the loop step, bounds etc. needs to be extracted from the
calls. The goal is to enable an analysis of the original loop
parameters although the omp lowering and expansion steps have already
modified the loop structure. Some parallelization-enabling constructs
such as OpenACC "reduction" and "private"/"firstprivate" clauses must
be recognized and the data-dependences must be adjusted to reflect the
semantics of those constructs. The data-dependence analysis step in
Graphite has so far been tied to the code generation step. This
commit introduces a separate data-dependence analysis step that avoids
the code generation. This is necessary because adjusting the code
generation to create a correct OpenACC loop structure would require
very considerable effort and the goal of this commit is to implement
the dependence analysis only. The ability to use Graphite for
dependence analysis without its code generation might be of
independent interest, but it is so far used for OpenACC purposes
only. In general, all changes to Graphite try to avoid affecting other
uses of Graphite as much as possible.
gcc/ChangeLog:
* Makefile.in: Add graphite-oacc.o
* cfgloop.cc (alloc_loop): Set can_be_parallel_valid_p to false.
* cfgloop.h: Add can_be_parallel_valid_p field.
* cfgloopmanip.cc (copy_loop_info): Add assert.
* config/nvptx/nvptx.cc (nvptx_goacc_reduction_setup):
* doc/invoke.texi: Adjust param openacc-kernels description.
* doc/passes.texi: Adjust pass_ipa_oacc_kernels description.
* flag-types.h (enum openacc_kernels):Add
OPENACC_KERNELS_DECOMPOSE_PARLOOPS.
* gimple-pretty-print.cc (dump_gimple_omp_target): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
* gimple.h (enum gf_mask): Add
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE and
widen GF_OMP_TARGET_KIND_MASK.
(is_gimple_omp_oacc): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(is_gimple_omp_offloaded): Likewise.
* gimplify.cc (gimplify_omp_for): Enable reduction localization
for "kernels" regions.
(gimplify_omp_workshare): Likewise.
* graphite-dependences.cc (scop_get_reads_and_writes): Handle
"kills" and "reduction" PDRs.
(apply_schedule_on_deps): Add dump output for intermediate
steps of the dependence computation to enable understanding
of unexpected dependences.
(carries_deps): Likewise.
(scop_get_dependences): Handle "kill" operations and add dump
output.
* graphite-isl-ast-to-gimple.cc (visit_schedule_loop_node): New function.
(graphite_oacc_analyze_scop): New function.
* graphite-optimize-isl.cc (optimize_isl): Remove "static" and
add argument to identify OpenACC use; don't fail on unchanged
schedule in this case.
* graphite-poly.cc (new_poly_dr): Handle "kills".
(print_pdr): Likewise.
(new_gimple_poly_bb): Likewise.
(free_gimple_poly_bb): Likewise.
(new_scop): Handle "reduction", "private", and "firstprivate"
hash sets.
(free_scop): Likewise.
(print_isl_space): New function.
(debug_isl_space): New function.
* graphite-scop-detection.cc (scop_detection::can_represent_loop):
Don't fail if niter is 0 in OpenACC functions.
(scop_detection::add_scop): Don't reject regions with only one
loop in OpenACC functions.
(ignored_oacc_internal_call_p): New function.
(scan_tree_for_params): Handle VIEW_CONVERT_EXPR.
(stmt_has_side_effects): Ignore internal OpenACC function calls.
(add_write): Likewise.
(add_read): Likewise.
(add_kill): New function.
(add_kills): New function.
(add_oacc_kills): New function.
(try_generate_gimple_bb): Kill false dependences for OpenACC
"private"/"firstprivate" vars.
(gather_bbs::gather_bbs): Determin OpenACC
"private"/"firstprivate" vars in region.
(gather_bbs::before_dom_children): Add assert.
(determine_openacc_reductions): New function.
(build_scops): Determine OpenACC "reduction" vars in SCoP.
* graphite-sese-to-poly.cc (oacc_ifn_call_extract): New declaration.
(oacc_internal_call_p): New function.
(build_poly_dr): Ignore internal OpenACC function calls,
handle "reduction" refs.
(build_poly_sr): Likewise; handle "kill" operations.
* graphite.cc (graphite_transform_loops): Accept functions with
only a single loop.
(oacc_enable_graphite_p): New function.
(gate_graphite_transforms): Enable pass on OpenACC functions.
* graphite.h (enum poly_dr_type): Add PDR_KILL.
(struct poly_dr): Add "is_reduction" field.
(new_poly_dr): Add argument to declaration.
(pdr_kill_p): New function.
(print_isl_space): New declaration.
(debug_isl_space): New declaration.
(struct scop): Add fields "reductions_vars",
"oacc_firstprivate_vars", and "oacc_private_scalars".
(optimize_isl): New declaration.
(graphite_oacc_analyze_scop): New declaration.
* internal-fn.cc (expand_UNIQUE): Handle
IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE
* internal-fn.h: Add OACC_PRIVATE_SCALAR and OACC_FIRSTPRIVATE
* omp-expand.cc (struct omp_region): Adjust comment.
(expand_omp_taskloop_for_inner):
(expand_omp_for): Add asserts about expected "kernels" region types.
(mark_loops_in_oacc_kernels_region): Likewise.
(expand_omp_target): Likewise; handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(build_omp_regions_1): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
Likewise.
(omp_make_gimple_edges): Likewise.
* omp-general.cc (oacc_get_kernels_attrib): New function.
(oacc_get_fn_dim_size): Allow argument to be NULL.
* omp-general.h (oacc_get_kernels_attrib): New declaration.
* omp-low.cc (struct omp_context): Add fields
"oacc_firstprivate_vars" and "oacc_private_scalars".
(was_originally_oacc_kernels): New function.
(is_oacc_kernels):
(is_oacc_kernels_decomposed_graphite_part): New function.
(new_omp_context): Allocate "oacc_first_private_vars" and
"oacc_private_scalars" ...
(delete_omp_context): ... and free from here.
(oacc_record_firstprivate_var_clauses): New function.
(oacc_record_private_scalars): New function.
(scan_sharing_clauses): Call functions to record "private"
scalars and "firstprivate" variables.
(check_oacc_kernel_gwv): Add assert.
(ctx_in_oacc_kernels_region): Handle
GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE.
(scan_omp_for): Likewise.
(check_omp_nesting_restrictions): Likewise.
(lower_oacc_head_mark): Likewise.
(lower_omp_for): Likewise.
(lower_omp_target): Create "private" and "firstprivate" marker
call statements.
(lower_oacc_head_tail): Adjust "private" and "firstprivate"
marker calls.
(lower_oacc_reductions): Emit "private" and "firstprivate"
marker call statements.
(make_oacc_firstprivate_vars_marker): New function.
(make_oacc_private_scalars_marker): New function.
* omp-oacc-kernels-decompose.cc (adjust_region_code_walk_stmt_fn):
Assign GF_OMP_TARGET_KIND_OACC_PARALLEL_KERNELS_GRAPHITE to
region using the new "kernels" handling.
(make_region_seq): Adjust default region type for new
"kernels" handling; no more exceptions, let Graphite handle everything.
(make_region_loop_nest): Likewise; add dump output and assert.
(adjust_nested_loop_clauses): Stop creating "auto" clauses if
loop has "independent", "gang" etc.
(transform_kernels_loop_clauses): Likewise.
* omp-offload.cc (oacc_extract_loop_call): New function.
(oacc_loop_get_cfg_loop): New function.
(can_be_parallel_str): New function.
(oacc_loop_can_be_parallel_p): New function.
(oacc_parallel_kernels_graphite_fun_p): New function.
(oacc_parallel_fun_p): New function.
(oacc_loop_transform_auto_into_independent): New function, ...
(oacc_loop_fixed_partitions): ... called from here to transfer
the result of Graphite's analysis to the loop.
(execute_oacc_loop_designation): Handle "oacc
functions with "parallel_kernels_graphite" attribute.
(execute_oacc_device_lower): Handle
IFN_UNIQUE_OACC_PRIVATE_SCALAR and IFN_UNIQUE_OACC_FIRSTPRIVATE.
* omp-offload.h (oacc_extract_loop_call): Add declaration.
* params.opt: Add "param=openacc-kernels" value "decompose-parloops".
* sese.cc (scalar_evolution_in_region): "Redirect" SCEV
analysis to outer loop for IFN_GOACC_LOOP calls.
* sese.h: Add field "kill_scalar_refs".
* tree-chrec.cc (chrec_fold_plus_1): Handle VIEW_CONVERT_EXPR
like CASE_CONVERT.
* tree-data-ref.cc (dump_data_reference): Include
DR_BASE_ADDRESS and DR_OFFSET in dump output.
(get_references_in_stmt): Don't reject OpenACC internal function
calls.
(graphite_find_data_references_in_stmt): Remove unused variable.
* tree-parloops.cc (pass_parallelize_loops::execute): Disable
pass with the new kernels handling, enable if requested explicitly.
* tree-scalar-evolution.cc (set_scev_analyze_openacc_calls):
Set flag to enable the analysis of internal OpenACC function
calls (use for Graphite only).
(oacc_call_analyzable_p): New function.
(oacc_ifn_call_extract): New function.
(oacc_simplify): New function.
(add_to_evolution): Simplify OpenACC internal function calls
if applicable.
(follow_ssa_edge_binary): Likewise.
(follow_ssa_edge_expr): Likewise.
(follow_copies_to_constant): Likewise.
(analyze_initial_condition): Likewise.
(interpret_loop_phi): Likewise.
(interpret_gimple_call): New function.
(interpret_rhs_expr): Likewise.
(instantiate_scev_name): Likewise.
(analyze_scalar_evolution_1): Handle GIMPLE_CALL, handle default definitions.
(expression_expensive_p): Consider internal OpenACC calls to
be cheap.
* tree-scalar-evolution.h (set_scev_analyze_openacc_calls):
New declaration.
(oacc_call_analyzable_p): New declaration.
* tree-ssa-dce.cc (mark_stmt_if_obviously_necessary): Mark
lhs of internal OpenACC function calls necessary.
* tree-ssa-ifcombine.c (recognize_if_then_else):
* tree-ssa-loop-niter.cc (oacc_call_analyzable_p):
(oacc_ifn_call_extract): New declaration.
(interpret_gimple_call): New delcaration.
(expand_simple_operations): Handle internal OpenACC function calls.
* tree-ssa-loop.cc (gate_oacc_kernels): Disable for new
"kernels" handling.
* graphite-oacc.cc: New file.
* graphite-oacc.h: New file.
Frederik Harwath [Tue, 16 Nov 2021 15:15:08 +0000 (16:15 +0100)]
graphite: Add runtime alias checking
Graphite rejects a SCoP if it contains a pair of data references for
which it cannot determine statically if they may alias. This happens
very often, for instance in C code which does not use explicit
"restrict". This commit adds the possibility to analyze a SCoP
nevertheless and perform an alias check at runtime. Then, if aliasing
is detected, the execution will fall back to the unoptimized SCoP.
TODO This needs more testing on non-OpenACC code.
gcc/ChangeLog:
* common.opt: Add fgraphite-runtime-alias-checks.
* graphite-isl-ast-to-gimple.cc
(generate_alias_cond): New function.
(graphite_regenerate_ast_isl): Use from here.
* graphite-poly.cc (new_scop): Create unhandled_alias_ddrs vec ...
(free_scop): and release here.
* graphite-scop-detection.cc (dr_defs_outside_region): New function.
(dr_well_analyzed_for_runtime_alias_check_p): New function.
(graphite_runtime_alias_check_p): New function.
(build_alias_set): Record unhandled alias ddrs for later alias check
creation if flag_graphite_runtime_alias_checks is true instead
of failing.
* graphite.h (struct scop): Add field unhandled_alias_ddrs.
* sese.h (has_operands_from_region_p): New function.
Frederik Harwath [Tue, 16 Nov 2021 15:13:03 +0000 (16:13 +0100)]
graphite: Fix minor mistakes in comments
gcc/ChangeLog:
* graphite-sese-to-poly.cc (build_poly_sr_1): Fix a typo and
a reference to a variable which does not exist.
* graphite-isl-ast-to-gimple.cc (gsi_insert_earliest): Fix typo
in comment.
Frederik Harwath [Tue, 16 Nov 2021 15:12:23 +0000 (16:12 +0100)]
graphite: Rename isl_id_for_ssa_name
The SSA names for which this function gets used are always SCoP
parameters and hence "isl_id_for_parameter" is a better name. It also
explains the prefix "P_" for those names in the ISL representation.
gcc/ChangeLog:
* graphite-sese-to-poly.cc (isl_id_for_ssa_name): Rename to ...
(isl_id_for_parameter): ... this new function name.
(build_scop_context): Adjust function use.
Frederik Harwath [Tue, 16 Nov 2021 15:11:21 +0000 (16:11 +0100)]
graphite: Extend SCoP detection dump output
Extend dump output to make understanding why Graphite rejects to
include a loop in a SCoP easier (for GCC developers).
gcc/ChangeLog:
* graphite-scop-detection.cc (scop_detection::can_represent_loop):
Output reason for failure to dump file.
(scop_detection::harmful_loop_in_region): Likewise.
(scop_detection::graphite_can_represent_expr): Likewise.
(scop_detection::stmt_has_simple_data_refs_p): Likewise.
(scop_detection::stmt_simple_for_scop_p): Likewise.
(print_sese_loop_numbers): New function.
(scop_detection::add_scop): Use from here to print loops in
rejected SCoP.
Frederik Harwath [Tue, 16 Nov 2021 15:07:34 +0000 (16:07 +0100)]
openacc: Move pass_oacc_device_lower after pass_graphite
The OpenACC device lowering pass must run after the Graphite pass to
allow for the use of Graphite for automatic parallelization of kernels
regions in the future. Experimentation has shown that it is best,
performancewise, to run pass_oacc_device_lower together with the
related passes pass_oacc_loop_designation and pass_oacc_gimple_workers
early after pass_graphite in pass_tree_loop, at least if the other
tree loop passes are not adjusted. In particular, to enable
vectorization which is crucial for GCN offloading, device lowering
should happen before pass_vectorize. To bring the loops contained in
the offloading functions into the shape expected by the loop
vectorizer, we have to make sure that some passes that previously were
executed only once before pass_tree_loop are also executed on the
offloading functions. To ensure the execution of
pass_oacc_device_lower if pass_tree_loop does not execute (no loops,
no optimizations), we introduce two further copies of the pass to the
pipeline that run if there are no loops or if no optimization is
performed.
gcc/ChangeLog:
* omp-general.cc (oacc_get_fn_dim_size): Return 0 on
missing "dims".
* omp-offload.cc (pass_oacc_loop_designation::clone): New
member function.
(pass_oacc_gimple_workers::clone): Likewise.
(pass_oacc_gimple_device_lower::clone): Likewise.
* passes.cc (pass_data_no_loop_optimizations): New pass_data.
(class pass_no_loop_optimizations): New pass.
(make_pass_no_loop_optimizations): New function.
* passes.def: Move pass_oacc_{loop_designation,
gimple_workers, device_lower} into tree_loop, and add
copies to pass_tree_no_loop and to new
pass_no_loop_optimizations. Add copies of passes pass_ccp,
pass_ipa_warn, pass_complete_unrolli, pass_backprop,
pass_phiprop, pass_fix_loops after the OpenACC passes
in pass_tree_loop.
* tree-ssa-loop-ivcanon.cc (pass_complete_unroll::clone):
New member function.
(pass_complete_unrolli::clone): Likewise.
* tree-ssa-loop.cc (pass_fix_loops::clone): Likewise.
(pass_tree_loop_init::clone): Likewise.
(pass_tree_loop_done::clone): Likewise.
* tree-ssa-phiprop.cc (pass_phiprop::clone): Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.oacc-c-c++-common/pr85486-2.c: Adjust
expected output to pass name changes due to the pass
reordering and cloning.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-1.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-3.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-4.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-5.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-6.c: Likewise
* testsuite/libgomp.oacc-c-c++-common/vector-length-128-7.c: Likewise.
The Fortran front end presently linearizes accesses to
multi-dimensional arrays by combining the indices for the various
dimensions into a series of explicit multiplies and adds with
refactoring to allow CSE of invariant parts of the computation.
Unfortunately this representation interferes with Graphite-based loop
optimizations. It is difficult to recover the original
multi-dimensional form of the access by the time loop optimizations
run because parts of it have already been optimized away or into a
form that is not easily recognizable, so it seems better to have the
Fortran front end produce delinearized accesses to begin with, a set
of nested ARRAY_REFs similar to the existing behavior of the C and C++
front ends. This is a long-standing problem that has previously been
discussed e.g. in PR 14741 and PR61000.
This patch is an initial implementation for explicit array accesses
only; it doesn't handle the accesses generated during scalarization of
whole-array or array-section operations, which follow a different code
path.
gcc/
* expr.cc (get_inner_reference): Handle NOP_EXPR like
VIEW_CONVERT_EXPR.
gcc/fortran/
* lang.opt (-param=delinearize=): New.
* trans-array.cc (get_class_array_vptr): New, split from...
(build_array_ref): ...here.
(get_array_lbound, get_array_ubound): New, split from...
(gfc_conv_array_ref): ...here. Additional code refactoring
plus support for delinearization of the array access.
Chung-Lin Tang [Thu, 19 Aug 2021 08:17:02 +0000 (16:17 +0800)]
openacc: fix ICE for non-decl expression in non-contiguous array base-pointer
Currently, we do not support cases like struct-members as the base-pointer
for an OpenACC non-contiguous array. Mark such cases as unsupported in the
C/C++ front-ends, instead of ICEing on them.
gcc/c/ChangeLog:
* c-typeck.cc (handle_omp_array_sections_1): Robustify non-contiguous
array check and reject non-DECL base-pointer cases as unsupported.
gcc/cp/ChangeLog:
* semantics.cc (handle_omp_array_sections_1): Robustify non-contiguous
array check and reject non-DECL base-pointer cases as unsupported.
Andrew Stubbs [Tue, 3 Aug 2021 12:45:35 +0000 (13:45 +0100)]
libgomp amdgcn: Fix issues with dynamic OpenMP thread scaling
libgomp/ChangeLog:
* config/gcn/bar.h (gomp_barrier_init): Limit thread count to the
actual physical number.
* config/gcn/team.c (gomp_team_start): Don't attempt to set up
threads that do not exist.
and moves the stripping of ARRAY_REFS/INDIRECT_REFS out of
extract_base_bit_offset and back into the (two) call sites of the
function. The difference between the two ways of looking through these
nodes comes down to (I think) what processing has been done on the
clause in question already: in the case where BASE_REF is non-NULL,
we are processing an OMP_CLAUSE_DECL for the first time. Conversely,
when BASE_REF is NULL, we are processing a node from the sorted list
that is being constructed after a GOMP_MAP_STRUCT node.
2021-06-07 Julian Brown <julian@codesourcery.com>
gcc/
* gimplify.cc (extract_base_bit_offset): Don't look through ARRAY_REFs or
INDIRECT_REFs here.
(build_struct_group): Reinstate previous behaviour for handling
ARRAY_REFs/INDIRECT_REFs.
Julian Brown [Tue, 18 May 2021 17:22:56 +0000 (10:22 -0700)]
[og11] Rework indirect struct handling for OpenACC in gimplify.c
This patch reworks indirect struct handling in gimplify.c (i.e. for
struct components mapped with "mystruct->a[0:n]", "mystruct->b", etc.),
for OpenACC. The key observation leading to these changes was that
component mappings of references-to-structures is already implemented
and working, and indirect struct component handling via a pointer can
work quite similarly. That lets us remove some earlier, special-case
handling for mapping indirect struct component accesses for OpenACC,
which required the pointed-to struct to be manually mapped before the
indirect component mapping.
With this patch, you can map struct components directly (e.g. an array
slice "mystruct->a[0:n]") just like you can map a non-indirect struct
component slice ("mystruct.a[0:n]"). Both references-to-pointers (with
the former syntax) and references to structs (with the latter syntax)
work now.
For Fortran class pointers, we no longer re-use GOMP_MAP_TO_PSET for the
class metadata (the structure that points to the class data and vptr)
-- it is instead treated as any other struct.
For C++, the struct handling also works for class members ("this->foo"),
without having to explicitly map "this[:1]" first.
For OpenACC, we permit chained indirect component references
("mystruct->a->b[0:n]"), though only the last part of such mappings will
trigger an attach/detach operation. To properly use such a construct
on the target, you must still manually map "mystruct->a[:1]" first --
but there's no need to map "mystruct[:1]" explicitly before that.
This version of the patch avoids altering code paths for OpenMP,
where possible.
2021-06-02 Julian Brown <julian@codesourcery.com>
gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_clauses): Don't create GOMP_MAP_TO_PSET
mappings for class metadata, nor GOMP_MAP_POINTER mappings for
POINTER_TYPE_P decls.
gcc/
* gimplify.cc (extract_base_bit_offset): Add BASE_IND and OPENMP
parameters. Handle pointer-typed indirect references for OpenACC
alongside reference-typed ones.
(strip_components_and_deref, aggregate_base_p): New functions.
(build_struct_group): Add pointer type indirect ref handling,
including chained references, for OpenACC. Also handle references to
structs for OpenACC. Conditionalise bits for OpenMP only where
appropriate.
(gimplify_scan_omp_clauses): Rework pointer-type indirect structure
access handling to work more like the reference-typed handling for
OpenACC only.
* omp-low.cc (scan_sharing_clauses): Handle pointer-type indirect struct
references, and references to pointers to structs also.
gcc/testsuite/
* g++.dg/goacc/member-array-acc.C: New test.
* g++.dg/gomp/member-array-omp.C: New test.
libgomp/
* testsuite/libgomp.oacc-c-c++-common/deep-copy-15.c: New test.
* testsuite/libgomp.oacc-c-c++-common/deep-copy-16.c: New test.
* testsuite/libgomp.oacc-c++/deep-copy-17.C: New test.
Julian Brown [Tue, 18 May 2021 17:08:22 +0000 (10:08 -0700)]
[og11] Refactor struct lowering for OpenACC/OpenMP in gimplify.c
This patch is a second attempt at refactoring struct component mapping
handling for OpenACC/OpenMP during gimplification, after the patch I
posted here:
This patch goes further, in that the struct-handling code is outlined
into its own function (to create the "GOMP_MAP_STRUCT" node and the
sorted list of nodes immediately following it, from a set of mappings
of components of a given struct or derived type). I've also gone through
the list-handling code and attempted to add comments documenting how it
works to the best of my understanding, and broken out a couple of helper
functions in order to (hopefully) have the code self-document better also.
2021-06-02 Julian Brown <julian@codesourcery.com>
gcc/
* gimplify.cc (insert_struct_comp_map): Refactor function into...
(build_struct_comp_nodes): This new function. Remove list handling
and improve self-documentation.
(insert_node_after, move_node_after, move_nodes_after,
move_concat_nodes_after): New helper functions.
(build_struct_group): New function to build up GOMP_MAP_STRUCT node
groups to map struct components. Outlined from...
(gimplify_scan_omp_clauses): Here. Call above function.
Julian Brown [Mon, 19 Apr 2021 13:24:41 +0000 (06:24 -0700)]
[og11] Unify ARRAY_REF/INDIRECT_REF stripping code in extract_base_bit_offset
For historical reasons, it seems that extract_base_bit_offset
unnecessarily used two different ways to strip ARRAY_REF/INDIRECT_REF
nodes from component accesses. I verified that the two ways of performing
the operation gave the same results across the whole testsuite (and
several additional benchmarks).
The code was like this since an earlier "mechanical" refactoring by me,
first posted here:
It was never clear to me if there was an important semantic
difference between the two ways of stripping the base before calling
get_inner_reference, but it appears that there is not, so one can go away.
2021-06-02 Julian Brown <julian@codesourcery.com>
gcc/
* gimplify.cc (extract_base_bit_offset): Unify ARRAY_REF/INDIRECT_REF
stripping code in first call/subsequent call cases.
It never makes sense for a GOMP_MAP_ATTACH_DETACH mapping to survive
beyond gimplify.c, so this patch rewrites such mappings to GOMP_MAP_ATTACH
or GOMP_MAP_DETACH unconditionally (rather than checking for a list
of types of OpenACC or OpenMP constructs), in cases where it hasn't
otherwise been done already in the preceding code.
2021-06-02 Julian Brown <julian@codesourcery.com>
gcc/
* gimplify.cc (gimplify_scan_omp_clauses): Simplify condition
for changing GOMP_MAP_ATTACH_DETACH to GOMP_MAP_ATTACH or
GOMP_MAP_DETACH.