Julian Brown [Thu, 21 Jan 2021 14:54:54 +0000 (06:54 -0800)]
[og10] openacc: Fix lowering for derived-type mappings through array elements
This patch fixes lowering of derived-type mappings which select elements
of arrays of derived types, and similar. These would previously lead
to ICEs.
With this change, OpenACC directives can pass through constructs that
are no longer recognized by the gimplifier, hence alterations are needed
there also.
gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Handle element selection
for arrays of derived types.
gcc/
* gimplify.c (gimplify_scan_omp_clauses): Handle ATTACH_DETACH
for non-decls.
Tobias Burnus [Tue, 16 Feb 2021 16:39:49 +0000 (17:39 +0100)]
[og10] Fortran: %re/%im fixes for OpenMP/OpenACC + gfc_is_simplify_contiguous
gcc/fortran/ChangeLog:
* expr.c (gfc_is_simplify_contiguous): Handle REF_INQUIRY, i.e.
%im and %re which are EXPR_VARIABLE.
* openmp.c (resolve_omp_clauses): Diagnose %re/%im explicitly.
gcc/testsuite/ChangeLog:
* gfortran.dg/goacc/ref_inquiry.f90: New test.
* gfortran.dg/gomp/ref_inquiry.f90: New test.
Julian Brown [Thu, 4 Feb 2021 18:13:22 +0000 (10:13 -0800)]
[og10] openacc: Character types and mixed arrays/derived type tests
This patch adds some tests for character types that are components
of derived types used in OpenACC data-movement clauses (some of which
currently fail and are thus XFAILed), and tests (also XFAILed) mixing
arrays and derived types. The XFAILs are addressed by follow-on patches.
Originally a combination of several mainline patches.
Julian Brown [Sat, 30 Jan 2021 01:18:07 +0000 (17:18 -0800)]
[og10] openacc: Use class_pointer instead of pointer attribute for class types
Elsewhere in the Fortran front-end, the class_pointer attribute is
used for BT_CLASS entities instead of the pointer attribute. This patch
follows suit for OpenACC. I couldn't actually come up with a test case
where this makes a difference (i.e., where "class_pointer" and "pointer"
have different values at this point in the code), but this may nonetheless
fix a latent bug.
gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Use class_pointer attribute
for BT_CLASS.
Julian Brown [Fri, 29 Jan 2021 23:37:27 +0000 (15:37 -0800)]
[og10] openacc: Dereference BT_CLASS data pointers but not BT_DERIVED pointers
The stanza in gfc_trans_omp_clauses that handles derived type members
that are themselves derived type pointers or class pointers now adds
an explicit dereference only for the latter. The former is already
dereferenced transparently in gfc_conv_component_ref.
gcc/fortran/
* trans-openmp.c (gfc_trans_omp_clauses): Fix dereferencing for
BT_DERIVED members.
gcc/testsuite/
* gfortran.dg/goacc/derived-classtypes-1.f95: New test.
Tom de Vries [Fri, 9 Oct 2020 09:36:10 +0000 (11:36 +0200)]
[nvptx] Set -misa=sm_35 by default
The nvptx-as assembler verifies the ptx code using ptxas, if there's any
in the PATH.
The default in the nvptx port for -misa=sm_xx is sm_30, but the ptxas of the
latest cuda release (11.1) no longer supports sm_30.
Consequently we cannot build gcc against that release (although we should
still be able to build without any cuda release).
Fix this by setting -misa=sm_35 by default.
Tested check-gcc on nvptx.
Tested libgomp on x86_64-linux with nvpx accelerator.
Both build again cuda 9.1.
gcc/ChangeLog:
2020-10-09 Tom de Vries <tdevries@suse.de>
PR target/97348
* config/nvptx/nvptx.h (ASM_SPEC): Also pass -m to nvptx-as if
default is used.
* config/nvptx/nvptx.opt (misa): Init with PTX_ISA_SM35.
Tobias Burnus [Fri, 12 Feb 2021 10:21:08 +0000 (11:21 +0100)]
Fortran: Fix some select rank issues [PR97694 and 97723].
Backport from mainline; also fixes PR fortran/99045
2020-12-27 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/97694
PR fortran/97723
* check.c (allocatable_check): Select rank temporaries are
permitted even though they are treated as associate variables.
* resolve.c (gfc_resolve_code): Break on select rank as well as
select type so that the block os resolved.
* trans-stmt.c (trans_associate_var): Class associate variables
that are optional dummies must use the backend_decl.
gcc/testsuite/
PR fortran/97694
PR fortran/97723
* gfortran.dg/select_rank_5.f90: New test.
Jakub Jelinek [Wed, 18 Nov 2020 08:40:45 +0000 (09:40 +0100)]
openmp: Fix ICE on non-rectangular loop with known 0 iterations
The loops in the testcase are non-rectangular and have 0 iterations
(the outer loop iterates, but the inner one never). In this case we
just have the overall number of iterations computed (0), and don't have
factor and other values computed. We never need to map logical iterations
to the individual iterations in that case, and we were crashing during
expansion of that code.
2020-11-18 Jakub Jelinek <jakub@redhat.com>
PR middle-end/97862
* omp-expand.c (expand_omp_for_init_vars): Don't use the sqrt path
if number of iterations is constant 0.
Jakub Jelinek [Tue, 13 Oct 2020 07:30:47 +0000 (09:30 +0200)]
openmp: Improve composite triangular loop lowering and expansion
This propagates needed values from the point where number of iterations
is calculated on composite loops to the places where that information
is needed to use the more efficient square root discovery to compute
the starting iterator values from the logical iteration number.
2020-10-13 Jakub Jelinek <jakub@redhat.com>
* omp-low.c (add_taskreg_looptemp_clauses): For triangular loops
with non-constant number of iterations add another 4 _looptemp_
clauses before the (optional) one for lastprivate.
(lower_omp_for_lastprivate): Skip those clauses when looking for
the lastprivate clause.
(lower_omp_for): For triangular loops with non-constant number of
iterations add another 4 _looptemp_ clauses.
* omp-expand.c (expand_omp_for_init_counts): For triangular loops
with non-constant number of iterations set counts[0],
fd->first_inner_iterations, fd->factor and fd->adjn1 from the newly
added _looptemp_ clauses.
(expand_omp_for_init_vars): Initialize the newly added _looptemp_
clauses.
(find_lastprivate_looptemp): New function.
(expand_omp_for_static_nochunk, expand_omp_for_static_chunk,
expand_omp_taskloop_for_outer): Use it instead of manually skipping
_looptemp_ clauses.
Jakub Jelinek [Thu, 13 Aug 2020 07:06:05 +0000 (09:06 +0200)]
openmp: Add support for non-rectangular loops in taskloop construct
2020-08-13 Jakub Jelinek <jakub@redhat.com>
* gimplify.c (gimplify_omp_taskloop_expr): New function.
(gimplify_omp_for): Use it. For OMP_FOR_NON_RECTANGULAR
loops adjust in outer taskloop the var-outer decls.
* omp-expand.c (expand_omp_taskloop_for_inner): Handle non-rectangular
loops.
(expand_omp_for): Don't reject non-rectangular taskloop.
* omp-general.c (omp_extract_for_data): Don't assert that
non-rectangular loops have static schedule, instead treat loop->m1
or loop->m2 as if loop->n1 or loop->n2 is non-constant.
* testsuite/libgomp.c/loop-22.c (main): Add some further tests.
* testsuite/libgomp.c/loop-23.c (main): Likewise.
* testsuite/libgomp.c/loop-24.c: New test.
Jakub Jelinek [Wed, 5 Aug 2020 08:45:16 +0000 (10:45 +0200)]
openmp: Handle even some combined non-rectangular loops
The number of loops computation and logical iteration -> actual iterator values
computations can now be done separately even on composite constructs (though
for triangular loops it would still be more efficient to propagate a few values
through, will handle that incrementally).
simd and taskloop are still unhandled.
Jakub Jelinek [Wed, 5 Aug 2020 08:37:25 +0000 (10:37 +0200)]
openmp: Use more efficient logical -> actual computation even if # iterations is computed at runtime
For triangular loops use more efficient logical iteration number
to actual iterator values computation even for non-rectangular loops
where number of loop iterations could not be computed at compile time.
2020-08-05 Jakub Jelinek <jakub@redhat.com>
* omp-expand.c (expand_omp_for_init_counts): Remember
first_inner_iterations, factor and n1o from the number of iterations
computation in *fd.
(expand_omp_for_init_vars): Use more efficient logical iteration number
to actual iterator values computation even for non-rectangular loops
where number of loop iterations could not be computed at compile time.
Jakub Jelinek [Tue, 4 Aug 2020 08:53:07 +0000 (10:53 +0200)]
openmp: Compute number of collapsed loop iterations more efficiently for some non-rectangular loops
2020-08-04 Jakub Jelinek <jakub@redhat.com>
* omp-expand.c (expand_omp_for_init_counts): For triangular loops
compute number of iterations at runtime more efficiently.
(expand_omp_for_init_vars): Adjust immediate dominators.
(extract_omp_for_update_vars): Likewise.
Jakub Jelinek [Wed, 15 Jul 2020 14:34:54 +0000 (16:34 +0200)]
openmp: Fix up loop-21.c
I've missed
+FAIL: libgomp.c/loop-21.c execution test
during testing of the recent patch. The problem is that while
for the number of iterations computation it doesn't matter if we compute
min_inner_iterations as (m2 * first + n2 + (adjusted step) + m1 * first + n1) / step
or (m2 * last + n2 + (adjusted step) + m1 * last + n1) / step provided that
in the second case we use as factor (m1 - m2) * ostep / step rather than
(m2 - m1) * ostep / step, for the logical to actual iterator values computation
it does matter and in my hand written C implementations of all the cases (outer
vs. inner loop with increasing vs. decreasing iterator) I'm using the same computation
and it worked well for all the pseudo-random iterators testing it was doing.
It also means min_inner_iterations is misnamed, because it is not really
minimum number of inner iterations, whether the first or last outer iteration
results in the smaller or larger value of this can be (sometimes) only
determined at runtime.
So this patch also renames it to first_inner_iterations.
2020-07-15 Jakub Jelinek <jakub@redhat.com>
PR libgomp/96198
* omp-general.h (struct omp_for_data): Rename min_inner_iterations
member to first_inner_iterations, adjust comment.
* omp-general.c (omp_extract_for_data): Adjust for the above change.
Always use n1first and n2first to compute it, rather than depending
on single_nonrect_cond_code. Similarly, always compute factor
as (m2 - m1) * outer_step / inner_step rather than sometimes m1 - m2
depending on single_nonrect_cond_code.
* omp-expand.c (expand_omp_for_init_vars): Rename min_inner_iterations
to first_inner_iterations and min_inner_iterationsd to
first_inner_iterationsd.
Jakub Jelinek [Tue, 14 Jul 2020 08:31:59 +0000 (10:31 +0200)]
openmp: Adjust outer bounds of non-rect loops
In loops like:
#pragma omp parallel for collapse(2)
for (i = -4; i < 8; i++)
for (j = 3 * i; j > 2 * i; j--)
for some outer loop iterations there are no inner loop iterations at all,
the condition is false. In order to use Summæ Potestate to count number
of iterations or to transform the logical iteration number to actual
iterator values using quadratic non-equation root discovery the outer
iterator range needs to be adjusted, such that the inner loop has at least
one iteration for each of the outer loop iterator value in the reduced
range. Sometimes this adjustment is done at the start of the range,
at other times at the end.
This patch implements it during the compile time number of loop computation
(if all expressions are compile time constants).
2020-07-14 Jakub Jelinek <jakub@redhat.com>
* omp-general.h (struct omp_for_data): Add adjn1 member.
* omp-general.c (omp_extract_for_data): For non-rect loop, punt on
count computing if n1, n2 or step are not INTEGER_CST earlier.
Narrow the outer iterator range if needed so that non-rect loop
has at least one iteration for each outer range iteration. Compute
adjn1.
* omp-expand.c (expand_omp_for_init_vars): Use adjn1 if non-NULL
instead of the outer loop's n1.
Jakub Jelinek [Thu, 9 Jul 2020 10:07:17 +0000 (12:07 +0200)]
openmp: Optimize triangular loop logical iterator to actual iterators computation using search for quadratic equation root(s)
This patch implements the optimized logical to actual iterators
computation for triangular loops.
I have a rough implementation using integers, but this one uses floating
point. There is a small problem that -fopenmp programs aren't linked with
-lm, so it does it only if the hw has sqrt optab (and uses ifn rather than
__builtin_sqrt because it obviously doesn't need errno handling etc.).
Do you think it is ok this way, or should I use the integral computation
using inlined isqrt (we have inequation of the form
start >= x * t10 + t11 * (((x - 1) * x) / 2)
where t10 and t11 are signed long long values and start unsigned long long,
and the division by 2 actually is a problem for accuracy in some cases, so
if we do it in integral, we need to do actually
long long t12 = 2 * t10 - t11;
unsigned long long t13 = t12 * t12 + start * 8 * t11;
unsigned long long isqrt_ = isqrtull (t13);
long long x = (((long long) isqrt_ - t12) / t11) >> 1;
with careful overflow checking on all the computations before isqrtull
(and on overflows use the fallback implementation).
2020-07-09 Jakub Jelinek <jakub@redhat.com>
* omp-general.h (struct omp_for_data): Add min_inner_iterations
and factor members.
* omp-general.c (omp_extract_for_data): Initialize them and remember
them in OMP_CLAUSE_COLLAPSE_COUNT if needed and restore from there.
* omp-expand.c (expand_omp_for_init_counts): Fix up computation of
counts[fd->last_nonrect] if fd->loop.n2 is INTEGER_CST.
(expand_omp_for_init_vars): For
fd->first_nonrect + 1 == fd->last_nonrect loops with for now
INTEGER_CST fd->loop.n2 find quadratic equation roots instead of
using fallback method when possible.
* testsuite/libgomp.c/loop-19.c: New test.
* testsuite/libgomp.c/loop-20.c: New test.
Jakub Jelinek [Thu, 2 Jul 2020 09:03:33 +0000 (11:03 +0200)]
openmp: Diagnose non-rectangular loops with invalid steps
THe OpenMP 5 standard requires that if some loop in OpenMP loop nest refers
to some outer loop's iterator variable, then the subtraction of the multiplication
factors for the outer iterator multiplied by the outer increment modulo the
inner increment is 0. For loops with non-constants in any of these we can't
diagnose it, it would be a task for something like -fsanitize=openmp,
but if all these are constant, we can diagnose it.
2020-07-02 Jakub Jelinek <jakub@redhat.com>
* omp-expand.c (expand_omp_for): Diagnose non-rectangular loops with
invalid steps - ((m2 - m1) * incr_outer) % incr must be 0 in valid
OpenMP non-rectangular loops. Use XALLOCAVEC.
Jakub Jelinek [Sat, 27 Jun 2020 10:43:36 +0000 (12:43 +0200)]
openmp: Non-rectangular loop support for non-composite worksharing loops and distribute
This implements the fallback mentioned in
https://gcc.gnu.org/pipermail/gcc/2020-June/232874.html
Special cases for triangular loops etc. to follow later, also composite
constructs not supported yet (need to check the passing of temporaries around)
and lastprivate might not give the same answers as serial loop if the last
innermost body iteration isn't the last one for some of the outer loops
(that will need to be solved separately together with rectangular loops that have no
innermost body iterations, but some of the outer loops actually iterate).
Also, simd needs work.
Jakub Jelinek [Mon, 22 Jun 2020 09:06:08 +0000 (11:06 +0200)]
openmp: Compute triangular loop number of iterations at compile time
2020-06-22 Jakub Jelinek <jakub@redhat.com>
* omp-general.c (omp_extract_for_data): For triangular loops with
all loop invariant expressions constant where the innermost loop is
executed at least once compute number of iterations at compile time.
Jakub Jelinek [Tue, 16 Jun 2020 14:31:13 +0000 (16:31 +0200)]
openmp: Initial part of OpenMP 5.0 non-rectangular loop support
OpenMP 5.0 adds support for non-rectangular loop collapses, e.g.
triangular and more complex.
This patch deals just with the diagnostics so that they aren't rejected
immediately as before. As the spec generally requires as before that the
iteration variable initializer and bound in the comparison as invariant
vs. the outermost loop, and just add some exceptional forms that can violate
that, we need to avoid folding the expressions until we can detect them and
in order to avoid folding it later on, I chose to use a TREE_VEC in those
expressions to hold the var_outer * expr1 + expr2 triplet, the patch adds
pretty-printing of that, gimplification etc. and just sorry_at during
omp expansion for now.
The next step will be to implement the different cases of that one by one.
2020-06-16 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree.h (OMP_FOR_NON_RECTANGULAR): Define.
* gimplify.c (gimplify_omp_for): Diagnose schedule, ordered
or dist_schedule clause on non-rectangular loops. Handle
gimplification of non-rectangular lb/b expressions. When changing
iteration variable, adjust also non-rectangular lb/b expressions
referencing that.
* omp-general.h (struct omp_for_data_loop): Add m1, m2 and outer
members.
(struct omp_for_data): Add non_rect member.
* omp-general.c (omp_extract_for_data): Handle non-rectangular
loops. Fill in non_rect, m1, m2 and outer.
* omp-low.c (lower_omp_for): Handle non-rectangular lb/b expressions.
* omp-expand.c (expand_omp_for): Emit sorry_at for unsupported
non-rectangular loop cases and assert for cases that can't be
non-rectangular.
* tree-pretty-print.c (dump_mem_ref): Formatting fix.
(dump_omp_loop_non_rect_expr): New function.
(dump_generic_node): Handle non-rectangular OpenMP loops.
* tree-pretty-print.h (dump_omp_loop_non_rect_expr): Declare.
* gimple-pretty-print.c (dump_gimple_omp_for): Handle non-rectangular
OpenMP loops.
gcc/c-family/
* c-common.h (c_omp_check_loop_iv_exprs): Add an int argument.
* c-omp.c (struct c_omp_check_loop_iv_data): Add maybe_nonrect and
idx members.
(c_omp_is_loop_iterator): New function.
(c_omp_check_loop_iv_r): Use it. Add support for silent scanning
if outer loop iterator is present. Perform duplicate checking through
hash_set in the function rather than expecting caller to do that.
Pass NULL instead of d->ppset to walk_tree_1.
(c_omp_check_nonrect_loop_iv): New function.
(c_omp_check_loop_iv): Use it. Fill in new members, allow
non-rectangular loop forms, diagnose multiple associated loops with
the same iterator. Pass NULL instead of &pset to walk_tree_1.
(c_omp_check_loop_iv_exprs): Likewise.
gcc/c/
* c-parser.c (c_parser_expr_no_commas): Save, clear and restore
c_in_omp_for.
(c_parser_omp_for_loop): Set c_in_omp_for around some calls to avoid
premature c_fully_fold. Defer explicit c_fully_fold calls to after
c_finish_omp_for.
* c-tree.h (c_in_omp_for): Declare.
* c-typeck.c (c_in_omp_for): Define.
(build_modify_expr): Avoid c_fully_fold if c_in_omp_for.
(digest_init): Likewise.
(build_binary_op): Likewise.
gcc/cp/
* semantics.c (handle_omp_for_class_iterator): Adjust
c_omp_check_loop_iv_exprs caller.
(finish_omp_for): Likewise. Don't call fold_build_cleanup_point_expr
before calling c_finish_omp_for and c_omp_check_loop_iv, move it
after those calls.
* pt.c (tsubst_omp_for_iterator): Handle non-rectangular loops.
gcc/testsuite/
* c-c++-common/gomp/loop-6.c: New test.
* gcc.dg/gomp/loop-1.c: Don't expect diagnostics on valid
non-rectangular loops.
* gcc.dg/gomp/loop-2.c: New test.
* g++.dg/gomp/loop-1.C: Don't expect diagnostics on valid
non-rectangular loops.
* g++.dg/gomp/loop-2.C: Likewise.
* g++.dg/gomp/loop-5.C: New test.
* g++.dg/gomp/loop-6.C: New test.
Chung-Lin Tang [Mon, 8 Feb 2021 15:53:55 +0000 (07:53 -0800)]
Enable gimplify GOMP_MAP_STRUCT handling of (COMPONENT_REF (INDIRECT_REF ...)) map clauses.
This patch tries to allow map(A->ptr) to be properly handled the same way as
map(B.ptr) expressions. map(struct:*A) clauses are now produced during
gimplify.
This patch, as of time of commit, is only pushed to devel/omp/gcc-10, not yet
submitted as mainline patch to upstream.
* gimplify.c ("tree-hash-traits.h"): Add include.
(gimplify_scan_omp_clauses): Change struct_map_to_clause to type
hash_map<tree_operand, tree> *. Adjust struct map handling to handle
cases of *A and A->B expressions.
(gimplify_adjust_omp_clauses): Move GOMP_MAP_STRUCT removal code for
exit data directives code to earlier position.
Chung-Lin Tang [Tue, 2 Feb 2021 12:34:01 +0000 (20:34 +0800)]
OpenMP 5.0: requires directive
This is a merge of:
https://gcc.gnu.org/pipermail/gcc-patches/2021-January/563393.html
This patch completes more of the reverse_offload, unified_address, and
unified_shared_memory clauses for the OpenMP 5.0 requires directive,
including runtime verification of the offload target.
(currently no offload devices actually support above features, only
warning messages are emitted)
This may possibly reverted/updated when a final patch is approved
for mainline.
* c-parser.c (c_parser_declaration_or_fndef): Set
OMP_REQUIRES_TARGET_USED in omp_requires_mask if function has
"omp declare target" attribute.
(c_parser_omp_target_data): Set OMP_REQUIRES_TARGET_USED in
omp_requires_mask.
(c_parser_omp_target_enter_data): Likewise.
(c_parser_omp_target_exit_data): Likewise.
(c_parser_omp_requires): Adjust to only mention "not implemented yet"
for OMP_REQUIRES_DYNAMIC_ALLOCATORS.
gcc/cp/ChangeLog:
* parser.c (cp_parser_simple_declaration): Set
OMP_REQUIRES_TARGET_USED in omp_requires_mask if function has
"omp declare target" attribute.
(cp_parser_omp_target_data): Set OMP_REQUIRES_TARGET_USED in
omp_requires_mask.
(cp_parser_omp_target_enter_data): Likewise.
(cp_parser_omp_target_exit_data): Likewise.
(cp_parser_omp_requires): Adjust to only mention "not implemented yet"
for OMP_REQUIRES_DYNAMIC_ALLOCATORS.
gcc/fortran/ChangeLog:
* openmp.c (gfc_check_omp_requires): Fix REVERSE_OFFLOAD typo.
(gfc_match_omp_requires): Adjust to only mention "not implemented yet"
for OMP_REQUIRES_DYNAMIC_ALLOCATORS.
* parse.c ("tree.h"): Add include.
("omp-general.h"): Likewise.
(gfc_parse_file): Add code to merge omp_requires to omp_requires_mask.
gcc/ChangeLog:
* omp-offload.c (omp_finish_file): Add code to reate OpenMP requires
mask variable in .gnu.gomp_requires section if needed.
* gomp-constants.h (GOMP_REQUIRES_UNIFIED_ADDRESS): New symbol.
(GOMP_REQUIRES_UNIFIED_SHARED_MEMORY): Likewise.
(GOMP_REQUIRES_REVERSE_OFFLOAD): Likewise.
libgcc/ChangeLog:
* offloadstuff.c (__requires_mask_table): New symbol to mark start of
.gnu.gomp_requires section.
(__requires_mask_table_end): New symbol to mark end of
.gnu.gomp_requires section.
libgomp/ChangeLog:
* libgomp-plugin.h (GOMP_OFFLOAD_supported_features): New declaration.
* libgomp.h (struct gomp_device_descr): New 'supported_features_func'
plugin hook field.
* oacc-host.c (host_supported_features): New host hook function.
(host_dispatch): Initialize 'supported_features_func' host hook.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_supported_features): New function.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_supported_features): Likewise.
* target.c (<stdio.h>): Add include of standard header.
(gomp_requires_mask): New static variable.
(__requires_mask_table): New declaration.
(__requires_mask_table_end): Likewise.
(gomp_load_plugin_for_device): Add loading of 'supported_features' hook.
(gomp_target_init): Add code to summarize .gnu._gomp_requires section
mask values, emit error if inconsistency found.
* testsuite/libgomp.c-c++-common/requires-1.c: New test.
* testsuite/libgomp.c-c++-common/requires-1-aux.c: New file linked with
above test.
* testsuite/libgomp.c-c++-common/requires-2.c: New test.
* testsuite/libgomp.c-c++-common/requires-2-aux.c: New file linked with
above test.
liboffloadmic/ChangeLog:
* plugin/libgomp-plugin-intelmic.cpp (GOMP_OFFLOAD_supported_features):
New function.
* cp-tree.h (finish_omp_target): New declaration.
(set_omp_target_this_expr): Likewise.
* lambda.c (lambda_expr_this_capture): Add call to
set_omp_target_this_expr.
* parser.c (cp_parser_omp_target): Factor out code, change to call
finish_omp_target, add re-initing call to set_omp_target_this_expr.
* semantics.c (omp_target_this_expr): New static variable.
(omp_target_ptr_members_accessed): New static hash_map for tracking
accessed non-static pointer-type members.
(finish_non_static_data_member): Add call to set_omp_target_this_expr.
Add recording of non-static pointer-type members access.
(finish_this_expr): Add call to set_omp_target_this_expr.
(set_omp_target_this_expr): New function to set omp_target_this_expr.
(finish_omp_target): New function with code merged from
cp_parser_omp_target, plus code to implement this[:1] and __closure map
clauses for OpenMP.
(handle_omp_array_sections_1): Move code to peel of '*' for
reference-based COMPONENT_REFs before FIELD_DECL transforming.
(finish_omp_clauses): Handle 'A->member' case in map clauses.
* g++.dg/gomp/target-this-1.C: New testcase.
* g++.dg/gomp/target-this-2.C: New testcase.
* g++.dg/gomp/target-this-3.C: New testcase.
* g++.dg/gomp/target-this-4.C: New testcase.
include/ChangeLog:
* gomp-constants.h (enum gomp_map_kind):
Add GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION map kinds.
(GOMP_MAP_POINTER_P):
Include GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION.
libgomp/ChangeLog:
* libgomp.h (gomp_attach_pointer): Add bool parameter.
* oacc-mem.c (acc_attach_async): Update call to gomp_attach_pointer.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_existing): Update assert condition to
include GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION.
(gomp_map_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for mapping a pointer with NULL target.
(gomp_attach_pointer): Add 'bool allow_zero_length_array_sections'
parameter, add support for attaching a pointer with NULL target.
(gomp_map_vars_internal): Update calls to gomp_map_pointer and
gomp_attach_pointer, add handling for
GOMP_MAP_ATTACH_ZERO_LENGTH_ARRAY_SECTION, and
GOMP_MAP_POINTER_TO_ZERO_LENGTH_ARRAY_SECTION cases.
* testsuite/libgomp.c++/target-this-1.C: New testcase.
* testsuite/libgomp.c++/target-this-2.C: New testcase.
* testsuite/libgomp.c++/target-this-3.C: New testcase.
* testsuite/libgomp.c++/target-this-4.C: New testcase.
* c-typeck.c (c_finish_omp_clauses): Adjust to allow duplicate
mapped variables for OpenMP.
gcc/cp/ChangeLog:
* semantics.c (finish_omp_clauses): Adjust to allow duplicate
mapped variables for OpenMP.
gcc/ChangeLog:
* omp-low.c (install_parm_decl): Add new 'tree key_expr' parameter.
Use key_expr as splay-tree key instead of var itself.
(install_var_field): Add new 'tree key_expr = NULL_TREE'
default parameter. Set splay-tree lookup key to key_expr instead of
var if key_expr is non-NULL. Adjust call to install_parm_decl.
Update comments.
(scan_sharing_clauses): Use clause tree expression as splay-tree key
for map/to/from and OpenACC firstprivate cases when installing the
variable field into the send/receive record type.
(maybe_lookup_field_in_outer_ctx): Add code to search through
construct clauses instead of entirely based on splay-tree lookup.
(lower_oacc_reductions): Adjust to find map-clause of reduction
variable, then create receiver-ref.
(lower_omp_target): Adjust to lookup var field using clause expression.
Chung-Lin Tang [Wed, 27 Jan 2021 13:35:43 +0000 (21:35 +0800)]
OpenMP 5.0 Structure element mapping
This is a merge of:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/561139.html
This patch implements the changes to the behavior of target mapping structure
elements, as specified in OpenMP 5.0.
This may possibly reverted/updated when a final patch is approved
for mainline.
libgomp/ChangeLog:
* hashtab.h (htab_clear): New function with initialization code
factored out from...
(htab_create): ...here, adjust to use htab_clear function.
* libgomp.h (REFCOUNT_SPECIAL): New symbol to denote range of
special refcount values, add comments.
(REFCOUNT_INFINITY): Adjust definition to use REFCOUNT_SPECIAL.
(REFCOUNT_LINK): Likewise.
(REFCOUNT_STRUCTELEM): New special refcount range for structure
element siblings.
(REFCOUNT_STRUCTELEM_P): Macro for testing for structure element
sibling maps.
(REFCOUNT_STRUCTELEM_FLAG_FIRST): Flag to indicate first sibling.
(REFCOUNT_STRUCTELEM_FLAG_LAST): Flag to indicate last sibling.
(REFCOUNT_STRUCTELEM_FIRST_P): Macro to test _FIRST flag.
(REFCOUNT_STRUCTELEM_LAST_P): Macro to test _LAST flag.
(struct splay_tree_key_s): Add structelem_refcount and
structelem_refcount_ptr fields into a union with dynamic_refcount.
Add comments.
(gomp_map_vars): Delete declaration.
(gomp_map_vars_async): Likewise.
(gomp_unmap_vars): Likewise.
(gomp_unmap_vars_async): Likewise.
(goacc_map_vars): New declaration.
(goacc_unmap_vars): Likewise.
* oacc-mem.c (acc_map_data): Adjust to use goacc_map_vars.
(goacc_enter_datum): Likewise.
(goacc_enter_data_internal): Likewise.
* oacc-parallel.c (GOACC_parallel_keyed): Adjust to use goacc_map_vars
and goacc_unmap_vars.
(GOACC_data_start): Adjust to use goacc_map_vars.
(GOACC_data_end): Adjust to use goacc_unmap_vars.
* target.c (hash_entry_type): New typedef.
(htab_alloc): New function hook for hashtab.h.
(htab_free): Likewise.
(htab_hash): Likewise.
(htab_eq): Likewise.
(hashtab.h): Add file include.
(gomp_increment_refcount): New function.
(gomp_decrement_refcount): Likewise.
(gomp_map_vars_existing): Add refcount_set parameter, adjust to use
gomp_increment_refcount.
(gomp_map_fields_existing): Add refcount_set parameter, adjust calls
to gomp_map_vars_existing.
(gomp_map_vars_internal): Add refcount_set parameter, add local openmp_p
variable to guard OpenMP specific paths, adjust calls to
gomp_map_vars_existing, add structure element sibling splay_tree_key
sequence creation code, adjust Fortran map case to avoid increment
under OpenMP.
(gomp_map_vars): Adjust to static, add refcount_set parameter, manage
local refcount_set if caller passed in NULL, adjust call to
gomp_map_vars_internal.
(gomp_map_vars_async): Adjust and rename into...
(goacc_map_vars): ...this new function, adjust call to
gomp_map_vars_internal.
(gomp_remove_splay_tree_key): New function with code factored out from
gomp_remove_var_internal.
(gomp_remove_var_internal): Add code to handle removing multiple
splay_tree_key sequence for structure elements, adjust code to use
gomp_remove_splay_tree_key for splay-tree key removal.
(gomp_unmap_vars_internal): Add refcount_set parameter, adjust to use
gomp_decrement_refcount.
(gomp_unmap_vars): Adjust to static, add refcount_set parameter, manage
local refcount_set if caller passed in NULL, adjust call to
gomp_unmap_vars_internal.
(gomp_unmap_vars_async): Adjust and rename into...
(goacc_unmap_vars): ...this new function, adjust call to
gomp_unmap_vars_internal.
(GOMP_target): Manage refcount_set and adjust calls to gomp_map_vars and
gomp_unmap_vars.
(GOMP_target_ext): Likewise.
(gomp_target_data_fallback): Adjust call to gomp_map_vars.
(GOMP_target_data): Likewise.
(GOMP_target_data_ext): Likewise.
(GOMP_target_end_data): Adjust call to gomp_unmap_vars.
(gomp_exit_data): Add refcount_set parameter, adjust to use
gomp_decrement_refcount, adjust to queue splay-tree keys for removal
after main loop.
(GOMP_target_enter_exit_data): Manage refcount_set and adjust calls to
gomp_map_vars and gomp_exit_data.
(gomp_target_task_fn): Likewise.
* testsuite/libgomp.c-c++-common/refcount-1.c: New testcase.
* testsuite/libgomp.c-c++-common/struct-elem-1.c: New testcase.
* testsuite/libgomp.c-c++-common/struct-elem-2.c: New testcase.
* testsuite/libgomp.c-c++-common/struct-elem-3.c: New testcase.
* testsuite/libgomp.c-c++-common/struct-elem-4.c: New testcase.
* testsuite/libgomp.c-c++-common/struct-elem-5.c: New testcase.
openmp: Fix intermittent hanging of task-detach-6 libgomp tests [PR98738]
This adds support for the task detach clause to taskwait, and fixes a
number of problems related to semaphores that may lead to a hang in
some circumstances.
PR libgomp/98738
* libgomp.h (enum gomp_task_kind): Add GOMP_TASK_DETACHED.
* task.c (task_fulfilled_p): Check detach field as well.
(GOMP_task): Add thread to debug messages. Use address of task as
the event handle.
(gomp_barrier_handle_tasks): Fix indentation. Use address of task
as event handle. Set kind of suspended detach task to
GOMP_TASK_DETACHED and decrement task_running_count. Move
finish_cancelled block out of else branch. Skip decrement of
task_running_count if task kind is GOMP_TASK_DETACHED.
(GOMP_taskwait): Finish fulfilled detach tasks. Update comment.
Queue detach tasks that have not been fulfilled.
(omp_fulfill_event): Use address of task as event handle. Post
to taskwait_sem and taskgroup_sem if necessary. Check
task_running_count before calling gomp_team_barrier_wake.
* testsuite/libgomp.c-c++-common/task-detach-5.c (main): Change
data-sharing of detach events on enclosing parallel to private.
* testsuite/libgomp.c-c++-common/task-detach-6.c (main): Likewise.
* testsuite/libgomp.fortran/task-detach-5.f90 (task_detach_5):
Likewise.
* testsuite/libgomp.fortran/task-detach-6.f90 (task_detach_6):
Likewise.
Jakub Jelinek [Wed, 20 Jan 2021 21:09:22 +0000 (22:09 +0100)]
libgomp: Fix up GOMP_task on s390x
On Wed, Jan 20, 2021 at 05:04:39PM +0100, Florian Weimer wrote:
> Sorry, this appears to cause OpenMP task state corruption in RPM. We
> have only seen this on s390x.
Haven't actually verified it, but my suspection is that this is a caller
stack corruption.
We play with fire with the GOMP_task API/ABI extensions, the GOMP_task
function used to be:
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
long arg_size, long arg_align, bool if_clause, unsigned flags);
and later:
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
long arg_size, long arg_align, bool if_clause, unsigned flags,
void **depend);
and later:
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
long arg_size, long arg_align, bool if_clause, unsigned flags,
void **depend, int priority);
and now:
void
GOMP_task (void (*fn) (void *), void *data, void (*cpyfn) (void *, void *),
long arg_size, long arg_align, bool if_clause, unsigned flags,
void **depend, int priority, void *detach)
and which of those depend, priority and detach argument is present depends
on the bits in flags.
I'm afraid the compiler just decided to spill the detach = NULL store in
if ((flags & GOMP_TASK_FLAG_DETACH) == 0)
detach = NULL;
on s390x into the argument stack slot. Not a problem if the caller passes
all those 10 arguments, but if not, can clobber random stack location.
This hack should fix it up. Priority doesn't need changing, but I've
changed it anyway just to be safe. With the patch none of the 3 arguments
are ever modified, so I'd hope gcc doesn't decide to spill something
unrelated there.
2021-01-20 Jakub Jelinek <jakub@redhat.com>
* task.c (GOMP_task): Rename priority argument to priority_arg,
add priority automatic variable and modify that variable. Instead of
clearing detach argument when GOMP_TASK_FLAG_DETACH bit is not set,
check flags for that bit.
Jakub Jelinek [Wed, 20 Jan 2021 07:35:20 +0000 (08:35 +0100)]
openmp: Don't ICE on detach clause with erroneous decl [PR98742]
Similarly to how we handle erroneous operands to e.g. allocate clause,
this change just removes those clauses instead of accessing TYPE_MAIN_VARIANT
of its type, which doesn't work on error_mark_node. Also, just for good
measure, bails out if TYPE_NAME is NULL.
2021-01-20 Jakub Jelinek <jakub@redhat.com>
PR c++/98742
* semantics.c (finish_omp_clauses) <case OMP_CLAUSE_DETACH>: If
error_operand_p, remove clause without further checking. Check
for non-NULL TYPE_NAME.
Jakub Jelinek [Fri, 18 Dec 2020 20:43:20 +0000 (21:43 +0100)]
openmp: Don't optimize shared to firstprivate on task with depend clause
The attached testcase is miscompiled, because we optimize shared clauses
to firstprivate when task body can't modify the variable even when the
task has depend clause. That is wrong, because firstprivate means the
variable will be copied immediately when the task is created, while with
depend clause some other task might change it later before the dependencies
are satisfied and the task should observe the value only after the change.
2020-12-18 Jakub Jelinek <jakub@redhat.com>
* gimplify.c (struct gimplify_omp_ctx): Add has_depend member.
(gimplify_scan_omp_clauses): Set it to true if OMP_CLAUSE_DEPEND
appears on OMP_TASK.
(gimplify_adjust_omp_clauses_1, gimplify_adjust_omp_clauses): Force
GOVD_WRITTEN on shared variables if task construct has depend clause.
Jakub Jelinek [Mon, 18 Jan 2021 06:18:46 +0000 (07:18 +0100)]
libgomp: Don't access gomp_sem_t as int using atomics unconditionally
This patch introduces gomp_sem_getcount wrapper, which uses sem_getvalue
for POSIX and atomic loads for linux futex and accel. rtems for now
remains broken.
2021-01-18 Jakub Jelinek <jakub@redhat.com>
* config/linux/sem.h (gomp_sem_getcount): New function.
* config/posix/sem.h (gomp_sem_getcount): New function.
* config/posix/sem.c (gomp_sem_getcount): New function.
* config/accel/sem.h (gomp_sem_getcount): New function.
* task.c (task_fulfilled_p): Use gomp_sem_getcount.
(omp_fulfill_event): Likewise.
gcc/c/
* c-parser.c (c_parser_omp_clause_detach): New.
(c_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH clause.
(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
* c-typeck.c (c_finish_omp_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH
clause. Prevent use of detach with mergeable and overriding the
data sharing mode of the event handle.
gcc/cp/
* parser.c (cp_parser_omp_clause_detach): New.
(cp_parser_omp_all_clauses): Handle PRAGMA_OMP_CLAUSE_DETACH.
(OMP_TASK_CLAUSE_MASK): Add mask for PRAGMA_OMP_CLAUSE_DETACH.
* pt.c (tsubst_omp_clauses): Handle OMP_CLAUSE_DETACH clause.
* semantics.c (finish_omp_clauses): Handle OMP_CLAUSE_DETACH clause.
Prevent use of detach with mergeable and overriding the data sharing
mode of the event handle.
gcc/fortran/
* dump-parse-tree.c (show_omp_clauses): Handle detach clause.
* frontend-passes.c (gfc_code_walker): Walk detach expression.
* gfortran.h (struct gfc_omp_clauses): Add detach field.
(gfc_c_intptr_kind): New.
* openmp.c (gfc_free_omp_clauses): Free detach clause.
(gfc_match_omp_detach): New.
(enum omp_mask1): Add OMP_CLAUSE_DETACH.
(enum omp_mask2): Remove OMP_CLAUSE_DETACH.
(gfc_match_omp_clauses): Handle OMP_CLAUSE_DETACH for OpenMP.
(OMP_TASK_CLAUSES): Add OMP_CLAUSE_DETACH.
(resolve_omp_clauses): Prevent use of detach with mergeable and
overriding the data sharing mode of the event handle.
* trans-openmp.c (gfc_trans_omp_clauses): Handle detach clause.
* trans-types.c (gfc_c_intptr_kind): New.
(gfc_init_kinds): Initialize gfc_c_intptr_kind.
* types.def
(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT): Rename
to...
(BT_FN_VOID_OMPFN_PTR_OMPCPYFN_LONG_LONG_BOOL_UINT_PTR_INT_PTR):
...this. Add extra argument.
Chung-Lin Tang [Thu, 21 Jan 2021 15:04:26 +0000 (23:04 +0800)]
Target mapping C++ members inside member functions
This is a merge of:
https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562467.html
This patch fixes some problems with target mapping when inside
C++ member functions:
1. Allow deref '->' in map clauses.
2. Allow this[X] in map clauses.
3. Create map(this->member) from map(member), when encountering
member's FIELD_DECL.
This may possibly reverted/updated when a final patch is approved
for mainline.
gcc/cp/ChangeLog:
* parser.c (cp_parser_omp_clause_map): Adjust call to
cp_parser_omp_var_list_no_open to set 'allow_deref' argument to true.
* semantics.c (handle_omp_array_sections_1): Add handling to create
'this->member' from 'member' FIELD_DECL.
(finish_omp_clauses): Likewise. Adjust to allow 'this[]' in OpenMP
map clauses.
gcc/testsuite/ChangeLog:
* g++.dg/gomp/target-3.C: New test.
* g++.dg/gomp/this-2.C: Adjust testcase.
Andrew Stubbs [Fri, 15 Jan 2021 11:26:46 +0000 (11:26 +0000)]
DWARF address space for variables
Add DWARF address class attributes for variables that exist outside the
generic address space. In particular, this is the case for gang-private
variables in OpenACC offload kernels.
gcc/ChangeLog:
* dwarf2out.c (add_location_or_const_value_attribute): Set
DW_AT_address_class, if appropriate.
Andrew Stubbs [Fri, 8 Jan 2021 13:56:52 +0000 (13:56 +0000)]
amdgcn: Fix DWARF variables with alloca
Require a frame pointer for entry functions that use alloca because it isn't
possible to encode the DWARF frame otherwise. Adjust the CFA definition
expressions accordingly.
gcc/ChangeLog:
* config/gcn/gcn.c (gcn_expand_prologue): Use the frame pointer for
the DWARF CFA, if it exists.
(gcn_frame_pointer_rqd): Require a frame pointer for entry functions
that use alloca.
Julian Brown [Wed, 25 Nov 2020 17:08:01 +0000 (09:08 -0800)]
[og10] vect: Add target hook to prefer gather/scatter instructions
For AMD GCN, the instructions available for loading/storing vectors are
always scatter/gather operations (i.e. there are separate addresses for
each vector lane), so the current heuristic to avoid gather/scatter
operations with too many elements in get_group_load_store_type is
counterproductive. Avoiding such operations in that function can
subsequently lead to a missed vectorization opportunity whereby later
analyses in the vectorizer try to use a very wide array type which is
not available on this target, and thus it bails out.
The attached patch adds a target hook to override the "single_element_p"
heuristic in the function as a target hook, and activates it for GCN. This
allows much better code to be generated for affected loops.
Julian Brown [Fri, 6 Nov 2020 23:17:29 +0000 (15:17 -0800)]
[og10] openacc: Adjust loop lowering for AMD GCN
This patch adjusts OpenACC loop lowering in the AMD GCN target compiler
in such a way that the autovectorizer can vectorize the "vector" dimension
of those loops in more cases.
Rather than generating "SIMT" code that executes a scalar instruction
stream for each lane of a vector in lockstep, for GCN we model the GPU
like a typical CPU, with separate instructions to operate on scalar and
vector data. That means that unlike other offload targets, we rely on
the autovectorizer to handle the innermost OpenACC parallelism level,
which is "vector".
Because of this, the OpenACC builtin functions to return the current
vector lane and the vector width return 0 and 1 respectively, despite
the native vector width being 64 elements wide.
This allows generated code to work with our chosen compilation model,
but the way loops are lowered in omp-offload.c:oacc_xform_loop does not
understand the discrepancy between logical (OpenACC) and physical vector
sizes correctly. That means that if a loop is partitioned over e.g. the
worker AND vector dimensions, we actually lower with unit vector size --
meaning that if we then autovectorize, we end up trying to vectorize
over the "worker" dimension rather than the vector one! Then, because
the number of workers is not fixed at compile time, that means the
autovectorizer has a hard time analysing the loop and thus vectorization
often fails entirely.
We can fix this by deducing the true vector width in oacc_xform_loop,
and using that when we are on a "non-SIMT" offload target. We can then
rearrange how loops are lowered in that function so that the loop form
fed to the autovectorizer is more amenable to vectorization -- namely,
the innermost step is set to process each loop iteration sequentially.
For some benchmarks, allowing vectorization to succeed leads to quite
impressive performance improvements -- I've observed between 2.5x and
40x on one machine/GPU combination.
The low-level builtins available to user code (__builtin_goacc_parlevel_id
and __builtin_goacc_parlevel_size) continue to return 0/1 respectively
for the vector dimension for AMD GCN, even if their containing loop is
vectorized -- that's a quirk that we might possibly want to address at
some later date.
Only non-"chunking" loops are handled at present. "Chunking" loops are
still lowered as before.
2021-01-13 Julian Brown <julian@codesourcery.com>
gcc/
* omp-offload.c (oacc_thread_numbers): Add VF_BY_VECTORIZER parameter.
Add overloaded wrapper for previous arguments & behaviour.
(oacc_xform_loop): Lower vector loops to iterate a multiple of
omp_max_vf times over contiguous steps on non-SIMT targets.
Julian Brown [Fri, 6 Nov 2020 22:53:29 +0000 (14:53 -0800)]
amdgcn: Fix exec register live-on-entry to BB in md-reorg
This patch fixes a corner case in the AMD GCN md-reorg pass when the
EXEC register is live on entry to a BB, and could be clobbered by code
inserted by the pass before a use in (e.g.) a different BB.
Backport from mainline:
2021-01-13 Julian Brown <julian@codesourcery.com>
gcc/
* config/gcn/gcn.c (gcn_md_reorg): Fix case where EXEC reg is live
on entry to a BB.
Julian Brown [Mon, 30 Nov 2020 19:10:04 +0000 (11:10 -0800)]
amdgcn: Improve FP division accuracy
GCN has a reciprocal-approximation instruction but no
hardware divide. This patch adjusts the open-coded reciprocal
approximation/Newton-Raphson refinement steps to use fused multiply-add
instructions as is necessary to obtain a properly-rounded result, and
adds further refinement steps to correctly round the full division result.
The patterns in question are still guarded by a flag_reciprocal_math
condition, and do not yet support denormals.
Backport from mainline:
2021-01-13 Julian Brown <julian@codesourcery.com>
gcc/
* config/gcn/gcn-valu.md (recip<mode>2<exec>, recip<mode>2): Use unspec
for reciprocal-approximation instructions.
(div<mode>3): Use fused multiply-accumulate operations for reciprocal
refinement and division result.
* config/gcn/gcn.md (UNSPEC_RCP): New unspec constant.
gcc/testsuite/
* gcc.target/gcn/fpdiv.c: New test.
Julian Brown [Mon, 30 Nov 2020 20:01:37 +0000 (12:01 -0800)]
amdgcn: Fix subdf3 pattern
This patch fixes a typo in the subdf3 pattern that meant it had a
non-standard name and thus the compiler would emit a libcall rather than
the proper hardware instruction for DFmode subtraction.
Chung-Lin Tang [Tue, 10 Nov 2020 11:36:58 +0000 (03:36 -0800)]
openmp: Implement OpenMP 5.0 base-pointer attachement and clause ordering
This patch implements some parts of the target variable mapping changes
specified in OpenMP 5.0, including base-pointer attachment/detachment
behavior for array section list-items in map clauses, and ordering of
map clauses according to map kind.
* c-common.h (c_omp_adjust_map_clauses): New declaration.
* c-omp.c (struct map_clause): Helper type for c_omp_adjust_map_clauses.
(c_omp_adjust_map_clauses): New function.
gcc/c/ChangeLog:
* c-parser.c (c_parser_omp_target_data): Add use of
new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as
handled map clause kind.
(c_parser_omp_target_enter_data): Likewise.
(c_parser_omp_target_exit_data): Likewise.
(c_parser_omp_target): Likewise.
* c-typeck.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type.
(c_finish_omp_clauses): Adjust bitmap checks to allow struct decl and
same struct field access to co-exist on OpenMP construct.
gcc/cp/ChangeLog:
* parser.c (cp_parser_omp_target_data): Add use of
new c_omp_adjust_map_clauses function. Add GOMP_MAP_ATTACH_DETACH as
handled map clause kind.
(cp_parser_omp_target_enter_data): Likewise.
(cp_parser_omp_target_exit_data): Likewise.
(cp_parser_omp_target): Likewise.
* semantics.c (handle_omp_array_sections): Adjust COMPONENT_REF case to
use GOMP_MAP_ATTACH_DETACH map kind for C_ORT_OMP region type. Fix
interaction between reference case and attach/detach.
(finish_omp_clauses): Adjust bitmap checks to allow struct decl and
same struct field access to co-exist on OpenMP construct.
gcc/ChangeLog:
* gimplify.c (is_or_contains_p): New static helper function.
(omp_target_reorder_clauses): New function.
(gimplify_scan_omp_clauses): Add use of omp_target_reorder_clauses to
reorder clause list according to OpenMP 5.0 rules. Add handling of
GOMP_MAP_ATTACH_DETACH for OpenMP cases.
* omp-low.c (is_omp_target): New static helper function.
(scan_sharing_clauses): Add scan phase handling of GOMP_MAP_ATTACH/DETACH
for OpenMP cases.
(lower_omp_target): Add lowering handling of GOMP_MAP_ATTACH/DETACH for
OpenMP cases.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/clauses-2.c: Remove dg-error cases now valid.
* gfortran.dg/gomp/map-2.f90: Likewise.
* c-c++-common/gomp/map-5.c: New testcase.
libgomp/ChangeLog:
* libgomp.h (enum gomp_map_vars_kind): Adjust enum values to be bit-flag
usable.
* oacc-mem.c (acc_map_data): Adjust gomp_map_vars argument flags to
'GOMP_MAP_VARS_OPENACC | GOMP_MAP_VARS_ENTER_DATA'.
(goacc_enter_datum): Likewise for call to gomp_map_vars_async.
(goacc_enter_data_internal): Likewise.
* target.c (gomp_map_vars_internal):
Change checks of GOMP_MAP_VARS_ENTER_DATA to use bit-and (&). Adjust use
of gomp_attach_pointer for OpenMP cases.
(gomp_exit_data): Add handling of GOMP_MAP_DETACH.
(GOMP_target_enter_exit_data): Add handling of GOMP_MAP_ATTACH.
* testsuite/libgomp.c-c++-common/ptr-attach-1.c: New testcase.
This removes the nest-var ICV, expressing nesting in terms of the
max-active-levels-var ICV instead. The max-active-levels-var ICV
is now per data environment rather than per device.
Jakub Jelinek [Thu, 22 Oct 2020 07:31:01 +0000 (09:31 +0200)]
openmp: Change omp_get_initial_device () to match OpenMP 5.1 requirements
> Therefore, I think until omp_get_initial_device () value is changed, we
The following so far untested patch implements that change.
OpenMP 4.5 said for omp_get_initial_device:
The value of the device number is implementation defined. If it is between 0 and one less than
omp_get_num_devices() then it is valid for use with all device constructs and routines; if it is
outside that range, then it is only valid for use with the device memory routines and not in the
device clause.
and OpenMP 5.0 similarly, but OpenMP 5.1 says:
The value of the device number is the value returned by the omp_get_num_devices routine.
As the new value is compatible with what has been required earlier, I think
we can change it already now.
* icv.c (omp_get_initial_device): Remove including corresponding
ialias.
* icv-device.c (omp_get_initial_device): New function. Return
gomp_get_num_devices (). Add ialias.
* target.c (resolve_device): Don't fail with
OMP_TARGET_OFFLOAD=mandatory if device_id is equal to
gomp_get_num_devices ().
(omp_target_alloc, omp_target_free, omp_target_is_present,
omp_target_memcpy, omp_target_memcpy_rect, omp_target_associate_ptr,
omp_target_disassociate_ptr, omp_pause_resource): Use
gomp_get_num_devices () instead of GOMP_DEVICE_HOST_FALLBACK on the
first use in the functions, in uses dominated by the
gomp_get_num_devices call use num_devices_openmp instead.
* libgomp.texi (omp_get_initial_device): Document.
* config/gcn/icv-device.c (omp_get_initial_device): New function.
Add ialias.
* config/nvptx/icv-device.c (omp_get_initial_device): Likewise.
* testsuite/libgomp.c/target-40.c: New test.
openmp: Implement support for OMP_TARGET_OFFLOAD environment variable
This implements support for the OMP_TARGET_OFFLOAD environment variable
introduced in the OpenMP 5.0 standard, which controls how offloading
is handled. It may be set to MANDATORY (abort if offloading cannot be
performed), DISABLED (no offloading to devices) or DEFAULT (offload to
device if possible, fall back to host if not).
2020-10-20 Kwok Cheung Yeung <kcy@codesourcery.com>
Jakub Jelinek <jakub@redhat.com>
libgomp/
* env.c (gomp_target_offload_var): New.
(parse_target_offload): New.
(handle_omp_display_env): Print value of OMP_TARGET_OFFLOAD.
(initialize_env): Parse OMP_TARGET_OFFLOAD.
* libgomp.h (gomp_target_offload_t): New.
(gomp_target_offload_var): New.
* libgomp.texi (OMP_TARGET_OFFLOAD): New section.
* target.c (resolve_device): Generate error if device not found and
offloading is mandatory.
(gomp_target_fallback): Generate error if offloading is mandatory.
(GOMP_target): Add argument in call to gomp_target_fallback.
(GOMP_target_ext): Likewise.
(gomp_target_data_fallback): Generate error if offloading is mandatory.
(GOMP_target_data): Add argument in call to gomp_target_data_fallback.
(GOMP_target_data_ext): Likewise.
(gomp_target_task_fn): Add argument in call to gomp_target_fallback.
(gomp_target_init): Return early if offloading is disabled.
Inside of the function, use automatic
variables corresponding to num_devices, num_devices_openmp and devices
global variables and update the globals only at the end of the
function.
openmp: Add support for the omp_get_supported_active_levels runtime library routine
This patch implements the omp_get_supported_active_levels runtime routine
from the OpenMP 5.0 specification, which returns the maximum number of
active nested parallel regions supported by this implementation. The
current maximum (set using the omp_set_max_active_levels routine or the
OMP_MAX_ACTIVE_LEVELS environment variable) cannot exceed this number.
vtrn_half.c:76:17: error: redeclaration of 'vector_float64x2' with no linkage
vtrn_half.c:77:17: error: redeclaration of 'vector2_float64x2' with no linkage
vtrn_half.c:80:17: error: redeclaration of 'vector_res_float64x2' with no linkage
This is because r11-3402 now always declares float64x2 variables for
aarch64, leading to a duplicate declaration in these testcases.
The fix is simply to remove these now useless declarations.
These tests are skipped on arm*, so there is no impact on that target.
This patch implements the missing reinterprets to and from poly128_t and
float64x2_t.
I've plugged in the appropriate testing in the advsimd-intrinsics.exp
too.
Bootstrapped and tested on aarch64-none-linux-gnu.
Tested advsimd-intrinsics.exp on arm-none-eabi too to make sure arm
testing isn't affected.
This patch implements the missing vrndns_f32 intrinsic. This operates on a scalar float32_t value.
It can be mapped down to a __builtin_aarch64_frintnsf builtin.
This patch does that.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
PR target/71233
* config/aarch64/aarch64-simd-builtins.def (frintn): Use BUILTIN_VHSDF_HSDF
for modes. Remove explicit hf instantiation.
* config/aarch64/arm_neon.h (vrndns_f32): Define.
gcc/testsuite/
PR target/71233
* gcc.target/aarch64/simd/vrndns_f32_1.c: New test.
AArch64: Implement missing _p64 intrinsics for vector permutes
This patch implements some missing vector permute intrinsics operating on poly64x2_t types.
They are implemented identically to their uint64x2_t brethren.
Bootstrapped and tested on aarch64-none-linux-gnu.
This patch implements some missing vceq* intrinsics on poly types.
The behaviour is to produce the appropriate CMEQ instruction as for the unsigned types.
Bootstrapped and tested on aarch64-none-linux-gnu.
Eric Botcazou [Mon, 28 Sep 2020 07:00:46 +0000 (09:00 +0200)]
Fix bogus alignment warning on address clause
The compiler gives a bogus alignment warning on an address clause and
a discriminated record type with variable size.
gcc/ada/ChangeLog:
* gcc-interface/decl.c (maybe_saturate_size): Add ALIGN parameter
and round down the result to ALIGN.
(gnat_to_gnu_entity): Adjust calls to maybe_saturate_size.
gcc/testsuite/ChangeLog:
* gnat.dg/addr16.adb: New test.
* gnat.dg/addr16_pkg.ads: New helper.
Jakub Jelinek [Sun, 27 Sep 2020 21:18:26 +0000 (23:18 +0200)]
optabs: Don't reuse target for multi-word expansions if it overlaps operand(s) [PR97073]
The following testcase is miscompiled on i686-linux, because
we try to expand a double-word bitwise logic operation with op0
being a (mem:DI u) and target (mem:DI u+4), i.e. partial overlap, and
thus end up with:
movl 4(%esp), %eax
andl u, %eax
movl %eax, u+4
! movl u+4, %eax optimized out
andl 8(%esp), %eax
movl %eax, u+8
rather than with the desired:
movl 4(%esp), %edx
movl 8(%esp), %eax
andl u, %edx
andl u+4, %eax
movl %eax, u+8
movl %edx, u+4
because the store of the first word to target overwrites the second word of
the operand.
expand_binop for this (and several similar places) already check for target
== op0 or target == op1, this patch just adds reg_overlap_mentioned_p calls
next to it.
Pedantically, at least for some of these it might be sufficient to force
a different target if there is overlap but target is not rtx_equal_p to
the operand (e.g. in this bitwise logical case, but e.g. not in the shift
cases where there is reordering), though that would go against the
preexisting target == op? checks and the rationale that REG_EQUAL notes in
that case isn't correct.
2020-09-27 Jakub Jelinek <jakub@redhat.com>
PR middle-end/97073
* optabs.c (expand_binop, expand_absneg_bit, expand_unop,
expand_copysign_bit): Check reg_overlap_mentioned_p between target
and operand(s) and if it returns true, force a pseudo as target.
Mark Eggleston [Thu, 11 Jun 2020 13:33:51 +0000 (14:33 +0100)]
Fortran : ICE in build_field PR95614
Local identifiers can not be the same as a module name. Original
patch by Steve Kargl resulted in name clashes between common block
names and local identifiers. A local identifier can be the same as
a global identier if that identifier represents a common. The patch
was modified to allow global identifiers that represent a common
block.
2020-09-27 Steven G. Kargl <kargl@gcc.gnu.org>
Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/fortran/
PR fortran/95614
* decl.c (gfc_get_common): Use gfc_match_common_name instead
of match_common_name.
* decl.c (gfc_bind_idents): Use gfc_match_common_name instead
of match_common_name.
* match.c : Rename match_common_name to gfc_match_common_name.
* match.c (gfc_match_common): Use gfc_match_common_name instead
of match_common_name.
* match.h : Rename match_common_name to gfc_match_common_name.
* resolve.c (resolve_common_vars): Check each symbol in a
common block has a global symbol. If there is a global symbol
issue an error if the symbol type is known as is not a common
block name.
2020-09-27 Mark Eggleston <markeggleston@gcc.gnu.org>
gcc/testsuite/
PR fortran/95614
* gfortran.dg/pr95614_1.f90: New test.
* gfortran.dg/pr95614_2.f90: New test.
Add processing STRICT_LOW_PART for matched reloads.
2020-06-04 Vladimir Makarov <vmakarov@redhat.com>
PR middle-end/95464
* lra.c (lra_emit_move): Add processing STRICT_LOW_PART.
* lra-constraints.c (match_reload): Use STRICT_LOW_PART in output
reload if the original insn has it too.
Joe Ramsay [Wed, 19 Aug 2020 12:34:06 +0000 (12:34 +0000)]
arm: Require MVE memory operand for destination of vst1q intrinsic
Previously, the machine description patterns for vst1q accepted a generic memory
operand for the destination, which could lead to an unrecognised builtin when
expanding vst1q* intrinsics. This change fixes the pattern to only accept MVE
memory operands.
PR target/96683
* gcc.target/arm/mve/intrinsics/vst1q_f16.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_s16.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_s8.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_u16.c: New test.
* gcc.target/arm/mve/intrinsics/vst1q_u8.c: New test.
H.J. Lu [Mon, 14 Sep 2020 15:52:27 +0000 (08:52 -0700)]
rtl_data: Add sp_is_clobbered_by_asm
Add sp_is_clobbered_by_asm to rtl_data to inform backends that the stack
pointer is clobbered by asm statement.
gcc/
PR target/97032
* cfgexpand.c (asm_clobber_reg_kind): Set sp_is_clobbered_by_asm
to true if the stack pointer is clobbered by asm statement.
* emit-rtl.h (rtl_data): Add sp_is_clobbered_by_asm.
* config/i386/i386.c (ix86_get_drap_rtx): Set need_drap to true
if the stack pointer is clobbered by asm statement.
gcc/testsuite/
PR target/97032
* gcc.target/i386/pr97032.c: New test.
Alan Modra [Fri, 18 Sep 2020 13:51:05 +0000 (23:21 +0930)]
[RS6000] Power10 libffi fixes
Power10 pc-relative code doesn't use or preserve r2 as a TOC pointer.
That means calling between pc-relative and TOC using code can't be
done without intervening linker stubs, and a call from TOC code to
pc-relative code must have a nop after the bl in order to restore r2.
Now the PowerPC libffi assembly code doesn't use r2 except for the
implicit use when making calls back to C, ffi_closure_helper_LINUX64
and ffi_prep_args64. So changing the assembly to interoperate with
pc-relative code without stubs is easily done.
PR target/97166
* src/powerpc/linux64.S (ffi_call_LINUX64): Don't emit global
entry when __PCREL__. Call using @notoc. Add nops.
* src/powerpc/linux64_closure.S (ffi_closure_LINUX64): Likewise.
(ffi_go_closure_linux64): Likewise.
Jonathan Wakely [Tue, 22 Sep 2020 19:02:58 +0000 (20:02 +0100)]
libstdc++: Fix out-of-bounds string_view access in filesystem::path [PR 97167]
libstdc++-v3/ChangeLog:
PR libstdc++/97167
* src/c++17/fs_path.cc (path::_Parser::root_path()): Check
for empty string before inspecting the first character.
* testsuite/27_io/filesystem/path/append/source.cc: Append
empty string_view to path.
David Faust [Tue, 22 Sep 2020 18:31:35 +0000 (20:31 +0200)]
bpf: use xBPF signed div, mod insns when available
The 'mod' and 'div' operators in eBPF are unsigned, with no signed
counterpart. xBPF adds two new ALU operations, sdiv and smod, for
signed division and modulus, respectively. Update bpf.md with
'define_insn' blocks for signed div and mod to use them when targetting
xBPF, and add new tests to ensure they are used appropriately.
2020-09-17 David Faust <david.faust@oracle.com>
gcc/
* config/bpf/bpf.md: Add defines for signed div and mod operators.
gcc/testsuite/
* gcc.target/bpf/diag-sdiv.c: New test.
* gcc.target/bpf/diag-smod.c: New test.
* gcc.target/bpf/xbpf-sdiv-1.c: New test.
* gcc.target/bpf/xbpf-smod-1.c: New test.
Jonathan Wakely [Tue, 22 Sep 2020 07:42:18 +0000 (08:42 +0100)]
libstdc++: Use correct argument type for __use_alloc, again [PR 96803]
While backporting 5494edae83ad33c769bd1ebc98f0c492453a6417 I noticed
that it's still not correct. I made the allocator-extended constructor
use the right type for the uses-allocator construction detection, but I
used an rvalue when it should be a const lvalue.
This should fix it properly this time.
libstdc++-v3/ChangeLog:
PR libstdc++/96803
* include/std/tuple
(_Tuple_impl(allocator_arg_t, Alloc, const _Tuple_impl<U...>&)):
Use correct value category in __use_alloc call.
* testsuite/20_util/tuple/cons/96803.cc: Check with constructors
that require correct value category to be used.
Jonathan Wakely [Wed, 26 Aug 2020 18:32:30 +0000 (19:32 +0100)]
libstdc++: Use correct argument type for __use_alloc [PR 96803]
The _Tuple_impl constructor for allocator-extended construction from a
different tuple type uses the _Tuple_impl's own _Head type in the
__use_alloc test. That is incorrect, because the argument tuple could
have a different type. Using the wrong type might select the
leading-allocator convention when it should use the trailing-allocator
convention, or vice versa.
libstdc++-v3/ChangeLog:
PR libstdc++/96803
* include/std/tuple
(_Tuple_impl(allocator_arg_t, Alloc, const _Tuple_impl<U...>&)):
Replace parameter pack with a type parameter and a pack and pass
the first type to __use_alloc.
* testsuite/20_util/tuple/cons/96803.cc: New test.