Jakub Jelinek [Fri, 28 May 2021 09:26:48 +0000 (11:26 +0200)]
openmp: Fix up handling of reduction clause on constructs combined with target [PR99928]
The reduction clause should be copied as map (tofrom: ) to combined target if
present, but as we need different handling of array sections between map and reduction,
doing that during gimplification would be harder.
So, this patch adds them during splitting, and similarly to firstprivate adds them
with a new flag that they should be just ignored/removed if an explicit map clause
of the same list item is present.
The exact rules are to be decided in https://github.com/OpenMP/spec/issues/2766
so this patch just implements something that is IMHO reasonable and exact detailed
testcases for the cornercases will follow once it is clarified.
2021-05-28 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
gcc/
* tree.h (OMP_CLAUSE_MAP_IMPLICIT): Define.
gcc/c-family/
* c-omp.c (c_omp_split_clauses): For reduction clause if combined with
target add a map tofrom clause with OMP_CLAUSE_MAP_IMPLICIT.
gcc/c/
* c-typeck.c (handle_omp_array_sections): Copy OMP_CLAUSE_MAP_IMPLICIT.
(c_finish_omp_clauses): Move not just OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT
marked clauses last, but also OMP_CLAUSE_MAP_IMPLICIT. Add
map_firstprivate_head bitmap, set it for GOMP_MAP_FIRSTPRIVATE_POINTER
maps and silently remove OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT if it is
present too. For OMP_CLAUSE_MAP_IMPLICIT silently remove the clause
if present in map_head, map_field_head or map_firstprivate_head
bitmaps.
gcc/cp/
* semantics.c (handle_omp_array_sections): Copy
OMP_CLAUSE_MAP_IMPLICIT.
(finish_omp_clauses): Move not just OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT
marked clauses last, but also OMP_CLAUSE_MAP_IMPLICIT. Add
map_firstprivate_head bitmap, set it for GOMP_MAP_FIRSTPRIVATE_POINTER
maps and silently remove OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT if it is
present too. For OMP_CLAUSE_MAP_IMPLICIT silently remove the clause
if present in map_head, map_field_head or map_firstprivate_head
bitmaps.
gcc/testsuite/
* c-c++-common/gomp/pr99928-8.c: Remove all xfails.
* c-c++-common/gomp/pr99928-9.c: Likewise.
* c-c++-common/gomp/pr99928-10.c: Likewise.
* c-c++-common/gomp/pr99928-16.c: New test.
* testsuite/libgomp.fortran/depend-iterator-2.f90: New test.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/affinity-1.c: New test.
* c-c++-common/gomp/affinity-2.c: New test.
* c-c++-common/gomp/affinity-3.c: New test.
* c-c++-common/gomp/affinity-4.c: New test.
* c-c++-common/gomp/affinity-5.c: New test.
* c-c++-common/gomp/affinity-6.c: New test.
* c-c++-common/gomp/affinity-7.c: New test.
* gfortran.dg/gomp/affinity-clause-1.f90: New test.
* gfortran.dg/gomp/affinity-clause-2.f90: New test.
* gfortran.dg/gomp/affinity-clause-3.f90: New test.
* gfortran.dg/gomp/affinity-clause-4.f90: New test.
* gfortran.dg/gomp/affinity-clause-5.f90: New test.
* gfortran.dg/gomp/affinity-clause-6.f90: New test.
* gfortran.dg/gomp/depend-iterator-1.f90: New test.
* gfortran.dg/gomp/depend-iterator-2.f90: New test.
* gfortran.dg/gomp/depend-iterator-3.f90: New test.
* gfortran.dg/gomp/taskwait.f90: New test.
Jakub Jelinek [Fri, 28 May 2021 08:08:56 +0000 (10:08 +0200)]
libgomp: Add openacc_{cuda,cublas,cudart} effective targets and use them in openacc testsuite
When gcc is configured for nvptx offloading with --without-cuda-driver
and full CUDA isn't installed, many libgomp.oacc-*/* tests fail,
some of them because cuda.h header can't be found, others because
the tests can't be linked against -lcuda, -lcudart or -lcublas.
I usually only have akmod-nvidia and xorg-x11-drv-nvidia-cuda rpms
installed, so libcuda.so.1 can be dlopened and the offloading works,
but linking against those libraries isn't possible nor are the
headers around (for the plugin itself there is the fallback
libgomp/plugin/cuda/cuda.h).
The following patch adds 3 new effective targets and uses them in tests that
needs those.
Richard Earnshaw [Thu, 27 May 2021 09:25:37 +0000 (10:25 +0100)]
arm: Remove use of opts_set in arm_configure_build_target [PR100767]
The variable global_options_set is a reflection of which options have
been explicitly set from the command line in the structure
global_options. But it doesn't describe the contents of a
cl_target_option. cl_target_option is a set of options to apply and
once configured should represent a viable set of options without
needing to know which were explicitly set by the user.
Unfortunately arm_configure_build_target was incorrectly conflating
the two. Fortunately, however, we do not really need to know this
since the various override_options functions should have sanitized the
target_options values before constructing a cl_target_option
structure. It is safe, therefore, to simply drop this parameter to
arm_configure_build_target and rely on checking that various string
parameters are non-null before dereferencing them.
Alex Coplan [Tue, 11 May 2021 12:11:09 +0000 (13:11 +0100)]
arm: Avoid emitting bogus CFA adjusts for CMSE nonsecure calls [PR99725]
The PR shows us attaching REG_CFA_ADJUST_CFA notes to stack pointer
adjustments emitted in cmse_nonsecure_call_inline_register_clear (when
-march=armv8.1-m.main). However, the stack pointer is not guaranteed to
be the CFA reg. If we're at -O0 or we have -fno-omit-frame-pointer, then
the frame pointer will be used as the CFA reg, and these notes on the sp
adjustments will lead to ICEs in dwarf2out_frame_debug_adjust_cfa.
This patch avoids emitting these notes if the current function has a
frame pointer.
gcc/ChangeLog:
PR target/99725
* config/arm/arm.c (cmse_nonsecure_call_inline_register_clear):
Avoid emitting CFA adjusts on the sp if we have the fp.
gcc/testsuite/ChangeLog:
PR target/99725
* gcc.target/arm/cmse/pr99725.c: New test.
Jakub Jelinek [Wed, 26 May 2021 13:15:19 +0000 (15:15 +0200)]
openmp: Fix up handling of target constructs in offloaded routines [PR100573]
OpenMP Nesting of Regions restrictions say:
- If a target update, target data, target enter data, or target exit data
construct is encountered during execution of a target region, the behavior is unspecified.
- If a target construct is encountered during execution of a target region and a device
clause in which the ancestor device-modifier appears is not present on the construct, the
behavior is unspecified.
That wording is about the dynamic (runtime) behavior, not about lexical nesting,
so while it is UB if omp target * is encountered in the target region, we need to make
it compile and link (for lexical nesting of target * inside of target we actually
emit a warning).
To make this work, I had to do multiple changes.
One was to mark .omp_data_{sizes,kinds}.* variables when static as "omp declare target".
Another one was to add stub GOMP_target* entrypoints to nvptx and gcn libgomp.a.
The entrypoint functions shouldn't be called or passed in the offload regions,
otherwise
libgomp: cuLaunchKernel error: too many resources requested for launch
was reported; fixed by changing those arguments of calls to GOMP_target_ext
to NULL.
And we didn't mark the entrypoints "omp target entrypoint" when the caller
has been "omp declare target".
2021-05-26 Jakub Jelinek <jakub@redhat.com>
PR libgomp/100573
gcc/
* omp-low.c: Include omp-offload.h.
(create_omp_child_function): If current_function_decl has
"omp declare target" attribute and is_gimple_omp_offloaded,
remove that attribute from the copy of attribute list and
add "omp target entrypoint" attribute instead.
(lower_omp_target): Mark .omp_data_sizes.* and .omp_data_kinds.*
variables for offloading if in omp_maybe_offloaded_ctx.
* omp-offload.c (pass_omp_target_link::execute): Nullify second
argument to GOMP_target_data_ext in offloaded code.
libgomp/
* config/nvptx/target.c (GOMP_target_ext, GOMP_target_data_ext,
GOMP_target_end_data, GOMP_target_update_ext,
GOMP_target_enter_exit_data): New dummy entrypoints.
* config/gcn/target.c (GOMP_target_ext, GOMP_target_data_ext,
GOMP_target_end_data, GOMP_target_update_ext,
GOMP_target_enter_exit_data): Likewise.
* testsuite/libgomp.c-c++-common/for-3.c (DO_PRAGMA, OMPTEAMS,
OMPFROM, OMPTO): Define.
(main): Remove #pragma omp target teams around all the tests.
* testsuite/libgomp.c-c++-common/target-41.c: New test.
* testsuite/libgomp.c-c++-common/target-42.c: New test.
Harald Anlauf [Sun, 23 May 2021 18:51:14 +0000 (20:51 +0200)]
Fortran: fix passing return value to class(*) dummy argument
gcc/fortran/ChangeLog:
PR fortran/100551
* trans-expr.c (gfc_conv_procedure_call): Adjust check for
implicit conversion of actual argument to an unlimited polymorphic
procedure argument.
gcc/testsuite/ChangeLog:
PR fortran/100551
* gfortran.dg/pr100551.f90: New test.
Uros Bizjak [Tue, 18 May 2021 13:45:54 +0000 (15:45 +0200)]
i386: Fix split_double_mode with paradoxical subreg [PR100626]
split_double_mode calls simplify_gen_subreg, which fails for the
high half of the paradoxical subreg. Return temporary register
instead of NULL RTX in this case.
2021-05-18 Uroš Bizjak <ubizjak@gmail.com>
gcc/
PR target/100626
* config/i386/i386-expand.c (split_double_mode): Return
temporary register when simplify_gen_subreg fails with
the high half od the paradoxical subreg.
Jakub Jelinek [Tue, 25 May 2021 10:45:15 +0000 (12:45 +0200)]
openmp: Fix reduction clause handling on teams distribute simd [PR99928]
When a directive isn't combined with worksharing-loop, it takes much
simpler clause splitting path for reduction, and that one was missing
handling of teams when combined with simd.
2021-05-25 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Copy reduction to teams when teams is
combined with simd and not with taskloop or for.
gcc/testsuite/
* c-c++-common/gomp/pr99928-8.c: Remove xfails from omp teams r21 and
r28 checks.
* c-c++-common/gomp/pr99928-9.c: Likewise.
* c-c++-common/gomp/pr99928-10.c: Likewise.
libgomp/
* testsuite/libgomp.c-c++-common/reduction-17.c: New test.
This splits can_associate_p into checks for SSA defs and checks
for the type so it can be called from is_reassociable_op to
catch cases not catched by the earlier fix.
2021-05-11 Richard Biener <rguenther@suse.de>
PR tree-optimization/100519
* tree-ssa-reassoc.c (can_associate_p): Split into...
(can_associate_op_p): ... this
(can_associate_type_p): ... and this.
(is_reassociable_op): Call can_associate_op_p.
(break_up_subtract_bb): Call the appropriate predicates.
(reassociate_bb): Likewise.
Richard Biener [Tue, 11 May 2021 11:23:45 +0000 (13:23 +0200)]
ipa/100513 - fix SSA_NAME_DEF_STMT corruption in IPA param manip
This fixes unintended clobbering of SSA_NAME_DEF_STMT of the
cloned/inlined from SSA name during IPA parameter manipulation
of call stmt LHSs. gimple_call_set_lhs adjusts SSA_NAME_DEF_STMT
of the lhs to the stmt being modified but when
ipa_param_body_adjustments::modify_call_stmt is called the
cloning/inlining process has not yet remapped the stmts operands
to the copy variants but they are still original.
2021-05-11 Richard Biener <rguenther@suse.de>
PR ipa/100513
* ipa-param-manipulation.c
(ipa_param_body_adjustments::modify_call_stmt): Avoid
altering SSA_NAME_DEF_STMT by adjusting the calls LHS
via gimple_call_lhs_ptr.
Richard Biener [Tue, 11 May 2021 08:58:35 +0000 (10:58 +0200)]
middle-end/100509 - avoid folding constant to aggregate type
When folding a constant initializer looking through aliases to
incompatible types can lead to us trying to fold a constant
to an aggregate type which can't work. Simply avoid trying
to constant fold non-register typed symbols.
2021-05-11 Richard Biener <rguenther@suse.de>
PR middle-end/100509
* gimple-fold.c (fold_gimple_assign): Only call
get_symbol_constant_value on register type symbols.
Richard Biener [Mon, 10 May 2021 09:37:27 +0000 (11:37 +0200)]
tree-optimization/100492 - avoid irreducible regions in loop distribution
When we distribute away a condition we rely on the ability to
change it to either 1 != 0 or 0 != 0 depending on the direction
of the exit branch in the respective loop. But when the loop
contains an irreducible sub-region then for the conditions inside
this this fails and can lead to infinite loops being generated.
Avoid distibuting loops with irreducible sub-regions.
2021-05-10 Richard Biener <rguenther@suse.de>
PR tree-optimization/100492
* tree-loop-distribution.c (find_seed_stmts_for_distribution):
Find nothing when the loop contains an irreducible region.
PR fortran/86470
* testsuite/libgomp.fortran/class-firstprivate-1.f90: New test.
* testsuite/libgomp.fortran/class-firstprivate-2.f90: New test.
* testsuite/libgomp.fortran/class-firstprivate-3.f90: New test.
gcc/testsuite/ChangeLog:
PR fortran/86470
* gfortran.dg/gomp/class-firstprivate-1.f90: New test.
* gfortran.dg/gomp/class-firstprivate-2.f90: New test.
* gfortran.dg/gomp/class-firstprivate-3.f90: New test.
* gfortran.dg/gomp/class-firstprivate-4.f90: New test.
Alex Coplan [Mon, 10 May 2021 08:46:45 +0000 (09:46 +0100)]
arm: Fix wrong code with MVE V2DImode loads and stores [PR99960]
As the PR shows, we currently miscompile V2DImode loads and stores for
MVE. We're currently using 64-bit loads/stores, but need to be using
128-bit vector loads and stores. Fixed thusly.
Some intrinsics tests were checking that we (incorrectly) used the
64-bit loads/stores: these have been updated.
gcc/ChangeLog:
PR target/99960
* config/arm/mve.md (*mve_mov<mode>): Simplify output code. Use
vldrw.u32 and vstrw.32 for V2D[IF]mode loads and stores.
gcc/testsuite/ChangeLog:
PR target/99960
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c:
Update now that we're (correctly) using full 128-bit vector
loads/stores.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c:
Likewise.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c:
Likewise.
* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c:
Likewise.
* gcc.target/arm/mve/intrinsics/vuninitializedq_int.c: Likewise.
* gcc.target/arm/mve/intrinsics/vuninitializedq_int1.c:
Likewise.
Jakub Jelinek [Sun, 23 May 2021 11:18:10 +0000 (13:18 +0200)]
openmp: Fix up firstprivate+lastprivate clause handling [PR99928]
The C/C++ clause splitting happens very early during construct parsing,
but only the FEs later on handle possible instantiations, non-static
member handling and array section lowering.
In the OpenMP 5.0/5.1 rules, whether firstprivate is added to combined
target depends on whether it isn't also mentioned in lastprivate or map
clauses, but unfortunately I think such checks are much better done only
when the FEs perform all the above mentioned changes.
So, this patch arranges for the firstprivate clause to be copied or moved
to combined target construct (as before), but sets flags on that clause,
which tell the FE *finish_omp_clauses and the gimplifier it has been added
only conditionally and let the FEs and gimplifier DTRT for these.
2021-05-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
gcc/
* tree.h (OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET): Define.
* gimplify.c (enum gimplify_omp_var_data): Fix up
GOVD_MAP_HAS_ATTACHMENTS value, add GOVD_FIRSTPRIVATE_IMPLICIT.
(omp_lastprivate_for_combined_outer_constructs): If combined target
has GOVD_FIRSTPRIVATE_IMPLICIT set for the decl, change it to
GOVD_MAP | GOVD_SEEN.
(gimplify_scan_omp_clauses): Set GOVD_FIRSTPRIVATE_IMPLICIT for
firstprivate clauses with OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT.
(gimplify_adjust_omp_clauses): For firstprivate clauses with
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT either clear that bit and
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET too, or remove it and
let it be replaced by implicit map clause.
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Set OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT
on firstprivate clause copy going to target construct, and for
target simd set also OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET bit.
gcc/c/
* c-typeck.c (c_finish_omp_clauses): Move firstprivate clauses with
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT to the end of the chain. Don't error
if a decl is mentioned both in map clause and in such firstprivate
clause unless OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET is also set.
gcc/cp/
* semantics.c (finish_omp_clauses): Move firstprivate clauses with
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT to the end of the chain. Don't error
if a decl is mentioned both in map clause and in such firstprivate
clause unless OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET is also set.
gcc/testsuite/
* c-c++-common/gomp/pr99928-3.c: Remove all xfails.
* c-c++-common/gomp/pr99928-15.c: New test.
Jakub Jelinek [Sun, 23 May 2021 11:09:26 +0000 (13:09 +0200)]
openmp: Fix up handling of implicit lastprivate on outer constructs for implicit linear and lastprivate IVs [PR99928]
This patch fixes the handling of lastprivate propagation to outer combined/composite
leaf constructs from implicit linear or lastprivate clauses on simd IVs and adds missing
testsuite coverage for explicit and implicit lastprivate on simd IVs.
2021-05-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
* gimplify.c (omp_lastprivate_for_combined_outer_constructs): New
function.
(gimplify_scan_omp_clauses) <case OMP_CLAUSE_LASTPRIVATE>: Use it.
(gimplify_omp_for): Likewise.
* c-c++-common/gomp/pr99928-6.c: Remove all xfails.
* c-c++-common/gomp/pr99928-13.c: New test.
* c-c++-common/gomp/pr99928-14.c: New test.
Eric Botcazou [Fri, 21 May 2021 08:57:02 +0000 (10:57 +0200)]
Fix internal error on locally derived bit-packed array type
This is a regression present on the mainline, 11 and 10 branches,
in the form of an ICE on a locally derived bit-packed array type.
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity) <E_Array_Type>: Process
the implementation type of a packed type implemented specially.
gcc/testsuite/
* gnat.dg/derived_type7.adb, gnat.dg/derived_type7.ads: New test.
Eric Botcazou [Fri, 21 May 2021 08:45:21 +0000 (10:45 +0200)]
Always translate Is_Pure flag into pure in C sense
Gigi has historically translated the Is_Pure flag of the front-end into
the "const" attribute of GNU C. That's correct for subprograms of pure
Ada units, but not fully exact according to the semantics of the flag.
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_subprog_type): Always translate
the Is_Pure flag into the "pure" attribute of GNU C.
Eric Botcazou [Fri, 21 May 2021 08:40:41 +0000 (10:40 +0200)]
Fix segfault at run time on strict-alignment platforms
This fixes a regression present on the mainline and 11 branch by
restricting the problematic change dealing with bitfields whose
nomimal subtype is self-referential to the cases where the size
is really lower.
gcc/ada/
* gcc-interface/trans.c (Call_to_gnu): Restrict previous change
to bitfields whose size is not equal to the type size.
(gnat_to_gnu): Likewise.
Eric Botcazou [Fri, 21 May 2021 08:26:50 +0000 (10:26 +0200)]
Fix incorrect SLOC on instruction
This puts the missing SLOC on a statement generated by a return.
gcc/ada/
* gcc-interface/trans.c (gnat_to_gnu) <N_Simple_Return_Statement>:
Put a SLOC on the assignment from the return value to the return
object in the copy-in/copy-out case.
Jason Merrill [Thu, 20 May 2021 01:12:45 +0000 (21:12 -0400)]
c++: designated init with anonymous union [PR100489]
My patch for PR98463 added an assert that tripped on this testcase, because
we ended up with a U CONSTRUCTOR with an initializer for a, which is not a
member of U. We need to wrap the a initializer in another CONSTRUCTOR for
the anonymous union.
There was already support for this in process_init_constructor_record, but
not in process_init_constructor_union. But since this is about brace
elision, it really belongs under reshape_init rather than digest_init, so
this patch moves the handling to reshape_init_class, which also handles
unions.
PR c++/100489
gcc/cp/ChangeLog:
* decl.c (reshape_init_class): Handle designator for
member of anonymous aggregate here.
* typeck2.c (process_init_constructor_record): Not here.
Andreas Krebbel [Tue, 27 Apr 2021 08:09:06 +0000 (10:09 +0200)]
PR100281 C++: Fix SImode pointer handling
The problem appears to be triggered by two locations in the front-end
where non-POINTER_SIZE pointers aren't handled right now.
1. An assertion in strip_typedefs is triggered because the alignment
of the types don't match. This in turn is caused by creating the new
type with build_pointer_type instead of taking the type of the
original pointer into account.
2. An assertion in cp_convert_to_pointer is triggered which expects
the target type to always have POINTER_SIZE.
gcc/cp/ChangeLog:
PR c++/100281
* cvt.c (cp_convert_to_pointer): Use the size of the target
pointer type.
* tree.c (cp_build_reference_type): Call
cp_build_reference_type_for_mode with VOIDmode.
(cp_build_reference_type_for_mode): Rename from
cp_build_reference_type. Add MODE argument and invoke
build_reference_type_for_mode.
(strip_typedefs): Use build_pointer_type_for_mode and
cp_build_reference_type_for_mode for pointers and references.
gcc/ChangeLog:
PR c++/100281
* tree.c (build_reference_type_for_mode)
(build_pointer_type_for_mode): Pick pointer mode if MODE argument
is VOIDmode.
(build_reference_type, build_pointer_type): Invoke
build_*_type_for_mode with VOIDmode.
gcc/testsuite/ChangeLog:
PR c++/100281
* g++.target/s390/pr100281-1.C: New test.
* g++.target/s390/pr100281-2.C: New test.
Joern Rennecke [Thu, 20 May 2021 12:21:41 +0000 (13:21 +0100)]
libstdc++: Disable floating_to_chars.cc on 16 bit targets
This patch conditionally disables the compilation of floating_to_chars.cc
on 16 bit targets, thus fixing a build failure for these targets as
the POW10_SPLIT_2 array exceeds the maximum object size.
libstdc++-v3/
PR libstdc++/100361
* include/std/charconv (to_chars): Hide the overloads for
floating-point types for 16 bit targets.
* src/c++17/floating_to_chars.cc: Don't compile for 16 bit targets.
* testsuite/20_util/to_chars/double.cc: Run this test only on
size32plus targets.
* testsuite/20_util/to_chars/float.cc: Likewise.
* testsuite/20_util/to_chars/long_double.cc: Likewise.
Jakub Jelinek [Thu, 20 May 2021 11:30:48 +0000 (13:30 +0200)]
openmp: Handle explicit linear clause properly in combined constructs with target [PR99928]
linear clause should have the effect of firstprivate+lastprivate (or for IVs
not declared in the construct lastprivate) on outer constructs and eventually
map(tofrom:) on target when combined with it.
2021-05-20 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
* gimplify.c (gimplify_scan_omp_clauses) <case OMP_CLAUSE_LINEAR>: For
explicit linear clause when combined with target, make it map(tofrom:)
instead of no clause or firstprivate.
* c-c++-common/gomp/pr99928-4.c: Remove all xfails.
* c-c++-common/gomp/pr99928-5.c: Likewise.
Jason Merrill [Wed, 19 May 2021 21:33:21 +0000 (17:33 -0400)]
c++: _Complex template parameter [PR100634]
We were crashing because invalid_nontype_parm_type_p allowed _Complex
template parms, but convert_nontype_argument didn't know what to do for
them. Let's just disallow it, people can and should use std::complex
instead.
PR c++/100634
gcc/cp/ChangeLog:
* pt.c (invalid_nontype_parm_type_p): Return true for COMPLEX_TYPE.
Jason Merrill [Wed, 19 May 2021 20:40:24 +0000 (16:40 -0400)]
c++: ICE with using and enum [PR100659]
Here the code for 'using enum' is confused by the combination of a
using-decl and an enum that are not from 'using enum'; this CONST_DECL is
from the normal unscoped enum scoping.
PR c++/100659
gcc/cp/ChangeLog:
* cp-tree.h (CONST_DECL_USING_P): Check for null TREE_TYPE.
Jason Merrill [Tue, 18 May 2021 16:29:33 +0000 (12:29 -0400)]
c++: ICE with <=> fallback [PR100367]
Here, when genericizing lexicographical_compare_three_way, we haven't yet
walked the operands, so (a == a) still sees ADDR_EXPR <a>, but this is after
we've changed the type of a to REFERENCE_TYPE. When we try to fold (a == a)
by constexpr evaluation, the constexpr code doesn't understand trying to
take the address of a reference, and we end up crashing.
Fixed by avoiding constexpr evaluation in genericize_spaceship, by using
fold_build2 instead of build_new_op on scalar operands. Class operands
should have been expanded during parsing.
PR c++/100367
PR c++/96299
gcc/cp/ChangeLog:
* method.c (genericize_spaceship): Use fold_build2 for scalar
operands.
Christophe Lyon [Wed, 19 May 2021 14:45:54 +0000 (14:45 +0000)]
arm/testsuite: Fix testcase for PR99977
Some targets (eg arm-none-uclinuxfdpiceabi) do not support Thumb-1,
and since the testcase forces -march=armv8-m.base, we need to check
whether this option is actually supported.
Using dg-add-options arm_arch_v8m_base ensure that we pass -mthumb as
needed too.
Jakub Jelinek [Wed, 19 May 2021 07:41:55 +0000 (09:41 +0200)]
openmp: Handle lastprivate on combined target correctly [PR99928]
This patch deals with 2 issues:
1) the gimplifier couldn't differentiate between
#pragma omp parallel master
#pragma omp taskloop simd
and
#pragma omp parallel master taskloop simd
when there is a significant difference for clause handling between
the two; as master construct doesn't have any clauses, we don't currently
represent it during gimplification by an gimplification omp context at all,
so this patch makes sure we don't set OMP_PARALLEL_COMBINED on parallel master
when not combined further. If we ever add a separate master context during
gimplification, we'd use ORT_COMBINED_MASTER vs. ORT_MASTER (or MASKED probably).
2) lastprivate when combined with target should be map(tofrom:) on the target,
this change handles it only when not combined with firstprivate though, that
will need further work (similarly to linear or reduction).
2021-05-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
gcc/
* tree.h (OMP_MASTER_COMBINED): Define.
* gimplify.c (gimplify_scan_omp_clauses): Rewrite lastprivate
handling for outer combined/composite constructs to a loop.
Handle lastprivate on combined target.
(gimplify_expr): Formatting fix.
gcc/c/
* c-parser.c (c_parser_omp_master): Set OMP_MASTER_COMBINED on
master when combined with taskloop.
(c_parser_omp_parallel): Don't set OMP_PARALLEL_COMBINED on
parallel master when not combined with taskloop.
gcc/cp/
* parser.c (cp_parser_omp_master): Set OMP_MASTER_COMBINED on
master when combined with taskloop.
(cp_parser_omp_parallel): Don't set OMP_PARALLEL_COMBINED on
parallel master when not combined with taskloop.
gcc/testsuite/
* c-c++-common/gomp/pr99928-2.c: Remove all xfails.
* c-c++-common/gomp/pr99928-12.c: New test.
Jason Merrill [Tue, 18 May 2021 21:15:42 +0000 (17:15 -0400)]
c++: ICE with bad definition of decimal32 [PR100261]
The change to only look at the global binding for non-classes meant that
here, when dealing with decimal32 which is magically mangled like its first
non-static data member, we got a collision with the mangling for float.
Fixed by also looking up an existing binding for such magical classes.
Here we have a pack expansion of a template template parameter pack, of
which the pattern is a TEMPLATE_DECL, which strip_typedefs doesn't want to
see.
PR c++/100372
gcc/cp/ChangeLog:
* tree.c (strip_typedefs): Only look at the pattern of a
TYPE_PACK_EXPANSION if it's a type.
Jason Merrill [Tue, 18 May 2021 16:06:36 +0000 (12:06 -0400)]
c++: "perfect" implicitly deleted move [PR100644]
Here we were ignoring the template constructor because the implicit move
constructor had all perfect conversions. But CWG1402 says that an
implicitly deleted move constructor is ignored by overload resolution; we
implement that instead by preferring any other candidate in joust, to get
better diagnostics, but that means we need to handle that case here as well.
gcc/cp/ChangeLog:
PR c++/100644
* call.c (perfect_candidate_p): An implicitly deleted move
is not perfect.
Jason Merrill [Wed, 14 Apr 2021 15:24:50 +0000 (11:24 -0400)]
c++: constant expressions are evaluated [PR93314]
My GCC 11 patch for PR93314 turned off cp_unevaluated_operand while
processing an id-expression that names a non-static data member, but the
broader issue is that in general, a constant-expression is evaluated even in
an unevaluated operand.
This also fixes 100205, introduced by the earlier patch that couldn't
distinguish between the different allow_non_constant_p cases.
PR c++/100205
PR c++/93314
gcc/cp/ChangeLog:
* cp-tree.h (cp_evaluated): Add reset parm to constructor.
* parser.c (cp_parser_constant_expression): Change
allow_non_constant_p to int. Use cp_evaluated.
(cp_parser_initializer_clause): Pass 2 to allow_non_constant_p.
* semantics.c (finish_id_expression_1): Don't mess with
cp_unevaluated_operand here.
Tom de Vries [Tue, 18 May 2021 06:24:00 +0000 (08:24 +0200)]
[nvptx] Handle memmodel for atomic ops
The atomic ops in nvptx.md have memmodel arguments, which are currently
ignored.
Handle these, fixing test-case fails libgomp.c-c++-common/reduction-{5,6}.c
on volta.
Tested libgomp on x86_64-linux with nvptx accelerator.
gcc/ChangeLog:
2021-05-17 Tom de Vries <tdevries@suse.de>
PR target/100497
* config/nvptx/nvptx-protos.h (nvptx_output_atomic_insn): Declare
* config/nvptx/nvptx.c (nvptx_output_barrier)
(nvptx_output_atomic_insn): New function.
(nvptx_print_operand): Add support for 'B'.
* config/nvptx/nvptx.md: Use nvptx_output_atomic_insn for atomic
insns.
openmp: Notify team barrier of pending tasks in omp_fulfill_event
The team barrier should be notified of any new tasks that become runnable
as the result of a completing task, otherwise the barrier threads might
not resume processing available tasks, resulting in a hang.
openmp: Notify team barrier of pending tasks in omp_fulfill_event
The team barrier should be notified of any new tasks that become runnable
as the result of a completing task, otherwise the barrier threads might
not resume processing available tasks, resulting in a hang.
Jonathan Wakely [Mon, 17 May 2021 10:54:06 +0000 (11:54 +0100)]
libstdc++: Fix filesystem::path constraints for volatile [PR 100630]
The constraint check for filesystem::path construction uses
decltype(__is_path_src(declval<Source>())) which mean it considers
conversion from an rvalue. When Source is a volatile-qualified type
it cannot use is_path_src(const Unknown&) because a const lvalue
reference can only bind to a non-volatile rvalue.
Since the relevant path members all have a const Source& parameter,
the constraint should be defined in terms of declval<const Source&>(),
not declval<Source>(). This avoids the problem of volatile-qualified
rvalues, because we no longer use an rvalue at all.
libstdc++-v3/ChangeLog:
PR libstdc++/100630
* include/experimental/bits/fs_path.h (__is_constructible_from):
Test construction from a const lvalue, not an rvalue.
* testsuite/27_io/filesystem/path/construct/100630.cc: New test.
* testsuite/experimental/filesystem/path/construct/100630.cc:
New test.
libstdc++-v3/ChangeLog:
* include/bits/atomic_wait.h (__waiter::_M_do_wait_v): loop
until value change observed.
(__waiter_base::_M_laundered): New member.
(__waiter_base::_M_notify): Check _M_laundered to determine
whether to wake one or all.
(__detail::__atomic_compare): Return true if call to
__builtin_memcmp() == 0.
(__waiter_base::_S_do_spin_v): Adjust predicate.
* testsuite/29_atomics/atomic/wait_notify/100334.cc: New
test.
Alex Coplan [Tue, 27 Apr 2021 13:56:15 +0000 (14:56 +0100)]
arm: Fix ICEs with compare-and-swap and -march=armv8-m.base [PR99977]
The PR shows two ICEs with __sync_bool_compare_and_swap and
-mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and one
later on, after the CAS insn is split.
The LRA ICE occurs because the
@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1 pattern attempts to tie
two output operands together (operands 0 and 1 in the third
alternative). LRA can't handle this, since it doesn't make sense for an
insn to assign to the same operand twice.
The later (post-splitting) ICE occurs because the expansion of the
cbranchsi4_scratch insn doesn't quite go according to plan. As it
stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch,
attempting to pass a register (neg_bval) to use as a scratch register.
However, since the RTL template has a match_scratch here,
gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx.
Since this is all happening after RA, this is doomed to fail (and we get
an ICE about the insn not matching its constraints).
It seems that the motivation for the choice of constraints in the
atomic_compare_and_swap pattern comes from an attempt to satisfy the
constraints of the cbranchsi4_scratch insn. This insn requires the
scratch register to be the same as the input register in the case that
we use a larger negative immediate (one that satisfies J, but not L).
Of course, as noted above, LRA refuses to assign two output operands to
the same register, so this was never going to work.
The solution I'm proposing here is to collapse the alternatives to the
CAS insn (allowing the two output register operands to be matched to
different registers) and to ensure that the constraints for
cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this by
inserting a move to ensure the source and destination registers match if
necessary (i.e. in the case of large negative immediates).
Another notable change here is that we only do:
emit_move_insn (neg_bval, const1_rtx);
for non-negative immediates. This is because the ADDS instruction used in
the negative case suffices to leave a suitable value in neg_bval: if the
operands compare equal, we don't take the branch (so neg_bval will be
set by the load exclusive). Otherwise, the ADDS will leave a nonzero
value in neg_bval, which will correctly signal that the CAS has failed
when it is later negated.
gcc/ChangeLog:
PR target/99977
* config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen
with negative immediates: ensure we expand cbranchsi4_scratch
correctly and ensure we satisfy its constraints.
* config/arm/sync.md
(@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1): Don't
attempt to tie two output operands together with constraints;
collapse two alternatives.
(@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1): Likewise.
* config/arm/thumb1.md (cbranchsi4_neg_late): New.
gcc/testsuite/ChangeLog:
PR target/99977
* gcc.target/arm/pr99977.c: New test.
Richard Biener [Mon, 17 May 2021 06:51:03 +0000 (08:51 +0200)]
Update mpfr version to 3.1.6
This updates the mpfr version to 3.1.6 which is the last bugfix
release from the 3.1.x series and avoids printing the version
is buggy but acceptable from our configury.
2021-05-17 Richard Biener <rguenther@suse.de>
contrib/ChangeLog:
* download_prerequisites: Update mpfr version to 3.1.6.
* prerequisites.md5: Update.
* prerequisites.sha512: Likewise.
Jakub Jelinek [Fri, 14 May 2021 08:17:19 +0000 (10:17 +0200)]
openmp: Add testcases to verify OpenMP 5.0 2.14 and OpenMP 5.1 2.17 rules [PR99928]
In preparation of PR99928 patch review, I've prepared testcases with clauses
that need more interesting handling on combined/composite constructs,
in particular firstprivate, lastprivate, firstprivate+lastprivate, linear
(explicit on non-iv, explicit on simd iv, implicit on simd iv, implicit on
simd iv declared in the construct), reduction (scalars, array sections of
array variables, array sections with pointer bases) and in_reduction.
OpenMP 5.0 had the wording broken for reduction, the intended rule to use
map(tofrom:) on target when combined with it was bound only on inscan modifier
presence which makes no sense, as then inscan may not be used, this has
been fixed in 5.1 and I'm just assuming 5.1 wording for that.
There are various cases where e.g. from historical or optimization reasons
GCC slightly deviates from the rules, but in most cases it is something
that shouldn't be really observable, e.g. whether
#pragma omp parallel for firstprivate(x)
is handled as
#pragma omp parallel shared(x)
#pragma omp for firstprivate(x)
or
#pragma omp parallel firstprivate(x)
#pragma omp for
shouldn't be possible to distinguish in user code. I've added FIXMEs
in the testcases about that, but maybe we just should keep it as is
(alternative would be to do it in standard compliant way and transform
into whatever we like after gimplification (e.g. early during omplower)).
Some cases we for historical reasons implement even with clauses on
constructs which in the standard don't accept them that way and then
handling those magically in omp lowering/expansion, in particular e.g.
#pragma omp parallel for firstprivate(x) lastprivate(x)
we treat as
#pragma omp parallel firstprivate(x) lastprivate(x)
#pragma omp for
even when lastprivate is not valid on parallel. Maybe one day we
could change that if we make sure we don't regress generated code quality.
I've also found a bug in OpenMP 5.0/5.1,
#pragma omp parallel sections firstprivate(x) lastprivate(x)
incorrectly says that it should be handled as
#pragma omp parallel firstprivate(x)
#pragma omp sections lastprivate(x)
which when written that way results in error; filed as
https://github.com/OpenMP/spec/issues/2758
to be fixed in OpenMP 5.2. GCC handles it the way it used to do
and users expect, so nothing to fix on the GCC side.
Also, we don't support yet in_reduction clause on target construct,
which means the -11.c testcase can't include any tests about in_reduction
handling on all the composite constructs that include target.
The work found two kinds of bugs on the GCC side, one is the known thing
that we implement still the 4.5 behavior and don't mark for
lastprivate/linear/reduction the list item as map(tofrom:) as mentioned
in PR99928. These cases are xfailed in the tests.
And another one is with r21 and r28 in -{8,9,10}.c tests - we don't add
reduction clause on teams for
#pragma omp {target ,}teams distribute simd reduction(+:r)
even when the spec says that teams shouldn't receive reduction only
when combined with loop construct.
In
make check-gcc check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} gomp.exp=pr99928*'
testing this shows:
# of expected passes 5648
# of expected failures 872
and with Tobias' patch applied:
# of expected passes 5648
# of unexpected successes 384
# of expected failures 488
2021-05-13 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
* c-c++-common/gomp/pr99928-1.c: New test.
* c-c++-common/gomp/pr99928-2.c: New test.
* c-c++-common/gomp/pr99928-3.c: New test.
* c-c++-common/gomp/pr99928-4.c: New test.
* c-c++-common/gomp/pr99928-5.c: New test.
* c-c++-common/gomp/pr99928-6.c: New test.
* c-c++-common/gomp/pr99928-7.c: New test.
* c-c++-common/gomp/pr99928-8.c: New test.
* c-c++-common/gomp/pr99928-9.c: New test.
* c-c++-common/gomp/pr99928-10.c: New test.
* c-c++-common/gomp/pr99928-11.c: New test.
Tobias Burnus [Fri, 14 May 2021 07:30:01 +0000 (09:30 +0200)]
OpenMP: Support complex/float in && and || reduction
C/C++ permit logical AND and logical OR also with floating-point or complex
arguments by doing an unequal zero comparison; the result is an 'int' with
value one or zero. Hence, those are also permitted as reduction variable,
even though it is not the most sensible thing to do.
gcc/c/ChangeLog:
* c-typeck.c (c_finish_omp_clauses): Accept float + complex
for || and && reductions.
gcc/cp/ChangeLog:
* semantics.c (finish_omp_reduction_clause): Accept float + complex
for || and && reductions.
gcc/ChangeLog:
* omp-low.c (lower_rec_input_clauses, lower_reduction_clauses): Handle
&& and || with floating-point and complex arguments.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/clause-1.c: Use 'reduction(&:..)' instead of '...(&&:..)'.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/reduction-1.c: New test.
* testsuite/libgomp.c-c++-common/reduction-2.c: New test.
* testsuite/libgomp.c-c++-common/reduction-3.c: New test.
Tom de Vries [Fri, 14 May 2021 07:24:47 +0000 (09:24 +0200)]
Disable SIMT for user-defined reduction
The test-case included in this patch contains this target region:
...
for (int i0 = 0 ; i0 < N0 ; i0++ )
counter_N0.i += 1;
...
When running with nvptx accelerator, the counter variable is expected to
be N0 after the region, but instead is N0 / 32. The problem is that rather
than getting the result for all warp lanes, we get it for just one lane.
This is caused by the implementation of SIMT being incomplete. It handles
regular reductions, but appearantly not user-defined reductions.
For now, handle this by disabling SIMT in this case, specifically by setting
sctx->max_vf to 1.
Tested libgomp on x86_64-linux with nvptx accelerator.
gcc/ChangeLog:
2021-05-03 Tom de Vries <tdevries@suse.de>
PR target/100321
* omp-low.c (lower_rec_input_clauses): Disable SIMT for user-defined
reduction.
libgomp/ChangeLog:
2021-05-03 Tom de Vries <tdevries@suse.de>
PR target/100321
* testsuite/libgomp.c/target-44.c: New test.
Tom de Vries [Fri, 14 May 2021 07:21:36 +0000 (09:21 +0200)]
Handle alternative IV
Consider the test-case libgomp.c/pr81778.c added in this commit, with
this core loop (note: CANARY_SIZE set to 0 for simplicity):
...
int s = 1;
#pragma omp target simd
for (int i = N - 1; i > -1; i -= s)
a[i] = 1;
...
which, given that N is 32, sets a[0..31] to 1.
On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
on the lane that is executing) at bb entry.
So we have the following sequence:
- a[0..31] is set to 1
- i is updated to -32..-1
- D.3209 is updated to 1 (being 0 initially)
- bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
to true
- bb6 is once more executed, which should not happen because all the elements
that needed to be handled were already handled.
- consequently, elements that should not be written are written
- with CANARY_SIZE == 0, we may run into a libgomp error:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...
and with CANARY_SIZE unmodified, we run into:
...
Expected 0, got 1 at base[-961]
Aborted (core dumped)
...
The cause of this is as follows:
- because the step s is a variable rather than a constant, an alternative
IV (D.3209 in our example) is generated in expand_omp_simd, and the
loop condition is tested in terms of the alternative IV rather than
the original IV (i in our example).
- the SIMT code in expand_omp_simd works by modifying step and initial value.
- The initial value fd->loop.n1 is loaded into a variable n1, which is
modified by the SIMT code and then used there-after.
- The step fd->loop.step is loaded into a variable step, which is modified
by the SIMT code, but afterwards there are uses of both step and
fd->loop.step.
- There are uses of fd->loop.step in the alternative IV handling code,
which should use step instead.
Fix this by introducing an additional variable orig_step, which is not
modified by the SIMT code and replacing all remaining uses of fd->loop.step
by either step or orig_step.
Build on x86_64-linux with nvptx accelerator, tested libgomp.
This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
with driver 450.66.
gcc/ChangeLog:
2020-10-02 Tom de Vries <tdevries@suse.de>
* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
fd->loop.step by either step or orig_step.
Tobias Burnus [Fri, 14 May 2021 07:44:19 +0000 (09:44 +0200)]
Fix fallout from merge from releases/gcc-11
Re-apply commit f963d6c79b1d52ce565c772166c3c3e1b1d0aa78 on nvptx.c:
"Remove duplicate SESE code in NVPTX backend"
The code got readded by a wrongly resolved merge conflict.
Apply to the OG11 code the change of the merge-conflict causing change:
commit da9c085ddbfe61e5954c8ec4e996240fa3a994c0
"nvptx: Fix up nvptx build against latest libstdc++ [PR100375]"
gcc/ChangeLog
* omp-sese.c (omp_sese_pseudo): Use nullptr instead of 0
as first argument of pseudo_node_t constructors.
* config/nvptx/nvptx.c (bb_pair_t, bb_pair_vec_t,
pseudo_node_t, bracket, bracket_vec_t,
bb_sese, bb_sese::~bb_sese, bb_sese::append, bb_sese::remove,
BB_SET_SESE, BB_GET_SESE, nvptx_sese_number, nvptx_sese_pseudo,
nvptx_sese_color, nvptx_find_sese): Remove again.