Jakub Jelinek [Wed, 26 May 2021 13:15:19 +0000 (15:15 +0200)]
openmp: Fix up handling of target constructs in offloaded routines [PR100573]
OpenMP Nesting of Regions restrictions say:
- If a target update, target data, target enter data, or target exit data
construct is encountered during execution of a target region, the behavior is unspecified.
- If a target construct is encountered during execution of a target region and a device
clause in which the ancestor device-modifier appears is not present on the construct, the
behavior is unspecified.
That wording is about the dynamic (runtime) behavior, not about lexical nesting,
so while it is UB if omp target * is encountered in the target region, we need to make
it compile and link (for lexical nesting of target * inside of target we actually
emit a warning).
To make this work, I had to do multiple changes.
One was to mark .omp_data_{sizes,kinds}.* variables when static as "omp declare target".
Another one was to add stub GOMP_target* entrypoints to nvptx and gcn libgomp.a.
The entrypoint functions shouldn't be called or passed in the offload regions,
otherwise
libgomp: cuLaunchKernel error: too many resources requested for launch
was reported; fixed by changing those arguments of calls to GOMP_target_ext
to NULL.
And we didn't mark the entrypoints "omp target entrypoint" when the caller
has been "omp declare target".
2021-05-26 Jakub Jelinek <jakub@redhat.com>
PR libgomp/100573
gcc/
* omp-low.c: Include omp-offload.h.
(create_omp_child_function): If current_function_decl has
"omp declare target" attribute and is_gimple_omp_offloaded,
remove that attribute from the copy of attribute list and
add "omp target entrypoint" attribute instead.
(lower_omp_target): Mark .omp_data_sizes.* and .omp_data_kinds.*
variables for offloading if in omp_maybe_offloaded_ctx.
* omp-offload.c (pass_omp_target_link::execute): Nullify second
argument to GOMP_target_data_ext in offloaded code.
libgomp/
* config/nvptx/target.c (GOMP_target_ext, GOMP_target_data_ext,
GOMP_target_end_data, GOMP_target_update_ext,
GOMP_target_enter_exit_data): New dummy entrypoints.
* config/gcn/target.c (GOMP_target_ext, GOMP_target_data_ext,
GOMP_target_end_data, GOMP_target_update_ext,
GOMP_target_enter_exit_data): Likewise.
* testsuite/libgomp.c-c++-common/for-3.c (DO_PRAGMA, OMPTEAMS,
OMPFROM, OMPTO): Define.
(main): Remove #pragma omp target teams around all the tests.
* testsuite/libgomp.c-c++-common/target-41.c: New test.
* testsuite/libgomp.c-c++-common/target-42.c: New test.
Jakub Jelinek [Tue, 25 May 2021 10:45:15 +0000 (12:45 +0200)]
openmp: Fix reduction clause handling on teams distribute simd [PR99928]
When a directive isn't combined with worksharing-loop, it takes much
simpler clause splitting path for reduction, and that one was missing
handling of teams when combined with simd.
2021-05-25 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Copy reduction to teams when teams is
combined with simd and not with taskloop or for.
gcc/testsuite/
* c-c++-common/gomp/pr99928-8.c: Remove xfails from omp teams r21 and
r28 checks.
* c-c++-common/gomp/pr99928-9.c: Likewise.
* c-c++-common/gomp/pr99928-10.c: Likewise.
libgomp/
* testsuite/libgomp.c-c++-common/reduction-17.c: New test.
PR fortran/86470
* testsuite/libgomp.fortran/class-firstprivate-1.f90: New test.
* testsuite/libgomp.fortran/class-firstprivate-2.f90: New test.
* testsuite/libgomp.fortran/class-firstprivate-3.f90: New test.
gcc/testsuite/ChangeLog:
PR fortran/86470
* gfortran.dg/gomp/class-firstprivate-1.f90: New test.
* gfortran.dg/gomp/class-firstprivate-2.f90: New test.
* gfortran.dg/gomp/class-firstprivate-3.f90: New test.
* gfortran.dg/gomp/class-firstprivate-4.f90: New test.
Jakub Jelinek [Sun, 23 May 2021 11:18:10 +0000 (13:18 +0200)]
openmp: Fix up firstprivate+lastprivate clause handling [PR99928]
The C/C++ clause splitting happens very early during construct parsing,
but only the FEs later on handle possible instantiations, non-static
member handling and array section lowering.
In the OpenMP 5.0/5.1 rules, whether firstprivate is added to combined
target depends on whether it isn't also mentioned in lastprivate or map
clauses, but unfortunately I think such checks are much better done only
when the FEs perform all the above mentioned changes.
So, this patch arranges for the firstprivate clause to be copied or moved
to combined target construct (as before), but sets flags on that clause,
which tell the FE *finish_omp_clauses and the gimplifier it has been added
only conditionally and let the FEs and gimplifier DTRT for these.
2021-05-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
gcc/
* tree.h (OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET): Define.
* gimplify.c (enum gimplify_omp_var_data): Fix up
GOVD_MAP_HAS_ATTACHMENTS value, add GOVD_FIRSTPRIVATE_IMPLICIT.
(omp_lastprivate_for_combined_outer_constructs): If combined target
has GOVD_FIRSTPRIVATE_IMPLICIT set for the decl, change it to
GOVD_MAP | GOVD_SEEN.
(gimplify_scan_omp_clauses): Set GOVD_FIRSTPRIVATE_IMPLICIT for
firstprivate clauses with OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT.
(gimplify_adjust_omp_clauses): For firstprivate clauses with
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT either clear that bit and
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET too, or remove it and
let it be replaced by implicit map clause.
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Set OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT
on firstprivate clause copy going to target construct, and for
target simd set also OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET bit.
gcc/c/
* c-typeck.c (c_finish_omp_clauses): Move firstprivate clauses with
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT to the end of the chain. Don't error
if a decl is mentioned both in map clause and in such firstprivate
clause unless OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET is also set.
gcc/cp/
* semantics.c (finish_omp_clauses): Move firstprivate clauses with
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT to the end of the chain. Don't error
if a decl is mentioned both in map clause and in such firstprivate
clause unless OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET is also set.
gcc/testsuite/
* c-c++-common/gomp/pr99928-3.c: Remove all xfails.
* c-c++-common/gomp/pr99928-15.c: New test.
Jakub Jelinek [Sun, 23 May 2021 11:09:26 +0000 (13:09 +0200)]
openmp: Fix up handling of implicit lastprivate on outer constructs for implicit linear and lastprivate IVs [PR99928]
This patch fixes the handling of lastprivate propagation to outer combined/composite
leaf constructs from implicit linear or lastprivate clauses on simd IVs and adds missing
testsuite coverage for explicit and implicit lastprivate on simd IVs.
2021-05-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
* gimplify.c (omp_lastprivate_for_combined_outer_constructs): New
function.
(gimplify_scan_omp_clauses) <case OMP_CLAUSE_LASTPRIVATE>: Use it.
(gimplify_omp_for): Likewise.
* c-c++-common/gomp/pr99928-6.c: Remove all xfails.
* c-c++-common/gomp/pr99928-13.c: New test.
* c-c++-common/gomp/pr99928-14.c: New test.
Eric Botcazou [Fri, 21 May 2021 08:57:02 +0000 (10:57 +0200)]
Fix internal error on locally derived bit-packed array type
This is a regression present on the mainline, 11 and 10 branches,
in the form of an ICE on a locally derived bit-packed array type.
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_entity) <E_Array_Type>: Process
the implementation type of a packed type implemented specially.
gcc/testsuite/
* gnat.dg/derived_type7.adb, gnat.dg/derived_type7.ads: New test.
Eric Botcazou [Fri, 21 May 2021 08:45:21 +0000 (10:45 +0200)]
Always translate Is_Pure flag into pure in C sense
Gigi has historically translated the Is_Pure flag of the front-end into
the "const" attribute of GNU C. That's correct for subprograms of pure
Ada units, but not fully exact according to the semantics of the flag.
gcc/ada/
* gcc-interface/decl.c (gnat_to_gnu_subprog_type): Always translate
the Is_Pure flag into the "pure" attribute of GNU C.
Eric Botcazou [Fri, 21 May 2021 08:40:41 +0000 (10:40 +0200)]
Fix segfault at run time on strict-alignment platforms
This fixes a regression present on the mainline and 11 branch by
restricting the problematic change dealing with bitfields whose
nomimal subtype is self-referential to the cases where the size
is really lower.
gcc/ada/
* gcc-interface/trans.c (Call_to_gnu): Restrict previous change
to bitfields whose size is not equal to the type size.
(gnat_to_gnu): Likewise.
Eric Botcazou [Fri, 21 May 2021 08:26:50 +0000 (10:26 +0200)]
Fix incorrect SLOC on instruction
This puts the missing SLOC on a statement generated by a return.
gcc/ada/
* gcc-interface/trans.c (gnat_to_gnu) <N_Simple_Return_Statement>:
Put a SLOC on the assignment from the return value to the return
object in the copy-in/copy-out case.
Jason Merrill [Thu, 20 May 2021 01:12:45 +0000 (21:12 -0400)]
c++: designated init with anonymous union [PR100489]
My patch for PR98463 added an assert that tripped on this testcase, because
we ended up with a U CONSTRUCTOR with an initializer for a, which is not a
member of U. We need to wrap the a initializer in another CONSTRUCTOR for
the anonymous union.
There was already support for this in process_init_constructor_record, but
not in process_init_constructor_union. But since this is about brace
elision, it really belongs under reshape_init rather than digest_init, so
this patch moves the handling to reshape_init_class, which also handles
unions.
PR c++/100489
gcc/cp/ChangeLog:
* decl.c (reshape_init_class): Handle designator for
member of anonymous aggregate here.
* typeck2.c (process_init_constructor_record): Not here.
Andreas Krebbel [Tue, 27 Apr 2021 08:09:06 +0000 (10:09 +0200)]
PR100281 C++: Fix SImode pointer handling
The problem appears to be triggered by two locations in the front-end
where non-POINTER_SIZE pointers aren't handled right now.
1. An assertion in strip_typedefs is triggered because the alignment
of the types don't match. This in turn is caused by creating the new
type with build_pointer_type instead of taking the type of the
original pointer into account.
2. An assertion in cp_convert_to_pointer is triggered which expects
the target type to always have POINTER_SIZE.
gcc/cp/ChangeLog:
PR c++/100281
* cvt.c (cp_convert_to_pointer): Use the size of the target
pointer type.
* tree.c (cp_build_reference_type): Call
cp_build_reference_type_for_mode with VOIDmode.
(cp_build_reference_type_for_mode): Rename from
cp_build_reference_type. Add MODE argument and invoke
build_reference_type_for_mode.
(strip_typedefs): Use build_pointer_type_for_mode and
cp_build_reference_type_for_mode for pointers and references.
gcc/ChangeLog:
PR c++/100281
* tree.c (build_reference_type_for_mode)
(build_pointer_type_for_mode): Pick pointer mode if MODE argument
is VOIDmode.
(build_reference_type, build_pointer_type): Invoke
build_*_type_for_mode with VOIDmode.
gcc/testsuite/ChangeLog:
PR c++/100281
* g++.target/s390/pr100281-1.C: New test.
* g++.target/s390/pr100281-2.C: New test.
Joern Rennecke [Thu, 20 May 2021 12:21:41 +0000 (13:21 +0100)]
libstdc++: Disable floating_to_chars.cc on 16 bit targets
This patch conditionally disables the compilation of floating_to_chars.cc
on 16 bit targets, thus fixing a build failure for these targets as
the POW10_SPLIT_2 array exceeds the maximum object size.
libstdc++-v3/
PR libstdc++/100361
* include/std/charconv (to_chars): Hide the overloads for
floating-point types for 16 bit targets.
* src/c++17/floating_to_chars.cc: Don't compile for 16 bit targets.
* testsuite/20_util/to_chars/double.cc: Run this test only on
size32plus targets.
* testsuite/20_util/to_chars/float.cc: Likewise.
* testsuite/20_util/to_chars/long_double.cc: Likewise.
Jakub Jelinek [Thu, 20 May 2021 11:30:48 +0000 (13:30 +0200)]
openmp: Handle explicit linear clause properly in combined constructs with target [PR99928]
linear clause should have the effect of firstprivate+lastprivate (or for IVs
not declared in the construct lastprivate) on outer constructs and eventually
map(tofrom:) on target when combined with it.
2021-05-20 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
* gimplify.c (gimplify_scan_omp_clauses) <case OMP_CLAUSE_LINEAR>: For
explicit linear clause when combined with target, make it map(tofrom:)
instead of no clause or firstprivate.
* c-c++-common/gomp/pr99928-4.c: Remove all xfails.
* c-c++-common/gomp/pr99928-5.c: Likewise.
Jason Merrill [Wed, 19 May 2021 21:33:21 +0000 (17:33 -0400)]
c++: _Complex template parameter [PR100634]
We were crashing because invalid_nontype_parm_type_p allowed _Complex
template parms, but convert_nontype_argument didn't know what to do for
them. Let's just disallow it, people can and should use std::complex
instead.
PR c++/100634
gcc/cp/ChangeLog:
* pt.c (invalid_nontype_parm_type_p): Return true for COMPLEX_TYPE.
Jason Merrill [Wed, 19 May 2021 20:40:24 +0000 (16:40 -0400)]
c++: ICE with using and enum [PR100659]
Here the code for 'using enum' is confused by the combination of a
using-decl and an enum that are not from 'using enum'; this CONST_DECL is
from the normal unscoped enum scoping.
PR c++/100659
gcc/cp/ChangeLog:
* cp-tree.h (CONST_DECL_USING_P): Check for null TREE_TYPE.
Jason Merrill [Tue, 18 May 2021 16:29:33 +0000 (12:29 -0400)]
c++: ICE with <=> fallback [PR100367]
Here, when genericizing lexicographical_compare_three_way, we haven't yet
walked the operands, so (a == a) still sees ADDR_EXPR <a>, but this is after
we've changed the type of a to REFERENCE_TYPE. When we try to fold (a == a)
by constexpr evaluation, the constexpr code doesn't understand trying to
take the address of a reference, and we end up crashing.
Fixed by avoiding constexpr evaluation in genericize_spaceship, by using
fold_build2 instead of build_new_op on scalar operands. Class operands
should have been expanded during parsing.
PR c++/100367
PR c++/96299
gcc/cp/ChangeLog:
* method.c (genericize_spaceship): Use fold_build2 for scalar
operands.
Christophe Lyon [Wed, 19 May 2021 14:45:54 +0000 (14:45 +0000)]
arm/testsuite: Fix testcase for PR99977
Some targets (eg arm-none-uclinuxfdpiceabi) do not support Thumb-1,
and since the testcase forces -march=armv8-m.base, we need to check
whether this option is actually supported.
Using dg-add-options arm_arch_v8m_base ensure that we pass -mthumb as
needed too.
Jakub Jelinek [Wed, 19 May 2021 07:41:55 +0000 (09:41 +0200)]
openmp: Handle lastprivate on combined target correctly [PR99928]
This patch deals with 2 issues:
1) the gimplifier couldn't differentiate between
#pragma omp parallel master
#pragma omp taskloop simd
and
#pragma omp parallel master taskloop simd
when there is a significant difference for clause handling between
the two; as master construct doesn't have any clauses, we don't currently
represent it during gimplification by an gimplification omp context at all,
so this patch makes sure we don't set OMP_PARALLEL_COMBINED on parallel master
when not combined further. If we ever add a separate master context during
gimplification, we'd use ORT_COMBINED_MASTER vs. ORT_MASTER (or MASKED probably).
2) lastprivate when combined with target should be map(tofrom:) on the target,
this change handles it only when not combined with firstprivate though, that
will need further work (similarly to linear or reduction).
2021-05-19 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
gcc/
* tree.h (OMP_MASTER_COMBINED): Define.
* gimplify.c (gimplify_scan_omp_clauses): Rewrite lastprivate
handling for outer combined/composite constructs to a loop.
Handle lastprivate on combined target.
(gimplify_expr): Formatting fix.
gcc/c/
* c-parser.c (c_parser_omp_master): Set OMP_MASTER_COMBINED on
master when combined with taskloop.
(c_parser_omp_parallel): Don't set OMP_PARALLEL_COMBINED on
parallel master when not combined with taskloop.
gcc/cp/
* parser.c (cp_parser_omp_master): Set OMP_MASTER_COMBINED on
master when combined with taskloop.
(cp_parser_omp_parallel): Don't set OMP_PARALLEL_COMBINED on
parallel master when not combined with taskloop.
gcc/testsuite/
* c-c++-common/gomp/pr99928-2.c: Remove all xfails.
* c-c++-common/gomp/pr99928-12.c: New test.
Jason Merrill [Tue, 18 May 2021 21:15:42 +0000 (17:15 -0400)]
c++: ICE with bad definition of decimal32 [PR100261]
The change to only look at the global binding for non-classes meant that
here, when dealing with decimal32 which is magically mangled like its first
non-static data member, we got a collision with the mangling for float.
Fixed by also looking up an existing binding for such magical classes.
Here we have a pack expansion of a template template parameter pack, of
which the pattern is a TEMPLATE_DECL, which strip_typedefs doesn't want to
see.
PR c++/100372
gcc/cp/ChangeLog:
* tree.c (strip_typedefs): Only look at the pattern of a
TYPE_PACK_EXPANSION if it's a type.
Jason Merrill [Tue, 18 May 2021 16:06:36 +0000 (12:06 -0400)]
c++: "perfect" implicitly deleted move [PR100644]
Here we were ignoring the template constructor because the implicit move
constructor had all perfect conversions. But CWG1402 says that an
implicitly deleted move constructor is ignored by overload resolution; we
implement that instead by preferring any other candidate in joust, to get
better diagnostics, but that means we need to handle that case here as well.
gcc/cp/ChangeLog:
PR c++/100644
* call.c (perfect_candidate_p): An implicitly deleted move
is not perfect.
Jason Merrill [Wed, 14 Apr 2021 15:24:50 +0000 (11:24 -0400)]
c++: constant expressions are evaluated [PR93314]
My GCC 11 patch for PR93314 turned off cp_unevaluated_operand while
processing an id-expression that names a non-static data member, but the
broader issue is that in general, a constant-expression is evaluated even in
an unevaluated operand.
This also fixes 100205, introduced by the earlier patch that couldn't
distinguish between the different allow_non_constant_p cases.
PR c++/100205
PR c++/93314
gcc/cp/ChangeLog:
* cp-tree.h (cp_evaluated): Add reset parm to constructor.
* parser.c (cp_parser_constant_expression): Change
allow_non_constant_p to int. Use cp_evaluated.
(cp_parser_initializer_clause): Pass 2 to allow_non_constant_p.
* semantics.c (finish_id_expression_1): Don't mess with
cp_unevaluated_operand here.
Tom de Vries [Tue, 18 May 2021 06:24:00 +0000 (08:24 +0200)]
[nvptx] Handle memmodel for atomic ops
The atomic ops in nvptx.md have memmodel arguments, which are currently
ignored.
Handle these, fixing test-case fails libgomp.c-c++-common/reduction-{5,6}.c
on volta.
Tested libgomp on x86_64-linux with nvptx accelerator.
gcc/ChangeLog:
2021-05-17 Tom de Vries <tdevries@suse.de>
PR target/100497
* config/nvptx/nvptx-protos.h (nvptx_output_atomic_insn): Declare
* config/nvptx/nvptx.c (nvptx_output_barrier)
(nvptx_output_atomic_insn): New function.
(nvptx_print_operand): Add support for 'B'.
* config/nvptx/nvptx.md: Use nvptx_output_atomic_insn for atomic
insns.
openmp: Notify team barrier of pending tasks in omp_fulfill_event
The team barrier should be notified of any new tasks that become runnable
as the result of a completing task, otherwise the barrier threads might
not resume processing available tasks, resulting in a hang.
openmp: Notify team barrier of pending tasks in omp_fulfill_event
The team barrier should be notified of any new tasks that become runnable
as the result of a completing task, otherwise the barrier threads might
not resume processing available tasks, resulting in a hang.
Jonathan Wakely [Mon, 17 May 2021 10:54:06 +0000 (11:54 +0100)]
libstdc++: Fix filesystem::path constraints for volatile [PR 100630]
The constraint check for filesystem::path construction uses
decltype(__is_path_src(declval<Source>())) which mean it considers
conversion from an rvalue. When Source is a volatile-qualified type
it cannot use is_path_src(const Unknown&) because a const lvalue
reference can only bind to a non-volatile rvalue.
Since the relevant path members all have a const Source& parameter,
the constraint should be defined in terms of declval<const Source&>(),
not declval<Source>(). This avoids the problem of volatile-qualified
rvalues, because we no longer use an rvalue at all.
libstdc++-v3/ChangeLog:
PR libstdc++/100630
* include/experimental/bits/fs_path.h (__is_constructible_from):
Test construction from a const lvalue, not an rvalue.
* testsuite/27_io/filesystem/path/construct/100630.cc: New test.
* testsuite/experimental/filesystem/path/construct/100630.cc:
New test.
libstdc++-v3/ChangeLog:
* include/bits/atomic_wait.h (__waiter::_M_do_wait_v): loop
until value change observed.
(__waiter_base::_M_laundered): New member.
(__waiter_base::_M_notify): Check _M_laundered to determine
whether to wake one or all.
(__detail::__atomic_compare): Return true if call to
__builtin_memcmp() == 0.
(__waiter_base::_S_do_spin_v): Adjust predicate.
* testsuite/29_atomics/atomic/wait_notify/100334.cc: New
test.
Alex Coplan [Tue, 27 Apr 2021 13:56:15 +0000 (14:56 +0100)]
arm: Fix ICEs with compare-and-swap and -march=armv8-m.base [PR99977]
The PR shows two ICEs with __sync_bool_compare_and_swap and
-mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and one
later on, after the CAS insn is split.
The LRA ICE occurs because the
@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1 pattern attempts to tie
two output operands together (operands 0 and 1 in the third
alternative). LRA can't handle this, since it doesn't make sense for an
insn to assign to the same operand twice.
The later (post-splitting) ICE occurs because the expansion of the
cbranchsi4_scratch insn doesn't quite go according to plan. As it
stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch,
attempting to pass a register (neg_bval) to use as a scratch register.
However, since the RTL template has a match_scratch here,
gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx.
Since this is all happening after RA, this is doomed to fail (and we get
an ICE about the insn not matching its constraints).
It seems that the motivation for the choice of constraints in the
atomic_compare_and_swap pattern comes from an attempt to satisfy the
constraints of the cbranchsi4_scratch insn. This insn requires the
scratch register to be the same as the input register in the case that
we use a larger negative immediate (one that satisfies J, but not L).
Of course, as noted above, LRA refuses to assign two output operands to
the same register, so this was never going to work.
The solution I'm proposing here is to collapse the alternatives to the
CAS insn (allowing the two output register operands to be matched to
different registers) and to ensure that the constraints for
cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this by
inserting a move to ensure the source and destination registers match if
necessary (i.e. in the case of large negative immediates).
Another notable change here is that we only do:
emit_move_insn (neg_bval, const1_rtx);
for non-negative immediates. This is because the ADDS instruction used in
the negative case suffices to leave a suitable value in neg_bval: if the
operands compare equal, we don't take the branch (so neg_bval will be
set by the load exclusive). Otherwise, the ADDS will leave a nonzero
value in neg_bval, which will correctly signal that the CAS has failed
when it is later negated.
gcc/ChangeLog:
PR target/99977
* config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen
with negative immediates: ensure we expand cbranchsi4_scratch
correctly and ensure we satisfy its constraints.
* config/arm/sync.md
(@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1): Don't
attempt to tie two output operands together with constraints;
collapse two alternatives.
(@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1): Likewise.
* config/arm/thumb1.md (cbranchsi4_neg_late): New.
gcc/testsuite/ChangeLog:
PR target/99977
* gcc.target/arm/pr99977.c: New test.
Richard Biener [Mon, 17 May 2021 06:51:03 +0000 (08:51 +0200)]
Update mpfr version to 3.1.6
This updates the mpfr version to 3.1.6 which is the last bugfix
release from the 3.1.x series and avoids printing the version
is buggy but acceptable from our configury.
2021-05-17 Richard Biener <rguenther@suse.de>
contrib/ChangeLog:
* download_prerequisites: Update mpfr version to 3.1.6.
* prerequisites.md5: Update.
* prerequisites.sha512: Likewise.
Jakub Jelinek [Fri, 14 May 2021 08:17:19 +0000 (10:17 +0200)]
openmp: Add testcases to verify OpenMP 5.0 2.14 and OpenMP 5.1 2.17 rules [PR99928]
In preparation of PR99928 patch review, I've prepared testcases with clauses
that need more interesting handling on combined/composite constructs,
in particular firstprivate, lastprivate, firstprivate+lastprivate, linear
(explicit on non-iv, explicit on simd iv, implicit on simd iv, implicit on
simd iv declared in the construct), reduction (scalars, array sections of
array variables, array sections with pointer bases) and in_reduction.
OpenMP 5.0 had the wording broken for reduction, the intended rule to use
map(tofrom:) on target when combined with it was bound only on inscan modifier
presence which makes no sense, as then inscan may not be used, this has
been fixed in 5.1 and I'm just assuming 5.1 wording for that.
There are various cases where e.g. from historical or optimization reasons
GCC slightly deviates from the rules, but in most cases it is something
that shouldn't be really observable, e.g. whether
#pragma omp parallel for firstprivate(x)
is handled as
#pragma omp parallel shared(x)
#pragma omp for firstprivate(x)
or
#pragma omp parallel firstprivate(x)
#pragma omp for
shouldn't be possible to distinguish in user code. I've added FIXMEs
in the testcases about that, but maybe we just should keep it as is
(alternative would be to do it in standard compliant way and transform
into whatever we like after gimplification (e.g. early during omplower)).
Some cases we for historical reasons implement even with clauses on
constructs which in the standard don't accept them that way and then
handling those magically in omp lowering/expansion, in particular e.g.
#pragma omp parallel for firstprivate(x) lastprivate(x)
we treat as
#pragma omp parallel firstprivate(x) lastprivate(x)
#pragma omp for
even when lastprivate is not valid on parallel. Maybe one day we
could change that if we make sure we don't regress generated code quality.
I've also found a bug in OpenMP 5.0/5.1,
#pragma omp parallel sections firstprivate(x) lastprivate(x)
incorrectly says that it should be handled as
#pragma omp parallel firstprivate(x)
#pragma omp sections lastprivate(x)
which when written that way results in error; filed as
https://github.com/OpenMP/spec/issues/2758
to be fixed in OpenMP 5.2. GCC handles it the way it used to do
and users expect, so nothing to fix on the GCC side.
Also, we don't support yet in_reduction clause on target construct,
which means the -11.c testcase can't include any tests about in_reduction
handling on all the composite constructs that include target.
The work found two kinds of bugs on the GCC side, one is the known thing
that we implement still the 4.5 behavior and don't mark for
lastprivate/linear/reduction the list item as map(tofrom:) as mentioned
in PR99928. These cases are xfailed in the tests.
And another one is with r21 and r28 in -{8,9,10}.c tests - we don't add
reduction clause on teams for
#pragma omp {target ,}teams distribute simd reduction(+:r)
even when the spec says that teams shouldn't receive reduction only
when combined with loop construct.
In
make check-gcc check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} gomp.exp=pr99928*'
testing this shows:
# of expected passes 5648
# of expected failures 872
and with Tobias' patch applied:
# of expected passes 5648
# of unexpected successes 384
# of expected failures 488
2021-05-13 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
* c-c++-common/gomp/pr99928-1.c: New test.
* c-c++-common/gomp/pr99928-2.c: New test.
* c-c++-common/gomp/pr99928-3.c: New test.
* c-c++-common/gomp/pr99928-4.c: New test.
* c-c++-common/gomp/pr99928-5.c: New test.
* c-c++-common/gomp/pr99928-6.c: New test.
* c-c++-common/gomp/pr99928-7.c: New test.
* c-c++-common/gomp/pr99928-8.c: New test.
* c-c++-common/gomp/pr99928-9.c: New test.
* c-c++-common/gomp/pr99928-10.c: New test.
* c-c++-common/gomp/pr99928-11.c: New test.
Tobias Burnus [Fri, 14 May 2021 07:30:01 +0000 (09:30 +0200)]
OpenMP: Support complex/float in && and || reduction
C/C++ permit logical AND and logical OR also with floating-point or complex
arguments by doing an unequal zero comparison; the result is an 'int' with
value one or zero. Hence, those are also permitted as reduction variable,
even though it is not the most sensible thing to do.
gcc/c/ChangeLog:
* c-typeck.c (c_finish_omp_clauses): Accept float + complex
for || and && reductions.
gcc/cp/ChangeLog:
* semantics.c (finish_omp_reduction_clause): Accept float + complex
for || and && reductions.
gcc/ChangeLog:
* omp-low.c (lower_rec_input_clauses, lower_reduction_clauses): Handle
&& and || with floating-point and complex arguments.
gcc/testsuite/ChangeLog:
* gcc.dg/gomp/clause-1.c: Use 'reduction(&:..)' instead of '...(&&:..)'.
libgomp/ChangeLog:
* testsuite/libgomp.c-c++-common/reduction-1.c: New test.
* testsuite/libgomp.c-c++-common/reduction-2.c: New test.
* testsuite/libgomp.c-c++-common/reduction-3.c: New test.
Tom de Vries [Fri, 14 May 2021 07:24:47 +0000 (09:24 +0200)]
Disable SIMT for user-defined reduction
The test-case included in this patch contains this target region:
...
for (int i0 = 0 ; i0 < N0 ; i0++ )
counter_N0.i += 1;
...
When running with nvptx accelerator, the counter variable is expected to
be N0 after the region, but instead is N0 / 32. The problem is that rather
than getting the result for all warp lanes, we get it for just one lane.
This is caused by the implementation of SIMT being incomplete. It handles
regular reductions, but appearantly not user-defined reductions.
For now, handle this by disabling SIMT in this case, specifically by setting
sctx->max_vf to 1.
Tested libgomp on x86_64-linux with nvptx accelerator.
gcc/ChangeLog:
2021-05-03 Tom de Vries <tdevries@suse.de>
PR target/100321
* omp-low.c (lower_rec_input_clauses): Disable SIMT for user-defined
reduction.
libgomp/ChangeLog:
2021-05-03 Tom de Vries <tdevries@suse.de>
PR target/100321
* testsuite/libgomp.c/target-44.c: New test.
Tom de Vries [Fri, 14 May 2021 07:21:36 +0000 (09:21 +0200)]
Handle alternative IV
Consider the test-case libgomp.c/pr81778.c added in this commit, with
this core loop (note: CANARY_SIZE set to 0 for simplicity):
...
int s = 1;
#pragma omp target simd
for (int i = N - 1; i > -1; i -= s)
a[i] = 1;
...
which, given that N is 32, sets a[0..31] to 1.
On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending
on the lane that is executing) at bb entry.
So we have the following sequence:
- a[0..31] is set to 1
- i is updated to -32..-1
- D.3209 is updated to 1 (being 0 initially)
- bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates
to true
- bb6 is once more executed, which should not happen because all the elements
that needed to be handled were already handled.
- consequently, elements that should not be written are written
- with CANARY_SIZE == 0, we may run into a libgomp error:
...
libgomp: cuCtxSynchronize error: an illegal memory access was encountered
...
and with CANARY_SIZE unmodified, we run into:
...
Expected 0, got 1 at base[-961]
Aborted (core dumped)
...
The cause of this is as follows:
- because the step s is a variable rather than a constant, an alternative
IV (D.3209 in our example) is generated in expand_omp_simd, and the
loop condition is tested in terms of the alternative IV rather than
the original IV (i in our example).
- the SIMT code in expand_omp_simd works by modifying step and initial value.
- The initial value fd->loop.n1 is loaded into a variable n1, which is
modified by the SIMT code and then used there-after.
- The step fd->loop.step is loaded into a variable step, which is modified
by the SIMT code, but afterwards there are uses of both step and
fd->loop.step.
- There are uses of fd->loop.step in the alternative IV handling code,
which should use step instead.
Fix this by introducing an additional variable orig_step, which is not
modified by the SIMT code and replacing all remaining uses of fd->loop.step
by either step or orig_step.
Build on x86_64-linux with nvptx accelerator, tested libgomp.
This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200
with driver 450.66.
gcc/ChangeLog:
2020-10-02 Tom de Vries <tdevries@suse.de>
* omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of
fd->loop.step by either step or orig_step.
Tobias Burnus [Fri, 14 May 2021 07:44:19 +0000 (09:44 +0200)]
Fix fallout from merge from releases/gcc-11
Re-apply commit f963d6c79b1d52ce565c772166c3c3e1b1d0aa78 on nvptx.c:
"Remove duplicate SESE code in NVPTX backend"
The code got readded by a wrongly resolved merge conflict.
Apply to the OG11 code the change of the merge-conflict causing change:
commit da9c085ddbfe61e5954c8ec4e996240fa3a994c0
"nvptx: Fix up nvptx build against latest libstdc++ [PR100375]"
gcc/ChangeLog
* omp-sese.c (omp_sese_pseudo): Use nullptr instead of 0
as first argument of pseudo_node_t constructors.
* config/nvptx/nvptx.c (bb_pair_t, bb_pair_vec_t,
pseudo_node_t, bracket, bracket_vec_t,
bb_sese, bb_sese::~bb_sese, bb_sese::append, bb_sese::remove,
BB_SET_SESE, BB_GET_SESE, nvptx_sese_number, nvptx_sese_pseudo,
nvptx_sese_color, nvptx_find_sese): Remove again.
arm: Remove duplicate definitions from arm_mve.h (pr100419).
This patch removes several duplicated intrinsic definitions from
arm_mve.h mentioned in PR100419 and also fixes the wrong arguments
in few of intrinsics polymorphic variants.
Richard Earnshaw [Thu, 13 May 2021 10:42:58 +0000 (11:42 +0100)]
arm: correctly handle inequality comparisons against max constants [PR100563]
Normally we expect the gimple optimizers to fold away comparisons that
are always true, but at some lower optimization levels this is not
always the case, so the back-end has to be able to generate correct
code in these cases.
In this example, we have a comparison of the form
(unsigned long long) op <= ~0ULL
which, of course is always true.
Normally, in the arm back-end we handle these expansions where the
immediate cannot be handled directly by adding 1 to the constant and
then adjusting the comparison operator:
(unsigned long long) op < CONST + 1
but we cannot do that when the constant is already the largest value.
Fortunately, we observe that the comparisons we need to handle this
way are either always true or always false, so instead of forming a
comparison against the maximum value, we can replace it with a
comparison against the minimum value (which just happens to also be a
constant we can handle. So
gcc:
PR target/100563
* config/arm/arm.c (arm_canonicalize_comparison): Correctly
canonicalize DImode inequality comparisons against the
maximum integral value.
Patrick Palka [Fri, 30 Apr 2021 14:59:20 +0000 (10:59 -0400)]
libstdc++: Implement P2367 changes to avoid some list-initialization
This implements the wording changes of P2367R0 "Remove misuses of
list-initialization from Clause 24", modulo the parts that depend
on P1739R4 which we don't yet implement (due to LWG 3407).
libstdc++-v3/ChangeLog:
* include/bits/ranges_util.h (subrange::subrange): Avoid
list-initialization in delegating constructor.
* include/std/ranges (single_view): Replace implicit guide
with explicit deduction guide that decays its argument.
(_Single::operator()): Avoid CTAD when constructing the
single_view object.
(_Iota::operator()): Avoid list-initialization.
(__detail::__can_filter_view, _Filter::operator()): Likewise.
(__detail::__can_transform_view, _Transform::operator()): Likewise.
(take_view::begin): Likewise.
(__detail::__can_take_view, _Take::operator()): Likewise.
(__detail::__can_take_while_view, _TakeWhile::operator()): Likewise.
(__detail::__can_drop_view, _Drop::operator()): Likewise.
(__detail::__can_drop_while_view, _DropWhile::operator()): Likewise.
(split_view::split_view): Use views::single when initializing
_M_pattern.
(__detail::__can_split_view, _Split::operator()): Avoid
list-initialization.
(_Counted::operator()): Likewise.
* testsuite/std/ranges/p2367.cc: New test.
Jakub Jelinek [Wed, 12 May 2021 13:14:35 +0000 (15:14 +0200)]
libcpp: Fix up -fdirectives-only preprocessing of includes not ending with newline [PR100392]
If a header doesn't end with a new-line, with -fdirectives-only we right now
preprocess it as
int i = 1;# 2 "pr100392.c" 2
i.e. the line directive isn't on the next line, which means we fail to parse
it when compiling.
GCC 10 and earlier libcpp/directives-only.c had for this:
if (!pfile->state.skipping && cur != base)
{
/* If the file was not newline terminated, add rlimit, which is
guaranteed to point to a newline, to the end of our range. */
if (cur[-1] != '\n')
{
cur++;
CPP_INCREMENT_LINE (pfile, 0);
lines++;
}
cb->print_lines (lines, base, cur - base);
}
and we have the assertion
/* Files always end in a newline or carriage return. We rely on this for
character peeking safety. */
gcc_assert (buffer->rlimit[0] == '\n' || buffer->rlimit[0] == '\r');
So, this patch just does readd the more less same thing, so that we emit
a newline after the inline even when it wasn't there before.
2021-05-12 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/100392
* lex.c (cpp_directive_only_process): If buffer doesn't end with '\n',
add buffer->rlimit[0] character to the printed range and
CPP_INCREMENT_LINE and increment line_count.
* gcc.dg/cpp/pr100392.c: New test.
* gcc.dg/cpp/pr100392.h: New file.
Jakub Jelinek [Wed, 12 May 2021 08:38:35 +0000 (10:38 +0200)]
expand: Don't reuse DEBUG_EXPRs with vector type if they have different modes [PR100508]
The inliner doesn't remap DEBUG_EXPR_DECLs, so the same decls can appear
in multiple functions.
Furthermore, expansion reuses corresponding DEBUG_EXPRs too, so they again
can be reused in multiple functions.
Neither of that is a major problem, DEBUG_EXPRs are just magic value holders
and what value they stand for is independent in each function and driven by
what debug stmts or DEBUG_INSNs they are bound to.
Except for DEBUG_EXPR*s with vector types, TYPE_MODE can be either BLKmode
or some vector mode depending on whether current function's enabled ISAs
support that vector mode or not. On the following testcase, we expand it
first in foo function without AVX2 enabled and so the DEBUG_EXPR is
BLKmode, but later the same DEBUG_EXPR_DECL is used in a simd clone with
AVX2 enabled and expansion ICEs because of a mode mismatch.
The following patch fixes that by forcing recreation of a DEBUG_EXPR if
there is a mode mismatch for vector typed DEBUG_EXPR_DECL, DEBUG_EXPRs
will be still reused in between functions otherwise and within the same
function the mode should be always the same.
2021-05-12 Jakub Jelinek <jakub@redhat.com>
PR middle-end/100508
* cfgexpand.c (expand_debug_expr): For DEBUG_EXPR_DECL with vector
type, don't reuse DECL_RTL if it has different mode, instead force
creation of a new DEBUG_EXPR.
Jakub Jelinek [Tue, 11 May 2021 07:07:47 +0000 (09:07 +0200)]
openmp: Fix up taskloop reduction ICE if taskloop has no iterations [PR100471]
When a taskloop doesn't have any iterations, GOMP_taskloop* takes an early
return, doesn't create any tasks and more importantly, doesn't create
a taskgroup and doesn't register task reductions. But, the code emitted
in the callers assumes task reductions have been registered and performs
the reduction handling and task reduction unregistration. The pointer
to the task reduction private variables is reused, on input it is the alignment
and only on output it is the pointer, so in the case taskloop with no iterations
the caller attempts to dereference the alignment value as if it was a pointer
and crashes. We could in the early returns register the task reductions
only to have them looped over and unregistered in the caller, but I think
it is better to tell the caller there is nothing to task reduce and bypass
all that.
2021-05-11 Jakub Jelinek <jakub@redhat.com>
PR middle-end/100471
* omp-low.c (lower_omp_task_reductions): For OMP_TASKLOOP, if data
is 0, bypass the reduction loop including
GOMP_taskgroup_reduction_unregister call.
* taskloop.c (GOMP_taskloop): If GOMP_TASK_FLAG_REDUCTION and not
GOMP_TASK_FLAG_NOGROUP, when doing early return clear the task
reduction pointer.
* testsuite/libgomp.c/task-reduction-4.c: New test.
RISC-V: For '-march' and '-mabi' options, add 'Negative' property mentions itself.
When use multi-lib riscv-tool-chain. A bug is triggered when there are two
'-march' at command line.
riscv64-unknown-elf-gcc -march=rv32gcp -mabi=ilp32f -march=rv32gcpzp64 HelloWorld.c
/lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: /lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/lib/crt0.o: ABI is incompatible with that of the selected emulation:
target emulation `elf64-littleriscv' does not match `elf32-littleriscv'
/lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: failed to merge target specific data of file /lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/lib/crt0.o
/lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: /lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/crtbegin.o: ABI is incompatible with that of the selected emulation:
target emulation `elf64-littleriscv' does not match `elf32-littleriscv'
/lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/../../../../riscv64-unknown-elf/bin/ld: failed to merge target specific data of file /lhome/gengq/riscv64-linux-ptest/lib/gcc/riscv64-unknown-elf/10.2.0/crtbegin.o
......
This patch fix it. And the DRIVER would prune the extra '-march' and '-mabi'
options and keep only the last one valid.
Jonathan Wakely [Tue, 11 May 2021 14:01:01 +0000 (15:01 +0100)]
libstdc++: Fix missing members in std::allocator<void>
The changes in 75c6a925dab5b7af9ab47c10906cb0e140261cc2 were slightly
incorrect, because the converting constructor should be noexcept, and
the POCMA and is_always_equal traits should still be present in C++20.
This fixes it, and slightly refactors the preprocessor conditions and
order of members. Also add comments explaining things.
The non-standard construct and destroy members added for PR 78052 can be
private if allocator_traits<allocator<void>> is made a friend.
libstdc++-v3/ChangeLog:
* include/bits/allocator.h (allocator<void>) [C++20]: Add
missing noexcept to constructor. Restore missing POCMA and
is_always_equal_traits.
[C++17]: Make construct and destroy members private and
declare allocator_traits as a friend.
* include/bits/memoryfwd.h (allocator_traits): Declare.
* include/ext/malloc_allocator.h (malloc_allocator::allocate):
Add nodiscard attribute. Add static assertion for LWG 3307.
* include/ext/new_allocator.h (new_allocator::allocate): Add
static assertion for LWG 3307.
* testsuite/20_util/allocator/void.cc: Check that converting
constructor is noexcept. Check for propagation traits and
size_type and difference_type. Check that pointer and
const_pointer are gone in C++20.
Jonathan Wakely [Mon, 10 May 2021 20:06:22 +0000 (21:06 +0100)]
libstdc++: Remove TODO comment
We have a comment saying to replace the simple binary_semaphore type
with std::binary_semaphore, which has been done. However, that isn't
defined on all targets. So keep the simple one here that just implements
the parts of the API needed by <stop_token>, and remove the comment
suggesting it should be replaced.
Jonathan Wakely [Mon, 10 May 2021 19:46:38 +0000 (20:46 +0100)]
libstdc++: Implement proposed resolution to LWG 3548
This has been tentatively approved by LWG. The deleter from a unique_ptr
can be moved into the shared_ptr (at least, since LWG 2802). This uses
std::forward<_Del>(__r.get_deleter()) not std::move(__r.get_deleter())
because we don't want to convert the deleter to an rvalue when _Del is
an lvalue reference type.
This also adds a missing is_move_constructible_v<D> constraint to the
shared_ptr(unique_ptr<Y, D>&&) constructor, which is inherited from the
shared_ptr(Y*, D) constructor due to the use of "equivalent to" in the
specified effects.
libstdc++-v3/ChangeLog:
* include/bits/shared_ptr_base.h (__shared_count(unique_ptr&&)):
Initialize a non-reference deleter from an rvalue, as per LWG
3548.
(__shared_ptr::_UniqCompatible): Add missing constraint.
* testsuite/20_util/shared_ptr/cons/lwg3548.cc: New test.
* testsuite/20_util/shared_ptr/cons/unique_ptr_deleter.cc: Check
constraints.
Jonathan Wakely [Mon, 10 May 2021 15:22:54 +0000 (16:22 +0100)]
libstdc++: Remove redundant -std=gnu++17 option from remaining tests
GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.
Jonathan Wakely [Mon, 10 May 2021 15:22:54 +0000 (16:22 +0100)]
libstdc++: Remove redundant -std=gnu++17 option from algorithm tests
GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.
Jonathan Wakely [Mon, 10 May 2021 15:22:53 +0000 (16:22 +0100)]
libstdc++: Remove redundant -std=gnu++17 option from containers tests
GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.
Jonathan Wakely [Mon, 10 May 2021 15:22:53 +0000 (16:22 +0100)]
libstdc++: Remove redundant -std=gnu++17 option from strings tests
GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.
Jonathan Wakely [Mon, 10 May 2021 15:22:53 +0000 (16:22 +0100)]
libstdc++: Remove redundant -std=gnu++17 option from PMR tests
GCC defaults to -std=gnu++17 now anyway, and using it explicitly in the
dg-options directive prevents running these tests with different modes
such as -std=c++17 or -std=gnu++20.