Eric Botcazou [Wed, 26 May 2021 17:12:05 +0000 (19:12 +0200)]
Warn on type punning that toggles scalar storage order
As documented in the manual, we do not support type punning that toggles
the scalar storage order, so this adds a warning for the case of unions.
gcc/c/
PR c/100653
* c-decl.c (finish_struct): Warn for a union containing an aggregate
field with a differing scalar storage order.
gcc/testsuite/
* gcc.dg/sso-13.c: New test.
Patrick Palka [Wed, 26 May 2021 12:37:30 +0000 (08:37 -0400)]
c++: constexpr and copy elision within mem init [PR100368]
In the testcase below, the member initializer b(f()) inside C's default
constructor is encoded as a TARGET_EXPR wrapping the CALL_EXPR f() in
C++17 mode. During massaging of this constexpr constructor,
build_target_expr_with_type called from bot_manip on this initializer
tries to add an extra copy using B's implicitly deleted copy constructor
rather than just preserving the copy elision.
Since it's wrong to introduce an extra copy when initializing a
temporary from a CALL_EXPR, this patch makes build_target_expr_with_type
avoid calling force_rvalue in this case. Additionally, bot_manip should
be copying TARGET_EXPRs in a more oblivious manner, so this patch makes
bot_manip use force_target_expr instead of build_target_expr_with_type.
And since bot_manip is now no longer a caller, we can remove the void
initializer handling in build_target_expr_with_type.
PR c++/100368
gcc/cp/ChangeLog:
* tree.c (build_target_expr_with_type): Don't call force_rvalue
on CALL_EXPR initializer. Simplify now that bot_manip is no
longer a caller.
(bot_manip): Use force_target_expr instead of
build_target_expr_with_type.
Patrick Palka [Wed, 26 May 2021 12:35:31 +0000 (08:35 -0400)]
c++: Fix reference NTTP binding to noexcept fn [PR97420]
Here, in C++17 mode, convert_nontype_argument_function is rejecting
binding a non-noexcept function reference template parameter to a
noexcept function (encoded as the template argument '*(int (&) (int)) &f').
The first roadblock to making this work is that the argument is wrapped
an an implicit INDIRECT_REF, so we need to unwrap it before calling
strip_fnptr_conv.
The second roadblock is that the NOP_EXPR cast converts from a function
pointer type to a reference type while simultaneously removing the
noexcept qualification, and fnptr_conv_p doesn't consider this cast to
be a function pointer conversion. This patch fixes this by making
fnptr_conv_p treat REFERENCE_TYPEs and POINTER_TYPEs interchangeably.
Finally, in passing, this patch also simplifies noexcept_conv_p by
removing a bunch of redundant checks already performed by its only
caller fnptr_conv_p.
PR c++/97420
gcc/cp/ChangeLog:
* cvt.c (noexcept_conv_p): Remove redundant checks and simplify.
(fnptr_conv_p): Don't call non_reference. Use INDIRECT_TYPE_P
instead of TYPE_PTR_P.
* pt.c (convert_nontype_argument_function): Look through
implicit INDIRECT_REFs before calling strip_fnptr_conv.
Jakub Jelinek [Wed, 26 May 2021 09:18:07 +0000 (11:18 +0200)]
openmp: Fix up handling of target constructs in offloaded routines [PR100573]
OpenMP Nesting of Regions restrictions say:
- If a target update, target data, target enter data, or target exit data
construct is encountered during execution of a target region, the behavior is unspecified.
- If a target construct is encountered during execution of a target region and a device
clause in which the ancestor device-modifier appears is not present on the construct, the
behavior is unspecified.
That wording is about the dynamic (runtime) behavior, not about lexical nesting,
so while it is UB if omp target * is encountered in the target region, we need to make
it compile and link (for lexical nesting of target * inside of target we actually
emit a warning).
To make this work, I had to do multiple changes.
One was to mark .omp_data_{sizes,kinds}.* variables when static as "omp declare target".
Another one was to add stub GOMP_target* entrypoints to nvptx and gcn libgomp.a.
The entrypoint functions shouldn't be called or passed in the offload regions,
otherwise
libgomp: cuLaunchKernel error: too many resources requested for launch
was reported; fixed by changing those arguments of calls to GOMP_target_ext
to NULL.
And we didn't mark the entrypoints "omp target entrypoint" when the caller
has been "omp declare target".
2021-05-26 Jakub Jelinek <jakub@redhat.com>
PR libgomp/100573
gcc/
* omp-low.c: Include omp-offload.h.
(create_omp_child_function): If current_function_decl has
"omp declare target" attribute and is_gimple_omp_offloaded,
remove that attribute from the copy of attribute list and
add "omp target entrypoint" attribute instead.
(lower_omp_target): Mark .omp_data_sizes.* and .omp_data_kinds.*
variables for offloading if in omp_maybe_offloaded_ctx.
* omp-offload.c (pass_omp_target_link::execute): Nullify second
argument to GOMP_target_data_ext in offloaded code.
libgomp/
* config/nvptx/target.c (GOMP_target_ext, GOMP_target_data_ext,
GOMP_target_end_data, GOMP_target_update_ext,
GOMP_target_enter_exit_data): New dummy entrypoints.
* config/gcn/target.c (GOMP_target_ext, GOMP_target_data_ext,
GOMP_target_end_data, GOMP_target_update_ext,
GOMP_target_enter_exit_data): Likewise.
* testsuite/libgomp.c-c++-common/for-3.c (DO_PRAGMA, OMPTEAMS,
OMPFROM, OMPTO): Define.
(main): Remove #pragma omp target teams around all the tests.
* testsuite/libgomp.c-c++-common/target-41.c: New test.
* testsuite/libgomp.c-c++-common/target-42.c: New test.
Geng Qi [Wed, 26 May 2021 03:29:19 +0000 (11:29 +0800)]
C-SKY: Support fldrd/fstrd for fpuv2 and fldr.64/fstr.64 for fpuv3.
gcc/ChangeLog:
* config/csky/csky.c (ck810_legitimate_index_p): Support
"base + index" with DF mode.
* config/csky/constraints.md ("Y"): New constraint for memory operands
without index register.
* config/csky/csky_insn_fpuv2.md (fpuv3_movdf): Use "Y" instead of "m"
when mov between memory and general registers, and lower their priority.
* config/csky/csky_insn_fpuv3.md (fpuv2_movdf): Likewise.
Andrew Pinski [Sat, 22 May 2021 19:49:50 +0000 (19:49 +0000)]
Optimize x < 0 ? ~y : y to (x >> 31) ^ y in match.pd
This copies the optimization that is done in phiopt for
"x < 0 ? ~y : y to (x >> 31) ^ y" into match.pd. The code
for phiopt is kept around until phiopt uses match.pd (which
I am working towards).
Note the original testcase is now optimized early on and I added a
new testcase to optimize during phiopt.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Thanks,
Andrew Pinski
Differences from v1:
V2: Add check for integeral type to make sure vector types are not done.
gcc:
* match.pd (x < 0 ? ~y : y): New patterns.
gcc/testsuite:
* gcc.dg/tree-ssa/pr96928.c: Update test for slightly different IR.
* gcc.dg/tree-ssa/pr96928-1.c: New testcase.
Andrew Pinski [Sun, 16 May 2021 20:07:06 +0000 (13:07 -0700)]
Add a couple of A?CST1:CST2 match and simplify optimizations
Instead of some of the more manual optimizations inside phi-opt,
it would be good idea to do a lot of the heavy lifting inside match
and simplify instead. In the process, this moves the three simple
A?CST1:CST2 (where CST1 or CST2 is zero) simplifications.
OK? Boostrapped and tested on x86_64-linux-gnu with no regressions.
Differences from V1:
* Use bit_xor 1 instead of bit_not to fix the problem with boolean types
which are not 1 bit precision.
Thanks,
Andrew Pinski
gcc:
* match.pd (A?CST1:CST2): Add simplifcations for A?0:+-1, A?+-1:0,
A?POW2:0 and A?0:POW2.
Andrew MacLeod [Tue, 25 May 2021 18:55:04 +0000 (14:55 -0400)]
Adjust fur_source internal api to use gori_compute not ranger_cache.
In order to access the dependencies, the FoldUsingRange source API class
stored a range_cache.. THis is now contained in the base gori_compute class,
so use that now.
* gimple-range.cc (fold_using_range::range_of_range_op): Use m_gori
intead of m_cache.
(fold_using_range::range_of_address): Adjust.
(fold_using_range::range_of_phi): Adjust.
* gimple-range.h (class fur_source): Adjust.
(fur_source::fur_source): Adjust.
Andrew MacLeod [Tue, 25 May 2021 18:41:16 +0000 (14:41 -0400)]
Tweak location of non-null calls. revamp ranger debug output.
range_on_entry shouldnt be checking non-null, but we sometimes should
after calling it.
change the debug output a bit.
* gimple-range.cc (gimple_ranger::range_of_expr): Non-null should be
checked only after range_of_stmt, not range_on_entry.
(gimple_ranger::range_on_entry): Check for non-null in any
predecessor block, if it is not already non-null.
(gimple_ranger::range_on_exit): DOnt check for non-null after
range on entry call.
(gimple_ranger::dump_bb): New. Split from dump.
(gimple_ranger::dump): Adjust.
* gimple-range.h (class gimple_ranger): Adjust.
Andrew MacLeod [Tue, 25 May 2021 17:53:25 +0000 (13:53 -0400)]
fully populate the export list from range_cache, not gori_compute.
Ranger wants to prepopulate all the export blocks so that it has an initial
invariant set of names. GORI consumers shouldn't be penalized for ranger
requirements. This way any gori client remains lightweight.
* gimple-range-cache.cc (ranger_cache::ranger_cache): Move initial
export cache filling to here.
* gimple-range-gori.cc (gori_compute::gori_compute) : From Here.
Aldy Hernandez [Tue, 25 May 2021 06:36:44 +0000 (08:36 +0200)]
Fix selftest for targets where short and int are the same size.
avr-elf seems to use HImode for both integer_type_node and
signed_char_type_node, which is causing the check for different sized
VARYING ranges to fail.
gcc/ChangeLog:
* value-range.cc (range_tests_legacy): Use
build_nonstandard_integer_type instead of int and short.
Jakub Jelinek [Tue, 25 May 2021 15:24:38 +0000 (17:24 +0200)]
c++: Avoid -Wunused-value false positives on nullptr passed to ellipsis [PR100666]
When passing expressions with decltype(nullptr) type with side-effects to
ellipsis, we pass (void *)0 instead, but for the side-effects evaluate them
on the lhs of a COMPOUND_EXPR. Unfortunately that means we warn about it
if the expression is a call to nodiscard marked function, even when the
result is really used, just needs to be transformed.
Fixed by adding a warning_sentinel.
2021-05-25 Jakub Jelinek <jakub@redhat.com>
PR c++/100666
* call.c (convert_arg_to_ellipsis): For expressions with NULLPTR_TYPE
and side-effects, temporarily disable -Wunused-result warning when
building COMPOUND_EXPR.
* g++.dg/cpp1z/nodiscard8.C: New test.
* g++.dg/cpp1z/nodiscard9.C: New test.
Jakub Jelinek [Tue, 25 May 2021 14:44:35 +0000 (16:44 +0200)]
c++tools: Include <cstdlib> for exit [PR100731]
This TU uses exit, but doesn't include <stdlib.h> or <cstdlib> and relies
on some other header to include it indirectly, which apparently doesn't
happen on reporter's host.
The other <c*> headers aren't guarded either and we rely on a compiler
capable of C++11, so maybe we can rely on <cstdlib> being around
unconditionally.
2021-05-25 Jakub Jelinek <jakub@redhat.com>
PR bootstrap/100731
* server.cc: Include <cstdlib>.
Martin Liska [Wed, 10 Mar 2021 14:12:31 +0000 (15:12 +0100)]
Improve global state for options.
gcc/c-family/ChangeLog:
PR tree-optimization/92860
PR target/99592
* c-attribs.c (handle_optimize_attribute): Save target node
before calling parse_optimize_options and save it in case
it changes.
* c-pragma.c (handle_pragma_target): Similarly for pragma.
(handle_pragma_pop_options): Likewise here.
Cooper Qu [Tue, 25 May 2021 12:03:48 +0000 (20:03 +0800)]
C-SKY: Fix copyright of csky-modes.def.
The incorrect copyright comment format causes build error:
builddir/source//gcc/gcc/config/csky/csky-modes.def: In function ‘void create_modes()’:
builddir/source//gcc/gcc/config/csky/csky-modes.def:1:4: error: ‘C’ was not declared in this scope
;; C-SKY extra machine modes.
^
builddir/source//gcc/gcc/config/csky/csky-modes.def:1:6: error: ‘SKY’ was not declared in this scope
;; C-SKY extra machine modes.
^
builddir/source//gcc/gcc/config/csky/csky-modes.def:2:16: error: ‘Copyright’ was not declared in this scope
;; Copyright (C) 2018-2021 Free Software Foundation, Inc.
^
builddir/source//gcc/gcc/config/csky/csky-modes.def:3:4: error: ‘Contributed’ was not declared in this scope
;; Contributed by C-SKY Microsystems and Mentor Graphics.
^
Richard Biener [Tue, 25 May 2021 08:21:41 +0000 (10:21 +0200)]
middle-end/100727 - fix call expansion with WITH_SIZE_EXPR arg
call expansion used the result of get_base_address to switch between
ABIs - with get_base_address now never returning NULL we have to
re-instantiate the check in a more explicit way. This also adjusts
mark_addressable to skip WITH_SIZE_EXPRs, consistent with how
build_fold_addr_expr handles it.
2021-05-25 Richard Biener <rguenther@suse.de>
PR middle-end/100727
* calls.c (initialize_argument_information): Explicitely test
for WITH_SIZE_EXPR.
* gimple-expr.c (mark_addressable): Skip outer WITH_SIZE_EXPR.
Jakub Jelinek [Tue, 25 May 2021 09:07:01 +0000 (11:07 +0200)]
openmp: Fix reduction clause handling on teams distribute simd [PR99928]
When a directive isn't combined with worksharing-loop, it takes much
simpler clause splitting path for reduction, and that one was missing
handling of teams when combined with simd.
2021-05-25 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Copy reduction to teams when teams is
combined with simd and not with taskloop or for.
gcc/testsuite/
* c-c++-common/gomp/pr99928-8.c: Remove xfails from omp teams r21 and
r28 checks.
* c-c++-common/gomp/pr99928-9.c: Likewise.
* c-c++-common/gomp/pr99928-10.c: Likewise.
libgomp/
* testsuite/libgomp.c-c++-common/reduction-17.c: New test.
Geng Qi [Mon, 24 May 2021 12:22:55 +0000 (20:22 +0800)]
C-SKY: Separate FRAME_POINTER_REGNUM into FRAME_POINTER_REGNUM and HARD_FRAME_POINTER_REGNUM.
gcc/ChangeLog:
* config/csky/csky.h (FRAME_POINTER_REGNUM): Use
HARD_FRAME_POINTER_REGNUM and FRAME_POINTER_REGNUM instead of
the signle definition. The signle definition may not work well
at simplify_subreg_regno().
(HARD_FRAME_POINTER_REGNUM): New.
(ELIMINABLE_REGS): Add for HARD_FRAME_POINTER_REGNUM.
* config/csky/csky.c (get_csky_live_regs, csky_can_eliminate,
csky_initial_elimination_offset, csky_expand_prologue,
csky_expand_epilogue): Add for HARD_FRAME_POINTER_REGNUM.
Geng Qi [Mon, 24 May 2021 12:22:52 +0000 (20:22 +0800)]
C-SKY: Add fpuv3 instructions and CK860 arch.
gcc/ChangeLog:
* config/csky/constraints.md ("W"): New constriant for mem operand
with base reg, index register.
("Q"): Renamed and modified "csky_valid_fpuv2_mem_operand" to
"csky_valid_mem_constraint_operand" to deal with both "Q" and "W"
constraint.
("Dv"): New constraint for const double value that can be used at
fmovi instruction.
* config/csky/csky-modes.def (HFmode): New mode.
* config/csky/csky-protos.h (csky_valid_fpuv2_mem_operand): Rename
to "csky_valid_mem_constraint_operand" and support new constraint
"W".
(csky_get_movedouble_length): New.
(fpuv3_output_move): New.
(fpuv3_const_double): New.
* config/csky/csky.c (csky_option_override): New arch CK860 with fpv3.
(decompose_csky_address): Refine.
(csky_print_operand): New "CONST_DOUBLE" operand.
(csky_output_move): Support fpv3 instructions.
(csky_get_movedouble_length): New.
(fpuv3_output_move): New.
(fpuv3_const_double): New.
(csky_emit_compare): Cover float comparsion.
(csky_emit_compare_float): Refine.
(csky_vaild_fpuv2_mem_operand): Rename to
"csky_valid_mem_constraint_operand" and support new constraint "W".
(ck860_rtx_costs): New.
(csky_rtx_costs): Add the cost calculation of CK860.
(regno_reg_class): New vregs for fpuv3.
(csky_dbx_regno): Likewise.
(csky_cpu_cpp_builtins): New builtin macro for fpuv3.
(csky_conditional_register_usage): Suporrot fpuv3.
(csky_dwarf_register_span): Suporrot fpuv3.
(csky_init_builtins, csky_mangle_type): Support "__fp16" type.
(ck810_legitimate_index_p): Support fp16.
* config/csky/csky.h (TARGET_TLS): ADD CK860.
(CSKY_VREG_P, CSKY_VREG_LO_P, CSKY_VREG_HI_P): Support fpuv3.
(TARGET_SINGLE_FPU): Support fpuv3.
(TARGET_SUPPORT_FPV3): New.
(FIRST_PSEUDO_REGISTER): Change to 202 to hold the new fpuv3 registers.
(FIXED_REGISTERS, CALL_REALLY_USED_REGISTERS, REGISTER_NAMES,
REG_CLASS_CONTENTS): Support fpuv3.
* config/csky/csky.md (movsf): Move to cksy_insn_fpu.md and refine.
(csky_movsf_fpv2): Likewise.
(ck801_movsf): Likewise.
(csky_movsf): Likewise.
(movdf): Likewise.
(csky_movdf_fpv2): Likewise.
(ck801_movdf): Likewise.
(csky_movdf): Likewise.
(movsicc): Refine. Use "comparison_operatior" instead of
"ordered_comparison_operatior".
(addsicc): Likewise.
(CSKY_FIRST_VFP3_REGNUM, CSKY_LAST_VFP3_REGNUM): New constant.
(call_value_internal_vh): New.
* config/csky/csky_cores.def (CK860): New arch and cpu.
(fpv3_hf): New.
(fpv3_hsf): New.
(fpv3_sdf): New.
(fpv3): New.
* config/csky/csky_insn_fpu.md: Refactor. Separate all float patterns
into emit-patterns and match-patterns, remain the emit-patterns here,
and move the match-patterns to csky_insn_fpuv2.md or
csky_insn_fpuv3.md.
* config/csky/csky_insn_fpuv2.md: New file for fpuv2 instructions.
* config/csky/csky_insn_fpuv3.md: New file and new patterns for fpuv3
isntructions.
* config/csky/csky_isa.def (fcr): New.
(fpv3_hi): New.
(fpv3_hf): New.
(fpv3_sf): New.
(fpv3_df): New.
(CK860): New definition for ck860.
* config/csky/csky_tables.opt (ck860): New processors ck860,
ck860f. And new arch ck860.
(fpv3_hf): New.
(fpv3_hsf): New.
(fpv3_hdf): New.
(fpv3): New.
* config/csky/predicates.md (csky_float_comparsion_operator): Delete
"geu", "gtu", "leu", "ltu", which will never appear at float comparison.
* config/csky/t-csky-elf: Support 860.
* config/csky/t-csky-linux: Likewise.
* doc/md.texi: Add "Q" and "W" constraints for C-SKY.
François Dumont [Tue, 25 Aug 2020 19:31:23 +0000 (21:31 +0200)]
libstdc++: Limit allocation on iterator insertion in Hashtable [PR 96088]
When inserting into unordered_multiset or unordered_multimap first instantiate
the node to store and compute the hash code from it to avoid a potential
intermediate key_type instantiation.
When inserting into unordered_set or unordered_map check if invoking the hash
functor with container key_type is noexcept and invoking the same hash functor
with key part of the iterator value_type can throw. In this case create a
temporary key_type instance at Hashtable level and use it to compute the hash
code. This temporary instance is moved to the final storage location if needed.
Aaron Sawdey [Wed, 3 Mar 2021 00:06:37 +0000 (18:06 -0600)]
Fusion patterns for add-logical/logical-add
This patch modifies the function in genfusion.pl for generating
the logical-logical patterns so that it can also generate the
add-logical and logical-add patterns which are very similar.
gcc/ChangeLog:
* config/rs6000/genfusion.pl (gen_logical_addsubf): Refactor to
add generation of logical-add and add-logical fusion pairs.
* config/rs6000/rs6000-cpus.def: Add new fusion to ISA 3.1 mask
and powerpc mask.
* config/rs6000/rs6000.c (rs6000_option_override_internal): Turn on
logical-add and add-logical fusion by default.
* config/rs6000/rs6000.opt: Add -mpower10-fusion-logical-add and
-mpower10-fusion-add-logical options.
* config/rs6000/fusion.md: Regenerate file.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/fusion-p10-logadd.c: New file.
Aldy Hernandez [Mon, 24 May 2021 17:57:29 +0000 (19:57 +0200)]
VARYING ranges of different sizes should not be equal.
VARYING ranges are just normal ranges that span the entire domain. Such
ranges have had end-points for a few releases now, and the fact that the
legacy code was still treating all VR_VARYING the same was an oversight.
This patch fixes the oversight to match the multi-range behavior.
gcc/ChangeLog:
* value-range.cc (irange::legacy_equal_p): Check type when
comparing VR_VARYING types.
(range_tests_legacy): Test comparing VARYING ranges of different
sizes.
Patrick Palka [Mon, 24 May 2021 19:24:44 +0000 (15:24 -0400)]
libstdc++: Fix iterator caching inside range adaptors [PR100479]
This fixes two issues with our iterator caching as described in detail
in the PR. Since we recently added the __non_propagating_cache class
template as part of r12-336 for P2328, this patch just rewrites the
problematic _CachedPosition partial specialization in terms of this
class template.
For the offset partial specialization, it's safe to propagate the cached
offset on copy/move, but we should still invalidate the cached offset in
the source object on move.
libstdc++-v3/ChangeLog:
PR libstdc++/100479
* include/std/ranges (__detail::__non_propagating_cache): Move
definition up to before that of _CachedPosition. Make base
class _Optional_base protected instead of private. Add const
overload for operator*.
(__detail::_CachedPosition): Rewrite the partial specialization
for forward ranges as a derived class of __non_propagating_cache.
Remove the size constraint on the partial specialization for
random access ranges. Add copy/move/copy-assignment/move-assignment
members to the offset partial specialization for random
access ranges that propagate the cached value but additionally
invalidate it in the source object on move.
* testsuite/std/ranges/adaptors/100479.cc: New test.
Jonathan Wakely [Mon, 24 May 2021 17:42:09 +0000 (18:42 +0100)]
libstdc++: Qualify functions used in tests
These tests rely on ADL for some functions, probably unintentionally.
The calls only work because the iterator wrappers derive from
std::iterator and so namespace std is an associated namespace.
libstdc++-v3/ChangeLog:
* testsuite/25_algorithms/inplace_merge/constrained.cc: Qualify
call to ranges::next.
* testsuite/25_algorithms/is_sorted/constrained.cc: Likewise.
* testsuite/25_algorithms/is_sorted_until/constrained.cc:
Likewise.
* testsuite/25_algorithms/swap_ranges/1.cc: Qualify call to
swap_ranges.
PR fortran/86470
* testsuite/libgomp.fortran/class-firstprivate-1.f90: New test.
* testsuite/libgomp.fortran/class-firstprivate-2.f90: New test.
* testsuite/libgomp.fortran/class-firstprivate-3.f90: New test.
gcc/testsuite/ChangeLog:
PR fortran/86470
* gfortran.dg/gomp/class-firstprivate-1.f90: New test.
* gfortran.dg/gomp/class-firstprivate-2.f90: New test.
* gfortran.dg/gomp/class-firstprivate-3.f90: New test.
* gfortran.dg/gomp/class-firstprivate-4.f90: New test.
Wilco Dijkstra [Mon, 24 May 2021 13:31:37 +0000 (14:31 +0100)]
AArch64: Enable fast shifts on Neoverse N1
Enable the fast shift feature in Neoverse N1 tuning - this means additions with
a shift left by 1-4 are as fast as addition. This improves multiply by constant
expansions, eg. x * 25 is now emitted using shifts rather than a multiply:
Wilco Dijkstra [Mon, 24 May 2021 13:23:50 +0000 (14:23 +0100)]
AArch64: Cleanup aarch64_classify_symbol
Use a GOT indirection for extern weak symbols instead of a literal - this is
the same as PIC/PIE and mirrors LLVM behaviour. Ensure PIC/PIE use the same
offset limits for symbols that don't use the GOT.
Christophe Lyon [Thu, 11 Mar 2021 11:08:49 +0000 (11:08 +0000)]
arm: Auto-vectorization for MVE: vld4/vst4
This patch enables MVE vld4/vst4 instructions for auto-vectorization.
We move the existing expanders from neon.md and enable them for MVE,
calling the respective emitter.
Christophe Lyon [Mon, 8 Mar 2021 12:23:49 +0000 (12:23 +0000)]
arm: Auto-vectorization for MVE: vld2/vst2
This patch enables MVE vld2/vst2 instructions for auto-vectorization.
We move the existing expanders from neon.md and enable them for MVE,
calling the respective emitter.
Andrew Pinski [Sun, 23 May 2021 17:35:40 +0000 (17:35 +0000)]
Fix two testcases for ssa names which are more than 1 digit
phi-opt-10.c and phi-opt-7.c both depend on currently that some ssa name
versions are one digit long which is not always correct. This fixes the
problem by detecting digits rather than just using '.'.
Committed as obvious after a bootstrap/test.
Thanks,
Andrew Pinski
gcc/testsuite/ChangeLog
* gcc.dg/tree-ssa/phi-opt-10.c: Use "\[0-9\]*" instead of '.'
when matching ssa name version.
* gcc.dg/tree-ssa/phi-opt-7.c: Likewise.
Harald Anlauf [Sun, 23 May 2021 18:51:14 +0000 (20:51 +0200)]
Fortran: fix passing return value to class(*) dummy argument
gcc/fortran/ChangeLog:
PR fortran/100551
* trans-expr.c (gfc_conv_procedure_call): Adjust check for
implicit conversion of actual argument to an unlimited polymorphic
procedure argument.
gcc/testsuite/ChangeLog:
PR fortran/100551
* gfortran.dg/pr100551.f90: New test.
Andrew Pinski [Sun, 16 May 2021 17:40:16 +0000 (10:40 -0700)]
Don't simplify (A & C) != 0 ? D : 0 for pointer types.
While rewriting part of PHI-OPT to use match-and-simplify,
I ran into a bug where this pattern in match.pd would hit
and would produce invalid gimple; a shift of a pointer type.
This just disables this simplification for pointer types similarly
to what is already done in PHI-OPT for the generic A ? D : 0 case.
OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Thanks,
Andrew Pinski
2021-5-23 Andrew Pinski <apinski@marvell.com>
gcc/
* match.pd ((A & C) != 0 ? D : 0): Limit to non pointer types.
Correct implementation of random_init() when -fcoarray=lib is given.
gcc/fortran/ChangeLog:
PR fortran/98301
* trans-decl.c (gfc_build_builtin_function_decls): Move decl.
* trans-intrinsic.c (conv_intrinsic_random_init): Use bool for
lib-call of caf_random_init instead of logical (4-byte).
* trans.h: Add tree var for random_init.
libgfortran/ChangeLog:
PR fortran/98301
* caf/libcaf.h (_gfortran_caf_random_init): New function.
* caf/single.c (_gfortran_caf_random_init): New function.
* gfortran.map: Added fndecl.
* intrinsics/random_init.f90: Implement random_init.
Thomas Schwinge [Sat, 22 May 2021 08:28:34 +0000 (10:28 +0200)]
[OpenACC privatization] Prune uninteresting/varying diagnostics in 'libgomp.oacc-fortran/privatized-ref-2.f90'
Minor fix-up for my recent commit 11b8286a83289f5b54e813f14ff56d730c3f3185
"[OpenACC privatization] Largely extend diagnostics and corresponding testsuite
coverage [PR90115]".
Aaron Sawdey [Sat, 22 May 2021 02:59:39 +0000 (21:59 -0500)]
Fix rs6000 p10 fusion patterns with old attr type names
Somehow I managed to check in a version of genfusion.pl this
afternoon that was not updated to the new insn attr type names.
Committing as obvious and to make the code match what was posted
and reviewed.
Jakub Jelinek [Fri, 21 May 2021 19:16:21 +0000 (21:16 +0200)]
openmp: Fix up firstprivate+lastprivate clause handling [PR99928]
The C/C++ clause splitting happens very early during construct parsing,
but only the FEs later on handle possible instantiations, non-static
member handling and array section lowering.
In the OpenMP 5.0/5.1 rules, whether firstprivate is added to combined
target depends on whether it isn't also mentioned in lastprivate or map
clauses, but unfortunately I think such checks are much better done only
when the FEs perform all the above mentioned changes.
So, this patch arranges for the firstprivate clause to be copied or moved
to combined target construct (as before), but sets flags on that clause,
which tell the FE *finish_omp_clauses and the gimplifier it has been added
only conditionally and let the FEs and gimplifier DTRT for these.
2021-05-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
gcc/
* tree.h (OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET): Define.
* gimplify.c (enum gimplify_omp_var_data): Fix up
GOVD_MAP_HAS_ATTACHMENTS value, add GOVD_FIRSTPRIVATE_IMPLICIT.
(omp_lastprivate_for_combined_outer_constructs): If combined target
has GOVD_FIRSTPRIVATE_IMPLICIT set for the decl, change it to
GOVD_MAP | GOVD_SEEN.
(gimplify_scan_omp_clauses): Set GOVD_FIRSTPRIVATE_IMPLICIT for
firstprivate clauses with OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT.
(gimplify_adjust_omp_clauses): For firstprivate clauses with
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT either clear that bit and
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET too, or remove it and
let it be replaced by implicit map clause.
gcc/c-family/
* c-omp.c (c_omp_split_clauses): Set OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT
on firstprivate clause copy going to target construct, and for
target simd set also OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET bit.
gcc/c/
* c-typeck.c (c_finish_omp_clauses): Move firstprivate clauses with
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT to the end of the chain. Don't error
if a decl is mentioned both in map clause and in such firstprivate
clause unless OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET is also set.
gcc/cp/
* semantics.c (finish_omp_clauses): Move firstprivate clauses with
OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT to the end of the chain. Don't error
if a decl is mentioned both in map clause and in such firstprivate
clause unless OMP_CLAUSE_FIRSTPRIVATE_IMPLICIT_TARGET is also set.
gcc/testsuite/
* c-c++-common/gomp/pr99928-3.c: Remove all xfails.
* c-c++-common/gomp/pr99928-15.c: New test.
Jakub Jelinek [Fri, 21 May 2021 19:13:06 +0000 (21:13 +0200)]
openmp: Fix up handling of implicit lastprivate on outer constructs for implicit linear and lastprivate IVs [PR99928]
This patch fixes the handling of lastprivate propagation to outer combined/composite
leaf constructs from implicit linear or lastprivate clauses on simd IVs and adds missing
testsuite coverage for explicit and implicit lastprivate on simd IVs.
2021-05-21 Jakub Jelinek <jakub@redhat.com>
PR middle-end/99928
* gimplify.c (omp_lastprivate_for_combined_outer_constructs): New
function.
(gimplify_scan_omp_clauses) <case OMP_CLAUSE_LASTPRIVATE>: Use it.
(gimplify_omp_for): Likewise.
* c-c++-common/gomp/pr99928-6.c: Remove all xfails.
* c-c++-common/gomp/pr99928-13.c: New test.
* c-c++-common/gomp/pr99928-14.c: New test.
gcc/
PR middle-end/90115
* omp-low.c (oacc_privatization_candidate_p): New function.
(oacc_privatization_scan_clause_chain)
(oacc_privatization_scan_decl_chain): Use it. Also
'gcc_checking_assert' that we're not seeing duplicates.
Julian Brown [Fri, 26 Feb 2021 12:34:49 +0000 (04:34 -0800)]
openacc: Add support for gang local storage allocation in shared memory [PR90115]
This patch implements a method to track the "private-ness" of
OpenACC variables declared in offload regions in gang-partitioned,
worker-partitioned or vector-partitioned modes. Variables declared
implicitly in scoped blocks and those declared "private" on enclosing
directives (e.g. "acc parallel") are both handled. Variables that are
e.g. gang-private can then be adjusted so they reside in GPU shared
memory.
The reason for doing this is twofold: correct implementation of OpenACC
semantics, and optimisation, since shared memory might be faster than
the main memory on a GPU. Handling of private variables is intimately
tied to the execution model for gangs/workers/vectors implemented by
a particular target: for current targets, we use (or on mainline, will
soon use) a broadcasting/neutering scheme.
That is sufficient for code that e.g. sets a variable in worker-single
mode and expects to use the value in worker-partitioned mode. The
difficulty (semantics-wise) comes when the user wants to do something like
an atomic operation in worker-partitioned mode and expects a worker-single
(gang private) variable to be shared across each partitioned worker.
Forcing use of shared memory for such variables makes that work properly.
In terms of implementation, the parallelism level of a given loop is
not fixed until the oaccdevlow pass in the offload compiler, so the
patch delays fixing the parallelism level of variables declared on or
within such loops until the same point. This is done by adding a new
internal UNIQUE function (OACC_PRIVATE) that lists (the address of) each
private variable as an argument, and other arguments set so as to be able
to determine the correct parallelism level to use for the listed
variables. This new internal function fits into the existing scheme for
demarcating OpenACC loops, as described in comments in the patch.
Two new target hooks are introduced: TARGET_GOACC_ADJUST_PRIVATE_DECL and
TARGET_GOACC_EXPAND_VAR_DECL. The first can tweak a variable declaration
at oaccdevlow time, and the second at expand time. The first or both
of these target hooks can be used by a given offload target, depending
on its strategy for implementing private variables.
This patch updates the TARGET_GOACC_ADJUST_PRIVATE_DECL target hook in
the AMD GCN backend to the current name and prototype. (An earlier
version of the hook was already present, but dormant.)
gcc/
PR middle-end/90115
* doc/tm.texi.in (TARGET_GOACC_EXPAND_VAR_DECL)
(TARGET_GOACC_ADJUST_PRIVATE_DECL): Add documentation hooks.
* doc/tm.texi: Regenerate.
* expr.c (expand_expr_real_1): Expand decls using the
expand_var_decl OpenACC hook if defined.
* internal-fn.c (expand_UNIQUE): Handle IFN_UNIQUE_OACC_PRIVATE.
* internal-fn.h (IFN_UNIQUE_CODES): Add OACC_PRIVATE.
* omp-low.c (omp_context): Add oacc_privatization_candidates
field.
(lower_oacc_reductions): Add PRIVATE_MARKER parameter. Insert
before fork.
(lower_oacc_head_tail): Add PRIVATE_MARKER parameter. Modify
private marker's gimple call arguments, and pass it to
lower_oacc_reductions.
(oacc_privatization_scan_clause_chain)
(oacc_privatization_scan_decl_chain, lower_oacc_private_marker):
New functions.
(lower_omp_for, lower_omp_target, lower_omp_1): Use these.
* omp-offload.c (convert.h): Include.
(oacc_loop_xform_head_tail): Treat private-variable markers like
fork/join when transforming head/tail sequences.
(struct var_decl_rewrite_info): Add struct.
(oacc_rewrite_var_decl, is_sync_builtin_call): New functions.
(execute_oacc_device_lower): Support rewriting gang-private
variables using target hook, and fix up addr_expr and var_decl
nodes afterwards.
* target.def (adjust_private_decl, expand_var_decl): New hooks.
* config/gcn/gcn-protos.h (gcn_goacc_adjust_gangprivate_decl):
Rename to...
(gcn_goacc_adjust_private_decl): ...this.
* config/gcn/gcn-tree.c (gcn_goacc_adjust_gangprivate_decl):
Rename to...
(gcn_goacc_adjust_private_decl): ...this. Add LEVEL parameter.
* config/gcn/gcn.c (TARGET_GOACC_ADJUST_GANGPRIVATE_DECL): Rename
definition using gcn_goacc_adjust_gangprivate_decl...
(TARGET_GOACC_ADJUST_PRIVATE_DECL): ...to this, using
gcn_goacc_adjust_private_decl.
* config/nvptx/nvptx.c (tree-pretty-print.h): Include.
(gang_private_shared_size): New global variable.
(gang_private_shared_align): Likewise.
(gang_private_shared_sym): Likewise.
(gang_private_shared_hmap): Likewise.
(nvptx_option_override): Initialize these.
(nvptx_file_end): Output gang_private_shared_sym.
(nvptx_goacc_adjust_private_decl, nvptx_goacc_expand_var_decl):
New functions.
(nvptx_set_current_function): Clear gang_private_shared_hmap.
(TARGET_GOACC_ADJUST_PRIVATE_DECL): Define hook.
(TARGET_GOACC_EXPAND_VAR_DECL): Likewise.
libgomp/
PR middle-end/90115
* testsuite/libgomp.oacc-c-c++-common/private-atomic-1-gang.c: New
test.
* testsuite/libgomp.oacc-fortran/private-atomic-1-gang.f90:
Likewise.
* testsuite/libgomp.oacc-fortran/private-atomic-1-worker.f90:
Likewise.
Co-Authored-By: Chung-Lin Tang <cltang@codesourcery.com> Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>
H.J. Lu [Fri, 21 May 2021 12:16:20 +0000 (05:16 -0700)]
Elide expand_constructor if move by pieces is preferred
Elide expand_constructor when the constructor is static storage and not
mostly zeros and we can move it by pieces prefer to do so since that's
usually more efficient than performing a series of stores from immediates.
2021-05-21 Richard Biener <rguenther@suse.de>
H.J. Lu <hjl.tools@gmail.com>
gcc/
PR middle-end/90773
* expr.c (expand_constructor): Elide expand_constructor if
move by pieces is preferred.
gcc/testsuite/
* gcc.target/i386/pr90773-24.c: New test.
* gcc.target/i386/pr90773-25.c: Likewise.
Kyrylo Tkachov [Fri, 21 May 2021 13:46:00 +0000 (14:46 +0100)]
aarch64: Add attributes for builtins specified in aarch64-builtins.c
Besides the builtins in aarch64-simd-builtins.def there are a number of builtins defined in aarch64-builtins.c itself.
They could also benefit from the attributes generated by aarch64_get_attributes.
However aarch64_get_attributes and its helpers are only set up to handle a aarch64_simd_builtin_datum.
This patch changes these functions to instead take a flag and mode value that are extracted from
aarch64_simd_builtin_datum.flags and aarch64_simd_builtin_datum.mode anyway.
Then the various builtin init functions in aarch64-builtins.c can pass down their own FLAG_* flags
that they want to derive attributes from.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.c (aarch64_call_properties):
Take a flag and mode value as arguments.
(aarch64_modifies_global_state_p): Likewise.
(aarch64_reads_global_state_p): Likewise.
(aarch64_could_trap_p): Likewise.
(aarch64_get_attributes): Likewise.
(aarch64_init_simd_builtins): Adjust callsite of above.
(aarch64_init_fcmla_laneq_builtins): Use aarch64_get_attributes to get
function attributes to apply to builtins.
(aarch64_init_crc32_builtins): Likewise.
(aarch64_init_builtin_rsqrt): Likewise.
Aaron Sawdey [Tue, 2 Mar 2021 23:50:52 +0000 (17:50 -0600)]
Add insn types for fusion pairs
This adds new values for insn attr type for p10 fusion. The genfusion.pl
script is modified to use them, and fusion.md regenerated to capture
the new patterns. There are also some formatting only changes to
fusion.md that apparently weren't captured after a previous commit
of genfusion.pl.
gcc/
* config/rs6000/rs6000.md (define_attr "type"): Add types for fusion.
* config/rs6000/genfusion.pl (gen_ld_cmpi_p10): Use new fusion types.
(gen_2logical): Use new fusion types.
* config/rs6000/fusion.md: Regenerate.