re PR middle-end/70626 (bogus results in 'acc parallel loop' reductions)
gcc/c-family/
PR middle-end/70626
* c-common.h (c_oacc_split_loop_clauses): Add boolean argument.
* c-omp.c (c_oacc_split_loop_clauses): Use it to duplicate
reduction clauses in acc parallel loops.
gcc/c/
PR middle-end/70626
* c-parser.c (c_parser_oacc_loop): Don't augment mask with
OACC_LOOP_CLAUSE_MASK.
(c_parser_oacc_kernels_parallel): Update call to
c_oacc_split_loop_clauses.
gcc/cp/
PR middle-end/70626
* parser.c (cp_parser_oacc_loop): Don't augment mask with
OACC_LOOP_CLAUSE_MASK.
(cp_parser_oacc_kernels_parallel): Update call to
c_oacc_split_loop_clauses.
gcc/fortran/
PR middle-end/70626
* trans-openmp.c (gfc_trans_oacc_combined_directive): Duplicate
the reduction clause in both parallel and loop directives.
gcc/testsuite/
PR middle-end/70626
* c-c++-common/goacc/combined-reduction.c: New test.
* gfortran.dg/goacc/reduction-2.f95: Add check for kernels reductions.
libgomp/
PR middle-end/70626
* testsuite/libgomp.oacc-c++/template-reduction.C: Adjust test.
* testsuite/libgomp.oacc-c-c++-common/combined-reduction.c: New test.
* testsuite/libgomp.oacc-fortran/combined-reduction.f90: New test.
tree-vect-loop.c (vect_transform_loop): Fix nb_iterations_upper_bound computation for vectorized loop.
gcc/
* tree-vect-loop.c (vect_transform_loop): Fix
nb_iterations_upper_bound computation for vectorized loop.
gcc/testsuite/
* gcc.target/i386/vect-unpack-2.c (avx512bw_test): Avoid
optimization of vector loop.
* gcc.target/i386/vect-unpack-3.c: New test.
* gcc.dg/vect/vect-nb-iter-ub-1.c: New test.
* gcc.dg/vect/vect-nb-iter-ub-2.c: New test.
* gcc.dg/vect/vect-nb-iter-ub-3.c: New test.
Andreas Krebbel [Fri, 29 Apr 2016 09:17:35 +0000 (09:17 +0000)]
S/390: Replace LDER with LDR.
For performance reasons it is important to write the full 64 bits of
an FPR target reg even when dealing with 32 bit values. So we chose
lder over ler for 32 bit float register moves. lder zero-extends the
32 bit value from the source reg to 64 bit in the target. However,
since it actually doesn't matter whether we write the upper 32 bits
with zeros or with any other garbage we can also use ldr instead. It
is a bit shorter and is therefore better for I-cache usage.
gcc/ChangeLog:
2016-04-29 Andreas Krebbel <krebbel@linux.vnet.ibm.com>
This fixes an issue with the long displacement memory address
constraints S and T. These were defined to only accept long
displacement addresses. This is wrong since a memory constraint must
not reject an address with a 0 displacement. Reload relies on being
able to turn an invalid memory address into a valid one by reloading
the address into a base register. The S and T constraints would
reject such an address.
This isn't really a problem for the backend since we used the
constraints with that knowledge there but it is a problem for people
writing inline assemblies.
gcc/ChangeLog:
2016-04-29 Ulrich Weigand <uweigand@de.ibm.com>
* config/s390/constraints.md ("U", "W"): Invoke
s390_mem_constraint with "ZR" and "ZT".
* config/s390/s390.c (s390_check_qrst_address): Reject invalid
addresses when using LRA. Accept also short displacements for S
and T constraints. Do not check for long displacement target for
S and T constraints.
(s390_mem_constraint): Remove handling of U and W constraints.
* config/s390/s390.md (various patterns): Remove the short
displacement constraints (Q and R) if a long displacement
constraint is present. Add longdisp as required CPU capability.
* config/s390/vector.md: Likewise.
* config/s390/vx-builtins.md: Likewise.
[ARC] Fix unwanted match for sign extend 16-bit constant.
The combine pass may conclude that the umulhisi3_imm pattern also accepts
sign-extended 16-bit constants. This patch prevents combine from
considering this pattern suitable.
Richard Biener [Fri, 29 Apr 2016 08:36:49 +0000 (08:36 +0000)]
re PR tree-optimization/13962 ([tree-ssa] make "fold" use alias information to optimize pointer comparisons)
2016-04-29 Richard Biener <rguenther@suse.de>
PR tree-optimization/13962
PR tree-optimization/65686
* tree-ssa-alias.h (ptrs_compare_unequal): Declare.
* tree-ssa-alias.c (ptrs_compare_unequal): New function
using PTA to compare pointers.
* match.pd: Add pattern for pointer equality compare simplification
using ptrs_compare_unequal.
i386.md (Load+RegOp to Mov+MemOp peephole2): Use SWI mode iterator.
* config/i386/i386.md (Load+RegOp to Mov+MemOp peephole2):
Use SWI mode iterator. Use general_reg_operand predicate.
(Load+RegOp to Mov+MemOp peephole2 with vector regs): Split
peephole to MMX and SSE part. Use mmx_reg_operand and sse_reg_operand
predicates.
Ian Lance Taylor [Thu, 28 Apr 2016 19:12:13 +0000 (19:12 +0000)]
compiler: Export String_index_expression.
Exports String_index_expression and adds the getter `string` that
returns the underlying string. This will be used to handle string
indexing differently from array indexing in escape analysis.
i386.md (peephole2s for operations with memory inputs): Use SWI mode iterator.
* config/i386/i386.md (peephole2s for operations with memory inputs):
Use SWI mode iterator.
(peephole2s for operations with memory outputs): Ditto.
Do not check for stack checking probe.
Jason Merrill [Thu, 28 Apr 2016 19:01:13 +0000 (15:01 -0400)]
cvt.c (cp_get_callee): New.
* cvt.c (cp_get_callee): New.
* constexpr.c (get_function_named_in_call): Use it.
* cxx-pretty-print.c (postfix_expression): Use it.
* except.c (check_noexcept_r): Use it.
* method.c (check_nontriv): Use it.
* tree.c (build_aggr_init_expr): Use it.
* cp-tree.h: Declare it.
In r193072 sbitmap_popcount was removed, so we cannot ask for the popcount
of an sbitmap anymore. Nothing calls sbitmap_alloc_with_popcount either.
This patch removes everything else popcount-related from sbitmap.
* cfganal.c (bitmap_intersection_of_succs): Delete assert checking
dst->popcount.
(bitmap_intersection_of_preds): Ditto.
(bitmap_union_of_succs): Ditto.
(bitmap_union_of_preds): Ditto.
* sbitmap.c (do_popcount): Delete.
(BITMAP_DEBUGGING): Delete.
(sbitmap_verify_popcount): Delete.
(sbitmap_alloc): Don't initialize the popcount field.
(sbitmap_alloc_with_popcount): Delete.
(sbitmap_resize): Don't resize the popcount array.
(sbitmap_vector_alloc): Don't initialize the popcount field.
(bitmap_copy): Don't copy the popcount array.
(bitmap_clear): Don't clear the popcount array.
(bitmap_clear): Delete the popcount array handling.
(bitmap_ior_and_compl): Delete the popcount assert.
(bitmap_not): Ditto.
(bitmap_and_compl): Ditto.
(bitmap_and): Delete the popcount array handling.
(bitmap_xor): Ditto.
(bitmap_ior): Ditto.
(bitmap_or_and): Delete the popcount assert.
(bitmap_and_or): Ditto.
(popcount_table): Delete.
(sbitmap_elt_popcount): Delete.
* sbitmap.h (simple_bitmap_def): Delete the popcount field.
(bitmap_set_bit): Delete the popcount assert.
(bitmap_clear_bit): Ditto.
(sbitmap_free): Don't free the popcount array.
(sbitmap_alloc_with_popcount): Delete declaration.
(sbitmap_popcount): Ditto.
Jakub Jelinek [Thu, 28 Apr 2016 17:10:14 +0000 (19:10 +0200)]
re PR target/70821 (x86_64: __atomic_fetch_add/sub() uses XADD rather than DECL in some cases)
PR target/70821
* config/i386/sync.md (define_peephole2 *atomic_fetch_add_cmp<mode>):
Add new peephole2 where the first insn is *mov<mode>_or instead of
*mov<mode>_internal.
Expanders do not have more elements in the operands array than declared
in the pattern. So, we cannot use operands[5] here. Instead just
declare and use another rtx.
PR target/70668
* config/nds32/nds32.md (casesi): Don't access the operands array
out of bounds.
Bill Seurer [Thu, 28 Apr 2016 16:01:52 +0000 (16:01 +0000)]
This patch adds support for the signed and unsigned int versions of the...
This patch adds support for the signed and unsigned int versions of the
vec_adde altivec builtins from the Power Architecture 64-Bit ELF V2 ABI
OpenPOWER ABI for Linux Supplement (16 July 2015 Version 1.1). Many of
the builtins are missing and this is the first of a series
of patches to add them.
There aren't instructions for the int versions of vec_adde so the
output code is built from other built-ins that do have instructions
which in this case is just two vec_adds with a vec_and to ensure the
carry vector is comprised of only the values 0 or 1.
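The lane arithmetic described above can be sketched in plain C. This is only a scalar model, not the code GCC generates: the function name vec_adde_model is hypothetical, the vector is assumed to have four 32-bit lanes, and unsigned lanes are used so that wraparound is well defined.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar model of a 4-lane vec_adde (hypothetical helper, not a GCC API).
   Each output lane is a + b + (carry & 1), wrapping modulo 2^32. */
void vec_adde_model(const uint32_t a[4], const uint32_t b[4],
                    const uint32_t carry[4], uint32_t out[4])
{
    for (size_t i = 0; i < 4; i++) {
        uint32_t c = carry[i] & 1u;  /* the vec_and step: keep only 0 or 1 */
        out[i] = a[i] + b[i] + c;    /* the two vec_add steps */
    }
}
```

Masking the carry first means a caller may pass any value in the carry vector and still get at most 1 added per lane.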
The new test cases are executable tests which verify that the generated
code produces expected values. C macros were used so that the same
test case could be used for both the signed and unsigned versions. An
extra executable test case is also included to ensure that the modified
support for the __int128 versions of vec_adde is not broken. The same
test case could not be used for both int and __int128 because of some
differences in loading and storing the vectors.
Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no
regressions. Is this ok for trunk?
[gcc]
2016-04-28 Bill Seurer <seurer@linux.vnet.ibm.com>
* config/rs6000/rs6000-builtin.def (vec_adde): Change vec_adde to a
special case builtin.
* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Remove
ALTIVEC_BUILTIN_VEC_ADDE.
* config/rs6000/rs6000-c.c (altivec_resolve_overloaded_builtin): Add
support for ALTIVEC_BUILTIN_VEC_ADDE.
* config/rs6000/rs6000.c (altivec_init_builtins): Add definition
for __builtin_vec_adde.
[gcc/testsuite]
2016-04-28 Bill Seurer <seurer@linux.vnet.ibm.com>
* gcc.target/powerpc/vec-adde.c: New test.
* gcc.target/powerpc/vec-adde-int128.c: New test.
* gcc.target/i386/avx-vround-1.c: New test.
* gcc.target/i386/avx-vround-2.c: New test.
* gcc.target/i386/avx512vl-vround-1.c: New test.
* gcc.target/i386/avx512vl-vround-2.c: New test.
Martin Jambor [Thu, 28 Apr 2016 14:35:04 +0000 (16:35 +0200)]
Verify that context of local DECLs is the current function
2016-04-28 Martin Jambor <mjambor@suse.cz>
* tree-cfg.c (verify_expr): Verify that local declarations belong to
this function. Call verify_expr on MEM_REFs and bases of other
handled_components.
[ARC] Don't use drsub* instructions when selecting fpuda.
The double precision floating point assist instructions do not
implement the reverse double subtract instruction (drsub) found in
the FPX extension.
* config/arc/arc.md (cpu_facility): Add fpx variant.
(subdf3): Prohibit use of reverse sub when assist operations option
is enabled.
* config/arc/fpx.md (subdf3_insn, *dsubh_peep2_insn): Allow drsub
instructions only when FPX is enabled.
* testsuite/gcc.target/arc/trsub.c: New test.
i386.md (*fop_<mode>_1_mixed): Do not check for mult_operator when calculating "type" attribute.
* config/i386/i386.md (*fop_<mode>_1_mixed): Do not check for
mult_operator when calculating "type" attribute.
(*fop_<mode>_1_i387): Ditto.
(*fop_xf_1_i387): Ditto.
(x87 stack loads peephole2): Add "reg = op (mem, reg)" peephole2.
Use std::swap to swap operands. Use RTL expressions to generate
converted pattern.
Eduard Sanou [Thu, 28 Apr 2016 09:12:05 +0000 (09:12 +0000)]
c-common.c (get_source_date_epoch): New function...
gcc/c-family/ChangeLog:
2016-04-28 Eduard Sanou <dhole@openmailbox.org>
Matthias Klose <doko@debian.org>
* c-common.c (get_source_date_epoch): New function, gets the environment
variable SOURCE_DATE_EPOCH and parses it as long long with error
handling.
* c-common.h (get_source_date_epoch): Prototype.
* c-lex.c (c_lex_with_flags): set parse_in->source_date_epoch.
gcc/ChangeLog:
2016-04-28 Eduard Sanou <dhole@openmailbox.org>
Matthias Klose <doko@debian.org>
2016-04-28 Eduard Sanou <dhole@openmailbox.org>
Matthias Klose <doko@debian.org>
* include/cpplib.h (cpp_init_source_date_epoch): Prototype.
* init.c (cpp_init_source_date_epoch): New function.
* internal.h: Added source_date_epoch variable to struct
cpp_reader to store a reproducible date.
* macro.c (_cpp_builtin_macro_text): Set pfile->date timestamp from
pfile->source_date_epoch instead of localtime if source_date_epoch is
set, to be used for __DATE__ and __TIME__ macros to help reproducible
builds.
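The parsing step described for get_source_date_epoch can be sketched as follows. This is a minimal illustration, not the code from the patch: the helper names parse_epoch and read_source_date_epoch are hypothetical, and the choice to reject negative values and trailing characters is an assumption about what "error handling" covers.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: parse a SOURCE_DATE_EPOCH-style string.
   Returns 0 and stores the value on success, -1 on any error. */
int parse_epoch(const char *text, long long *out)
{
    char *end;
    long long value;

    if (text == NULL || *text == '\0')
        return -1;
    errno = 0;
    value = strtoll(text, &end, 10);
    /* Reject overflow, trailing junk, and (by assumption) negative values. */
    if (errno != 0 || *end != '\0' || value < 0)
        return -1;
    *out = value;
    return 0;
}

/* Read the variable from the environment; -1 means unset or invalid. */
long long read_source_date_epoch(void)
{
    long long value;
    return parse_epoch(getenv("SOURCE_DATE_EPOCH"), &value) == 0 ? value : -1;
}
```

A caller would fall back to the current time when the result is -1, which mirrors the "instead of localtime if source_date_epoch is set" behaviour described in the macro.c entry.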
Patrick Palka [Wed, 27 Apr 2016 21:18:05 +0000 (21:18 +0000)]
Reduce nesting of parentheses in conditionals generated by genattrtab
gcc/ChangeLog:
* genattrtab.c (write_test_expr): New parameter EMIT_PARENS
which defaults to true. Emit an outer pair of parentheses only if
EMIT_PARENS. When continuing a chain of && or || (or & or |),
don't emit parentheses for the right-hand operand.
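The parenthesis rule can be illustrated with a small expression printer. This is a sketch of the idea only, not the genattrtab code: struct expr, append and write_expr are hypothetical names, only && and || are modelled, and chains are assumed to nest to the right.

```c
#include <string.h>

enum kind { LEAF, AND, OR };

struct expr {
    enum kind kind;
    const char *text;              /* used by LEAF nodes only */
    const struct expr *lhs, *rhs;  /* used by AND / OR nodes */
};

/* Append src to the NUL-terminated buffer dst of total capacity cap. */
static void append(char *dst, size_t cap, const char *src)
{
    strncat(dst, src, cap - strlen(dst) - 1);
}

/* Write e into buf, emitting an outer pair of parentheses only if
   emit_parens is nonzero. */
void write_expr(const struct expr *e, int emit_parens, char *buf, size_t cap)
{
    if (e->kind == LEAF) {
        append(buf, cap, e->text);
        return;
    }
    if (emit_parens)
        append(buf, cap, "(");
    write_expr(e->lhs, 1, buf, cap);
    append(buf, cap, e->kind == AND ? " && " : " || ");
    /* A right operand with the same operator continues the chain,
       so it needs no parentheses of its own. */
    write_expr(e->rhs, e->rhs->kind != e->kind, buf, cap);
    if (emit_parens)
        append(buf, cap, ")");
}
```

With this rule a right-nested chain of three && terms prints with one outer pair of parentheses, while a || nested under && keeps its own pair.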
Ryan Burn [Wed, 27 Apr 2016 20:41:52 +0000 (20:41 +0000)]
re PR c++/69024 ([cilkplus] cilk_spawn is broken for initializations with implicit conversion operators defined)
PR c++/69024
PR c++/68997
* cilk.c (cilk_ignorable_spawn_rhs_op): Change to external linkage.
(cilk_recognize_spawn): Renamed from recognize_spawn and change to
external linkage.
(cilk_detect_and_unwrap): Corresponding changes.
(extract_free_variables): Don't extract free variables from
AGGR_INIT_EXPR slot.
* c-common.h (cilk_ignorable_spawn_rhs_op): Prototype.
(cilk_recognize_spawn): Likewise.
PR c++/69024
PR c++/68997
* cp-gimplify.c (cp_gimplify_expr): Call cilk_cp_detect_spawn_and_unwrap
instead of cilk_detect_spawn_and_unwrap.
* cp-cilkplus.c (is_conversion_operator_function_decl_p): New.
(find_spawn): New.
(cilk_cp_detect_spawn_and_unwrap): New.
* lambda.c: Include cp-cilkplus.h.
* parser.c: Include cp-cilkplus.h.
* cp-tree.h (cpp_validate_cilk_plus_loop): Move prototype into...
* cp-cilkplus.h: New file.
PR c++/69024
PR c++/68997
* g++.dg/cilk-plus/CK/pr68001.cc: Fix to not depend on broken
diagnostic.
* g++.dg/cilk-plus/CK/pr69024.cc: New test.
* g++.dg/cilk-plus/CK/pr68997.cc: New test.
Co-Authored-By: Jeff Law <law@redhat.com>
From-SVN: r235534
gcc/
* config/aarch64/aarch64.md
(*movhf_aarch64): Add "movi %0, #0" to zero up register and
remove the "fp" attributes.
(*movsf_aarch64): Add "movi %0, #0" to zero up register and
add the "simd" attributes.
(*movdf_aarch64): Likewise.
(*movtf_aarch64): Remove the "fp" attributes.
* testsuite/gcc.target/aarch64/fmovf-zero-reg.c: Update accordingly.
* testsuite/gcc.target/aarch64/fmovd-zero-reg.c: Likewise.
David Malcolm [Wed, 27 Apr 2016 18:18:45 +0000 (18:18 +0000)]
df: make df_problem instances "const"
The various struct df_problem instances are constant data; mark them
as such.
gcc/ChangeLog:
* df-core.c (df_add_problem): Make the problem param be const.
(df_remove_problem): Make local "problem" be const.
* df-problems.c (problem_RD): Make const.
(problem_LR): Likewise.
(problem_LIVE): Likewise.
(problem_MIR): Likewise.
(problem_CHAIN): Likewise.
(problem_WORD_LR): Likewise.
(problem_NOTE): Likewise.
(problem_MD): Likewise.
* df-scan.c (problem_SCAN): Likewise.
* df.h (struct df_problem): Make field "dependent_problem" be
const.
(struct dataflow): Likewise for field "problem".
(df_add_problem): Make param const.
i386.c (ix86_spill_class): Enable for TARGET_SSE2 when inter-unit moves to/from vector registers are enabled.
* config/i386/i386.c (ix86_spill_class): Enable for TARGET_SSE2 when
inter-unit moves to/from vector registers are enabled. Do not disable
for TARGET_MMX.
Eric Botcazou [Wed, 27 Apr 2016 18:08:39 +0000 (18:08 +0000)]
sem_aux.adb (Is_By_Reference_Type): Also return true for a tagged incomplete type without full view.
* sem_aux.adb (Is_By_Reference_Type): Also return true for a tagged
incomplete type without full view.
* sem_ch6.adb (Exchange_Limited_Views): Change into a function and
return the list of changes.
(Restore_Limited_Views): New procedure to undo the transformation made
by Exchange_Limited_Views.
(Analyze_Subprogram_Body_Helper): Adjust call to Exchange_Limited_Views
and call Restore_Limited_Views at the end, if need be.
(Possible_Freeze): Do not delay freezing because of incomplete types.
(Process_Formals): Remove kludges for class-wide types.
* types.h (By_Copy_Return): Delete.
* gcc-interface/ada-tree.h (TYPE_MAX_ALIGN): Move around.
(TYPE_DUMMY_IN_PROFILE_P): New macro.
* gcc-interface/gigi.h (update_profiles_with): Declare.
(finish_subprog_decl): Likewise.
(get_minimal_subprog_decl): Delete.
(create_subprog_type): Likewise.
(create_param_decl): Adjust prototype.
(create_subprog_decl): Likewise.
* gcc-interface/decl.c (defer_limited_with): Rename into...
(defer_limited_with_list): ...this.
(gnat_to_gnu_entity): Adjust to above renaming.
(finalize_from_limited_with): Likewise.
(tree_entity_vec_map): New structure.
(gt_pch_nx): New helpers.
(dummy_to_subprog_map): New hash table.
(gnat_to_gnu_param): Set the SLOC here. Remove MECH parameter and
add FIRST parameter. Deal with the mechanism here instead of...
Do not make read-only variant of types. Simplify expressions.
In the by-ref case, test the mechanism before must_pass_by_ref
and also TYPE_IS_BY_REFERENCE_P before building the reference type.
(gnat_to_gnu_subprog_type): New static function extracted from...
Do not special-case the type_annotate_only mode. Call
gnat_to_gnu_profile_type instead of gnat_to_gnu_type on return type.
Deal with dummy return types. Likewise for parameter types. Deal
with by-reference types explicitly and add a kludge for null procedures
with untagged incomplete types. Remove assertion on the types and be
prepared for multiple elaboration of the declarations. Skip the whole
CICO processing if the profile is incomplete. Handle the completion of
a previously incomplete profile.
(gnat_to_gnu_entity) <E_Variable>: Rename local variable.
Adjust couple of calls to create_param_decl.
<E_Access_Subprogram_Type, E_Anonymous_Access_Subprogram_Type>:
Remove specific deferring code.
<E_Access_Type>: Also deal with E_Subprogram_Type designated type.
Simplify handling of dummy types and remove obsolete comment.
Constify a couple of variables. Do not set TYPE_UNIVERSAL_ALIASING_P
on dummy types.
<E_Access_Subtype>: Tweak comment and simplify condition.
<E_Subprogram_Type>: ...here. Call it and clean up handling. Remove
obsolete comment and adjust call to gnat_to_gnu_param. Adjust call to
create_subprog_decl.
<E_Incomplete_Type>: Add a couple of 'const' qualifiers and get rid of
inner break statements. Tidy up condition guarding direct use of the
full view.
(get_minimal_subprog_decl): Delete.
(finalize_from_limited_with): Call update_profiles_with on dummy types
with TYPE_DUMMY_IN_PROFILE_P set.
(is_from_limited_with_of_main): Delete.
(associate_subprog_with_dummy_type): New function.
(update_profile): Likewise.
(update_profiles_with): Likewise.
(gnat_to_gnu_profile_type): Likewise.
(init_gnat_decl): Initialize dummy_to_subprog_map.
(destroy_gnat_decl): Destroy dummy_to_subprog_map.
* gcc-interface/misc.c (gnat_get_alias_set): Add guard for accessing
TYPE_UNIVERSAL_ALIASING_P.
(gnat_get_array_descr_info): Minor tweak.
* gcc-interface/trans.c (gigi): Adjust calls to create_subprog_decl.
(build_raise_check): Likewise.
(Compilation_Unit_to_gnu): Likewise.
(Identifier_to_gnu): Accept mismatches coming from a limited context.
(Attribute_to_gnu): Remove kludge for dispatch table entities.
(process_freeze_entity): Do not retrieve old definition if there is an
address clause on the entity. Call update_profiles_with on dummy types
with TYPE_DUMMY_IN_PROFILE_P set.
* gcc-interface/utils.c (build_dummy_unc_pointer_types): Also set
TYPE_REFERENCE_TO to the fat pointer type.
(create_subprog_type): Delete.
(create_param_decl): Remove READONLY parameter.
(finish_subprog_decl): New function extracted from...
(create_subprog_decl): ...here. Call it. Remove CONST_FLAG and
VOLATILE_FLAG parameters and adjust.
(update_pointer_to): Also clear TYPE_REFERENCE_TO in the unconstrained
case.
David Malcolm [Wed, 27 Apr 2016 17:54:42 +0000 (17:54 +0000)]
Fix comment in rtl.def
Commit r210360 removed the first "i" field from the various instruction
nodes in rtl.def, moving it to an explicit "int insn_uid;" field
of the union "u2" within rtx_def.
Update the comment in rtl.def to reflect this change. Also, fix
a stray apostrophe.
gcc/ChangeLog:
* rtl.def: Update comment for "things in the instruction chain" to
reflect the removal of the leading "i" field for INSN_UID in
r210360. Fix bogus apostrophe.
H.J. Lu [Wed, 27 Apr 2016 17:32:40 +0000 (17:32 +0000)]
Extend STV pass to 64-bit mode
128-bit SSE load and store instructions can be used for load and store
of 128-bit integers if they are the only operations on 128-bit integers.
To convert load and store of 128-bit integers to 128-bit SSE load and
store, the original STV pass, which is designed to convert 64-bit integer
operations to SSE2 operations in 32-bit mode, is extended to 64-bit mode
in the following ways:
1. Class scalar_chain is turned into base class. The 32-bit specific
member functions are moved to the new derived class, dimode_scalar_chain.
The new derived class, timode_scalar_chain, is added to convert load and
store of 128-bit integers to 128-bit SSE load and store.
2. Add the 64-bit version of scalar_to_vector_candidate_p and
remove_non_convertible_regs. Only TImode load and store are allowed
for conversion. If any instruction on the chain of dependent
instructions isn't a TImode load or store, the chain of instructions
won't be converted.
3. In 64-bit, we only convert from TImode to V1TImode, which have the
same size. The difference is only vector registers are allowed in
TImode so that 128-bit SSE load and store instructions will be used
for load and store of 128-bit integers.
4. Put the 64-bit STV pass before the CSE pass so that instructions
changed or generated by the STV pass can be CSEed.
convert_scalars_to_vector calls free_dominance_info in 64-bit mode to
work around ICE in fwprop pass:
DWARF: turn dw_loc_descr_node field into hash map for frame offset check
As discussed on
<https://gcc.gnu.org/ml/gcc-patches/2016-02/msg01708.html>, this change
removes a field in the dw_loc_descr_node structure so we can get rid of
the CHECKING_P macro usage.
This field was used to perform consistency checks for frame offset in
DWARF procedures. As a replacement, this commit turns the "visited
nodes" set in resolve_args_picking_1 into a map that remembers for each
dw_loc_descr_node the frame offset associated to it, so that the
consistency check is still operational.
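The idea of replacing a visited set with a map that also remembers the frame offset can be sketched as below. This is an illustration under stated assumptions, not the dwarf2out.c code: struct offset_map and visit_node are hypothetical, and a small fixed-size table stands in for GCC's hash map.

```c
#include <stddef.h>

#define MAX_NODES 64

/* Hypothetical stand-in for a node -> frame offset hash map. */
struct offset_map {
    const void *node[MAX_NODES];
    int frame_offset[MAX_NODES];
    size_t count;
};

/* Returns 1 on a first visit (offset recorded), 0 on a revisit with the
   same offset, and -1 if the offsets disagree or the table is full. */
int visit_node(struct offset_map *map, const void *node, int frame_offset)
{
    for (size_t i = 0; i < map->count; i++)
        if (map->node[i] == node)
            return map->frame_offset[i] == frame_offset ? 0 : -1;
    if (map->count == MAX_NODES)
        return -1;
    map->node[map->count] = node;
    map->frame_offset[map->count] = frame_offset;
    map->count++;
    return 1;
}
```

Because the offset lives in the map rather than in the node, the check needs no extra field in the node structure.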
Bootstrapped and regtested on x86_64-linux.
2016-04-27 Pierre-Marie de Rodat <derodat@adacore.com>
* dwarf2out.h (struct dw_loc_descr_node): Remove the
dw_loc_frame_offset field.
* dwarf2out.c (new_loc_descr): Likewise.
(resolve_args_picking_1): Turn the VISITED hash set into a
FRAME_OFFSET hash map. Use it to associate a frame offset to
visited nodes. Remove uses of the CHECKING_P macro.
(resolve_args_picking): Update call to resolve_args_picking_1.