We can use the mode iterators directly with an @pattern to avoid the
need for an expander that was only there to pass the mode through.
gcc/ChangeLog:
2019-09-26 Iain Sandoe <iain@sandoe.co.uk>
* config/rs6000/darwin.md: Replace the expanders for
load_macho_picbase and reload_macho_picbase with use of '@'
in their respective define_insns.
(nonlocal_goto_receiver): Pass Pmode to gen_reload_macho_picbase.
* config/rs6000/rs6000-logue.c (rs6000_emit_prologue): Pass
Pmode to gen_load_macho_picbase.
* config/rs6000/rs6000.md: Likewise.
Richard Biener [Thu, 26 Sep 2019 16:54:51 +0000 (16:54 +0000)]
re PR tree-optimization/91896 (ICE in vect_get_vec_def_for_stmt_copy, at tree-vect-stmts.c:1687)
2019-09-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/91896
* tree-vect-loop.c (vectorizable_reduction): The single
def-use cycle optimization cannot apply when there's more
than one pattern stmt involved.
Richard Biener [Thu, 26 Sep 2019 13:52:45 +0000 (13:52 +0000)]
tree-vect-loop.c (vect_analyze_loop_operations): Also call vectorizable_reduction for vect_double_reduction_def.
2019-09-26 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (vect_analyze_loop_operations): Also call
vectorizable_reduction for vect_double_reduction_def.
(vect_transform_loop): Likewise.
(vect_create_epilog_for_reduction): Move double-reduction
PHI creation and preheader argument setting of PHIs ...
(vectorizable_reduction): ... here. Also process
vect_double_reduction_def PHIs, creating the vectorized
PHI nodes, remembering the scalar adjustment computed for
the epilogue in STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT.
Remember the original reduction code in STMT_VINFO_REDUC_CODE.
* tree-vectorizer.c (vec_info::new_stmt_vec_info):
Initialize STMT_VINFO_REDUC_CODE.
* tree-vectorizer.h (_stmt_vec_info::reduc_epilogue_adjustment): New.
(_stmt_vec_info::reduc_code): Likewise.
(STMT_VINFO_REDUC_EPILOGUE_ADJUSTMENT): Likewise.
(STMT_VINFO_REDUC_CODE): Likewise.
This patch implements some more SIMD32, but these ones have a DImode result+addend.
Apart from that there's nothing too exciting about them.
Bootstrapped and tested on arm-none-linux-gnueabihf.
* config/arm/arm.md (arm_<simd32_op>): New define_insn.
* config/arm/arm_acle.h (__smlald, __smlaldx, __smlsld, __smlsldx):
Define.
* config/arm/arm_acle.h: Define builtins for the above.
* config/arm/iterators.md (SIMD32_DIMODE): New int_iterator.
(simd32_op): Handle the above.
* config/arm/unspecs.md: Define unspecs for the above.
This patch is part of a series to implement the SIMD32 ACLE intrinsics [1].
The interesting parts implementation-wise involve adding support for setting and reading
the Q bit for saturation and the GE-bits for the packed SIMD instructions.
That will come in a later patch.
For now, this patch implements the other intrinsics that don't need anything special ;
just a mapping from arm_acle.h function to builtin to RTL expander+unspec.
I've compressed as many as I could with iterators so that we end up needing only 3
new define_insns.
Bootstrapped and tested on arm-none-linux-gnueabihf.
My recent assemble_real patch (r275873) meant that we now output
negative FP16 constants in the same way as we'd output an integer
subreg of them. This patch updates gcc.target/arm/fp16-* accordingly.
2019-09-26 Richard Sandiford <richard.sandiford@arm.com>
gcc/testsuite/
* gcc.target/arm/fp16-compile-alt-3.c: Expect (__fp16) -2.0
to be written as a negative short rather than a positive one.
* gcc.target/arm/fp16-compile-ieee-3.c: Likewise.
David Malcolm [Wed, 25 Sep 2019 19:32:44 +0000 (19:32 +0000)]
Colorize %L and %C text to match diagnostic_show_locus (PR fortran/91426)
gcc/fortran/ChangeLog:
PR fortran/91426
* error.c (curr_diagnostic): New static variable.
(gfc_report_diagnostic): New static function.
(gfc_warning): Replace call to diagnostic_report_diagnostic with
call to gfc_report_diagnostic.
(gfc_format_decoder): Colorize the text of %L and %C to match the
colorization used by diagnostic_show_locus.
(gfc_warning_now_at): Replace call to diagnostic_report_diagnostic with
call to gfc_report_diagnostic.
(gfc_warning_now): Likewise.
(gfc_warning_internal): Likewise.
(gfc_error_now): Likewise.
(gfc_fatal_error): Likewise.
(gfc_error_opt): Likewise.
(gfc_internal_error): Likewise.
Martin Jambor [Wed, 25 Sep 2019 14:24:33 +0000 (16:24 +0200)]
Remove newly unused function and variable in tree-sra
Hi,
Martin and his clang warnings discovered that I forgot to remove a
static inline function and a variable when ripping out the old IPA-SRA
from tree-sra.c and both are now unused. Thus I am doing that now
with the patch below which I will commit as obvious (after including
it in a round of a bootstrap and testing on an x86_64-linux).
[AArch64] Use implementation namespace consistently in arm_neon.h
We're somewhat inconsistent in arm_neon.h when it comes to using the implementation namespace for local
identifiers. This means things like:
#define hash_abcd 0
#define hash_e 1
#define wk 2
#include "arm_neon.h"
uint32x4_t
foo (uint32x4_t a, uint32_t b, uint32x4_t c)
{
return vsha1cq_u32 (a, b, c);
}
don't compile.
This patch fixes these issues throughout the whole of arm_neon.h
Bootstrapped and tested on aarch64-none-linux-gnu.
The advsimd-intrinsics.exp tests pass just fine.
Richard Biener [Wed, 25 Sep 2019 13:09:25 +0000 (13:09 +0000)]
re PR tree-optimization/91896 (ICE in vect_get_vec_def_for_stmt_copy, at tree-vect-stmts.c:1687)
2019-09-25 Richard Biener <rguenther@suse.de>
PR tree-optimization/91896
* tree-vect-loop.c (vectorizable_reduction): The single
def-use cycle optimization cannot apply when there's more
than one pattern stmt involved.
[AARCH64] Add support for new control bits CTR_EL0.DIC and CTR_EL0.IDC
The DCache clean & ICache invalidation requirements for instructions
to be data coherence are discoverable through new fields in CTR_EL0.
Let's support the two bits if they are enabled, the CPU core will
not execute the unnecessary DCache clean or Icache Invalidation
instructions.
[Darwin, PPC, Mode Iterators 1/n] Use mode iterators in picbase patterns.
This switches the picbase load and reload patterns to use the 'P' mode
iterator instead of writing an SI and DI pattern for each.
gcc/ChangeLog:
2019-09-24 Iain Sandoe <iain@sandoe.co.uk>
* config/rs6000/rs6000.md (load_macho_picbase_<mode>): New, using
the 'P' mode iterator, replacing the (removed) SI and DI variants.
(reload_macho_picbase_<mode>): Likewise.
[Darwin, PPC, Mode Iterators 0/n] Make iterators visible to darwin.md.
As a clean-up, we want to be able to use mode iterators in darwin.md.
This patch moves the include point for the Darwin include until after
the definition of the mode iterators and attrs. No functional change
intended.
gcc/ChangeLog:
2019-09-24 Iain Sandoe <iain@sandoe.co.uk>
* config/rs6000/rs6000.md: Move darwin.md include until
after the definition of the mode iterators.
ends up producing two distinct stores if the destination is volatile:
void bar(u64 *x)
{
*(volatile u64 *)x = 0xabcdef10abcdef10;
}
mov w1, 61200
movk w1, 0xabcd, lsl 16
str w1, [x0]
str w1, [x0, 4]
because we end up not merging the strs into an stp. It's questionable whether the use of STP is valid for volatile in the first place.
To avoid unnecessary pain in a context where it's unlikely to be performance critical [1] (use of volatile), this patch avoids this
transformation for volatile destinations, so we produce the original single STR-X.
Bootstrapped and tested on aarch64-none-linux-gnu.
[GCC][PATCH][AArch64] Update hwcap string for fp16fml in aarch64-option-extensions.def
This is a minor patch that fixes the entry for the fp16fml feature in
GCC's aarch64-option-extensions.def.
As can be seen in the Linux sources here
https://github.com/torvalds/linux/blob/master/arch/arm64/kernel/cpuinfo.c#L69
the correct string is "asimdfhm", not "asimdfml".
Cross-compiled and tested on aarch64-none-linux-gnu.
Martin Jambor [Tue, 24 Sep 2019 11:20:57 +0000 (13:20 +0200)]
[PR 91831] Copy PARM_DECLs of artificial thunks
Hi,
I am quite surprised I did not catch this before but the new
ipa-param-manipulation does not copy PARM_DECLs when creating
artificial thinks (I think it originally did but then I somehow
removed during one cleanups). Fixed by adding the capability at the
natural place. It is triggered whenever context of the PARM_DECL that
is just taken from the original function does not match the target
fndecl rather than by some constructor parameter because in such
situation it is always the correct thing to do.
Bootstrapped and tested on x86_64-linux. OK for trunk?
Thanks,
Martin
2019-09-24 Martin Jambor <mjambor@suse.cz>
PR ipa/91831
* ipa-param-manipulation.c (carry_over_param): Make a method of
ipa_param_body_adjustments, remove now unnecessary argument. Also copy
in case of a context mismatch.
(ipa_param_body_adjustments::common_initialization): Adjust call to
carry_over_param.
* ipa-param-manipulation.h (class ipa_param_body_adjustments): Add
private method carry_over_param.
Martin Jambor [Tue, 24 Sep 2019 11:16:57 +0000 (13:16 +0200)]
[PR 91832] Do not ICE on negative offsets in ipa-sra
Hi,
IPA-SRA asserts that an offset obtained from get_ref_base_and_extent
is non-negative (after it verifies it is based on a parameter). That
assumption is invalid as the testcase shows. One could probably also write a
testcase with defined behavior, but unless I see a reasonable one
where the transformation is really desirable, I'd like to just punt on
those cases.
Bootstrapped and tested on x86_64-linux. OK for trunk?
Thanks,
Martin
2019-09-24 Martin Jambor <mjambor@suse.cz>
PR ipa/91832
* ipa-sra.c (scan_expr_access): Check that offset is non-negative.
Jonathan Wakely [Tue, 24 Sep 2019 10:09:18 +0000 (11:09 +0100)]
PR libstdc++/91871 fix Clang warnings in testsuite
PR libstdc++/91871
* testsuite/util/testsuite_hooks.h
(conversion::iterator_to_const_iterator()): Do not return an invalid
iterator. Test direct-initialization and direct-list-initialization
as well as implicit conversion.
GNAT/testsuite: Pass the `ada' option to target compilation
Pass the `ada' option to DejaGNU's `target_compile' procedure, which by
default calls `default_target_compile', so that it arranges for an Ada
compilation rather the default of C. We set the compiler to `gnatmake'
manually here, so that part of the logic in `default_target_compile' is
not used, but it affects other settings, such as the use of `adaflags'.
gcc/testsuite/
* lib/gnat.exp (gnat_target_compile): Pass the `ada' option to
`target_compile'.
Jason Merrill [Mon, 23 Sep 2019 17:48:00 +0000 (13:48 -0400)]
PR c++/91809 - bit-field and ellipsis.
decay_conversion converts a bit-field access to its declared type, which
isn't what we want here; it even has a comment that the caller is expected
to have already used default_conversion to perform integral promotion. This
function handles arithmetic promotion differently, but we still don't want
to call decay_conversion before that happens.
* call.c (convert_arg_to_ellipsis): Don't call decay_conversion for
arithmetic arguments.
Jonathan Wakely [Mon, 23 Sep 2019 15:54:16 +0000 (16:54 +0100)]
PR libstdc++/91788 improve codegen for std::variant<T...>::index()
If __index_type is a smaller type than size_t, then the result of
size_t(__index_type(-1)) is not equal to size_t(-1), but to an incorrect
value such as size_t(255) or size_t(65535). The old implementation of
variant<T...>::index() uses (size_t(__index_type(_M_index + 1)) - 1)
which is always correct, but generates suboptimal code for many common
cases.
When the __index_type is size_t or valueless variants are not possible
we can just return the value directly.
When the number of alternatives is sufficiently small the result of
converting the _M_index value to the corresponding signed type will be
either non-negative or -1. In those cases converting to the signed type
and then to size_t will either produce the correct positive value or
will sign extend -1 to (size_t)-1 as desired.
For the remaining case we keep the existing arithmetic operations to
ensure the correct result.
PR libstdc++/91788 (partial)
* include/std/variant (variant::index()): Improve codegen for cases
where conversion to size_t already works correctly.
Richard Biener [Mon, 23 Sep 2019 10:21:45 +0000 (10:21 +0000)]
tree-vect-loop.c (get_initial_def_for_reduction): Simplify, avoid adjusting by + 0 or * 1.
2019-09-23 Richard Biener <rguenther@suse.de>
* tree-vect-loop.c (get_initial_def_for_reduction): Simplify,
avoid adjusting by + 0 or * 1.
(vect_create_epilog_for_reduction): Get reduction code only
when necessary. Deal with adjustment_def only when necessary.
Paul Thomas [Mon, 23 Sep 2019 09:19:10 +0000 (09:19 +0000)]
re PR fortran/91729 (ICE in gfc_match_select_rank, at fortran/match.c:6586)
2019-09-23 Paul Thomas <pault@gcc.gnu.org>
PR fortran/91729
* match.c (gfc_match_select_rank): Initialise 'as' to NULL.
Check for a symtree in the selector expression before trying to
assign a value to 'as'. Revert to gfc_error and go to cleanup
after setting a MATCH_ERROR.
2019-09-23 Paul Thomas <pault@gcc.gnu.org>
PR fortran/91729
* gfortran.dg/select_rank_2.f90 : Add two more errors in foo2.
* gfortran.dg/select_rank_3.f90 : New test.
Eric Botcazou [Mon, 23 Sep 2019 08:31:52 +0000 (08:31 +0000)]
trans.c (Regular_Loop_to_gnu): Do not rotate the loop if -Og is enabled.
* gcc-interface/trans.c (Regular_Loop_to_gnu): Do not rotate the loop
if -Og is enabled.
(build_return_expr): Do not perform NRV if -Og is enabled.
(Subprogram_Body_to_gnu): Likewise.
(gnat_to_gnu) <N_Simple_Return_Statement>: Likewise.
(Handled_Sequence_Of_Statements_to_gnu): Do not inline finalizers if
-Og is enabled.
* gcc-interface/utils.c (convert_to_index_type): Return early if -Og
is enabled.
Eric Botcazou [Mon, 23 Sep 2019 08:27:40 +0000 (08:27 +0000)]
trans.c (gnat_compile_time_expr_list): New variable.
* gcc-interface/trans.c (gnat_compile_time_expr_list): New variable.
(Pragma_to_gnu): Rename local variable. Save the (first) expression
of pragma Compile_Time_{Error|Warning} for later processing.
(Compilation_Unit_to_gnu): Process the expressions saved above.
Eric Botcazou [Mon, 23 Sep 2019 08:08:08 +0000 (08:08 +0000)]
trans.c (Attribute_to_gnu): Test Can_Use_Internal_Rep on the underlying type of the node.
* gcc-interface/trans.c (Attribute_to_gnu): Test Can_Use_Internal_Rep
on the underlying type of the node.
(Call_to_gnu): Likewise with the type of the prefix.
Eric Botcazou [Mon, 23 Sep 2019 07:45:58 +0000 (07:45 +0000)]
decl.c (components_to_record): Do not reorder fields in packed record types if...
* gcc-interface/decl.c (components_to_record): Do not reorder fields
in packed record types if they contain fixed-size fields that cannot
be laid out in a packed manner.
Remove dead code for the the TARGET_LINK_STACK which is not
applicable to Darwin. Use MACHOPIC_PURE instead of a hard-wired
PIC level to determine the stub kind.
Merge common code blocks.
gcc/ChangeLog:
2019-09-22 Iain Sandoe <iain@sandoe.co.uk>
* config/rs6000/rs6000.c (machopic_output_stub): Remove dead
code. Merge code blocks with common conditionals. Use declared
macro instead of a magic number for PIC level.
Avoid adding impossible copies in ira-conflicts.c:process_reg_shuffles
If an insn requires two operands to be tied, and the input operand dies
in the insn, IRA acts as though there were a copy from the input to the
output with the same execution frequency as the insn. Allocating the
same register to the input and the output then saves the cost of a move.
If there is no such tie, but an input operand nevertheless dies
in the insn, IRA creates a similar move, but with an eighth of the
frequency. This helps to ensure that chains of instructions reuse
registers in a natural way, rather than using arbitrarily different
registers for no reason.
This heuristic seems to work well in the vast majority of cases.
However, for SVE, the handling of untied operands ends up creating
copies between dying predicate registers and vector outputs, even though
vector and predicate registers are distinct classes and can never be
tied. This is a particular problem because the dying predicate tends
to be the loop control predicate, which is used by most instructions
in a vector loop and so (rightly) has a very high allocation priority.
Any copies involving the loop predicate therefore tend to get processed
before copies involving only vector registers. The end result is that
we tend to allocate the output of the last vector instruction in a loop
ahead of its natural place in the allocation order and don't benefit
from chains created between vector registers.
This patch tries to avoid the problem by not adding register shuffle
copies if there appears to be no chance that the two operands could be
allocated to the same register.
2019-09-21 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* ira-conflicts.c (can_use_same_reg_p): New function.
(process_reg_shuffles): Take an insn parameter. Ignore cases
in which input operand op_num could seemingly never be allocated
to the same register as the destination.
(add_insn_allocno_copies): Update call to process_reg_shuffles.
Extend neg_const_int simplifications to other const rtxes
This patch generalises some neg_const_int-based rtx simplifications
so that they handle all CONST_SCALAR_INTs and also CONST_POLY_INT.
This actually simplifies things a bit, since we no longer have
to treat HOST_WIDE_INT_MIN specially.
This is tested by later SVE patches.
2019-09-21 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* simplify-rtx.c (neg_const_int): Replace with...
(neg_poly_int_rtx): ...this new function.
(simplify_binary_operation_1): Extend (minus x C) -> (plus X -C)
to all CONST_SCALAR_INTs and to CONST_POLY_INT.
(simplify_plus_minus): Likewise for constant terms here.
This fails at m32 because the scan-asm is looking for an absence
of "ret". Darwin is generating the correct code for the function
but the picbase thunk has a 'ret' insn. Fixed by making the test
use -mdynamic-no-pic for m32.
gcc/testsuite/ChangeLog:
2019-09-20 Iain Sandoe <iain@sandoe.co.uk>
* gcc.target/i386/naked-1.c: Alter options to use non-
PIC codegen for m32 Darwin.
PR fortran/78260
* openmp.c (gfc_resolve_oacc_declare): Reject all
non variables but accept function result variables.
* trans-openmp.c (gfc_trans_omp_clauses): Handle
function-result variables for remaing cases.
PR fortran/78260
* gfortran.dg/goacc/parameter.f95: Change
dg-error as it is now detected earlier.
* gfortran.dg/goacc/pr85701.f90: Modify to
use a separate result variable.
* gfortran.dg/goacc/pr78260.f90: New.
* gfortran.dg/goacc/pr78260-2.f90: New.
* gfortran.dg/gomp/pr78260.f90: New.
* gfortran.dg/gomp/pr78260-2.f90: New.
* gfortran.dg/gomp/pr78260-3.f90: New.
Richard Biener [Fri, 20 Sep 2019 11:14:34 +0000 (11:14 +0000)]
re PR target/91814 (ICE in elimination_costs_in_insn, at reload1.c:3549 since r274926)
2019-09-20 Richard Biener <rguenther@suse.de>
Uros Bizjak <ubizjak@gmail.com>
PR target/91814
* config/i386/i386-features.c (gen_gpr_to_xmm_move_src): Revert
previous change.
(general_scalar_chain::convert_op): Force not suitable memory
operands to a register.
Eric Botcazou [Fri, 20 Sep 2019 09:11:20 +0000 (09:11 +0000)]
re PR c/91815 (questionable error on type definition at file scope)
PR c/91815
* c-decl.c (pushdecl): In C detect duplicate declarations across scopes
of identifiers in the external scope only for variables and functions.
Richard Biener [Fri, 20 Sep 2019 06:42:39 +0000 (06:42 +0000)]
re PR target/91767 (After r274953, clang-compiled xgcc segfaults during RTL pass: stv)
2019-09-20 Richard Biener <rguenther@suse.de>
PR target/91767
* config/i386/i386-features.c (general_scalar_chain::convert_registers):
Ensure there's a sequence point between allocating the new register
and passing a reference to a reg via regno_reg_rtx.