Kyrylo Tkachov [Wed, 20 Jun 2018 08:57:17 +0000 (08:57 +0000)]
[AArch64] Support for LDP/STP of Q-registers
This patch adds support for generating LDPs and STPs of Q-registers.
This allows for more compact code generation and makes better use of the ISA.
It's implemented in a straightforward way by allowing 16-byte modes in the
sched-fusion machinery and adding appropriate peepholes in aarch64-ldpstp.md
as well as the patterns themselves in aarch64-simd.md.
It adds a new no_ldp_stp_qregs tuning flag.
I use it to restrict the peepholes in aarch64-ldpstp.md from merging the
operations together into PARALLELs. I also use it to restrict the sched fusion
check that brings such loads and stores together. This is enough to avoid
forming the pairs when the tuning flag is set.
I didn't see any non-noise performance effect on SPEC2017 on Cortex-A72 and Cortex-A53.
* config/aarch64/aarch64-tuning-flags.def (no_ldp_stp_qregs): New.
* config/aarch64/aarch64.c (xgene1_tunings): Add
AARCH64_EXTRA_TUNE_NO_LDP_STP_QREGS to tune_flags.
(aarch64_mode_valid_for_sched_fusion_p):
Allow 16-byte modes.
(aarch64_classify_address): Allow 16-byte modes for load_store_pair_p.
* config/aarch64/aarch64-ldpstp.md: Add peepholes for LDP STP of
128-bit modes.
* config/aarch64/aarch64-simd.md (load_pair<VQ:mode><VQ2:mode>):
New pattern.
(vec_store_pair<VQ:mode><VQ2:mode>): Likewise.
* config/aarch64/iterators.md (VQ2): New mode iterator.
* gcc.target/aarch64/ldp_stp_q.c: New test.
* gcc.target/aarch64/stp_vec_128_1.c: Likewise.
* gcc.target/aarch64/ldp_stp_q_disable.c: Likewise.
[8/n] PR85694: Make patterns check for target support
This patch makes pattern recognisers do their own checking for vector
types and target support. Previously some recognisers did this
themselves and some left it to vect_pattern_recog_1.
Doing this means we can get rid of the type_in argument, which was
ignored if the recogniser did its own checking. It also means
we create fewer junk statements.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (NUM_PATTERNS, vect_recog_func_ptr): Move to
tree-vect-patterns.c.
* tree-vect-patterns.c (vect_supportable_direct_optab_p): New function.
(vect_recog_dot_prod_pattern): Use it. Remove the type_in argument.
(vect_recog_sad_pattern): Likewise.
(vect_recog_widen_sum_pattern): Likewise.
(vect_recog_pow_pattern): Likewise. Check for a null vectype.
(vect_recog_widen_shift_pattern): Remove the type_in argument.
(vect_recog_rotate_pattern): Likewise.
(vect_recog_mult_pattern): Likewise.
(vect_recog_vector_vector_shift_pattern): Likewise.
(vect_recog_divmod_pattern): Likewise.
(vect_recog_mixed_size_cond_pattern): Likewise.
(vect_recog_bool_pattern): Likewise.
(vect_recog_mask_conversion_pattern): Likewise.
(vect_try_gather_scatter_pattern): Likewise.
(vect_recog_widen_mult_pattern): Likewise. Check for a null vectype.
(vect_recog_over_widening_pattern): Likewise.
(vect_recog_gather_scatter_pattern): Likewise.
(vect_recog_func_ptr): Move from tree-vectorizer.h
(vect_vect_recog_func_ptrs): Move further down the file.
(vect_recog_func): Likewise. Remove the third argument.
(NUM_PATTERNS): Define based on vect_vect_recog_func_ptrs.
(vect_pattern_recog_1): Expect the pattern function to do any
necessary target tests. Also expect it to provide a vector type.
Remove the type_in handling.
This message is a long write-up for a patch that simply adds a common
routine for printing the "vector_foo_pattern: detected:" messages.
The reason for doing this is that some routines check for target support
themselves and some leave it to vect_pattern_recog_1. Those that leave
it to vect_pattern_recog_1 currently print these "detected:" messages if
the statements have the right form, even if the pattern is eventually
discarded. IMO that's useful, and a lot of existing scan tests rely on it.
However, a later patch makes patterns do their own testing, and stops
them creating pattern statements until the tests have passed. This means
(a) they need to print the "detected:" message earlier and (b) the pattern
statement won't be around to print.
The patch therefore makes all routines print the original statement
rather than the pattern one. That information isn't obvious otherwise,
whereas vect_pattern_recog_1 already prints the pattern statement
in the case of a successful match. This also avoids the previous
situation in which a routine could print "detected:" and then
silently bail out before saying what had been detected.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
This patch adds a helper for pattern code that wants to find an
internal (vectorisable) definition of an SSA name.
A later patch will make more use of this, and alter the definition.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (vect_get_internal_def): New function.
(vect_recog_dot_prod_pattern, vect_recog_sad_pattern)
(vect_recog_vector_vector_shift_pattern, check_bool_pattern)
(search_type_for_mask_1): Use it.
vect_recog_dot_prod_pattern and vect_recog_sad_pattern both checked
whether the statement passed in had already been recognised as a
WIDEN_SUM_EXPR pattern. That isn't possible (any more?), since the
first recognised pattern wins, and since vect_recog_widen_sum_pattern
never matches a later statement than the one it's given.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (vect_recog_dot_prod_pattern): Remove
redundant WIDEN_SUM_EXPR handling.
(vect_recog_sad_pattern): Likewise.
[4/n] PR85694: Remove redundant calls to types_compatible_p
tree-vect-patterns.c checked that operands to primitive arithmetic ops
are compatible with each other and with the result. The checks date
back years and have long been redundant with verify_gimple_stmt.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (vect_recog_dot_prod_pattern): Remove
redundant check that the types of a PLUS_EXPR or MULT_EXPR agree.
(vect_recog_sad_pattern): Likewise PLUS_EXPR, ABS_EXPR and MINUS_EXPR.
(vect_recog_widen_mult_pattern): Likewise MULT_EXPR.
(vect_recog_widen_sum_pattern): Likewise PLUS_EXPR.
[3/n] PR85694: Fix dummy assignment handling in vectorizable_call
vectorizable_call stubs out the original scalar statement with
a dummy assignment to the same lhs, so that we don't leave any bogus
scalar calls around. If the call is actually a pattern statement,
the code rightly took the lhs of the original bb statement:
But it then associated the new statement with the stmt_vec_info of the
pattern statement rather than the bb statement, which meant we had two
stmt_vec_infos assigning to the same lhs. This seems to be latent at
the moment but caused problems further into the series.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-stmts.c (vectorizable_call): Make sure that we
use the stmt_vec_info of the original bb statement for the
new zero assignment, even if the call is part of a pattern.
[2/n] PR85694: Attach a DEF_SEQ only to the original statement
A pattern's PATTERN_DEF_SEQ was attached to both the original statement
and the main pattern statement, which made it harder to update later.
This patch attaches it to just the original statement. In practice,
anything that cared had ready access to both.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vectorizer.h (_stmt_vec_info): Note above pattern_def_seq
that the sequence is attached to the original statement rather
than the pattern statement.
* tree-vect-loop.c (vect_determine_vf_for_stmt): Take the
PATTERN_DEF_SEQ from the original statement rather than
the main pattern statement.
* tree-vect-stmts.c (free_stmt_vec_info): Likewise.
* tree-vect-patterns.c (vect_recog_dot_prod_pattern): Likewise.
(vect_mark_pattern_stmts): Don't copy the PATTERN_DEF_SEQ.
[1/n] PR85694: Allow pattern definition statements to be reused
This patch is the first part of a series to fix to PR85694.
Later patches can make the pattern for a statement S2 reuse the
results of a PATTERN_DEF_SEQ statement attached to an earlier
statement S1. Although vect_mark_stmts_to_be_vectorized handled
this fine, vect_analyze_stmt and vect_transform_loop both skipped the
PATTERN_DEF_SEQ for S1 if S1's main pattern wasn't live or relevant.
I couldn't wrap my head around the flow in vect_transform_loop,
so ended up moving the per-statement handling into a subroutine.
That makes the patch look bigger than it actually is.
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-stmts.c (vect_analyze_stmt): Move the handling of pattern
definition statements before the early exit for statements that aren't
live or relevant.
* tree-vect-loop.c (vect_transform_loop_stmt): New function,
split out from...
(vect_transform_loop): ...here. Process pattern definition
statements without first checking whether the main pattern
statement is live or relevant.
Aaron Sawdey [Tue, 19 Jun 2018 21:23:39 +0000 (21:23 +0000)]
rs6000-string.c (select_block_compare_mode): Check TARGET_EFFICIENT_OVERLAPPING_UNALIGNED here instead of in caller.
2018-06-19 Aaron Sawdey <acsawdey@linux.ibm.com>
* config/rs6000/rs6000-string.c (select_block_compare_mode): Check
TARGET_EFFICIENT_OVERLAPPING_UNALIGNED here instead of in caller.
(do_and3, do_and3_mask, do_compb3, do_rotl3): New functions.
(expand_block_compare): Change select_block_compare_mode call.
(expand_strncmp_align_check): Use new functions, fix comment.
(emit_final_str_compare_gpr): New function.
(expand_strn_compare): Refactor and clean up code.
* config/rs6000/vsx.md (vsx_mov<mode>_64bit): Remove *.
Max Filippov [Tue, 19 Jun 2018 18:26:07 +0000 (18:26 +0000)]
xtensa: fix PR target/65416
The issue is caused by reordering of stack pointer update after stack
space allocation with instructions that write to the allocated stack
space. In windowed ABI register spill area for the previous call frame
is located just below the stack pointer and may be reloaded back into
the register file on movsp.
Implement allocate_stack pattern for windowed ABI configuration and
insert an instruction that prevents reordering of frame memory access
and stack pointer update.
gcc/
2018-06-19 Max Filippov <jcmvbkbc@gmail.com>
* config/xtensa/xtensa.md (UNSPEC_FRAME_BLOCKAGE): New unspec
constant.
(allocate_stack, frame_blockage, *frame_blockage): New patterns.
Nick Clifton [Tue, 19 Jun 2018 11:49:08 +0000 (11:49 +0000)]
Allow building of the zlib component when the building takes place in the source directory.
* zlib/configure.ac: Restore old behaviour of only enabling
multilibs when a target subdirectory is defined. This allows
building with srcdir == builddir.
* zlib/configure: Regenerate.
The existing code allows only 4 vectors worth of ieee128 homogeneous
aggregates, but it should be 8. This happens because at one spot it
is mistakenly qualified as being passed in floating point registers.
PR target/86197
* config/rs6000/rs6000.md (rs6000_discover_homogeneous_aggregate): An
ieee128 argument takes up only one (vector) register, not two (floating
point) registers.
Jonathan Wakely [Mon, 18 Jun 2018 20:17:44 +0000 (21:17 +0100)]
LWG 2975 ensure construct(pair<T,U>*, ...) used to construct pairs
* include/std/scoped_allocator (__not_pair): Define SFINAE helper.
(construct(_Tp*, _Args&&...)): Remove from overload set when _Tp is
a specialization of std::pair.
* testsuite/20_util/scoped_allocator/construct_pair.cc: Ensure
pair elements are constructed correctly.
Michael Meissner [Mon, 18 Jun 2018 19:10:08 +0000 (19:10 +0000)]
re PR target/85358 (PowerPC: Using -mabi=ieeelongdouble -mcpu=power9 breaks __ibm128)
[gcc]
2018-06-18 Michael Meissner <meissner@linux.ibm.com>
PR target/85358
* config/rs6000/rs6000-modes.def (toplevel): Rework the 128-bit
floating point modes, so that IFmode is numerically greater than
TFmode, which is greater than KFmode using FRACTIONAL_FLOAT_MODE
to declare the ordering. This prevents IFmode from being
converted to TFmode when long double is IEEE 128-bit on an ISA 3.0
machine. Include rs6000-modes.h to share the fractional values
between genmodes* and the rest of the compiler.
(IFmode): Likewise.
(KFmode): Likewise.
(TFmode): Likewise.
* config/rs6000/rs6000-modes.h: New file.
* config/rs6000/rs6000.c (rs6000_debug_reg_global): Change the
meaning of rs6000_long_double_size so that 126..128 selects an
appropriate 128-bit floating point type.
(rs6000_option_override_internal): Likewise.
* config/rs6000/rs6000.h (toplevel): Include rs6000-modes.h.
(TARGET_LONG_DOUBLE_128): Change the meaning of
rs6000_long_double_size so that 126..128 selects an appropriate
128-bit floating point type.
(LONG_DOUBLE_TYPE_SIZE): Update comment.
* config/rs6000/rs6000.md (trunciftf2): Correct the modes of the
source and destination to match the standard usage.
(truncifkf2): Likewise.
(copysign<mode>3, IEEE iterator): Rework copysign of float128 on
ISA 2.07 to use an explicit clobber, instead of passing in a
temporary.
(copysign<mode>3_soft): Likewise.
[libgcc]
2018-06-18 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/t-float128 (FP128_CFLAGS_SW): Compile float128
support modules with -mno-gnu-attribute.
* config/rs6000/t-float128-hw (FP128_CFLAGS_HW): Likewise.
Jonathan Wakely [Mon, 18 Jun 2018 18:59:44 +0000 (19:59 +0100)]
LWG 2989 hide path iostream operators from normal lookup
By only defining these operators as friends (with no namespace-scope
declaration) they can only be found by ADL and do not participate in
overload resolution for arguments of types other than path.
LWG 2989 hide path iostream operators from normal lookup
* include/bits/fs_path.h (operator<<, operator>>): Define inline as
friends.
* testsuite/27_io/filesystem/path/io/dr2989.cc: New.
Martin Sebor [Mon, 18 Jun 2018 16:32:59 +0000 (16:32 +0000)]
PR tree-optimization/81384 - built-in form of strnlen missing
gcc/ChangeLog:
PR tree-optimization/81384
* builtin-types.def (BT_FN_SIZE_CONST_STRING_SIZE): New.
* builtins.c (expand_builtin_strnlen): New function.
(expand_builtin): Call it.
(fold_builtin_n): Avoid setting TREE_NO_WARNING.
* builtins.def (BUILT_IN_STRNLEN): New.
* calls.c (maybe_warn_nonstring_arg): Handle BUILT_IN_STRNLEN.
Warn for bounds in excess of maximum object size.
* tree-ssa-strlen.c (maybe_set_strlen_range): Return tree representing
single-value ranges. Handle strnlen.
(handle_builtin_strlen): Handle strnlen.
(strlen_check_and_optimize_stmt): Same.
* doc/extend.texi (Other Builtins): Document strnlen.
gcc/testsuite/ChangeLog:
PR tree-optimization/81384
* gcc.c-torture/execute/builtins/lib/strnlen.c: New test.
* gcc.c-torture/execute/builtins/strnlen-lib.c: New test.
* gcc.c-torture/execute/builtins/strnlen.c: New test.
* gcc.dg/attr-nonstring-2.c: New test.
* gcc.dg/attr-nonstring-3.c: New test.
* gcc.dg/attr-nonstring-4.c: New test.
* gcc.dg/strlenopt-45.c: New test.
* gcc.dg/strlenopt.h (strnlen): Declare.
Jonathan Wakely [Mon, 18 Jun 2018 16:01:24 +0000 (17:01 +0100)]
Fix bootstrap failure for bare metal due to autoconf link tests
The AC_CHECK_FUNCS tests cause the build to fail for bare metal cross
compilers, where link tests are not allowed. Replace them with
GCC_TRY_COMPILE_OR_LINK tests instead. Skip all the Filesystem
dependency checks if not building the filesystem library.
* acinclude.m4 (GLIBCXX_CHECK_FILESYSTEM_DEPS): Only check when
enable_libstdcxx_filesystem_ts = yes. Check for link, readlink and
symlink.
* config.h.in: Regenerate.
* configure: Regenerate.
* configure.ac: Remove AC_CHECK_FUNCS for link, readlink and symlink.
Nick Clifton [Mon, 18 Jun 2018 10:39:01 +0000 (10:39 +0000)]
Ensure that control characters in user supplied error and warning messages are escaped.
PR 84195
* tree.c (escaped_string): New class. Converts an unescaped
string into its escaped equivalent.
(warn_deprecated_use): Use the new class to convert the
deprecation message, if present.
(test_escaped_strings): New self test.
(test_c_tests): Add test_escaped_strings.
Eric Botcazou [Sun, 17 Jun 2018 11:36:58 +0000 (11:36 +0000)]
gimplify.c (nonlocal_vlas): Delete.
* gimplify.c (nonlocal_vlas): Delete.
(nonlocal_vla_vars): Likewise.
(gimplify_var_or_parm_decl): Do not add debug VAR_DECLs for non-local
referenced VLAs.
(gimplify_body): Do not create and destroy nonlocal_vlas.
* tree-nested.c: Include diagnostic.h.
(use_pointer_in_frame): Tweak.
(lookup_field_for_decl): Add assertion and declare the transformation.
(convert_nonlocal_reference_op) <PARM_DECL>: Rework and issue an
internal error when the reference is in a wrong context. Do not
create a debug decl by default.
(note_nonlocal_block_vlas): Delete.
(convert_nonlocal_reference_stmt) <GIMPLE_BIND>: Do not call it.
(convert_local_reference_op) <PARM_DECL>: Skip the frame decl. Do not
create a debug decl by default.
(convert_gimple_call) <GIMPLE_CALL>: Issue an internal error when the
call is in a wrong context.
(fixup_vla_decls): New function.
(finalize_nesting_tree_1): Adjust comment. Call fixup_vla_decls if no
debug variables were created.
* tree.c (decl_value_expr_lookup): Add checking assertion.
(decl_value_expr_insert): Likewise.
fortran/
* fortran/trans-decl.c (nonlocal_dummy_decl_pset): Delete.
(nonlocal_dummy_decls): Likewise.
(gfc_nonlocal_dummy_array_decl): Likewise.
(gfc_get_symbol_decl): Do not call gfc_nonlocal_dummy_array_decl.
(gfc_get_fake_result_decl): Do not generate a new DECL if simply
reusing the result of a recursive call.
(gfc_generate_function_code): Do not create, insert and destroy
nonlocal_dummy_decls.
Jakub Jelinek [Sat, 16 Jun 2018 06:50:31 +0000 (08:50 +0200)]
re PR rtl-optimization/86108 (crash during unwinding with -O2)
PR rtl-optimization/86108
* bb-reorder.c (create_forwarder_block): Renamed to ...
(create_eh_forwarder_block): ... this. Split OLD_BB after labels and
jump from new landing pad to the second part.
(sjlj_fix_up_crossing_landing_pad, dw2_fix_up_crossing_landing_pad):
Adjust callers.
Jonathan Wakely [Fri, 15 Jun 2018 23:47:33 +0000 (00:47 +0100)]
LWG 3076 basic_string CTAD ambiguity
When deduction guides are supported by the compiler (i.e. for C++17 and
later) replace two basic_string constructors by constrained function
templates as required by LWG 3075. In order to ensure that the pre-C++17
non-template constructors are still exported from the shared library
define a macro in src/c++11/string-inst.cc to force the non-template
declarations (this isn't strictly needed yet, because the string
instantiations are compiled with -std=gnu++11, but that is likely to
change).
LWG 3076 basic_string CTAD ambiguity
* doc/xml/manual/intro.xml: Document LWG 3076 change.
* include/bits/basic_string.h
[__cpp_deduction_guides && !_GLIBCXX_DEFINING_STRING_INSTANTIATIONS]
(basic_string(const _CharT*, const _Alloc&)): Turn into a function
template constrained by _RequireAllocator.
(basic_string(size_type, _CharT, const _Alloc&)): Likewise.
* src/c++11/string-inst.cc (_GLIBCXX_DEFINING_STRING_INSTANTIATIONS):
Define.
* testsuite/21_strings/basic_string/cons/char/deduction.cc: Test
deduction
* testsuite/21_strings/basic_string/cons/wchar_t/deduction.cc:
Likewise.
Jakub Jelinek [Fri, 15 Jun 2018 20:36:38 +0000 (22:36 +0200)]
re PR middle-end/85878 (ICE in convert_mode_scalar, at expr.c:287)
PR middle-end/85878
* expr.c (expand_assignment): Remove now redundant COMPLEX_MODE_P
check from first store_expr, use to_mode instead of GET_MODE (to_rtx).
Only call store_expr for halves if the mode is the same.
* gfortran.fortran-torture/compile/pr85878.f90: New test.
Jason Merrill [Fri, 15 Jun 2018 20:22:44 +0000 (16:22 -0400)]
PR c++/82882 - ICE with lambda in template default argument.
* lambda.c (record_null_lambda_scope): New.
* pt.c (tsubst_lambda_expr): Use it.
* name-lookup.c (do_pushtag): Don't give a lambda DECL_CONTEXT of a
function that isn't open.
Jonathan Wakely [Fri, 15 Jun 2018 16:47:55 +0000 (17:47 +0100)]
Decorate string_view members with nonnull attribute
The C++ committee has confirmed that passing a null pointer to the
unary basic_string_view constructor is undefined. This removes the check
from our implementation, and adds the nonnull attribute to warn when the
compiler can detect undefined input.
Nick Clifton [Fri, 15 Jun 2018 15:25:16 +0000 (15:25 +0000)]
Force user provided warning and error messages to only occupy one line.
PR 84195
gcc: * tree.c (escaped_string): New class. Converts an unescaped
string into its escaped equivalent.
(warn_deprecated_use): Use the new class to convert the
deprecation message, if present.
(test_escaped_strings): New self test.
(test_c_tests): Add test_escaped_strings.
* doc/extend.texi (deprecated): Add a note that the
deprecation message is affected by the -fmessage-length
option, and that control characters will be escaped.
(#pragma GCC error): Document this pragma.
(#pragma GCC warning): Likewise.
* doc/invoke.texi (-fmessage-length): Document this option's
effect on the #warning and #error preprocessor directives and
the deprecated attribute.
testsuite;
* gcc.c-torture/compile/pr84195.c: New test.
Sebastian Huber [Fri, 15 Jun 2018 05:19:44 +0000 (05:19 +0000)]
RISC-V: Add custom RTEMS multilibs
Add multilib variants for -march=rv64imafd, e.g. to support the BOOMv2 core.
Add -mcmodel=medany as a variant of the 64-bit multilibs for RTEMS. The
rationale for this change is that several existing RISC-V chips map the
RAM at 0x80000000. In RTEMS, we do not use virtual memory, so
applications will run at this location which is outside the +-2GiB range
in a 64-bit configuration.
gcc/
* config.gcc (riscv*-*-elf* | riscv*-*-rtems*): Use custom
multilibs for *-*-rtems*.
* config/riscv/t-rtems: New file.
Jonathan Wakely [Fri, 15 Jun 2018 00:19:07 +0000 (01:19 +0100)]
LWG 3039 Unnecessary decay in thread and packaged_task
* include/std/future (__constrain_pkgdtask): Replace with ...
(packaged_task::__not_same): New alias template, using
__remove_cvref_t instead of decay.
* include/std/thread (thread::__not_same): Add comment.
Jonathan Wakely [Thu, 14 Jun 2018 20:27:04 +0000 (21:27 +0100)]
LWG 3075 basic_string needs deduction guides from basic_string_view
* testsuite/21_strings/basic_string/cons/char/deduction.cc: Test
deduction from string views.
* testsuite/21_strings/basic_string/cons/wchar_t/deduction.cc:
Likewise.