Nathan Sidwell [Thu, 2 Aug 2018 14:23:50 +0000 (14:23 +0000)]
Assert body is back as trailing array
libcpp/
* include/cpp-id-data.h (struct answer): Make body a trailing array
pointer.
* directives.c (parse_answer, parse_assertion, find_answer):
Return whole answer struct.
(_cpp_test_assertion, do_assert, do_unassert): Adjust.
macro.c (parse_params): Reimplement state machine.
libcpp/
* macro.c (parse_params): Reimplement state machine.
(create_iso_definition): Adjust first token peeking.
* traditional.c (save_replacement_text): No need to set macro kind
here.
gcc/testsuite/
* gcc.dg/cpp/macsyntx.c: Update errors.
* gcc.dg/cpp/macsyntx2.c: Update errors.
internal.h (_cpp_save_parameter): Take parm no, not macro.
libcpp/
* internal.h (_cpp_save_parameter): Take parm no, not macro.
* macro.c (_cpp_save_parameter): Adjust. Invert sense of return value.
(parse_params): Adjust.
* traditional.c (scan_parameters): Likewise.
libcpp/
* include/cpp-id-data.h (struct answer): Make body an external
pointer.
* directives.c (parse_answer, parse_assertion, find_answer): Use
separate base-ptr/len tuple for body.
(_cpp_test_assertion, do_assert, do_unassert): Adjust.
Default allocator
libcpp/
* include/line-map.h (line_maps): Document default allocator.
* line-map.c (linemap_init): Set default allocator.
(new_linemap): No need to set default here. Simplify data flow.
Paul Thomas [Thu, 5 Jul 2018 16:27:38 +0000 (16:27 +0000)]
re PR fortran/86408 (bogus error: ABSTRACT INTERFACE must not have an assumed character length result (F2003: C418))
2018-07-05 Paul Thomas <pault@gcc.gnu.org>
PR fortran/86408
* resolve.c (resolve_contained_fntype): Reference to C418 is
in F2008 and not F2003.
(resolve_function): Ditto in error message. Also, exclude
deferred character length results from the error.
2018-07-05 Paul Thomas <pault@gcc.gnu.org>
PR fortran/86408
* gfortran.dg/deferred_character_20.f90: New test.
Jonathan Wakely [Thu, 5 Jul 2018 15:56:06 +0000 (16:56 +0100)]
PR libstdc++/58265 implement LWG 2063 for COW strings
For COW strings the default constructor does not allocate when
_GLIBCXX_FULLY_DYNAMIC_STRING == 0, so can be noexcept. The move
constructor and swap do not allocate when the allocators are equal, so
add conditional noexcept using allocator_traits::is_always_equal.
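The conditional noexcept described above can be sketched roughly as follows. This is a toy container, not libstdc++'s basic_string: toy_buffer and pool_allocator are hypothetical names introduced only for illustration, and allocator_traits::is_always_equal needs C++17.

```cpp
#include <cstddef>
#include <memory>
#include <type_traits>
#include <utility>

// Hypothetical stateful allocator: two instances may refer to different
// pools, so it declares that it is not always equal.
template <typename T>
struct pool_allocator {
  using value_type = T;
  using is_always_equal = std::false_type;
  int pool_id = 0;
};

// Hypothetical toy container, for illustration only.
template <typename T, typename Alloc = std::allocator<T>>
class toy_buffer {
  using traits = std::allocator_traits<Alloc>;

public:
  // The default constructor does not allocate, so it can be noexcept.
  toy_buffer() noexcept : alloc_(), data_(nullptr), size_(0) {}

  // Exchanging pointers is only known to be safe without further work when
  // the allocators are interchangeable, so noexcept is conditional on
  // is_always_equal. (A real container would also have to handle the
  // unequal-allocator case; this sketch does not.)
  void swap(toy_buffer& other) noexcept(traits::is_always_equal::value) {
    std::swap(data_, other.data_);
    std::swap(size_, other.size_);
  }

  std::size_t size() const noexcept { return size_; }
  Alloc get_allocator() const { return alloc_; }

private:
  Alloc alloc_;
  T* data_;
  std::size_t size_;
};
```

With std::allocator the swap is reported as noexcept; with the hypothetical pool_allocator it is not.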
Fritz Reese [Thu, 5 Jul 2018 15:39:27 +0000 (15:39 +0000)]
re PR fortran/83183 (Out of memory with option -finit-derived)
2018-07-05 Fritz Reese <fritzoreese@gmail.com>
gcc/fortran/ChangeLog:
PR fortran/83183
PR fortran/86325
* expr.c (class_allocatable, class_pointer, comp_allocatable,
comp_pointer): New helpers.
(component_initializer): Generate EXPR_NULL for allocatable or pointer
components. Do not generate initializers for components within BT_CLASS.
Do not assign to comp->initializer.
(gfc_generate_initializer): Use new helpers; move code to generate
EXPR_NULL for class allocatable components into component_initializer().
gcc/testsuite/ChangeLog:
PR fortran/83183
PR fortran/86325
* gfortran.dg/init_flag_18.f90: New testcase.
* gfortran.dg/init_flag_19.f03: New testcase.
Carl Love [Thu, 5 Jul 2018 14:48:51 +0000 (14:48 +0000)]
rs6000-c.c: Map ALTIVEC_BUILTIN_VEC_UNPACKH for float argument to VSX_BUILTIN_DOUBLEH_V4SF.
gcc/ChangeLog:
2018-07-05 Carl Love <cel@us.ibm.com>
* config/rs6000/rs6000-c.c: Map ALTIVEC_BUILTIN_VEC_UNPACKH for
float argument to VSX_BUILTIN_DOUBLEH_V4SF.
Map ALTIVEC_BUILTIN_VEC_UNPACKL for float argument to
VSX_BUILTIN_DOUBLEL_V4SF.
gcc/testsuite/ChangeLog:
2018-07-05 Carl Love <cel@us.ibm.com>
* gcc.target/altivec-1-runnable.c: New test file.
* gcc.target/altivec-2-runnable.c: New test file.
* gcc.target/vsx-7.c (main2): Change expected instruction
for tests.
Tamar Christina [Thu, 5 Jul 2018 10:31:04 +0000 (10:31 +0000)]
Simplify movmem code by always doing overlapping copies when larger than 8 bytes on AArch64.
This changes the AArch64 movmem code that copies between 4 and 7 bytes
to use the smallest possible mode capable of copying the remaining bytes in one
go, overlapping the reads if needed.
This means that if we're copying 5 bytes we would issue an SImode and QImode
load instead of two SImode loads.
This results in smaller memory accesses and also gives the mid-end a chance to realise
that it can CSE the loads in certain circumstances, e.g. when you have something
like
return foo;
where foo is a struct. This would be transformed by the mid-end into SSA form as
D.XXXX = foo;
return D.XXXX;
This movmem routine will handle the first copy, but it's usually not needed:
the mid-end would do SImode and QImode stores into X0 for the 5-byte example,
but unless the first copies are in the same mode, it doesn't know that it doesn't
need the stores at all.
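The chunk selection described above can be illustrated in portable C++. This is only a sketch of the idea, not the AArch64 expander: copy_chunk and copy_4_to_8 are hypothetical helpers, and each copy_chunk call stands in for one load/store pair in the corresponding mode (4 bytes for SImode, 2 for HImode, 1 for QImode).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy one fixed-width chunk; stands in for a single load/store pair.
template <typename Chunk>
void copy_chunk(unsigned char* dst, const unsigned char* src) {
  Chunk tmp;
  std::memcpy(&tmp, src, sizeof tmp);
  std::memcpy(dst, &tmp, sizeof tmp);
}

// Copy n bytes, 4 <= n <= 8: one 4-byte chunk, then the smallest chunk that
// covers the remaining bytes in one go, overlapping earlier bytes if needed.
void copy_4_to_8(unsigned char* dst, const unsigned char* src, std::size_t n) {
  assert(n >= 4 && n <= 8);
  copy_chunk<std::uint32_t>(dst, src);  // bytes 0..3
  switch (n - 4) {
    case 0:
      break;
    case 1:  // 5 bytes: 4-byte chunk plus 1-byte chunk
      copy_chunk<std::uint8_t>(dst + 4, src + 4);
      break;
    case 2:  // 6 bytes: 4-byte chunk plus 2-byte chunk
      copy_chunk<std::uint16_t>(dst + 4, src + 4);
      break;
    default:  // 3 or 4 bytes left: 4-byte chunk ending at byte n - 1;
              // for n == 7 it re-copies byte 3 (the overlap)
      copy_chunk<std::uint32_t>(dst + n - 4, src + n - 4);
      break;
  }
}
```

For the 5-byte case this performs one 4-byte and one 1-byte access, matching the SImode plus QImode example in the text.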
PR sanitizer/84250
* config/gnu-user.h (LIBASAN_EARLY_SPEC): Pass -lstdc++ for static
libasan.
* gcc.c: Do not pass LIBUBSAN_SPEC if ASan is enabled with UBSan.
Jonathan Wakely [Wed, 4 Jul 2018 20:15:01 +0000 (21:15 +0100)]
P0646R1 Improving the Return Value of Erase-Like Algorithms I
In C++2a the remove, remove_if and unique members of std::list and
std::forward_list have been changed to return the number of elements
removed. This is an ABI change for the remove members and the
non-template unique members, so an abi-tag is used to give those symbols
new mangled names in C++2a mode. For the function templates the return
type is part of the mangled name so no abi-tag is needed.
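A caller that wants the count in both old and new language modes could test the feature macro named in this change. The sketch below is an assumption about how user code might do that; remove_and_count is a hypothetical helper, not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>

// Hypothetical helper: remove every element equal to value and report how
// many were removed, whichever return type std::list::remove has.
std::size_t remove_and_count(std::list<int>& values, int value) {
#ifdef __cpp_lib_list_remove_return_type
  // C++2a mode: remove() itself reports how many elements it erased.
  const std::size_t removed = values.remove(value);
#else
  // Earlier modes: remove() returns void, so compare sizes instead.
  const std::size_t before = values.size();
  values.remove(value);
  const std::size_t removed = before - values.size();
#endif
  // Postcondition: no matching element remains.
  assert(std::find(values.begin(), values.end(), value) == values.end());
  return removed;
}
```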
* include/bits/forward_list.h (__cpp_lib_list_remove_return_type):
Define.
(forward_list::__remove_return_type): Define typedef as size_type or
void, according to __cplusplus value.
(_GLIBCXX_FWDLIST_REMOVE_RETURN_TYPE_TAG): Define macro as abi-tag or
empty, according to __cplusplus value.
(forward_list::remove, forward_list::unique): Use typedef and macro
to change return type and add abi-tag for C++2a.
(forward_list::remove_if<Pred>, forward_list::unique<BinPred>): Use
typedef to change return type for C++2a.
* include/bits/forward_list.tcc (_GLIBCXX20_ONLY): Define macro.
(forward_list::remove, forward_list::remove_if<Pred>)
(forward_list::unique<BinPred>): Return number of removed elements
for C++2a.
* include/bits/list.tcc (_GLIBCXX20_ONLY): Define macro.
(list::remove, list::unique, list::remove_if<Predicate>)
(list::unique<BinaryPredicate>): Return number of removed elements
for C++2a.
* include/bits/stl_list.h (__cpp_lib_list_remove_return_type): Define.
(list::__remove_return_type): Define typedef as size_type or
void, according to __cplusplus value.
(_GLIBCXX_LIST_REMOVE_RETURN_TYPE_TAG): Define macro as abi-tag or
empty, according to __cplusplus value.
(list::remove, list::unique): Use typedef and macro to change return
type and add abi-tag for C++2a.
(list::remove_if<Predicate>, list::unique<BinaryPredicate>): Use
typedef to change return type for C++2a.
* include/std/version (__cpp_lib_list_remove_return_type): Define.
* testsuite/23_containers/forward_list/operations/
remove_cxx20_return.cc: New.
* testsuite/23_containers/forward_list/operations/
unique_cxx20_return.cc: New.
The intrinsic doesn't check for allowed conversions between scalar
types, so restore the std::is_constructible check.
Also make some trivial whitespace changes.
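The shape of that check can be sketched as a standalone trait. checked_trivially_constructible and from_text are hypothetical names used only for illustration; the real definition lives in <type_traits> and may be structured differently. The compiler intrinsic __is_trivially_constructible is the one named in the ChangeLog entry.

```cpp
#include <type_traits>

// Hypothetical stand-in for the library trait: only accept the intrinsic's
// answer when the construction is valid in the first place.
template <typename T, typename... Args>
struct checked_trivially_constructible
    : std::integral_constant<bool,
                             std::is_constructible<T, Args...>::value &&
                                 __is_trivially_constructible(T, Args...)> {};

// Hypothetical type with a user-provided (hence non-trivial) constructor.
struct from_text {
  from_text(const char*) {}
};
```

With this guard, an invalid scalar conversion such as constructing an int* from an int yields false regardless of what the intrinsic alone reports.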
PR libstdc++/86398
* include/std/type_traits (is_trivially_constructible): Check
is_constructible before __is_trivially_constructible.
* testsuite/20_util/is_trivially_constructible/value.cc: Add more
tests, including negative cases.
* testsuite/20_util/make_signed/requirements/typedefs_neg.cc: Use
zero for dg-error lineno.
* testsuite/20_util/make_unsigned/requirements/typedefs_neg.cc:
Likewise.
Martin Liska [Wed, 4 Jul 2018 07:51:08 +0000 (07:51 +0000)]
[multiple changes]
2018-07-04 Denys Vlasenko <dvlasenk@redhat.com>
Martin Liska <mliska@suse.cz>
PR middle-end/66240
PR target/45996
PR c/84100
* common.opt: Rename align options with 'str_' prefix.
* common/config/i386/i386-common.c (set_malign_value): New
function.
(ix86_handle_option): Use it to set -falign-* options.
* config/aarch64/aarch64-protos.h (struct tune_params): Change
type from int to string.
* config/aarch64/aarch64.c: Update default values from int
to string.
* config/alpha/alpha.c (alpha_override_options_after_change):
Likewise.
* config/arm/arm.c (arm_override_options_after_change_1): Likewise.
* config/i386/dragonfly.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
max skip conditionally.
* config/i386/freebsd.h (SUBALIGN_LOG): New.
(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
max skip conditionally.
* config/i386/gas.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
max skip conditionally.
* config/i386/gnu-user.h (SUBALIGN_LOG): New.
(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
max skip conditionally.
* config/i386/i386.c (struct ptt): Change type from int to
string.
(ix86_default_align): Set default values.
* config/i386/i386.h (ASM_OUTPUT_MAX_SKIP_PAD): Print
max skip conditionally.
* config/i386/iamcu.h (SUBALIGN_LOG): New.
(ASM_OUTPUT_MAX_SKIP_ALIGN):
* config/i386/lynx.h (ASM_OUTPUT_MAX_SKIP_ALIGN):
* config/i386/netbsd-elf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
max skip conditionally.
* config/i386/openbsdelf.h (SUBALIGN_LOG): New.
(ASM_OUTPUT_MAX_SKIP_ALIGN): Print max skip conditionally.
* config/i386/x86-64.h (SUBALIGN_LOG): New.
(ASM_OUTPUT_MAX_SKIP_ALIGN): Print
max skip conditionally.
(ASM_OUTPUT_MAX_SKIP_PAD): Likewise.
* config/ia64/ia64.c (ia64_option_override): Set default values
for alignment options.
* config/m68k/m68k.c: Handle new str_align_* options.
* config/mips/mips.c (mips_set_compression_mode): Change
type of constants.
(mips_option_override): Set default values for options.
* config/powerpcspe/powerpcspe.c (rs6000_option_override_internal):
Likewise.
* config/rs6000/rs6000.c (rs6000_option_override_internal):
Likewise.
* config/rx/rx.c (rx_option_override): Likewise.
* config/rx/rx.h (JUMP_ALIGN): Use align_jumps_log.
(LABEL_ALIGN): Use align_labels_log.
(LOOP_ALIGN): Use align_loops_align.
* config/s390/s390.c (s390_asm_output_function_label): Use new
macros.
* config/sh/sh.c (sh_override_options_after_change):
Change type of constants.
* config/spu/spu.c (spu_sched_init): Likewise.
* config/sparc/sparc.c (sparc_option_override): Set default
values for options.
* config/visium/visium.c (visium_option_override): Likewise.
* config/visium/visium.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Do not
emit p2align format with last argument if it's not needed.
* doc/invoke.texi: Document extended format of -falign-*.
* final.c: Use align_labels alignment.
* flags.h (struct target_flag_state): Change type to use
align_flags.
(struct align_flags_tuple): New.
(struct align_flags): Likewise.
(align_loops_log): Redefine macro to use new types.
(align_loops_max_skip): Redefine macro to use new types.
(align_jumps_log): Redefine macro to use new types.
(align_jumps_max_skip): Redefine macro to use new types.
(align_labels_log): Redefine macro to use new types.
(align_labels_max_skip): Redefine macro to use new types.
(align_functions_log): Redefine macro to use new types.
(align_loops): Redefine macro to use new types.
(align_jumps): Redefine macro to use new types.
(align_labels): Redefine macro to use new types.
(align_functions): Redefine macro to use new types.
(align_functions_max_skip): Redefine macro to use new types.
(align_loops_value): New macro.
(align_jumps_value): New macro.
(align_labels_value): New macro.
(align_functions_value): New macro.
* function.c (invoke_set_current_function_hook): Propagate
alignment values from flags to global variables default in
toplev.h.
* ipa-icf.c (sem_function::equals_wpa): Use
cl_optimization_option_eq instead of memcmp.
* lto-streamer.h (cl_optimization_stream_out): Support streaming
of string types.
(cl_optimization_stream_in): Likewise.
* optc-save-gen.awk: Support strings in cl_optimization.
* opth-gen.awk: Likewise.
* opts.c (finish_options): Remove error checking of invalid
value ranges.
(MAX_CODE_ALIGN): Remove.
(MAX_CODE_ALIGN_VALUE): Likewise.
(parse_and_check_align_values): New function.
(check_alignment_argument): Likewise.
(common_handle_option): Use check_alignment_argument.
* opts.h (parse_and_check_align_values): Declare.
* toplev.c (init_alignments): Remove.
(read_log_maxskip): New.
(parse_N_M): Likewise.
(parse_alignment_opts): Likewise.
(backend_init_target): Remove usage of init_alignments.
* toplev.h (parse_alignment_opts): Declare.
* tree-streamer-in.c (streamer_read_tree_bitfields): Add new
argument.
* tree-streamer-out.c (streamer_write_tree_bitfields): Likewise.
* tree.c (cl_option_hasher::equal): New.
* varasm.c: Use new global macros.
2018-07-04 Martin Liska <mliska@suse.cz>
PR middle-end/66240
PR target/45996
PR c/84100
* lto.c (compare_tree_sccs_1): Use cl_optimization_option_eq
instead of memcmp.
2018-07-04 Martin Liska <mliska@suse.cz>
PR middle-end/66240
PR target/45996
PR c/84100
* gcc.dg/pr84100.c (foo):
* gcc.target/i386/falign-functions-2.c: New test.
* gcc.target/i386/falign-functions.c: New test.
H.J. Lu [Wed, 4 Jul 2018 03:01:33 +0000 (03:01 +0000)]
i386: Add indirect_return function attribute
On x86, swapcontext may return via indirect branch when shadow stack
is enabled. To support code instrumentation of control-flow transfers
with -fcf-protection, add an indirect_return function attribute to inform
the compiler that a function may return via indirect branch.
Note: Unlike setjmp, swapcontext only returns once. Marking it as returning
twice would unnecessarily disable compiler optimization, as shown in
the testcase here.
gcc/
PR target/85620
* config/i386/i386.c (rest_of_insert_endbranch): Also generate
ENDBRANCH for non-tail call which may return via indirect branch.
* doc/extend.texi: Document indirect_return attribute.
Jeff Law [Wed, 4 Jul 2018 01:03:52 +0000 (19:03 -0600)]
h8300.md (ors code_iterator): New.
* config/h8300/h8300.md (ors code_iterator): New.
(bsetqi_msx, bnotqi_msx patterns and splitters): Consolidate into
a single pattern and single splitter.
(bsethi_msx, bnothi_msx patterns): Consolidate into a single pattern.
(iorqi3_1, xorqi3_1): Likewise.
(iorqi3, xorqi3 expanders): Similarly.
Jeff Law [Wed, 4 Jul 2018 00:27:38 +0000 (18:27 -0600)]
h8300.md (movmd_internal_normal): Consolidated with (movmd_internal) into a single pattern using the P mode iterator.
* config/h8300/h8300.md (movmd_internal_normal): Consolidated with
(movmd_internal) into a single pattern using the P mode iterator.
(movmd splitters): Similarly.
(stpcpy_internal_normal, stpcpy_internal): Similarly for these patterns.
(movsd splitters): Similarly.
Jonathan Wakely [Tue, 3 Jul 2018 21:04:45 +0000 (22:04 +0100)]
P0556R3 Integral power-of-2 operations, P0553R2 Bit operations
P0553R2 is not in the C++2a working draft yet, but is likely to be
approved soon. Neither proposal supports std::byte but this adds
overloads of each function for std::byte, assuming that will also get
added.
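For readers unfamiliar with the operations listed in this change, the following is a rough sketch of what some of them compute for a 32-bit unsigned value. These are simple reference versions in a hypothetical sketch namespace, not the implementations in the new header, which may well use compiler builtins instead. The loops in constexpr functions need C++14.

```cpp
#include <cstdint>

namespace sketch {

// True when exactly one bit is set.
constexpr bool ispow2(std::uint32_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// Number of bits needed to represent x: 0 for 0, else 1 + floor(log2(x)).
constexpr unsigned log2p1(std::uint32_t x) {
  unsigned bits = 0;
  while (x != 0) {
    ++bits;
    x >>= 1;
  }
  return bits;
}

// Largest power of two not greater than x, or 0 when x is 0.
constexpr std::uint32_t floor2(std::uint32_t x) {
  return x == 0 ? 0 : std::uint32_t{1} << (log2p1(x) - 1);
}

// Smallest power of two not less than x. Only meaningful when the result
// fits in 32 bits, i.e. x <= 2^31; larger inputs are not handled here.
constexpr std::uint32_t ceil2(std::uint32_t x) {
  return x <= 1 ? 1 : std::uint32_t{1} << log2p1(x - 1);
}

// Rotate left by s bits (s is taken modulo 32).
constexpr std::uint32_t rotl(std::uint32_t x, unsigned s) {
  return (s % 32) == 0 ? x : (x << (s % 32)) | (x >> (32 - s % 32));
}

// Number of set bits, clearing the lowest set bit on each iteration.
constexpr unsigned popcount(std::uint32_t x) {
  unsigned n = 0;
  while (x != 0) {
    x &= x - 1;
    ++n;
  }
  return n;
}

}  // namespace sketch
```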
* include/Makefile.am: Add new header.
* include/Makefile.in: Regenerate.
* include/precompiled/stdc++.h: Include new header.
* include/std/bit: New header.
(__rotl, __rotr, __countl_zero, __countl_one, __countr_zero)
(__countr_one, __popcount, __ispow2, __ceil2, __floor2, __log2p1):
Define for C++14.
[!__STRICT_ANSI__] (rotl, rotr, countl_zero, countl_one, countr_zero)
(countr_one, popcount): Define for C++2a. Also overload for std::byte.
(ispow2, ceil2, floor2, log2p1): Define for C++2a.
[!__STRICT_ANSI__] (ispow2, ceil2, floor2, log2p1): Overload for
std::byte.
* testsuite/26_numerics/bit/bit.pow.two/ceil2.cc: New.
* testsuite/26_numerics/bit/bit.pow.two/floor2.cc: New.
* testsuite/26_numerics/bit/bit.pow.two/ispow2.cc: New.
* testsuite/26_numerics/bit/bit.pow.two/log2p1.cc: New.
* testsuite/26_numerics/bit/bitops.rot/rotl.cc: New.
* testsuite/26_numerics/bit/bitops.rot/rotr.cc: New.
* testsuite/26_numerics/bit/bitops.count/countl_one.cc: New.
* testsuite/26_numerics/bit/bitops.count/countl_zero.cc: New.
* testsuite/26_numerics/bit/bitops.count/countr_one.cc: New.
* testsuite/26_numerics/bit/bitops.count/countr_zero.cc: New.
Paolo Carlini [Tue, 3 Jul 2018 21:03:51 +0000 (21:03 +0000)]
decl.c (min_location): New.
/cp
2018-07-03 Paolo Carlini <paolo.carlini@oracle.com>
* decl.c (min_location): New.
(smallest_type_quals_location): Use the latter.
(check_concept_fn): Use DECL_SOURCE_LOCATION.
(grokdeclarator): Use accurate locations in a number of error
messages involving ds_thread, ds_storage_class, ds_virtual,
ds_constexpr, ds_typedef and ds_friend; exploit min_location.
/testsuite
2018-07-03 Paolo Carlini <paolo.carlini@oracle.com>
* config/h8300/h8300.c (h8300_insn_length_from_table): Consolidate
ADDB, ADDW and ADDL into a single ADD attribute which selects the
right table based on the size of the operand.
* config/h8300/h8300.md (length_table): Corresponding changes. All
references to "addb", "addw" and "addl" changed to "add".
(btst patterns): Merge two variants into a single pattern.
(tstqi, tsthi): Likewise.
(addhi3_incdec, addsi3_incdec): Likewise.
(subhi3_h8300hs, subsi3_h8300hs): Likewise.
(mulhi3, mulsi3): Likewise.
(udivhi3, udivsi3): Likewise.
(divhi3, divsi3): Likewise.
(andorqi3, andorhi3, andorsi3): Likewise.
The PR85694 series added a vectype argument to append_pattern_def_seq.
This patch makes more callers use it.
2018-07-03 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (vect_recog_rotate_pattern)
(vect_recog_vector_vector_shift_pattern, vect_recog_divmod_pattern)
(vect_recog_mixed_size_cond_pattern, adjust_bool_pattern_cast)
(adjust_bool_pattern, vect_recog_bool_pattern): Pass the vector
type to append_pattern_def_seq instead of creating a stmt_vec_info
directly.
(build_mask_conversion): Likewise. Remove vinfo argument.
(vect_add_conversion_to_patterm): Likewise, renaming to...
(vect_add_conversion_to_pattern): ...this.
(vect_recog_mask_conversion_pattern): Update call to
build_mask_conversion. Pass the vector type to
append_pattern_def_seq here too.
(vect_recog_gather_scatter_pattern): Update call to
vect_add_conversion_to_pattern.
Ensure PATTERN_DEF_SEQ is empty before recognising patterns
Various recognisers set PATTERN_DEF_SEQ to null before adding
statements to it, but it should always be null at that point anyway.
This patch asserts for that in vect_pattern_recog_1 and removes
the redundant code.
2018-07-03 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (new_pattern_def_seq): Delete.
(vect_recog_dot_prod_pattern, vect_recog_sad_pattern)
(vect_recog_widen_op_pattern, vect_recog_over_widening_pattern)
(vect_recog_rotate_pattern, vect_synth_mult_by_constant): Don't set
STMT_VINFO_PATTERN_DEF_SEQ to null here.
(vect_recog_pow_pattern, vect_recog_vector_vector_shift_pattern)
(vect_recog_mixed_size_cond_pattern, vect_recog_bool_pattern): Use
append_pattern_def_seq instead of new_pattern_def_seq.
(vect_recog_divmod_pattern): Do both of the above.
(vect_pattern_recog_1): Assert that STMT_VINFO_PATTERN_DEF_SEQ
is null.
The PR85694 series removed the only cases in which a pattern recogniser
could attach patterns to more than one statement. I think it would be
better to avoid adding any new instances of that, since it interferes
with the normal matching order.
This patch therefore switches the interface back to passing a single
statement instead of a vector. It also gets rid of the clearing of
STMT_VINFO_RELATED_STMT on failure, since no recognisers use it now.
2018-07-03 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (vect_recog_dot_prod_pattern)
(vect_recog_sad_pattern, vect_recog_widen_op_pattern)
(vect_recog_widen_mult_pattern, vect_recog_pow_pattern)
(vect_recog_widen_sum_pattern, vect_recog_over_widening_pattern)
(vect_recog_average_pattern, vect_recog_cast_forwprop_pattern)
(vect_recog_widen_shift_pattern, vect_recog_rotate_pattern)
(vect_recog_vector_vector_shift_pattern, vect_synth_mult_by_constant)
(vect_recog_mult_pattern, vect_recog_divmod_pattern)
(vect_recog_mixed_size_cond_pattern, vect_recog_bool_pattern)
(vect_recog_mask_conversion_pattern): Replace vec<gimple *>
parameter with a single stmt_vec_info.
(vect_recog_func_ptr): Likewise.
(vect_recog_gather_scatter_pattern): Likewise, folding in...
(vect_try_gather_scatter_pattern): ...this.
(vect_pattern_recog_1): Remove stmts_to_replace and just pass
the stmt_vec_info of the statement to be matched. Don't clear
STMT_VINFO_RELATED_STMT.
(vect_pattern_recog): Update call accordingly.
[16/n] PR85694: Add detection of averaging operations
This patch adds detection of average instructions:
a = (((wide) b + (wide) c) >> 1);
--> a = (wide) .AVG_FLOOR (b, c);
a = (((wide) b + (wide) c + 1) >> 1);
--> a = (wide) .AVG_CEIL (b, c);
in cases where users of "a" need only the low half of the result,
making the cast to (wide) redundant. The heavy lifting was done by
earlier patches.
This showed up another problem in vectorizable_call: if the call is a
pattern definition statement rather than the main pattern statement,
the type of vectorised call might be different from the type of the
original statement.
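The reason the cast to (wide) becomes redundant for the averaging patterns can be checked with scalar code: the floor and ceiling averages of two bytes always fit in a byte, and they can be computed without any wider intermediate. The narrow formulas below are a well-known bit identity chosen for illustration; they are an assumption of this sketch, not a description of how any target implements .AVG_FLOOR or .AVG_CEIL. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Reference: widen, add, shift, as in the scalar source pattern.
constexpr std::uint8_t avg_floor_wide(std::uint8_t b, std::uint8_t c) {
  return static_cast<std::uint8_t>((int{b} + int{c}) >> 1);
}
constexpr std::uint8_t avg_ceil_wide(std::uint8_t b, std::uint8_t c) {
  return static_cast<std::uint8_t>((int{b} + int{c} + 1) >> 1);
}

// Narrow versions: no intermediate value needs more than 8 bits.
// Uses b + c == 2 * (b & c) + (b ^ c) == 2 * (b | c) - (b ^ c).
constexpr std::uint8_t avg_floor_narrow(std::uint8_t b, std::uint8_t c) {
  return static_cast<std::uint8_t>((b & c) + ((b ^ c) >> 1));
}
constexpr std::uint8_t avg_ceil_narrow(std::uint8_t b, std::uint8_t c) {
  return static_cast<std::uint8_t>((b | c) - ((b ^ c) >> 1));
}

// Exhaustively compare the wide and narrow forms over all byte pairs.
void check_all_byte_pairs() {
  for (int b = 0; b < 256; ++b) {
    for (int c = 0; c < 256; ++c) {
      const auto x = static_cast<std::uint8_t>(b);
      const auto y = static_cast<std::uint8_t>(c);
      assert(avg_floor_wide(x, y) == avg_floor_narrow(x, y));
      assert(avg_ceil_wide(x, y) == avg_ceil_narrow(x, y));
    }
  }
}
```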
2018-07-03 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/85694
* doc/md.texi (avgM3_floor, uavgM3_floor, avgM3_ceil)
(uavgM3_ceil): Document new optabs.
* doc/sourcebuild.texi (vect_avg_qi): Document new target selector.
* internal-fn.def (IFN_AVG_FLOOR, IFN_AVG_CEIL): New internal
functions.
* optabs.def (savg_floor_optab, uavg_floor_optab, savg_ceil_optab)
(uavg_ceil_optab): New optabs.
* tree-vect-patterns.c (vect_recog_average_pattern): New function.
(vect_vect_recog_func_ptrs): Add it.
* tree-vect-stmts.c (vectorizable_call): Get the type of the zero
constant directly from the associated lhs.
[15/n] PR85694: Try to split existing casts in widened patterns
The main over-widening patch can introduce quite a few extra casts,
and in many cases those casts simply "tap into" an intermediate
point in an existing extension. E.g. if we have:
unsigned char a;
int ax = (int) a;
and a later operation using ax is shortened to "unsigned short",
we would need:
unsigned short ax' = (unsigned short) a;
The a->ax extension requires one set of unpacks to get to unsigned
short and another set of unpacks to get to int. The first set are
then duplicated for ax'. If both ax and ax' are needed, the a->ax'
extension would end up counting twice during cost calculations.
This patch rewrites the original:
int ax = (int) a;
into a pattern:
unsigned short ax' = (unsigned short) a;
int ax = (int) ax';
so that each extension only counts once.
2018-07-03 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* tree-vect-patterns.c (vect_split_statement): New function.
(vect_convert_input): Use it to try to split an existing cast.
gcc/testsuite/
* gcc.dg/vect/vect-over-widen-5.c: Test that the extensions
get split into two for use by the over-widening pattern.
* gcc.dg/vect/vect-over-widen-6.c: Likewise.
* gcc.dg/vect/vect-over-widen-7.c: Likewise.
* gcc.dg/vect/vect-over-widen-8.c: Likewise.
* gcc.dg/vect/vect-over-widen-9.c: Likewise.
* gcc.dg/vect/vect-over-widen-10.c: Likewise.
* gcc.dg/vect/vect-over-widen-11.c: Likewise.
* gcc.dg/vect/vect-over-widen-12.c: Likewise.
* gcc.dg/vect/vect-over-widen-13.c: Likewise.
* gcc.dg/vect/vect-over-widen-14.c: Likewise.
* gcc.dg/vect/vect-over-widen-15.c: Likewise.
* gcc.dg/vect/vect-over-widen-16.c: Likewise.
* gcc.dg/vect/vect-over-widen-22.c: New test.
This patch is the main part of PR85694. The aim is to recognise at least:
signed char *a, *b, *c;
...
for (int i = 0; i < 2048; i++)
c[i] = (a[i] + b[i]) >> 1;
as an over-widening pattern, since the addition and shift can be done
on shorts rather than ints. However, it ended up being a lot more
general than that.
The current over-widening pattern detection is limited to a few simple
cases: logical ops with immediate second operands, and shifts by a
constant. These cases are enough for common pixel-format conversion
and can be detected in a peephole way.
The loop above requires two generalisations of the current code: support
for addition as well as logical ops, and support for non-constant second
operands. These are harder to detect in the same peephole way, so the
patch tries to take a more global approach.
The idea is to get information about the minimum operation width
in two ways:
(1) by using the range information attached to the SSA_NAMEs
(effectively a forward walk, since the range info is
context-independent).
(2) by back-propagating the number of output bits required by
users of the result.
As explained in the comments, there's a balance to be struck between
narrowing an individual operation and fitting in with the surrounding
code. The approach is pretty conservative: if we could narrow an
operation to N bits without changing its semantics, it's OK to do that if:
- no operations later in the chain require more than N bits; or
- all internally-defined inputs are extended from N bits or fewer,
and at least one of them is single-use.
See the comments for the rationale.
I didn't bother adding STMT_VINFO_* wrappers for the new fields
since the code seemed more readable without.
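The motivating loop body can be checked in scalar form to see why 16 bits suffice. This sketch is only an illustration of the range argument in point (1); min_signed_bits and the two shift_add helpers are hypothetical names and are not how the vectorizer represents precisions. Right-shifting a negative value is implementation-defined before C++20; the code assumes the arithmetic shift that GCC provides.

```cpp
#include <cassert>
#include <cstdint>

// Smallest signed width in bits that holds every value in [lo, hi].
// Intended for small ranges only (results well below 32 bits).
constexpr unsigned min_signed_bits(int lo, int hi) {
  unsigned bits = 1;
  while (lo < -(1 << (bits - 1)) || hi > (1 << (bits - 1)) - 1) ++bits;
  return bits;
}

// As written in the source loop: operands are promoted to int.
inline std::int8_t shift_add_int(std::int8_t a, std::int8_t b) {
  const int sum = int{a} + int{b};
  return static_cast<std::int8_t>(sum >> 1);
}

// Narrowed: the sum lies in [-256, 254], so a 16-bit intermediate is enough.
inline std::int8_t shift_add_short(std::int8_t a, std::int8_t b) {
  const std::int16_t sum = static_cast<std::int16_t>(a + b);
  return static_cast<std::int8_t>(static_cast<std::int16_t>(sum >> 1));
}

// Exhaustively compare both forms over all signed char inputs.
void check_all_inputs() {
  for (int a = -128; a <= 127; ++a) {
    for (int b = -128; b <= 127; ++b) {
      const auto x = static_cast<std::int8_t>(a);
      const auto y = static_cast<std::int8_t>(b);
      assert(shift_add_int(x, y) == shift_add_short(x, y));
    }
  }
}
```

The sum of two signed chars needs 9 bits, which is why the operation can be done on shorts rather than ints.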
2018-06-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* poly-int.h (print_hex): New function.
* dumpfile.h (dump_dec, dump_hex): Declare.
* dumpfile.c (dump_dec, dump_hex): New poly_wide_int functions.
* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
min_input_precision, operation_precision and operation_sign.
* tree-vect-patterns.c (vect_get_range_info): New function.
(vect_same_loop_or_bb_p, vect_single_imm_use)
(vect_operation_fits_smaller_type): Delete.
(vect_look_through_possible_promotion): Add an optional
single_use_p parameter.
(vect_recog_over_widening_pattern): Rewrite to use new
stmt_vec_info information. Handle one operation at a time.
(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
(vect_truncatable_operation_p, vect_set_operation_type)
(vect_set_min_input_precision): New functions.
(vect_determine_min_output_precision_1): Likewise.
(vect_determine_min_output_precision): Likewise.
(vect_determine_precisions_from_range): Likewise.
(vect_determine_precisions_from_users): Likewise.
(vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
(vect_vect_recog_func_ptrs): Put over_widening first.
Add cast_forwprop.
(vect_pattern_recog): Call vect_determine_precisions.