Use steady_clock to implement condition_variable::wait_for with predicate
In r263225 (d2e378182a12d68fe5caeffae681252662a2fe7b), I fixed
condition_variable::wait_for to use std::chrono::steady_clock for the wait.
Unfortunately, I failed to spot that the same fix is required for the
wait_for variant that takes a predicate too.
2018-09-25 Mike Crowe <mac@mcrowe.com>
* include/std/condition_variable (condition_variable::wait_for): Use
steady clock in overload that uses a predicate.
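The idea behind this fix can be sketched in portable C++. This is not the libstdc++ source: wait_for_steady is a hypothetical helper, and it only assumes that the relative timeout is converted once into a std::chrono::steady_clock deadline, which is then passed to the predicate overload of wait_until, so that system clock adjustments do not affect the wait.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <utility>

// Hypothetical helper, not the libstdc++ implementation: wait with a
// predicate against a deadline computed once on steady_clock.
template <typename Rep, typename Period, typename Predicate>
bool wait_for_steady(std::condition_variable& cv,
                     std::unique_lock<std::mutex>& lock,
                     const std::chrono::duration<Rep, Period>& rel_time,
                     Predicate pred)
{
    // The caller must already hold the lock, as with wait_for itself.
    assert(lock.owns_lock());

    // A steady_clock deadline is immune to system clock changes.
    const auto deadline = std::chrono::steady_clock::now() + rel_time;

    // The predicate overload of wait_until loops over spurious wakeups
    // and returns the final value of pred().
    return cv.wait_until(lock, deadline, std::move(pred));
}
```

A caller would lock a std::mutex with std::unique_lock and call wait_for_steady(cv, lock, std::chrono::milliseconds(10), pred); the result is false only if the deadline passed with the predicate still false.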
If a std::variant can never get into valueless state then we don't need
to do a runtime check for a valid alternative.
PR libstdc++/87431
* include/std/variant (_Variant_storage<true, _Types...>::_M_valid):
Avoid runtime test when all alternatives are scalars and so cannot
throw during initialization.
In internal/bytealg correct a +build tag to never build indexbyte_generic.go
for the gofrontend, where we always use indexbyte_native.go.
For internal/cpu let the Makefile define CacheLineSize using goarch.sh,
rather than trying to enumerate all the possibilities in cpu_ARCH.go files.
In internal/poll call the C fcntl function rather than using SYS_FCNTL.
Change mksysinfo.sh to ensure that F_GETPIPE_SZ is always defined,
and check that in internal/poll.
ian [Tue, 25 Sep 2018 14:16:32 +0000 (14:16 +0000)]
cmd/go: pass down testing gccgo in TestScript
This permits TestScript to work when gccgo is not installed.
Previous testing was using a previously installed gccgo, not the newly
built one.
This revealed that the testing of whether an internal package is
permitted was incorrect for standard library packages, since the
uninstalled gccgo can see internal packages in the uninstalled libgo.
Fix the internal package tests.
This permitted removing a couple of gccgo-specific changes in the
testsuite.
i386: Compile pr82699-5.c and pr82699-6.c with -fno-pic
Compile pr82699-5.c and pr82699-6.c with -fno-pic to avoid
FAIL: gcc.target/i386/pr82699-5.c (test for excess errors)
Excess errors:
cc1: sorry, unimplemented: -mfentry isn't supported for 32-bit in combination with -fpic
FAIL: gcc.target/i386/pr82699-6.c (test for excess errors)
Excess errors:
cc1: error: -mnop-mcount is not implemented for -fPIC
cc1: sorry, unimplemented: -mfentry isn't supported for 32-bit in combination with -fpic
when running GCC testsuite with --target_board='unix{-fpic\ -m32,-fpic}'.
PR tree-optimization/87402
* tree-ssa-sccvn.c (SSA_VISITED): Remove unused function.
(visit_phi): Reinstate handling of supposed-to-be VARYING
but non-VARYING backedge value.
We need to check pie_enabled target in PIC tests to support GCC where
PIE is enabled by default when configured with --enable-default-pie.
PR testsuite/70150
* gcc.dg/20020312-2.c (dg-additional-options): Set to "-no-pie"
for pie_enabled target.
* gcc.dg/uninit-19.c: Check pie_enabled for PIC.
* gcc.target/i386/pr34256.c: Likewise.
PR debug/83941
* dwarf2out.c (struct sym_off_pair): New.
(external_die_map): New global.
(lookup_decl_die): When in LTO create DIEs lazily from the
external_die_map.
(lookup_block_die): New function, create DIEs lazily in LTO.
(equate_block_to_die): New function.
(dwarf2out_die_ref_for_decl): During WPA get the association
from the external DIE map.
(dwarf2out_register_external_die): Record mapping into the
external DIE map.
(maybe_create_die_with_external_ref): New function split out from
DIE generation part of old dwarf2out_register_external_die.
(add_abstract_origin_attribute): Do not return the DIE. When
in LTO reference externals directly.
(dwarf2out_abstract_function): When in LTO ignore calls for
decls with external DIEs (already present abstract instances).
(gen_call_site_die): Adjust.
(add_high_low_attributes): Likewise.
(gen_lexical_block_die): Likewise.
(gen_inlined_subroutine_die): Likewise.
(gen_block_die): Likewise.
(dwarf2out_inline_entry): Likewise.
(dwarf2out_early_finish): In LTRANS phase create DW_TAG_imported_unit
DIEs.
PR fortran/87394
* dbgcnt.c (dbg_cnt_process_single_pair): Return false
instead of NULL.
* dumpfile.c (dump_enable_all): Remove extra parenthesis.
* gcov-tool.c: Declare the function with ATTRIBUTE_NORETURN.
* godump.c (go_format_type): Remove extra parenthesis.
2018-09-25 Martin Liska <mliska@suse.cz>
PR fortran/87394
* decl.c (add_hidden_procptr_result): Simplify condition
as we are in branch with 'case1 || case2'.
"r264537: Change EQ_ATTR_ALT to support up to 64 alternatives" changed
the format of EQ_ATTR_ALT from ii to ww. This broke the bootstrap on
32-bit systems, because the formula for rtx_code_size assumed that only
certain codes contain HOST_WIDE_INTs. This did not surface on 64-bit
systems, because rtunion is 8 bytes anyway, but on 32-bit systems it's
only 4 bytes. This resulted in out-of-bounds writes and memory
corruptions in genattrtab.
gcc/ChangeLog:
2018-09-25 Ilya Leoshkevich <iii@linux.ibm.com>
PR bootstrap/87417
* rtl.c (rtx_code_size): Take into account that EQ_ATTR_ALT
contains HOST_WIDE_INTs when computing its size.
gotools/:
* Makefile.am (mostlyclean-local): Run chmod on check-go-dir to
make sure it is writable.
(check-go-tools): Likewise.
(check-vet): Copy internal/objabi to check-vet-dir.
* Makefile.in: Rebuild.
* cp-tree.h (build_noexcept_spec, add_exception_specifier): Adjust
declarations.
* except.c (build_noexcept_spec): Change the type of the complain
parameter to tsubst_flags_t.
* typeck2.c (add_exception_specifier): Likewise.
i386: Insert ENDBR before the profiling counter call
ENDBR must be the first instruction of a function. This patch queues
ENDBR if we need to put the profiling counter call before the prologue
and generate ENDBR before the profiling counter call.
gcc/
PR target/82699
* config/i386/i386.c (rest_of_insert_endbranch): Set
endbr_queued_at_entrance to true and don't insert ENDBR if
x86_function_profiler will be called.
(x86_function_profiler): Insert ENDBR if endbr_queued_at_entrance
is true.
* config/i386/i386.h (machine_function): Add
endbr_queued_at_entrance.
iii [Mon, 24 Sep 2018 15:01:57 +0000 (15:01 +0000)]
Change EQ_ATTR_ALT to support up to 64 alternatives
On S/390 there is a need to support more than 32 instruction
alternatives per define_insn. Currently this is not explicitly
prohibited or unsupported: MAX_RECOG_ALTERNATIVES is equal to 35, and,
furthermore, the related code uses uint64_t for bitmaps in most places.
However, genattrtab contains the logic to convert (eq_attr "attribute"
"value") RTXs to (eq_attr_alt bitmap) RTXs, where bitmap contains
alternatives, whose "attribute" has the corresponding "value".
Unfortunately, bitmap is only 32 bits.
When adding the 33rd alternative, this led to (eq_attr "type" "larl")
becoming (eq_attr_alt -1050625 1), where -1050625 == 0xffeff7ff. The
cleared bits 12, 21 and 32 correspond to two existing and one newly
added insn of type "larl". compute_alternative_mask sign extended this
to 0xffffffffffeff7ff, which contained non-existent alternatives, and
this made simplify_test_exp fail with "invalid alternative specified".
I'm not sure why it didn't fail the same way before, since the top bit,
which led to sign extension, should have been set even with 32
alternatives. Maybe simplify_test_exp was not called for "type"
attribute for some reason?
This patch widens EQ_ATTR_ALT bitmap to 64 bits, making it possible to
gracefully handle up to 64 alternatives. It eliminates the problem with
the 33rd alternative on S/390.
gcc/ChangeLog:
2018-09-24 Ilya Leoshkevich <iii@linux.ibm.com>
* genattrtab.c (mk_attr_alt): Use alternative_mask.
(attr_rtx_1): Adjust caching to match the new EQ_ATTR_ALT field
types.
(check_attr_test): Use alternative_mask.
(get_attr_value): Likewise.
(compute_alternative_mask): Use alternative_mask and XWINT.
(make_alternative_compare): Use alternative_mask.
(attr_alt_subset_p): Use XWINT.
(attr_alt_subset_of_compl_p): Likewise.
(attr_alt_intersection): Use alternative_mask and XWINT.
(attr_alt_union): Likewise.
(attr_alt_complement): Use HOST_WIDE_INT and XWINT.
(mk_attr_alt): Use alternative_mask and HOST_WIDE_INT.
(simplify_test_exp): Use alternative_mask and XWINT.
(write_test_expr): Use alternative_mask and XWINT, adjust bit
number calculation to support 64 bits. Generate code that
checks 64-bit masks.
(main): Use alternative_mask.
* rtl.def (EQ_ATTR_ALT): Change field types from ii to ww.
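The sign-extension arithmetic quoted in the commit message above can be checked with a few constant expressions. This is only an illustration of the numbers given there; the variable names are made up for the example and do not appear in genattrtab.c.

```cpp
#include <cstdint>

// The 32-bit bitmap from the commit message, held in a signed int.
constexpr std::int32_t narrow_mask = -1050625;

// The same value viewed as raw bits.
constexpr std::uint32_t narrow_bits = static_cast<std::uint32_t>(narrow_mask);

// Widening the signed value sign-extends: bits 32..63 all become 1,
// which is how the non-existent alternatives appeared.
constexpr std::uint64_t sign_extended =
    static_cast<std::uint64_t>(static_cast<std::int64_t>(narrow_mask));

// Widening the unsigned bits zero-extends and leaves the upper half clear.
constexpr std::uint64_t zero_extended = narrow_bits;

static_assert(narrow_bits == 0xffeff7ffu, "bit pattern of -1050625");
static_assert(sign_extended == 0xffffffffffeff7ffull, "sign extension");
static_assert(zero_extended == 0x00000000ffeff7ffull, "zero extension");
```

Holding the bitmap in a 64-bit field from the start, as the patch does, avoids the widening step altogether.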
iii [Mon, 24 Sep 2018 14:21:03 +0000 (14:21 +0000)]
S/390: Fix conditional returns on z196+
S/390 epilogue ends with (parallel [(return) (use %r14)]) instead of
the more usual (return) or (simple_return). This sequence is not
recognized by the conditional return logic in try_optimize_cfg ().
This was introduced for processors older than z196, where it is
sometimes profitable to use a call-clobbered register for returning
instead of %r14. On newer processors we always return via %r14,
for which the fact that it's used is already reflected by
EPILOGUE_USES. In this case a simple (return) suffices.
This patch changes return_use () to emit simple (return)s when
returning via %r14. The resulting sequences are recognized by the
conditional return logic in try_optimize_cfg ().
gcc/ChangeLog:
2018-09-24 Ilya Leoshkevich <iii@linux.ibm.com>
PR target/80080
* config/s390/s390.c (s390_emit_epilogue): Do not use PARALLEL
RETURN+USE when returning via %r14.
PR sanitizer/85774
* asan.c: Make asan_handled_variables extern.
* asan.h: Likewise.
* cfgexpand.c (expand_stack_vars): Make sure
a representative is unpoisoned if another
variable in the partition is handled by
use-after-scope sanitization.
2018-09-24 Martin Liska <mliska@suse.cz>
PR sanitizer/85774
* g++.dg/asan/pr85774.C: New test.
PR tree-optimization/63155
* tree-ssa-propagate.c (add_ssa_edge): Avoid adding PHIs to
the worklist when the edge of the respective argument isn't
executable.
Do array index calculations in gfc_array_index_type
It was recently noticed that for a few of the coarray intrinsics array
index calculations were done in integer_type_node instead of
gfc_array_index_type. This patch fixes this.
Regtested on x86_64-pc-linux-gnu.
gcc/fortran/ChangeLog:
2018-09-23 Janne Blomqvist <jb@gcc.gnu.org>
* trans-expr.c (gfc_caf_get_image_index): Do array index
calculations in gfc_array_index_type.
* trans-intrinsic.c (conv_intrinsic_event_query): Likewise.
* trans-stmt.c (gfc_trans_lock_unlock): Likewise.
(gfc_trans_event_post_wait): Likewise.
PR fortran/41453
* trans.h (gfc_conv_expr_reference): Add optional argument
add_clobber to prototype.
(gfc_conv_procedure_call): Set add_clobber argument to
gfc_conv_procedure_reference to true for scalar, INTENT(OUT),
non-pointer, non-allocatable, non-dummy variables whose type
is neither BT_CHARACTER, BT_DERIVED nor BT_CLASS, but only if
the procedure is not elemental.
* trans-expr.c (gfc_conv_procedure_reference): Add clobber
statement before call if add_clobber is set.
2018-09-22 Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/41453
* gfortran.dg/intent_optimize_2.f90: New test.
law [Fri, 21 Sep 2018 20:00:23 +0000 (20:00 +0000)]
* gimple-ssa-evrp.c (evrp_dom_walker::cleanup): Call
vr_values::cleanup_edges_and_switches.
* tree-vrp.c (to_remove_edges, to_update_switch_stmts): Moved into
vr_values class.
(identify_jump_threads): Remove EDGE_IGNORE handling.
(execute_vrp): Move handling of to_remove_edges and
to_update_switch_stmts into vr_values class member functions.
* tree-vrp.h (switch_update, to_remove_edges): Remove declarations.
(to_update_switch_stmts): Likewise.
* vr-values.c: Include cfghooks.h.
(vr_values::vr_values): Initialize to_remove_edges and
to_update_switch_stmts.
(vr_values::~vr_values): Verify to_remove_edges and
to_update_switch_stmts are empty.
(vr_values::simplify_switch_using_ranges): Set EDGE_IGNORE as needed.
(vr_values::cleanup_edges_and_switches): New member function.
* vr-values.h (vr_values): Add cleanup_edges_and_switches member
function. Add new data members.
* gcc.dg/tree-ssa/vrp113.c: Disable EVRP.
* gcc.dg/tree-ssa/vrp120.c: New test.
Use vectored writes when reporting errors and warnings.
When producing error and warning messages, libgfortran writes a
message by using many system calls. By using vectored writes (the
POSIX writev function) when available and feasible to use without
major surgery, we reduce the chance that output gets intermingled with
other output to stderr.
In practice, this is done by introducing a new function estr_writev in
addition to the existing estr_write. In order to use this, the old
st_vprintf is removed, replaced by direct calls of vsnprintf, allowing
more message batching.
Regtested on x86_64-pc-linux-gnu.
libgfortran/ChangeLog:
2018-09-21 Janne Blomqvist <jb@gcc.gnu.org>
* config.h.in: Regenerated.
* configure: Regenerated.
* configure.ac: Check for writev and sys/uio.h.
* libgfortran.h: Include sys/uio.h.
(st_vprintf): Remove prototype.
(struct iovec): Define if not available.
(estr_writev): New prototype.
* runtime/backtrace.c (error_callback): Use estr_writev.
* runtime/error.c (ST_VPRINTF_SIZE): Remove.
(estr_writev): New function.
(st_vprintf): Remove.
(gf_vsnprintf): New function.
(ST_ERRBUF_SIZE): New macro.
(st_printf): Use vsnprintf.
(os_error): Use estr_writev.
(runtime_error): Use vsnprintf and estr_writev.
(runtime_error_at): Likewise.
(runtime_warning_at): Likewise.
(internal_error): Use estr_writev.
(generate_error_common): Likewise.
(generate_warning): Likewise.
(notify_std): Likewise.
* runtime/pause.c (pause_string): Likewise.
* runtime/stop.c (report_exception): Likewise.
(stop_string): Likewise.
(error_stop_string): Likewise.
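The batching technique described in this commit can be illustrated with a small POSIX example. This is a sketch, not libgfortran's estr_writev: write_message is a hypothetical helper that gathers three fragments into one writev call, so the kernel receives the whole message in a single system call rather than three separate writes.

```cpp
#include <cassert>
#include <cstring>
#include <sys/uio.h>   // writev, struct iovec (POSIX)
#include <unistd.h>

// Hypothetical helper, not the libgfortran function: emit a prefix, a
// message and a trailing newline with one vectored write.
ssize_t write_message(int fd, const char* prefix, const char* message)
{
    assert(prefix != nullptr && message != nullptr);

    struct iovec iov[3];
    // iov_base is a non-const void*; writev only reads from the buffers.
    iov[0].iov_base = const_cast<char*>(prefix);
    iov[0].iov_len = std::strlen(prefix);
    iov[1].iov_base = const_cast<char*>(message);
    iov[1].iov_len = std::strlen(message);
    iov[2].iov_base = const_cast<char*>("\n");
    iov[2].iov_len = 1;

    // Returns the total number of bytes written, or -1 on error.
    return writev(fd, iov, 3);
}
```

For instance, write_message(STDERR_FILENO, "Error: ", "bad unit") sends the prefix, text and newline to stderr in one call. Production code would also need to handle short writes and EINTR, which this sketch omits.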
PR fortran/77325
* trans-array.c (gfc_alloc_allocatable_for_assignment): If the
rhs has a charlen expression, convert that and use it.
* trans-expr.c (gfc_trans_assignment_1): The rse.pre for the
assignment of deferred character array vars to a reallocatable
lhs should not be added to the exterior block since vector
indices, for example, generate temporaries indexed within the
loop.
2018-09-21 Paul Thomas <pault@gcc.gnu.org>
PR fortran/77325
* gfortran.dg/deferred_character_22.f90: New test.
Remove the hypot-long-double.cc file that used dg-xfail-run-if and
simply use the lower tolerance for double if long double is not larger
than double.
* testsuite/26_numerics/headers/cmath/hypot-long-double.cc: Remove.
* testsuite/26_numerics/headers/cmath/hypot.cc: Restore test for
long double unconditionally, but use lower tolerance when
sizeof(long double) == sizeof(double).
In r262891 I reimplemented this call:
dump_printf_loc (MSG_NOTE, loc, "=== %s ===\n", name);
in dump_begin_scope to use direct calls to dump_loc:
if (dump_file)
{
dump_loc (MSG_NOTE, dump_file, loc.get_location_t ());
fprintf (dump_file, "=== %s ===\n", name);
}
However ::dump_loc doesn't filter with pflags and alt_flags.
This led to stray output of the form:
test.cpp:1:6: note: test.cpp:1:11: note:
when using -fopt-info with "optimized" or "missed".
This patch adds this missing filtering, eliminating the stray partial
note output.
gcc/ChangeLog:
PR tree-optimization/87309
* dumpfile.c (dump_context::begin_scope): Filter the dump_loc
calls with pflags and alt_flags.
(selftest::test_capture_of_dump_calls): Add test of interaction of
MSG_OPTIMIZED_LOCATIONS with AUTO_DUMP_SCOPE.
gcc/testsuite/ChangeLog:
PR tree-optimization/87309
* gcc.dg/pr87309.c: New test.
Cleanup handling of libgcc and libc_internal for VxWorks
2018-09-21 Olivier Hainque <hainque@adacore.com>
* config/vxworks.h (VXWORKS_LIBGCC_SPEC): Remove -lc_internal.
Merge block comment with the one ahead of VXWORKS_LIBS_RTP. Then:
(VXWORKS_LIBS_RTP): Minor reordering.
Introduce TARGET_VXWORKS64 for VxWorks 64bit ports
2018-09-21 Olivier Hainque <hainque@adacore.com>
* config.gcc: Enforce def of TARGET_VXWORKS64 to 1 from
triplet, similar to support for VxWorks7.
* config/vxworks-dummy.h: Provide a default definition
of TARGET_VXWORKS64 to 0.
The Fortran front-end has a bug in which it uses "int" values for "size_t"
parameters. I don't know why this isn't a problem for all 64-bit architectures,
but GCN ends up with the data in the wrong argument register and/or stack slot,
and bad things happen.
This patch corrects the issue by setting the correct type.
2018-09-21 Andrew Stubbs <ams@codesourcery.com>
Kwok Cheung Yeung <kcy@codesourcery.com>
gcc/fortran/
* trans-expr.c (gfc_trans_structure_assign): Ensure that the first
argument of a call to _gfortran_caf_register is of size_type_node.
* trans-intrinsic.c (conv_intrinsic_event_query): Convert computed
index to a size_type_node type.
* trans-stmt.c (gfc_trans_event_post_wait): Likewise.
At present, pointers passed to builtin functions, including atomic operators,
are stripped of their address space properties. This doesn't seem to be
deliberate; the code just omits to copy them.
Not only that, but it forces pointer sizes to Pmode, which isn't appropriate
for all address spaces.
This patch attempts to correct both issues. It works for GCN atomics and
GCN OpenACC gang-private variables.
2018-09-21 Andrew Stubbs <ams@codesourcery.com>
Julian Brown <julian@codesourcery.com>
* auto-profile.c (autofdo_source_profile::read): Do not
set sum_all.
(read_profile): Do not add working sets.
(read_autofdo_file): Remove sum_all.
(afdo_callsite_hot_enough_for_early_inline): Remove const
qualifier.
* coverage.c (struct counts_entry): Remove gcov_summary.
(read_counts_file): Read new GCOV_TAG_OBJECT_SUMMARY,
do not support GCOV_TAG_PROGRAM_SUMMARY.
(get_coverage_counts): Remove summary and expected
arguments.
* coverage.h (get_coverage_counts): Likewise.
* doc/gcov-dump.texi: Remove -w option.
* gcov-dump.c (dump_working_sets): Remove.
(main): Do not support '-w' option.
(print_usage): Likewise.
(tag_summary): Likewise.
* gcov-io.c (gcov_write_summary): Do not dump
histogram.
(gcov_read_summary): Likewise.
(gcov_histo_index): Remove.
(gcov_histogram_merge): Likewise.
(compute_working_sets): Likewise.
* gcov-io.h (GCOV_TAG_OBJECT_SUMMARY): Mark
it not obsolete.
(GCOV_TAG_PROGRAM_SUMMARY): Mark it obsolete.
(GCOV_TAG_SUMMARY_LENGTH): Adjust.
(GCOV_HISTOGRAM_SIZE): Remove.
(GCOV_HISTOGRAM_BITVECTOR_SIZE): Likewise.
(struct gcov_summary): Simplify rapidly just
to runs and sum_max fields.
(gcov_histo_index): Remove.
(NUM_GCOV_WORKING_SETS): Likewise.
(compute_working_sets): Likewise.
* gcov-tool.c (print_overlap_usage_message): Remove
trailing empty line.
* gcov.c (read_count_file): Read GCOV_TAG_OBJECT_SUMMARY.
(output_lines): Remove program related line.
* ipa-profile.c (ipa_profile): Do not consider GCOV histogram.
* lto-cgraph.c (output_profile_summary): Do not stream GCOV
histogram.
(input_profile_summary): Do not read it.
(merge_profile_summaries): And do not merge it.
(input_symtab): Do not call removed function.
* modulo-sched.c (sms_schedule): Do not print sum_max.
* params.def (HOT_BB_COUNT_FRACTION): Reincarnate param that was
removed when histogram method was invented.
(HOT_BB_COUNT_WS_PERMILLE): Mention that it's used only in LTO
mode.
* postreload-gcse.c (eliminate_partially_redundant_load): Fix
GCOV coding style.
* predict.c (get_hot_bb_threshold): Use HOT_BB_COUNT_FRACTION
and dump selected value.
* profile.c (add_working_set): Remove.
(get_working_sets): Likewise.
(find_working_set): Likewise.
(get_exec_counts): Do not work with working sets.
(read_profile_edge_counts): Do not inform as sum_max is removed.
(compute_branch_probabilities): Likewise.
(compute_value_histograms): Remove argument for call of
get_coverage_counts.
* profile.h: Do not make gcov_summary const.
2018-09-21 Martin Liska <mliska@suse.cz>
* libgcov-driver.c (crc32_unsigned): Remove.
(gcov_histogram_insert): Likewise.
(gcov_compute_histogram): Likewise.
(compute_summary): Simplify rapidly.
(merge_one_data): Do not handle PROGRAM_SUMMARY tag.
(merge_summary): Rapidly simplify.
(dump_one_gcov): Ignore gcov_summary.
(gcov_do_dump): Do not handle program summary, it's not
used.
* libgcov-util.c (tag_summary): Remove.
(read_gcda_finalize): Fix coding style.
(read_gcda_file): Initialize curr_object_summary.
(compute_summary): Remove.
(calculate_overlap): Remove settings of run_max.
PR tree-optimization/86990
* gimple-ssa-store-merging.c (imm_store_chain_info:coalesce_immediate):
Check that the entire merged store group is made of constants only for
overlapping stores.
PR c++/87109 - wrong ctor with maybe-rvalue semantics.
* call.c (build_user_type_conversion_1): Refine the maybe-rvalue
check to only return if we're converting the return value to a base
class.
* g++.dg/cpp0x/ref-qual19.C: Adjust the expected results.
* g++.dg/cpp0x/ref-qual20.C: New test.
Building an ADDR_EXPR uses the canonical type to build the pointer
type, but then, as we dereference it, we lose track of lax alignment
known to apply to the dereferenced object. This might not be a
problem in general, but it is when the compiler implicitly introduces
address taking and dereferencing, as it does for asm statements, and
as it may do in some loop optimizations.
From: Richard Biener <rguenther@suse.de>
for gcc/ChangeLog
PR middle-end/87054
* gimplify.c (gimplify_expr): Retain alignment of
addressable lvalue in dereference.
From: Alexandre Oliva <oliva@adacore.com>
for gcc/testsuite/ChangeLog
[PR87013] check for .loc is_stmt support in the assembler
Back when we had the logic to output is_stmt but never exercised it,
it didn't matter that we didn't test for assembler support for it.
But there are still assemblers out there that do not support it, so
now that we enable the formerly latent is_stmt logic, we'd better make
sure the assembler can deal with it.
for gcc/ChangeLog
PR bootstrap/87013
* configure.ac: Check for .loc is_stmt support.
* configure, config.in: Rebuilt.
* dwarf2out.c (dwarf2out_source_line): Skip is_stmt
if not supported.
jason [Thu, 20 Sep 2018 17:09:19 +0000 (17:09 +0000)]
PR c++/87075 - ICE with constexpr array initialization.
My patch of 2016-08-26 to avoid calling a trivial default constructor
introduced TARGET_EXPRs initialized with void_node to express trivial
initialization. But when this shows up in a VEC_INIT_EXPR, we weren't
prepared to handle it. Fixed by handling it explicitly in
cxx_eval_vec_init_1.
PEELING_FOR_GAPS now means "peel one iteration for the epilogue",
in much the same way that PEELING_FOR_ALIGNMENT > 0 means
"peel that number of iterations for the prologue". We weren't
taking this into account when deciding whether we needed to peel
further scalar iterations beyond the iterations for "gaps" and
"alignment".
Only the first test failed before the patch. The other two
are just for completeness.
2018-09-20 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/87288
* tree-vect-loop.c (vect_analyze_loop_2): Take PEELING_FOR_GAPS
into account when determining PEELING_FOR_NITERS.
Add missing alignment checks in epilogue loop vectorisation (PR 86877)
Epilogue loop vectorisation skips vect_enhance_data_refs_alignment
since it doesn't make sense to version or peel the epilogue loop
(that will already have happened for the main loop). But this means
that it also fails to check whether the accesses are suitably aligned
for the new vector subarch.
We don't seem to carry alignment information from the (potentially
peeled or versioned) main loop to the epilogue loop, which would be
good to fix at some point. I think we want this patch regardless,
since there's no guarantee that the alignment requirements are the
same for every subarch.
2018-09-20 Richard Sandiford <richard.sandiford@arm.com>