This patch allows combine to combine two insns into two. This helps
in many cases, by reducing instruction path length, and also allowing
further combinations to happen. PR85160 is a typical example of code
that it can improve.
This patch does not allow such combinations if either of the original
instructions was a simple move instruction. In those cases combining
the two instructions increases register pressure without improving the
code. With this move test register pressure does no longer increase
noticably as far as I can tell.
(At first I also didn't allow either of the resulting insns to be a
move instruction. But that is actually a very good thing to have, as
should have been obvious).
PR rtl-optimization/85160
* combine.c (is_just_move): New function.
(try_combine): Allow combining two instructions into two if neither of
the original instructions was a move.
lra: consider clobbers when selecting hard_regno to spill
The idea behind the rclass loop in spill_hard_reg_in_range() seems to
be: find a hard_regno, which in general conflicts with reload regno,
but does not do so between `from` and `to`, and then do the live range
splitting based on this information. To check the absence of conflicts,
we make use of insn_bitmap, which does not contain insns which clobber
the hard_regno.
gcc/ChangeLog:
2018-07-30 Ilya Leoshkevich <iii@linux.ibm.com>
PR target/86547
* lra-constraints.c (spill_hard_reg_in_range): When selecting the
hard_regno, make sure no insn between `from` and `to` clobbers it.
Tom de Vries [Mon, 30 Jul 2018 08:17:26 +0000 (08:17 +0000)]
[libgomp, nvptx] Handle per-function max-threads-per-block in default dims
Currently parallel-loop-1.c fails at -O0 on a Quadro M1200, because one of the
kernel launch configurations exceeds the resources available in the device, due
to the default dimensions chosen by the runtime.
This patch fixes that by taking the per-function max_threads_per_block into
account when using the default dimensions.
[nvptx, offloading] Determine default workers at runtime
Currently, if the user doesn't specify the number of workers for an openacc
region, the compiler hardcodes it to a default value.
This patch removes this functionality, such that the libgomp runtime can decide
on a default value.
2018-07-30 Cesar Philippidis <cesar@codesourcery.com>
Tom de Vries <tdevries@suse.de>
* config/nvptx/nvptx.c (PTX_GANG_DEFAULT): Rename to ...
(PTX_DEFAULT_RUNTIME_DIM): ... this.
(nvptx_goacc_validate_dims): Set default worker and gang dims to
PTX_DEFAULT_RUNTIME_DIM.
(nvptx_dim_limit): Ignore GOMP_DIM_WORKER.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
From-SVN: r263060
David Malcolm [Sat, 28 Jul 2018 17:03:56 +0000 (17:03 +0000)]
C++: clean up cp_printer
This makes it easier to compare cp_printer with gcc_cxxdiag_char_table
in c-format.c.
No functional change intended.
gcc/cp/ChangeLog:
* error.c (cp_printer): In the leading comment, move "%H" and "%I"
into alphabetical order, and add missing "%G" and "%K". Within
the switch statement, move cases 'G', 'H', 'I' and 'K' so that the
cases are in alphabetical order.
Michael Meissner [Fri, 27 Jul 2018 22:13:36 +0000 (22:13 +0000)]
constraints.md (wG constraint): Delete, no longer used.
2018-07-27 Michael Meissner <meissner@linux.ibm.com>
* config/rs6000/constraints.md (wG constraint): Delete, no longer
used.
* config/rs6000/predicates.md (p9_fusion_reg_operand): Rename
predicate to reflect toc fusion has been deleted.
(toc_fusion_mem_raw): Delete, no longer used.
(toc_fusion_mem_wrapped): Likewise.
* config/rs6000/rs6000-cpus.def (POWERPC_MASKS): Delete toc
fusion mask bit.
* config/rs6000/rs6000-protos.h (fusion_wrap_memory_address):
Delete, no longer used.
* config/rs6000/rs6000.c (struct rs6000_reg_addr): Delete fields
meant to be used for toc fusion.
(rs6000_debug_print_mode): Delete toc fusion debugging.
(rs6000_debug_reg_global): Likewise.
(rs6000_init_hard_regno_mode_ok): Delete setting up fields for toc
fusion and secondary reload support that were never used.
(rs6000_option_override_internal): Delete TOC fusion, that was only
partially defined, and it did not work unless you also used the
-mcmodel= switch.
(rs6000_legitimate_address_p): Delete TOC fusion support.
(rs6000_opt_masks): Likewise.
(fusion_wrap_memory_address): Delete function, no longer used.
(fusion_split_address); Delete TOC fusion support.
* config/rs6000/rs6000.h (TARGET_TOC_FUSION_INT): Delete, no
longer used with toc fusion being deleted.
(TARGET_TOC_FUSION_FP): Likewise.
* config/rs6000/rs6000.md (UNSPEC_FUSION_ADDIS): Delete TOC fusion
UNSPEC.
(toc fusion spliter): Delete TOC fusion support.
(toc_fusionload_<mode>): Likewise.
(toc_fusionload_di): Likewise.
(fusion_gpr_load_<mode>): Delete generator function, this insn no
longer needs to be named. Rename predicate to delete TOC fusion.
(fusion_gpr_<P:mode>_<GPR_FUSION:mode>_load): Likewise.
(fusion_gpr_<P:mode>_<GPR_FUSION:mode>_store): Likewise.
(fusion_vsx_<P:mode>_<GPR_FUSION:mode>_load): Likewise.
(fusion_vsx_<P:mode>_<GPR_FUSION:mode>_store): Likewise.
(p9 fusion peephole2s): Rename predicate to delete TOC fusion.
Ian Lance Taylor [Fri, 27 Jul 2018 18:43:34 +0000 (18:43 +0000)]
libgo: prune sighandler frames in runtime.sigprof
When writing stack frames to the pprof CPU profile machinery, it is
very important to insure that the frames emitted do not contain any
frames corresponding to artifacts of the profiling process itself
(signal handlers, sigprof, etc). This patch changes runtime.sigprof to
strip out those frames from the raw stack generated by
"runtime.callers".
extend.texi (Basic PowerPC Built-in Functions Available on ISA 2.05): Replace __uint128_t with __uint128 and __int128_t with __int128 in built-in...
gcc/ChangeLog:
2018-07-27 Kelvin Nilsen <kelvin@gcc.gnu.org>
* doc/extend.texi (Basic PowerPC Built-in Functions Available on
ISA 2.05): Replace __uint128_t with __uint128 and __int128_t with
__int128 in built-in function prototypes.
(PowerPC AltiVec Built-in Functions on ISA 2.07): Likewise.
(PowerPC AltiVec Built-in Functions on ISA 3.0): Likewise.
Martin Sebor [Fri, 27 Jul 2018 17:06:44 +0000 (17:06 +0000)]
PR tree-optimization/86696 - ICE in handle_char_store at gcc/tree-ssa-strlen.c:3332
gcc/ChangeLog:
PR tree-optimization/86696
* tree-ssa-strlen.c (get_min_string_length): Handle all integer
types, including enums.
(handle_char_store): Be prepared for the above function to fail.
gcc/testsuite/ChangeLog:
PR tree-optimization/86696
* gcc.dg/pr86696.C: New test.
H.J. Lu [Fri, 27 Jul 2018 14:40:47 +0000 (14:40 +0000)]
i386: Remove _Unwind_Frames_Increment
CET kernel has been changed to place a restore token on shadow stack for
signal handler to enhance security. It is usually transparent to user
programs since kernel will pop the restore token when signal handler
returns. But when an exception is thrown from a signal handler, now
we need to remove _Unwind_Frames_Increment to pop the the restore token
from shadow stack. Otherwise, we get
FAIL: g++.dg/torture/pr85334.C -O0 execution test
FAIL: g++.dg/torture/pr85334.C -O1 execution test
FAIL: g++.dg/torture/pr85334.C -O2 execution test
FAIL: g++.dg/torture/pr85334.C -O3 -g execution test
FAIL: g++.dg/torture/pr85334.C -Os execution test
FAIL: g++.dg/torture/pr85334.C -O2 -flto -fno-use-linker-plugin -flto-partition=none execution test
Martin Sebor [Thu, 26 Jul 2018 16:45:43 +0000 (16:45 +0000)]
PR tree-optimization/86043 - strlen after memcpy partially overwriting a string not optimized
PR tree-optimization/86043 - strlen after memcpy partially overwriting a string not optimized
PR tree-optimization/86042 - missing strlen optimization after second strcpy
gcc/ChangeLog:
PR tree-optimization/86043
PR tree-optimization/86042
* tree-ssa-strlen.c (handle_builtin_memcpy): Handle strict overlaps.
(get_string_cst_length): Rename...
(get_min_string_length): ...to this. Add argument.
(handle_char_store): Extend to handle multi-character stores by
MEM_REF.
* tree.c (initializer_zerop): Use new argument. Handle MEM_REF.
* tree.h (initializer_zerop): Add argument.
gcc/testsuite/ChangeLog:
PR tree-optimization/86043
PR tree-optimization/86042
* gcc/testsuite/gcc.dg/attr-nonstring-2.c: Xfail test cases due to
pr86688.
* gcc.dg/strlenopt-44.c: New test.
Jakub Jelinek [Thu, 26 Jul 2018 16:12:58 +0000 (18:12 +0200)]
re PR middle-end/86660 (libgomp.c++/for-15.C ICEs with nvptx offloading)
PR testsuite/86660
* testsuite/libgomp.c++/for-15.C (results): Include it in
omp declare target region.
(main): Use map (always, tofrom: results) instead of
map (tofrom: results).
Jakub Jelinek [Thu, 26 Jul 2018 16:12:02 +0000 (18:12 +0200)]
re PR middle-end/86660 (libgomp.c++/for-15.C ICEs with nvptx offloading)
PR middle-end/86660
* omp-low.c (scan_sharing_clauses): Don't ignore map clauses for
declare target to variables if they have always,{to,from,tofrom} map
kinds.
H.J. Lu [Thu, 26 Jul 2018 14:48:55 +0000 (14:48 +0000)]
libsanitizer: Mark REAL(swapcontext) with indirect_return attribute on x86
Cherry-pick compiler-rt revision 337603:
When shadow stack from Intel CET is enabled, the first instruction of all
indirect branch targets must be a special instruction, ENDBR.
lib/asan/asan_interceptors.cc has
...
int res = REAL(swapcontext)(oucp, ucp);
...
REAL(swapcontext) is a function pointer to swapcontext in libc. Since
swapcontext may return via indirect branch on x86 when shadow stack is
enabled, as in this case,
int res = REAL(swapcontext)(oucp, ucp);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This function may be
returned via an indirect branch.
Here compiler must insert ENDBR after call, like
call *bar(%rip)
endbr64
I opened an LLVM bug:
https://bugs.llvm.org/show_bug.cgi?id=38207
to add the indirect_return attribute so that it can be used to inform
compiler to insert ENDBR after REAL(swapcontext) call. We mark
REAL(swapcontext) with the indirect_return attribute if it is available.
Jonathan Wakely [Thu, 26 Jul 2018 14:02:11 +0000 (15:02 +0100)]
Add missing checks for _GLIBCXX_USE_C99_STDINT_TR1
The throw_allocator extension depends on <tr1/random> which depends on
_GLIBCXX_USE_C99_STDINT_TR1.
The Transactional Memory support uses fixed-width integer types from
<stdint.h>.
* include/ext/throw_allocator.h [!_GLIBCXX_USE_C99_STDINT_TR1]
(random_condition, throw_value_random, throw_allocator_random)
(std::hash<throw_value_random>): Do not define when <tr1/random> is
not usable.
* src/c++11/cow-stdexcept.cc [!_GLIBCXX_USE_C99_STDINT_TR1]: Do not
define transactional memory support when <stdint.h> is not usable.
Jonathan Wakely [Thu, 26 Jul 2018 14:02:05 +0000 (15:02 +0100)]
Modify some library internals to work without <stdint.h>
std::__detail::__clp2 used uint_fast32_t and uint_fast64_t without
checking _GLIBCXX_USE_C99_STDINT_TR1 which was a potential bug. A
simpler implementation based on the new std::__ceil2 code performs
better and doesn't depend on <stdint.h> types.
std::align and other C++11 functions in <memory> where unnecessarily
missing when _GLIBCXX_USE_C99_STDINT_TR1 was not defined.
* include/bits/hashtable_policy.h (__detail::__clp2): Use faster
implementation that doesn't depend on <stdint.h> types.
* include/std/memory (align) [!_GLIBCXX_USE_C99_STDINT_TR1]: Use
std::size_t when std::uintptr_t is not usable.
[!_GLIBCXX_USE_C99_STDINT_TR1] (pointer_safety, declare_reachable)
(undeclare_reachable, declare_no_pointers, undeclare_no_pointers):
Define independent of _GLIBCXX_USE_C99_STDINT_TR1.
Jonathan Wakely [Thu, 26 Jul 2018 14:02:01 +0000 (15:02 +0100)]
Remove char16_t and char32_t dependency on <stdint.h>
The char16_t and char32_t types are automatically defined by the
compiler and do not depend on support in <stdint.h>. The char_traits
specializations depend on uint_leastNN_t but can be made to work anyway
by using the predefined macros, or as a last resort make_unsigned.
Jonathan Wakely [Thu, 26 Jul 2018 14:01:55 +0000 (15:01 +0100)]
Remove <chrono> dependency on _GLIBCXX_USE_C99_STDINT_TR1
By adding fallback definitions of std::intmax_t and std::uintmax_t it's
possible to define <ratio> without _GLIBCXX_USE_C99_STDINT_TR1. This in
turn allows most of <chrono> to be defined, which removes the dependency
on _GLIBCXX_USE_C99_STDINT_TR1 for all of the C++11 concurrency features.
The compiler defines __INTMAX_TYPE__ and __UINTMAX_TYPE__
unconditionally so it should be safe to rely on them.
Martin Liska [Thu, 26 Jul 2018 12:13:14 +0000 (14:13 +0200)]
Add linker_output as prefix for LTO temps (PR lto/86548).
2018-07-26 Martin Liska <mliska@suse.cz>
PR lto/86548
* lto-wrapper.c: Add linker_output as prefix
for ltrans_output_file.
2018-07-26 Martin Liska <mliska@suse.cz>
PR lto/86548
* libiberty.h (make_temp_file_with_prefix): New function.
2018-07-26 Martin Liska <mliska@suse.cz>
PR lto/86548
* make-temp-file.c (TEMP_FILE): Remove leading 'cc'.
(make_temp_file): Call make_temp_file_with_prefix with
first argument set to NULL.
(make_temp_file_with_prefix): Support also prefix.
[libgomp, nvptx] Add error with recompilation hint for launch failure
Currently, when a kernel is lauched with too many workers, it results in a cuda
launch failure. This is triggered f.i. for parallel-loop-1.c at -O0 on a Quadro
M1200.
This patch detects this situation, and errors out with a hint on how to fix it.
Build and reg-tested on x86_64 with nvptx accelerator.
2018-07-26 Cesar Philippidis <cesar@codesourcery.com>
Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (nvptx_exec): Error if the hardware doesn't have
sufficient resources to launch a kernel, and give a hint on how to fix
it.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
From-SVN: r262997
[libgomp, nvptx] Move device property sampling from nvptx_exec to nvptx_open
Move sampling of device properties from nvptx_exec to nvptx_open, and assume
the sampling always succeeds. This simplifies the default dimension
initialization code in nvptx_open.
2018-07-26 Cesar Philippidis <cesar@codesourcery.com>
Tom de Vries <tdevries@suse.de>
* plugin/plugin-nvptx.c (struct ptx_device): Add warp_size,
max_threads_per_block and max_threads_per_multiprocessor fields.
(nvptx_open_device): Initialize new fields.
(nvptx_exec): Use num_sms, and new fields.
Co-Authored-By: Tom de Vries <tdevries@suse.de>
From-SVN: r262996
The current code in reg_nonzero_bits_for_combine allows using the
reg_stat info when last_set_mode is a different integer mode. This is
completely wrong for non-pseudos. For example, as in the PR, a value
in a DImode hard register is set by eight writes to its constituent
QImode parts. The value written to the DImode is not the same as that
written to the lowest-numbered QImode!
PR rtl-optimization/85805
* combine.c (reg_nonzero_bits_for_combine): Only use the last set
value for hard registers if that was written in the same mode.
Martin Liska [Thu, 26 Jul 2018 08:50:21 +0000 (10:50 +0200)]
gcov: Fix wrong usage of NAN in statistics (PR gcov-profile/86536).
2018-07-26 Martin Liska <mliska@suse.cz>
PR gcov-profile/86536
* gcov.c (format_gcov): Use printf format %.*f directly
and do not handle special values.
2018-07-26 Martin Liska <mliska@suse.cz>
PR gcov-profile/86536
* gcc.misc-tests/gcov-pr86536.c: New test.
Tom de Vries [Thu, 26 Jul 2018 07:52:45 +0000 (07:52 +0000)]
[libgomp, openacc, testsuite] Fix async logic in lib-12.f90
In testcase lib-12.f90, all acc_async_test calls are placed in a location
where they are not guaranteed to succeed, which explains why there's an xfail
for the lower optimization levels.
This patch fixes the problem by moving the acc_async_test calls to the correct
locations.
Reg-tested on x86_64 with nvptx accelerator.
2018-07-26 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-fortran/lib-12.f90: Move acc_async_test calls
to correct locations. Remove xfail.
Tom de Vries [Thu, 26 Jul 2018 07:52:35 +0000 (07:52 +0000)]
[libgomp, openacc, testsuite] Fix async/wait logic in lib-13.f90
The purpose of the lib-13.f90 test-case is to test acc_wait_all_async. The
test indeed calls acc_wait_all_async, but then subsequentlys calls
acc_wait_all, so the acc_wait_all_async functionality is not tested.
Furthermore, all acc_async_test calls are placed in a location where they are
not guaranteed to succeed, which explains why there's an xfail for the lower
optimization levels.
This patch fixes the problems by replacing acc_wait_all with an acc_wait on
the async id used for the acc_wait_all_async call, and moving the
acc_async_test calls to the correct locations.
Reg-tested on x86_64 with nvptx accelerator.
2018-07-26 Tom de Vries <tdevries@suse.de>
* testsuite/libgomp.oacc-fortran/lib-13.f90: Replace acc_wait_all with
acc_wait. Move acc_async_test calls to correct locations. Remove
xfail.
Jonathan Wakely [Wed, 25 Jul 2018 20:23:07 +0000 (21:23 +0100)]
PR libstdc++/86676 Do not assume stack buffer is aligned
PR libstdc++/86676
* testsuite/20_util/monotonic_buffer_resource/release.cc: Allow for
buffer being misaligned and so returned pointer not being at start.
Nicolas Koenig [Wed, 25 Jul 2018 18:48:39 +0000 (18:48 +0000)]
re PR fortran/25829 ([F03] Asynchronous IO support)
2018-07-25 Nicolas Koenig <koenigni@gcc.gnu.org>
Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/25829
* gfortran.texi: Add description of asynchronous I/O.
* trans-decl.c (gfc_finish_var_decl): Treat asynchronous variables
as volatile.
* trans-io.c (gfc_build_io_library_fndecls): Rename st_wait to
st_wait_async and change argument spec from ".X" to ".w".
(gfc_trans_wait): Pass ID argument via reference.
2018-07-25 Nicolas Koenig <koenigni@gcc.gnu.org>
Thomas Koenig <tkoenig@gcc.gnu.org>
2018-07-25 Nicolas Koenig <koenigni@gcc.gnu.org>
Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/25829
* Makefile.am: Add async.c to gfor_io_src.
Add async.h to gfor_io_headers.
* Makefile.in: Regenerated.
* gfortran.map: Add _gfortran_st_wait_async.
* io/async.c: New file.
* io/async.h: New file.
* io/close.c: Include async.h.
(st_close): Call async_wait for an asynchronous unit.
* io/file_pos.c (st_backspace): Likewise.
(st_endfile): Likewise.
(st_rewind): Likewise.
(st_flush): Likewise.
* io/inquire.c: Add handling for asynchronous PENDING
and ID arguments.
* io/io.h (st_parameter_dt): Add async bit.
(st_parameter_wait): Correct.
(gfc_unit): Add au pointer.
(st_wait_async): Add prototype.
(transfer_array_inner): Likewise.
(st_write_done_worker): Likewise.
* io/open.c: Include async.h.
(new_unit): Initialize asynchronous unit.
* io/transfer.c (async_opt): New struct.
(wrap_scalar_transfer): New function.
(transfer_integer): Call wrap_scalar_transfer to do the work.
(transfer_real): Likewise.
(transfer_real_write): Likewise.
(transfer_character): Likewise.
(transfer_character_wide): Likewise.
(transfer_complex): Likewise.
(transfer_array_inner): New function.
(transfer_array): Call transfer_array_inner.
(transfer_derived): Call wrap_scalar_transfer.
(data_transfer_init): Check for asynchronous I/O.
Perform a wait operation on any pending asynchronous I/O
if the data transfer is synchronous. Copy PDT and enqueue
thread for data transfer.
(st_read_done_worker): New function.
(st_read_done): Enqueue transfer or call st_read_done_worker.
(st_write_done_worker): New function.
(st_write_done): Enqueue transfer or call st_read_done_worker.
(st_wait): Document as no-op for compatibility reasons.
(st_wait_async): New function.
* io/unit.c (insert_unit): Use macros LOCK, UNLOCK and TRYLOCK;
add NOTE where necessary.
(get_gfc_unit): Likewise.
(init_units): Likewise.
(close_unit_1): Likewise. Call async_close if asynchronous.
(close_unit): Use macros LOCK and UNLOCK.
(finish_last_advance_record): Likewise.
(newunit_alloc): Likewise.
* io/unix.c (find_file): Likewise.
(flush_all_units_1): Likewise.
(flush_all_units): Likewise.
* libgfortran.h (generate_error_common): Add prototype.
* runtime/error.c: Include io.h and async.h.
(generate_error_common): New function.
2018-07-25 Nicolas Koenig <koenigni@gcc.gnu.org>
Thomas Koenig <tkoenig@gcc.gnu.org>
PR fortran/25829
* testsuite/libgomp.fortran/async_io_1.f90: New test.
* testsuite/libgomp.fortran/async_io_2.f90: New test.
* testsuite/libgomp.fortran/async_io_3.f90: New test.
* testsuite/libgomp.fortran/async_io_4.f90: New test.
* testsuite/libgomp.fortran/async_io_5.f90: New test.
* testsuite/libgomp.fortran/async_io_6.f90: New test.
* testsuite/libgomp.fortran/async_io_7.f90: New test.
Co-Authored-By: Thomas Koenig <tkoenig@gcc.gnu.org>
From-SVN: r262978
* cp-tree.h (enum cp_tree_index): Add
CPTI_{ABI_TAG,ALIGNED,BEGIN,END,GET,TUPLE_{ELEMENT,SIZE}}_IDENTIFIER
and CPTI_{GNU,TYPE,VALUE,FUN,CLOSURE}_IDENTIFIER.
(abi_tag_identifier, aligned_identifier, begin_identifier,
end_identifier, get__identifier, gnu_identifier,
tuple_element_identifier, tuple_size_identifier, type_identifier,
value_identifier, fun_identifier, closure_identifier): Define.
* decl.c (initialize_predefined_identifiers): Initialize the above
identifiers.
(get_tuple_size): Use tuple_size_identifier instead of
get_identifier ("tuple_size") and value_identifier instead of
get_identifier ("value").
(get_tuple_element_type): Use tuple_element_identifier instead of
get_identifier ("tuple_element") and type_identifier instead of
get_identifier ("type").
(get_tuple_decomp_init): Use get__identifier instead of
get_identifier ("get").
* lambda.c (maybe_add_lambda_conv_op): Use fun_identifier instead of
get_identifier ("_FUN").
* parser.c (cp_parser_lambda_declarator_opt): Use closure_identifier
instead of get_identifier ("__closure").
(cp_parser_std_attribute): Use gnu_identifier instead of
get_identifier ("gnu").
(cp_parser_std_attribute_spec): Likewise. Use aligned_identifier
instead of get_identifier ("aligned").
* class.c (check_abi_tags, inherit_targ_abi_tags): Use
abi_tag_identifier instead of get_identifier ("abi_tag").
* config/arc/arc.c (compact_memory_operand_p): Check for uncached
accesses as well.
(arc_is_uncached_mem_p): uncached applies to both the variable and
the pointer.
Richard Biener [Wed, 25 Jul 2018 12:10:13 +0000 (12:10 +0000)]
re PR lto/86654 (ICE in gen_member_die, at dwarf2out.c:24933)
2018-07-25 Richard Biener <rguenther@suse.de>
PR debug/86654
* dwarf2out.c (dwarf2out_decl): Do not handle nested functions
special wrt context_die late.
(gen_subprogram_die): Re-use DIEs in local scope.
Jonathan Wakely [Wed, 25 Jul 2018 10:40:12 +0000 (11:40 +0100)]
Move std::unique_lock definition to a separate header
This will allow std::mutex and std::lock_guard to be used elsewhere in
the library without pulling in the whole of <chrono>.
Previously the whole of <bits/std_mutex.h> was conditional on the
_GLIBCXX_USE_C99_STDINT_TR1 macro, but only the std::unique_lock members
that use <chrono> facilities should depend on that. std::mutex only
needs to depend on _GLIBCXX_HAS_GTHREADS and std::lock_guard can be
defined unconditionally.
Some parts of <bits/std_mutex.h> and <mutex> are based on code in
<ext/concurrence.h> which dates from 2003. However, the std::unique_lock
implementation was added in 2008 by r135007, without using any earlier
code. Therefore the new header file has copyright years 2008-2018.
* include/Makefile.am: Add new <bits/unique_lock.h> header.
* include/Makefile.in: Regenerate.
* include/bits/std_mutex.h [!_GLIBCXX_USE_C99_STDINT_TR1] (mutex)
(lock_guard): Define independent of _GLIBCXX_USE_C99_STDINT_TR1.
(unique_lock): Move definition to ...
* include/bits/unique_lock.h: New header.
[!_GLIBCXX_USE_C99_STDINT_TR1] (unique_lock): Define unconditionally.
[_GLIBCXX_USE_C99_STDINT_TR1] (unique_lock(mutex_type&, time_point))
(unique_lock(mutex_type&, duration), unique_lock::try_lock_until)
(unique_lock::try_lock_for): Define only when <chrono> is usable.
* include/std/condition_variable: Include <bits/unique_lock.h>.
* include/std/mutex: Likewise.
This PR shows a pathological case in which we try SLP vectorisation on
dead code. We record that 0 bits of the result are enough to satisfy
all users (which is true), and that led to precision being 0 in:
static unsigned int
vect_element_precision (unsigned int precision)
{
precision = 1 << ceil_log2 (precision);
return MAX (precision, BITS_PER_UNIT);
}
ceil_log2 (0) returned 64 rather than 0, leading to 1 << 64, which is UB.
2018-07-25 Richard Sandiford <richard.sandiford@arm.com>
gcc/
* hwint.c (ceil_log2): Fix comment. Return 0 for 0.
Martin Sebor [Wed, 25 Jul 2018 02:11:31 +0000 (02:11 +0000)]
PR tree-optimization/86622 - incorrect strlen of array of array plus variable offset
PR tree-optimization/86622 - incorrect strlen of array of array plus variable offset
PR tree-optimization/86532 - Wrong code due to a wrong strlen folding starting with r262522
gcc/ChangeLog:
PR tree-optimization/86622
PR tree-optimization/86532
* builtins.h (string_length): Declare.
* builtins.c (c_strlen): Correct handling of non-constant offsets.
(check_access): Be prepared for non-constant length ranges.
(string_length): Make extern.
* expr.c (string_constant): Only handle the minor non-constant
array index. Use string_constant to compute the length of
a generic string constant.
gcc/testsuite/ChangeLog:
PR tree-optimization/86622
PR tree-optimization/86532
* gcc.c-torture/execute/strlen-2.c: New test.
* gcc.c-torture/execute/strlen-3.c: New test.
* gcc.c-torture/execute/strlen-4.c: New test.
Jonathan Wakely [Tue, 24 Jul 2018 21:09:55 +0000 (22:09 +0100)]
Add initial version of C++17 <memory_resource> header
This is missing the synchronized_pool_resource and
unsynchronized_pool_resource classes but is otherwise complete.
This is a new implementation, not based on the existing code in
<experimental/memory_resource>, but memory_resource and
polymorphic_allocator ended up looking almost the same anyway.
The constant_init kluge in src/c++17/memory_resource.cc is apparently
due to Richard Smith and ensures that the objects are constructed during
constant initialiation phase and not destroyed (because the
constant_init destructor doesn't destroy the union member and the
storage is not reused).
* config/abi/pre/gnu.ver: Export new symbols.
* configure: Regenerate.
* include/Makefile.am: Add new <memory_resource> header.
* include/Makefile.in: Regenerate.
* include/precompiled/stdc++.h: Include <memory_resource> for C++17.
* include/std/memory_resource: New header.
(memory_resource, polymorphic_allocator, new_delete_resource)
(null_memory_resource, set_default_resource, get_default_resource)
(pool_options, monotonic_buffer_resource): Define.
* src/Makefile.am: Add c++17 directory.
* src/Makefile.in: Regenerate.
* src/c++11/Makefile.am: Fix comment.
* src/c++17/Makefile.am: Add makefile for new sub-directory.
* src/c++17/Makefile.in: Generate.
* src/c++17/memory_resource.cc: New.
(newdel_res_t, null_res_t, constant_init, newdel_res, null_res)
(default_res, new_delete_resource, null_memory_resource)
(set_default_resource, get_default_resource): Define.
* testsuite/20_util/memory_resource/1.cc: New test.
* testsuite/20_util/memory_resource/2.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/1.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/allocate.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/deallocate.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/release.cc: New test.
* testsuite/20_util/monotonic_buffer_resource/upstream_resource.cc:
New test.
* testsuite/20_util/polymorphic_allocator/1.cc: New test.
* testsuite/20_util/polymorphic_allocator/resource.cc: New test.
* testsuite/20_util/polymorphic_allocator/select.cc: New test.
* testsuite/util/testsuite_allocator.h (__gnu_test::memory_resource):
Define concrete memory resource for testing.
(__gnu_test::default_resource_mgr): Define RAII helper for changing
default resource.
Avoid &LOOP_VINFO_MASKS for bb vectorisation (PR 86618)
r262589 introduced another instance of the bug fixed in r258131.
2018-07-23 Richard Sandiford <richard.sandiford@arm.com>
gcc/
PR tree-optimization/86618
* tree-vect-stmts.c (vectorizable_call): Don't take the address
of LOOP_VINFO_MASKS (loop_vinfo) when loop_vinfo is null.
David Malcolm [Tue, 24 Jul 2018 16:06:58 +0000 (16:06 +0000)]
Fix segfault in -fsave-optimization-record (PR tree-optimization/86636)
There are various ways that it's possible for a gimple statement to
have an UNKNOWN_LOCATION, and for that UNKNOWN_LOCATION to be wrapped
in an ad-hoc location to capture inlining information.
For such a location, LOCATION_FILE (loc) is NULL.
Various places in -fsave-optimization-record were checking for
loc != UNKNOWN_LOCATION
and were passing LOCATION_FILE (loc) to code that assumed a non-NULL
filename, thus leading to segfaults for the above cases.
This patch updates the tests to use
LOCATION_LOCUS (loc) != UNKNOWN_LOCATION
instead, to look through ad-hoc location wrappers, fixing the segfaults.
It also adds various assertions to the affected code.
gcc/ChangeLog:
PR tree-optimization/86636
* json.cc (json::object::set): Fix comment. Add assertions.
(json::array::append): Move here from json.h. Add comment and an
assertion.
(json::string::string): Likewise.
* json.h (json::array::append): Move to json.cc.
(json::string::string): Likewise.
* optinfo-emit-json.cc
(optrecord_json_writer::impl_location_to_json): Assert that we
aren't attempting to write out UNKNOWN_LOCATION, or an ad-hoc
wrapper around it. Expand the location once, rather than three
times.
(optrecord_json_writer::inlining_chain_to_json): Fix the check for
UNKNOWN_LOCATION, to use LOCATION_LOCUS to look through ad-hoc
wrappers.
(optrecord_json_writer::optinfo_to_json): Likewise, in four
places. Fix some overlong lines.
gcc/testsuite/ChangeLog:
PR tree-optimization/86636
* gcc.c-torture/compile/pr86636.c: New test.
Jakub Jelinek [Tue, 24 Jul 2018 14:23:18 +0000 (16:23 +0200)]
re PR middle-end/86627 (Signed 128-bit division by 2 no longer expanded to RTL)
PR middle-end/86627
* expmed.c (expand_divmod): Punt if d == HOST_WIDE_INT_MIN
and size > HOST_BITS_PER_WIDE_INT. For size > HOST_BITS_PER_WIDE_INT
and abs_d == d, do the power of two handling if profitable.
Jonathan Wakely [Tue, 24 Jul 2018 13:51:50 +0000 (14:51 +0100)]
Minor refactoring in <bit> header
* include/std/bit (__countl_zero, __countr_zero, __popcount): Use
local variables for number of digits instead of type aliases.
(__log2p1): Remove redundant branch also checked in __countl_zero.
Jonathan Wakely [Tue, 24 Jul 2018 13:03:25 +0000 (14:03 +0100)]
Reorder conditions in uses-allocator construction helper
The erased_type condition is only true for code using the Library
Fundamentals TS, so assume it's less common and only check it after
checking for convertibility.
This does mean for types using erased_type the more expensive
convertibility check is done first, but such types are rare.
Jonathan Wakely [Tue, 24 Jul 2018 13:03:20 +0000 (14:03 +0100)]
Make __resource_adaptor_imp usable with C++17 memory_resource
By making the memory_resource base class a template parameter the
__resource_adaptor_imp can be used to adapt an allocator into a
std::pmr::memory_resource instead of experimental::pmr::memory_resource.
* include/experimental/memory_resource: Adjust comments and
whitespace.
(__resource_adaptor_imp): Add second template parameter for type of
memory resource base class.
(memory_resource): Define default constructor, destructor, copy
constructor and copy assignment operator as defaulted.
Jonathan Wakely [Tue, 24 Jul 2018 13:03:11 +0000 (14:03 +0100)]
PR libstdc++/70966 fix lifetime bug for default resource
PR libstdc++/70966
* include/experimental/memory_resource (__get_default_resource): Use
placement new to create an object with dynamic storage duration.
Jonathan Wakely [Mon, 23 Jul 2018 19:40:28 +0000 (20:40 +0100)]
PR libstdc++/70940 optimize pmr::resource_adaptor for allocators using malloc
pmr::resource_adaptor can avoid allocating an oversized buffer and doing
manual alignment within that buffer when the wrapped allocator is known
to always meet the requested alignment. Specifically, if the allocator
is known to use malloc or new directly, then we can call the allocator
directly for any fundamental alignment.
PR libstdc++/70940
* include/experimental/memory_resource
(__resource_adaptor_common::_AlignMgr::_M_unadjust): Add assertion.
(__resource_adaptor_common::__guaranteed_alignment): New helper to
give maximum alignment an allocator guarantees. Specialize for known
allocators using new and malloc.
(__resource_adaptor_imp::do_allocate): Use __guaranteed_alignment.
(__resource_adaptor_imp::do_deallocate): Likewise.
* testsuite/experimental/memory_resource/new_delete_resource.cc:
Check that new and delete are called with expected sizes.
This changes vsx_init_v4si to be an expander. That way, no special
cases are needed anymore for special arguments: the normal RTL passes
can deal with it.
* config/rs6000/rs6000-p8swap.c (rtx_is_swappable_p): Adjust.
* config/rs6000/rs6000-protos.h (rs6000_split_v4si_init): Delete.
* config/rs6000/rs6000.c (rs6000_expand_vector_init): Always force
the elements into a register.
(rs6000_split_v4si_init_di_reg): Delete.
(rs6000_split_v4si_init): Delete.
* config/rs6000/vsx.md (unspec): Delete UNSPEC_VSX_VEC_INIT.
(vsx_init_v4si): Rewrite as a define_expand.
An rl<wd>imi instruction is often written like "(a << 8) | (b & 255)".
If "b" now is a byte in memory, combine will combine the load with the
masking (with 255 in the example), since that is a single instruction;
and then the rl*imi isn't combined from the remaining pieces.
This patch adds a splitter to make combine handle this case.
* config/rs6000/rs6000.md (splitters for rldimi and rlwimi with the
zero_extend argument from memory): New.
Jakub Jelinek [Mon, 23 Jul 2018 07:48:56 +0000 (09:48 +0200)]
re PR c++/86569 (-Wnonnull-compare affects code generation since r233684)
PR c++/86569
* cp-gimplify.c (cp_fold): Don't fold comparisons into other kind
of expressions other than INTEGER_CST regardless of TREE_NO_WARNING
or warn_nonnull_compare.
Martin Sebor [Sun, 22 Jul 2018 21:09:32 +0000 (21:09 +0000)]
PR bootstrap/86621 - 'alloca' bound is unknown in tree-vect-slp.c:1437:16
gcc/ChangeLog:
* gcc/gimple-ssa-warn-alloca.c (alloca_call_type_by_arg): Avoid
diagnosing calls with unknown arguments unless -Walloca-larger-than
is restricted to less than PTRDIFF_MAX bytes.