Jakub Jelinek [Fri, 10 Sep 2021 18:41:33 +0000 (20:41 +0200)]
openmp: Implement OpenMP 5.1 atomics, so far for C only
This patch implements OpenMP 5.1 atomics (with clarifications from upcoming 5.2).
The most important changes are that it is now possible to write min/max
atomics for C/C++ (for Fortran this was already possible) and, more
importantly, compare-and-exchange in various forms.
Also, acq_rel is now allowed on read/write and acq_rel/acquire are allowed on
update, and there are new compare, weak and fail clauses.
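For illustration only (my sketch, not taken from the patch or its testcases,
assuming compilation with -fopenmp), the new C forms look roughly like this:
extern int x;

void
example (int e, int d)
{
  /* min/max atomics, now expressible in C/C++.  */
  #pragma omp atomic compare
  x = e > x ? e : x;

  /* Compare-and-exchange using the new compare and fail clauses.  */
  #pragma omp atomic compare fail(relaxed)
  if (x == e) { x = d; }
}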
2021-09-10 Jakub Jelinek <jakub@redhat.com>
gcc/
* tree-core.h (enum omp_memory_order): Add OMP_MEMORY_ORDER_MASK,
OMP_FAIL_MEMORY_ORDER_UNSPECIFIED, OMP_FAIL_MEMORY_ORDER_RELAXED,
OMP_FAIL_MEMORY_ORDER_ACQUIRE, OMP_FAIL_MEMORY_ORDER_RELEASE,
OMP_FAIL_MEMORY_ORDER_ACQ_REL, OMP_FAIL_MEMORY_ORDER_SEQ_CST and
OMP_FAIL_MEMORY_ORDER_MASK enumerators.
(OMP_FAIL_MEMORY_ORDER_SHIFT): Define.
* gimple-pretty-print.c (dump_gimple_omp_atomic_load,
dump_gimple_omp_atomic_store): Print [weak] for weak atomic
load/store.
* gimple.h (enum gf_mask): Change GF_OMP_ATOMIC_MEMORY_ORDER
to 6-bit mask, adjust GF_OMP_ATOMIC_NEED_VALUE value and add
GF_OMP_ATOMIC_WEAK.
(gimple_omp_atomic_weak_p, gimple_omp_atomic_set_weak): New inline
functions.
* tree.h (OMP_ATOMIC_WEAK): Define.
* tree-pretty-print.c (dump_omp_atomic_memory_order): Adjust for
fail memory order being encoded in the same enum and also print
fail clause if present.
(dump_generic_node): Print weak clause if OMP_ATOMIC_WEAK.
* gimplify.c (goa_stabilize_expr): Add target_expr and rhs arguments,
handle pre_p == NULL case as a test mode that only returns value
but doesn't gimplify nor change anything otherwise, adjust
recursive calls, add MODIFY_EXPR, ADDR_EXPR, COND_EXPR, TARGET_EXPR
and CALL_EXPR handling, adjust COMPOUND_EXPR handling for
__builtin_clear_padding calls, for !rhs gimplify as lvalue rather
than rvalue.
(gimplify_omp_atomic): Adjust goa_stabilize_expr caller. Handle
COND_EXPR rhs. Set weak flag on gimple load/store for
OMP_ATOMIC_WEAK.
* omp-expand.c (omp_memory_order_to_fail_memmodel): New function.
(omp_memory_order_to_memmodel): Adjust for fail clause encoded
in the same enum.
(expand_omp_atomic_cas): New function.
(expand_omp_atomic_pipeline): Use omp_memory_order_to_fail_memmodel
function.
(expand_omp_atomic): Attempt to optimize atomic compare and exchange
using expand_omp_atomic_cas.
gcc/c-family/
* c-common.h (c_finish_omp_atomic): Add r and weak arguments.
* c-omp.c: Include gimple-fold.h.
(c_finish_omp_atomic): Add r and weak arguments. Add support for
OpenMP 5.1 atomics.
gcc/c/
* c-parser.c (c_parser_conditional_expression): If omp_atomic_lhs and
cond.value is >, < or == with omp_atomic_lhs as one of the operands,
don't call build_conditional_expr, instead build a COND_EXPR directly.
(c_parser_binary_expression): Avoid calling parser_build_binary_op
if omp_atomic_lhs even in more cases for >, < or ==.
(c_parser_omp_atomic): Update function comment for OpenMP 5.1 atomics,
parse OpenMP 5.1 atomics and fail, compare and weak clauses, allow
acq_rel on atomic read/write and acq_rel/acquire clauses on update.
* c-typeck.c (build_binary_op): For flag_openmp only handle
MIN_EXPR/MAX_EXPR.
gcc/cp/
* parser.c (cp_parser_omp_atomic): Allow acq_rel on atomic read/write
and acq_rel/acquire clauses on update.
* semantics.c (finish_omp_atomic): Adjust c_finish_omp_atomic caller.
gcc/testsuite/
* c-c++-common/gomp/atomic-17.c (foo): Add tests for atomic read,
write or update with acq_rel clause and atomic update with acquire clause.
* c-c++-common/gomp/atomic-18.c (foo): Adjust expected diagnostics
wording, remove tests moved to atomic-17.c.
* c-c++-common/gomp/atomic-21.c: Expect only 2 omp atomic release and
2 omp atomic acq_rel directives instead of 4 omp atomic release.
* c-c++-common/gomp/atomic-25.c: New test.
* c-c++-common/gomp/atomic-26.c: New test.
* c-c++-common/gomp/atomic-27.c: New test.
* c-c++-common/gomp/atomic-28.c: New test.
* c-c++-common/gomp/atomic-29.c: New test.
* c-c++-common/gomp/atomic-30.c: New test.
* c-c++-common/goacc-gomp/atomic.c: Expect 1 omp atomic release and
1 omp atomic_acq_rel instead of 2 omp atomic release directives.
* gcc.dg/gomp/atomic-5.c: Adjust expected error diagnostic wording.
* g++.dg/gomp/atomic-18.C: Expect 4 omp atomic release and
1 omp atomic_acq_rel instead of 5 omp atomic release directives.
libgomp/
* testsuite/libgomp.c-c++-common/atomic-19.c: New test.
* testsuite/libgomp.c-c++-common/atomic-20.c: New test.
* testsuite/libgomp.c-c++-common/atomic-21.c: New test.
Ian Lance Taylor [Sat, 21 Aug 2021 19:42:19 +0000 (12:42 -0700)]
compiler: correct condition for calling memclrHasPointers
When compiling append(s, make([]typ, ln)...), where typ has a pointer,
and the append fits within the existing capacity of s, the condition
used to clear out the new elements was reversed.
Disable threading through latches until after loop optimizations.
The motivation for this patch was enabling the use of global ranges in
the path solver, but this caused certain properties of loops to be
destroyed, which made subsequent loop optimizations fail.
Consequently, this patch's main goal is to disable jump threading
involving the latch until after loop optimizations have run.
As can be seen in the test adjustments, we mostly shift the threading
from the early threaders (ethread, thread[12]) to the late threaders
(thread[34]). I have nuked some of the early notes in the testcases
that came as part of the jump threader rewrite. They're mostly noise
now.
Note that we could probably relax some other restrictions in
profitable_path_p when loop optimizations have completed, but it would
require more testing, and I'm hesitant to touch more things than needed
at this point. I have added a reminder to the function to keep this
in mind.
Finally, perhaps as a follow-up, we should apply the same restrictions to
the forward threader. At some point I'd like to combine the cost models.
Tested on x86-64 Linux.
p.s. There is a thorough discussion involving the limitations of jump
threading involving loops here:
* tree-pass.h (PROP_loop_opts_done): New.
* gimple-range-path.cc (path_range_query::internal_range_of_expr):
Intersect with global range.
* tree-ssa-loop.c (tree_ssa_loop_done): Set PROP_loop_opts_done.
* tree-ssa-threadbackward.c
(back_threader_profitability::profitable_path_p): Disable
threading through latches until after loop optimizations have run.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/ssa-dom-thread-2b.c: Adjust for disabling of
threading through latches.
* gcc.dg/tree-ssa/ssa-dom-thread-6.c: Same.
* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Same.
David Faust [Wed, 8 Sep 2021 17:26:15 +0000 (10:26 -0700)]
bpf: add -mcpu and related feature options
New instructions have been added over time to the eBPF ISA, but
previously there has been no good method to select which version to
target in GCC.
This patch adds the following options to the BPF backend:
-mcpu={v1, v2, v3}
Select which version of the eBPF ISA to target. This enables or
disables generation of certain instructions. The default is v3.
-mjmpext
Enable extra conditional branch instructions.
Enabled for CPU v2 and above.
-mjmp32
Enable 32-bit jump/branch instructions.
Enabled for CPU v3 and above.
-malu32
Enable 32-bit ALU instructions.
Enabled for CPU v3 and above.
gcc/ChangeLog:
* config/bpf/bpf-opts.h (bpf_isa_version): New enum.
* config/bpf/bpf-protos.h (bpf_expand_cbranch): New.
* config/bpf/bpf.c (bpf_option_override): Handle -mcpu option.
(bpf_expand_cbranch): New function.
* config/bpf/bpf.md (AM mode iterator): Conditionalize support for SI
mode.
(zero_extendsidi2): Only use mov32 instruction if it is available.
(SIM mode iterator): Conditionalize support for SI mode.
(JM mode iterator): New.
(cbranchdi4): Update name, use new JM iterator. Use bpf_expand_cbranch.
(*branch_on_di): Update name, use new JM iterator.
* config/bpf/bpf.opt: (mjmpext): New option.
(malu32): Likewise.
(mjmp32): Likewise.
(mcpu): Likewise.
(bpf_isa): New enum.
David Faust [Fri, 20 Aug 2021 21:54:42 +0000 (14:54 -0700)]
bpf: correct zero_extend output templates
The output templates for zero_extendhidi2 and zero_extendqidi2 could
lead to incorrect code generation when zero-extending one register into
another. This patch adds a new output template to the define_insns to
handle such cases and produce correct asm.
gcc/ChangeLog:
* config/bpf/bpf.md (zero_extendhidi2): Add new output template
for register-to-register extensions.
(zero_extendqidi2): Likewise.
* gcc.target/i386/avx-1.c: Add test for new builtins.
* gcc.target/i386/sse-13.c: Ditto.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.
Richard Biener [Thu, 9 Sep 2021 13:08:22 +0000 (15:08 +0200)]
Remove dbx.h, do not set PREFERRED_DEBUGGING_TYPE from dbxcoff.h, lynx.h
The following removes the unused config/dbx.h file and removes the
setting of PREFERRED_DEBUGGING_TYPE from dbxcoff.h, which is
overridden by all users (djgpp/mingw/cygwin) via either including
config/i386/djgpp.h or config/i386/cygming.h.
There are still circumstances where mingw and cygwin default to
STABS, namely when HAVE_GAS_PE_SECREL32_RELOC is not defined and
the target defaults to 32bit code generation.
The new style handling DBX_DEBUGGING_INFO is in line with
dbxelf.h which does not define PREFERRED_DEBUGGING_TYPE either.
The patch also removes the PREFERRED_DEBUGGING_TYPE define from
lynx.h which always follows elfos.h already defaulting to DWARF,
so the comment about STABS being the default is misleading and
outdated.
2021-09-09 Richard Biener <rguenther@suse.de>
PR target/102255
* config/dbx.h: Remove.
* config/dbxcoff.h: Do not define PREFERRED_DEBUGGING_TYPE.
* config/lynx.h: Likewise.
Add -ftrivial-auto-var-init option and uninitialized variable attribute.
Initialize automatic variables with either a pattern or with zeroes to increase
the security and predictability of a program by preventing uninitialized memory
disclosure and use.
GCC still considers an automatic variable that doesn't have an explicit
initializer as uninitialized; -Wuninitialized will still report warning messages
on such automatic variables.
With this option, GCC will also initialize any padding of automatic variables
that have structure or union types to zeroes.
You can opt out of this behavior for a specific variable by using the variable
attribute "uninitialized", for example to avoid the runtime overhead.
gcc/ChangeLog:
2021-09-09 qing zhao <qing.zhao@oracle.com>
* builtins.c (expand_builtin_memset): Make external visible.
* builtins.h (expand_builtin_memset): Declare extern.
* common.opt (ftrivial-auto-var-init=): New option.
* doc/extend.texi: Document the uninitialized attribute.
* doc/invoke.texi: Document -ftrivial-auto-var-init.
* flag-types.h (enum auto_init_type): New enumerated type
auto_init_type.
* gimple-fold.c (clear_padding_type): Add one new parameter.
(clear_padding_union): Likewise.
(clear_padding_emit_loop): Likewise.
(clear_type_padding_in_mask): Likewise.
(gimple_fold_builtin_clear_padding): Handle this new parameter.
* gimplify.c (gimple_add_init_for_auto_var): New function.
(gimple_add_padding_init_for_auto_var): New function.
(is_var_need_auto_init): New function.
(gimplify_decl_expr): Add initialization to automatic variables per
users' requests.
(gimplify_call_expr): Add one new parameter for call to
__builtin_clear_padding.
(gimplify_init_constructor): Add padding initialization in the end.
* internal-fn.c (INIT_PATTERN_VALUE): New macro.
(expand_DEFERRED_INIT): New function.
* internal-fn.def (DEFERRED_INIT): New internal function.
* tree-cfg.c (verify_gimple_call): Verify calls to .DEFERRED_INIT.
* tree-sra.c (generate_subtree_deferred_init): New function.
(scan_function): Avoid setting cannot_scalarize_away_bitmap for
calls to .DEFERRED_INIT.
(sra_modify_deferred_init): New function.
(sra_modify_function_body): Handle calls to DEFERRED_INIT specially.
* tree-ssa-structalias.c (find_func_aliases_for_call): Likewise.
* tree-ssa-uninit.c (warn_uninit): Handle calls to DEFERRED_INIT
specially.
(check_defs): Likewise.
(warn_uninitialized_vars): Likewise.
* tree-ssa.c (ssa_undefined_value_p): Likewise.
* tree.c (build_common_builtin_nodes): Build tree node for
BUILT_IN_CLEAR_PADDING when needed.
gcc/c-family/ChangeLog:
2021-09-09 qing zhao <qing.zhao@oracle.com>
* c-attribs.c (handle_uninitialized_attribute): New function.
(c_common_attribute_table): Add "uninitialized" attribute.
gcc/testsuite/ChangeLog:
2021-09-09 qing zhao <qing.zhao@oracle.com>
* c-c++-common/auto-init-1.c: New test.
* c-c++-common/auto-init-10.c: New test.
* c-c++-common/auto-init-11.c: New test.
* c-c++-common/auto-init-12.c: New test.
* c-c++-common/auto-init-13.c: New test.
* c-c++-common/auto-init-14.c: New test.
* c-c++-common/auto-init-15.c: New test.
* c-c++-common/auto-init-16.c: New test.
* c-c++-common/auto-init-2.c: New test.
* c-c++-common/auto-init-3.c: New test.
* c-c++-common/auto-init-4.c: New test.
* c-c++-common/auto-init-5.c: New test.
* c-c++-common/auto-init-6.c: New test.
* c-c++-common/auto-init-7.c: New test.
* c-c++-common/auto-init-8.c: New test.
* c-c++-common/auto-init-9.c: New test.
* c-c++-common/auto-init-esra.c: New test.
* c-c++-common/auto-init-padding-1.c: New test.
* c-c++-common/auto-init-padding-2.c: New test.
* c-c++-common/auto-init-padding-3.c: New test.
* g++.dg/auto-init-uninit-pred-1_a.C: New test.
* g++.dg/auto-init-uninit-pred-2_a.C: New test.
* g++.dg/auto-init-uninit-pred-3_a.C: New test.
* g++.dg/auto-init-uninit-pred-4.C: New test.
* gcc.dg/auto-init-sra-1.c: New test.
* gcc.dg/auto-init-sra-2.c: New test.
* gcc.dg/auto-init-uninit-1.c: New test.
* gcc.dg/auto-init-uninit-12.c: New test.
* gcc.dg/auto-init-uninit-13.c: New test.
* gcc.dg/auto-init-uninit-14.c: New test.
* gcc.dg/auto-init-uninit-15.c: New test.
* gcc.dg/auto-init-uninit-16.c: New test.
* gcc.dg/auto-init-uninit-17.c: New test.
* gcc.dg/auto-init-uninit-18.c: New test.
* gcc.dg/auto-init-uninit-19.c: New test.
* gcc.dg/auto-init-uninit-2.c: New test.
* gcc.dg/auto-init-uninit-20.c: New test.
* gcc.dg/auto-init-uninit-21.c: New test.
* gcc.dg/auto-init-uninit-22.c: New test.
* gcc.dg/auto-init-uninit-23.c: New test.
* gcc.dg/auto-init-uninit-24.c: New test.
* gcc.dg/auto-init-uninit-25.c: New test.
* gcc.dg/auto-init-uninit-26.c: New test.
* gcc.dg/auto-init-uninit-3.c: New test.
* gcc.dg/auto-init-uninit-34.c: New test.
* gcc.dg/auto-init-uninit-36.c: New test.
* gcc.dg/auto-init-uninit-37.c: New test.
* gcc.dg/auto-init-uninit-4.c: New test.
* gcc.dg/auto-init-uninit-5.c: New test.
* gcc.dg/auto-init-uninit-6.c: New test.
* gcc.dg/auto-init-uninit-8.c: New test.
* gcc.dg/auto-init-uninit-9.c: New test.
* gcc.dg/auto-init-uninit-A.c: New test.
* gcc.dg/auto-init-uninit-B.c: New test.
* gcc.dg/auto-init-uninit-C.c: New test.
* gcc.dg/auto-init-uninit-H.c: New test.
* gcc.dg/auto-init-uninit-I.c: New test.
* gcc.target/aarch64/auto-init-1.c: New test.
* gcc.target/aarch64/auto-init-2.c: New test.
* gcc.target/aarch64/auto-init-3.c: New test.
* gcc.target/aarch64/auto-init-4.c: New test.
* gcc.target/aarch64/auto-init-5.c: New test.
* gcc.target/aarch64/auto-init-6.c: New test.
* gcc.target/aarch64/auto-init-7.c: New test.
* gcc.target/aarch64/auto-init-8.c: New test.
* gcc.target/aarch64/auto-init-padding-1.c: New test.
* gcc.target/aarch64/auto-init-padding-10.c: New test.
* gcc.target/aarch64/auto-init-padding-11.c: New test.
* gcc.target/aarch64/auto-init-padding-12.c: New test.
* gcc.target/aarch64/auto-init-padding-2.c: New test.
* gcc.target/aarch64/auto-init-padding-3.c: New test.
* gcc.target/aarch64/auto-init-padding-4.c: New test.
* gcc.target/aarch64/auto-init-padding-5.c: New test.
* gcc.target/aarch64/auto-init-padding-6.c: New test.
* gcc.target/aarch64/auto-init-padding-7.c: New test.
* gcc.target/aarch64/auto-init-padding-8.c: New test.
* gcc.target/aarch64/auto-init-padding-9.c: New test.
* gcc.target/i386/auto-init-1.c: New test.
* gcc.target/i386/auto-init-2.c: New test.
* gcc.target/i386/auto-init-21.c: New test.
* gcc.target/i386/auto-init-22.c: New test.
* gcc.target/i386/auto-init-23.c: New test.
* gcc.target/i386/auto-init-24.c: New test.
* gcc.target/i386/auto-init-3.c: New test.
* gcc.target/i386/auto-init-4.c: New test.
* gcc.target/i386/auto-init-5.c: New test.
* gcc.target/i386/auto-init-6.c: New test.
* gcc.target/i386/auto-init-7.c: New test.
* gcc.target/i386/auto-init-8.c: New test.
* gcc.target/i386/auto-init-padding-1.c: New test.
* gcc.target/i386/auto-init-padding-10.c: New test.
* gcc.target/i386/auto-init-padding-11.c: New test.
* gcc.target/i386/auto-init-padding-12.c: New test.
* gcc.target/i386/auto-init-padding-2.c: New test.
* gcc.target/i386/auto-init-padding-3.c: New test.
* gcc.target/i386/auto-init-padding-4.c: New test.
* gcc.target/i386/auto-init-padding-5.c: New test.
* gcc.target/i386/auto-init-padding-6.c: New test.
* gcc.target/i386/auto-init-padding-7.c: New test.
* gcc.target/i386/auto-init-padding-8.c: New test.
* gcc.target/i386/auto-init-padding-9.c: New test.
Harald Anlauf [Thu, 9 Sep 2021 19:34:01 +0000 (21:34 +0200)]
Fortran - out of bounds in array constructor with implied do loop
gcc/fortran/ChangeLog:
PR fortran/98490
* trans-expr.c (gfc_conv_substring): Do not generate substring
bounds check for implied do loop index variable before it actually
becomes defined.
gcc/testsuite/ChangeLog:
PR fortran/98490
* gfortran.dg/bounds_check_23.f90: New test.
H.J. Lu [Thu, 9 Sep 2021 14:23:16 +0000 (07:23 -0700)]
x86-64: Update AVX512FP16 ABI tests for x32
On x32, long is the same as int and pointer is 32 bits. Update AVX512FP16
ABI tests:
1. Replace long with long long for 64-bit integers.
2. Update type and alignment for long and pointer.
3. Skip tests for long on x32.
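For instance, the size/alignment macros presumably end up looking something
like this (a sketch, not the literal patch):
#ifdef __ILP32__
#define TYPE_SIZE_LONG     4
#define TYPE_SIZE_POINTER  4
#define TYPE_ALIGN_LONG    4
#define TYPE_ALIGN_POINTER 4
#else
#define TYPE_SIZE_LONG     8
#define TYPE_SIZE_POINTER  8
#define TYPE_ALIGN_LONG    8
#define TYPE_ALIGN_POINTER 8
#endif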
* gcc.target/x86_64/abi/avx512fp16/args.h: Replace long with
long long.
(XMM_T): Rename _long to _longlong and _ulong to _ulonglong.
(X87_T): Rename _ulong to _ulonglong.
* gcc.target/x86_64/abi/avx512fp16/defines.h (TYPE_SIZE_LONG):
Define to 4 if __ILP32__ is defined.
(TYPE_SIZE_POINTER): Likewise.
(TYPE_ALIGN_LONG): Likewise.
(TYPE_ALIGN_POINTER): Likewise.
* gcc.target/x86_64/abi/avx512fp16/test_3_element_struct_and_unions.c
(main): Skip test for long if __ILP32__ is defined.
* gcc.target/x86_64/abi/avx512fp16/test_m64m128_returning.c
(do_test): Replace _long with _longlong.
* gcc.target/x86_64/abi/avx512fp16/test_struct_returning.c:
(check_300): Replace _ulong with _ulonglong.
* gcc.target/x86_64/abi/avx512fp16/m256h/args.h: Replace long
with long long.
(YMM_T): Rename _long to _longlong and _ulong to _ulonglong.
(X87_T): Rename _ulong to _ulonglong.
* gcc.target/x86_64/abi/avx512fp16/m512h/args.h: Replace long
with long long.
(ZMM_T): Rename _long to _longlong and _ulong to _ulonglong.
(X87_T): Rename _ulong to _ulonglong.
Richard Biener [Thu, 9 Sep 2021 09:50:20 +0000 (11:50 +0200)]
Improve LIM fill_always_executed_in computation
Currently the DOM walk over a loop body does not walk into not
always executed subloops to avoid scalability issues since doing
so makes the walk quadratic in the loop depth. It turns out this
is not an issue in practice and even with a loop depth of 1800
this function is way off the radar.
So the following patch removes the limitation, replacing it with
a comment.
2021-09-09 Richard Biener <rguenther@suse.de>
* tree-ssa-loop-im.c (fill_always_executed_in_1): Walk
into all subloops.
Richard Biener [Thu, 9 Sep 2021 08:52:12 +0000 (10:52 +0200)]
Avoid full DOM walk in LIM fill_always_executed_in
This avoids a full DOM walk via get_loop_body_in_dom_order in the
loop body walk of fill_always_executed_in, which often terminates
the walk of a loop body early, by integrating the DOM walk of
get_loop_body_in_dom_order with the actual processing done by
fill_always_executed_in. This trades the fully populated loop
body array for a worklist allocation of the same size and thus
should be a strict improvement over the recursive approach of
get_loop_body_in_dom_order.
2021-09-09 Richard Biener <rguenther@suse.de>
* tree-ssa-loop-im.c (fill_always_executed_in_1): Integrate
DOM walk from get_loop_body_in_dom_order using a worklist
approach.
* gcc.target/i386/avx-1.c: Add -mavx512vl and test for new intrinsics.
* gcc.target/i386/avx-2.c: Add -mavx512vl.
* gcc.target/i386/avx512fp16-11a.c: New test.
* gcc.target/i386/avx512fp16-11b.c: Ditto.
* gcc.target/i386/avx512vlfp16-11a.c: Ditto.
* gcc.target/i386/avx512vlfp16-11b.c: Ditto.
* gcc.target/i386/sse-13.c: Add test for new builtins.
* gcc.target/i386/sse-23.c: Ditto.
* gcc.target/i386/sse-14.c: Add test for new intrinsics.
* gcc.target/i386/sse-22.c: Ditto.
PR target/101059
* config/i386/sse.md (reduc_plus_scal_<mode>): Split to ..
(reduc_plus_scal_v4sf): .. this, New define_expand.
(reduc_plus_scal_v2df): .. and this, New define_expand.
gcc/testsuite/ChangeLog:
PR target/101059
* gcc.target/i386/sse2-pr101059.c: New test.
* gcc.target/i386/sse3-pr101059.c: New test.
David Malcolm [Wed, 8 Sep 2021 18:37:19 +0000 (14:37 -0400)]
analyzer: fix ICE when discarding result of realloc [PR102225]
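The trigger, per the title, is code that simply drops the result of realloc;
a minimal sketch of such code (mine, not the committed reproducer) is:
#include <stdlib.h>

void
test (void *p, size_t n)
{
  realloc (p, n);   /* result discarded: no lhs for the analyzer to bind */
}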
gcc/analyzer/ChangeLog:
PR analyzer/102225
* analyzer.h (compat_types_p): New decl.
* constraint-manager.cc
(constraint_manager::get_or_add_equiv_class): Guard against NULL
type when checking for pointer types.
* region-model-impl-calls.cc (region_model::impl_call_realloc):
Guard against NULL lhs type/region. Guard against the size value
not being of a compatible type for dynamic extents.
* region-model.cc (compat_types_p): Make non-static.
Richard Biener [Wed, 8 Sep 2021 08:39:27 +0000 (10:39 +0200)]
c++/102228 - make lookup_anon_field O(1)
For the testcase in PR101555 lookup_anon_field takes the majority
of parsing time followed by get_class_binding_direct/fields_linear_search
which is PR83309. The situation with anon aggregates is particularly
dire when we need to build accesses to their members and the anon
aggregates are nested. There, for each such access, we recursively
build sub-accesses to the anon aggregate FIELD_DECLs bottom-up,
DFS searching for them. That's inefficient since, as I believe,
there's a 1:1 relationship between anon aggregate types and the
FIELD_DECL used to place them.
The patch below does away with the search in lookup_anon_field and
instead records the single FIELD_DECL in the anon aggregate type's
lang-specific data, re-using the RTTI typeinfo_var field. That
speeds up the compile of the testcase with -fsyntax-only from
about 4.5s to slightly less than 1s.
I tried to poke holes into the 1:1 relationship idea with my C++
knowledge but failed (which might not say much). It also leaves
a hole for the case when the C++ FE itself duplicates such type
and places it at a semantically different position. I've tried
to poke holes into it with the duplication mechanism I understand
(templates) but failed.
2021-09-08 Richard Biener <rguenther@suse.de>
PR c++/102228
gcc/cp/
* cp-tree.h (ANON_AGGR_TYPE_FIELD): New define.
* decl.c (fixup_anonymous_aggr): Wipe RTTI info put in
place on invalid code.
* decl2.c (reset_type_linkage): Guard CLASSTYPE_TYPEINFO_VAR
access.
* module.cc (trees_in::read_class_def): Likewise. Reconstruct
ANON_AGGR_TYPE_FIELD.
* semantics.c (finish_member_declaration): Populate
ANON_AGGR_TYPE_FIELD for anon aggregate typed members.
* typeck.c (lookup_anon_field): Remove DFS search and return
ANON_AGGR_TYPE_FIELD directly.
Joseph Myers [Wed, 8 Sep 2021 15:38:18 +0000 (15:38 +0000)]
testsuite: Allow .sdata in more cases in gcc.dg/array-quals-1.c
When testing for Nios II (gcc-testresults shows this for MIPS as
well), failures of gcc.dg/array-quals-1.c appear where a symbol was
found in .sdata rather than one of the expected sections.
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?a$ (found a) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?b$ (found b) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?c$ (found c) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
FAIL: gcc.dg/array-quals-1.c scan-assembler-symbol-section symbol ^_?d$ (found d) has section ^\\.(const|rodata|srodata)|\\[RO\\] (found .sdata)
Jakub's commit 0b34dbc0a24864b1674bff7a92fa3cf0f1cbcea1 allowed .sdata
for many variables in that test where use of .sdata caused a failure
on powerpc-linux. I'm presuming the choice of which variables had
.sdata allowed was based only on the code generated for powerpc-linux,
not on any reason it would be wrong to allow it for the other
variables; thus, this patch adjusts the test to allow .sdata for some
more variables where that is needed on Nios II (and in one case where
it's not needed on Nios II, but the test results on gcc-testresults
suggest that it is needed on MIPS).
Tested with no regressions with cross to nios2-elf.
* gcc.dg/array-quals-1.c: Allow .sdata section in more cases.
Joseph Myers [Wed, 8 Sep 2021 14:57:20 +0000 (14:57 +0000)]
testsuite: Use explicit -ftree-cselim in tests using -fdump-tree-cselim-details
When testing for Nios II (gcc-testresults shows this for various other
targets as well), tests scanning cselim dumps produce an UNRESOLVED
result because those dumps do not exist.
cselim is enabled conditionally by code in toplev.c:
  if (flag_tree_cselim == AUTODETECT_VALUE)
    {
      if (HAVE_conditional_move)
        flag_tree_cselim = 1;
      else
        flag_tree_cselim = 0;
    }
Add explicit -ftree-cselim to dg-options in the affected tests (as
already used by some other tests of cselim dumps) so that this dump
exists on all architectures.
Tested with no regressions with cross to nios2-elf, where this causes
the tests in question to PASS instead of being UNRESOLVED.
We cannot use r12 here, it is already in use as the GEP (for sibling
calls).
2021-09-08 Segher Boessenkool <segher@kernel.crashing.org>
PR target/102107
* config/rs6000/rs6000-logue.c (rs6000_emit_epilogue): For ELFv2 use
r11 instead of r12 for restoring CR.
Jakub Jelinek [Wed, 8 Sep 2021 12:06:10 +0000 (14:06 +0200)]
i386: Fix up xorsign for AVX [PR89984]
Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally. Because for AVX the new code
destructively modifies op1. If that is different from dest, say on:
float
foo (float x, float y)
{
return x * __builtin_copysignf (1.0f, y) + y;
}
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
(unspec:SF [
(reg:SF 20 xmm0 [88])
(reg:SF 21 xmm1 [89])
(mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S16 A128])
] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
(nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
(plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
(reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
(nil))
but split the xorsign into:
vandps .LC0(%rip), %xmm1, %xmm1
vxorps %xmm0, %xmm1, %xmm0
and then the addition:
vaddss %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
&Yv <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.
On Wed, Sep 08, 2021 at 05:23:40PM +0800, Hongtao Liu wrote:
> I'm curious why we need the post_reload splitter @xorsign<mode>3_1
> for scalar mode, can't we just expand them into and/xor operations in
> the expander, just like vector modes did.
Following seems to work for all the testcases I've tried (and in some
generates better code than the post-reload splitter).
2021-09-08 Jakub Jelinek <jakub@redhat.com>
liuhongt <hongtao.liu@intel.com>
PR target/89984
* config/i386/i386.md (@xorsign<mode>3_1): Remove.
* config/i386/i386-expand.c (ix86_expand_xorsign): Expand right away
into AND with mask and XOR, using paradoxical subregs.
(ix86_split_xorsign): Remove.
* config/i386/i386-protos.h (ix86_split_xorsign): Remove.
* gcc.target/i386/avx-pr102224.c: Fix up PR number.
* gcc.dg/pr89984.c: New test.
* gcc.target/i386/avx-pr89984.c: New test.
Di Zhao [Wed, 8 Sep 2021 07:34:27 +0000 (15:34 +0800)]
tree-optimization/102183 - sccvn: fix result compare in vn_nary_op_insert_into
If the first predicate value is different and copied, the comparison will then
be between val->result and the copied one. That can cause inserting extra
vn_pvals.
gcc/ChangeLog:
* tree-ssa-sccvn.c (vn_nary_op_insert_into): Fix result compare.
Jakub Jelinek [Wed, 8 Sep 2021 09:34:45 +0000 (11:34 +0200)]
libgcc, i386: Export *hf* and *hc* from libgcc_s.so.1
The following patch exports it for Linux from config/i386/*.ver where it
IMNSHO belongs; aarch64 already exports some of those at GCC_11* and other
targets might add them at completely different gcc versions.
2021-09-08 Jakub Jelinek <jakub@redhat.com>
Iain Sandoe <iain@sandoe.co.uk>
* config/i386/libgcc-glibc.ver: Add %inherit GCC_12.0.0 GCC_7.0.0
and export *hf* and *hc* functions at GCC_12.0.0.
Jakub Jelinek [Wed, 8 Sep 2021 09:25:31 +0000 (11:25 +0200)]
i386: Fix up @xorsign<mode>3_1 [PR102224]
As the testcase shows, we miscompile @xorsign<mode>3_1 if both input
operands are in the same register, because the splitter overwrites op1
with op1 & mask before using op0.
For dest = xorsign op0, op0 we can actually simplify it from
dest = (op0 & mask) ^ op0 to dest = op0 & ~mask (aka abs).
The expander change is an optimization improvement, if we at expansion
time know it is xorsign op0, op0, we can emit abs right away and get better
code through that.
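A hedged example of that xorsign op0, op0 shape (modeled on the PR, not
copied from the new testcases):
float
same_op (float y)
{
  /* Combined into xorsign (y, y); the expander can now emit an abs
     pattern for this right away instead of relying on the splitter,
     which used to clobber the shared input.  */
  return y * __builtin_copysignf (1.0f, y);
}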
The @xorsign<mode>3_1 is a fix for the case where xorsign wouldn't be known
to have same operands during expansion, but during RTL optimizations they
would appear. For non-AVX we need to use earlyclobber, we require
dest and op1 to be the same but op0 must be different because we overwrite
op1 first. For AVX the constraints ensure that at most 2 of the 3 operands
may be the same register and, if both inputs are the same, that case is handled.
This case can be easily tested with the xorsign<mode>3 expander change
reverted.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
Thinking about it more this morning, while this patch fixes the problems
revealed in the testcase, the recent PR89984 change was buggy too, but
perhaps that can be fixed incrementally. Because for AVX the new code
destructively modifies op1. If that is different from dest, say on:
float
foo (float x, float y)
{
return x * __builtin_copysignf (1.0f, y) + y;
}
then we get after RA:
(insn 8 7 9 2 (set (reg:SF 20 xmm0 [orig:82 _2 ] [82])
(unspec:SF [
(reg:SF 20 xmm0 [88])
(reg:SF 21 xmm1 [89])
(mem/u/c:V4SF (symbol_ref/u:DI ("*.LC0") [flags 0x2]) [0 S16 A128])
] UNSPEC_XORSIGN)) "hohoho.c":4:12 649 {xorsignsf3_1}
(nil))
(insn 9 8 15 2 (set (reg:SF 20 xmm0 [87])
(plus:SF (reg:SF 20 xmm0 [orig:82 _2 ] [82])
(reg:SF 21 xmm1 [89]))) "hohoho.c":4:44 1021 {*fop_sf_comm}
(nil))
but split the xorsign into:
vandps .LC0(%rip), %xmm1, %xmm1
vxorps %xmm0, %xmm1, %xmm0
and then the addition:
vaddss %xmm1, %xmm0, %xmm0
which means we miscompile it - instead of adding y in the end we add
__builtin_copysignf (0.0f, y).
So, wonder if we don't want instead in addition to the &Yv <- Yv, 0
alternative (enabled for both pre-AVX and AVX as in this patch) the
&Yv <- Yv, Yv where destination must be different from inputs and another
Yv <- Yv, Yv where it can be the same but then need a match_scratch
(with X for the other alternatives and =Yv for the last one).
That way we'd always have a safe register we can store the op1 & mask
value into, either the destination (in the first alternative known to
be equal to op1 which is needed for non-AVX but ok for AVX too), in the
second alternative known to be different from both inputs and in the third
which could be used for those
float bar (float x, float y) { return x * __builtin_copysignf (1.0f, y); }
cases where op1 is naturally xmm1 and dest == op0 naturally xmm0 we'd use
some other register like xmm2.
2021-09-08 Jakub Jelinek <jakub@redhat.com>
PR target/102224
* config/i386/i386.md (xorsign<mode>3): If operands[1] is equal to
operands[2], emit abs<mode>2 instead.
(@xorsign<mode>3_1): Add early-clobbers for output operand, enable
first alternative even for avx, add another alternative with
=&Yv <- 0, Yv, Yvm constraints.
* config/i386/i386-expand.c (ix86_split_xorsign): If op0 is equal
to op1, emit vpandn instead.
* gcc.dg/pr102224.c: New test.
* gcc.target/i386/avx-pr102224.c: New test.
liuhongt [Mon, 2 Aug 2021 02:56:45 +0000 (10:56 +0800)]
Support -fexcess-precision=16 which will enable FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
gcc/ada/ChangeLog:
* gcc-interface/misc.c (gnat_post_options): Issue an error for
-fexcess-precision=16.
gcc/c-family/ChangeLog:
* c-common.c (excess_precision_mode_join): Update below comments.
(c_ts18661_flt_eval_method): Set excess_precision_type to
EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.
* c-cppbuiltin.c (cpp_atomic_builtins): Update below comments.
(c_cpp_flt_eval_method_iec_559): Set excess_precision_type to
EXCESS_PRECISION_TYPE_FLOAT16 when -fexcess-precision=16.
Max Filippov [Tue, 7 Sep 2021 22:40:00 +0000 (15:40 -0700)]
gcc: xtensa: fix PR target/102115
2021-09-07 Takayuki 'January June' Suwa <jjsuwa_sys3175@yahoo.co.jp>
gcc/
PR target/102115
* config/xtensa/xtensa.c (xtensa_emit_move_sequence): Add
'CONST_INT_P (src)' to the condition of the block that tries to
eliminate literal when loading integer constant.
David Faust [Tue, 3 Aug 2021 17:33:03 +0000 (10:33 -0700)]
doc: BPF CO-RE documentation
Document the new command line options (-mco-re and -mno-co-re), the new
BPF target builtin (__builtin_preserve_access_index), and the new BPF
target attribute (preserve_access_index) introduced with BPF CO-RE.
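As a rough usage sketch (the struct and field names are invented; only the
attribute, builtin and -mco-re option names come from the patches being
documented):
/* Type attribute form: accesses to fields of struct event are
   recorded as CO-RE relocations.  */
struct event {
  int type;
  int payload;
} __attribute__ ((preserve_access_index));

int
get_payload (struct event *e)
{
  return e->payload;
}

/* Builtin form, wrapping an individual access to a type without
   the attribute.  */
struct raw { int type; };

int
get_type (struct raw *r)
{
  return __builtin_preserve_access_index (r->type);
}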
gcc/ChangeLog:
* doc/extend.texi (BPF Type Attributes): New node.
Document new preserve_access_index attribute.
Document new preserve_access_index builtin.
* doc/invoke.texi: Document -mco-re and -mno-co-re options.
David Faust [Tue, 3 Aug 2021 17:04:10 +0000 (10:04 -0700)]
btf: expose get_btf_id
Expose the function get_btf_id, so that it may be used by the BPF
backend. This enables the BPF CO-RE machinery in the BPF backend to
lookup BTF type IDs, in order to create CO-RE relocation records.
A prototype is added in ctfc.h
gcc/ChangeLog:
* btfout.c (get_btf_id): Function is no longer static.
* ctfc.h: Expose it here.
David Faust [Tue, 3 Aug 2021 17:01:31 +0000 (10:01 -0700)]
ctfc: add function to lookup CTF ID of a TREE type
Add a new function, ctf_lookup_tree_type, to return the CTF type ID
associated with a type via its TREE node. The function is exposed via
a prototype in ctfc.h.
gcc/ChangeLog:
* ctfc.c (ctf_lookup_tree_type): New function.
* ctfc.h: Likewise.
David Faust [Tue, 3 Aug 2021 17:00:42 +0000 (10:00 -0700)]
ctfc: externalize ctf_dtd_lookup
Expose the function ctf_dtd_lookup, so that it can be used by the BPF
CO-RE machinery. The function is no longer static, and an extern
prototype is added in ctfc.h.
gcc/ChangeLog:
* ctfc.c (ctf_dtd_lookup): Function is no longer static.
* ctfc.h: Analogous change.
David Faust [Tue, 3 Aug 2021 16:58:48 +0000 (09:58 -0700)]
dwarf: externalize lookup_type_die
Expose the function lookup_type_die in dwarf2out, so that it can be used
by CTF/BTF when adding BPF CO-RE information. The function is now
non-static, and an extern prototype is added in dwarf2out.h.
gcc/ChangeLog:
* dwarf2out.c (lookup_type_die): Function is no longer static.
* dwarf2out.h: Expose it here.
Fix fatal typo in gcc.dg/no_profile_instrument_function-attr-2.c
Dejagnu is unfortunately brittle: a syntax error in a
directive can abort the test-run for the current "tool"
(gcc, g++, gfortran), and if you don't check for this
condition or actually read the stdout log yourself, your
tools may make you believe the test was successful without
regressions. At the very least, always grep for ^ERROR: in
the stdout log!
With r12-3379, the testsuite got such a fatal syntax error,
causing the gcc test-run to abort at (e.g.):
...
FAIL: gcc.dg/memchr.c (test for excess errors)
FAIL: gcc.dg/memcmp-3.c (test for excess errors)
ERROR: (DejaGnu) proc "scan-tree-dump-not\" = foo {\(\)"} optimized" does not exist.
The error code is TCL LOOKUP COMMAND scan-tree-dump-not\"
The info on the error is:
invalid command name "scan-tree-dump-not""
while executing
"::tcl_unknown scan-tree-dump-not\" = foo {\(\)"} optimized"
("uplevel" body line 1)
invoked from within
"uplevel 1 ::tcl_unknown $args"
=== gcc Summary ===
# of expected passes 63740
# of unexpected failures 38
# of unexpected successes 2
# of expected failures 351
# of unresolved testcases 3
# of unsupported tests 662
x/cris-elf/gccobj/gcc/xgcc version 12.0.0 20210907 (experimental)\
[master r12-3391-g849d5f5929fc] (GCC)
testsuite:
* gcc.dg/no_profile_instrument_function-attr-2.c: Fix
typo in last change.
dwarf2out: Emit BTF in dwarf2out_finish for BPF CO-RE usecase
DWARF generation is split between early and late phases when LTO is in effect.
This poses challenges for CTF/BTF generation especially if late debug info
generation is desirable, as turns out to be the case for BPF CO-RE.
The approach taken here in this patch is:
1. LTO is disabled for BPF CO-RE
The reason to disable LTO for BPF CO-RE is that if LTO is in effect, BPF CO-RE
relocations need to be generated in the LTO link phase _after_ the optimizations
are done. This means we need to devise a way to combine early and late BTF. At
this time, in absence of linker support for BTF sections, it makes sense to
steer clear of LTO for BPF CO-RE and bypass the issue.
2. The BPF backend updates the write_symbols with BTF_WITH_CORE_DEBUG to convey
the case that BTF with CO-RE support needs to be generated. This information
is used by the debug info emission routines to defer the emission of BTF/CO-RE
until dwarf2out_finish.
-mco-re in the BPF backend enables code generation for the CO-RE usecase. LTO is
disabled for CO-RE compilations.
gcc/ChangeLog:
* config/bpf/bpf.c (bpf_option_override): For BPF backend, disable LTO
support when compiling for CO-RE.
* config/bpf/bpf.opt: Add new command line option -mco-re.
To best handle BTF/CO-RE in GCC, a distinct BTF_WITH_CORE_DEBUG debug format is
being added. This helps the compiler detect whether BTF with CO-RE relocations
needs to be emitted.
gcc/ChangeLog:
* flag-types.h (enum debug_info_type): Add new enum
DINFO_TYPE_BTF_WITH_CORE.
(BTF_WITH_CORE_DEBUG): New bitmask.
* flags.h (btf_with_core_debuginfo_p): New declaration.
* opts.c (btf_with_core_debuginfo_p): New definition.
Jason Merrill [Tue, 31 Aug 2021 16:54:37 +0000 (12:54 -0400)]
tree: Change error_operand_p to an inline function
I've thought for a while that many of the macros in tree.h and such should
become inline functions. This one in particular was confusing Coverity; the
null check in the macro made it think that all code guarded by
error_operand_p would also need null checks.
gcc/ChangeLog:
* tree.h (error_operand_p): Change to inline function.
Jakub Jelinek [Tue, 7 Sep 2021 17:33:28 +0000 (19:33 +0200)]
c++: Fix up constexpr evaluation of deleting dtors [PR100495]
We do not save bodies of constexpr clones and instead evaluate the bodies
of the constexpr functions they were cloned from.
I believe that is just fine for constructors because complete vs. base
ctors differ only in classes that have virtual bases and such constructors
aren't constexpr; similarly for complete/base destructors.
But as the testcase below shows, for deleting destructors it is not fine:
deleting dtors, while marked as clones, in fact are just artificial functions
with a synthesized body which calls the user destructor and the deallocation.
So, either we'd need to evaluate the destructor and afterwards synthesize
and evaluate the deallocation, or we can just save and use the deleting
dtors' bodies. The latter seems much easier to me.
2021-09-07 Jakub Jelinek <jakub@redhat.com>
PR c++/100495
* constexpr.c (maybe_save_constexpr_fundef): Save body even for
constexpr deleting dtors.
(cxx_eval_call_expression): Don't use DECL_CLONED_FUNCTION for
deleting dtors.
libgfortran: Makefile fix for ISO_Fortran_binding.h
libgfortran/ChangeLog:
* Makefile.am (gfor_built_src): Depend on
include/ISO_Fortran_binding.h not on ISO_Fortran_binding.h.
(ISO_Fortran_binding.h): Rename make target to ...
(include/ISO_Fortran_binding.h): ... this.
* Makefile.in: Regenerate.
Eric Botcazou [Tue, 7 Sep 2021 13:41:49 +0000 (15:41 +0200)]
Fix PR debug/101947
This is the recent LTO bootstrap failure with Ada enabled. The compiler now
generates DW_OP_deref_type for a unit of the Ada front-end, which means that
the offset of base types in the CU must be computed during early DWARF too.
gcc/
PR debug/101947
* dwarf2out.c (mark_base_types): New overloaded function.
(dwarf2out_early_finish): Invoke it on the COMDAT type list as well
as the compilation unit, and call move_marked_base_types afterward.
* c-c++-common/gomp/flush-1.c: Add test case for 'seq_cst'.
* c-c++-common/gomp/flush-2.c: Add test case for 'seq_cst'.
* g++.dg/gomp/attrs-1.C: Adapt test to handle all flush clauses.
* g++.dg/gomp/attrs-2.C: Adapt test to handle all flush clauses.
* gfortran.dg/gomp/flush-1.f90: Add test case for 'seq_cst'.
* gfortran.dg/gomp/flush-2.f90: Add test case for 'seq_cst'.
Martin Liska [Tue, 22 Jun 2021 08:09:01 +0000 (10:09 +0200)]
inline: do not inline when no_profile_instrument_function is different
PR gcov-profile/80223
gcc/ChangeLog:
* ipa-inline.c (can_inline_edge_p): Similarly to sanitizer
options, do not inline when no_profile_instrument_function
attributes are different in early inliner. It's fine to inline
it after PGO instrumentation.
gcc/testsuite/ChangeLog:
* gcc.dg/no_profile_instrument_function-attr-2.c: New test.
Richard Biener [Tue, 7 Sep 2021 08:35:42 +0000 (10:35 +0200)]
tree-optimization/101555 - avoid redundant alias queries in PRE
This avoids doing redundant work during PHI translation to invalidate
mems when translating their corresponding VUSE through the block's
virtual PHI node. All the invalidation work is already done by
prune_clobbered_mems.
This speeds up the compile of the testcase from 275s with PRE
taking 91% of the compile-time down to 43s with PRE taking 16%
of the compile-time.
2021-09-07 Richard Biener <rguenther@suse.de>
PR tree-optimization/101555
* tree-ssa-pre.c (translate_vuse_through_block): Do not
perform an alias walk to determine the validity of the
mem at the start of the block which is already guaranteed
by means of prune_clobbered_mems.
(phi_translate_1): Pass edge to translate_vuse_through_block.
Fortran: Revert to non-multilib-specific ISO_Fortran_binding.h
Commit fef67987cf502fe322e92ddce22eea7ac46b4d75 changed the
libgfortran build process to generate multilib-specific versions of
ISO_Fortran_binding.h from a template, by running gfortran to identify
the values of the Fortran kind constants C_LONG_DOUBLE, C_FLOAT128,
and C_INT128_T. This caused multiple problems with search paths, both
for build-tree testing and installed-tree use, not all of which have
been fixed.
This patch reverts to a non-multilib-specific .h file that uses GCC's
predefined preprocessor symbols to detect the supported types and map
them to kind values in the same way as the Fortran front end.
Roger Sayle [Mon, 6 Sep 2021 21:48:53 +0000 (22:48 +0100)]
Correct implementation of wi::clz
As diagnosed with Jakub and Richard in the analysis of PR 102134, the
current implementation of wi::clz has incorrect/inconsistent behaviour.
As mentioned by Richard in comment #7, clz should (always) return zero
for negative values, but the current implementation can only return 0
when precision is a multiple of HOST_BITS_PER_WIDE_INT. The fix is
simply to reorder/shuffle the existing tests.
2021-09-06 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* wide-int.cc (wi::clz): Reorder tests to ensure the result
is zero for all negative values.
For the conversion from _Float16 to int, if the corresponding optab
does not exist, the compiler will try the wider mode (SFmode here),
but when floatsfsi exists but FAILs, FROM will be rewritten, which
leads to the runtime error reported in the PR.
gcc/ChangeLog:
PR middle-end/102182
* optabs.c (expand_fix): Add from1 to avoid from being
overwritten.
gcc/testsuite/ChangeLog:
PR middle-end/102182
* gcc.target/i386/pr101282.c: New test.
unresolved symbol __atomic_compare_exchange_1
collect2: error: ld returned 1 exit status
mkoffload: fatal error: [...]/gcc/x86_64-pc-linux-gnu-accel-nvptx-none-gcc returned 1 exit status
libgomp/
* testsuite/libgomp.c/target-43.c: '-latomic' for nvptx offloading.
Eric Botcazou [Mon, 6 Sep 2021 09:16:08 +0000 (11:16 +0200)]
Fix debug info for packed array types in Ada
Packed array types are sometimes represented with integer types under the
hood in Ada, but we nevertheless need to emit them as array types in the
debug info so we have the types.get_array_descr_info langhook for this
purpose; but it is not invoked from modified_type_die, which causes:
Jakub Jelinek [Mon, 6 Sep 2021 08:08:16 +0000 (10:08 +0200)]
match.pd: Fix up __builtin_*_overflow arg demotion [PR102207]
My earlier patch to demote arguments of __builtin_*_overflow unfortunately
caused a wrong-code regression. The builtins operate on infinite precision
arguments; for outer_prec > inner_prec, signed -> signed and unsigned -> unsigned
promotions there just repeat the sign bit or 0s and can be demoted,
similarly unsigned -> signed, which also repeats 0s. But as the
testcase shows, signed -> unsigned promotions need to be preserved (unless
we'd know the inner arguments can't be negative), because for negative
numbers such a promotion sets the outer_prec -> inner_prec bits to 1 but the
bits above that to 0 in the infinite precision.
So, the following patch avoids the demotions for the signed -> unsigned
promotions.
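A hedged illustration of the problematic shape (my example, not the committed
testcase): a negative signed value promoted to a wider unsigned type feeding
the builtin.
int
wrap_add (signed char a, unsigned int b, unsigned int *res)
{
  /* For a == -1 the promoted operand is 0xffffffff: in infinite
     precision the bits between inner_prec and outer_prec are 1 and
     everything above is 0.  Demoting it back to signed char would
     treat it as -1 again and change the result.  */
  return __builtin_add_overflow ((unsigned int) a, b, res);
}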
2021-09-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/102207
* match.pd: Don't demote operands of IFN_{ADD,SUB,MUL}_OVERFLOW if they
were promoted from signed to wider unsigned type.
Andrew Pinski [Mon, 6 Sep 2021 00:52:18 +0000 (00:52 +0000)]
Fix PR tree-optimization/63184: add simplification of (& + A) != (& + B)
These two testcases have been failing since GCC 5 but things
have improved such that adding a simplification to match.pd
for this case is easier than before.
In the end we have the following IR:
....
_5 = &a[1] + _4;
_7 = &a + _13;
if (_5 != _7)
So we can fold the _5 != _7 into:
(&a[1] - &a) + _4 != _13
The subtraction is folded into constant by ptr_difference_const.
In this case, the full expression gets folded into a constant
and we are able to remove the if statement.
OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
gcc/ChangeLog:
PR tree-optimization/63184
* match.pd: Add simplification of pointer_diff of two pointer_plus
with addr_expr in the first operand of each pointer_plus.
Add simplification of ne/eq of two pointer_plus with addr_expr
in the first operand of each pointer_plus.
gcc/testsuite/ChangeLog:
PR tree-optimization/63184
* c-c++-common/pr19807-2.c: Enable for all targets and remove the xfail.
* c-c++-common/pr19807-3.c: Likewise.
Explicitly add -msse2 to compile HF related libgcc source file.
For a 32-bit libgcc configured w/o sse2, there would be an error since
GCC only supports _Float16 under sse2. Explicitly add -msse2 for those
HF related libgcc functions, so users can still link them with the
above configuration.
libgcc/ChangeLog:
* Makefile.in: Adjust to support specific CFLAGS for each
libgcc source file.
* config/i386/64/t-softfp: Explicitly add -msse2 for HF
related libgcc source files.
* config/i386/t-softfp: Ditto.
* config/i386/_divhc3.c: New file.
* config/i386/_mulhc3.c: New file.
This performs local re-computation of participating scalar stmts
in BB vectorization subgraphs to allow precise computation of
liveness of scalar stmts after vectorization and thus precise
costing. This treats all extern defs as live but continues
to optimistically handle scalar defs that we think we can handle
by lane-extraction even though that can still fail late during
code-generation.
2021-09-02 Richard Biener <rguenther@suse.de>
PR tree-optimization/102176
* tree-vect-slp.c (vect_slp_gather_vectorized_scalar_stmts):
New function.
(vect_bb_slp_scalar_cost): Use the computed set of
vectorized scalar stmts instead of relying on the out-of-date
and not accurate PURE_SLP_STMT.
(vect_bb_vectorization_profitable_p): Compute the set
of vectorized scalar stmts.
Make the path solver's range_of_stmt() handle all statements.
The path solver's range_of_stmt() was handcuffed to only fold
GIMPLE_COND statements, since those were the only statements the
backward threader needed to resolve. However, there is no need for this
restriction, as the folding code is perfectly capable of folding any
statement.
This situation arises when trying to fold other statements in the final
block of a path (for instance, in the forward threader as it tries to
fold candidate statements along a path).
Tested on x86-64 Linux.
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::range_of_stmt): Remove
GIMPLE_COND special casing.
(path_range_query::range_defined_in_block): Use range_of_stmt
instead of calling fold_range directly.
Add an unreachable_path_p method to path_range_query.
Keeping track of unreachable calculations while traversing a path is
useful to determine edge reachability, among other things. We've been
doing this ad-hoc in the backwards threader, so this provides a cleaner
way of accessing the information.
This patch also makes it easier to compare different threading
implementations, in some upcoming work. For example, it's currently
difficult to gauge how well we're doing compared to the forward threader,
because it can thread paths that are obviously unreachable. This
provides a way of discarding those paths.
Note that I've opted to keep unreachable_path_p() out-of-line, because I
have local changes that will enhance this method.
Tested on x86-64 Linux.
gcc/ChangeLog:
* gimple-range-path.cc (path_range_query::range_of_expr): Set
m_undefined_path when appropriate.
(path_range_query::internal_range_of_expr): Copy from range_of_expr.
(path_range_query::unreachable_path_p): New.
(path_range_query::precompute_ranges): Set m_undefined_path.
* gimple-range-path.h (path_range_query::unreachable_path_p): New.
(path_range_query::internal_range_of_expr): New.
* tree-ssa-threadbackward.c (back_threader::find_taken_edge_cond):
Use unreachable_path_p.
Clean up registering of paths in backwards threader.
All callers to maybe_register_path() call find_taken_edge() beforehand
and pass the edge as an argument. There's no reason to repeat this
at each call site.
This is a clean-up in preparation for some other enhancements to the
backwards threader.
Tested on x86-64 Linux.
gcc/ChangeLog:
* tree-ssa-threadbackward.c (back_threader::maybe_register_path):
Remove argument and call find_taken_edge.
(back_threader::resolve_phi): Do not calculate taken edge before
calling maybe_register_path.
(back_threader::find_paths_to_names): Same.