Alex Coplan [Wed, 6 Apr 2022 10:16:10 +0000 (11:16 +0100)]
arm: Fix ICEs with compare-and-swap and -march=armv8-m.base [PR99977]
The PR shows two ICEs with __sync_bool_compare_and_swap and
-mcpu=cortex-m23 (equivalently, -march=armv8-m.base): one in LRA and one
later on, after the CAS insn is split.
The LRA ICE occurs because the
@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1 pattern attempts to tie
two output operands together (operands 0 and 1 in the third
alternative). LRA can't handle this, since it doesn't make sense for an
insn to assign to the same operand twice.
The later (post-splitting) ICE occurs because the expansion of the
cbranchsi4_scratch insn doesn't quite go according to plan. As it
stands, arm_split_compare_and_swap calls gen_cbranchsi4_scratch,
attempting to pass a register (neg_bval) to use as a scratch register.
However, since the RTL template has a match_scratch here,
gen_cbranchsi4_scratch ignores this argument and produces a scratch rtx.
Since this is all happening after RA, this is doomed to fail (and we get
an ICE about the insn not matching its constraints).
It seems that the motivation for the choice of constraints in the
atomic_compare_and_swap pattern comes from an attempt to satisfy the
constraints of the cbranchsi4_scratch insn. This insn requires the
scratch register to be the same as the input register in the case that
we use a larger negative immediate (one that satisfies J, but not L).
Of course, as noted above, LRA refuses to assign two output operands to
the same register, so this was never going to work.
The solution I'm proposing here is to collapse the alternatives to the
CAS insn (allowing the two output register operands to be matched to
different registers) and to ensure that the constraints for
cbranchsi4_scratch are met in arm_split_compare_and_swap. We do this by
inserting a move to ensure the source and destination registers match if
necessary (i.e. in the case of large negative immediates).
Another notable change here is that we only do:
emit_move_insn (neg_bval, const1_rtx);
for non-negative immediates. This is because the ADDS instruction used in
the negative case suffices to leave a suitable value in neg_bval: if the
operands compare equal, we don't take the branch (so neg_bval will be
set by the load exclusive). Otherwise, the ADDS will leave a nonzero
value in neg_bval, which will correctly signal that the CAS has failed
when it is later negated.
gcc/ChangeLog:
PR target/99977
* config/arm/arm.c (arm_split_compare_and_swap): Fix up codegen
with negative immediates: ensure we expand cbranchsi4_scratch
correctly and ensure we satisfy its constraints.
* config/arm/sync.md
(@atomic_compare_and_swap<CCSI:arch><NARROW:mode>_1): Don't
attempt to tie two output operands together with constraints;
collapse two alternatives.
(@atomic_compare_and_swap<CCSI:arch><SIDI:mode>_1): Likewise.
* config/arm/thumb1.md (cbranchsi4_neg_late): New.
gcc/testsuite/ChangeLog:
PR target/99977
* gcc.target/arm/pr99977.c: New test.
i.e. it replaces two uses of the parameter default-def with two
uninitialized default-defs of a new variable - all in hope to produce
code with better debug info.
This patch fixes it by avoiding the transform when the SSA_NAME to be
replaced is a default definition.
gcc/ChangeLog:
2020-10-19 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/97456
PR middle-end/105071
* tree-complex.c (set_component_ssa_name): Do not replace ignored decl
default definitions with new component vars. Reorder if conditions.
gcc/testsuite/ChangeLog:
2020-10-19 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/97456
* gcc.dg/tree-ssa/pr97456.c: New test.
Richard Biener [Mon, 14 Sep 2020 09:25:04 +0000 (11:25 +0200)]
tree-optimization/97043 - fix latent wrong-code with SLP vectorization
When the unrolling decision comes late and would have prevented
eliding a SLP load permutation we can end up generating aligned
loads when the load is in fact unaligned. Most of the time
alignment analysis figures out the load is in fact unaligned
but that cannot be relied upon.
The following removes the SLP load permutation eliding based on
the still premature vectorization factor.
2020-09-14 Richard Biener <rguenther@suse.de>
PR tree-optimization/97043
* tree-vect-slp.c (vect_analyze_slp_instance): Do not
elide a load permutation if the current vectorization
factor is one.
Richard Biener [Thu, 27 Aug 2020 09:48:15 +0000 (11:48 +0200)]
tree-optimization/96522 - transfer of flow-sensitive info in copy_ref_info
This removes the bogus tranfer of flow-sensitive info in copy_ref_info
plus fixes one oversight in FRE when flow-sensitive non-NULLness was added to
points-to info.
2020-08-27 Richard Biener <rguenther@suse.de>
PR tree-optimization/96522
* tree-ssa-address.c (copy_ref_info): Reset flow-sensitive
info of the copied points-to. Transfer bigger alignment
via the access type.
* tree-ssa-sccvn.c (eliminate_dom_walker::eliminate_stmt):
Reset all flow-sensitive info.
Richard Biener [Mon, 18 Oct 2021 07:10:43 +0000 (09:10 +0200)]
tree-optimization/102798 - avoid copying PTA info to old SSA names
The vectorizer duplicates pointer-info to created pointer bases
but it has to avoid changing points-to info on existing SSA names
because there's now flow-sensitive info in there (pt->pt_null as
set from VRP).
2021-10-18 Richard Biener <rguenther@suse.de>
PR tree-optimization/102798
* tree-vect-data-refs.c (vect_create_addr_base_for_vector_ref):
Only copy points-to info to newly generated SSA names.
Richard Biener [Thu, 11 Nov 2021 08:40:36 +0000 (09:40 +0100)]
middle-end/103181 - fix operation_could_trap_p for vector division
For integer vector division we only checked for all zero vector
constants rather than checking whether any element in the constant
vector is zero.
It also fixes the adjustment to operation_could_trap_helper_p
where I failed to realize that RDIV_EXPR is also used for
fixed-point types. It also fixes that handling by properly
checking for a fixed_zerop divisor.
2021-11-11 Richard Biener <rguenther@suse.de>
PR middle-end/103181
PR middle-end/103248
* tree-eh.c (operation_could_trap_helper_p): Properly
check vector constants for a zero element for integer
division. Separate floating point and integer division code.
Properly handle fixed-point RDIV_EXPR.
* gcc.dg/torture/pr103181.c: New testcase.
* gcc.dg/pr103248.c: Likewise.
Double reductions which have multiple LC PHIs in the inner loop
are not handled correctly during transformation since those PHIs
are not properly classified as reduction. The following disables
vectorizing them.
2021-11-15 Richard Biener <rguenther@suse.de>
PR tree-optimization/103237
* tree-vect-loop.c (vect_is_simple_reduction): Fail for
double reductions with multiple inner loop LC PHI nodes.
liuhongt [Wed, 9 Feb 2022 05:14:43 +0000 (13:14 +0800)]
ICE: QImode(not SImode) operand should be passed to gen_vec_initv16qiqi in ashlv16qi3.
ix86_expand_vector_init expects vals to be a parallel containing
values of individual fields which should be either element mode of the
vector mode, or a vector mode with the same element mode and smaller
number of elements.
But in the expander ashlv16qi3, the second operand is SImode which
can't be directly passed to gen_vec_initv16qiqi.
gcc/ChangeLog:
PR target/104451
* config/i386/sse.md (<insn><mode>3): lowpart_subreg
operands[2] from SImode to QImode.
gcc/testsuite/ChangeLog:
PR target/104451
* gcc.target/i386/pr104451.c: New test.
Harald Anlauf [Tue, 1 Feb 2022 22:33:24 +0000 (23:33 +0100)]
Fortran: reject simplifying TRANSFER for MOLD with storage size 0
gcc/fortran/ChangeLog:
PR fortran/104311
* check.c (gfc_calculate_transfer_sizes): Checks for case when
storage size of SOURCE is greater than zero while the storage size
of MOLD is zero and MOLD is an array shall not depend on SIZE.
gcc/testsuite/ChangeLog:
PR fortran/104311
* gfortran.dg/transfer_simplify_15.f90: New test.
Harald Anlauf [Thu, 20 Jan 2022 21:36:50 +0000 (22:36 +0100)]
Fortran: fix simplification of TRANSFER for zero-sized character array result
gcc/fortran/ChangeLog:
PR fortran/104127
* simplify.c (gfc_simplify_transfer): Ensure that the result
typespec is set up for TRANSFER with MOLD of type CHARACTER
including character length even if the result is a zero-sized
array.
gcc/testsuite/ChangeLog:
PR fortran/104127
* gfortran.dg/transfer_simplify_11.f90: Fix logic.
* gfortran.dg/transfer_simplify_13.f90: New test.
Harald Anlauf [Tue, 25 Jan 2022 20:56:39 +0000 (21:56 +0100)]
Fortran: MOLD argument to TRANSFER intrinsic having storage size zero
gcc/fortran/ChangeLog:
PR fortran/104227
* check.c (gfc_calculate_transfer_sizes): Fix checking of arrays
passed as MOLD argument to the TRANSFER intrinsic for having
storage size zero.
gcc/testsuite/ChangeLog:
PR fortran/104227
* gfortran.dg/transfer_check_6.f90: New test.
[compiler-rt] Increase kDlsymAllocPoolSize to fix test failures
Increase kDlsymAllocPoolSize on the release branch as discussed on bug
51620, as an alternative to backporting cb0e14ce6dcdd614a7207f4ce6fcf81a164471ab and its dependencies.
The minimum size is 8192, as needed for the following test to pass: