asan: Handle poly-int sizes in ASAN_MARK [PR97696]
This patch makes the expansion of IFN_ASAN_MARK let through
poly-int-sized objects. The expansion itself was already generic
enough, but the tests for the fast path were too strict.
gcc/
PR sanitizer/97696
* asan.c (asan_expand_mark_ifn): Allow the length to be a poly_int.
gcc/testsuite/
PR sanitizer/97696
* gcc.target/aarch64/sve/pr97696.c: New test.
The vld1_x4 and vst1_x4 patterns use XI registers for both 64-bit and
128-bit vectors. This has the nice property that each individual
vector is within a separate 16-byte subreg of the XI, which should
reduce the number of memory spills needed. However, it means that the
64-bit vector forms must convert between the native 4x64-bit structure
layout and the padded 4x128-bit XI layout.
The vld4 and vst4 functions did this correctly. But the vld1x4 and
vst1x4 functions used a union between the native and padded layouts,
even though the layouts are different sizes.
This patch makes vld1x4 and vst1x4 use the same approach as vld4
and vst4. It also fixes some uses of variables in the user namespace.
gcc/
* config/aarch64/arm_neon.h (vld1_s8_x4, vld1_s16_x4, vld1_s32_x4):
(vld1_u8_x4, vld1_u16_x4, vld1_u32_x4, vld1_f16_x4, vld1_f32_x4):
(vld1_p8_x4, vld1_p16_x4, vld1_s64_x4, vld1_u64_x4, vld1_p64_x4):
(vld1_f64_x4): Avoid using a union of a 256-bit structure and 512-bit
XImode integer. Instead use the same approach as the vld4 intrinsics.
(vst1_s8_x4, vst1_s16_x4, vst1_s32_x4, vst1_u8_x4, vst1_u16_x4):
(vst1_u32_x4, vst1_f16_x4, vst1_f32_x4, vst1_p8_x4, vst1_p16_x4):
(vst1_s64_x4, vst1_u64_x4, vst1_p64_x4, vst1_f64_x4, vld1_bf16_x4):
(vst1_bf16_x4): Likewise for stores.
(vst1q_s8_x4, vst1q_s16_x4, vst1q_s32_x4, vst1q_u8_x4, vst1q_u16_x4):
(vst1q_u32_x4, vst1q_f16_x4, vst1q_f32_x4, vst1q_p8_x4, vst1q_p16_x4):
(vst1q_s64_x4, vst1q_u64_x4, vst1q_p64_x4, vst1q_f64_x4)
(vst1q_bf16_x4): Rename val parameter to __val.
Darwin, Objective-C: Support -fconstant-cfstrings [PR108743].
This support the -fconstant-cfstrings option as used by clang (and
expect by some build scripts) as an alias to the target-specific
-mconstant-cfstrings.
The documentation is also updated to reflect that the 'f' option is
only available on Darwin, and to add the 'm' option to the Darwin
section of the invocation text.
* config/darwin.opt: Add fconstant-cfstrings alias to
mconstant-cfstrings.
* doc/invoke.texi: Amend invocation descriptions to reflect
that the fconstant-cfstrings is a target-option alias and to
add the missing mconstant-cfstrings option description to the
Darwin section.
Eric Botcazou [Thu, 7 Mar 2024 14:05:54 +0000 (15:05 +0100)]
Fix bogus error on allocator for array type with Dynamic_Predicate
This is a regression present on all active branches: the compiler gives
a bogus error on an allocator for an unconstrained array type declared
with a Dynamic_Predicate because Apply_Predicate_Check is invoked directly
on a subtype reference, which it cannot handle.
This moves the check to the resulting access value (after dereference) like
in Expand_Allocator_Expression.
gcc/ada/
PR ada/113979
* exp_ch4.adb (Expand_N_Allocator): In the subtype indication case,
remove call to Apply_Predicate_Check.
gcc/testsuite/
* gnat.dg/predicate15.adb: New test.
Iain Buclaw [Sun, 4 Feb 2024 21:04:14 +0000 (22:04 +0100)]
d: Fix callee destructor call invalidates the live object [PR113758]
When generating the argument, check the isCalleeDestroyingArgs hook, and
force a TARGET_EXPR to be created if true, so that a reference to the
live object isn't passed directly to the function that runs dtors.
When instead dealing with caller running destructors, two temporaries
were being generated, one explicit temporary generated by the D
front-end, and another implicitly by the code generator. This has been
reduced to one by setting DECL_VALUE_EXPR on the explicit temporary to
bind it to the implicit slot created for the TARGET_EXPR, as that has
the shorter lifetime of the two.
PR d/113758
gcc/d/ChangeLog:
* d-codegen.cc (d_build_call): Force a TARGET_EXPR when callee
destorys its arguments.
* decl.cc (DeclVisitor::visit (VarDeclaration *)): Set
SET_DECL_VALUE_EXPR on the temporary variable to make it a placeholder
for the TARGET_EXPR_SLOT.
and the memory operand size is 1 byte. As the result, the rest of 511
bytes is ignored by GCC. Implement ldtilecfg and sttilecfg intrinsics
with a pointer to XImode to honor the 512-byte memory block.
gcc/ChangeLog:
PR target/114098
* config/i386/amxtileintrin.h (_tile_loadconfig): Use
__builtin_ia32_ldtilecfg.
(_tile_storeconfig): Use __builtin_ia32_sttilecfg.
* config/i386/i386-builtin.def (BDESC): Add
__builtin_ia32_ldtilecfg and __builtin_ia32_sttilecfg.
* config/i386/i386-expand.c (ix86_expand_builtin): Handle
IX86_BUILTIN_LDTILECFG and IX86_BUILTIN_STTILECFG.
* config/i386/i386.md (ldtilecfg): New pattern.
(sttilecfg): Likewise.
gcc/testsuite/ChangeLog:
PR target/114098
* gcc.target/i386/amxtile-4.c: New test.
There are no instructions that do traditional AltiVec addresses (i.e.
with the low four bits of the address masked off) for OOmode and XOmode
objects. The solution is to modify the constraints used in the movoo and
movxo pattern to disallow these types of addresses, which assists LRA in
resolving this issue. Furthermore, the mode size 16 check has been
removed in vsx_quad_dform_memory_operand to allow OOmode and XOmode, and
quad_address_p already handles less than size 16.
Eric Botcazou [Mon, 26 Feb 2024 12:13:34 +0000 (13:13 +0100)]
Finalization of object allocated by anonymous access designating local type
The finalization of objects dynamically allocated through an anonymous
access type is deferred to the enclosing library unit in the current
implementation and a warning is given on each of them.
However this cannot be done if the designated type is local, because this
would generate dangling references to the local finalization routine, so
the finalization needs to be dropped in this case and the warning adjusted.
gcc/ada/
PR ada/113893
* exp_ch7.adb (Build_Anonymous_Master): Do not build the master
for a local designated type.
* exp_util.adb (Build_Allocate_Deallocate_Proc): Force Needs_Fin
to false if no finalization master is attached to an access type
and assert that it is anonymous in this case.
* sem_res.adb (Resolve_Allocator): Mention that the object might
not be finalized at all in the warning given when the type is an
anonymous access-to-controlled type.
Richard Earnshaw [Thu, 22 Feb 2024 16:47:20 +0000 (16:47 +0000)]
arm: fix ICE with vectorized reciprocal division [PR108120]
The expand pattern for reciprocal division was enabled for all math
optimization modes, but the patterns it was generating were not
enabled unless -funsafe-math-optimizations were enabled, this leads to
an ICE when the pattern we generate cannot be recognized.
Fixed by only enabling vector division when doing unsafe math.
gcc:
PR target/108120
* config/arm/neon.md (div<VCVTF:mode>3): Rename from div<mode>3.
Gate with ARM_HAVE_NEON_<MODE>_ARITH.
gcc/testsuite:
PR target/108120
* gcc.target/arm/neon-recip-div-1.c: New file.
The PR shows us ICEing due to an unrecognizable TFmode save emitted by
aarch64_process_components. The problem is that for T{I,F,D}mode we
conservatively require mems to be in range for x-register ldp/stp. That
is because (at least for TImode) it can be allocated to both GPRs and
FPRs, and in the GPR case that is an x-reg ldp/stp, and the FPR case is
a q-register load/store.
As Richard pointed out in the PR, aarch64_get_separate_components
already checks that the offsets are suitable for a single load, so we
just need to choose a mode in aarch64_reg_save_mode that gives the full
q-register range. In this patch, we choose V16QImode as an alternative
16-byte "bag-of-bits" mode that doesn't have the artificial range
restrictions imposed on T{I,F,D}mode.
Unlike for GCC 14 we need additional handling in the load/store pair
code as various cases are not expecting to see V16QImode (particularly
the writeback patterns, but also aarch64_gen_load_pair).
gcc/ChangeLog:
PR target/111677
* config/aarch64/aarch64.c (aarch64_reg_save_mode): Use
V16QImode for the full 16-byte FPR saves in the vector PCS case.
(aarch64_gen_storewb_pair): Handle V16QImode.
(aarch64_gen_loadwb_pair): Likewise.
(aarch64_gen_load_pair): Likewise.
* config/aarch64/aarch64.md (loadwb_pair<TX:mode>_<P:mode>):
Rename to ...
(loadwb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
V16QImode.
(storewb_pair<TX:mode>_<P:mode>): Rename to ...
(storewb_pair<TX_V16QI:mode>_<P:mode>): ... this, extending to
V16QImode.
* config/aarch64/iterators.md (TX_V16QI): New.
gcc/testsuite/ChangeLog:
PR target/111677
* gcc.target/aarch64/torture/pr111677.c: New test.
Jakub Jelinek [Thu, 15 Feb 2024 19:04:01 +0000 (20:04 +0100)]
testsuite: Require lra effective target for pr107385.c
Old reload doesn't support asm goto with output operands.
We have lra effective target (though, strangely it returns
0 just for 2 targets out of at least 16 targets with no LRA support),
so this patch uses it, similarly how it is done in other asm goto
tests with output operands.
Jakub Jelinek [Thu, 15 Feb 2024 14:53:01 +0000 (15:53 +0100)]
expand: Fix handling of asm goto outputs vs. PHI argument adjustments [PR113921]
The Linux kernel and the following testcase distilled from it is
miscompiled, because tree-outof-ssa.cc (eliminate_phi) emits some
fixups on some of the edges (but doesn't commit edge insertions).
Later expand_asm_stmt emits further instructions on the same edge.
Now the problem is that expand_asm_stmt uses insert_insn_on_edge
to add its own fixups, but that function appends to the existing
sequence on the edge if any. And the bug triggers when the
fixup sequence emitted by eliminate_phi uses a pseudo which the
fixup sequence emitted by expand_asm_stmt later on sets.
So, we end up with
(set (reg A) (asm_operands ...))
and on one of the edges queued sequence
(set (reg C) (reg B)) // added by eliminate_phi
(set (reg B) (reg A)) // added by expand_asm_stmt
That is wrong, what we emit by expand_asm_stmt needs to be as close
to the asm_operands as possible (they aren't known until expand_asm_stmt
is called, the PHI fixup code assumes it is reg B which holds the right
value) and the PHI adjustments need to be done after it.
So, the following patch introduces a prepend_insn_to_edge function and
uses it from expand_asm_stmt, so that we queue
(set (reg B) (reg A)) // added by expand_asm_stmt
(set (reg C) (reg B)) // added by eliminate_phi
instead and so the value from the asm_operands output propagates correctly
to the PHI result.
2024-02-15 Jakub Jelinek <jakub@redhat.com>
PR middle-end/113921
* cfgrtl.h (prepend_insn_to_edge): New declaration.
* cfgrtl.c (insert_insn_on_edge): Clarify behavior in function
comment.
(prepend_insn_to_edge): New function.
* cfgexpand.c (expand_asm_stmt): Use prepend_insn_to_edge instead of
insert_insn_on_edge.
The linking of libgcc is already present in %(liborig), so the current
situation duplicates libraries. This was not an issue until macOS's new
linker started giving warnings for such cases.
Harald Anlauf [Sat, 27 Jan 2024 16:41:43 +0000 (17:41 +0100)]
Fortran: fix bounds-checking errors for CLASS array dummies [PR104908]
Commit r11-1235 addressed issues with bounds of unlimited polymorphic array
dummies. However, using the descriptor from sym->backend_decl does break
the case of CLASS array dummies. The obvious solution is to restrict the
fix to the unlimited polymorphic case, thus keeping the original descriptor
in the ordinary case.
gcc/fortran/ChangeLog:
PR fortran/104908
* trans-array.c (gfc_conv_array_ref): Restrict use of transformed
descriptor (sym->backend_decl) to the unlimited polymorphic case.
gcc/testsuite/ChangeLog:
PR fortran/104908
* gfortran.dg/pr104908.f90: New test.
Martin Jambor [Fri, 9 Feb 2024 17:58:43 +0000 (18:58 +0100)]
sra: Disqualify bases of operands of asm gotos
PR 110422 shows that SRA can ICE assuming there is a single edge
outgoing from a block terminated with an asm goto. We need that for
BB-terminating statements so that any adjustments they make to the
aggregates can be copied over to their replacements. Because we can't
have that after ASM gotos, we need to punt.
gcc/ChangeLog:
2024-01-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/110422
* tree-sra.c (scan_function): Disqualify bases of operands of asm
gotos.
gcc/testsuite/ChangeLog:
2024-01-17 Martin Jambor <mjambor@suse.cz>
PR tree-optimization/110422
* gcc.dg/torture/pr110422.c: New test.
Xi Ruoyao [Fri, 2 Feb 2024 19:35:07 +0000 (03:35 +0800)]
MIPS: Fix wrong MSA FP vector negation
We expanded (neg x) to (minus const0 x) for MSA FP vectors, this is
wrong because -0.0 is not 0 - 0.0. This causes some Python tests to
fail when Python is built with MSA enabled.
Use the bnegi.df instructions to simply reverse the sign bit instead.
gcc/ChangeLog:
* config/mips/mips-msa.md (elmsgnbit): New define_mode_attr.
(neg<mode>2): Change the mode iterator from MSA to IMSA because
in FP arithmetic we cannot use (0 - x) for -x.
(neg<mode>2): New define_insn to implement FP vector negation,
using a bnegi instruction to negate the sign bit.
The first alternative stores the floating-point status register
in the destination. It should store zero. We need to copy %fr0
to another floating-point register to initialize it to zero.
2024-02-01 John David Anglin <danglin@gcc.gnu.org>
gcc/ChangeLog:
* config/pa/pa.md (atomic_storedi_1): Fix bug in
alternative 1.
Lewis Hyatt [Tue, 5 Dec 2023 16:33:39 +0000 (11:33 -0500)]
c-family: Fix ICE with large column number after restoring a PCH [PR105608]
Users are allowed to define macros prior to restoring a precompiled header
file, as long as those macros are not defined (or are defined identically)
in the PCH. However, the PCH restoration process destroys all the macro
definitions, so libcpp has to record them before restoring the PCH and then
redefine them afterward.
This process does not currently assign great locations to the macros after
redefining them. Some work is needed to also remember the original locations
and get the line_maps instance in the right state (since, like all other
data structures, the line_maps instance is also reset after restoring a PCH).
This patch addresses a more pressing issue, which is that we ICE in some
cases since GCC 11, hitting an assert in line-maps.cc. It happens if the
first line encountered after the PCH restore requires an LC_RENAME map, such
as will happen if the line is sufficiently long. This is much easier to
fix, since we just need to call linemap_line_start before asking libcpp to
redefine the stored macros, instead of afterward, to avoid the unexpected
need for an LC_RENAME before an LC_ENTER has been seen.
gcc/c-family/ChangeLog:
PR preprocessor/105608
* c-pch.c (c_common_read_pch): Start a new line map before asking
libcpp to restore macros defined prior to reading the PCH, instead
of afterward.
gcc/testsuite/ChangeLog:
PR preprocessor/105608
* g++.dg/pch/line-map-1.C: New test.
* g++.dg/pch/line-map-1.Hs: New test.
* g++.dg/pch/line-map-2.C: New test.
* g++.dg/pch/line-map-2.Hs: New test.
Jason Merrill [Tue, 19 Dec 2023 21:12:02 +0000 (16:12 -0500)]
c++: xvalue array subscript [PR103185]
Normally we handle xvalue array subscripting with ARRAY_REF, but in this
case we weren't doing that because the operands were reversed. Handle that
case better.