Richard Biener [Tue, 21 Jan 2025 08:45:41 +0000 (09:45 +0100)]
tree-optimization/118569 - LC SSA broken after unrolling
The following amends the previous fix to mark all of the loop's BBs
as needing to be scanned for new LC PHI uses when its nesting parents
changed, noticing that one caller of fix_loop_placement was already
doing that. So the following moves this code into fix_loop_placement,
covering both callers now.
PR tree-optimization/118569
* cfgloopmanip.cc (fix_loop_placement): When the loop's
nesting parents changed, mark all blocks to be scanned
for LC PHI uses.
(fix_bb_placements): Remove code moved into fix_loop_placement.
This patch introduces support for LUTI2/LUTI4 ACLE for SVE2.
LUTI instructions are used for efficient table lookups with 2-bit
or 4-bit indices. LUTI2 reads indexed 8-bit or 16-bit elements from
the low 128 bits of the table vector using packed 2-bit indices,
while LUTI4 can read from the low 128 or 256 bits of the table
vector or from two table vectors using packed 4-bit indices.
These instructions fill the destination vector by copying elements
indexed by segments of the source vector, selected by the vector
segment index.
The changes include the addition of a new AArch64 option
extension "lut", __ARM_FEATURE_LUT preprocessor macro, definitions
for the new LUTI instruction shapes, and implementations of the
svluti2 and svluti4 builtins.
gcc/ChangeLog:
* config/aarch64/aarch64-c.cc
(aarch64_update_cpp_builtins): Add new flag TARGET_LUT.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(struct luti_base): Shape for lut intrinsics.
(SHAPE): Specializations for lut shapes for luti2 and luti4.
* config/aarch64/aarch64-sve-builtins-shapes.h: Declare lut
intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.cc
(class svluti_lane_impl): Define expand for lut intrinsics.
(FUNCTION): Define expand for lut intrinsics.
* config/aarch64/aarch64-sve-builtins-sve2.def
(REQUIRED_EXTENSIONS): Declare lut intrinsics behind lut flag.
(svluti2_lane): Define intrinsic behind flag.
(svluti4_lane): Define intrinsic behind flag.
* config/aarch64/aarch64-sve-builtins-sve2.h: Declare lut
intrinsics.
* config/aarch64/aarch64-sve-builtins.cc
(TYPES_bh_data): New type for byte and halfword.
(bh_data): Type array for byte and halfword.
(h_data): Type array for halfword.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve_luti<LUTI_BITS><mode>): Instruction patterns for
lut intrinsics.
* config/aarch64/iterators.md: Iterators and attributes for lut
intrinsics.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: New test
macro.
* lib/target-supports.exp: Add lut flag to the for loop.
* gcc.target/aarch64/sve/acle/general-c/lut_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/lut_2.c: New test.
* gcc.target/aarch64/sve/acle/general-c/lut_3.c: New test.
* gcc.target/aarch64/sve/acle/general-c/lut_4.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_bf16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_s16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_s8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_u16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti2_u8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_bf16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_bf16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_f16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_f16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_s8.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u16.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u16_x2.c: New test.
* gcc.target/aarch64/sve2/acle/asm/luti4_u8.c: New test.
Simon Martin [Tue, 21 Jan 2025 12:31:41 +0000 (13:31 +0100)]
c++: Don't ICE in build_class_member_access_expr during error recovery [PR118225]
The invalid case in this PR trips on an assertion in
build_class_member_access_expr that build_base_path would never return
an error_mark_node, which is actually incorrect if the object involves a
tree with an error_mark_node DECL_INITIAL, like here.
This patch changes the assert to not fire if an error has been reported.
PR c++/118225
gcc/cp/ChangeLog:
* typeck.cc (build_class_member_access_expr): Let errors that
have been reported go through.
Tamar Christina [Tue, 21 Jan 2025 10:29:08 +0000 (10:29 +0000)]
middle-end: use ncopies both when registering and reading masks [PR118273]
When registering masks for a SIMD clone we end up using nmasks instead of
nvectors, where nmasks seems to compute the number of input masks required for
the call given the current simdlen.
This is however wrong as vect_record_loop_mask wants to know how many masks you
want to create from the given vectype. i.e. which level of rgroups to create.
This ends up mismatching with vect_get_loop_mask which uses nvectors and if the
return type is narrower than the input types there will be a mismatch which
causes us to try to read from the given rgroup. It only happens to work if the
function had an additional argument that's wider or if all elements and return
types are the same size.
This fixes it by using nvectors during registration as well, which has already
taken into account SLP and VF.
gcc/ChangeLog:
PR middle-end/118273
* tree-vect-stmts.cc (vectorizable_simd_clone_call): Use nvectors when
doing mask registrations.
gcc/testsuite/ChangeLog:
PR middle-end/118273
* gcc.target/aarch64/vect-simd-clone-4.c: New test.
Tamar Christina [Tue, 21 Jan 2025 10:27:13 +0000 (10:27 +0000)]
aarch64: Drop ILP32 from default elf multilibs after deprecation
Following the deprecation of ILP32, *-elf builds now fail due to -Werror on the
deprecation warning. This is because on embedded builds ILP32 is part of the
default multilib.
This patch removes it from the default target as the build would fail anyway.
gcc/ChangeLog:
* config.gcc (aarch64-*-elf): Drop ILP32 from default multilibs.
Lulu Cheng [Tue, 7 Jan 2025 04:00:12 +0000 (12:00 +0800)]
LoongArch: Implement target pragma.
The target pragmas defined correspond to the target function attributes.
This implementation is derived from AArch64.
gcc/ChangeLog:
* config/loongarch/loongarch-protos.h
(loongarch_reset_previous_fndecl): Add function declaration.
(loongarch_save_restore_target_globals): Likewise.
(loongarch_register_pragmas): Likewise.
* config/loongarch/loongarch-target-attr.cc
(loongarch_option_valid_attribute_p): Optimize the processing
of attributes.
(loongarch_pragma_target_parse): New function.
(loongarch_register_pragmas): Likewise.
* config/loongarch/loongarch.cc
(loongarch_reset_previous_fndecl): New function.
(loongarch_set_current_function): When the old_tree is the same
as the new_tree, the rules for using registers, etc.,
are set according to the option values to ensure that the
pragma can be processed correctly.
* config/loongarch/loongarch.h (REGISTER_TARGET_PRAGMAS):
Define macro.
* doc/extend.texi: Supplemental documentation.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/arch-func-attr-1.c: Add '#pragma'.
* gcc.target/loongarch/cmodel-func-attr-1.c: Likewise.
* gcc.target/loongarch/lasx-func-attr-1.c: Likewise.
* gcc.target/loongarch/lsx-func-attr-1.c: Likewise.
* gcc.target/loongarch/strict_align-func-attr-1.c: Likewise.
* gcc.target/loongarch/strict_align-func-attr-2.c: Likewise.
* gcc.target/loongarch/vector-func-attr-1.c: Likewise.
* gcc.target/loongarch/arch-pragma-attr-1.c: Likewise.
* gcc.target/loongarch/cmodel-pragma-attr-1.c: New test.
* gcc.target/loongarch/lasx-pragma-attr-1.c: New test.
* gcc.target/loongarch/lasx-pragma-attr-2.c: New test.
* gcc.target/loongarch/lsx-pragma-attr-1.c: New test.
* gcc.target/loongarch/lsx-pragma-attr-2.c: New test.
* gcc.target/loongarch/strict_align-pragma-attr-1.c: New test.
* gcc.target/loongarch/strict_align-pragma-attr-2.c: New test.
* gcc.target/loongarch/vector-pragma-attr-1.c: New test.
* gcc.target/loongarch/pragma-push-pop.c: New test.
* attr-urls.def: Regenerate.
* config.gcc: Add loongarch-target-attr.o to extra_objs.
* config/loongarch/loongarch-protos.h
(loongarch_option_valid_attribute_p): Function declaration.
(loongarch_option_override_internal): Likewise.
* config/loongarch/loongarch.cc
(loongarch_option_override_internal): Delete the modifications
to target_option_default_node and target_option_current_node.
(loongarch_set_current_function): Add annotation information.
(loongarch_option_override): Add assignment operations to
target_option_default_node and target_option_current_node.
(TARGET_OPTION_VALID_ATTRIBUTE_P): Define.
* config/loongarch/t-loongarch: Add compilation of target file
loongarch-target-attr.o.
* doc/extend.texi: Add description information of LoongArch
Function Attributes.
* config/loongarch/loongarch-target-attr.cc: New file.
gcc/testsuite/ChangeLog:
* gcc.target/loongarch/arch-func-attr-1.c: New test.
* gcc.target/loongarch/cmodel-func-attr-1.c: New test.
* gcc.target/loongarch/lasx-func-attr-1.c: New test.
* gcc.target/loongarch/lasx-func-attr-2.c: New test.
* gcc.target/loongarch/lsx-func-attr-1.c: New test.
* gcc.target/loongarch/lsx-func-attr-2.c: New test.
* gcc.target/loongarch/strict_align-func-attr-1.c: New test.
* gcc.target/loongarch/strict_align-func-attr-2.c: New test.
* gcc.target/loongarch/vector-func-attr-1.c: New test.
* gcc.target/loongarch/attr-check-error-message.c: New test.
Simon Martin [Tue, 21 Jan 2025 09:11:12 +0000 (10:11 +0100)]
testsuite: Fix test failing with -fimplicit-constexpr [PR118277]
While testing an unrelated C++ patch with "make check-c++-all", I
noticed that r15-6760-g38a13ea4117b96 added a test case that fails with
-fimplicit-constexpr.
The problem is that this test unconditionally expects an error stating
that a non-constexpr function is called, but that function is
auto-magically constexpr'd under -fimplicit-constexpr.
As suggested by Jakub, this patch simply passes -fno-implicit-constexpr
in that test.
Alfie Richards [Thu, 9 Jan 2025 09:45:32 +0000 (09:45 +0000)]
Add warning for non-spec compliant FMV in AArch64
This patch adds a warning when FMV is used for AArch64.
The reasoning for this is that the ACLE [1] spec for FMV has diverged
significantly from the current implementation and we want to prevent
potential future compatibility issues.
There is a patch for an ACLE compliant version of target_version and
target_clone in progress but it won't make gcc-15.
This has been bootstrapped and regression tested for AArch64.
Is this okay for master and backport to gcc-14?
Jakub Jelinek [Tue, 21 Jan 2025 08:15:53 +0000 (09:15 +0100)]
c++: Speed up compilation of large char array initializers when not using #embed
The following patch (again, on top of the #embed patchset)
attempts to optimize compilation of large {{{,un}signed ,}char,std::byte}
array initializers when not using #embed in the source.
Unlike the C patch which is done during the parsing of initializers this
is done when lexing tokens into an array, because C++ lexes all tokens
upfront and so by the time we parse the initializers we already have 16
bytes per token allocated (i.e. 32 extra compile time memory bytes per
one byte in the array).
The drawback is again that it can result in worse locations for diagnostics
(-Wnarrowing, -Wconversion) when initializing signed char arrays with values
128..255. Not really sure what to do about this though unlike the C case,
the locations would need to be preserved through reshape_init* and perhaps
till template instantiation.
For #embed, there is just a single location_t (could be range of the
directive), for diagnostics perhaps we could extend it to say byte xyz of
the file embedded here or something like that, but the optimization done by
this patch, either we'd need to bump the minimum limit at which to try it,
or say temporarily allocate a location_t array for each byte and then clear
it when we no longer need it or something.
I've been using the same testcases as for C, with #embed of 100'000'000
bytes:
time ./cc1plus -quiet -O2 -o test4a.s2 test4a.c
real 0m0.972s
user 0m0.578s
sys 0m0.195s
with xxd -i alternative of the same data without this patch it consumed
around 13.2GB of RAM and
time ./cc1plus -quiet -O2 -o test4b.s4 test4b.c
real 3m47.968s
user 3m41.907s
sys 0m5.015s
and the same with this patch it consumed around 3.7GB of RAM and
time ./cc1plus -quiet -O2 -o test4b.s3 test4b.c
real 0m24.772s
user 0m23.118s
sys 0m1.495s
2025-01-21 Jakub Jelinek <jakub@redhat.com>
* parser.cc (cp_lexer_new_main): Attempt to optimize large sequences
of CPP_NUMBER with int type and values 0-255 separated by CPP_COMMA
into CPP_EMBED with RAW_DATA_CST u.value.
Jakub Jelinek [Tue, 21 Jan 2025 08:14:01 +0000 (09:14 +0100)]
c, c++: Return 1 for __has_builtin(__builtin_va_arg) and __has_builtin(__builtin_c23_va_start)
The Linux kernel uses its own copy of stdarg.h.
Now, before GCC 15, our stdarg.h had
#if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
#define va_start(v, ...) __builtin_va_start(v, 0)
#else
#define va_start(v,l) __builtin_va_start(v,l)
#endif
va_start definition but GCC 15 has:
#if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
#define va_start(...) __builtin_c23_va_start(__VA_ARGS__)
#else
#define va_start(v,l) __builtin_va_start(v,l)
#endif
I wanted to suggest to the kernel people during their porting to C23
that they'd better use C23 compatible va_start macro definition,
but to make it portable, I think they really want something like
#if defined __STDC_VERSION__ && __STDC_VERSION__ > 201710L
#define va_start(v, ...) __builtin_va_start(v, 0)
#ifdef __has_builtin
#if __has_builtin(__builtin_c23_va_start)
#undef va_start
#define va_start(...) __builtin_c23_va_start(__VA_ARGS__)
#endif
#endif
#else
#define va_start(v,l) __builtin_va_start(v,l)
#endif
or so (or with >= 202311L), as GCC 13-14 and clang don't support
__builtin_c23_va_start (yet?) and one gets better user experience with
that.
Except it seems __has_builtin(__builtin_c23_va_start) doesn't actually work:
it works for most of the stdarg.h __builtin_va_*, but doesn't work for
__builtin_va_arg (neither C nor C++) and didn't work for
__builtin_c23_va_start if it was available.
The following patch wires __has_builtin for those.
2025-01-21 Jakub Jelinek <jakub@redhat.com>
gcc/c/
* c-decl.cc (names_builtin_p): Return 1 for RID_C23_VA_START and
RID_VA_ARG.
gcc/cp/
* cp-objcp-common.cc (names_builtin_p): Return 1 for RID_VA_ARG.
gcc/testsuite/
* c-c++-common/cpp/has-builtin-4.c: New test.
Jakub Jelinek [Tue, 21 Jan 2025 08:12:21 +0000 (09:12 +0100)]
c++: Handle RAW_DATA_CST in add_list_candidates [PR118532]
This is the second bug discovered today with the
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673945.html
hack but then turned into proper testcases where embed-2[23].C FAILed
since introduction of optimized #embed support and the others when
optimizing large C++ initializers using RAW_DATA_CST.
The add_list_candidates problem is the same as with
make_tree_vector_from_ctor, unfortunately it can't call that
function because it can have those additional artificial arguments
that need to be pushed earlier.
When working on the patch, I've also noticed an error where we didn't
know how to dump RAW_DATA_CST, so I've added support for that too.
2025-01-21 Jakub Jelinek <jakub@redhat.com>
PR c++/118532
* call.cc (add_list_candidates): Handle RAW_DATA_CST among init_list
elts.
* error.cc (dump_expr_init_vec): Handle RAW_DATA_CST among v elts.
* g++.dg/cpp/embed-22.C: New test.
* g++.dg/cpp/embed-23.C: New test.
* g++.dg/cpp0x/pr118532.C: New test.
* g++.dg/cpp2a/explicit20.C: New test.
Nathaniel Shead [Sun, 19 Jan 2025 04:26:03 +0000 (15:26 +1100)]
c++/modules: Check linkage of structured binding decls
When looking at PR c++/118513 I noticed that we don't currently check
the linkage of structured binding declarations in modules. This patch
adds those checks, and corrects decl_linkage to properly recognise
structured binding declarations as potentially having linkage.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_decomposition_declaration): Check linkage
of structured bindings in modules.
* tree.cc (decl_linkage): Structured bindings don't necessarily
have no linkage.
Nathaniel Shead [Mon, 20 Jan 2025 11:09:22 +0000 (22:09 +1100)]
c++/modules: Handle mismatching TYPE_CANONICAL when deduping partial specs [PR118101]
In r15-4862 we ensured that merging a partial specialisation would
properly update its TYPE_CANONICAL. However, this confuses the deduping
mechanism, since the canonical type has updated out from under it,
causing is_matching_decl to crash when seeing the equivalent types with
different TYPE_CANONICAL.
This patch solves the issue by forcing structural equality checking for
this case; this way mismatching TYPE_CANONICAL doesn't cause issues, but
we still can handle the case that the types are legitimately different.
PR c++/118101
gcc/cp/ChangeLog:
* module.cc (trees_in::decl_value): Use structural equality when
deduping partial specs with mismatching canonical types.
gcc/testsuite/ChangeLog:
* g++.dg/modules/partial-7.h: New test.
* g++.dg/modules/partial-7_a.C: New test.
* g++.dg/modules/partial-7_b.C: New test.
* g++.dg/modules/partial-7_c.C: New test.
[PR118560][LRA]: Fix typo in checking secondary memory mode for the reg class
The patch for PR118067 wrongly checked hard reg set subset. It worked for
the equal sets as in PR118067. But it was wrong in other cases as in
PR118560 (inordinate compile time).
gcc/ChangeLog:
PR target/118560
* lra-constraints.cc (invalid_mode_reg_p): Exchange args in
hard_reg_set_subset_p call.
Jeff Law [Mon, 20 Jan 2025 22:05:34 +0000 (15:05 -0700)]
[PR target/116256] Adjust expected output in a couple testcases
I've had a long standing TODO to review the RISC-V testsuite regressions from
enabling the late-combine pass (pr116256). I adjusted a few cases months ago,
this adjusts a couple more where it looks like the right thing to do.
All that's left after this are the vls/dup-? tests which regress in meaningful
ways and I'm still investigating reasonable approaches to fix them (they play
into the whole mvconst_internal pattern situation), late-combine isn't doing
anything wrong.
Jeff Law [Mon, 20 Jan 2025 21:50:57 +0000 (14:50 -0700)]
[PR target/114442] Add reservations for all insn types to xiangshan-nanhu model
The RISC-V backend has checks to verify that every used insn has an associated
type and that every insn type maps to some reservation in the DFA model. If
either test fails we ICE.
With the cpu/isa allowed to vary independently from the tune/scheduler model,
it's entirely possible (in fact trivial) to trigger those kinds of ICEs.
This patch "fixes" the ICEs for xiangshan-nanhu by throwing every unknown insn
type into a special bucket. I wouldn't be surprised if a few of them are
implemented (like rotates, as the chip seems to have other bitmanip extensions).
But I know nothing about this design and the DFA author hasn't responded to
requests to update the DFA in ~6 months.
This should dramatically reduce the number of ICEs in the testsuite if someone
were to turn on xiangshan-nanhu scheduling.
Not strictly a regression, but a bugfix and highly isolated to the
xiangshan-nanhu tuning in the RISC-V backend. So I'm gating this into gcc-15,
assuming pre-commit doesn't balk.
PR target/114442
gcc/
* config/riscv/xiangshan.md: Add missing insn types to a
new dummy insn reservation.
Jeff Law [Mon, 20 Jan 2025 21:35:59 +0000 (14:35 -0700)]
[PR target/116256] Fix latent regression in pattern to associate arithmetic to simplify constants
This is something I spotted working on an outstanding issue with pr116256.
It's a latent regression. I'm reasonably sure that with some effort I could
find a testcase that would represent a regression, probably just by adjusting
the constant index in "f" within gcc.c-torture/execute/index-1.c.
We have two define_insn_and_split patterns that potentially reassociate shifts
and adds to make a constant term cheaper to synthesize. The split code for
these two patterns assumes that if the left shifted constant is cheaper than
the right shifted constant then the left shifted constant can be loaded with a
trivial set. That is not always the case -- and we'll ICE.
This patch simplifies the matching condition so that it only matches when the
constant can be loaded with a single instruction.
Tested in my tester on rv32 and rv64. Will wait for precommit CI to render a
verdict.
Jeff
PR target/116256
gcc/
* config/riscv/riscv.md (reassociating constant addition): Adjust
condition to avoid creating an unrecognizable insn.
Denis Chertykov [Mon, 20 Jan 2025 20:27:04 +0000 (00:27 +0400)]
[PR117868][LRA]: Restrict the reuse of spill slots
This is an LRA bug derived from reuse spilling slots after frame pointer spilling.
The slot was created for QImode (1 byte) and it was reused after spilling of the
frame pointer for TImode register (16 bytes long) and it overlaps other slots.
Wrong things happened while `lra_spill ()'
---------------------------- part of lra-spills.cc ----------------------------
n = assign_spill_hard_regs (pseudo_regnos, n);
slots_num = 0;
assign_stack_slot_num_and_sort_pseudos (pseudo_regnos, n); <--- first call ---
for (i = 0; i < n; i++)
if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
assign_mem_slot (pseudo_regnos[i]);
if ((n2 = lra_update_fp2sp_elimination (pseudo_regnos)) > 0)
{
/* Assign stack slots to spilled pseudos assigned to fp. */
assign_stack_slot_num_and_sort_pseudos (pseudo_regnos, n2); <--- second call ---
for (i = 0; i < n2; i++)
if (pseudo_slots[pseudo_regnos[i]].mem == NULL_RTX)
assign_mem_slot (pseudo_regnos[i]);
}
------------------------------------------------------------------------------
In a first call of `assign_stack_slot_num_and_sort_pseudos(...)' LRA allocates slot #17
for r93 (QImode - 1 byte).
In a second call of `assign_stack_slot_num_and_sort_pseudos(...)' LRA reuses slot #17 for
r114 (TImode - 16 bytes).
It's wrong. We can't reuse 1 byte slot #17 for 16 bytes register.
The code in this patch reuses slots only when they have no allocated memory, or only
for equal or smaller registers with equal or smaller alignment.
Also, a small fix for the debugging output of the slot width:
print the slot size as the width, not 0 as the size of (mem/c:BLK (...)).
PR rtl-optimization/117868
gcc/
* lra-spills.cc (assign_stack_slot_num_and_sort_pseudos): Reuse slots
only without allocated memory or only with equal or smaller registers
with equal or smaller alignment.
(lra_spill): Print slot size as width.
So if the scalar loop has a {0, +, 1} iv i, idx = i % vf.
Despite this wraparound, the vectoriser pretends that the D.anon
accesses are linear. It records the .OMP_SIMD_LANE's second argument
(val) in the data_reference aux field (-1 - val) and then copies this
to the stmt_vec_info simd_lane_access_p field (val + 1).
vectorizable_load and vectorizable_store use simd_lane_access_p
to detect accesses of this form and suppress the vector pointer
increments that would be used for genuine linear accesses.
The difference in this PR is that the reduction is conditional,
and so the store back to D.anon is recognised as a conditional
store pattern. simd_lane_access_p was not being copied across
from the original stmt_vec_info to the pattern stmt_vec_info,
meaning that it was vectorised as a normal linear store.
aarch64: Fix invalid subregs in xorsign [PR118501]
In the testcase, we try to use xorsign on:
(subreg:DF (reg:TI R) 8)
i.e. the highpart of the TI. xorsign wants to take a V2DF
paradoxical subreg of this, which is rightly rejected as a direct
operation. In cases like this, we need to force the highpart into
a fresh register first.
gcc/
PR target/118501
* config/aarch64/aarch64.md (@xorsign<mode>3): Use
force_lowpart_subreg.
gcc/testsuite/
PR target/118501
* gcc.c-torture/compile/pr118501.c: New test.
Iain Buclaw [Mon, 20 Jan 2025 19:01:03 +0000 (20:01 +0100)]
d: Fix failing test with 32-bit compiler [PR114434]
Since the introduction of gdc.test/runnable/test23514.d, it has exposed
incorrect compilation when adding a 64-bit constant to a link-time
address. The current cast to size_t causes a loss of precision, which
can result in incorrect compilation.
PR d/114434
gcc/d/ChangeLog:
* expr.cc (ExprVisitor::visit (PtrExp *)): Get the offset as a
dinteger_t rather than a size_t.
(ExprVisitor::visit (SymOffExp *)): Likewise.
Harald Anlauf [Sun, 19 Jan 2025 20:06:56 +0000 (21:06 +0100)]
Fortran: do not copy back for parameter actual arguments [PR81978]
When an array is packed for passing as an actual argument, and the array
has the PARAMETER attribute (i.e., it is a named constant that can reside
in read-only memory), do not copy back (unpack) from the temporary.
PR fortran/81978
gcc/fortran/ChangeLog:
* trans-array.cc (gfc_conv_array_parameter): Do not copy back data
if actual array parameter has the PARAMETER attribute.
* trans-expr.cc (gfc_conv_subref_array_arg): Likewise.
Jakub Jelinek [Mon, 20 Jan 2025 17:00:43 +0000 (18:00 +0100)]
c++: Handle RAW_DATA_CST in make_tree_vector_from_ctor [PR118528]
This is the first bug discovered today with the
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673945.html
hack but then turned into proper testcases where embed-21.C FAILed
since introduction of optimized #embed support and the other when
optimizing large C++ initializers using RAW_DATA_CST.
The problem is that the C++ FE calls make_tree_vector_from_ctor
and uses that as arguments vector for deduction guide handling.
The call.cc code isn't prepared to handle RAW_DATA_CST just about
everywhere, so I think it is safer to make sure RAW_DATA_CST only
appears in CONSTRUCTOR_ELTS and nowhere else.
Thus, the following patch expands the RAW_DATA_CSTs from initializers
into multiple INTEGER_CSTs in the returned vector.
2025-01-20 Jakub Jelinek <jakub@redhat.com>
PR c++/118528
* c-common.cc (make_tree_vector_from_ctor): Expand RAW_DATA_CST
elements from the CONSTRUCTOR to individual INTEGER_CSTs.
* g++.dg/cpp/embed-21.C: New test.
* g++.dg/cpp2a/class-deduction-aggr16.C: New test.
Andrew Pinski [Mon, 20 Jan 2025 00:07:10 +0000 (16:07 -0800)]
inline: Purge the abnormal edges as needed in fold_marked_statements [PR118077]
While fixing PR target/117665, I had noticed that fold_marked_statements
would not purge the abnormal edges which could not be taken any more due
to folding a call (devirtualization or simplification of a [target] builtin).
Devirtualization could also cause a call that used to be able to have an
abnormal edge to become one not needing one, so this was needed for GCC 15.
As reported in PR118185, std::ranges::clamp does not correctly forward
the projected value to the comparator. Add the missing forward.
libstdc++-v3/ChangeLog:
PR libstdc++/118185
PR libstdc++/100249
* include/bits/ranges_algo.h (__clamp_fn): Correctly forward the
projected value to the comparator.
* testsuite/25_algorithms/clamp/118185.cc: New test.
Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com> Reviewed-by: Patrick Palka <ppalka@redhat.com> Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
There's a discrepancy in SLP vs non-SLP vectorization in that SLP build
does not handle plain SSA copies (which should have been eliminated
earlier). But this now bites back since non-SLP happily handles them,
causing a regression with --param vect-force-slp=1 which is now default,
resulting in a big performance regression in 456.hmmer.
So the following restores parity between SLP and non-SLP here, deferring
the missed copy elimination to later (PR118565).
Xi Ruoyao [Tue, 14 Jan 2025 09:26:04 +0000 (17:26 +0800)]
LoongArch: Improve reassociation for bitwise operation and left shift [PR 115921]
For things like
(x | 0x101) << 11
It's obvious to write:
ori $r4,$r4,257
slli.d $r4,$r4,11
But we are actually generating something insane:
lu12i.w $r12,524288>>12 # 0x80000
ori $r12,$r12,2048
slli.d $r4,$r4,11
or $r4,$r4,$r12
jr $r1
It's because the target-independent canonicalization was written before
we had all the RISC targets where loading an immediate may need
multiple instructions. So for these targets we need to handle this in
the target code.
We do the reassociation on our own (i.e. reverting the
target-independent reassociation) if "(reg [&|^] mask) << shamt" does
not need to load mask into a register, and either:
- (mask << shamt) needs to be loaded into a register, or
- shamt is a const_immalsl_operand, so the outer shift may be further
combined with an add.
gcc/ChangeLog:
PR target/115921
* config/loongarch/loongarch-protos.h
(loongarch_reassoc_shift_bitwise): New function prototype.
* config/loongarch/loongarch.cc
(loongarch_reassoc_shift_bitwise): Implement.
* config/loongarch/loongarch.md
(*alslsi3_extend_subreg): New define_insn_and_split.
(<any_bitwise:optab>_shift_reverse<X:mode>): New
define_insn_and_split.
(<any_bitwise:optab>_alsl_reversesi_extended): New
define_insn_and_split.
(zero_extend_ashift): Remove as it's just a special case of
and_shift_reversedi, and it does not make too much sense to
write "alsl.d rd,rs,r0,shamt" instead of "slli.d rd,rs,shamt".
(bstrpick_alsl_paired): Remove as it is already done by
splitting and_shift_reversedi into and + ashift first, then
late combining the ashift and a further add.
gcc/testsuite/ChangeLog:
PR target/115921
* gcc.target/loongarch/bstrpick_alsl_paired.c (scan-rtl-dump):
Scan for and_shift_reversedi instead of the removed
bstrpick_alsl_paired.
* gcc.target/loongarch/bitwise-shift-reassoc.c: New test.
Xi Ruoyao [Thu, 5 Sep 2024 09:53:41 +0000 (17:53 +0800)]
LoongArch: Simplify using bstr{ins,pick} instructions for and
For bstrins, we can merge it into and<mode>3 instead of having a
separate define_insn.
For bstrpick, we can use the constraints to ensure the first source
register and the destination register are the same hardware register,
instead of emitting a move manually.
This will simplify the next commit where we'll reassociate bitwise
and left shift for better code generation.
gcc/ChangeLog:
* config/loongarch/constraints.md (Yy): New define_constraint.
* config/loongarch/loongarch.cc (loongarch_print_operand):
For "%M", output the index of bits to be used with
bstrins/bstrpick.
* config/loongarch/predicates.md (ins_zero_bitmask_operand):
Exclude low_bitmask_operand as for low_bitmask_operand it's
always better to use bstrpick instead of bstrins.
(and_operand): New define_predicate.
* config/loongarch/loongarch.md (any_or): New
define_code_iterator.
(bitwise_operand): New define_code_attr.
(*<optab:any_or><mode:GPR>3): New define_insn.
(*and<mode:GPR>3): New define_insn.
(<optab:any_bitwise><mode:X>3): New define_expand.
(and<mode>3_extended): Remove, replaced by the 3rd alternative
of *and<mode:GPR>3.
(bstrins_<mode>_for_mask): Remove, replaced by the 4th
alternative of *and<mode:GPR>3.
(*<optab:any_bitwise>si3_internal): Remove, already covered by
the *<optab:any_or><mode:GPR>3 and *and<mode:GPR>3 templates.
Richard Biener [Mon, 20 Jan 2025 10:50:53 +0000 (11:50 +0100)]
tree-optimization/118552 - failed LC SSA update after unrolling
When unrolling changes the nesting relationship of loops we fail to
mark blocks as needing to change for the LC SSA update. Specifically
the LC SSA PHI on a former inner loop exit might be misplaced
if that loop becomes a sibling of its outer loop.
PR tree-optimization/118552
* cfgloopmanip.cc (fix_loop_placement): Properly mark
exit source blocks as to be scanned for LC SSA update when
the loops nesting relationship changed.
(fix_loop_placements): Adjust.
(fix_bb_placements): Likewise.
Thomas Schwinge [Fri, 17 Jan 2025 20:45:42 +0000 (21:45 +0100)]
nvptx: Gracefully handle '-mptx=3.1' if neither sm_30 nor sm_35 multilib variant is built
For example, for GCC/nvptx built with '--with-arch=sm_52' (current default)
and '--without-multilib-list', neither a sm_30 nor a sm_35 multilib variant
is built, and thus no '-mptx=3.1' sub-variant either. Such a configuration
is possible as of commit 86b3a7532d56f74fcd1c362f2da7f95e8cc4e4a6
"nvptx: Support '--with-multilib-list'", but currently results in the
following bogus behavior:
The latter two '.' are unexpected; linking OpenMP/nvptx offloading code
like this fails with: 'unresolved symbol __nvptx_uni', for example.
Instead of '.', the latter two should print 'mgomp', too. To achieve that,
we must not set up the '-mptx=3.1' multilib axis if no '-mptx=3.1'
sub-variant is built.
gcc/
* config/nvptx/t-nvptx (MULTILIB_OPTIONS): Don't add 'mptx=3.1' if
neither sm_30 nor sm_35 multilib variant is built.
Jakub Jelinek [Mon, 20 Jan 2025 09:26:49 +0000 (10:26 +0100)]
tree, c++: Consider TARGET_EXPR invariant like SAVE_EXPR [PR118509]
My October PR117259 fix to get_member_function_from_ptrfunc to use a
TARGET_EXPR rather than SAVE_EXPR unfortunately caused some regressions as
well as the following testcase shows.
What happens is that
get_member_function_from_ptrfunc -> build_base_path calls save_expr,
so since the PR117259 change in many cases it will call save_expr on
a TARGET_EXPR. And, for some strange reason a TARGET_EXPR is not considered
an invariant, so we get a SAVE_EXPR wrapped around the TARGET_EXPR.
That SAVE_EXPR <TARGET_EXPR <...>> gets initially added only to the second
operand of ?:, so at that point it would still work fine during expansion.
But unfortunately an expression with that subexpression is handed to the
caller also through *instance_ptrptr = instance_ptr; and gets evaluated
once again when computing the first argument to the method.
So, essentially, we end up with
(TARGET_EXPR <D.2907, ...>, (... ? ... SAVE_EXPR <TARGET_EXPR <D.2907, ...>
... : ...)) (... SAVE_EXPR <TARGET_EXPR <D.2907, ...> ..., ...);
and while D.2907 is initialized during gimplification in the code dominating
everything that uses it, the extra temporary created for the SAVE_EXPR
is initialized only conditionally (if the ?: condition is true) but then
used unconditionally, so we get
pmf-4.C: In function ‘void foo(C, B*)’:
pmf-4.C:12:11: warning: ‘<anonymous>’ may be used uninitialized [-Wmaybe-uninitialized]
12 | (y->*x) ();
| ~~~~~~~~^~
pmf-4.C:12:11: note: ‘<anonymous>’ was declared here
12 | (y->*x) ();
| ~~~~~~~~^~
diagnostic and wrong-code issue too.
The following patch fixes it by considering a TARGET_EXPR invariant
for SAVE_EXPR purposes the same as SAVE_EXPR is. Really creating another
temporary for it is just a waste of the IL.
Unfortunately I had to tweak the omp matching code to be able to accept
TARGET_EXPR the same as SAVE_EXPR.
2025-01-20 Jakub Jelinek <jakub@redhat.com>
PR c++/118509
gcc/
* tree.cc (tree_invariant_p_1): Return true for TARGET_EXPR too.
gcc/c-family/
* c-omp.cc (c_finish_omp_for): Handle TARGET_EXPR in first operand
of COMPOUND_EXPR incr the same as SAVE_EXPR.
gcc/testsuite/
* g++.dg/expr/pmf-4.C: New test.
Jakub Jelinek [Mon, 20 Jan 2025 09:24:18 +0000 (10:24 +0100)]
tree-ssa-dce: Fix calloc handling [PR118224]
As reported by Dimitar, this should have been a multiplication, but wasn't
caught because in the test (~(__SIZE_TYPE__) 0) / 2 is the largest accepted
size and so adding 3 to it also resulted in "overflow".
The following patch adds one subtest to really verify it is a multiplication
and fixes the operation.
2025-01-20 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/118224
* tree-ssa-dce.cc (is_removable_allocation_p): Multiply a1 by a2
instead of adding it.
* gcc.target/s390/vector/vec-shift-10.c: New test.
* gcc.target/s390/vector/vec-shift-11.c: New test.
* gcc.target/s390/vector/vec-shift-12.c: New test.
* gcc.target/s390/vector/vec-shift-3.c: New test.
* gcc.target/s390/vector/vec-shift-4.c: New test.
* gcc.target/s390/vector/vec-shift-5.c: New test.
* gcc.target/s390/vector/vec-shift-6.c: New test.
* gcc.target/s390/vector/vec-shift-7.c: New test.
* gcc.target/s390/vector/vec-shift-8.c: New test.
* gcc.target/s390/vector/vec-shift-9.c: New test.
s390: arch15: Vector maximum/minimum: Add 128-bit integer support
For previous architectures, the max/min operation is emulated.
gcc/ChangeLog:
* config/s390/s390-builtins.def: Add 128-bit variants and remove
bool variants.
* config/s390/s390-builtin-types.def: Update accordingly.
* config/s390/s390.md: Emulate min/max for GPR.
* config/s390/vector.md: Add min/max patterns and emulate in
case of no VXE3.
gcc/testsuite/ChangeLog:
* gcc.target/s390/vector/vec-max-emu.c: New test.
* gcc.target/s390/vector/vec-min-emu.c: New test.
* gcc.target/s390/vxe3/vd-1.c: New test.
* gcc.target/s390/vxe3/vd-2.c: New test.
* gcc.target/s390/vxe3/vdl-1.c: New test.
* gcc.target/s390/vxe3/vdl-2.c: New test.
* gcc.target/s390/vxe3/vr-1.c: New test.
* gcc.target/s390/vxe3/vr-2.c: New test.
* gcc.target/s390/vxe3/vrl-1.c: New test.
* gcc.target/s390/vxe3/vrl-2.c: New test.
Add vector single element 128-bit integer support utilizing new
instructions vclzq and vctzq. Furthermore, add scalar 64-bit integer
support utilizing new instructions clzg and ctzg. For ctzg, also define
the resulting value if the input operand equals zero.
gcc/ChangeLog:
* config/s390/s390-builtins.def (s390_vec_cntlz): Add 128-bit
integer overloads.
(s390_vclzq): Add.
(s390_vec_cnttz): Add 128-bit integer overloads.
(s390_vctzq): Add.
* config/s390/s390-builtin-types.def: Update accordingly.
* config/s390/s390.h (CTZ_DEFINED_VALUE_AT_ZERO): Define.
* config/s390/s390.md (*clzg): New insn.
(clztidi2): Exploit new insn for target arch15.
(ctzdi2): New insn.
* config/s390/vector.md (clz<mode>2): Extend modes including
128-bit integer.
(ctz<mode>2): Likewise.
* gcc.target/s390/vxe3/veval-1.c: New test.
* gcc.target/s390/vxe3/veval-2.c: New test.
* gcc.target/s390/vxe3/veval-3.c: New test.
* gcc.target/s390/vxe3/veval-4.c: New test.
* gcc.target/s390/vxe3/veval-5.c: New test.
* gcc.target/s390/vxe3/veval-6.c: New test.
* gcc.target/s390/vxe3/veval-7.c: New test.
* gcc.target/s390/vxe3/veval-8.c: New test.
* gcc.target/s390/vxe3/veval-9.c: New test.
* gcc.target/s390/llxa-1.c: New test.
* gcc.target/s390/llxa-2.c: New test.
* gcc.target/s390/llxa-3.c: New test.
* gcc.target/s390/lxa-1.c: New test.
* gcc.target/s390/lxa-2.c: New test.
* gcc.target/s390/lxa-3.c: New test.
* gcc.target/s390/lxa-4.c: New test.
Currently TOINTVEC maps scalar mode TI/TF to vector mode V1TI/V1TF,
respectively. As a consequence we may end up with patterns with a
mixture of scalar and vector modes as e.g. for
This is cumbersome since gen_vec_sel0ti() and gen_vec_sel0tf() require
that operands 3 and 4 are of vector mode whereas the remainder of
operands must be of scalar mode. Likewise for tointvec.
Fixed by staying scalar.
gcc/ChangeLog:
* config/s390/vector.md: Stay scalar for TOINTVEC/tointvec.
Kito Cheng [Wed, 15 Jan 2025 08:13:05 +0000 (16:13 +0800)]
RISC-V: Add sifive_vector.h
sifive_vector.h is a vendor-specific header that should be included
before using the SiFive vector intrinsics. For now it just includes
riscv_vector.h; we will separate the implementation by adding a new
pragma in the future.
Hongyu Wang [Fri, 17 Jan 2025 01:04:17 +0000 (09:04 +0800)]
i386: Fix wrong insn generated by shld/shrd ndd split [PR118510]
For the shld/shrd_ndd_2 insns, the splitter outputs a wrong pattern
that mixes a parallel for the clobber with the set. Use register_operand
as the destination and adjust the output template to fix this.
gcc/ChangeLog:
PR target/118510
* config/i386/i386.md (*x86_64_shld_ndd_2): Use register_operand
for operand[0] and adjust the output template to directly
generate ndd form shld pattern.
(*x86_shld_ndd_2): Likewise.
(*x86_64_shrd_ndd_2): Likewise.
(*x86_shrd_ndd_2): Likewise.
gcc/testsuite/ChangeLog:
PR target/118510
* gcc.target/i386/pr118510.c: New test.
Dimitar Dimitrov [Sat, 18 Jan 2025 18:19:43 +0000 (20:19 +0200)]
testsuite: Fixes for test case pr117546.c
This test fails on AVR.
Debugging the test on an x86 host, I noticed that u in function s
sometimes has the value 16128. The "t <= 3 * u" expression in the same
function results in signed integer overflow for targets with sizeof(int)=2.
Jakub Jelinek [Sat, 18 Jan 2025 20:50:23 +0000 (21:50 +0100)]
c++: Copy over further 2 flags for !TREE_PUBLIC in copy_linkage [PR118513]
The following testcase ICEs in import_export_decl.
When cp_finish_decomp handles std::tuple* using structural binding,
it calls copy_linkage to copy various VAR_DECL flags from the structured
binding base to the individual sb variables.
In this case the base variable is in anonymous union, so we call
constrain_visibility (..., VISIBILITY_ANON, ...) on it which e.g.
clears TREE_PUBLIC etc. (flags which copy_linkage copies) but doesn't
copy over DECL_INTERFACE_KNOWN/DECL_NOT_REALLY_EXTERN.
When cp_finish_decl calls determine_visibility on the individual sb
variables, those have !TREE_PUBLIC since copy_linkage and so nothing tries
to determine visibility and nothing sets DECL_INTERFACE_KNOWN and
DECL_NOT_REALLY_EXTERN.
Now, this isn't a big deal without modules, the individual variables are
var_finalized_p and so nothing really cares about missing
DECL_INTERFACE_KNOWN. But in the module case the variables are streamed
out and in and care about those bits.
The following patch is an attempt to copy over also those flags (but I've
limited it to the !TREE_PUBLIC case just in case). Other option would be
to call it unconditionally, or call constrain_visibility with
VISIBILITY_ANON for !TREE_PUBLIC (but are all !TREE_PUBLIC constrained
visibility) or do it only in the cp_finish_decomp case
after the copy_linkage call there.
2025-01-18 Jakub Jelinek <jakub@redhat.com>
PR c++/118513
* decl2.cc (copy_linkage): If not TREE_PUBLIC, also set
DECL_INTERFACE_KNOWN, assert it was set on decl and copy
DECL_NOT_REALLY_EXTERN flags.
* g++.dg/modules/decomp-3_a.H: New test.
* g++.dg/modules/decomp-3_b.C: New test.
Jeff Law [Sat, 18 Jan 2025 20:44:33 +0000 (13:44 -0700)]
[RISC-V][PR target/116308] Fix generation of initial RTL for atomics
While this wasn't originally marked as a regression, it almost certainly is
given that older versions of GCC would have used libatomic and would not have
ICE'd on this code.
Basically this is another case where we directly used simplify_gen_subreg when
we should have used gen_lowpart.
When I fixed a similar bug a while back I noted the code in question as needing
another looksie. I think at that time my brain saw the mixed modes (SI & QI)
and locked up. But the QI stuff is just the shift count, not some deeper
issue. So fixing is trivial.
We just replace the simplify_gen_subreg with a gen_lowpart and get on with our
lives.
Tested on rv64 and rv32 in my tester. Waiting on pre-commit testing for final
verdict.
PR target/116308
gcc/
* config/riscv/riscv.cc (riscv_lshift_subword): Use gen_lowpart
rather than simplify_gen_subreg.
Michal Jires [Thu, 16 Jan 2025 13:42:59 +0000 (14:42 +0100)]
Fix uniqueness of symtab_node::get_dump_name.
symtab_node::get_dump_name uses node order to identify nodes.
Order is no longer unique because of Incremental LTO patches.
This patch moves uid from cgraph_node node to symtab_node,
so get_dump_name can use uid instead and get back unique dump names.
In inlining passes, uid is replaced with more appropriate (more compact
for indexing) summary id.
Bootstrapped/regtested on x86_64-linux.
Ok for trunk?
gcc/ChangeLog:
* cgraph.cc (symbol_table::create_empty):
Move uid to symtab_node.
(test_symbol_table_test): Change expected dump id.
* cgraph.h (struct cgraph_node):
Move uid to symtab_node.
(symbol_table::register_symbol): Likewise.
* dumpfile.cc (test_capture_of_dump_calls):
Change expected dump id.
* ipa-inline.cc (update_caller_keys):
Use summary id instead of uid.
(update_callee_keys): Likewise.
* symtab.cc (symtab_node::get_dump_name):
Use uid instead of order.
Eric Botcazou [Sat, 18 Jan 2025 17:58:02 +0000 (18:58 +0100)]
Fix bootstrap failure on SPARC with -O3 -mcpu=niagara4
This is a regression present on the mainline only, but the underlying issue
has been latent for years: the compiler and the assembler disagree on the
support of the VIS 3B SIMD ISA, the former bundling it with VIS 3 but not
the latter. IMO the documentation is not very clear, so this patch just
aligns the compiler with the assembler.
Jin Ma [Sat, 18 Jan 2025 14:43:17 +0000 (07:43 -0700)]
[PR target/118357] RISC-V: Disable fusing vsetvl instructions by VSETVL_VTYPE_CHANGE_ONLY for XTheadVector.
In RVV 1.0, the instruction "vsetvli zero,zero,*" indicates that the
available vector length (avl) does not change. However, in XTheadVector,
this same instruction signifies that the avl should take the maximum value.
Consequently, when fusing vsetvl instructions, the optimization labeled
"VSETVL_VTYPE_CHANGE_ONLY" is disabled for XTheadVector.
PR target/118357
gcc/ChangeLog:
* config/riscv/riscv-vsetvl.cc (change_vtype_only_p): Always
return false for XTheadVector.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/xtheadvector/pr118357.c: New test.
Richard Biener [Fri, 17 Jan 2025 14:41:19 +0000 (15:41 +0100)]
tree-optimization/118529 - ICE with condition vectorization
On sparc we end up choosing vector(8) <signed-boolean:1> for the
condition but vector(2) int for the value of a COND_EXPR but we
fail to verify their shapes match and thus things go downhill.
This is a missed-optimization on the pattern recognition side
as well as unhandled vector decomposition in vectorizable_condition.
The following plugs just the observed ICE for now.
PR tree-optimization/118529
* tree-vect-stmts.cc (vectorizable_condition): Check the
shape of the vector and condition vector type are compatible.
Akram Ahmad [Fri, 17 Jan 2025 17:43:49 +0000 (17:43 +0000)]
AArch64: Use standard names for saturating arithmetic
This renames the existing {s,u}q{add,sub} instructions to use the
standard names {s,u}s{add,sub}3 which are used by IFN_SAT_ADD and
IFN_SAT_SUB.
The NEON intrinsics for saturating arithmetic and their corresponding
builtins are changed to use these standard names too.
Using the standard names for the instructions causes 32 and 64-bit
unsigned scalar saturating arithmetic to use the NEON instructions,
resulting in an additional (and inefficient) FMOV to be generated when
the original operands are in GP registers. This patch therefore also
restores the original behaviour of using the adds/subs instructions
in this circumstance.
Additional tests are written for the scalar and Adv. SIMD cases to
ensure that the correct instructions are used. The NEON intrinsics are
already tested elsewhere.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc: Expand iterators.
* config/aarch64/aarch64-simd-builtins.def: Use standard names
* config/aarch64/aarch64-simd.md: Use standard names, split insn
definitions on signedness of operator and type of operands.
* config/aarch64/arm_neon.h: Use standard builtin names.
* config/aarch64/iterators.md: Add VSDQ_I_QI_HI iterator to
simplify splitting of insn for unsigned scalar arithmetic.
Jakub Jelinek [Sat, 18 Jan 2025 08:14:27 +0000 (09:14 +0100)]
c++: Fix up find_array_ctor_elt RAW_DATA_CST handling [PR118534]
This is the third bug discovered today with the
https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673945.html
hack but then turned into proper testcases where embed-24.C FAILed
since introduction of optimized #embed support and the others when
optimizing large C++ initializers using RAW_DATA_CST.
find_array_ctor_elt already has RAW_DATA_CST support, but on the
following testcases it misses one case I've missed.
The CONSTRUCTORs in question went through the braced_list_to_string
optimization which can turn INTEGER_CST RAW_DATA_CST INTEGER_CST
into just larger RAW_DATA_CST covering even those 2 bytes around it
(if they appear there in the underlying RAW_DATA_OWNER).
With this optimization, RAW_DATA_CST can be the last CONSTRUCTOR_ELTS
elt in a CONSTRUCTOR, either the sole one or say preceeded by some
unrelated other elements. Now, if RAW_DATA_CST is the only one or
if there are no RAW_DATA_CSTs earlier in CONSTRUCTOR_ELTS, we can
trigger a bug in find_array_ctor_elt.
It has a smart optimization for the very common case where
CONSTRUCTOR_ELTS have indexes and index of the last elt is equal
to CONSTRUCTOR_NELTS (ary) - 1, then obviously we know there are
no RAW_DATA_CSTs before it and the indexes just go from 0 to nelts-1,
so when we care about any of those earlier indexes, we can just return i;
and not worry about anything.
Except it uses if (i < end) return i; rather than if (i < end - 1) return i;
For the latter cases, i.e. anything before the last elt, we know there
are no surprises and return i; is right. But for the if (i == end - 1)
case, return i; is only correct if the last elt is not RAW_DATA_CST, if it
is RAW_DATA_CST, we still need to split it, which is handled by the code
later in the function. So, for that we need begin = end - 1, so that the
binary search will just care about that last element.
2025-01-18 Jakub Jelinek <jakub@redhat.com>
PR c++/118534
* constexpr.cc (find_array_ctor_elt): Don't return i early if
i == end - 1 and the last elt's value is RAW_DATA_CST.
* g++.dg/cpp/embed-24.C: New test.
* g++.dg/cpp1y/pr118534.C: New test.
Xi Ruoyao [Thu, 5 Sep 2024 16:34:55 +0000 (00:34 +0800)]
LoongArch: Fix cost model for alsl
Our cost model for alsl was wrong: it matches (a + b * imm) where imm is
1, 2, 3, or 4 (should be 2, 4, 8, or 16), and it does not match
(a + (b << imm)) at all. For the test case:
a += c << 3;
b += c << 3;
it caused the compiler to perform a CSE and make one slli and two add,
but we just want two alsl.
Also add a "code == PLUS" check to prevent matching a - (b << imm) as we
don't have any "slsl" instruction.
gcc/ChangeLog:
* config/loongarch/loongarch.cc (loongarch_rtx_costs): Fix the
cost for (a + b * imm) and (a + (b << imm)) which can be
implemented with a single alsl instruction.
[PR118067][LRA]: Check secondary memory mode for the reg class
This is the second patch for the PR for the new test. The patch
solves problem in the case when secondary memory mode (SImode in the
PR test) returned by hook secondary_memory_needed_mode can not be used
for reg class (ALL_MASK_REGS) involved in secondary memory moves. The
patch uses reg mode instead of one returned by
secondary_memory_needed_mode in this case.
gcc/ChangeLog:
PR rtl-optimization/118067
* lra-constraints.cc (invalid_mode_reg_p): New function.
(curr_insn_transform): Use it to check mode returned by target
secondary_memory_needed_mode.
Jakub Jelinek [Fri, 17 Jan 2025 20:00:50 +0000 (21:00 +0100)]
testsuite: Make embed-10.c test more robust
With the https://gcc.gnu.org/pipermail/gcc-patches/2025-January/673945.html
hack we get slightly different error wording in one of the errors, given that
the test actually does use #embed, I think both wordings are just fine and
we should accept them.
2025-01-17 Jakub Jelinek <jakub@redhat.com>
* c-c++-common/cpp/embed-10.c: Allow a different error wording for
C++.
Jakub Jelinek [Fri, 17 Jan 2025 18:27:59 +0000 (19:27 +0100)]
s390: Replace some checking assertions with output_operand_lossage [PR118511]
The r15-2002 change "s390: Fully exploit vgm, vgbm, vrepi" added
some code to print_operand and added gcc_checking_asserts in there.
But print_operand ideally should have no assertions in it, as most
of the assumptions can be easily violated by people using it in
inline asm.
This issue in particular was seen by failure to compile s390-tools,
which had in its extended inline asm uses of %r1 and %r2.
I really don't know if they meant %%r1 and %%r2 or %1 and %2 and
will leave that decision to the maintainers, but the thing is that
%r1 and %r2 used to expand like %1 and %2 in GCC 14 and earlier,
now in checking build it ICEs and in --enable-checking=release build
fails to assemble (the checking assert is ignored and the compiler just uses
some uninitialized variables to emit something arbitrary).
With the following patch it is diagnosed as error consistently
regardless if it is release checking or no checking or checking compiler.
Note, I see also
else if (GET_CODE (x) == UNSPEC && XINT (x, 1) == UNSPEC_TLSLDM)
{
fprintf (file, "%s", ":tls_ldcall:");
const char *name = get_some_local_dynamic_name ();
gcc_assert (name);
assemble_name (file, name);
}
in print_operand, maybe that isn't a big deal because it might be
impossible to construct inline asm argument which is UNSPEC_TLSLDM.
And then there is
case 'e': case 'f':
case 's': case 't':
{
int start, end;
int len;
bool ok;
len = (code == 's' || code == 'e' ? 64 : 32);
ok = s390_contiguous_bitmask_p (ival, true, len, &start, &end);
gcc_assert (ok);
if (code == 's' || code == 't')
ival = start;
else
ival = end;
}
break;
which likely should be also output_operand_lossage but I haven't tried
to reproduce that.
2025-01-17 Jakub Jelinek <jakub@redhat.com>
PR target/118511
* config/s390/s390.cc (print_operand) <case 'p'>: Use
output_operand_lossage instead of gcc_checking_assert.
(print_operand) <case 'q'>: Likewise.
(print_operand) <case 'r'>: Likewise.
The built-in __builtin_vsx_xvcvuxwdp can be covered with PVIPR
function vec_doubleo on LE and vec_doublee on BE. There are no test
cases or documentation for __builtin_vsx_xvcvuxwdp. This patch
removes the redundant built-in.
Carl Love [Wed, 31 Jul 2024 20:40:34 +0000 (16:40 -0400)]
rs6000, remove built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns
The two built-ins __builtin_vsx_vperm_8hi and __builtin_vsx_vperm_8hi_uns
are redundant. They are covered by the overloaded vec_perm built-in. The
built-ins are not documented and do not have test cases.
The removal of these built-ins was missed in commit gcc r15-1923 on
7/9/2024.
Carl Love [Wed, 31 Jul 2024 20:31:34 +0000 (16:31 -0400)]
rs6000, add testcases to the overloaded vec_perm built-in
The overloaded vec_perm built-in supports permuting signed and unsigned
vectors of char, bool char, short int, short bool, int, bool, long long
int, long long bool, int128, float and double. However, not all of the
supported arguments are included in the test cases. This patch adds
the missing test cases.
Additionally, in the 128-bit debug print statements the expected result and
the result need to be cast to unsigned long long to print correctly. The
patch makes this additional change to the print statements.
gcc/ChangeLog:
* doc/extend.texi: Fix spelling mistake in description of the
vec_sel built-in. Add documentation of the 128-bit vec_perm
instance.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/vsx-builtin-3.c: Add vec_perm test cases for
arguments of type vector signed long long int, long long bool,
bool, bool short, bool char and pixel, vector unsigned long long
int, unsigned int, unsigned short int, unsigned char. Cast
arguments for debug prints to unsigned long long.
* gcc.target/powerpc/builtins-4-int128-runnable.c: Add vec_perm
test cases for signed and unsigned int128 arguments.