/* Handle empty records as per the x86-64 psABI. */
TYPE_EMPTY_P (type) = targetm.calls.empty_record_p (type);
(Indeed x86_64 is still the only target to define 'TARGET_EMPTY_RECORD_P',
calling 'gcc/tree.cc-default_is_empty_record'.)
And so it happens that for an empty struct used in code offloaded from x86_64
host (but not powerpc64le host, for example), we get to see 'TYPE_EMPTY_P' in
offloading compilation (where the offload targets (currently?) don't use it
themselves, and therefore aren't prepared to handle it).
For nvptx offloading compilation, this causes wrong code generation:
'ptxas [...] error : Call has wrong number of parameters', as nvptx code
generation for function definition doesn't pay attention to this flag (say, in
'gcc/config/nvptx/nvptx.cc:pass_in_memory', or wherever else would be
appropriate to handle that), but the generic code 'gcc/calls.cc:expand_call'
via 'gcc/function.cc:aggregate_value_p' does pay attention to it, and we thus
get mismatching function definition vs. function call.
This issue apparently isn't a problem for GCN offloading, but I don't know if
that's by design or by accident.
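To make the kind of code affected concrete, here is a minimal sketch of a function taking an empty struct next to a real argument. The names 'Empty' and 'add_one' are illustrative only and do not come from the testcases; the point is just that caller and callee must agree on whether the empty argument occupies a parameter slot.

```cpp
#include <cassert>
#include <type_traits>

// An empty record: no non-static data members.
struct Empty {};

// Callee with an empty-struct parameter next to a real one.  Whether the
// first parameter takes up an argument slot is an ABI decision that the
// function definition and every call site have to make the same way.
int add_one(Empty, int x) { return x + 1; }

// Small self-check showing the intended use.
void self_check() {
  static_assert(std::is_empty<Empty>::value, "Empty has no data members");
  assert(add_one(Empty{}, 41) == 42);
}
```

If the definition drops the empty parameter but the call still passes it (or the other way around), the two sides disagree on the parameter count, which is the mismatch ptxas reports above.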
Richard Biener:
> It looks like TYPE_EMPTY_P is only used during RTL expansion for ABI
> purposes, so computing it during layout_type is premature as shown here.
>
> I would suggest to simply re-compute it at offload stream-in time.
(For avoidance of doubt, the additions to 'gcc.target/nvptx/abi-struct-arg.c',
'gcc.target/nvptx/abi-struct-ret.c' are not dependent on the offload streaming
code changes, but are just to mirror the changes to
'libgomp.oacc-c-c++-common/abi-struct-1.c'.)
Jeff Law [Mon, 19 May 2025 18:00:56 +0000 (12:00 -0600)]
[RISC-V] Fix false positive from Wuninitialized
As Mark and I independently tripped, there's a Wuninitialized issue in the
RISC-V backend. While *I* know the value would always be properly initialized,
it'd be somewhat painful to either eliminate the infeasible paths or do deep
enough analysis to suppress the false positive.
So this initializes OUTPUT and verifies it's got a reasonable value before
using it for the final copy into operands[0].
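The shape of the fix can be sketched as follows. This is not the riscv.cc code: 'Value' and 'final_copy' are hypothetical stand-ins, with a null pointer playing the role of an unset rtx.

```cpp
#include <cassert>

// Hypothetical stand-in for an rtx operand.
struct Value { int v; };

// Every feasible path assigns `output`, but the compiler may not be able to
// prove that.  Initializing it and checking it before the final use keeps
// -Wuninitialized quiet and catches a genuinely missed path.
int final_copy(bool use_first, Value &a, Value &b) {
  Value *output = nullptr;          // explicit initial value
  if (use_first)
    output = &a;
  else
    output = &b;
  assert(output != nullptr);        // verify before the final copy
  return output->v;
}
```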
Bootstrapped on the BPI (regression testing still has ~12hrs to go).
gcc/
* config/riscv/riscv.cc (synthesize_ior_xor): Initialize OUTPUT and
verify it's non-null before emitting the final copy insn.
Harald Anlauf [Sun, 18 May 2025 20:42:26 +0000 (22:42 +0200)]
Fortran: fix FAIL of gfortran.dg/specifics_1.f90 after r16-372 [PR120099]
After commit r16-372, testcase gfortran.dg/specifics_1.f90 started to
FAIL at -O2 and higher, as DCE led to elimination of evaluations of
Fortran specific intrinsics returning complex results and with -ff2c.
As the Fortran runtime library is compiled with -fno-f2c, the frontend
generates calls to wrapper subroutines _gfortran_f2c_specific_* that
return their result by reference via their first argument when this is
needed. This is e.g. the case when specific names of the intrinsics are
used for passing as actual argument to procedures. These wrappers are
not pure in the GCC IR sense, even if the Fortran intrinsics are.
Therefore gfc_return_by_reference must return true for these.
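A rough C++ analogue of the two calling conventions, for illustration only. The function names are made up and are not the libgfortran symbols; the sketch only shows why the wrapper form is not pure: its result is a store through the first argument.

```cpp
#include <cassert>
#include <complex>

// Direct convention: the complex result is returned by value.
std::complex<float> csqrt_by_value(std::complex<float> z) {
  return std::sqrt(z);
}

// f2c-style wrapper: the result is written through the first argument.
// The call's only effect is a memory write, so if the compiler believed
// the call to be pure and ignored the store, it could delete the call.
void csqrt_by_reference(std::complex<float> *result,
                        const std::complex<float> *z) {
  assert(result != nullptr && z != nullptr);
  *result = csqrt_by_value(*z);
}
```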
PR fortran/120099
gcc/fortran/ChangeLog:
* trans-types.cc (gfc_return_by_reference): Intrinsic functions
returning complex numbers may return their result by reference
with -ff2c.
Richard Earnshaw [Mon, 19 May 2025 15:19:39 +0000 (16:19 +0100)]
arm: fully validate mem_noofs_operand [PR120351]
It's not enough to just check that a memory operand is of the form
mem(reg); after RA we also need to validate the register being used.
The safest way to do this is to call memory_operand.
PR target/120351
gcc/ChangeLog:
* config/arm/predicates.md (mem_noofs_operand): Also check the op
is a valid memory_operand.
Jonathan Wakely [Fri, 16 May 2025 10:54:46 +0000 (11:54 +0100)]
libstdc++: Fix some Clang -Wsystem-headers warnings in <ranges>
libstdc++-v3/ChangeLog:
* include/std/ranges (_ZipTransform::operator()): Remove name of
unused parameter.
(chunk_view::_Iterator, stride_view::_Iterator): Likewise.
(join_with_view): Declare _Iterator and _Sentinel as class
instead of struct.
(repeat_view): Declare _Iterator as class instead of struct.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Jonathan Wakely [Thu, 15 May 2025 18:32:01 +0000 (19:32 +0100)]
libstdc++: Fix std::format of chrono::local_days with {} [PR120293]
Formatting of chrono::local_days with an empty chrono-specs should be
equivalent to inserting it into an ostream, which should use the
overload for inserting chrono::sys_days into an ostream. The
implementation of empty chrono-specs in _M_format_to_ostream takes some
short cuts, and that wasn't being done correctly for chrono::local_days.
libstdc++-v3/ChangeLog:
PR libstdc++/120293
* include/bits/chrono_io.h (_M_format_to_ostream): Add special
case for local_time convertible to local_days.
* testsuite/std/time/clock/local/io.cc: Check formatting of
chrono::local_days.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Dongyan Chen [Mon, 19 May 2025 07:17:12 +0000 (15:17 +0800)]
RISC-V: Fix the warning of temporary object dangling references.
During the GCC compilation, some warnings about temporary object dangling
references emerged. They appeared in these code lines in riscv-common.cc:
  const riscv_ext_info_t &implied_ext_info,
  const riscv_ext_info_t &ext_info = get_riscv_ext_info (ext) and
  auto &ext_info = get_riscv_ext_info (search_ext).
The issue arose because the local variable types were not used in a standardized
way, causing their references to dangle once the function ended.
To fix this, the patch changes the argument type of get_riscv_ext_info to
`const char *`, thereby eliminating the warnings.
Changes for v2:
- Change the argument type of get_riscv_ext_info to `const char *` to eliminate the warnings.
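A minimal sketch of the pattern, under the assumption (not stated above) that the lookup previously took a std::string by const reference, so that a call could create a temporary which -Wdangling-reference then flags. 'ext_info', 'ext_table' and 'lookup_ext' are hypothetical names.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical extension record.
struct ext_info { const char *name; int version; };

// All entries have static storage duration, so references to them stay valid.
static const ext_info ext_table[] = {{"zba", 1}, {"zbb", 1}, {"zilsd", 1}};
static const ext_info unknown_ext = {"", 0};

// Taking `const char *` means a string-literal argument creates no temporary
// object, so the compiler has no temporary to suspect the returned reference
// of pointing into.
const ext_info &lookup_ext(const char *name) {
  assert(name != nullptr);
  for (const ext_info &e : ext_table)
    if (std::strcmp(e.name, name) == 0)
      return e;
  return unknown_ext;
}
```

A caller can then write `const ext_info &info = lookup_ext ("zbb");` without binding the result to anything derived from a temporary.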
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (get_riscv_ext_info): Fix argument type.
(riscv_subset_list::check_implied_ext): Type conversion.
zhusonghe [Mon, 19 May 2025 02:43:48 +0000 (10:43 +0800)]
RISC-V: Rename conflicting variables in gen-riscv-ext-texi.cc
The variables `major` and `minor` in `gen-riscv-ext-texi.cc`
conflict with the macros of the same name defined in `<sys/sysmacros.h>`,
which are exposed when building with newer versions of GCC on older
Linux distributions (e.g., Ubuntu 18.04). To resolve this, we rename them
to `major_version` and `minor_version` respectively. This aligns with the
GCC community's recommended practice [1] and improves code clarity.
Kito Cheng [Mon, 12 May 2025 09:38:39 +0000 (02:38 -0700)]
RISC-V: Support Zilsd code gen
This commit adds the code gen support for Zilsd, which is a
newly added extension for RISC-V. The Zilsd extension allows
for loading and storing 64-bit values using even-odd register
pairs.
We only try to do minimal code gen support for that, which means we only
use the new instructions when the load/store is of 64-bit data. We could
also use them to optimize the code gen of memcpy/memset/memmove and the
prologue and epilogue of functions, but I think that probably should be
done in a follow-up patch.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_legitimize_move): Handle
load/store with odd-even reg pair.
(riscv_split_64bit_move_p): Don't split load/store if zilsd enabled.
(riscv_hard_regno_mode_ok): Only allow even regs to be used for
64-bit modes for zilsd.
Jennifer Schmitz [Thu, 15 May 2025 14:16:15 +0000 (07:16 -0700)]
regcprop: Return from copy_value for unordered modes
The ICE in PR120276 resulted from a comparison of VNx4QI and V8QI using
partial_subreg_p in the function copy_value during the RTL pass
regcprop, failing the assertion in
inline bool
partial_subreg_p (machine_mode outermode, machine_mode innermode)
{
/* Modes involved in a subreg must be ordered. In particular, we must
always know at compile time whether the subreg is paradoxical. */
poly_int64 outer_prec = GET_MODE_PRECISION (outermode);
poly_int64 inner_prec = GET_MODE_PRECISION (innermode);
gcc_checking_assert (ordered_p (outer_prec, inner_prec));
return maybe_lt (outer_prec, inner_prec);
}
Returning from the function if the modes are not ordered before reaching
the call to partial_subreg_p resolves the ICE and passes bootstrap and
testing without regression.
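What "ordered" means here can be sketched with a two-coefficient stand-in for poly_int64. 'poly2' and the helper names are hypothetical, and the precisions in the self-check (32 + 32x bits for VNx4QI, 64 bits for V8QI) are an assumption about those modes rather than something taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a poly_int64 with two coefficients:
// the value is c0 + c1 * x for some runtime x >= 0.
struct poly2 { std::int64_t c0; std::int64_t c1; };

// True if a <= b holds for every x >= 0.
bool known_le_sketch(poly2 a, poly2 b) {
  return a.c0 <= b.c0 && a.c1 <= b.c1;
}

// Two sizes are ordered if one is known to be <= the other for all x.
bool ordered_sketch(poly2 a, poly2 b) {
  return known_le_sketch(a, b) || known_le_sketch(b, a);
}

// The guard described above: only compare precisions when they are ordered.
bool can_query_partial_subreg(poly2 outer_prec, poly2 inner_prec) {
  return ordered_sketch(outer_prec, inner_prec);
}
```

With the assumed values, 32 + 32x is smaller than 64 for x = 0 and larger for x = 2, so neither direction holds for all x and the early return is taken.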
OK for mainline?
Signed-off-by: Jennifer Schmitz <jschmitz@nvidia.com>
gcc/
PR middle-end/120276
* regcprop.cc (copy_value): Return in case of unordered modes.
gcc/testsuite/
PR middle-end/120276
* gcc.dg/torture/pr120276.c: New test.
Kito Cheng [Mon, 12 May 2025 06:36:07 +0000 (14:36 +0800)]
RISC-V: Add new operand constraint: cR
This commit introduces a new operand constraint `cR` for the RISC-V
architecture, which allows the use of an even-odd RVC general purpose register
(x8-x15) in inline asm.
Haochen Jiang [Fri, 14 Mar 2025 06:27:36 +0000 (14:27 +0800)]
i386: Remove duplicate iterators in md
There are several iterators that are no longer needed in the md files:
after the AVX10 refactor, the patterns can directly use the legacy AVX512
ones. Remove those duplicate iterators.
gcc/ChangeLog:
* config/i386/sse.md (VF1_VF2_AVX10_2): Removed.
(VF2_AVX10_2): Ditto.
(VI1248_AVX10_2): Ditto.
(VFH_AVX10_2): Ditto.
(VF1_AVX10_2): Ditto.
(VHF_AVX10_2): Ditto.
(VBF_AVX10_2): Ditto.
(VI8_AVX10_2): Ditto.
(VI2_AVX10_2): Ditto.
(VBF): New.
(div<mode>3): Use VBF instead of AVX10.2 ones.
(vec_cmp<mode><avx512fmaskmodelower>): Ditto.
(avx10_2_cvt2ps2phx_<mode><mask_name><round_name>):
Use VHF_AVX512VL instead of AVX10.2 ones.
(vcvt<convertfp8_pack><mode><mask_name>): Ditto.
(vcvthf82ph<mode><mask_name>): Ditto.
(VHF_AVX10_2_2): Remove not needed TARGET_AVX10_2.
(usdot_prod<sseunpackmodelower><mode>): Use VI2_AVX512F
instead of AVX10.2 ones.
(vdpphps_<mode>): Use VF1_AVX512VL instead of AVX10.2 ones.
(vdpphps_<mode>_mask): Ditto.
(vdpphps_<mode>_maskz): Ditto.
(vdpphps_<mode>_maskz_1): Ditto.
(avx10_2_scalefbf16_<mode><mask_name>): Use VBF instead of
AVX10.2 ones.
(<code><mode>3): Ditto.
(avx10_2_<code>bf16_<mode><mask_name>): Ditto.
(avx10_2_fmaddbf16_<mode>_maskz): Ditto.
(avx10_2_fmaddbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fmaddbf16_<mode>_mask): Ditto.
(avx10_2_fmaddbf16_<mode>_mask3): Ditto.
(avx10_2_fnmaddbf16_<mode>_maskz): Ditto.
(avx10_2_fnmaddbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fnmaddbf16_<mode>_mask): Ditto.
(avx10_2_fnmaddbf16_<mode>_mask3): Ditto.
(avx10_2_fmsubbf16_<mode>_maskz): Ditto.
(avx10_2_fmsubbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fmsubbf16_<mode>_mask): Ditto.
(avx10_2_fmsubbf16_<mode>_mask3): Ditto.
(avx10_2_fnmsubbf16_<mode>_maskz): Ditto.
(avx10_2_fnmsubbf16_<mode><sd_maskz_name>): Ditto.
(avx10_2_fnmsubbf16_<mode>_mask): Ditto.
(avx10_2_fnmsubbf16_<mode>_mask3): Ditto.
(avx10_2_rsqrtbf16_<mode><mask_name>): Ditto.
(avx10_2_sqrtbf16_<mode><mask_name>): Ditto.
(avx10_2_rcpbf16_<mode><mask_name>): Ditto.
(avx10_2_getexpbf16_<mode><mask_name>): Ditto.
(avx10_2_<bf16immop>bf16_<mode><mask_name>): Ditto.
(avx10_2_fpclassbf16_<mode><mask_scalar_merge_name>): Ditto.
(avx10_2_cmpbf16_<mode><mask_scalar_merge_name>): Ditto.
(avx10_2_cvt<sat_cvt_trunc_prefix>bf162i<sat_cvt_sign_prefix>bs<mode><mask_name>):
Ditto.
(avx10_2_cvtph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>):
Use VHF_AVX512VL instead of AVX10.2 ones.
(avx10_2_cvttph2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>):
Ditto.
(avx10_2_cvtps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_name>):
Use VF1_AVX512VL instead of AVX10.2 ones.
(avx10_2_cvttps2i<sat_cvt_sign_prefix>bs<mode><mask_name><round_saeonly_name>):
Ditto.
(avx10_2_vcvtt<castmode>2<sat_cvt_sign_prefix>dqs<mode><mask_name><round_saeonly_name>):
Use VF instead of AVX10.2 ones.
(avx10_2_vcvttpd2<sat_cvt_sign_prefix>qqs<mode><mask_name><round_saeonly_name>):
Use VF2 instead of AVX10.2 ones.
(avx10_2_vcvttps2<sat_cvt_sign_prefix>qqs<mode><mask_name><round_saeonly_name>):
Use VI8 instead of AVX10.2 ones.
(avx10_2_minmaxbf16_<mode><mask_name>): Use VBF instead of
AVX10.2 ones.
(avx10_2_minmaxp<mode><mask_name><round_saeonly_name>):
Use VFH_AVX512VL instead of AVX10.2 ones.
(avx10_2_vmovrs<ssemodesuffix><mode><mask_name>):
Use VI1248_AVX512VLBW instead of AVX10.2 ones.
Haochen Jiang [Wed, 14 May 2025 06:57:41 +0000 (14:57 +0800)]
i386: Remove avx10.1-256/512 and evex512 options
As we mentioned in GCC 15, we will remove avx10.1-256/512 and evex512
in GCC 16. The behavior of combining AVX10 and AVX512 options
will also be simplified in GCC 16 since AVX10.1 now implies AVX512,
making the behavior match everyone else.
Haochen Jiang [Wed, 14 May 2025 07:19:42 +0000 (15:19 +0800)]
i386: Unpush OPTION_MASK_ISA2_EVEX512 for builtins
As we mentioned in GCC 15, we will remove evex512 in GCC 16 since it
is not useful anymore since we will have 512 bit directly. This patch
will first unpush evex512 in the builtins.
emit-rtl: Allow extra checks for paradoxical subregs [PR119966]
When a paradoxical subreg is detected, validate_subreg exits early, thus
skipping the important checks later in the function.
Fix by continuing with the checks instead of declaring early that the
paradoxical subreg is valid.
One of the newly allowed subsequent checks needed to be disabled for
paradoxical subregs. It turned out that combine attempts to create
a paradoxical subreg of mem even for strict-alignment targets.
That is invalid and should eventually be rejected, but is
temporarily left allowed to prevent regressions for
armv8l-unknown-linux-gnueabihf. See PR120329 for more details.
Tests I did:
- No regressions were found for C and C++ for the following targets:
- native x86_64-pc-linux-gnu
- cross riscv64-unknown-linux-gnu
- cross riscv32-none-elf
- Sanity checked armv8l-unknown-linux-gnueabihf by cross-building
up to including libgcc. Linaro CI bot further confirmed there
are no regressions.
- Sanity checked powerpc64-unknown-linux-gnu by building native
toolchain, but I could not setup qemu-user for DejaGnu testing.
PR target/119966
gcc/ChangeLog:
* emit-rtl.cc (validate_subreg): Do not exit immediately for
paradoxical subregs. Filter subsequent tests which are
not valid for paradoxical subregs.
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
Signed-off-by: Dimitar Dimitrov <dimitar@dinux.eu>
Eric Botcazou [Sun, 18 May 2025 17:10:26 +0000 (19:10 +0200)]
Partially lift restriction from loc_list_from_tree_1
The function accepts all handled_component_p expressions and decodes them by
means of get_inner_reference as expected, but bails out on bitfields:
/* TODO: We can extract value of the small expression via shifting
even for nonzero bitpos. */
if (list_ret == 0)
return 0;
if (!multiple_p (bitpos, BITS_PER_UNIT, &bytepos)
|| !multiple_p (bitsize, BITS_PER_UNIT))
{
expansion_failed (loc, NULL_RTX,
"bitfield access");
return 0;
}
This lifts the second part of the restriction, which helps for obscure cases
of packed discriminated record types in Ada, although this requires the very
latest GDB sources.
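The shifting idea mentioned in the TODO can be illustrated on plain integers. This is not the DWARF expression that dwarf2out.cc emits; 'extract_bits' is a hypothetical helper, and counting bit positions from the least significant bit is an assumption of the sketch.

```cpp
#include <cassert>
#include <cstdint>

// Extract `bitsize` bits starting at bit `bitpos` (counted from the least
// significant bit) out of a 64-bit word: shift the field down, then mask
// off everything above it.
std::uint64_t extract_bits(std::uint64_t word, unsigned bitpos,
                           unsigned bitsize) {
  assert(bitsize >= 1 && bitsize <= 64 && bitpos <= 64 - bitsize);
  std::uint64_t shifted = word >> bitpos;
  if (bitsize == 64)
    return shifted;                        // avoid shifting by 64 below
  return shifted & ((std::uint64_t{1} << bitsize) - 1);
}
```

Neither 'bitpos' nor 'bitsize' has to be a multiple of 8 for this to work, which is why a field whose size is not a whole number of bytes can still be described.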
gcc/
* dwarf2out.cc (loc_list_from_tree_1) <COMPONENT_REF>: Do not bail
out when the size is not a multiple of a byte.
Deal with bit-fields whose size is not a multiple of a byte when
dereferencing an address.
Andrew Pinski [Sun, 18 May 2025 00:21:39 +0000 (17:21 -0700)]
phiopt: Use mark_lhs_in_seq_for_dce instead of doing it inline
Right now phiopt has the same code as mark_lhs_in_seq_for_dce
inlined into match_simplify_replacement. Instead let's use the
function in gimple-fold that does the same thing.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* gimple-fold.cc (mark_lhs_in_seq_for_dce): Make
non-static.
* gimple-fold.h (mark_lhs_in_seq_for_dce): Declare.
* tree-ssa-phiopt.cc (match_simplify_replacement): Use
mark_lhs_in_seq_for_dce instead of manually looping.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Oleg Endo [Sat, 17 May 2025 16:51:35 +0000 (10:51 -0600)]
[PATCH] libgcc SH: fix alignment for relaxation
From 6462f1e6a2565c5d4756036d9bc2f39dce9bd768 Mon Sep 17 00:00:00 2001
From: QBos07 <qubos@outlook.de>
Date: Sat, 10 May 2025 16:56:28 +0000
Subject: [PATCH] libgcc SH: fix alignment for relaxation
When relaxation is enabled we cannot infer the alignment
from the position, as that may change. This should not change
non-relaxed builds, as it is already aligned there. This was
the missing piece for building an entire toolchain with -mrelax.
Credit goes to Oleg Endo: https://sourceware.org/bugzilla/show_bug.cgi?id=3298#c4
libgcc/
* config/sh/lib1funcs.S (ashiftrt_r4_32): Increase alignment.
(movemem): Force alignment of the mova instruction.
Jeff Law [Sat, 17 May 2025 15:37:01 +0000 (09:37 -0600)]
[RISC-V] Fix ICE due to bogus use of gen_rtvec
Found this while setting up the risc-v coordination branch off of gcc-15. Not
sure why I didn't use rtvec_alloc directly here since we're going to initialize
the whole vector ourselves. Using gen_rtvec was just wrong as it's walking
down a non-existent varargs list. Under the "right" circumstances it can walk
off a page and fault.
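The difference between the two helpers can be sketched with simplified analogues. 'make_vec_from_args' and 'alloc_vec' are hypothetical and operate on ints rather than rtxes; they only mirror the contract of a variadic constructor versus a plain allocator.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <vector>

// Analogue of a variadic constructor: reads exactly n ints from the
// argument list.  Calling it with a count but without that many arguments
// is undefined behavior, since va_arg walks past the end of the list.
std::vector<int> make_vec_from_args(int n, ...) {
  assert(n >= 0);
  std::vector<int> v;
  va_list ap;
  va_start(ap, n);
  for (int i = 0; i < n; ++i)
    v.push_back(va_arg(ap, int));
  va_end(ap);
  return v;
}

// Analogue of a plain allocator: n slots, to be filled in by the caller.
std::vector<int> alloc_vec(int n) {
  assert(n >= 0);
  return std::vector<int>(static_cast<std::size_t>(n));
}
```

When the caller is going to fill every element itself, the second form is the one that matches; `make_vec_from_args (3)` with no further arguments is the bug pattern described above.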
This was seen with a test already in the testsuite (I forget which test), so no
new regression test.
Tested in my tester and verified the failure on the coordination branch is
resolved as well. Waiting on pre-commit CI to render a verdict.
gcc/
* config/riscv/riscv-vect-permconst.cc (vector_permconst::process_bb):
Use rtvec_alloc, not gen_rtvec since we don't want/need to initialize
the vector.
Yuao Ma [Sat, 17 May 2025 13:45:49 +0000 (07:45 -0600)]
[PATCH] gcc: add trigonometric pi-based functions as gcc builtins
I committed the wrong version on Yuao's behalf. This followup adds the
documentation changes -- Jeff.
This patch adds trigonometric pi-based functions as gcc builtins: acospi, asinpi, atan2pi,
atanpi, cospi, sinpi, and tanpi. Latest glibc already provides support for
these functions, which we plan to leverage in future gfortran implementations.
The patch includes two test cases to verify both correct code generation and
function definition.
If approved, I suggest committing this foundational change first. Constant
folding for these builtins will be addressed in subsequent patches.
Best regards,
Yuao
From 9a9683d250078ce1bc687797c26ca05a9e91b350 Mon Sep 17 00:00:00 2001
From: Yuao Ma <c8ef@outlook.com>
Date: Wed, 14 May 2025 22:14:00 +0800
Subject: [PATCH] gcc: add trigonometric pi-based functions as gcc builtins
Add trigonometric pi-based functions as GCC builtins: acospi, asinpi, atan2pi,
atanpi, cospi, sinpi, and tanpi. Latest glibc already provides support for
these functions, which we plan to leverage in future gfortran implementations.
The patch includes two test cases to verify both correct code generation and
function definition.
If approved, I suggest committing this foundational change first. Constant
folding for these builtins will be addressed in subsequent patches.
Yuao Ma [Sat, 17 May 2025 13:42:24 +0000 (07:42 -0600)]
[PATCH] gcc: add trigonometric pi-based functions as gcc builtins
This patch adds trigonometric pi-based functions as gcc builtins: acospi, asinpi, atan2pi,
atanpi, cospi, sinpi, and tanpi. Latest glibc already provides support for
these functions, which we plan to leverage in future gfortran implementations.
The patch includes two test cases to verify both correct code generation and
function definition.
If approved, I suggest committing this foundational change first. Constant
folding for these builtins will be addressed in subsequent patches.
Best regards,
Yuao
From 9a9683d250078ce1bc687797c26ca05a9e91b350 Mon Sep 17 00:00:00 2001
From: Yuao Ma <c8ef@outlook.com>
Date: Wed, 14 May 2025 22:14:00 +0800
Subject: [PATCH] gcc: add trigonometric pi-based functions as gcc builtins
Add trigonometric pi-based functions as GCC builtins: acospi, asinpi, atan2pi,
atanpi, cospi, sinpi, and tanpi. Latest glibc already provides support for
these functions, which we plan to leverage in future gfortran implementations.
The patch includes two test cases to verify both correct code generation and
function definition.
If approved, I suggest committing this foundational change first. Constant
folding for these builtins will be addressed in subsequent patches.
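For reference, the mathematical definitions behind a few of these functions, written as naive helpers. The '_sketch' names are made up, and multiplying or dividing by pi directly loses accuracy compared with a real implementation, so this only shows what the functions compute.

```cpp
#include <cassert>
#include <cmath>

// Naive definitions only; a production implementation reduces the argument
// before multiplying by pi.
constexpr double kPi = 3.14159265358979323846;

double sinpi_sketch(double x) { return std::sin(kPi * x); }   // sin(pi * x)
double cospi_sketch(double x) { return std::cos(kPi * x); }   // cos(pi * x)
double acospi_sketch(double x) {                              // acos(x) / pi
  assert(x >= -1.0 && x <= 1.0);
  return std::acos(x) / kPi;
}
double atan2pi_sketch(double y, double x) {                   // atan2(y, x) / pi
  return std::atan2(y, x) / kPi;
}
```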
Jeff Law [Sat, 17 May 2025 13:16:50 +0000 (07:16 -0600)]
[RISC-V] Avoid setting output object more than once in IOR/XOR synthesis
While evaluating Shreya's logical AND synthesis work on spec2017 I ran into a
code quality regression where combine was failing to eliminate a redundant sign
extension.
I had a hunch the problem would be with the multiple sets of the same pseudo
register in the AND synthesis path. I was right that the problem was multiple
sets of the same pseudo, but it was actually some of the splitters in the
RISC-V backend that were the culprit. Those multiple sets caused the sign bit
tracking code to need to make conservative assumptions thus resulting in
failure to eliminate the unnecessary sign extension.
So before we start moving on the logical AND patch we're going to do some
cleanups.
There's multiple moving parts in play. For example, we have splitters which do
multiple sets of the output register. Fixing some of those independently would
result in a code quality regression. Instead they need some adjustments to or
removal of mvconst_internal. Of course getting rid of mvconst_internal will
trigger all kinds of code quality regressions right now which ultimately lead
back to the need to revamp the logical AND expander. Point being we've got
some circular dependencies and breaking them may result in short term code
quality regressions. I'll obviously try to avoid those as much as possible.
So to start the process this patch adjusts the recently added XOR/IOR synthesis
to avoid re-using the destination register. While the reuse was clearly safe
from a semantic standpoint, various parts of the compiler can do a better job
for pseudos that are only set once.
Given this synthesis path should only be active during initial RTL generation,
we can create new pseudos at will, so we create a new one for each insn. At
the end of the sequence we copy from the last set into the final destination.
This has various trivial impacts on the code generation, but the resulting code
looks no better or worse to me across spec2017.
This has been tested in my tester and is currently bootstrapping on my BPI.
Waiting on data from the pre-commit tester before moving forward...
gcc/
* config/riscv/riscv.cc (synthesize_ior_xor): Avoid writing
operands[0] more than once, use new pseudos instead.
Pan Li [Fri, 16 May 2025 07:34:51 +0000 (15:34 +0800)]
RISC-V: Avoid scalar unsigned SAT_ADD test data duplication
Some of the previous scalar unsigned SAT_ADD test data are
duplicated in different test files. This patch would like to
move them into a shared header file, to avoid the test data
duplication.
The test suites below passed for this patch series.
* The rv64gcv full regression test.
Pengxuan Zheng [Mon, 12 May 2025 17:21:49 +0000 (10:21 -0700)]
aarch64: Add more vector permute tests for the FMOV optimization [PR100165]
This patch adds more tests for vector permutes which can now be optimized as
FMOV with the generic PERM change and the aarch64 AND patch.
Changes since v1:
* v2: Add -mlittle-endian to the little endian tests explicitly and rename the
tests accordingly.
PR target/100165
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/fmov-3-be.c: New test.
* gcc.target/aarch64/fmov-3-le.c: New test.
* gcc.target/aarch64/fmov-4-be.c: New test.
* gcc.target/aarch64/fmov-4-le.c: New test.
* gcc.target/aarch64/fmov-5-be.c: New test.
* gcc.target/aarch64/fmov-5-le.c: New test.
Pengxuan Zheng [Mon, 12 May 2025 17:12:11 +0000 (10:12 -0700)]
aarch64: Optimize AND with certain vector of immediates as FMOV [PR100165]
We can optimize AND with certain vector of immediates as FMOV if the result of
the AND is as if the upper lane of the input vector is set to zero and the lower
lane remains unchanged.
Without this patch, GCC generates:
f_v4hi:
movi d31, 0xffffffff
and v0.8b, v0.8b, v31.8b
ret
With this patch, it generates:
f_v4hi:
fmov s0, s0
ret
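The equivalence behind the transformation can be checked on plain integers. The sketch models a 64-bit vector as one uint64_t with the low lanes in the low bits (an assumption matching little-endian lane order); the function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// AND of a 64-bit vector with the immediate 0x00000000ffffffff.
std::uint64_t and_low_mask(std::uint64_t v) {
  return v & 0x00000000ffffffffULL;
}

// Effect of moving only the low 32-bit part: the low 32 bits are kept and
// the remaining bits of the destination become zero.
std::uint64_t move_low_32(std::uint64_t v) {
  return static_cast<std::uint32_t>(v);
}

// Small self-check: both forms give the same result.
void self_check() {
  assert(and_low_mask(0x1122334455667788ULL) == 0x55667788ULL);
  assert(and_low_mask(0x1122334455667788ULL)
         == move_low_32(0x1122334455667788ULL));
}
```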
Changes since v1:
* v2: Simplify the mask checking logic by using native_decode_int and address a
few other review comments.
PR target/100165
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (aarch64_output_fmov): New prototype.
(aarch64_simd_valid_and_imm_fmov): Likewise.
* config/aarch64/aarch64-simd.md (and<mode>3<vczle><vczbe>): Allow FMOV
codegen.
* config/aarch64/aarch64.cc (aarch64_simd_valid_and_imm_fmov): New.
(aarch64_output_fmov): Likewise.
* config/aarch64/constraints.md (Df): New constraint.
* config/aarch64/predicates.md (aarch64_reg_or_and_imm): Update
predicate to support FMOV codegen.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/fmov-1-be.c: New test.
* gcc.target/aarch64/fmov-1-le.c: New test.
* gcc.target/aarch64/fmov-2-be.c: New test.
* gcc.target/aarch64/fmov-2-le.c: New test.
Pengxuan Zheng [Wed, 7 May 2025 17:47:37 +0000 (10:47 -0700)]
aarch64: Recognize vector permute patterns which can be interpreted as AND [PR100165]
Certain permutes that blend a vector with zero can be interpreted as an AND with
a mask. This idea was suggested by Richard Sandiford when he was reviewing my
patch which tries to optimize certain vector permutes with the FMOV instruction
for the aarch64 target.
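A small model of the idea, using four 16-bit lanes. The 'permute' and 'and_mask' helpers are hypothetical and only mimic the semantics of a two-input permute; they are not GCC's vec_perm machinery.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using v4 = std::array<std::uint16_t, 4>;

// Two-input permute: selector values 0..3 pick lanes of a, 4..7 lanes of b.
v4 permute(const v4 &a, const v4 &b, const std::array<unsigned, 4> &sel) {
  v4 r{};
  for (std::size_t i = 0; i < 4; ++i) {
    assert(sel[i] < 8);
    r[i] = sel[i] < 4 ? a[sel[i]] : b[sel[i] - 4];
  }
  return r;
}

// Lane-wise AND with a mask vector.
v4 and_mask(const v4 &a, const v4 &mask) {
  v4 r{};
  for (std::size_t i = 0; i < 4; ++i)
    r[i] = static_cast<std::uint16_t>(a[i] & mask[i]);
  return r;
}
```

When b is all zeros and every lane i either keeps a[i] or takes a zero lane, the permute equals an AND with a mask that is all-ones in the kept lanes and zero elsewhere.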
Pengxuan Zheng [Fri, 16 May 2025 00:52:29 +0000 (17:52 -0700)]
aarch64: Fix an oversight in aarch64_evpc_reencode
Some fields (e.g., zero_op0_p and zero_op1_p) of the struct "newd" may be left
uninitialized in aarch64_evpc_reencode. This can cause reading of uninitialized
data. I found this oversight when testing my patches on and/fmov
optimizations. This patch fixes the bug by zero initializing the struct.
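The fix pattern in isolation, with a hypothetical struct standing in for the real one. Only the field names zero_op0_p and zero_op1_p are taken from the text above; 'perm_data_sketch', 'nelt' and 'make_reencoded' are made up for the example.

```cpp
#include <cassert>

// Hypothetical stand-in for the permute descriptor.
struct perm_data_sketch {
  bool zero_op0_p;
  bool zero_op1_p;
  int nelt;
};

// Building a new descriptor: `= {}` value-initializes every field, so
// fields that are not assigned explicitly read as false/0 rather than
// holding indeterminate values.
perm_data_sketch make_reencoded(int nelt) {
  perm_data_sketch newd = {};
  newd.nelt = nelt;
  return newd;
}

// Small self-check showing the intended use.
void self_check() {
  perm_data_sketch d = make_reencoded(4);
  assert(!d.zero_op0_p && !d.zero_op1_p && d.nelt == 4);
}
```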
Pushed as obvious after bootstrap/test on aarch64-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_evpc_reencode): Zero initialize
newd.
Patrick Palka [Fri, 16 May 2025 17:06:04 +0000 (13:06 -0400)]
libstdc++: Use __is_invocable/nothrow_invocable builtins more
As a follow-up to r15-1253 and r15-1254 which made us use these builtins
in the standard std::is_invocable/nothrow_invocable class templates, let's
also use them directly in the standard variable templates and our internal
C++11 __is_invocable/nothrow_invocable class templates.
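The general pattern, sketched outside libstdc++: use the builtin when the compiler reports it and fall back to the library trait otherwise. 'sketch_is_invocable_v' and the SKETCH_ macro are made-up names, the code assumes C++17, and it relies on the builtin accepting the callable type followed by the argument types.

```cpp
#include <cassert>
#include <type_traits>

// Detect the builtin; __has_builtin itself may be missing on old compilers.
#if defined(__has_builtin)
# if __has_builtin(__is_invocable)
#  define SKETCH_HAVE_IS_INVOCABLE_BUILTIN 1
# endif
#endif

// Variable template defined directly in terms of the builtin if available,
// otherwise in terms of the standard class template.
template <typename Fn, typename... Args>
inline constexpr bool sketch_is_invocable_v =
#ifdef SKETCH_HAVE_IS_INVOCABLE_BUILTIN
    __is_invocable(Fn, Args...);
#else
    std::is_invocable<Fn, Args...>::value;
#endif

// Small self-check; the extra parentheses keep the comma in the template
// argument list from being split by the assert macro.
void self_check() {
  assert((sketch_is_invocable_v<int (*)(int), int>));
  assert((!sketch_is_invocable_v<int (*)(int), void *>));
}
```

Going through the builtin directly avoids instantiating a class template for each query, which is the motivation for using it in the variable templates as well.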
libstdc++-v3/ChangeLog:
* include/std/type_traits (__is_invocable): Define in terms of
corresponding builtin if available.
(__is_nothrow_invocable): Likewise.
(is_invocable_v): Likewise.
(is_nothrow_invocable_v): Likewise.
Andrew Pinski [Thu, 15 May 2025 03:41:22 +0000 (20:41 -0700)]
Forwprop: add a debug dump after propagate into comparison does something
I noticed that forwprop does not dump when forward_propagate_into_comparison
did a change to the assign statement.
I am actually using it to help guide changing/improving/add match patterns
instead of depending on doing a tree "combiner" here.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (forward_propagate_into_comparison): Dump
when replacing statement.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Martin Jambor [Fri, 16 May 2025 15:13:51 +0000 (17:13 +0200)]
ipa: Dump cgraph_node UID instead of order into ipa-clones dump file
Since starting from GCC 15 the order is not unique for any
symtab_nodes but m_uid is, I believe we ought to dump the latter in
the ipa-clones dump, if only so that people can reliably match entries
about new clones to those about removed nodes (if any).
This patch also contains fixes to a few other places where we have
so far dumped order to our ordinary dumps and which have been
identified by Michal Jires.
gcc/ChangeLog:
2025-05-16 Martin Jambor <mjambor@suse.cz>
* cgraph.h (symtab_node): Make member function get_uid const.
* cgraphclones.cc (dump_callgraph_transformation): Dump m_uid of the
call graph nodes instead of order.
* cgraph.cc (cgraph_node::remove): Likewise.
* ipa-cp.cc (ipcp_lattice<valtype>::print): Likewise.
* ipa-sra.cc (ipa_sra_summarize_function): Likewise.
* symtab.cc (symtab_node::dump_base): Likewise.
Andrew Pinski [Sat, 10 May 2025 04:13:48 +0000 (21:13 -0700)]
aarch64: Fix narrowing warning in driver-aarch64.cc [PR118603]
Since the AARCH64_CORE defines in aarch64-cores.def all use -1 for
the variant, it is just easier to add the cast to unsigned in the usage
in driver-aarch64.cc.
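A reduced illustration of the warning and the fix. 'core_data', 'SKETCH_CORE' and the table contents are hypothetical; only the idea of casting the macro's VARIANT argument at the point of use comes from the text above.

```cpp
#include <cassert>
#include <climits>

// Hypothetical table entry with an unsigned variant field.
struct core_data {
  const char *name;
  unsigned variant;
};

// The entries are written with -1 as the variant.  Inside a braced
// initializer, converting -1 to unsigned is a narrowing conversion, so the
// cast is applied once here instead of editing every entry.
#define SKETCH_CORE(NAME, VARIANT) { NAME, static_cast<unsigned>(VARIANT) },

static const core_data cores[] = {
  SKETCH_CORE("core-a", -1)
  SKETCH_CORE("core-b", 0x2)
};

#undef SKETCH_CORE

// Small self-check showing the resulting values.
void self_check() {
  assert(cores[0].variant == UINT_MAX);
  assert(cores[1].variant == 2u);
}
```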
Built and tested on aarch64-linux-gnu.
gcc/ChangeLog:
PR target/118603
* config/aarch64/driver-aarch64.cc (aarch64_cpu_data): Add cast to unsigned
to VARIANT of the define AARCH64_CORE.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Sat, 10 May 2025 03:56:42 +0000 (20:56 -0700)]
aarch64: Fix narrowing warning in aarch64_detect_vector_stmt_subtype
There is a narrowing warning in aarch64_detect_vector_stmt_subtype
about gather_load_x32_cost and gather_load_x64_cost converting from int to unsigned.
These fields are always unsigned and even the constructor for sve_vec_cost takes
an unsigned. So let's just move the fields over to unsigned.
Built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
* config/aarch64/aarch64-protos.h (struct sve_vec_cost): Change gather_load_x32_cost
and gather_load_x64_cost fields to unsigned.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Mon, 21 Apr 2025 19:19:49 +0000 (12:19 -0700)]
forwprop: Move memcpy_to_memset from gimple fold to forwprop
Since this optimization now walks the vops, it is better to only
do it in forwprop rather than all the time in fold_stmt.
The next patch will add the limit to the alias walk.
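At the source level, the rewrite this optimization performs looks roughly like the following. The function names are illustrative; the second function is what the first is equivalent to once the copy from a just-memset buffer is turned into a memset of the destination.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Before: fill src, then copy it to dst.
void fill_then_copy(unsigned char *dst, unsigned char *src, std::size_t n) {
  std::memset(src, 0x2a, n);
  std::memcpy(dst, src, n);
}

// After: the copy from a buffer with known contents becomes a second memset.
void fill_both(unsigned char *dst, unsigned char *src, std::size_t n) {
  std::memset(src, 0x2a, n);
  std::memset(dst, 0x2a, n);
}

// Small self-check: both versions leave dst with identical contents.
void self_check() {
  unsigned char s1[8], d1[8], s2[8], d2[8];
  fill_then_copy(d1, s1, sizeof d1);
  fill_both(d2, s2, sizeof d2);
  assert(std::memcmp(d1, d2, sizeof d1) == 0);
}
```

Proving that nothing modifies src between the two calls is what requires walking the virtual operands, hence the concern about doing it on every fold_stmt.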
gcc/ChangeLog:
* gimple-fold.cc (optimize_memcpy_to_memset): Move to
tree-ssa-forwprop.cc.
(gimple_fold_builtin_memory_op): Remove call to
optimize_memcpy_to_memset.
(fold_stmt_1): Likewise.
* tree-ssa-forwprop.cc (optimize_memcpy_to_memset): Move from
gimple-fold.cc.
(simplify_builtin_call): Try to optimize memcpy/memset.
(pass_forwprop::execute): Try to optimize memcpy like assignment
from a previous memset.
gcc/testsuite/ChangeLog:
* gcc.dg/pr78408-1.c: Update scan to forwprop1 only.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Iain Sandoe [Sat, 10 May 2025 16:22:55 +0000 (17:22 +0100)]
c++, coroutines: Allow NVRO in more cases for ramp functions.
The constraints of the c++ coroutines specification require the ramp
to construct a return object early in the function. This will be returned
at some later time. This is implemented as NVRO but requires that copying
be well-formed even though it will be elided. Special-case ramp functions
to allow this.
gcc/cp/ChangeLog:
* typeck.cc (check_return_expr): Suppress conversions for NVRO
in coroutine ramp functions.
Iain Sandoe [Sat, 10 May 2025 16:12:44 +0000 (17:12 +0100)]
c++: Set the outer brace marker for missed cases.
In some cases, a function might be declared as FUNCTION_NEEDS_BODY_BLOCK
but all the content is contained within that block. However, poplevel
is currently assuming that such cases would always contain subblocks.
In the case that we do have a body block, but there are no subblocks
then set the outer brace marker on the body block. This situation occurs
for at least coroutine lambda ramp functions and empty constructors.
gcc/cp/ChangeLog:
* decl.cc (poplevel): Set BLOCK_OUTER_CURLY_BRACE_P on the
body block for functions with no subblocks.
Nathaniel Shead [Fri, 28 Mar 2025 12:30:31 +0000 (23:30 +1100)]
c++/modules: Clean up importer_interface
This patch removes some no longer needed special casing in linkage
determination, and makes the distinction between "always_emit" and
"internal" for better future-proofing.
gcc/cp/ChangeLog:
* module.cc (importer_interface): Adjust flags.
(get_importer_interface): Rename flags.
(trees_out::core_bools): Clean up special casing.
(trees_out::write_function_def): Rename flag.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Jason Merrill [Fri, 16 May 2025 12:22:08 +0000 (08:22 -0400)]
c++: one more coro test tweak
After my r16-670, running the testsuite with explicit --stds didn't run this
one in C++17 mode, but the default did. Let's remove the { target c++17 }
so it doesn't by default, either.
This patch mops up obvious redundancies that weren't caught by the
automatic regexp replacements in earlier patches. It doesn't do
anything with genemit.cc, since that will be part of a later series.
gcc/
* config/arm/arm.cc (arm_gen_load_multiple_1): Simplify use of
end_sequence.
(arm_gen_store_multiple_1): Likewise.
* expr.cc (gen_move_insn): Likewise.
* gentarget-def.cc (main): Likewise.
The start_sequence/end_sequence interface was a big improvement over
the previous state, but one slightly awkward thing about it is that
you have to call get_insns before end_sequence in order to get the
insn sequence itself:
To get the contents of the sequence just made, you must call
`get_insns' *before* calling here.
I can see three main potential objections to this:
(1) It isn't obvious whether ending the sequence would return the first
or the last instruction. But although some code reads *both* the
first and the last instruction, I can't think of a specific case
where code would want *only* the last instruction. All the emit
functions take the first instruction rather than the last.
(2) The "end" in end_sequence might imply the C++ meaning of an exclusive
endpoint iterator. But for an insn sequence, the exclusive endpoint
is always the null pointer, so it would never need to be returned.
That said, we could rename the function to something like
"finish_sequence" or "complete_sequence" if this is an issue.
(3) There might have been an intention that start_sequence/end_sequence
could in future reclaim memory for unwanted sequences, and so an
explicit get_insns was used to indicate that the caller does want
the sequence.
But that sort of memory reclamation has never been added,
and now that the codebase is C++, it would be easier to handle
using RAII. I think reclaiming memory would be difficult to do in
any case, since some code records the individual instructions that
they emit, rather than using get_insns.
Jonathan Wakely [Thu, 15 May 2025 15:03:53 +0000 (16:03 +0100)]
libstdc++: Fix proc check_v3_target_namedlocale for "" locale [PR65909]
When the last formal argument to a Tcl proc is named 'args' it has
special meaning and is a list that accepts any number of arguments[1].
This means when "" is passed to the proc and then we expand "$args" we
get an empty list formatted as "{}". My r16-537-g3e2b83faeb6b14 change
broke all uses of dg-require-namedlocale with empty locale names, "".
By changing the name of the formal argument to 'locale' we avoid the
special behaviour for 'args' and now it only accepts a single argument
(as was always intended). When expanded as "$locale" we get "" as I
expected.
Pan Li [Tue, 13 May 2025 03:12:53 +0000 (11:12 +0800)]
RISC-V: Adjust vx combine test case to avoid name conflict
Since we will put all the vx combine tests for int8 in a single file,
we need to make sure the functions generated for different
types and ops have different names. Thus, refactor
the test helper macros to avoid possible function name
conflicts.
The test suites below pass with this patch series.
* The rv64gcv full regression test.
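The naming scheme described above can be sketched with token pasting. This is only an illustration: the macro name DEF_VX_BINARY_NAMED and the resulting function names are hypothetical, not the ones used in the testsuite, and `__restrict` is a compiler extension standing in for C's `restrict`.

```cpp
#include <cstdint>

// Hypothetical helper (not the testsuite's macro): NAME and T are pasted
// into the function name, so each (type, operator) pair gets its own symbol
// and several instantiations can live in one file.
#define DEF_VX_BINARY_NAMED(T, OP, NAME)                                  \
  void test_vx_binary_##NAME##_##T (T *__restrict out, T *__restrict in,  \
                                    T x, unsigned n)                      \
  {                                                                       \
    for (unsigned i = 0; i < n; i++)                                      \
      out[i] = in[i] OP x;                                                \
  }

DEF_VX_BINARY_NAMED (int8_t, +, add)  // defines test_vx_binary_add_int8_t
DEF_VX_BINARY_NAMED (int8_t, -, sub)  // defines test_vx_binary_sub_int8_t
```

Without the pasted suffixes, the two instantiations would both define a function with the same name and signature and the file would not compile.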
Pan Li [Sun, 11 May 2025 08:20:28 +0000 (16:20 +0800)]
RISC-V: Combine vec_duplicate + vsub.vv to vsub.vx on GR2VR cost
This patch combines vec_duplicate + vsub.vv into vsub.vx, as in the
example code below. The related pattern depends on the GR2VR cost of
vec_duplicate: late-combine performs the combination if the GR2VR cost
is zero, and rejects it if the GR2VR cost is greater than zero.
Assume we have example code like the following, with a GR2VR cost of 0.
#define DEF_VX_BINARY(T, OP)                                        \
void                                                                \
test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
{                                                                   \
  for (unsigned i = 0; i < n; i++)                                  \
    out[i] = in[i] OP x;                                            \
}
The test suites below pass with this patch.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*<optab>_vx_<mode>): Add new
pattern to convert vec_duplicate + vsub.vv to vsub.vx.
* config/riscv/riscv.cc (riscv_rtx_costs): Add minus as plus op.
* config/riscv/vector-iterators.md: Add minus to iterator
any_int_binop_no_shift_vx.
Jason Merrill [Sat, 10 May 2025 15:24:38 +0000 (11:24 -0400)]
c++: remove coroutines.exp
coroutines.exp was basically only there to add -std=c++20 to all the tests;
removing it lets us use the general support for running tests under multiple
standards. Doing this revealed that some tests that specifically run in
C++17 mode were relying on -std=c++20 followed by -std=c++17 leaving
flag_coroutines set, which seems unintentional, and different from how we
handle other feature flags. So this changes that, and adds the missing
-fcoroutines to those tests.
Harald Anlauf [Thu, 15 May 2025 19:07:07 +0000 (21:07 +0200)]
Fortran: default-initialization and functions returning derived type [PR85750]
Functions with non-pointer, non-allocatable result and of derived type did
not always get initialized although the type had default-initialization,
and a derived type component had the allocatable or pointer attribute.
Rearrange the logic when to apply default-initialization.
PR fortran/85750
gcc/fortran/ChangeLog:
* resolve.cc (resolve_symbol): Reorder conditions when to apply
default-initializers.
Andrew MacLeod [Wed, 14 May 2025 15:13:15 +0000 (11:13 -0400)]
Allow bitmask intersection to process unknown masks.
bitmask_intersection should not return immediately if the current mask is
unknown. Unknown may mean it's the default for a range, and this may
interact in interesting ways with the other bitmask.
PR tree-optimization/116546
* value-range.cc (irange::intersect_bitmask): Allow unknown
bitmasks to be processed.
Andrew MacLeod [Wed, 14 May 2025 15:12:22 +0000 (11:12 -0400)]
Improve constant bitmasks.
Bitmasks for constants are created only for trailing zeros. It is no
additional work to also include the leading ones in the value, which are
also known.
before : [5, 7] mask 0x7 value 0x0
after : [5, 7] mask 0x3 value 0x4
PR tree-optimization/116546
* value-range.cc (irange_bitmask::irange_bitmask): Include
leading ones in the bitmask.
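The before/after lines above can be reproduced with a small stand-alone sketch. It assumes, as the example suggests, that a set mask bit means "unknown" and that value holds the known bits. The struct and function names are hypothetical, and plain 32-bit integers stand in for whatever irange_bitmask really uses, so this is not the code in value-range.cc.

```cpp
#include <cstdint>

// Hypothetical stand-in for a mask/value pair: a set bit in `mask` means the
// bit is unknown; for known bits, `value` holds the bit's actual value.
struct known_bits
{
  uint32_t mask;
  uint32_t value;
};

// Every value in [lo, hi] shares the bits above the highest position where
// lo and hi differ, so only that position and the ones below it are unknown.
inline known_bits bits_from_range (uint32_t lo, uint32_t hi)
{
  uint32_t diff = lo ^ hi;
  uint32_t mask = 0;
  while (diff)
    {
      mask = (mask << 1) | 1;   // extend the unknown region by one bit
      diff >>= 1;
    }
  return { mask, lo & ~mask };  // keep only the shared leading bits
}
```

For [5, 7] the bounds are 0b101 and 0b111, which differ only in bit 1, so the sketch yields mask 0x3 and value 0x4, matching the "after" line.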
Andrew MacLeod [Tue, 13 May 2025 17:23:16 +0000 (13:23 -0400)]
Turn get_bitmask_from_range into an irange_bitmask constructor.
There are other places where this is interesting, so move the static
function into a constructor for class irange_bitmask.
* value-range.cc (irange_bitmask::irange_bitmask): Rename from
get_bitmask_from_range and tweak.
(prange::set): Use new constructor.
(prange::intersect): Use new constructor.
(irange::get_bitmask): Likewise.
* value-range.h (irange_bitmask): New constructor prototype.
Robert Dubner [Thu, 15 May 2025 16:01:12 +0000 (12:01 -0400)]
cobol: Don't display 0xFF HIGH-VALUE characters in testcases. [PR120251]
The tests were displaying 0xFF characters, and the resulting generated
output changed with the system locale. The check_88 test was modified
so that the regex comparisons ignore those character positions. Two
of the other tests were changed to output hexadecimal rather than
character strings.
There is one new test, and the other inspect testcases were edited to
remove an unimportant back-apostrophe that had found its way into the
source code sequence number area.
gcc/testsuite/ChangeLog:
PR cobol/120251
* cobol.dg/group1/check_88.cob: Ignore characters above 0x80.
* cobol.dg/group2/ALLOCATE_Rule_8_OPTION_INITIALIZE_with_figconst.cob:
Output HIGH-VALUE as hex, rather than as characters.
* cobol.dg/group2/ALLOCATE_Rule_8_OPTION_INITIALIZE_with_figconst.out:
Likewise.
* cobol.dg/group2/INSPECT_CONVERTING_TO_figurative_constants.cob: Typo.
* cobol.dg/group2/INSPECT_CONVERTING_TO_figurative_constants.out: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_1.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_2.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_3.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_4.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_5-f.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_6.cob: Likewise.
* cobol.dg/group2/INSPECT_ISO_Example_7.cob: Likewise.
* cobol.dg/group2/Multiple_INDEXED_BY_variables_with_the_same_name.cob: New test.
* cobol.dg/group2/Multiple_INDEXED_BY_variables_with_the_same_name.out: New test.
Luc Grosheintz [Wed, 14 May 2025 19:13:52 +0000 (21:13 +0200)]
libstdc++: Fix class mandate for extents.
The standard states that the IndexType must be a signed or unsigned
integer. This mandate was implemented using `std::is_integral_v`, which
also includes (among others) char and bool, which are neither signed nor
unsigned integers.
libstdc++-v3/ChangeLog:
* include/std/mdspan: Implement the mandate for extents as
signed or unsigned integer and not any integral type. Remove
leading underscores from names in static_assert message.
* testsuite/23_containers/mdspan/extents/class_mandates_neg.cc:
Check that extents<char,...> and extents<bool,...> are invalid.
Adjust dg-prune-output pattern.
* testsuite/23_containers/mdspan/extents/misc.cc: Update
tests to avoid `char` and `bool` as IndexType.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
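The difference between "integral" and "signed or unsigned integer" can be seen in a short sketch. The trait name is_plain_integer_v is hypothetical and is not what <mdspan> uses; it simply filters bool and the character types out of std::is_integral_v.

```cpp
#include <type_traits>

// Hypothetical trait (not libstdc++'s): integral types minus bool and the
// character types, i.e. roughly the signed and unsigned integer types.
template<typename T>
inline constexpr bool is_plain_integer_v =
  std::is_integral_v<T>
  && !std::is_same_v<std::remove_cv_t<T>, bool>
  && !std::is_same_v<std::remove_cv_t<T>, char>
  && !std::is_same_v<std::remove_cv_t<T>, wchar_t>
#ifdef __cpp_char8_t
  && !std::is_same_v<std::remove_cv_t<T>, char8_t>
#endif
  && !std::is_same_v<std::remove_cv_t<T>, char16_t>
  && !std::is_same_v<std::remove_cv_t<T>, char32_t>;

// char and bool are integral types, but not signed or unsigned integers.
static_assert (std::is_integral_v<char> && std::is_integral_v<bool>);
static_assert (!is_plain_integer_v<char> && !is_plain_integer_v<bool>);

// signed char and unsigned char do count, as do the usual integer types.
static_assert (is_plain_integer_v<signed char>);
static_assert (is_plain_integer_v<unsigned char>);
static_assert (is_plain_integer_v<int> && is_plain_integer_v<unsigned long>);
```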
Jonathan Wakely [Thu, 15 May 2025 10:01:05 +0000 (11:01 +0100)]
libstdc++: Fix std::format_kind primary template for Clang [PR120190]
Although Clang trunk has been adjusted to handle our std::format_kind
definition (because they need to be able to compile the GCC 15.1.0
release), it's probably better to not rely on something that they might
start diagnosing again in future.
Define the primary template in terms of an immediately invoked function
expression, so that we can put a static_assert(false) in the body.
libstdc++-v3/ChangeLog:
PR libstdc++/120190
* include/std/format (format_kind): Adjust primary template to
not depend on itself.
* testsuite/std/format/ranges/format_kind_neg.cc: Adjust
expected errors. Check more invalid specializations.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Daniel Krügler <daniel.kruegler@gmail.com>
Jeff Law [Thu, 15 May 2025 15:03:13 +0000 (09:03 -0600)]
[RISC-V][PR target/120223] Don't use bset/binv for XTHEADBS
Thead has the XTHEADBB extension which has a lot of overlap with Zbb. I made
the incorrect assumption that XTHEADBS would largely be like Zbs when
generalizing Shreya's work.
As a result we can't use the operation synthesis code for IOR/XOR because we
don't have binv/bset like capabilities. I should have double checked on
XTHEADBS, my bad.
Anyway, the fix is trivial. Don't allow bset/binv based on XTHEADBS.
Already spun in my tester. Spinning in the pre-commit CI system now.
PR target/120223
gcc/
* config/riscv/riscv.cc (synthesize_ior_xor): XTHEADBS does not have
single bit manipulations.
Patrick Palka [Thu, 15 May 2025 15:07:53 +0000 (11:07 -0400)]
c++: unifying specializations of non-primary tmpls [PR120161]
Here unification of P=Wrap<int>::type, A=Wrap<long>::type wrongly
succeeds ever since r14-4112 which made the RECORD_TYPE case of unify
no longer recurse into template arguments for non-primary templates
(since they're a non-deduced context) and so the int/long mismatch that
makes the two types distinct goes unnoticed.
In the case of (comparing specializations of) a non-primary template,
unify should still go on to compare the types directly before returning
success.
PR c++/120161
gcc/cp/ChangeLog:
* pt.cc (unify) <case RECORD_TYPE>: When comparing specializations
of a non-primary template, still perform a type comparison.
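As a reminder of the language rule involved, the sketch below shows that Wrap<int>::type and Wrap<long>::type are distinct types. The definition of Wrap is an assumption based on the names in the message; this is not the PR's testcase and it does not by itself reproduce the bug.

```cpp
#include <type_traits>

// Assumed shape of Wrap: `type` is a member class of a class template,
// so it is a specialization of a non-primary template.
template<typename T>
struct Wrap
{
  struct type { };
};

// Each specialization of Wrap has its own member class.
static_assert (!std::is_same_v<Wrap<int>::type, Wrap<long>::type>);

// Overload resolution can therefore tell the two apart.
constexpr int pick (Wrap<int>::type)  { return 1; }
constexpr int pick (Wrap<long>::type) { return 2; }

static_assert (pick (Wrap<int>::type{}) == 1);
static_assert (pick (Wrap<long>::type{}) == 2);
```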
Jason Merrill [Fri, 9 May 2025 23:13:49 +0000 (19:13 -0400)]
c++: -fimplicit-constexpr and modules
Import didn't like differences in DECL_DECLARED_CONSTEXPR_P due to implicit
constexpr, breaking several g++.dg/modules tests; we should handle that
along with DECL_MAYBE_DELETED; for that we need to stream the bit.
gcc/cp/ChangeLog:
* module.cc (trees_out::lang_decl_bools): Stream implicit_constexpr.
(trees_in::lang_decl_bools): Likewise.
(trees_in::is_matching_decl): Check it.
Jason Merrill [Wed, 14 May 2025 14:23:32 +0000 (10:23 -0400)]
c++: one more PR99599 tweak
Patrick pointed out that if the parm/arg types aren't complete yet at this
point, it would affect the type_has_converting_constructor and
TYPE_HAS_CONVERSION tests. I don't have a testcase, but it makes sense for
safety.
PR c++/99599
gcc/cp/ChangeLog:
* pt.cc (conversion_may_instantiate_p): Make sure
classes are complete.
Jason Merrill [Thu, 1 May 2025 14:20:25 +0000 (10:20 -0400)]
libstdc++: build testsuite with -Wabi
I added this locally to check whether the PR120012 fix affects libstdc++ (it
doesn't) but it seems more generally useful to catch whether compiler
ABI changes have library impact.
As a followup to PAREN_EXPR verification, let's ensure that CONJ_EXPR is
only used with COMPLEX_TYPE. While at it, move the whole block towards
the end of the switch, because unlike the other entries it needs to
break out of the switch, not immediately return from the function,
as after the switch we check that types of LHS and RHS match.
Refactor a bit to avoid repeated blocks with debug_generic_expr.
gcc/ChangeLog:
* tree-cfg.cc (verify_gimple_assign_unary): Accept only
COMPLEX_TYPE for CONJ_EXPR.
Tobias Burnus [Thu, 15 May 2025 07:15:21 +0000 (09:15 +0200)]
OpenMP/Fortran: Fix allocatable-component mapping of derived-type array comps
The check whether the location expression in map clause has allocatable
components was failing for some derived-type array expressions such as
map(var%tiles(1))
as the compiler produced
_4 = var.tiles;
MEMREF(_4, _5);
This commit now also handles this case.
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_omp_deep_mapping_do): Handle SSA_NAME if
a def_stmt is available.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/alloc-comp-4.f90: New test.
Andrew Pinski [Wed, 14 May 2025 16:01:07 +0000 (09:01 -0700)]
tree: Canonical order for ADDR
This is the followup based on the review at
https://inbox.sourceware.org/gcc-patches/CAFiYyc3xeG75dsWaF63Zbu5uELPEAEoHwGfoGaVyDWouUJ70Mg@mail.gmail.com/
.
We should put ADDR_EXPR last instead of just is_gimple_invariant_address ones.
Note a few match patterns needed to be updated for this change but we get a decent improvement
as forwprop-38.c is now able to optimize during CCP rather than taking all the way to forwprop.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* fold-const.cc (tree_swap_operands_p): Put ADDR_EXPR last
instead of just is_gimple_invariant_address ones.
* match.pd (`a ptr+ b !=/== ADDR`, `ADDR !=/== ssa_name`):
Move the ADDR to the last operand. Update comment.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Richard Biener [Wed, 14 May 2025 14:45:08 +0000 (16:45 +0200)]
Enhance -fopt-info-vec vectorized loop diagnostic
The following includes whether we vectorize an epilogue, whether
we use loop masking and what vectorization factor (unroll factor)
we use. So it's now
t.c:4:21: optimized: loop vectorized using 64 byte vectors and unroll factor 32
t.c:4:21: optimized: epilogue loop vectorized using masked 64 byte vectors and unroll factor 32
for a masked epilogue with AVX512 and HImode data for example. Rather
than
t.c:4:21: optimized: loop vectorized using 64 byte vectors
t.c:4:21: optimized: loop vectorized using 64 byte vectors
I verified we don't translate opt-info messages and thus excessive
use of %s to compose the strings should be OK.
* tree-vectorizer.cc (vect_transform_loops): When diagnosing
a vectorized loop indicate whether we vectorized an epilogue,
whether we used masked vectors and what unroll factor was
used.
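Composing the message from %s fragments might look like the sketch below. The helper vect_message and its parameters are hypothetical and only mirror the facts the new diagnostic reports; the real code in tree-vectorizer.cc goes through GCC's own dump machinery rather than snprintf.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds the message text from optional "%s" fragments,
// which is acceptable only because these messages are not translated.
inline std::string vect_message (bool epilogue, bool masked,
                                 unsigned vector_bytes, unsigned unroll_factor)
{
  char buf[160];
  std::snprintf (buf, sizeof buf,
                 "%sloop vectorized using %s%u byte vectors"
                 " and unroll factor %u",
                 epilogue ? "epilogue " : "", masked ? "masked " : "",
                 vector_bytes, unroll_factor);
  return buf;
}
```

Calling vect_message (false, false, 64, 32) and vect_message (true, true, 64, 32) gives the two message texts quoted above, without the location prefix.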
Richard Biener [Wed, 14 May 2025 14:36:29 +0000 (16:36 +0200)]
Fix regression from x86 multi-epilogue tuning
With the avx512_two_epilogues tuning enabled for zen4 and zen5
the gcc.target/i386/vect-epilogues-5.c testcase below regresses
and ends up using AVX2 sized vectors for the masked epilogue
rather than AVX512 sized vectors. The following patch rectifies
this and adds coverage for the intended behavior.
* config/i386/i386.cc (ix86_vector_costs::finish_cost):
Do not suggest a first epilogue mode for AVX512 sized
main loops with X86_TUNE_AVX512_TWO_EPILOGUES as that
interferes with using a masked epilogue.
Simon Martin [Wed, 14 May 2025 18:29:57 +0000 (20:29 +0200)]
c++: Add testcase for issue fixed in GCC 15 [PR120126]
Patrick noticed that this PR's testcase has been fixed by the patch for
PR c++/114292 (r15-7238-gceabea405ffdc8), more specifically the part
that walks the type of DECL_EXPR DECLs.