Hu, Lin1 [Thu, 1 Feb 2024 07:15:01 +0000 (15:15 +0800)]
vect: generate suitable convert insn for int -> int, float -> float and int <-> float.
gcc/ChangeLog:
PR target/107432
* tree-vect-generic.cc
(expand_vector_conversion): Support convert for int -> int,
float -> float and int <-> float.
* tree-vect-stmts.cc (vectorizable_conversion): Wrap the
indirect convert part.
(supportable_indirect_convert_operation): New function.
* tree-vectorizer.h (supportable_indirect_convert_operation):
Define the new function.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
test macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-3.c: New test.
Xi Ruoyao [Sat, 15 Jun 2024 10:29:43 +0000 (18:29 +0800)]
LoongArch: Tweak IOR rtx_cost for bstrins
Consider
c &= 0xfff;
a &= ~0xfff;
b &= ~0xfff;
a |= c;
b |= c;
This can be done with 2 bstrins instructions. But we need to recognize
it in loongarch_rtx_costs or the compiler will not propagate "c & 0xfff"
forward.
gcc/ChangeLog:
* config/loongarch/loongarch.cc:
(loongarch_use_bstrins_for_ior_with_mask): Split the main logic
into ...
(loongarch_use_bstrins_for_ior_with_mask_1): ... here.
(loongarch_rtx_costs): Special case for IOR those can be
implemented with bstrins.
liuhongt [Mon, 24 Jun 2024 09:53:22 +0000 (17:53 +0800)]
Fix wrong cost of MEM when addr is a lea.
416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8c1c0.
The commit adjust rtx_cost of mem to reduce cost of (add op0 disp).
But Cost of ADDR could be cheaper than XEXP (addr, 0) when it's a lea.
It is the case in the PR, the patch adjust rtx_cost to only handle reg
+ disp, for other forms, they're basically all LEA which doesn't have
additional cost of ADD.
gcc/ChangeLog:
PR target/115462
* config/i386/i386.cc (ix86_rtx_costs): Make cost of MEM (reg +
disp) just a little bit more than MEM (reg).
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr115462.c: New test.
Pan Li [Wed, 26 Jun 2024 01:28:05 +0000 (09:28 +0800)]
Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int
This patch would like to add the middle-end presentation for the
saturation truncation. Aka set the result of truncated value to
the max value when overflow. It will take the pattern similar
as below.
Form 1:
#define DEF_SAT_U_TRUC_FMT_1(WT, NT) \
NT __attribute__((noinline)) \
sat_u_truc_##T##_fmt_1 (WT x) \
{ \
bool overflow = x > (WT)(NT)(-1); \
return ((NT)x) | (NT)-overflow; \
}
For example, truncated uint16_t to uint8_t, we have
Before this patch:
__attribute__((noinline))
uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
{
_Bool overflow;
unsigned int _1;
unsigned int _2;
unsigned int _3;
uint32_t _6;
The below tests are passed for this patch:
*. The rv64gcv fully regression tests.
*. The rv64gcv build with glibc.
*. The x86 bootstrap tests.
*. The x86 fully regression tests.
gcc/ChangeLog:
* internal-fn.def (SAT_TRUNC): Add new signed IFN sat_trunc as
unary_convert.
* match.pd: Add new matching pattern for unsigned int sat_trunc.
* optabs.def (OPTAB_CL): Add unsigned and signed optab.
* tree-ssa-math-opts.cc (gimple_unsigend_integer_sat_trunc): Add
new decl for the matching pattern generated func.
(match_unsigned_saturation_trunc): Add new func impl to match
the .SAT_TRUNC.
(math_opts_dom_walker::after_dom_children): Add .SAT_TRUNC match
function under BIT_IOR_EXPR case.
This patch would like to improve the pattern match to recog above
as truncate after .SAT_SUB pattern. Then we will have the pattern
similar to below, as well as eliminate the first 3 dead stmt.
The below tests are passed for this patch.
1. The rv64gcv fully regression tests.
2. The rv64gcv build with glibc.
3. The x86 bootstrap tests.
4. The x86 fully regression tests.
gcc/ChangeLog:
* match.pd: Add convert description for minus and capture.
* tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add
new logic to handle in_type is incompatibile with out_type, as
well as rename from.
(vect_recog_build_binary_gimple_stmt): Rename to.
(vect_recog_sat_add_pattern): Leverage above renamed func.
(vect_recog_sat_sub_pattern): Ditto.
Richard Biener [Wed, 26 Jun 2024 17:23:26 +0000 (19:23 +0200)]
tree-optimization/115652 - amend last fix
The previous fix breaks in the degenerate case when the discovered
last_stmt is equal to the first stmt in the block since then we
undo a required stmt advancement.
PR tree-optimization/115652
* tree-vect-slp.cc (vect_schedule_slp_node): Only insert
at the start of the block if that strictly dominates
the discovered dependent stmt.
Jonathan Wakely [Tue, 25 Jun 2024 22:59:19 +0000 (23:59 +0100)]
libstdc++: Add script to update docs for a new release branch
This should be run on a release branch after branching from trunk.
Various links and references to trunk in the docs will be updated to
refer to the new release branch.
Jonathan Wakely [Thu, 20 Jun 2024 21:17:08 +0000 (22:17 +0100)]
libstdc++: Remove duplicate test
We currently have 808590.cc which only runs for C++98 mode, and
808590-cxx11.cc which only runs for C++11 and later, but have almost
identical content (except for a defaulted special member in the C++11
one, to suppress a -Wdeprecated-copy warning).
This was done originally to ensure that the test ran for both C++98 mode
and C++11 mode, because the logic being tested was different enough to
need both to be tested. But it's trivial to run all tests in multiple
-std modes now, using GLIBCXX_TESTSUITE_STDS, so we don't need two
separate tests. We can remove one of the tests and allow the other one
to run in any -std mode.
libstdc++-v3/ChangeLog:
* testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc:
Copy defaulted assignment operator from 808590-cxx11.cc to
suppress a warning.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/808590-cxx11.cc:
Removed.
Jonathan Wakely [Thu, 6 Jun 2024 10:50:06 +0000 (11:50 +0100)]
libstdc++: Work around some PSTL test failures for debug mode [PR90276]
This addresses one known failure due to a bug in the upstream tests, and
a number of timeouts due to the algorithms running much more slowly with
debug mode checks enabled.
libstdc++-v3/ChangeLog:
PR libstdc++/90276
* testsuite/25_algorithms/pstl/alg_sorting/partial_sort.cc
[_GLIBCXX_DEBUG]: Add xfail-run-if for debug mode.
* testsuite/25_algorithms/pstl/alg_nonmodifying/nth_element.cc
[_GLIBCXX_DEBUG]: Reduce size of test data.
* testsuite/25_algorithms/pstl/alg_sorting/includes.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_util.h:
Likewise.
Jonathan Wakely [Tue, 30 Apr 2024 08:52:13 +0000 (09:52 +0100)]
libstdc++: Fix std::chrono::tzdb to work with vanguard format
I found some issues in the std::chrono::tzdb parser by testing the
tzdata "vanguard" format, which uses new features that aren't enabled in
the "main" and "rearguard" data formats.
Since 2024a the keyword "minimum" is no longer valid for the FROM and TO
fields in a Rule line, which means that "m" is now a valid abbreviation
for "maximum". Previously we expected either "mi" or "ma". For backwards
compatibility, a FROM field beginning with "mi" is still supported and
is treated as 1900. The "maximum" keyword is only allowed in TO now,
because it makes no sense in FROM. To support these changes the
minmax_year and minmax_year2 classes for parsing FROM and TO are
replaced with a single years_from_to class that reads both fields.
The vanguard format makes use of %z in Zone FORMAT fields, which caused
an exception to be thrown from ZoneInfo::set_abbrev because no % or /
characters were expected when a Zone doesn't use a named Rule. The
ZoneInfo::to(sys_info&) function now uses format_abbrev_str to replace
any %z with the current offset. Although format_abbrev_str also checks
for %s and STD/DST formats, those only make sense when a named Rule is
in effect, so won't occur when ZoneInfo::to(sys_info&) is used.
This change also implements a feature that has always been missing from
time_zone::_M_get_sys_info: finding the Rule that is active before the
specified time point, so that we can correctly handle %s in the FORMAT
for the first new sys_info that gets created. This requires implementing
a poorly documented feature of zic, to get the LETTERS field from a
later transition, as described at
https://mm.icann.org/pipermail/tz/2024-April/058891.html
In order for this to work we need to be able to distinguish an empty
letters field (as used by CE%sT where the variable part is either empty
or "S") from "the letters field is not known for this transition". The
tzdata file uses "-" for an empty letters field, which libstdc++ was
previously replacing with "" when the Rule was parsed. Instead, we now
preserve the "-" in the Rule object, so that "" can be used for the case
where we don't know the letters (and so need to decide it).
libstdc++-v3/ChangeLog:
* src/c++20/tzdb.cc (minmax_year, minmax_year2): Remove.
(years_from_to): New class replacing minmax_year and
minmax_year2.
(format_abbrev_str, select_std_or_dst_abbrev): Move earlier in
the file. Handle "-" for letters.
(ZoneInfo::to): Use format_abbrev_str to expand %z.
(ZoneInfo::set_abbrev): Remove exception. Change parameter from
reference to value.
(operator>>(istream&, Rule&)): Do not clear letters when it
contains "-".
(time_zone::_M_get_sys_info): Add missing logic to find the Rule
in effect before the time point.
* testsuite/std/time/tzdb/1.cc: Adjust for vanguard format using
"GMT" as the Zone name, not as a Link to "Etc/GMT".
* testsuite/std/time/time_zone/sys_info_abbrev.cc: New test.
Richard Biener [Tue, 25 Jun 2024 12:04:31 +0000 (14:04 +0200)]
tree-optimization/115629 - missed tail merging
The following fixes a missed tail-merging observed for the testcase
in PR115629. The issue is that when deps_ok_for_redirect doesn't
compute both would be valid prevailing blocks it rejects the merge.
The following instead makes sure to record the working block as
prevailing. Also stmt comparison fails for indirect references
and is not handling memory references thoroughly, failing to unify
array indices and pointers indirected. The following attempts to
fix this.
PR tree-optimization/115629
* tree-ssa-tail-merge.cc (gimple_equal_p): Handle
memory references better.
(deps_ok_for_redirect): Handle the case not both blocks
are considered a valid prevailing block.
Patrick O'Neill [Tue, 25 Jun 2024 21:14:18 +0000 (14:14 -0700)]
RISC-V: Update testcase comments to point to PSABI rather than Table A.6
Table A.6 was originally the source of truth for the recommended mappings.
Point to the PSABI doc since the memory model mappings have been moved there.
Patrick O'Neill [Tue, 25 Jun 2024 21:14:17 +0000 (14:14 -0700)]
RISC-V: Consolidate amo testcase variants
Many riscv/amo/ testcases use check-function-bodies. These testcases can be
consolidated with related testcases (memory ordering variants) without affecting
the assertions.
Give functions descriptive names so testsuite failures are obvious from the
'FAIL:' line.
Carl Love [Fri, 21 Jun 2024 15:56:36 +0000 (11:56 -0400)]
rs6000, change altivec*-runnable.c test file names
Changed the names of the test files.
gcc/testsuite/ChangeLog:gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the name to
altivec-38.c.
* gcc.target/powerpc/altivec-2-runnable.c: Change the name to
p8vector-builtin-9.c.
Jeff Law [Wed, 26 Jun 2024 13:20:29 +0000 (07:20 -0600)]
[committed] Remove compromised sh test
Surya's recent patch to IRA improves the code for sh/pr54602-1.c slightly.
Specifically it's able to eliminate a save/restore in the prologue/epilogue and
a bit of register shuffling.
As a result there literally aren't any insns that can be used to fill the delay
slot of the return, so a nop gets emitted and the test fails.
Given there literally aren't any insns to move into the delay slot, the best
course of action is to just drop the test.
Jeff Law [Wed, 26 Jun 2024 12:59:26 +0000 (06:59 -0600)]
[committed][RISC-V] Fix expected output for thead store pair test
Surya's patch to IRA has improved the code we generate for one of the thead
store pair tests for both rv32 and rv64. This patch adjusts the expectations
of that test.
I've verified that the test now passes on rv32 and rv64 in my tester. Pushing
to the trunk.
Richard Biener [Wed, 26 Jun 2024 07:25:27 +0000 (09:25 +0200)]
tree-optimization/115652 - adjust insertion gsi for SLP
The following adjusts how SLP computes the insertion location. In
particular it advanced the insert iterator of the found last_stmt.
The vectorizer will later insert stmts _before_ it. But we also
have the constraint that possibly masked ops may not be scheduled
outside of the loop and as we do not model the loop mask in the
SLP graph we have to adjust for that. The following moves this
to after the advance since it isn't compatible with that as the
current GIMPLE_COND exception shows. The PR is about in-order
reduction vectorization which also isn't happy when that's the
very first stmt.
PR tree-optimization/115652
* tree-vect-slp.cc (vect_schedule_slp_node): Advance the
iterator based on last_stmt only for vector defs.
Jørgen Kvalsvik [Tue, 4 Jun 2024 12:16:22 +0000 (14:16 +0200)]
Record edge true/false value for gcov
Make gcov aware which edges are the true/false to more accurately
reconstruct the CFG. There are plenty of bits left in arc_info and it
opens up for richer reporting.
Jørgen Kvalsvik [Tue, 25 Jun 2024 06:41:45 +0000 (08:41 +0200)]
Use the term MC/DC in help for gcov --conditions
Without key terms like "masking" and "MC/DC" it is not at all obvious
what --conditions actually reports on, and there is no easy path for the
user to figure out. By at least including the two key terms MC/DC and
masking users have something to search for.
Andre Vieira [Wed, 26 Jun 2024 10:07:01 +0000 (11:07 +0100)]
arm: make arm_predict_doloop_p reject loops with calls
With the introduction of low overhead loops we defined arm_predict_doloop_p,
this is meant to be a low-weight check to rule out loops we are not considering
for doloop optimization and it is used by other passes to prevent optimizations
that may hurt the doloop optimization later on. The reason these are meant to be
lightweight is because it's used by pre-RTL optimizations, meaning we can't do
the same checks that doloop does.
After the definition of arm_predict_doloop_p, when testing for armv8.1-m.main,
tree-ssa/ivopts-3.c failed the scan-dump check as the dump now matched an extra
'!= 0' introduced by:
Doloop cmp iv use: if (ivtmp_1 != 0)
Predict loop 1 can perform doloop optimization later.
where previously we had:
Predict doloop failure due to target specific checks.
and after this patch:
Predict doloop failure due to call in loop.
Predict doloop failure due to target specific checks.
Added a copy of the original tree-ssa/ivopts-3.c as a target specifc test to
check for the new dump message.
gcc/ChangeLog:
* config/arm/arm.cc (arm_predict_doloop_p): Reject loops with function
calls that are not builtins.
Kyrylo Tkachov [Wed, 26 Jun 2024 07:42:11 +0000 (09:42 +0200)]
[aarch64] Add support for -mcpu=grace
This adds support for the NVIDIA Grace CPU to aarch64.
We reuse the tuning decisions for the Neoverse V2 core, but include a
number of architecture features that are not enabled by default in
-mcpu=neoverse-v2.
This allows Grace users to more simply target the CPU with -mcpu=grace
rather than remembering what extensions to tag on top of
-mcpu=neoverse-v2.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
* config/aarch64/aarch64-cores.def (grace): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document the above.
Evgeny Karpov [Tue, 25 Jun 2024 21:59:35 +0000 (21:59 +0000)]
i386: Remove declaration of unused functions
The patch fixes the issue introduced in
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=63512c72df09b43d56ac7680cdfd57a66d40c636
and reported at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655599.html .
Regards,
Evgeny
The patch fixes the issue with compilation on x86_64-gnu-linux
when warnings for unused functions are treated as errors.
Kewen Lin [Wed, 26 Jun 2024 07:16:17 +0000 (02:16 -0500)]
rs6000: Fix wrong RTL patterns for vector merge high/low short on LE
Commit r12-4496 changes some define_expands and define_insns
for vector merge high/low short, which are altivec_vmrg[hl]h.
These defines are mainly for built-in function vec_merge{h,l}
and some internal gen function needs. These functions should
consider endianness, taking vec_mergeh as example, as PVIPR
defines, vec_mergeh "Merges the first halves (in element order)
of two vectors", it does note it's in element order. So it's
mapped into vmrghh on BE while vmrglh on LE respectively.
Although the mapped insns are different, as the discussion in
PR106069, the RTL pattern should be still the same, it is
conformed before commit r12-4496, but gets changed into
different patterns on BE and LE starting from commit r12-4496.
Similar to 32-bit element case in commit log of r15-1504, this
16-bit element pattern on LE doesn't actually match what the
underlying insn is intended to represent, once some optimization
like combine does some changes basing on it, it would cause
the unexpected consequence. The newly constructed test case
pr106069-2.c is a typical example for this issue on element type
short.
So this patch is to fix the wrong RTL pattern, ensure the
associated RTL patterns become the same as before which can
have the same semantic as their mapped insns. With the
proposed patch, the expanders like altivec_vmrghh expands
into altivec_vmrghh_direct_be or altivec_vmrglh_direct_le
depending on endianness, "direct" can easily show which
insn would be generated, _be and _le are mainly for the
different RTL patterns as endianness.
Co-authored-by: Xionghu Luo <xionghuluo@tencent.com>
PR target/106069
PR target/115355
gcc/ChangeLog:
* config/rs6000/altivec.md (altivec_vmrghh_direct): Rename to ...
(altivec_vmrghh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
(altivec_vmrghh_direct_le): New define_insn.
(altivec_vmrglh_direct): Rename to ...
(altivec_vmrglh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
(altivec_vmrglh_direct_le): New define_insn.
(altivec_vmrghh): Adjust by calling gen_altivec_vmrghh_direct_be
for BE and gen_altivec_vmrglh_direct_le for LE.
(altivec_vmrglh): Adjust by calling gen_altivec_vmrglh_direct_be
for BE and gen_altivec_vmrghh_direct_le for LE.
(vec_widen_umult_hi_v16qi): Adjust the call to
gen_altivec_vmrghh_direct by gen_altivec_vmrghh for BE
and by gen_altivec_vmrglh for LE.
(vec_widen_smult_hi_v16qi): Likewise.
(vec_widen_umult_lo_v16qi): Adjust the call to
gen_altivec_vmrglh_direct by gen_altivec_vmrglh for BE
and by gen_altivec_vmrghh for LE.
(vec_widen_smult_lo_v16qi): Likewise.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghh_direct by
CODE_FOR_altivec_vmrghh_direct_be for BE and
CODE_FOR_altivec_vmrghh_direct_le for LE. And replace
CODE_FOR_altivec_vmrglh_direct by
CODE_FOR_altivec_vmrglh_direct_be for BE and
CODE_FOR_altivec_vmrglh_direct_le for LE.
Kewen Lin [Wed, 26 Jun 2024 07:16:17 +0000 (02:16 -0500)]
rs6000: Fix wrong RTL patterns for vector merge high/low char on LE
Commit r12-4496 changes some define_expands and define_insns
for vector merge high/low char, which are altivec_vmrg[hl]b.
These defines are mainly for built-in function vec_merge{h,l}
and some internal gen function needs. These functions should
consider endianness, taking vec_mergeh as example, as PVIPR
defines, vec_mergeh "Merges the first halves (in element order)
of two vectors", it does note it's in element order. So it's
mapped into vmrghb on BE while vmrglb on LE respectively.
Although the mapped insns are different, as the discussion in
PR106069, the RTL pattern should be still the same, it is
conformed before commit r12-4496, but gets changed into
different patterns on BE and LE starting from commit r12-4496.
Similar to 32-bit element case in commit log of r15-1504, this
8-bit element pattern on LE doesn't actually match what the
underlying insn is intended to represent, once some optimization
like combine does some changes basing on it, it would cause
the unexpected consequence. The newly constructed test case
pr106069-1.c is a typical example for this issue.
So this patch is to fix the wrong RTL pattern, ensure the
associated RTL patterns become the same as before which can
have the same semantic as their mapped insns. With the
proposed patch, the expanders like altivec_vmrghb expands
into altivec_vmrghb_direct_be or altivec_vmrglb_direct_le
depending on endianness, "direct" can easily show which
insn would be generated, _be and _le are mainly for the
different RTL patterns as endianness.
Co-authored-by: Xionghu Luo <xionghuluo@tencent.com>
PR target/106069
PR target/115355
gcc/ChangeLog:
* config/rs6000/altivec.md (altivec_vmrghb_direct): Rename to ...
(altivec_vmrghb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
(altivec_vmrghb_direct_le): New define_insn.
(altivec_vmrglb_direct): Rename to ...
(altivec_vmrglb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
(altivec_vmrglb_direct_le): New define_insn.
(altivec_vmrghb): Adjust by calling gen_altivec_vmrghb_direct_be
for BE and gen_altivec_vmrglb_direct_le for LE.
(altivec_vmrglb): Adjust by calling gen_altivec_vmrglb_direct_be
for BE and gen_altivec_vmrghb_direct_le for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghb_direct by
CODE_FOR_altivec_vmrghb_direct_be for BE and
CODE_FOR_altivec_vmrghb_direct_le for LE. And replace
CODE_FOR_altivec_vmrglb_direct by
CODE_FOR_altivec_vmrglb_direct_be for BE and
CODE_FOR_altivec_vmrglb_direct_le for LE.
Alexandre Oliva [Wed, 26 Jun 2024 05:08:27 +0000 (02:08 -0300)]
[libstdc++] [testsuite] no libatomic for vxworks
libatomic hasn't been ported to vxworks. Most of the stdatomic.h and
<atomic> underlying requirements are provided by builtins and libgcc,
and the vxworks libc already provides remaining __atomic symbols, so
porting libatomic doesn't seem to make sense.
However, some of the target arch-only tests in
add_options_for_libatomic cover vxworks targets, so we end up
attempting to link libatomic in, even though it's not there.
Preempt those too-broad tests.
Co-Authored-By: Marc Poulhiès <poulhies@adacore.com>
for libstdc++-v3/ChangeLog
* testsuite/lib/dg-options.exp (add_options_for_libatomic):
None for *-*-vxworks*.
Alexandre Oliva [Wed, 26 Jun 2024 05:08:18 +0000 (02:08 -0300)]
[testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
The test was too optimistic, alas. We used to vectorize shifts by
clamping the shift counts below the bit width of the types (e.g. at 15
for 16-bit vector elements), but (uint16_t)32768 >> (uint16_t)16 is
well defined (because of promotion to 32-bit int) and must yield 0,
not 1 (as before the fix).
Unfortunately, in the gimple model of vector units, such large shift
counts wouldn't be well-defined, so we won't vectorize such shifts any
more, unless we can tell they're in range or undefined.
So the test that expected the vectorization we no longer performed
needs to be adjusted. Instead of nobbling the test, Richard Earnshaw
suggested annotating the test with the expected ranges so as to enable
the optimization, and Christophe Lyon suggested a further
simplification.
Co-Authored-By: Richard Earnshaw <Richard.Earnshaw@arm.com>
for gcc/testsuite/ChangeLog
liuhongt [Thu, 20 Jun 2024 04:41:13 +0000 (12:41 +0800)]
Optimize a < 0 ? -1 : 0 to (signed)a >> 31.
Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
and x < 0 ? 1 : 0 into (unsigned) x >> 31.
Move the optimization did in ix86_expand_int_vcond to match.pd
gcc/ChangeLog:
PR target/114189
* match.pd: Simplify a < 0 ? -1 : 0 to (signed) >> 31 and a <
0 ? 1 : 0 to (unsigned) a >> 31 for vector integer type.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-pr115517.c: New test.
* gcc.target/i386/avx512-pr115517.c: New test.
* g++.target/i386/avx2-pr115517.C: New test.
* g++.target/i386/avx512-pr115517.C: New test.
* g++.dg/tree-ssa/pr88152-1.C: Adjust testcase.
This moves all of the uses of global_dc within diagnostic.cc (including
the definition) to a new diagnostic-global-context.cc. My intent is to
make clearer those parts of our internal API that implicitly use
global_dc, and to perhaps avoid linking global_dc into a future
libdiagnostics.so.
gcc/ChangeLog:
* diagnostic-path.cc (class path_label): Add m_path field,
and use it to replace all uses of global_dc.
(event_range::event_range): Add "ctxt" param and use it to
construct m_path_label.
(event_range::maybe_add_event): Add "ctxt" param and pass it to
gcc_rich_location::add_location_if_nearby.
(path_summary::path_summary): Add "ctxt" param and pass it to
event_range::maybe_add_event.
(diagnostic_context::print_path): Pass *this to path_summary ctor.
(selftest::test_empty_path): Use "dc" when constructing
path_summary rather than implicitly using global_dc.
(selftest::test_intraprocedural_path): Likewise.
(selftest::test_interprocedural_path_1): Likewise.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.
(selftest::test_control_flow_1): Likewise.
(selftest::test_control_flow_2): Likewise.
(selftest::test_control_flow_3): Likewise.
(selftest::assert_cfg_edge_path_streq): Likewise.
(selftest::test_control_flow_5): Likewise.
(selftest::test_control_flow_6): Likewise.
(selftest::diagnostic_path_cc_tests): Eliminate use of global_dc.
* diagnostic-show-locus.cc
(gcc_rich_location::add_location_if_nearby): Add "ctxt" param and
use it instead of implicitly using global_dc.
(selftest::test_add_location_if_nearby): Use
test_diagnostic_context rather than implicitly using global_dc.
* diagnostic.cc (pedantic_warning_kind): Delete macro.
(permissive_error_kind): Delete macro.
(permissive_error_option): Delete macro.
(diagnostic_context::diagnostic_enabled): Remove use of
permissive_error_option.
(diagnostic_context::report_diagnostic): Remove use of
pedantic_warning_kind.
(diagnostic_impl): Convert to...
(diagnostic_context::diagnostic_impl): ...this.
(diagnostic_n_impl): Convert to...
(diagnostic_context::diagnostic_n_impl): ...this.
(emit_diagnostic): Explicitly use global_dc for method call.
(emit_diagnostic_valist): Likewise.
(emit_diagnostic_valist_meta): Likewise.
(inform): Likewise.
(inform_n): Likewise.
(warning): Likewise.
(warning_at): Likewise.
(warning_meta): Likewise.
(warning_n): Likewise.
(pedwarn): Likewise.
(permerror): Likewise.
(permerror_opt): Likewise.
(error): Likewise.
(error_n): Likewise.
(error_at): Likewise.
(error_meta): Likewise.
(sorry): Likewise.
(sorry_at): Likewise.
(fatal_error): Likewise.
(internal_error): Likewise.
(internal_error_no_backtrace): Likewise.
* diagnostic.h (diagnostic_context::diagnostic_impl): New decl.
(diagnostic_context::diagnostic_n_impl): New decl.
* gcc-rich-location.h (gcc_rich_location::add_location_if_nearby):
Add "ctxt" param.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Wed, 26 Jun 2024 00:26:21 +0000 (20:26 -0400)]
testsuite: use check-jsonschema for validating .sarif files [PR109360]
As reported here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655434.html
the schema validation I added for generated .sarif files in r15-1541-ga84fe222029ff2 used the "jsonschema" command line tool, which
has been deprecated by more recent versions of the Python 3 "jsonschema"
module.
This patch updates the validation to use the more recent
"check-jsonschema" command line tool, from the Python 3 "check-jsonschema"
module, fixing the testsuite FAILs due to the deprecation message.
As an added bonus, the output on validation failures is *much* nicer, e.g.
if I undo r15-1540-g9f4fdc3acebcf6, the error messages begin like this:
verify-sarif-file: res: Schema validation errors were encountered.
diagnostic-format-sarif-file-bad-utf8-pr109098-1.c.sarif::$.runs[0].results[0].locations[0].physicalLocation.region.startColumn: 0 is less than the minimum of 1
diagnostic-format-sarif-file-bad-utf8-pr109098-1.c.sarif::$.runs[0].results[0].relatedLocations[0].physicalLocation.region.startColumn: 0 is less than the minimum of 1
diagnostic-format-sarif-file-bad-utf8-pr109098-1.c.sarif::$.runs[0].results[0].relatedLocations[1].physicalLocation.region.startColumn: 0 is less than the minimum of 1
diagnostic-format-sarif-file-bad-utf8-pr109098-1.c.sarif::$.runs[0].results[0].relatedLocations[2].physicalLocation.region.startColumn: 0 is less than the minimum of 1
child process exited abnormally
FAIL: c-c++-common/diagnostic-format-sarif-file-bad-utf8-pr109098-1.c -Wc++-compat (test .sarif output against SARIF schema)
Tested with Python 3.8 with check_jsonschema 0.28.6
gcc/ChangeLog:
PR testsuite/109360
* doc/install.texi (Python3 modules): Update SARIF validation
requirement to use check-jsonschema rather than jsonschema.
gcc/testsuite/ChangeLog:
PR testsuite/109360
* lib/scansarif.exp (verify-sarif-file): Use check-jsonschema
rather than jsonschema, updating the invocation accordingly.
* lib/target-supports.exp (check_effective_target_jsonschema): Convert
to...
(check_effective_target_check_jsonschema): ...this.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Dispatching to partial specializations doesn't really seem to offer much
benefit here. The __is_trivial(T) condition is a compile-time constant
so the untaken branches are dead code and don't cost us anything.
libstdc++-v3/ChangeLog:
* include/bits/valarray_array.h (_Array_default_ctor): Remove.
(__valarray_default_construct): Inline it into here.
(_Array_init_ctor): Remove.
(__valarray_fill_construct): Inline it into here.
(_Array_copy_ctor): Remove.
(__valarray_copy_construct(const T*, const T*, T*)): Inline it
into here.
(__valarray_copy_construct(const T*, size_t, size_t, T*)):
Use _GLIBCXX17_CONSTEXPR for constant condition.
Gaius Mulley [Tue, 25 Jun 2024 22:11:29 +0000 (23:11 +0100)]
modula2: tidyup remove unused procedures and unused parameters
This patch removes M2GenGCC.mod:QuadCondition and
M2Quads.mod:GenQuadOTypeUniquetok. It also removes unused parameter
WalkAction for all FoldIf procedures.
Marek Polacek [Mon, 17 Jun 2024 21:53:12 +0000 (17:53 -0400)]
c++: ICE with __has_unique_object_representations [PR115476]
Here we started to ICE with r13-25: in check_trait_type, for "X[]" we
return true here:
if (kind == 1 && TREE_CODE (type) == ARRAY_TYPE && !TYPE_DOMAIN (type))
return true; // Array of unknown bound. Don't care about completeness.
and then end up crashing in record_has_unique_obj_representations:
4836 if (cur != wi::to_offset (sz))
because sz is null.
https://eel.is/c++draft/type.traits#tab:meta.unary.prop-row-47-column-3-sentence-1
says that the preconditions for __has_unique_object_representations are:
"T shall be a complete type, cv void, or an array of unknown bound" and
that "For an array type T, the same result as
has_unique_object_representations_v<remove_all_extents_t<T>>" so T[]
should be treated as T. So we should use kind==2 for the trait.
PR c++/115476
gcc/cp/ChangeLog:
* semantics.cc (finish_trait_expr)
<case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS>: Move below to call
check_trait_type with kind==2.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/has-unique-obj-representations4.C: New test.
Sergei Lewis [Tue, 25 Jun 2024 21:26:14 +0000 (15:26 -0600)]
[PATCH v2 3/3] RISC-V: cmpmem for RISCV with V extension
So this is the cmpmem patch from Sergei, updated for the trunk.
Updates included adjusting the existing cmpmemsi expander to
conditionally try expansion via vector. And a minor testsuite
adjustment to turn off vector expansion in one test that is primarily
focused on vset optimization and ensuring we don't have extras.
I've spun this in my tester successfully and just want to see a clean
run through precommit CI before moving forward.
Jeff
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_vector::expand_vec_cmpmem): New
function declaration.
* config/riscv/riscv-string.cc (riscv_vector::expand_vec_cmpmem): New
function.
* config/riscv/riscv.md (cmpmemsi): Try riscv_vector::expand_vec_cmpmem
for constant lengths.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/cmpmem-1.c: New codegen tests
* gcc.target/riscv/rvv/base/cmpmem-2.c: New execution tests
* gcc.target/riscv/rvv/base/cmpmem-3.c: New codegen tests
* gcc.target/riscv/rvv/base/cmpmem-4.c: New codegen tests
* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Turn off vector mem* and
str* handling.
Marek Polacek [Fri, 14 Jun 2024 21:50:29 +0000 (17:50 -0400)]
c++: ICE with generic lambda and pack expansion [PR115425]
In r13-272 we hardened the *_PACK_EXPANSION and *_ARGUMENT_PACK macros.
That trips up here because make_pack_expansion returns error_mark_node
and we access that with PACK_EXPANSION_LOCAL_P.
PR c++/115425
gcc/cp/ChangeLog:
* pt.cc (tsubst_pack_expansion): Return error_mark_node if
make_pack_expansion doesn't work out.
Marek Polacek [Tue, 18 Jun 2024 14:50:49 +0000 (10:50 -0400)]
c++: ICE with __dynamic_cast redecl [PR115501]
Since r13-3299, build_dynamic_cast_1 calls pushdecl which calls
duplicate_decls and that in this testcase emits the "conflicting
declaration" error and returns error_mark_node, so the subsequent
build_cxx_call crashes on the error_mark_node.
PR c++/115501
gcc/cp/ChangeLog:
* rtti.cc (build_dynamic_cast_1): Return if dcast_fn is erroneous.
ira: Scale save/restore costs of callee save registers with block frequency
In assign_hard_reg(), when computing the costs of the hard registers, the
cost of saving/restoring a callee-save hard register in prolog/epilog is
taken into consideration. However, this cost is not scaled with the entry
block frequency. Without scaling, the cost of saving/restoring is quite
small and this can result in a callee-save register being chosen by
assign_hard_reg() even though there are free caller-save registers
available. Assigning a callee save register to a pseudo that is live
in the entire function and across a call will cause shrink wrap to fail.
Gaius Mulley [Tue, 25 Jun 2024 17:35:22 +0000 (18:35 +0100)]
PR modula2/115536 Expression is evaluated incorrectly when encountering relops and indirection
This fix ensures that we only call BuildRelOpFromBoolean if we are
inside a constant expression (where no indirection can be used).
The fix creates a temporary variable when a boolean is created from
a relop in other cases.
The previous pattern implementation would not work if the operands required
dereferencing during non const expressions. Comparison of relop results
in a constant expression are resolved by constant propagation, basic
block analysis and dead code removal. After the quadruples have been
optimized only one assignment to the boolean variable will remain for
const expressions. All quadruple pattern checking for boolean
expressions is removed by the patch. Thus the implementation becomes
more generic.
gcc/m2/ChangeLog:
PR modula2/115536
* gm2-compiler/M2BasicBlock.def (GetBasicBlockScope): New procedure.
(GetBasicBlockStart): Ditto.
(GetBasicBlockEnd): Ditto.
(IsBasicBlockFirst): New procedure function.
* gm2-compiler/M2BasicBlock.mod (ConvertQuads2BasicBlock): Allow
conditional boolean quads to be removed.
(GetBasicBlockScope): Implement new procedure.
(GetBasicBlockStart): Ditto.
(GetBasicBlockEnd): Ditto.
(IsBasicBlockFirst): Implement new procedure function.
* gm2-compiler/M2GCCDeclare.def (FoldConstants): New parameter
declaration.
* gm2-compiler/M2GCCDeclare.mod (FoldConstants): New parameter
declaration.
(DeclareTypesConstantsProceduresInRange): Recreate basic blocks
after resolving constant expressions.
(CodeBecomes): Guard IsVariableSSA with IsVar.
* gm2-compiler/M2GenGCC.def (ResolveConstantExpressions): New
parameter declaration.
* gm2-compiler/M2GenGCC.mod (FoldIfLess): Remove relop pattern
detection.
(FoldIfGre): Ditto.
(FoldIfLessEqu): Ditto.
(FoldIfGreEqu): Ditto.
(FoldIfIn): Ditto.
(FoldIfNotIn): Ditto.
(FoldIfEqu): Ditto.
(FoldIfNotEqu): Ditto.
(FoldBecomes): Add BasicBlock parameter and allow conditional
boolean becomes to be folded in the first basic block.
(ResolveConstantExpressions): Reimplement.
* gm2-compiler/M2Quads.def (IsConstQuad): New procedure function.
(IsConditionalBooleanQuad): Ditto.
* gm2-compiler/M2Quads.mod (IsConstQuad): Implement new procedure function.
(IsConditionalBooleanQuad): Ditto.
(MoveWithMode): Use GenQuadOTypetok.
(IsInitialisingConst): Rewrite using OpUsesOp1.
(OpUsesOp1): New procedure function.
(doBuildAssignment): Mark des as a VarConditional.
(ConvertBooleanToVariable): Call PutVarConditional.
(DumpQuadSummary): New procedure.
(BuildRelOpFromBoolean): Updated debugging and improved comments.
(BuildRelOp): Only call BuildRelOpFromBoolean if we are in a const
expression and both operands are boolean relops.
(GenQuadOTypeUniquetok): New procedure.
(BackPatch): Correct comment.
* gm2-compiler/SymbolTable.def (PutVarConditional): New procedure.
(IsVarConditional): New procedure function.
* gm2-compiler/SymbolTable.mod (PutVarConditional): Implement new
procedure.
(IsVarConditional): Implement new procedure function.
(SymConstVar): New field IsConditional.
(SymVar): New field IsConditional.
(MakeVar): Initialize IsConditional field.
(MakeConstVar): Initialize IsConditional field.
* gm2-compiler/M2Swig.mod (DoBasicBlock): Change parameters to
use BasicBlock.
* gm2-compiler/M2Code.mod (SecondDeclareAndOptimize): Use iterator
to FoldConstants over basic block list.
* gm2-compiler/M2SymInit.mod (AppendEntry): Replace parameters
with BasicBlock.
* gm2-compiler/P3Build.bnf (Relation): Call RecordOp for #, <> and =.
gcc/testsuite/ChangeLog:
PR modula2/115536
* gm2/iso/const/pass/constbool4.mod: New test.
* gm2/iso/const/pass/constbool5.mod: New test.
* gm2/iso/run/pass/condtest2.mod: New test.
* gm2/iso/run/pass/condtest3.mod: New test.
* gm2/iso/run/pass/condtest4.mod: New test.
* gm2/iso/run/pass/condtest5.mod: New test.
* gm2/iso/run/pass/constbool4.mod: New test.
Jeff Law [Tue, 25 Jun 2024 17:22:01 +0000 (11:22 -0600)]
[committed] Fix fr30-elf newlib build failure with late-combine
So the late combine work has exposed a latent bug in the fr30 port.
The fr30 "call" instruction is pc-relative with a *very* limited range, 12 bits
to be precise.
With such a limited range its hard to see how we could ever consistently use it
in the compiler, with the possible exception of self-recursion. Even for a
call to a locally binding function -ffunction-sections and linker placement of
functions may separate the caller/callee. Code generation seemed to be using
indirect forms pretty consistently, though the RTL would allow direct calls.
With late-combine some of those indirects would be optimized into direct calls.
This naturally led to out of range scenarios.
With the fr30 port slated for removal unless it gets updated to use LRA and the
fundamental problems using direct calls, I took the shortest path to keep
things working -- namely forcing all calls to be indirect.
Tested in my tester with no regressions (and fixes the newlib build failure
with late-combine enabled). Pushed to the trunk.
gcc/
* config/fr30/constraints.md (Q): Remove unused constraint.
* config/fr30/predicates.md (call_operand): Remove unused predicate.
* config/fr30/fr30.md (call, vall_value): Turn into expanders and
force the call address into a register.
(*call, *call_value): Adjust to only allow indirect calls. Adjust
output template accordingly.
Patrick Palka [Tue, 25 Jun 2024 16:59:24 +0000 (12:59 -0400)]
c++: alias CTAD and copy deduction guide [PR115198]
Here we're neglecting to update DECL_NAME during the alias CTAD guide
transformation, which causes copy_guide_p to return false for the
transformed copy deduction guide since DECL_NAME is still __dguide_C
with TREE_TYPE C<B, T> but it should be __dguide_A with TREE_TYPE A<T>
(i.e. C<false, T>). This ultimately results in ambiguity during
overload resolution between the copy deduction guide vs copy ctor guide.
This patch makes us update DECL_NAME of a transformed guide accordingly
during alias/inherited CTAD.
PR c++/115198
gcc/cp/ChangeLog:
* pt.cc (alias_ctad_tweaks): Update DECL_NAME of the transformed
guides.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/class-deduction-alias22.C: New test.
Patrick Palka [Tue, 25 Jun 2024 14:42:21 +0000 (10:42 -0400)]
c++: using non-dep array var of unknown bound [PR115358]
For a non-dependent array variable of unknown bound, it seems we need to
try instantiating its definition upon use in a template context for sake
of proper checking and typing of the overall expression, like we do for
function specializations with deduced return type.
PR c++/115358
gcc/cp/ChangeLog:
* decl2.cc (mark_used): Call maybe_instantiate_decl for an array
variable with unknown bound.
* semantics.cc (finish_decltype_type): Remove now redundant
handling of array variables with unknown bound.
* typeck.cc (cxx_sizeof_expr): Likewise.
Sandra Loosemore [Tue, 25 Jun 2024 13:54:43 +0000 (13:54 +0000)]
Fix PR c/115587, uninitialized variable in c_parser_omp_loop_nest
This function had a reference to an uninitialized variable on the
error path. The problem was diagnosed by clang but not gcc. It seems
the cleanest solution is to initialize all the loop-clause variables
at the point of declaration rather than at different places in the
code.
The C++ front end didn't have this problem, but I've made similar
changes there to keep the code in sync.
gcc/c/ChangeLog:
PR c/115587
* c-parser.cc (c_parser_omp_loop_nest): Move initializations to
point of declaration.
gcc/cp/ChangeLog:
PR c/115587
* parser.cc (cp_parser_omp_loop_nest): Move initializations to
point of declaration.
libatomic: Add rcpc3 128-bit atomic operations for AArch64
The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing the Load-Acquire RCpc Pair Ordered, and
Store-Release Pair Ordered operations in the form of LDIAPP and STILP.
These operations are single-copy atomic on cores which also implement
LSE2 and, as such, support for these operations is added to Libatomic
and employed accordingly when the LSE2 and RCPC3 features are detected
in a given core at runtime.
libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (libat_load_16): Add LRCPC3
variant.
(libat_store_16): Likewise.
* config/linux/aarch64/host-config.h (HWCAP2_LRCPC3): New.
(LSE2_LRCPC3_ATOP): Previously LSE2_ATOP. New ifuncs guarded
under it.
(has_rcpc3): New.
for details. This change was different from the others in that the
original call was to simplify_subreg rather than simplify_lowpart_subreg.
The old code would therefore go on to do the force_reg for more cases
than the new code would.
gcc/
* expmed.cc (store_bit_field_using_insv): Revert earlier change
to use force_subreg instead of simplify_gen_subreg.
YunQiang Su [Wed, 19 Jun 2024 17:20:36 +0000 (01:20 +0800)]
MIPS: Implement vcond_mask optabs for MSA
Currently, we have `mips_expand_vec_cond_expr`, which calculate
cmp_res first. We can just add a new extra argument to ask it
to use operands[3] as cmp_res instead of calculating from operands[4]
and operands[5].
gcc
* config/mips/mips.cc(mips_expand_vec_cond_expr): Add extra
argument to info that opernads[3] is cmp_res already.
* config/mips/mips-protos.h(mips_expand_vec_cond_expr): Ditto.
* config/mips/mips-msa.md(vcond_mask): Define new expand.
(vcondu): Use mips_expand_vec_cond_expr with 4th argument.
(vcond): Ditto.
Evgeny Karpov [Sat, 8 Jun 2024 13:49:17 +0000 (13:49 +0000)]
aarch64: Add DLL import/export to AArch64 target
This patch reuses the MinGW implementation to enable DLL import/export
functionality for the aarch64-w64-mingw32 target. It also modifies
environment configurations for MinGW.
Evgeny Karpov [Mon, 24 Jun 2024 12:44:58 +0000 (12:44 +0000)]
aarch64: Add selectany attribute handling
This patch extends the aarch64 attributes list with the selectany
attribute for the aarch64-w64-mingw32 target and reuses the mingw
implementation to handle it.
Evgeny Karpov [Mon, 24 Jun 2024 12:43:05 +0000 (12:43 +0000)]
Rename functions for reuse in AArch64
This patch renames functions related to dllimport/dllexport and
selectany functionality. These functions will be reused in the
aarch64-w64-mingw32 target.
Evgeny Karpov [Mon, 24 Jun 2024 12:38:40 +0000 (12:38 +0000)]
Extract ix86 dllimport implementation to mingw
This patch extracts the ix86 implementation for expanding a SYMBOL
into its corresponding dllimport, far-address, or refptr symbol. It
will be reused in the aarch64-w64-mingw32 target. The implementation
is copied as is from i386/i386.cc with minor changes to follow to the
code style.
Also this patch replaces the original DLL import/export implementation
in ix86 with mingw.
Evgeny Karpov [Sat, 8 Jun 2024 13:18:02 +0000 (13:18 +0000)]
Move mingw_* declarations to the mingw folder
This patch refactors recent changes to move mingw-related
functionality to the mingw folder. More renamings to the mingw_ prefix
will be done in follow-up commits.
This is the first commit in the second patch series to add DLL
import/export implementation to AArch64.
Coauthors: Zac Walker <zacwalker@microsoft.com>,
Mark Harmstone <mark@harmstone.com> and
Ron Riddle <ron.riddle@microsoft.com>
Refactored, prepared, and validated by
Radek Barton <radek.barton@microsoft.com> and
Evgeny Karpov <evgeny.karpov@microsoft.com>
Jakub Jelinek [Tue, 25 Jun 2024 06:35:56 +0000 (08:35 +0200)]
c: Fix ICE related to incomplete structures in C23 [PR114930]
Here is a version of the c_update_type_canonical fixes which passed
bootstrap/regtest.
The non-trivial part is the handling of the case when
build_qualified_type (TYPE_CANONICAL (t), TYPE_QUALS (x))
returns a type with NULL TYPE_CANONICAL. That should happen only
if TYPE_CANONICAL (t) == t, because otherwise c_update_type_canonical should
have been already called on the other type. c, the returned type, is usually x
and in that case it should have TYPE_CANONICAL set to itself, or worst
for whatever reason x is not the right canonical type (say it has attributes
or whatever disqualifies it from check_qualified_type). In that case
either it finds some pre-existing type from the variant chain of t which
is later in the chain and we haven't processed it yet (but then
get_qualified_type moves it right after t in:
/* Put the found variant at the head of the variant list so
frequently searched variants get found faster. The C++ FE
benefits greatly from this. */
tree t = *tp;
*tp = TYPE_NEXT_VARIANT (t);
TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (mv);
TYPE_NEXT_VARIANT (mv) = t;
return t;
optimization), or creates a fresh new type using build_variant_type_copy,
which again places the new type right after t:
/* Add the new type to the chain of variants of TYPE. */
TYPE_NEXT_VARIANT (t) = TYPE_NEXT_VARIANT (m);
TYPE_NEXT_VARIANT (m) = t;
TYPE_MAIN_VARIANT (t) = m;
At this point we want to make c its own canonical type (i.e. TYPE_CANONICAL
(c) = c;), but also need to process pointers to it and only then return back
to processing x. Processing the whole chain from c again could be costly,
we could have hundreds of types in the chain already processed, and while
the loop would just quickly skip them
for (tree x = t, l = NULL_TREE; x; l = x, x = TYPE_NEXT_VARIANT (x))
{
if (x != t && TYPE_STRUCTURAL_EQUALITY_P (x))
...
else if (x != t)
continue;
it feels costly. So, this patch instead moves c from right after t
to right before x in the chain (that shouldn't change anything, because
clearly build_qualified_type didn't find any matches in the chain before
x) and continues processing the c at that position, so should handle the
x that encountered this in the next iteration.
We could avoid some of the moving in the chain if we processed the chain
twice, once deal only with x != t && TYPE_STRUCTURAL_EQUALITY_P (x)
&& TYPE_CANONICAL (t) == t && check_qualified_type (t, x, TYPE_QUALS (x))
types (in that case set TYPE_CANONICAL (x) = x) and once the rest. There
is still the theoretical case where build_qualified_type would return
a new type and in that case we are back to the moving the type around and
needing to handle it though.
2024-06-25 Jakub Jelinek <jakub@redhat.com>
Martin Uecker <uecker@tugraz.at>
PR c/114930
PR c/115502
gcc/c/
* c-decl.cc (c_update_type_canonical): Assert t is main variant
with 0 TYPE_QUALS. Simplify and don't use check_qualified_type.
Deal with the case where build_qualified_type returns
TYPE_STRUCTURAL_EQUALITY_P type.
gcc/testsuite/
* gcc.dg/pr114574-1.c: Require lto effective target.
* gcc.dg/pr114574-2.c: Likewise.
* gcc.dg/pr114930.c: New test.
* gcc.dg/pr115502.c: New test.
Jeff Law [Tue, 25 Jun 2024 05:22:21 +0000 (23:22 -0600)]
[committed][RISC-V] Fix some of the testsuite fallout from late-combine patch
This fixes most, but not all of the testsuite fallout from the late-combine
patch. Specifically in the vector space we're often able to eliminate a
broadcast of an scalar element across a vector. That eliminates the vsetvl
related to the broadcast, but more importantly from the testsuite standpoint it
turns .vv forms into .vf or .vx forms.
There were two paths we could have taken here. One to accept .v*, ignoring the
actual register operands. Or to create new matches for the .vx and .vf
variants. I selected the latter as I'd like us to know if the code to avoid
the broadcast regresses.
I'm pushing this through now so that we've got cleaner results and to prevent
duplicate work. I've got patch for the rest of the testsuite fallout, but I
want to think about them a bit.
Kewen Lin [Tue, 25 Jun 2024 05:04:53 +0000 (00:04 -0500)]
Replace {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE with new hook mode_for_floating_type
Currently how we determine which mode will be used for a
floating point type is that for a given type precision
(size) call mode_for_size to get the first mode which has
this size in the specified class. On Powerpc, we have
three modes (TF/KF/IF) having the same mode precision 128
(see[1]), so the processing forces us to have to place TF
at the first place, it would require us to make more
adjustment in some generic code to avoid some unexpected
mode conversions and it would be even worse if we get rid
of TF eventually one day. And as Joseph pointed out in [2],
"floating types should have their mode, not a poorly
defined precision value", as Joseph and Richi suggested,
this patch is to introduce one hook mode_for_floating_type
which returns the corresponding mode for type float, double
or long double. The default implementation returns SFmode
for float and DFmode for double or long double. For ports
which need special treatment, there are some other patches
for their own port specific implementation (referring to
how {,LONG_}DOUBLE_TYPE_SIZE get used there). For all
generic uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE, depending
on the context, some of them are replaced with TYPE_PRECISION
of the according type node, some other are replaced with
GET_MODE_PRECISION on the mode from mode_for_floating_type.
This patch also poisons {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE,
so most defines of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE in port
specific are removed, but there are still some which are
good to be kept for readability then they get renamed with
port specific prefix.
Kewen Lin [Tue, 25 Jun 2024 05:04:51 +0000 (00:04 -0500)]
vms: Replace use of LONG_DOUBLE_TYPE_SIZE
Joseph pointed out "floating types should have their mode,
not a poorly defined precision value" in the discussion[1],
as he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type. To be prepared for that, this
patch is to replace use of LONG_DOUBLE_TYPE_SIZE in vms port
with TYPE_PRECISION of long_double_type_node.
Kewen Lin [Tue, 25 Jun 2024 05:04:49 +0000 (00:04 -0500)]
rust: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
Joseph pointed out "floating types should have their mode,
not a poorly defined precision value" in the discussion[1],
as he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type. To be prepared for that, this
patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
in rust with TYPE_PRECISION of {float,{,long_}double}_type_node.
Kewen Lin [Tue, 25 Jun 2024 05:04:47 +0000 (00:04 -0500)]
go: Replace uses of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
Joseph pointed out "floating types should have their mode,
not a poorly defined precision value" in the discussion[1],
as he and Richi suggested, the existing macros
{FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE will be replaced with a
hook mode_for_floating_type. To be prepared for that, this
patch is to replace use of {FLOAT,{,LONG_}DOUBLE}_TYPE_SIZE
in go with TYPE_PRECISION of {float,{,long_}double}_type_node.
* go-gcc.cc (Gcc_backend::float_type): Use TYPE_PRECISION of
{float,double,long_double}_type_node to replace
{FLOAT,DOUBLE,LONG_DOUBLE}_TYPE_SIZE.
(Gcc_backend::complex_type): Likewise.
Sergei Lewis [Mon, 24 Jun 2024 20:20:14 +0000 (14:20 -0600)]
[PATCH v2 2/3] RISC-V: setmem for RISCV with V extension
This is primarily Sergei's work, my contributions were limited to
merging his expander with the one that's on the trunk, allowing
non-constant value and trivial testsuite adjustments due to option renaming.
I'm doing setmem first because it's the easiest. The others will follow
soon enough.
I've tested this in my system, waiting on pre-commit CI to render its
verdict before moving forward.
gcc/ChangeLog
* config/riscv/riscv-protos.h (riscv_vector::expand_vec_setmem): New
function declaration.
* config/riscv/riscv-string.cc (riscv_vector::expand_vec_setmem): New
function: this generates an inline vectorised memory set, if and only if
we know the entire operation can be performed in a single vector store.
* config/riscv/riscv.md (setmem<mode>): Try riscv_vector::expand_vec_setmem
for constant lengths. Do not require operand 2 to be a constant.
gcc/testsuite/ChangeLog
* gcc.target/riscv/rvv/base/setmem-1.c: New tests
* gcc.target/riscv/rvv/base/setmem-2.c: New tests
* gcc.target/riscv/rvv/base/setmem-3.c: New tests
Patrick O'Neill [Mon, 24 Jun 2024 19:06:15 +0000 (12:06 -0700)]
RISC-V: Add dg-remove-option for z* extensions
This introduces testsuite support infra for removing extensions.
Since z* extensions don't have ordering requirements the logic for
adding/removing those extensions has also been consolidated.
This fixes RVWMO compile testcases failing on Ztso targets by removing
the extension from the -march string.
gcc/ChangeLog:
* doc/sourcebuild.texi (dg-remove-option): Add documentation.
(dg-add-option): Add documentation for riscv_{a,zaamo,zalrsc,ztso}
Harald Anlauf [Sun, 23 Jun 2024 20:36:43 +0000 (22:36 +0200)]
Fortran: fix passing of optional dummy as actual to optional argument [PR55978]
gcc/fortran/ChangeLog:
PR fortran/55978
* trans-array.cc (gfc_conv_array_parameter): Do not dereference
data component of a missing allocatable dummy array argument for
passing as actual to optional dummy. Harden logic of presence
check for optional pointer dummy by using TRUTH_ANDIF_EXPR instead
of TRUTH_AND_EXPR.
gcc/testsuite/ChangeLog:
PR fortran/55978
* gfortran.dg/optional_absent_12.f90: New test.
Roger Sayle [Mon, 24 Jun 2024 14:34:03 +0000 (15:34 +0100)]
PR tree-optimization/113673: Avoid load merging when potentially trapping.
This patch fixes PR tree-optimization/113673, a P2 ice-on-valid regression
caused by load merging of (ptr[0]<<8)+ptr[1] when -ftrapv has been
specified. When the operator is | or ^ this is safe, but for addition
of signed integer types, a trap may be generated/required, so merging this
idiom into a single non-trapping instruction is inappropriate, confusing
the compiler by transforming a basic block with an exception edge into one
without.
This revision implements Richard Biener's feedback to add an early check
for stmt_can_throw_internal (cfun, stmt) to prevent transforming in the
presence of any statement that could trap, not just overflow on addition.
The one other tweak included in this patch is to mark the local function
find_bswap_or_nop_load as static ensuring that it isn't called from outside
this file, and guaranteeing that it is dominated by stmt_can_throw_internal
checking.
2024-06-24 Roger Sayle <roger@nextmovesoftware.com>
Richard Biener <rguenther@suse.de>
gcc/ChangeLog
PR tree-optimization/113673
* gimple-ssa-store-merging.cc (find_bswap_or_nop_load): Make static.
(find_bswap_or_nop_1): Avoid transformations (load merging) when
stmt_can_throw_internal indicates that a statement can trap.
gcc/testsuite/ChangeLog
PR tree-optimization/113673
* g++.dg/pr113673.C: New test case.
Richard Biener [Mon, 24 Jun 2024 07:52:39 +0000 (09:52 +0200)]
tree-optimization/115602 - SLP CSE results in cycles
The following prevents SLP CSE to create new cycles which happened
because of a 1:1 permute node being present where its child was then
CSEd to the permute node. Fixed by making a node only available to
CSE to after recursing.
PR tree-optimization/115602
* tree-vect-slp.cc (vect_cse_slp_nodes): Delay populating the
bst-map to avoid cycles.
Richard Biener [Fri, 21 Jun 2024 11:19:26 +0000 (13:19 +0200)]
tree-optimization/115528 - fix vect alignment analysis for outer loop vect
For outer loop vectorization of a data reference in the inner loop
we have to look at both steps to see if they preserve alignment.
What is special for this testcase is that the outer loop step is
one element but the inner loop step four and that we now use SLP
and the vectorization factor is one.
PR tree-optimization/115528
* tree-vect-data-refs.cc (vect_compute_data_ref_alignment):
Make sure to look at both the inner and outer loop step
behavior.
This patch adds a combine pass that runs late in the pipeline.
There are two instances: one between combine and split1, and one
after postreload.
The pass currently has a single objective: remove definitions by
substituting into all uses. The pre-RA version tries to restrict
itself to cases that are likely to have a neutral or beneficial
effect on register pressure.
The patch fixes PR106594. It also fixes a few FAILs and XFAILs
in the aarch64 test results, mostly due to making proper use of
MOVPRFX in cases where we didn't previously.
This is just a first step. I'm hoping that the pass could be
used for other combine-related optimisations in future. In particular,
the post-RA version doesn't need to restrict itself to cases where all
uses are substitutable, since it doesn't have to worry about register
pressure. If we did that, and if we extended it to handle multi-register
REGs, the pass might be a viable replacement for regcprop, which in
turn might reduce the cost of having a post-RA instance of the new pass.
On most targets, the pass is enabled by default at -O2 and above.
However, it has a tendency to undo x86's STV and RPAD passes,
by folding the more complex post-STV/RPAD form back into the
simpler pre-pass form.
Also, running a pass after register allocation means that we can
now match define_insn_and_splits that were previously only matched
before register allocation. This trips things like:
(define_insn_and_split "..."
[...pattern...]
"...cond..."
"#"
"&& 1"
[...pattern...]
{
...unconditional use of gen_reg_rtx ()...;
}
because matching and splitting after RA will call gen_reg_rtx when
pseudos are no longer allowed. rs6000 has several instances of this.
xtensa has a variation in which the split condition is:
"&& can_create_pseudo_p ()"
The failure then is that, if we match after RA, we'll never be
able to split the instruction.
The patch therefore disables the pass by default on i386, rs6000
and xtensa. Hopefully we can fix those ports later (if their
maintainers want). It seems better to add the pass first, though,
to make it easier to test any such fixes.
gcc.target/aarch64/bitfield-bitint-abi-align{16,8}.c would need
quite a few updates for the late-combine output. That might be
worth doing, but it seems too complex to do as part of this patch.
I tried compiling at least one target per CPU directory and comparing
the assembly output for parts of the GCC testsuite. This is just a way
of getting a flavour of how the pass performs; it obviously isn't a
meaningful benchmark. All targets seemed to improve on average:
rtl-ssa has routines for scanning forwards or backwards for something
under the control of an exclusion set. These searches are currently
used for two main things:
- to work out where an instruction can be moved within its EBB
- to work out whether recog can add a new hard register clobber
The exclusion set was originally a callback function that returned
true for insns that should be ignored. However, for the late-combine
work, I'd also like to be able to skip an entire definition, along
with all its uses.
This patch prepares for that by turning the exclusion set into an
object that provides predicate member functions. Currently the
only two member functions are:
- should_ignore_insn: what the old callback did
- should_ignore_def: the new functionality
but more could be added later.
Doing this also makes it easy to remove some asymmetry that I think
in hindsight was a mistake: in forward scans, ignoring an insn meant
ignoring all definitions in that insn (ok) and all uses of those
definitions (non-obvious). The new interface makes it possible
to select the required behaviour, with that behaviour being applied
consistently in both directions.
Now that the exclusion set is a dedicated object, rather than
just a "random" function, I think it makes sense to remove the
_ignoring suffix from the function names. The suffix was originally
there to describe the callback, and in particular to emphasise that
a true return meant "ignore" rather than "heed".
gcc/
* rtl-ssa.h: Include predicates.h.
* rtl-ssa/predicates.h: New file.
* rtl-ssa/access-utils.h (prev_call_clobbers_ignoring): Rename to...
(prev_call_clobbers): ...this and treat the ignore parameter as an
object with the same interface as ignore_nothing.
(next_call_clobbers_ignoring): Rename to...
(next_call_clobbers): ...this and treat the ignore parameter as an
object with the same interface as ignore_nothing.
(first_nondebug_insn_use_ignoring): Rename to...
(first_nondebug_insn_use): ...this and treat the ignore parameter as
an object with the same interface as ignore_nothing.
(last_nondebug_insn_use_ignoring): Rename to...
(last_nondebug_insn_use): ...this and treat the ignore parameter as
an object with the same interface as ignore_nothing.
(last_access_ignoring): Rename to...
(last_access): ...this and treat the ignore parameter as an object
with the same interface as ignore_nothing. Conditionally skip
definitions.
(prev_access_ignoring): Rename to...
(prev_access): ...this and treat the ignore parameter as an object
with the same interface as ignore_nothing.
(first_def_ignoring): Replace with...
(first_access): ...this new function.
(next_access_ignoring): Rename to...
(next_access): ...this and treat the ignore parameter as an object
with the same interface as ignore_nothing. Conditionally skip
definitions.
* rtl-ssa/change-utils.h (insn_is_changing): Delete.
(restrict_movement_ignoring): Rename to...
(restrict_movement): ...this and treat the ignore parameter as an
object with the same interface as ignore_nothing.
(recog_ignoring): Rename to...
(recog): ...this and treat the ignore parameter as an object with
the same interface as ignore_nothing.
* rtl-ssa/changes.h (insn_is_changing_closure): Delete.
* rtl-ssa/functions.h (function_info::add_regno_clobber): Treat
the ignore parameter as an object with the same interface as
ignore_nothing.
* rtl-ssa/insn-utils.h (insn_is): Delete.
* rtl-ssa/insns.h (insn_is_closure): Delete.
* rtl-ssa/member-fns.inl
(insn_is_changing_closure::insn_is_changing_closure): Delete.
(insn_is_changing_closure::operator()): Likewise.
(function_info::add_regno_clobber): Treat the ignore parameter
as an object with the same interface as ignore_nothing.
(ignore_changing_insns::ignore_changing_insns): New function.
(ignore_changing_insns::should_ignore_insn): Likewise.
* rtl-ssa/movement.h (restrict_movement_for_dead_range): Treat
the ignore parameter as an object with the same interface as
ignore_nothing.
(restrict_movement_for_defs_ignoring): Rename to...
(restrict_movement_for_defs): ...this and treat the ignore parameter
as an object with the same interface as ignore_nothing.
(restrict_movement_for_uses_ignoring): Rename to...
(restrict_movement_for_uses): ...this and treat the ignore parameter
as an object with the same interface as ignore_nothing. Conditionally
skip definitions.
* doc/rtl.texi: Update for above name changes. Use
ignore_changing_insns instead of insn_is_changing.
* config/aarch64/aarch64-cc-fusion.cc (cc_fusion::parallelize_insns):
Likewise.
* pair-fusion.cc (no_ignore): Delete.
(latest_hazard_before, first_hazard_after): Update for above name
changes. Use ignore_nothing instead of no_ignore.
(pair_fusion_bb_info::fuse_pair): Update for above name changes.
Use ignore_changing_insns instead of insn_is_changing.
(pair_fusion::try_promote_writeback): Likewise.