Pan Li [Fri, 23 May 2025 05:22:35 +0000 (13:22 +0800)]
RISC-V: Combine vec_duplicate + vor.vv to vor.vx on GR2VR cost
This patch would like to combine the vec_duplicate + vor.vv to the
vor.vx, as shown in the example code below. The related pattern depends
on the cost of vec_duplicate from GR2VR: the late-combine pass will
perform the combination if the GR2VR cost is zero, and reject it if the
GR2VR cost is greater than zero.
Assume we have example code like below, where the GR2VR cost is 0.
#define DEF_VX_BINARY(T, OP) \
void \
test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
out[i] = in[i] OP x; \
}
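As a hedged sketch (not taken from the commit), for T = int32_t and
OP = | the loop body would be expected to change roughly as follows,
keeping the scalar x in a GPR instead of broadcasting it:
Before (GR2VR cost > 0):
  vmv.v.x v2, a2      # vec_duplicate of x
  vor.vv  v1, v1, v2
After (GR2VR cost == 0):
  vor.vx  v1, v1, a2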
The test suites below passed for this patch.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new
case for IOR op.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op or to no_shift_vx_ops.
Nathaniel Shead [Thu, 22 May 2025 12:16:22 +0000 (22:16 +1000)]
c++/modules: Fix merge of TLS init functions [PR120363]
The PR notes that we missed setting DECL_CONTEXT on the TLS init
function; we missed this initially because this function is not created
in header units, only in named modules.
I also noticed that 'DECL_CONTEXT (fn) = DECL_CONTEXT (var)' was
incorrect: for class members, this ends up having the modules merging
machinery treat the decl as a member function, which breaks when
attempting to dedup against an existing completed class type. Instead
we can just use the global_namespace as the context, because the name of
the function is already mangled appropriately so that we'll match the
correct duplicates.
PR c++/120363
gcc/cp/ChangeLog:
* decl2.cc (get_tls_init_fn): Set context as global_namespace.
(get_tls_wrapper_fn): Likewise.
gcc/testsuite/ChangeLog:
* g++.dg/modules/pr113292_a.H: Move to...
* g++.dg/modules/tls-1_a.H: ...here.
* g++.dg/modules/pr113292_b.C: Move to...
* g++.dg/modules/tls-1_b.C: ...here.
* g++.dg/modules/pr113292_c.C: Move to...
* g++.dg/modules/tls-1_c.C: ...here.
* g++.dg/modules/tls-2_a.C: New test.
* g++.dg/modules/tls-2_b.C: New test.
* g++.dg/modules/tls-2_c.C: New test.
* g++.dg/modules/tls-3.h: New test.
* g++.dg/modules/tls-3_a.H: New test.
* g++.dg/modules/tls-3_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
Nathaniel Shead [Fri, 23 May 2025 14:51:49 +0000 (00:51 +1000)]
c++/modules: Fix stream-in of member using-decls [PR120414]
When streaming in a reference to a data member, we have an oversight
where we did not consider USING_DECLs, despite otherwise handling them
here the same as fields. This patch corrects that mistake.
PR c++/120414
gcc/cp/ChangeLog:
* module.cc (trees_in::tree_node): Allow reading a USING_DECL
when streaming tt_data_member.
gcc/testsuite/ChangeLog:
* g++.dg/modules/using-31_a.C: New test.
* g++.dg/modules/using-31_b.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
This adds an automatic downloader for the latest test results from
the mailing list archive and supports diffing test_summary to it.
Useful if you don't want to run your own baseline.
Right now ggc has a single free list for multiple sizes. In some cases
the list can get mixed across orders, and then the allocator may spend a
lot of time walking the free list to find the right size.
This patch splits the free list into multiple free lists by order,
which allows O(1) access in most cases.
It also has a fallback list for sizes not covered by the per-order free
lists (that seems to be needed for now).
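A minimal sketch of the idea in C (invented names; the real ggc-page.cc
code differs):
/* One free list per page order, plus a shared fallback list.  */
#define NUM_FREE_LISTS 16

struct page_entry_stub
{
  struct page_entry_stub *next;
  unsigned order;
};

static struct page_entry_stub *free_lists[NUM_FREE_LISTS];
static struct page_entry_stub *fallback_list;

/* O(1) lookup for the common orders; anything else overflows to the
   fallback list, which is searched linearly as before.  */
static inline struct page_entry_stub **
find_free_list_stub (unsigned order)
{
  if (order < NUM_FREE_LISTS)
    return &free_lists[order];
  return &fallback_list;
}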
For the PR119387 test case it gives a significant speedup,
both with and without debug information.
Potential drawback might be some more fragmentation in the memory
map, so there is a risk that very large compilations may run into
the vma limit on Linux, or have slightly less coverage with
large pages.
For the PR119387 test case, which has extra memory overhead with -ggdb:
../obj-fast/gcc/cc1plus-allocpage -std=gnu++20 -O2 pr119387.cc -quiet
ran 1.04 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119387.cc -quiet
2.63 ± 0.01 times faster than ../obj-fast/gcc/cc1plus-allocpage -std=gnu++20 -O2 pr119387.cc -quiet -ggdb
2.78 ± 0.01 times faster than ../obj-fast/gcc/cc1plus -std=gnu++20 -O2 pr119387.cc -quiet -ggdb
It might also help for other test cases creating a lot of garbage.
gcc/ChangeLog:
PR middle-end/114563
PR c++/119387
* ggc-page.cc (struct free_list): New structure.
(struct page_entry): Point to free_list.
(find_free_list): New function.
(find_free_list_order): Ditto.
(alloc_page): Use specific free_list.
(release_pages): Ditto.
(do_release_pages): Ditto.
(init_ggc): Ditto.
(ggc_print_statistics): Print overflow stat.
(ggc_pch_read): Use specific free list.
broke bootstrap on Ubuntu 20.04 (which bundles makeinfo 6.7)
and elsewhere with a local installation of makeinfo 6.6:
gcc/doc/implement-c.texi:5: node `C Implementation' lacks menu item for
`Constant expressions implementation' despite being its Up target
gcc/doc/implement-c.texi:5: node `C Implementation' lacks menu item for
`Types implementation' despite being its Up target
Jonathan Wakely [Wed, 21 May 2025 15:57:59 +0000 (16:57 +0100)]
libstdc++: Fix concept checks for std::unique_copy [PR120384]
This looks to have been wrong since r0-125454-gea89b2482f97aa, which
introduced predefined_ops.h. Since that change, the binary predicate
passed to std::__unique_copy is _Iter_comp_iter, which takes arguments
of the iterator type, not the iterator's value type.
This removes the checks from the __unique_copy overloads and moves them
into the second overload of std::unique_copy, where we have the original
binary predicate, not the adapted one from predefined_ops.h.
The third __unique_copy overload currently checks that the predicate is
callable with the input range value type and the output range value
type. This change alters that, so that we only ever check that the
predicate can be called with two arguments of the same type. That is
intentional, because calling the predicate with different types is a bug
that will be fixed in a later commit (see PR libstdc++/120386).
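For illustration (hedged; this is not the committed testcase), a
predicate over the value type like the following must keep compiling,
since the concept check now runs before the predicate is wrapped by
predefined_ops.h:
#include <algorithm>
#include <vector>

int main()
{
  std::vector<int> in{1, 1, 2, 2, 3};
  std::vector<int> out(in.size());
  // The binary predicate takes the *value* type (int), not iterators.
  std::unique_copy(in.begin(), in.end(), out.begin(),
                   [](int a, int b) { return a == b; });
}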
libstdc++-v3/ChangeLog:
PR libstdc++/120384
* include/bits/stl_algo.h (__unique_copy): Remove all
_BinaryPredicateConcept concept checks.
(unique_copy): Check _BinaryPredicateConcept in overload that
takes a predicate.
* testsuite/25_algorithms/unique_copy/120384.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Robin Dapp [Thu, 8 May 2025 07:51:45 +0000 (09:51 +0200)]
RISC-V: Support CPUs in -march.
This patch allows an -march string like
-march=sifive-p670
in order to override a previous -march in a simple way.
Suppose we have a Makefile that specifies -march=rv64gc by default.
A user-specified -mcpu=sifive-p670 would be after the -march in the
options string and thus only set -mtune=sifive-p670 (as -mcpu does not
override a previously specified -march or -mtune).
So if we wanted to override we would need to specify the full, lengthy
-march=rv64gcv_... string instead of a simple -mcpu=...
Therefore this patch always first tries to interpret -march= as a CPU
string. If it is a supported CPU we use its march properties and let it
override previously specified options. Otherwise the behavior is as
before. This enables the "last-specified option wins" behavior GCC
normally employs.
Note that -march does not imply -mtune like on x86 or other targets.
So an -march=CPU won't override a previously specified -mtune=other-CPU.
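A hedged usage sketch (driver name invented for illustration):
# Makefile default first, user override last; last-specified wins:
riscv64-unknown-linux-gnu-gcc -march=rv64gc -march=sifive-p670 ...
# now equivalent to spelling out the p670's full -march=rv64gcv_...
# string, while any -mtune setting is left alone.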
gcc/ChangeLog:
* common/config/riscv/riscv-common.cc (riscv_subset_list::parse_base_ext):
Adjust error message.
(riscv_handle_option): Parse as CPU string first.
(riscv_expand_arch): Ditto.
* doc/invoke.texi: Document.
Robin Dapp [Wed, 7 May 2025 19:02:21 +0000 (21:02 +0200)]
RISC-V: Add autovec mode param.
This patch adds a --param=autovec-mode=<MODE_NAME>. When the param is
specified we make autovectorize_vector_modes return exactly this mode if
it is available. This helps when testing different vectorizer settings.
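A hedged usage sketch (the mode name is only an example):
riscv64-unknown-linux-gnu-gcc -march=rv64gcv -O3 \
    --param=autovec-mode=RVVM1SI test.c
# autovectorize_vector_modes now returns only RVVM1SI, if available.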
gcc/ChangeLog:
* config/riscv/riscv-v.cc (autovectorize_vector_modes): Return
user-specified mode if available.
* config/riscv/riscv.opt: New param.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/param-autovec-mode.c: New test.
Jason Merrill [Thu, 22 May 2025 14:10:04 +0000 (10:10 -0400)]
diagnostics: use -Wformat-diag more consistently
r10-1211 added various -Wformat-diag warnings about quoting in GCC
diagnostic strings, but didn't change these two quoting warnings to use that
flag as well.
gcc/c-family/ChangeLog:
* c-format.cc (flag_chars_t::validate): Control quoting warnings
with -Wformat-diag.
Tobias Burnus [Fri, 23 May 2025 09:30:48 +0000 (11:30 +0200)]
libgomp.c-c++-common/metadirective-1.c: Expect 'error:' for nvptx compile [PR118694]
OpenMP's 'target teams' is strictly coupled with 'teams'; if the latter
exists, the kernel is launched directly with multiple teams. Thus,
the host has to know whether the teams construct exists or not. For
#pragma omp target
#pragma omp metadirective when (device={arch("nvptx")}: teams loop)
it is simple when 'nvptx' offloading is not supported, otherwise it depends
on the default device at runtime as the user code asks for a single team for
host fallback and gcn offload and multiple for nvptx offload.
In any case, this commit ensures that no FAIL is printed, whatever a
future solution might look like. Instead of a dg-bogus combined with an
'xfail offload_target_nvptx', one can also argue that a dg-error for
'target offload_target_nvptx' would be more appropriate.
libgomp/ChangeLog:
PR middle-end/118694
* testsuite/libgomp.c-c++-common/metadirective-1.c: xfail when
compiling (also) for nvptx offloading as an error is then expected.
when the shift amount is equal to half the bitwidth of the <x>
register.
Bootstrapped and regtested on aarch64-linux-gnu.
Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com>
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
gcc/ChangeLog:
* expmed.cc (expand_rotate_as_vec_perm): Avoid a no-op move if the
target already provided the result in the expected register.
* config/aarch64/aarch64.cc (aarch64_vectorize_vec_perm_const):
Avoid forcing subregs into fresh registers unnecessarily.
* config/aarch64/aarch64-sve.md: Add define_split for rotate.
(*v_revvnx8hi): New pattern.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/shift_rev_1.c: New test.
* gcc.target/aarch64/sve/shift_rev_2.c: Likewise.
* gcc.target/aarch64/sve/shift_rev_3.c: Likewise.
Dhruv Chawla [Fri, 9 May 2025 08:47:45 +0000 (01:47 -0700)]
aarch64: Match unpredicated shift patterns for ADR, SRA and ADDHNB instructions
This patch modifies the shift expander to immediately lower constant
shifts without an unspec. It also modifies the ADR, SRA and ADDHNB patterns
to match the lowered forms of the shifts, as the predicate register is
not required for these instructions.
Bootstrapped and regtested on aarch64-linux-gnu.
Signed-off-by: Dhruv Chawla <dhruvc@nvidia.com>
Co-authored-by: Richard Sandiford <richard.sandiford@arm.com>
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (@aarch64_adr<mode>_shift):
Match lowered form of ashift.
(*aarch64_adr<mode>_shift): Likewise.
(*aarch64_adr_shift_sxtw): Likewise.
(*aarch64_adr_shift_uxtw): Likewise.
(<ASHIFT:optab><mode>3): Check amount instead of operands[2] in
aarch64_sve_<lr>shift_operand.
(v<optab><mode>3): Generate unpredicated shifts for constant
operands.
(@aarch64_pred_<optab><mode>): Convert to a define_expand.
(*aarch64_pred_<optab><mode>): Create define_insn_and_split pattern
from @aarch64_pred_<optab><mode>.
(*post_ra_v_ashl<mode>3): Rename to ...
(aarch64_vashl<mode>3_const): ... this and remove reload requirement.
(*post_ra_v_<optab><mode>3): Rename to ...
(aarch64_v<optab><mode>3_const): ... this and remove reload
requirement.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve_add_<sve_int_op><mode>): Match lowered form of
SHIFTRT.
(*aarch64_sve2_sra<mode>): Likewise.
(*bitmask_shift_plus<mode>): Match lowered form of lshiftrt.
Joseph Myers [Thu, 22 May 2025 21:39:37 +0000 (21:39 +0000)]
c: Document C23 implementation-defined behavior
Add references to C23 subclauses to the documentation of
implementation-defined behavior, and new entries for
implementation-defined behavior new in C23; change some references in
the text to e.g. "C99 and C11" to encompass C23 as well.
Gaius Mulley [Thu, 22 May 2025 21:03:22 +0000 (22:03 +0100)]
PR modula2/120389 ICE if assigning a constant char to an integer array
This patch fixes an ICE which occurs if a constant char is assigned
into an integer array. The fix is to introduce type checking in
M2GenGCC.mod:CodeXIndr.
gcc/m2/ChangeLog:
PR modula2/120389
* gm2-compiler/M2GenGCC.mod (CodeXIndr): Check to see that
the type of left is assignment compatible with the type of
right.
gcc/testsuite/ChangeLog:
PR modula2/120389
* gm2/iso/fail/badarray3.mod: New test.
Alexandre Oliva [Thu, 22 May 2025 18:15:31 +0000 (15:15 -0300)]
[vxworks] build partial libatomic
Since vxworks' libc contains much of libatomic, in not-very-granular
modules, building all of libatomic doesn't work very well.
However, some expected entry points are not present in libc, so
arrange for libatomic to build only those missing bits.
for libatomic/ChangeLog
* configure.tgt: Set partial_libatomic on *-*-vxworks*.
* configure.ac (PARTIAL_VXWORKS): New AM_CONDITIONAL.
* Makefile.am (libatomic_la_SOURCES): Select a few sources for
PARTIAL_VXWORKS.
* configure, Makefile.in: Rebuilt.
Alexandre Oliva [Thu, 22 May 2025 18:06:24 +0000 (15:06 -0300)]
[aarch64] [vxworks] mark x18 as fixed, adjust tests
VxWorks uses x18 as the TCB, so STATIC_CHAIN_REGNUM has long been set
(in gcc/config/aarch64/aarch64-vxworks.h) to use x9 instead.
This patch marks x18 as fixed if the newly-introduced
TARGET_OS_USES_R18 is defined, so that it is not chosen by the
register allocator, rejects -fsanitize-shadow-call-stack due to the
register conflict, and adjusts tests that depend on x18 or on the
static chain register.
for gcc/ChangeLog
* config/aarch64/aarch64-vxworks.h (TARGET_OS_USES_R18): Define.
Update comments.
* config/aarch64/aarch64.cc (aarch64_conditional_register_usage):
Mark x18 as fixed on VxWorks.
(aarch64_override_options_internal): Issue sorry message on
-fsanitize=shadow-call-stack if TARGET_OS_USES_R18.
for gcc/testsuite/ChangeLog
* gcc.dg/cwsc1.c (CHAIN, aarch64): x9 instead of x18 for __vxworks.
* gcc.target/aarch64/reg-alloc-4.c: Drop x18-assigned asm
operand on vxworks.
* gcc.target/aarch64/shadow_call_stack_1.c: Don't expect
-ffixed-x18 error on vxworks, but rather the sorry message.
* gcc.target/aarch64/shadow_call_stack_2.c: Skip on vxworks.
* gcc.target/aarch64/shadow_call_stack_3.c: Likewise.
* gcc.target/aarch64/shadow_call_stack_4.c: Likewise.
* gcc.target/aarch64/shadow_call_stack_5.c: Likewise.
* gcc.target/aarch64/shadow_call_stack_6.c: Likewise.
* gcc.target/aarch64/shadow_call_stack_7.c: Likewise.
* gcc.target/aarch64/shadow_call_stack_8.c: Likewise.
* gcc.target/aarch64/stack-check-prologue-19.c: Likewise.
* gcc.target/aarch64/stack-check-prologue-20.c: Likewise.
Shreya Munnangi [Thu, 22 May 2025 17:51:01 +0000 (11:51 -0600)]
[RISC-V] Clear both upper and lower bits using 3 shifts
So the next step in Shreya's work. In the prior patch we used two shifts to
clear bits at the high or low end of an object. In this patch we use 3 shifts
to clear bits on both ends.
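As a hedged illustration (hand-written, not from the commit): on rv64,
x & 0x0000ffffffff0000 (keep bits 16..47) can be done with three shifts
instead of building the mask constant:
  slli a0, a0, 16    # clear bits 48..63
  srli a0, a0, 32    # clear the original bits 0..15
  slli a0, a0, 16    # move the kept field back to bits 16..47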
Nothing really special here. With mvconst_internal still in the tree it's of
marginal value, though Shreya and I have confirmed the code coming out of
expand looks good. It's just that combine reconstitutes the operation via
mvconst_internal+and which looks cheaper.
When I was playing in this space earlier I definitely saw testsuite cases that
need this case handled to not regress with mvconst_internal removed.
This has spun in my tester on rv32 and rv64 and it's bootstrap + testing on my
BPI with a mere 23 hours to go. Waiting on pre-commit testing to render a
verdict before moving forward.
gcc/
* config/riscv/riscv.cc (synthesize_and): When profitable, use a
three-shift sequence to clear bits at both the upper and lower ends
rather than synthesizing the constant mask.
Siarhei Volkau [Thu, 22 May 2025 14:52:17 +0000 (08:52 -0600)]
[PATCH][RISC-V][PR target/70557] Improve storing 0 to memory on rv32
Patch is originally from Siarhei Volkau <lis8215@gmail.com>.
RISC-V has a zero register (x0) which we can use to store zero into memory
without loading the constant into a distinct register. Adjust the constraints
of the 32-bit movdi_32bit pattern to recognize that we can store 0.0 into
memory using x0 as the source register.
This patch only affects RISC-V. It has been regression tested on riscv64-elf.
Jeff has also tested this in his tester (riscv64-elf and riscv32-elf) with no
regressions.
PR target/70557
gcc/
* config/riscv/riscv.md (movdi_32bit): Add "J" constraint to allow storing 0
directly to memory.
Andrew Pinski [Tue, 20 May 2025 22:10:15 +0000 (15:10 -0700)]
aarch64: Improve rtx_cost for constants in COMPARE [PR120372]
The middle-end uses rtx_cost on constants with an outer code of COMPARE
to find out the cost of forming a constant for a comparison instruction.
So for the aarch64 backend, we would just return the cost of constant
formation in general. We can improve this by checking whether the outer
code is COMPARE and, if the constant fits the constraints of the cmp
instruction, setting the cost to one instruction.
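A hedged example of the affected case (mine, not from the commit): the
aarch64 cmp instruction accepts a 12-bit unsigned immediate, optionally
shifted left by 12 bits, so a compare such as
int f (long x) { return x > 0x1000; }
can be emitted as "cmp x0, #4096" directly; costing the constant as a
full constant-formation sequence would wrongly penalize that form.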
Built and tested for aarch64-linux-gnu.
PR target/120372
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_rtx_costs <case CONST_INT>): Handle
if outer is COMPARE and the constant can be handled by the cmp instruction.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/imm_choice_comparison-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Andrew Pinski [Tue, 20 May 2025 21:48:58 +0000 (14:48 -0700)]
expand: Use rtx_cost directly instead of gen_move_insn for canonicalize_comparison.
This is the first part in fixing PR target/120372.
The current code for canonicalize_comparison uses gen_move_insn and rtx_cost to find
out the cost of generating a constant. This is ok in most cases except that sometimes
the comparison instruction can handle different constants than a simple set
instruction can. This changes the code to use rtx_cost directly with the outer being COMPARE,
just like how prepare_cmp_insn handles that.
Note this is also a small speedup and a small memory improvement because we are not creating
a move for the constant any more. Since we are not creating a pseudo-register any more, this
also removes the check on that.
Also adds a dump so we can see why one choice was chosen over the other.
Built and tested for aarch64-linux-gnu.
gcc/ChangeLog:
* expmed.cc (canonicalize_comparison): Use rtx_cost directly
instead of gen_move_insn. Print out the choice if dump is enabled.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jonathan Wakely [Wed, 21 May 2025 14:29:02 +0000 (15:29 +0100)]
libstdc++: Fix vector(from_range_t, R&&) for exceptions [PR120367]
Because this constructor delegates to vector(a) the object has been
fully constructed and the destructor will run if an exception happens.
That means we need to set _M_finish == _M_start so that the destructor
doesn't try to destroy any elements.
libstdc++-v3/ChangeLog:
PR libstdc++/120367
* include/bits/stl_vector.h (_M_range_initialize): Initialize
_M_impl._M_finish.
* testsuite/23_containers/vector/cons/from_range.cc: Check with
a type that throws on construction.
Jakub Jelinek [Thu, 22 May 2025 09:01:13 +0000 (11:01 +0200)]
bitintlower: Ensure extension of the most significant limb on info->extended targets
Shifts are the only special case I'm aware of where the most
significant limb (if it has padding bits) is accessed inside of a loop or
with access outside of a loop but with variable idx. Everything else should
access the most significant limb using INTEGER_CST idx and thus can (and
should) deal with the needed extension on that access directly.
And RSHIFT_EXPR shouldn't really violate the content of the padding bits.
For LSHIFT_EXPR we should IMHO do the following (which fixes the testcase
on s390x-linux).
The LSHIFT_EXPR is
/* Lower
dst = src << n;
as
unsigned n1 = n % limb_prec;
size_t n2 = n / limb_prec;
size_t n3 = n1 != 0;
unsigned n4 = (limb_prec - n1) % limb_prec;
size_t idx;
size_t p = prec / limb_prec - (prec % limb_prec == 0);
for (idx = p; (ssize_t) idx >= (ssize_t) (n2 + n3); --idx)
dst[idx] = (src[idx - n2] << n1) | (src[idx - n2 - n3] >> n4);
if (n1)
{
dst[idx] = src[idx - n2] << n1;
--idx;
}
for (; (ssize_t) idx >= 0; --idx)
dst[idx] = 0; */
as described in the comment (note, the comments are for the little-endian
lowering only, didn't want to complicate it with endianity).
As can be seen, the most significant limb can be modified either inside
of the loop or in the if (n1) body if the loop had 0 iterations.
In your patch you've modified I believe just the loop and not the if body,
and made it conditional on every iteration (furthermore through
gimplification of COND_EXPR which is not the way this is done elsewhere in
gimple-lower-bitint.cc, there is an if_then helper and it builds
gimple_build_cond etc.). I think that is way too expensive. In theory we
could peel off the first iteration manually and do the info->extended
handling in there and do it again inside of the if (n1) case if idx ==
(bitint_big_endian ? size_zero_node : p) in that case, but I think just
doing the extension after the loops is easier.
Note, we don't need to worry about volatile here, the shift is done into
an addressable variable memory only if it is non-volatile, otherwise it
is computed into a temporary and then copied over into the volatile var.
2025-05-22 Jakub Jelinek <jakub@redhat.com>
* gimple-lower-bitint.cc (bitint_extended): New variable.
(bitint_large_huge::lower_shift_stmt): For LSHIFT_EXPR with
bitint_extended if lhs has most significant partial limb extend
it afterwards.
* gcc.dg/bitintext.h: New file.
* gcc.dg/torture/bitint-82.c: New test.
Jakub Jelinek [Thu, 22 May 2025 07:09:48 +0000 (09:09 +0200)]
i386: Extend *cmp<mode>_minus_1 optimizations also to plus with CONST_INT [PR120360]
As mentioned by Linus, we can't optimize a comparison of the otherwise unused
result of a plus with a CONST_INT second operand, compared against zero.
This can be done using just a cmp instruction with the negated constant and,
say, js/jns/je/jne etc. conditional jumps (or setcc).
We already have the *cmp<mode>_minus_1 instruction, which handles it when
(as shown in foo in the testcase) the IL has MINUS rather than PLUS,
but for all constants except the minimum value the canonical form is
with PLUS.
The following patch adds a new pattern and predicate to handle this.
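A hedged sketch of the kind of code affected (not the committed
testcase):
int foo (int x) { return x + 5 == 0; }
With the new pattern this can compile to a single "cmpl $-5, %edi"
followed by sete, instead of materializing x + 5 just to compare it
against zero.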
2025-05-22 Jakub Jelinek <jakub@redhat.com>
PR target/120360
* config/i386/predicates.md (x86_64_neg_const_int_operand): New
predicate.
* config/i386/i386.md (*cmp<mode>_plus_1): New pattern.
Dongyan Chen [Thu, 22 May 2025 03:46:52 +0000 (21:46 -0600)]
[PATCH] testsuite: RISC-V: Update the cset-sext-sfb/zba-slliuw test optimization level.
Failed testcases occurred in the GCC regression test run: cset-sext-sfb.c
failed at -Oz, and zba-slliuw.c failed at -Og.
This patch solves the problem by skipping those optimization levels.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/cset-sext-sfb.c: Skip for -Oz.
* gcc.target/riscv/zba-slliuw.c: Skip for -Og.
Shreya Munnangi [Thu, 22 May 2025 00:49:14 +0000 (18:49 -0600)]
[RISC-V] Clear high or low bits using shift pairs
So the first special case of clearing bits from Shreya's work. We can clear an
arbitrary number of high bits by shifting left by the number of bits to clear,
then logically shifting right to put everything in place. Similarly we can
clear an arbitrary number of low bits with a right logical shift followed by a
left shift. Naturally this only applies when the constant synthesis budget is
2+ insns.
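Hedged examples (hand-written, not from the commit), on rv64:
  # x & 0xffffff: clear the upper 40 bits
  slli a0, a0, 40
  srli a0, a0, 40
  # x & ~0xffff: clear the lower 16 bits
  srli a0, a0, 16
  slli a0, a0, 16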
Even with mvconst_internal still enabled this does consistently show various
small code generation improvements.
I have seen a notable regression. The two shift form to wipe out high bits
isn't handled well by ext-dce. Essentially it looks like we don't recognize
the sequence as wiping upper bits, instead it makes bits live and as a result
we're unable to remove a prior zero extension. I've opened a bug for this
issue.
The other case I've seen is CSE related. If we had a number of masking
operations with the same mask, we might have previously CSE'd the constant. In
that scenario each instance of masking would be a single AND using the CSE'd
register holding the constant, whereas with this patch it'll be a pair of
shifts. But on a good uarch design the pair of shifts would be fused into a
single op. Given this is relatively rare and on the margins from a performance
standpoint I'm not going to worry about it.
This has spun in my tester for riscv32-elf and riscv64-elf. Bootstrap and
regression test is in flight and due in an hour or so. Waiting on the
upstream pre-commit tester and the bootstrap test before moving forward.
gcc/
* config/riscv/riscv.cc (synthesize_and): When profitable, use two
shift combinations to clear high or low bits rather than synthesizing
the constant.
Pengxuan Zheng [Wed, 21 May 2025 00:58:23 +0000 (17:58 -0700)]
aarch64: Carry over zeroness in aarch64_evpc_reencode
There was a bug in aarch64_evpc_reencode which could leave zero_op0_p and
zero_op1_p of the struct "newd" uninitialized. r16-701-gd77c3bc1c35e303 fixed
the issue by zero initializing "newd." This patch provides an alternative fix
as suggested by Richard Sandiford based on the fact that the zeroness is
preserved by aarch64_evpc_reencode.
gcc/ChangeLog:
* config/aarch64/aarch64.cc (aarch64_evpc_reencode): Copy zero_op0_p and
zero_op1_p from d to newd.
[PATCH] configure: Always add pre-installed header directories to search path
The configure script was adding the target directory flags, including the
'-B' flags for the executable prefix and the '-isystem' flags for the
pre-installed header directories, to the target flags only for
non-Canadian builds under the premise that the host binaries under the
executable prefix will not be able to execute on the build system for
Canadian builds.
While that is true for the '-B' flags specifying the executable prefix,
the '-isystem' flags specifying the pre-installed header directories are
not affected by this and do not need special handling.
This patch updates the configure script to always add the 'include' and
'sys-include' pre-installed header directories to the target search
path, in order to ensure that the availability of the pre-installed
header directories in the search path is consistent across non-Canadian
and Canadian builds.
When the '--with-headers' flag is specified, this effectively ensures that
the libc headers, that are copied from the specified header directory to
the sys-include directory, are used by libstdc++.
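A hedged example (paths and triplets invented): a Canadian cross
configured as below now gets the '-isystem' flags for the prefix's
include and sys-include directories, just as a non-Canadian build would:
.../gcc/configure --build=x86_64-linux-gnu --host=aarch64-linux-gnu \
    --target=riscv64-unknown-elf --prefix=/opt/cross \
    --with-headers=/path/to/libc/include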
Andrew Pinski [Mon, 5 May 2025 16:46:14 +0000 (09:46 -0700)]
combine: gen_lowpart_no_emit vs CLOBBER [PR120090]
The problem here is simplify-rtx.cc expects gen_lowpart_no_emit
to return NULL on failure but combine's hook was returning CLOBBER.
After r16-160-ge6f89d78c1a7528e93458278, gcc.target/i386/avx512bw-pr103750-2.c
started to fail at -m32 due to this as new simplify code would return
a RTL with a clobber in it rather than returning NULL.
To fix this, gen_lowpart_no_emit should return NULL when there was a failure
instead of a clobber. This only changes the gen_lowpart_no_emit hook and not the
generic gen_lowpart hook, as parts of combine just pass the gen_lowpart result directly
without checking the return value.
Bootstrapped and tested on x86_64-linux-gnu.
PR rtl-optimization/120090
gcc/ChangeLog:
* combine.cc (gen_lowpart_for_combine_no_emit): New function.
(RTL_HOOKS_GEN_LOWPART_NO_EMIT): Set to gen_lowpart_for_combine_no_emit.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Jeff Law [Wed, 21 May 2025 22:04:58 +0000 (16:04 -0600)]
[RISC-V] Improve (x << C1) + C2 split code
I wrote this a couple months ago to fix an instruction count regression in
505.mcf on risc-v, but I don't have a trivial little testcase to add to the
suite.
There were two problems with the pattern.
First, the code was generating a shift followed by an add after reload.
Naturally combine doesn't run after reload and the code stayed in that form
rather than using shadd when available.
Second the splitter was just over-active. We need to make sure that the
shifted form of the constant operand has a cost > 1 to synthesize. It's
useless to split if the shifted constant can be synthesized in a single
instruction.
This has been in my tester since March. So it's been through numerous
riscv64-elf and riscv32-elf test cycles as well as multiple rv64 bootstrap
tests. Waiting on the upstream CI system to render a verdict before moving
forward.
Looking further out I'm hoping this pattern will transform into a simpler and
always active define_split.
gcc/
* config/riscv/riscv.md ((x << C1) + C2): Tighten split condition
and generate more efficient code when splitting.
Jeff Law [Wed, 21 May 2025 20:15:23 +0000 (14:15 -0600)]
[RISC-V][PR target/120368] Fix 32bit shift on rv64
So a followup to last week's bugfix. In last week's change we stopped using
define_insn_and_split to rewrite instructions. That change was done to avoid
dropping a masking instruction out of the RTL.
As a result the pattern(s) were changed into simple define_insns, which is
good. One of them uses the GPR iterator since it's supposed to work for both
32bit and 64bit shifts on rv64.
But we failed to emit the right opcode for a 32bit shift on rv64. Thankfully
the fix is trivial. If the mode is anything but word_mode, then we must be
doing a 32-bit shift on rv64, i.e. the various "w" shift instructions.
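A hedged illustration of the difference on rv64:
  sll  a0, a0, a1    # DImode (word_mode) shift
  sllw a0, a0, a1    # SImode shift: 32-bit operation, result sign-extended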
It's run through my tester. Just waiting on the upstream CI system to spin it.
PR target/120368
gcc/
* config/riscv/riscv.md (shift with masked shift count): Fix
opcode when generating an SImode shift on rv64.
gcc/testsuite/
* gcc.target/riscv/pr120368.c: New test.
Pan Li [Tue, 20 May 2025 07:00:15 +0000 (15:00 +0800)]
RISC-V: Combine vec_duplicate + vand.vv to vand.vx on GR2VR cost
This patch would like to combine the vec_duplicate + vand.vv to the
vand.vx, as shown in the example code below. The related pattern depends
on the cost of vec_duplicate from GR2VR: the late-combine pass will
perform the combination if the GR2VR cost is zero, and reject it if the
GR2VR cost is greater than zero.
Assume we have example code like below, where the GR2VR cost is 0.
#define DEF_VX_BINARY(T, OP) \
void \
test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
{ \
for (unsigned i = 0; i < n; i++) \
out[i] = in[i] OP x; \
}
The test suites below passed for this patch.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add new
case for rtx code AND.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op and to no_shift_vx_ops.
Since r13-6296, we haven't got 4 simdclone calls for these tests on
ia32 without avx_runtime. With avx_runtime, we get 3 such calls even
on ia32, but we didn't test for anything on ia32 with avx_runtime.
Adjust and simplify the expectations and comments.
If the toolchain is built with --enable-frame-pointer,
gcc.target/i386/no-callee-saved-16.c will not get the expected
optimization without -fomit-frame-pointer, that would be enabled by
-O2 without the configure flag. Add it.
Alexandre Oliva [Wed, 21 May 2025 09:20:29 +0000 (06:20 -0300)]
[testsuite] [x86] double copysign requires -msse2
SSE_FLOAT_MODE_P only holds for DFmode with SSE2, and that's a
condition for copysign<mode>3 to be available under TARGET_SSE_MATH.
Various copysign testcases use -msse -mfpmath=sse on ia32 to enable
the copysign builtins and patterns, but that would only be enough if
the tests were limited to floats. Since they test doubles as well, we
need -msse2 instead of -msse.
for gcc/testsuite/ChangeLog
* gcc.dg/fold-copysign-1.c: Bump to sse2 on ia32.
* gcc.dg/pr55152-2.c: Likewise.
* gcc.dg/tree-ssa/abs-4.c: Likewise.
* gcc.dg/tree-ssa/backprop-6.c: Likewise.
Alexandre Oliva [Wed, 21 May 2025 09:20:22 +0000 (06:20 -0300)]
[testsuite] [aarch64] match alt cache clear names in sme nonlocal_goto tests
vxworks calls cacheTextUpdate instead of __clear_cache.
Adjust the sme/nonlocal_goto_*.c tests for inexact matches.
for gcc/testsuite/ChangeLog
* gcc.target/aarch64/sme/nonlocal_goto_1.c: Match
vxworks cache-clearing function as well.
* gcc.target/aarch64/sme/nonlocal_goto_2.c: Likewise.
* gcc.target/aarch64/sme/nonlocal_goto_3.c: Likewise.
Alexandre Oliva [Wed, 21 May 2025 09:20:11 +0000 (06:20 -0300)]
[testsuite] tolerate missing std::stold
basic_string.h doesn't define the non-w string version of std::stold
when certain conditions aren't met, and then a couple of tests fail to
compile.
Guard the portions of the tests that depend on std::stold with the
conditions for it to be defined.
for libstdc++-v3/ChangeLog
* testsuite/21_strings/basic_string/numeric_conversions/char/stold.cc:
Guard non-wide stold calls with conditions for it to be
defined.
* testsuite/27_io/basic_ostream/inserters_arithmetic/char/hexfloat.cc:
Likewise.
Alexandre Oliva [Wed, 21 May 2025 09:20:03 +0000 (06:20 -0300)]
[testsuite] [analyzer] [vxworks] define __STDC_WANT_LIB_EXT1__ to 1
vxworks' headers use #if instead of #ifdef to test for
__STDC_WANT_LIB_EXT1__, so the definition in the analyzer test
strotok-cppreference.c catches a bug there, but not something it's
meant to catch or that we could fix in GCC, so amend the definition to
sidestep the libc bug.
for gcc/testsuite/ChangeLog
* c-c++-common/analyzer/strtok-cppreference.c
(__STDC_WANT_LIB_EXT1__): Define to 1.
Alexandre Oliva [Wed, 21 May 2025 09:19:46 +0000 (06:19 -0300)]
vxworks: libgcc: include string.h for memset
gthr-vxworks-thread.c calls memset in __ghtread_cond_signal, but it
fails ot include <string.h>, where this function is declared, and GCC
14 rejects calls of undeclared functions. Include the required
header.
for libgcc/ChangeLog
* config/gthr-vxworks-thread.c: Include string.h for memset.
genemit has traditionally used open-coded gen_rtx_FOO sequences
to build up the instruction pattern. This is now the source of
quite a bit of bloat in the binary, and also a source of slow
compile times.
Two obvious ways of trying to deal with this are:
(1) Try to identify rtxes that have a similar form and use shared
routines to generate rtxes of that form.
(2) Use a static table to encode the rtx and call a common routine
to expand it.
I did briefly look at (1). However, it's more complex than (2),
and I think suffers from being the worst of both worlds, for reasons
that I'll explain below. This patch therefore does (2).
In theory, one of the advantages of open-coding the calls to
gen_rtx_FOO is that the rtx can be populated using stores of known
constants (for the rtx code, mode, unspec number, etc). However,
the time spent constructing an rtx is likely to be dominated by
the call to rtx_alloc, rather than by the stores to the fields.
Option (1) above loses this advantage of storing constants.
The shared routines would parameterise an rtx according to things
like the modes on the rtx and its suboperands, so the code would
need to fetch the parameters. In a sense, the rtx structure would
be open-coded but the parameters would be table-encoded (albeit
in a simple way).
The expansion code also shouldn't be particularly hot. Anything that
treats expand/discard cycles as very cheap would be misconceived,
since each discarded expansion generates garbage memory that needs
to be cleaned up later.
Option (2) turns out to be pretty simple -- certainly simpler
than (1) -- and seems to give a reasonable saving. Some numbers,
all for --enable-checking=yes,rtl,extra:
[A] size of the @progbits sections in insn-emit-*.o, new / old
[B] size of the load segments in cc1, new / old
[C] time to compile a typical insn-emit*.cc, new / old
To get an idea of the effect on the final compiler, I tried compiling
fold-const.ii with -O0 (no -g), since that should give any slowdown
less room to hide. I couldn't measure any difference in compile time
before or after the patch for any of the three variants above.
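As a rough, self-contained C++ illustration of option (2) (all names
and the encoding are invented; the real scheme in genemit.cc differs),
a static byte table plus one shared recursive decoder replaces many
open-coded constructor calls:
// Sketch of a table-driven expander: a prefix-encoded opcode stream is
// decoded recursively into freshly allocated nodes by one shared routine.
#include <cstdint>
#include <cstdio>

enum opcode : uint8_t { OP_OPERAND, OP_PLUS, OP_MULT };

struct node
{
  opcode code;
  int opno;            // valid for OP_OPERAND
  node *op0, *op1;     // valid for the binary codes
};

static node *
expand (const uint8_t *&p)
{
  node *n = new node ();
  n->code = static_cast<opcode> (*p++);
  if (n->code == OP_OPERAND)
    n->opno = *p++;        // leaf: operand number follows the opcode
  else
    {
      n->op0 = expand (p); // binary: two prefix-encoded subexpressions
      n->op1 = expand (p);
    }
  return n;
}

int
main ()
{
  // (plus (operand 0) (mult (operand 1) (operand 2)))
  static const uint8_t table[] =
    { OP_PLUS, OP_OPERAND, 0, OP_MULT, OP_OPERAND, 1, OP_OPERAND, 2 };
  const uint8_t *p = table;
  node *n = expand (p);
  std::printf ("root code = %d\n", static_cast<int> (n->code));
}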
gcc/
* gensupport.h (needs_barrier_p): Delete.
* gensupport.cc (needs_barrier_p): Likewise.
* rtl.h (always_void_p): Return true for PC, RETURN and SIMPLE_RETURN.
(expand_opcode): New enum class.
(expand_rtx, complete_seq): Declare.
* emit-rtl.cc (rtx_expander): New class.
(expand_rtx, complete_seq): New functions.
* gengenrtl.cc (special_rtx, excluded_rtx): Add a cross-reference
comment.
* genemit.cc (FIRST_CODE): New constant.
(print_code): Delete.
(generator::file, generator::used, generator::sequence_type): Delete.
(generator::bytes): New member variable.
(generator::generator): Update accordingly.
(generator::gen_rtx_scratch): Delete.
(generator::add_uint, generator::add_opcode, generator::add_code)
(generator::add_match_operator, generator::add_exp)
(generator::add_vec, generator::gen_table): New member functions.
(generator::gen_exp): Rewrite to use a bytecode expansion.
(generator::gen_emit_seq): Likewise.
(start_gen_insn): Return the C++ expression for the operands array.
(gen_insn, gen_expand, gen_split): Update callers accordingly.
(emit_c_code): Remove use of _val.
genemit: Avoid using gen_exp in output_add_clobbers
output_add_clobbers emits code to add:
(clobber (scratch:M))
and/or:
(clobber (reg:M R))
expressions to the end of a PARALLEL. At the moment, it does this
using the general gen_exp function. That makes sense with the code
in its current form, but with later patches it's more convenient to
handle the two cases directly.
This also avoids having to pass an md_rtx_info that is unrelated
to the clobber expressions.
gcc/
* genemit.cc (clobber_pat::code): Delete.
(maybe_queue_insn): Don't set clobber_pat::code.
(output_add_clobbers): Remove info argument and output the two
REG and SCRATCH cases directly.
(main): Update call accordingly.
gen_exp currently supports the 's' (string) operand type. It would
certainly be possible to make the upcoming bytecode patch support
that too. However, the rtx codes that have string operands should
be very rarely used in hard-coded define_insn/expand/split/peephole2
rtx templates (as opposed to things like attribute expressions,
where const_string is commonplace). And AFAICT, no current target
does use them like that.
This patch therefore reports an error for these rtx codes,
rather than adding code that would be unused and untested.
gcc/
* genemit.cc (generator::gen_exp): Report an error for 's' operands.
gen_exp had code to handle the 'L' operand format. But this format
is specifically for location_ts, which are only used in RTX_INSNs.
Those should never occur in this context, where the input is always
an md file rather than an __RTL function. Any hard-coded raw
location value would be meaningless anyway.
It seemed safer to turn this into an error rather than a gcc_unreachable.
gcc/
* genemit.cc (generator::gen_exp): Raise an error if we see
an 'L' operand.
gen_exp has code to detect when the same operand is used multiple
times. It ensures that second and subsequent uses call copy_rtx,
to enforce correct unsharing.
However, for historical reasons that aren't clear to me, this was
skipped for a define_insn unless the define_insn was a parallel.
It was also skipped for a single define_expand instruction,
regardless of its contents.
This meant that a single parallel instruction was treated differently
between define_insn (where sharing rules were followed) and
define_expand (where sharing rules weren't followed). define_splits
and define_peephole2s followed the sharing rules in all cases.
This patch makes everything follow the sharing rules. The code
it touches will be removed by the proposed bytecode-based expansion,
which will use its own tracking when enforcing sharing rules.
However, it seemed better for staging and bisection purposes
to make this change first.
gcc/
* genemit.cc (generator::used): Update comment.
(generator::gen_exp): Remove handling of null unused arrays.
(gen_insn, gen_expand): Always pass a used array.
(output_add_clobbers): Note why the used array is null here.
gen_exp now has quite a few arguments that need to be passed
to each recursive call. This patch turns it and related routines
into member functions of a new generator class, so that the shared
information can be stored in member variables.
This also helps to make later patches less noisy.
gcc/
* genemit.cc (generator): New structure.
(gen_rtx_scratch, gen_exp, gen_emit_seq): Turn into member
functions of generator.
(gen_insn, gen_expand, gen_split, output_add_clobbers): Update
users accordingly.
genemit: Consistently use operand arrays in gen_* functions
One slightly awkward part about emitting the generator function
bodies is that:
* define_insn and define_expand routines have a separate argument for
each operand, named "operand0" upwards.
* define_split and define_peephole2 routines take a pointer to an array,
named "operands".
* the C++ preparation code for expands, splits and peephole2s uses an
array called "operands" to refer to the operands.
* the automatically-generated code uses individual "operand<N>"
variables to refer to the operands.
So define_expands have to store the incoming arguments into an operands
array before the md file's C++ code, then copy the operands array back
to the individual variables before the automatically-generated code.
splits and peephole2s have to copy the incoming operands array to
individual variables after the md file's C++ code, creating more
local variables that are live across calls to rtx_alloc.
This patch tries to simplify things by making the whole function
body use the operands array in preference to individual variables.
define_insns and define_expands store their arguments to the array
on entry.
This would have pros and cons on its own, but having a single array
helps with future efforts to reduce the duplication between gen_*
functions.
gcc/
* genemit.cc (gen_rtx_scratch, gen_exp): Use operands[%d] rather than
operand%d.
(start_gen_insn): Mark the incoming arguments as const and store
them to an operands array.
(gen_expand, gen_split): Remove copies into and out of the operands
array.
An earlier version of this series wanted to collect information
about all the gen_* functions that are going to be generated.
The current version no longer does that, but the queue seemed
worth keeping anyway, since it gives a more consistent structure.
gcc/
* genemit.cc (queue): New static variable.
(maybe_queue_insn): New function, split out from...
(gen_insn): ...here.
(queue_expand): New function, split out from...
(gen_expand): ...here.
(queue_split): New function, split out from...
(gen_split): ...here.
(main): Queue definitions for later processing rather than
emitting them on the fly.
This patch makes genemit.cc pass the md_rtx_info around by constant
reference rather than pointer. It's somewhat of a cosmetic change
on its own, but it makes later changes less noisy.
gcc/
* genemit.cc (gen_exp): Make the info argument a constant reference.
(gen_emit_seq, gen_insn, gen_expand, gen_split): Likewise.
(output_add_clobbers): Likewise.
(main): Update calls accordingly.
The automatically-generated gen_* routines take their operands as
individual arguments, named "operand0" upwards. These arguments are
stored into an "operands" array before invoking the expander's C++
code, which can then modify the operands by writing to the array.
However, the SPARC sign-extend and zero-extend expanders used the
operandN variables directly, rather than operands[N]. That's a
correct usage in context, since the code goes on to expand the
pattern manually and invoke DONE.
But it's also easy for code to accidentally write to operandN instead
of operands[N] when trying to set up something like a match_dup.
It sounds like Jeff had seen an instance of this.
A later patch is therefore going to mark the operandN arguments
as const. This patch makes way for that by using operands[N]
instead of operandN for the SPARC expanders.
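A hedged md sketch (a made-up expander, not from sparc.md) of the rule
the series enforces:
(define_expand "myextend"
  [(set (match_operand:SI 0 "register_operand")
        (sign_extend:SI (match_operand:HI 1 "general_operand")))]
  ""
{
  /* Correct: writes through the shared operands array.  */
  operands[1] = force_reg (HImode, operands[1]);
  /* Accident waiting to happen (and a compile error once the arguments
     are const): operand1 = force_reg (HImode, operand1);  */
})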
gcc/
* config/sparc/sparc.md (zero_extendhisi2, zero_extendhidi2)
(extendhisi2, extendqihi2, extendqisi2, extendqidi2)
(extendhidi2): Use operands[0] and operands[1] instead of
operand0 and operand1.
nds32: Avoid accessing beyond the operands[] array
This pattern used operands[2] to hold the shift amount, even though
the pattern doesn't have an operand 2 (not even as a match_dup).
This caused a build failure with -Werror:
array subscript 2 is above array bounds of ‘rtx_def* [2]’
gcc/
PR target/100837
* config/nds32/nds32-intrinsic.md (unspec_get_pending_int): Use
a local variable instead of operands[2].
Iain Sandoe [Sun, 11 May 2025 19:36:58 +0000 (20:36 +0100)]
c++, coroutines: Use decltype(auto) for the g_r_o.
The revised wording for coroutines, uses decltype(auto) for the
type of the get return object, which preserves references.
It is quite reasonable for a coroutine body implementation to
complete before control is returned to the ramp - and in that
case we would be creating the ramp return object from an already-
deleted promise object.
Jason observes that this is a terrible situation and we should
seek a resolution to it via core.
Since the test added here explicitly performs the unsafe action
described above, we expect it to fail (until a resolution is found).
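A hedged C++ sketch of what decltype(auto) buys here (illustrative
promise type, not from the patch):
#include <type_traits>

struct task {};

struct promise_type
{
  task t;
  task &get_return_object () { return t; }
};

int main ()
{
  promise_type p;
  // auto would copy; decltype(auto) preserves the reference.
  decltype(auto) gro = p.get_return_object ();
  static_assert (std::is_reference_v<decltype (gro)>);
}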
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Use
decltype(auto) to determine the type of the temporary
get_return_object.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr115908.C: Count promise construction
and destruction. Run the test and XFAIL it.
Iain Sandoe [Mon, 12 May 2025 18:47:42 +0000 (19:47 +0100)]
c++, coroutines: Address CWG2563 return value init [PR119916].
This addresses the clarification that, when the get_return_object is of a
different type from the ramp return, any necessary conversions should be
performed on the return expression (so that they typically occur after the
function body has started execution).
PR c++/119916
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::wrap_original_function_body): Do not
initialise initial_await_resume_called here...
(cp_coroutine_transform::build_ramp_function): ... but here.
When the coroutine is not void, initialize a GRO object from
promise.get_return_object(). Use this as the argument to the
return expression. Use a regular cleanup for the GRO, since
it is ramp-local.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/torture/special-termination-00-sync-completion.C:
Amend for CWG2563 expected behaviour.
* g++.dg/coroutines/torture/special-termination-01-self-destruct.C:
Likewise.
* g++.dg/coroutines/torture/pr119916.C: New test.
Andrew Pinski [Tue, 20 May 2025 20:21:28 +0000 (13:21 -0700)]
middle-end: Fix complex lowering of cabs with no LHS [PR120369]
This was introduced by r15-1797-gd8fe4f05ef448e. I had missed that
the LHS of the cabs call could be NULL. This seems to only happen at -O0;
I tried to produce one that happens at -O1 but needed many different
options to prevent the removal of the call.
Anyway, the fix is just to keep the call around if the LHS is null.
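A hedged reproducer sketch (the committed test may differ):
/* At -O0 the result is unused but the call is not removed, so the
   lowered statement has no LHS.  */
void f (_Complex double z) { __builtin_cabs (z); }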
Bootstrapped and tested on x86_64-linux-gnu.
PR middle-end/120369
gcc/ChangeLog:
* tree-complex.cc (gimple_expand_builtin_cabs): Return early
if the LHS of cabs is null.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr120369-1.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Shreya Munnangi [Wed, 21 May 2025 02:15:42 +0000 (20:15 -0600)]
[RISC-V] Infrastructure of synthesizing logical AND with constant
So this is the next step on the path to mvconst_internal removal and is work
from Shreya and myself.
This puts in the infrastructure to allow us to synthesize logical AND much like
we're doing with logical IOR/XOR.
Unlike IOR/XOR, AND has many more special cases that can be profitable. For
example, you can use shifts to clear many bits. You can use zero extension to
clear bits, you can use rotate+andi+rotate, shift pairs, etc.
So to make potential bisecting easy the plan is to drop in the work on logical
AND in several steps, essentially one new case at a time.
This step just puts the basics of the operation synthesis in place. It still
uses the same code generation strategies as we are currently using.
I'd like to say this is NFC, but unfortunately that's not true. While the code
generation strategy is the same, this does indirectly introduce new REG_EQUAL
notes. Those additional notes in turn can impact how various optimizers behave
in very minor ways.
As usual, this has survived my tester on riscv32-elf and riscv64-elf.
Waiting on pre-commit to do its thing. And I'll start queuing up the
additional cases we want to handle while waiting 😉
gcc/
* config/riscv/riscv-protos.h (synthesize_and): Prototype.
* config/riscv/riscv.cc (synthesize_and): New function.
* config/riscv/riscv.md (and<mode>3): Use it.
Andrew Pinski [Sun, 18 May 2025 07:06:38 +0000 (00:06 -0700)]
match: Remove valueize_condition argument from gimple_extra template
After r15-4791-gb60031e8f9f8fe, the valueize_condition argument becomes
unused. I didn't notice that as there was -Wno-unused option being added
while compiling gimple-match-exports.cc. This removes that too as there are
no unused warnings.
Umesh Kalappa [Tue, 20 May 2025 17:57:00 +0000 (11:57 -0600)]
[PATCH v2 2/2] MIPS p8700 doesn't have the vector extension; add a dummy reservation for the unhandled types.
The RISC-V backend requires all types to map to a reservation in the
scheduler model. This adds a dummy reservation covering all the types
not currently handled by the p8700 model.
gcc/
* config/riscv/mips-p8700.md (mips_p8700_dummies): New
reservation.
(mips_p8700_unknown): Reservation for all the dummies.
Umesh Kalappa [Tue, 20 May 2025 17:50:46 +0000 (11:50 -0600)]
[PATCH v2 1/2] Enable the P8700 processor for RISC-V; the P8700 is a high-performance processor from MIPS that extends RISC-V with custom instructions
Add support for the p8700 design from MIPS.
gcc/
* config/riscv/mips-p8700.md: New scheduler model.
* config/riscv/riscv-cores.def (mips-p8700): New tuning model
and core architecture.
* config/riscv/riscv-opts.h (riscv_microarchitecture_type): Add
mips-p8700.
* config/riscv/riscv.cc (mips_p8700_tune_info): New uarch
tuning parameters.
* config/riscv/riscv.md (tune): Add mips_p8700.
Include mips-p8700.md.
* doc/invoke.texi: Document tune/cpu options for the MIPS P8700.