Patrick O'Neill [Tue, 20 Aug 2024 18:51:50 +0000 (11:51 -0700)]
RISC-V: Emit costs for bool and stepped const vectors
These cases are handled in the expander
(riscv-v.cc:expand_const_vector). We need the vector builder to detect
these cases so extract that out into a new riscv-v.h header file.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (class rvv_builder): Move to riscv-v.h.
* config/riscv/riscv.cc (riscv_const_insns): Emit placeholder costs for
bool/stepped const vectors.
* config/riscv/riscv-v.h: New file.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Patrick O'Neill [Tue, 20 Aug 2024 18:29:12 +0000 (11:29 -0700)]
RISC-V: Handle case when constant vector construction target rtx is not a register
This manifests in RTL that is optimized away which causes runtime failures
in the testsuite. Update all patterns to use a temp result register if required.
gcc/ChangeLog:
* config/riscv/riscv-v.cc (expand_const_vector): Use tmp register if
needed.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Patrick O'Neill [Tue, 20 Aug 2024 18:38:20 +0000 (11:38 -0700)]
RISC-V: Reorder insn cost match order to match corresponding expander match order
The corresponding expander (riscv-v.cc:expand_const_vector) matches
const_vec_duplicate_p before const_vec_series_p. Reorder to match this
behavior when calculating costs.
Christophe Lyon [Wed, 21 Aug 2024 13:58:08 +0000 (13:58 +0000)]
arm: Always use vmov.f64 instead of vmov.f32 with MVE
With MVE, vmov.f64 is always supported (no need for +fp.dp extension).
This patch updates two patterns:
- in movdi_vfp, we incorrectly checked
TARGET_VFP_SINGLE || TARGET_HAVE_MVE instead of
TARGET_VFP_SINGLE && !TARGET_HAVE_MVE, and didn't take into account
these two possibilities when computing the length attribute.
- in thumb2_movdf_vfp, we checked only TARGET_VFP_SINGLE.
No need to update movdf_vfp, since it is enabled only for TARGET_ARM
(which is not the case when MVE is enabled).
The patch also updates gcc.target/arm/armv8_1m-fp64-move-1.c, to
accept only vmov.f64 instead of vmov.f32.
Tested on arm-none-eabi with:
qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto
qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve
qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve.fp
qemu/-mthumb/-mtune=cortex-m55/-mfloat-abi=hard/-mfpu=auto/-march=armv8.1-m.main+mve.fp+fp.dp
H.J. Lu [Tue, 27 Aug 2024 14:03:22 +0000 (07:03 -0700)]
Extend check-function-bodies to allow label and directives
As PR target/116174 shows, we may need to verify labels and the order of
directives. Extend check-function-bodies with an optional argument for
matched output lines so that labels and directives can be checked.
gcc/
* doc/sourcebuild.texi (check-function-bodies): Add an optional
argument for matched output lines.
gcc/testsuite/
* gcc.target/i386/pr116174.c: Use check-function-bodies.
* lib/scanasm.exp (parse_function_bodies): Append the line if
$up_config(matched) matches the line.
(check-function-bodies): Add an argument for matched. Set
up_config(matched) to $matched. Append the expected line without
$config(line_prefix) to function_regexp if it starts with ".L".
Michael Matz [Thu, 22 Aug 2024 15:21:42 +0000 (17:21 +0200)]
LRA: Fix setup_sp_offset
This is part of making m68k work with LRA. See PR116429.
In short: setup_sp_offset is internally inconsistent. It wants to
set up the sp_offset for newly generated instructions. sp_offset for
an instruction is always the state of the sp-offset right before that
instruction. For that it starts at the (assumed correct) sp_offset
of the instruction right after the given (new) sequence, and then
iterates that sequence forward simulating its effects on sp_offset.
That can't ever be right: either it needs to start at the front
and simulate forward, or start at the end and simulate backward.
The former seems to be the more natural way. Funnily the local
variable holding that instruction is also called 'before'.
This changes it to the first variant: start before the sequence,
do one simulation step to get the sp-offset state in front of the
sequence and then continue simulating.
More details: in the problematic testcase we start with this
situation (sp_off before 550 is 0):
The call insn sp_off remains at the correct -16, but internally it's already
inconsistent here. If the sp_off before an insn is -16, and that insn
pre_decs sp, then the after-insn sp_off should be -20.
PR target/116429
* lra.cc (setup_sp_offset): Start with sp_offset from
before the new sequence, not from after.
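The forward simulation described above can be sketched as follows. This is a minimal model, not the lra.cc code: `Insn`, `sp_delta`, `assign_sp_offsets` and `example_sp_offsets` are hypothetical names introduced for illustration only.

```cpp
#include <cassert>
#include <vector>

// Hypothetical model of an instruction: `sp_delta` is how much it
// changes sp (e.g. -4 for a pre-decrement push), `sp_off` is the
// sp offset in effect right before it executes.
struct Insn {
  int sp_delta;
  int sp_off;
};

// Assign sp_off to each insn of a newly generated sequence.
// `before` is the insn right in front of the sequence; its sp_off
// is assumed to be correct already.
void assign_sp_offsets(const Insn &before, std::vector<Insn> &seq) {
  // One simulation step over `before` yields the state in front
  // of the sequence.
  int off = before.sp_off + before.sp_delta;
  for (Insn &insn : seq) {
    insn.sp_off = off;     // state right before this insn
    off += insn.sp_delta;  // simulate the insn's effect on sp
  }
}

// Usage with the numbers from the message above: the sp_off before a
// pre-decrementing insn is -16, so the state after it is -20.
void example_sp_offsets() {
  Insn before{0, -16};
  std::vector<Insn> seq{{-4, 0}, {0, 0}};
  assign_sp_offsets(before, seq);
  assert(seq[0].sp_off == -16);
  assert(seq[1].sp_off == -20);
}
```

Starting in front of the sequence means every recorded value is the state before its own instruction, which is the invariant the message asks for.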
Michael Matz [Thu, 22 Aug 2024 15:09:11 +0000 (17:09 +0200)]
LRA: Don't use 0 as initialization for sp_offset
This is part of making m68k work with LRA. See PR116374.
m68k has the property that sometimes the elimination offset
between %sp and %argptr is zero. While setting up the elimination
infrastructure, it is changes between sp_offset and previous_offset
that feed into insns_with_changed_offsets, which ultimately causes
the instructions so marked to be looked at.
But the initial values for sp_offset and previous_offset are
also zero. So if the target's INITIAL_ELIMINATION_OFFSET (called
in update_reg_eliminate) is zero then nothing changes, the
instructions in question don't get into the list to consider and
the sp_offset tracking goes wrong.
Solve this by initializing those members with -1 instead of zero.
An initial offset of that value seems very unlikely, as it's
in word-sized increments. This then also reveals a problem in
eliminate_regs_in_insn where it always uses sp_offset-previous_offset
as offset adjustment, even in the first_p pass. That was harmless
when previous_offset was uninitialized as zero. But all the other
code uses a different idiom of checking for first_p (or rather
update_p which is !replace_p&&!first_p), and using sp_offset directly.
So use that as well in eliminate_regs_in_insn.
PR target/116374
* lra-eliminations.cc (init_elim_table): Use -1 as initializer.
(update_reg_eliminate): Accept -1 as not-yet-used marker.
(eliminate_regs_in_insn): Use previous_sp_offset only when
not first_p.
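Why a zero initializer hides a zero initial offset can be shown with a small sketch. It is only a model of the change detection, not the lra-eliminations.cc code: `Elim`, `update_offset` and `example_sentinel` are hypothetical names.

```cpp
#include <cassert>

// Hypothetical stand-in for one elimination-table entry.
struct Elim {
  long offset;
  long previous_offset;
};

// Returns true when the entry's offset changed, i.e. when the insns
// using it would be queued for reprocessing.
bool update_offset(Elim &e, long initial_offset) {
  e.previous_offset = e.offset;
  e.offset = initial_offset;
  return e.offset != e.previous_offset;
}

void example_sentinel() {
  Elim zero_init{0, 0};
  Elim minus_one_init{-1, -1};
  // With a zero initial elimination offset, the zero-initialized entry
  // sees no change, so nothing is queued ...
  assert(!update_offset(zero_init, 0));
  // ... while the -1 sentinel makes the first update visible.
  assert(update_offset(minus_one_init, 0));
}
```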
Michael Matz [Thu, 22 Aug 2024 15:03:56 +0000 (17:03 +0200)]
final: go down ASHIFT in walk_alter_subreg
When experimenting with m68k plus LRA one of the
changes in the backend is to accept ASHIFTs (not only
MULT) as scale code for address indices. When LRA is then
not turned on and reload is used instead, those addresses are
presented to reload, which chokes on them. While reload is
going away the change to make them work doesn't really hurt
(and generally seems useful, as MULT and ASHIFT really are
no different). So just add it.
PR target/116413
* final.cc (walk_alter_subreg): Recurse on ASHIFT.
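The claim that MULT and ASHIFT are no different as scale codes can be illustrated with plain address arithmetic. This sketch is not RTL; `addr_mult`, `addr_ashift` and `example_scale` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>

// Address of element `index` with the scale expressed as a multiplication.
constexpr std::uintptr_t addr_mult(std::uintptr_t base, std::uintptr_t index,
                                   unsigned scale) {
  return base + index * scale;
}

// The same address with the scale expressed as a left shift.
constexpr std::uintptr_t addr_ashift(std::uintptr_t base, std::uintptr_t index,
                                     unsigned log2_scale) {
  return base + (index << log2_scale);
}

static_assert(addr_mult(0x1000, 5, 4) == addr_ashift(0x1000, 5, 2),
              "a scale of 4 is a shift by 2");

void example_scale() {
  // A scale of 8 and a shift by 3 give the same address for every index.
  for (std::uintptr_t i = 0; i < 16; ++i)
    assert(addr_mult(0x1000, i, 8) == addr_ashift(0x1000, i, 3));
}
```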
Jonathan Wakely [Tue, 27 Aug 2024 12:30:42 +0000 (13:30 +0100)]
libstdc++: Do not use std::vector<bool>::reference default ctor [PR115098]
This default constructor was made private by r15-3124-gb25b101bc38000 so
the pretty printer tests need a fix to stop using it. There's no
conforming way to get a default-constructed 'reference' now, e.g. trying
to access an element of a default-constructed std::vector<bool> will
trigger an assertion. Remove the tests, but leave a comment in the
printer code about handling it.
libstdc++-v3/ChangeLog:
PR libstdc++/115098
* python/libstdcxx/v6/printers.py (StdBitReferencePrinter): Add
comment.
* testsuite/libstdc++-prettyprinters/simple.cc: Do not default
construct std::vector<bool>::reference.
* testsuite/libstdc++-prettyprinters/simple11.cc: Likewise.
Jonathan Wakely [Tue, 27 Aug 2024 11:19:47 +0000 (12:19 +0100)]
c++: Add most missing C++20 and C++23 names to cxxapi-data.csv
This includes uncommenting the atomic_flag non-member functions, which
were added by PR libstdc++/103934.
Also generate a hint for std::ignore, which was recently tweaked to be
more generally useful by P2968R2, which r15-2324 implemented.
gcc/cp/ChangeLog:
* cxxapi-data.csv: Add C++20 and C++23 names from <chrono>,
<format>, <generator>, <iterator>, <print>, and <stdfloat>.
Set cxx11 dialect for std::ignore in <tuple>. Uncomment
atomic_flag functions from <atomic>.
* std-name-hint.gperf: Regenerate.
* std-name-hint.h: Regenerate.
Handle arithmetic on eliminated address indices [PR116413]
This patch fixes gcc.c-torture/compile/opout.c for m68k with LRA
enabled. The test has:
...
z (a, b)
{
return (int) &a + (int) &b + (int) x + (int) z;
}
so it adds the address of two incoming arguments. This ends up
being treated as an LEA in which the "index" is the incoming
argument pointer, which the LEA multiplies by 2. The incoming
argument pointer is then eliminated, leading to:
lra: Don't apply eliminations to allocated registers [PR116321]
The sequence of events in this PR is that:
- the function has many addresses in which only a single hard base
register is acceptable. Let's call the hard register H.
- IRA allocates that register to one of the pseudo base registers.
Let's call the pseudo register P.
- Some of the other addresses that require H occur when P is still live.
- LRA therefore has to spill P.
- When it reallocates P, LRA chooses to use FRAME_POINTER_REGNUM,
which has been eliminated to the stack pointer. (This is ok,
since the frame register is free.)
- Spilling P causes LRA to reprocess the instruction that uses P.
- When reprocessing the address that has P as its base, LRA first
applies the new allocation, to get FRAME_POINTER_REGNUM,
and then applies the elimination, to get the stack pointer.
The last step seems wrong: the elimination should only apply to
pre-existing uses of FRAME_POINTER_REGNUM, not to uses that result
from allocating pseudos. Applying both means that we get the wrong
register number, and therefore the wrong class.
The PR is about an existing testcase that fails with LRA on m68k.
gcc/
PR middle-end/116321
* lra-constraints.cc (get_hard_regno): Only apply eliminations
to existing hard registers.
(get_reg_class): Likewise.
Iain Sandoe [Mon, 26 Aug 2024 13:09:40 +0000 (14:09 +0100)]
c++, coroutines: The frame pointer is used in the helpers [PR116482].
We have a bogus warning about the coroutine state frame pointers
being apparently unused in the resume and destroy functions. Fixed
by making the parameters DECL_ARTIFICIAL.
PR c++/116482
gcc/cp/ChangeLog:
* coroutines.cc
(coro_build_actor_or_destroy_function): Make the parameter
decls DECL_ARTIFICIAL.
Richard Biener [Mon, 26 Aug 2024 11:50:00 +0000 (13:50 +0200)]
tree-optimization/116460 - ICE with DCE in forwprop
The following avoids removing stmts with defs that might still have
uses in the IL before calling simple_dce_from_worklist which might
remove those as that will wreck debug stmt generation. Instead first
perform use-based DCE and then remove stmts which may have uses in
code that CFG cleanup will remove. This requires tracking stmts
in to_remove by their SSA def so we can check whether it was removed
before without running into the issue that PHIs can be ggc_free()d
upon removal. So this adds to_remove_defs in addition to to_remove
which has to stay to track GIMPLE_NOPs we want to elide.
PR tree-optimization/116460
* tree-ssa-forwprop.cc (pass_forwprop::execute): First do
simple_dce_from_worklist and then remove stmts in to_remove.
Track defs to be removed in to_remove_defs.
Bernd Edlinger [Mon, 26 Aug 2024 16:06:52 +0000 (18:06 +0200)]
Fix another inline7.c test failure on sparc targets
This new test was reported to be still failing on sparc targets.
Here the number of DW_AT_ranges dropped to zero.
The test should pass on this architecture with -Os, -O2 and -O3.
I also tried to improve other known problematic targets,
where only one subroutine had DW_AT_ranges:
Those are armhf (arm with hard float), powerpc and powerpc64.
The best option is to use -Os: so far it is the only one where
the two inline instances in this test had two DW_AT_ranges.
gcc/testsuite/ChangeLog:
PR other/116462
* gcc.dg/debug/dwarf2/inline7.c: Switch to -Os optimization.
Pan Li [Mon, 26 Aug 2024 07:58:52 +0000 (15:58 +0800)]
RISC-V: Support IMM for operand 1 of ussub pattern
This patch would like to allow IMM for the operand 1 of ussub pattern.
Aka .SAT_SUB(x, 22) as the below example.
Form 2:
#define DEF_SAT_U_SUB_IMM_FMT_2(T, IMM) \
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_2 (T x) \
{ \
return x >= (T)IMM ? x - (T)IMM : 0; \
}
DEF_SAT_U_SUB_IMM_FMT_2(uint64_t, 1022)
It is almost the same as supporting imm for operand 0 of the ussub pattern,
but allows the second operand to be imm instead of the first operand.
The following test suite passed for this patch:
1. The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_expand_ussub): Gen xmode for the
second operand, aka y in parameter.
* config/riscv/riscv.md (ussub<mode>3): Allow const_int for operand 2.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_sub_imm-5.c: New test.
* gcc.target/riscv/sat_u_sub_imm-5_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-5_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-6_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-7_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-8.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-5.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-6.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-7.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-8.c: New test.
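For reference, the "form 2" shape quoted above computes an unsigned saturating subtraction with an immediate second operand. A minimal sketch of that behavior follows; the template `sat_u_sub_imm` and `example_sat_sub` are hypothetical names and are not part of the testsuite helper header.

```cpp
#include <cassert>
#include <cstdint>

// Unsigned saturating subtraction of a compile-time immediate,
// written in the same shape as the form 2 macro: the result is
// clamped to 0 instead of wrapping around.
template <typename T, T Imm>
constexpr T sat_u_sub_imm(T x) {
  return x >= Imm ? static_cast<T>(x - Imm) : static_cast<T>(0);
}

void example_sat_sub() {
  assert((sat_u_sub_imm<std::uint64_t, 1022>(2000) == 978u));
  assert((sat_u_sub_imm<std::uint64_t, 1022>(1022) == 0u));
  assert((sat_u_sub_imm<std::uint64_t, 1022>(5) == 0u));  // saturates
}
```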
Nathaniel Shead [Thu, 22 Aug 2024 10:41:54 +0000 (20:41 +1000)]
c++/modules: Fix include translation for already-seen headers [PR99243]
After importing a header unit we learn about and setup any header
modules that we transitively depend on. However, this causes
'set_filename' to fail an assertion if we then come across this header
as an #include and attempt to translate it into a module. We still need
to do this translation so that libcpp learns that this is a header unit,
but we shouldn't error just because we've already seen it as an import.
Instead this patch merely checks and errors to handle the case of a
broken mapper implementation which supplies a different CMI path from
the one we already got.
As a drive-by fix, also make failing to find the CMI for a module be a
fatal error: any further errors in the TU are unlikely to be helpful.
PR c++/99243
gcc/cp/ChangeLog:
* module.cc (module_state::set_filename): Handle repeated calls
to 'set_filename' as long as the CMI path matches.
(maybe_translate_include): Adjust comment.
gcc/testsuite/ChangeLog:
* g++.dg/modules/map-2.C: Prune additional fatal error message.
* g++.dg/modules/inc-xlate-4_a.H: New test.
* g++.dg/modules/inc-xlate-4_b.H: New test.
* g++.dg/modules/inc-xlate-4_c.H: New test.
Nathaniel Shead [Thu, 22 Aug 2024 11:04:11 +0000 (21:04 +1000)]
c++/modules: Clean up include translation [PR110980]
Currently the handling of include translation is confusing to read,
using a tri-state integer without much clarity on what different states
mean. This patch cleans this up to use explicit enumerators indicating
the different possible states instead, and fixes a bug where the option
'-flang-info-include-translate' ended up being accidentally unusable.
PR c++/110980
gcc/cp/ChangeLog:
* module.cc (maybe_translate_include): Clean up.
gcc/testsuite/ChangeLog:
* g++.dg/modules/inc-xlate-2_a.H: New test.
* g++.dg/modules/inc-xlate-2_b.H: New test.
* g++.dg/modules/inc-xlate-3.h: New test.
* g++.dg/modules/inc-xlate-3_a.H: New test.
combine.cc (make_more_copies): Copy attributes from the original pseudo, PR115883
The first of the late-combine passes propagates some of the copies
made during the (in-time-)combine pass in make_more_copies into the
users of the "original" pseudo registers and removes the "old"
pseudos. That effectively removes attributes such as REG_POINTER,
which matter to LRA. The quoted PR is for an ICE-manifesting bug that
was exposed by the late-combine pass and went back to hiding with this
patch until commit r15-2937-g3673b7054ec2, the fix for PR116236, when
it was actually fixed. To wit, this patch is only incidentally
related to that bug.
In other words, the REG_POINTER attribute should not be required for
LRA to work correctly. This patch merely corrects state for those
propagated register-uses to ante late-combine.
For reasons not investigated, this fixes a failing test
"FAIL: gcc.dg/guality/pr54200.c -Og -DPREVENT_OPTIMIZATION line 20 z == 3"
for x86_64-linux-gnu.
PR middle-end/115883
* combine.cc (make_more_copies): Copy attributes from the original
pseudo to the new copy.
Arsen Arsenović [Fri, 23 Aug 2024 18:19:05 +0000 (20:19 +0200)]
c++/coros: do not assume coros don't nest [PR113457]
In the testcase presented in the PR, during template expansion, a
tsubst of an operand causes a lambda coroutine to be processed, causing
it to get an initial suspend and final suspend. The code for assigning
awaitable var names (get_awaitable_var) assumed that the sequence Is ->
Is -> Fs -> Fs is impossible (i.e. that one could only 'open' one
coroutine before closing it at a time), and reset the counter used for
unique numbering each time a final suspend occurred. This assumption is
false in a few cases, usually when lambdas are involved.
Instead of storing this counter in a static-storage variable, we can
store it in coroutine_info. This struct is local to each function, so
we don't need to worry about "cross-contamination" nor resetting.
PR c++/113457
gcc/cp/ChangeLog:
* coroutines.cc (struct coroutine_info): Add integer field
awaitable_number. This is a counter used for assigning unique
names to awaitable temporaries.
(get_awaitable_var): Use awaitable_number from coroutine_info
instead of the static int awn.
gcc/testsuite/ChangeLog:
* g++.dg/coroutines/pr113457-1.C: New test.
* g++.dg/coroutines/pr113457.C: New test.
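The effect of keeping the counter per function instead of in static storage can be sketched as below. This is an illustration only: `CoroInfo`, `next_awaitable_name`, `example_nested` and the "aw_" naming scheme are hypothetical and do not reflect the names coroutines.cc generates.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the per-function coroutine state.
struct CoroInfo {
  int awaitable_number = 0;  // counter for unique awaitable names
};

// Hypothetical naming helper; the suffix scheme is illustrative only.
std::string next_awaitable_name(CoroInfo &info) {
  return "aw_" + std::to_string(info.awaitable_number++);
}

void example_nested() {
  CoroInfo outer, lambda;
  // Is (outer) -> Is (nested lambda) -> Fs (lambda) -> Fs (outer)
  assert(next_awaitable_name(outer) == "aw_0");
  assert(next_awaitable_name(lambda) == "aw_0");
  assert(next_awaitable_name(lambda) == "aw_1");
  // Finishing the lambda does not disturb the outer numbering.
  assert(next_awaitable_name(outer) == "aw_1");
}
```

Because each function owns its counter, no reset is needed when a final suspend is seen, and a nested coroutine cannot cause a number to be reused within the enclosing one.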
David Malcolm [Mon, 26 Aug 2024 16:24:22 +0000 (12:24 -0400)]
diagnostics: move output formats from diagnostic.{c,h} to their own files
In particular, move the classic text output code to a
diagnostic-text.cc (analogous to -json.cc and -sarif.cc).
No functional change intended.
gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add diagnostic-format-text.o.
* diagnostic-format-json.cc: Include "diagnostic-format.h".
* diagnostic-format-sarif.cc: Likewise.
* diagnostic-format-text.cc: New file, using material from
diagnostics.cc.
* diagnostic-global-context.cc: Include
"diagnostic-format.h".
* diagnostic-format-text.h: New file, using material from
diagnostics.h.
* diagnostic-format.h: New file, using material from
diagnostics.h.
* diagnostic.cc: Include "diagnostic-format.h" and
"diagnostic-format-text.h".
(diagnostic_text_output_format::~diagnostic_text_output_format):
Move to diagnostic-format-text.cc.
(diagnostic_text_output_format::on_report_diagnostic): Likewise.
(diagnostic_text_output_format::on_diagram): Likewise.
(diagnostic_text_output_format::print_any_cwe): Likewise.
(diagnostic_text_output_format::print_any_rules): Likewise.
(diagnostic_text_output_format::print_option_information):
Likewise.
* diagnostic.h (class diagnostic_output_format): Move to
diagnostic-format.h.
(class diagnostic_text_output_format): Move to
diagnostic-format-text.h.
(diagnostic_output_format_init): Move to
diagnostic-format.h.
(diagnostic_output_format_init_json_stderr): Likewise.
(diagnostic_output_format_init_json_file): Likewise.
(diagnostic_output_format_init_sarif_stderr): Likewise.
(diagnostic_output_format_init_sarif_file): Likewise.
(diagnostic_output_format_init_sarif_stream): Likewise.
* gcc.cc: Include "diagnostic-format.h".
* opts.cc: Include "diagnostic-format.h".
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic_group_plugin.c: Include
"diagnostic-format-text.h".
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
moving responsibility for phase 3 of formatting and printing the result
from diagnostic_context to the output format.
This simplifies diagnostic_context::report_diagnostic and allows us to
move the code that prints CWEs, rules, and option information in textual
form from diagnostic_context to diagnostic_text_output_format, where it
belongs.
No functional change intended.
gcc/ChangeLog:
* diagnostic-format-json.cc
(json_output_format::on_begin_diagnostic): Delete.
(json_output_format::on_end_diagnostic): Rename to...
(json_output_format::on_report_diagnostic): ...this and add call
to pp_output_formatted_text.
(diagnostic_output_format_init_json): Drop unnecessary calls
to disable textual printing of CWEs, rules, and options.
* diagnostic-format-sarif.cc (sarif_builder::end_diagnostic):
Rename to...
(sarif_builder::on_report_diagnostic): ...this and add call to
pp_output_formatted_text.
(sarif_output_format::on_begin_diagnostic): Delete.
(sarif_output_format::on_end_diagnostic): Rename to...
(sarif_output_format::on_report_diagnostic): ...this and update
call to m_builder accordingly.
(diagnostic_output_format_init_sarif): Drop unnecessary calls
to disable textual printing of CWEs, rules, and options.
* diagnostic.cc (diagnostic_context::print_any_cwe): Convert to...
(diagnostic_text_output_format::print_any_cwe): ...this.
(diagnostic_context::print_any_rules): Convert to...
(diagnostic_text_output_format::print_any_rules): ...this.
(diagnostic_context::print_option_information): Convert to...
(diagnostic_text_output_format::print_option_information):
...this.
(diagnostic_context::report_diagnostic): Replace calls to the
output format's on_begin_diagnostic, to pp_output_formatted_text,
printing CWE, rules, option info, and the call to the format's
on_end_diagnostic with a call to the format's
on_report_diagnostic.
(diagnostic_text_output_format::on_begin_diagnostic): Delete.
(diagnostic_text_output_format::on_end_diagnostic): Delete.
(diagnostic_text_output_format::on_report_diagnostic): New vfunc,
which effectively does the on_begin_diagnostic, the call to
pp_output_formatted_text, the calls for printing CWE, rules,
option info, and the call to the diagnostic_finalizer.
* diagnostic.h (diagnostic_output_format::on_begin_diagnostic):
Delete.
(diagnostic_output_format::on_end_diagnostic): Delete.
(diagnostic_output_format::on_report_diagnostic): New.
(diagnostic_text_output_format::on_begin_diagnostic): Delete.
(diagnostic_text_output_format::on_end_diagnostic): Delete.
(diagnostic_text_output_format::on_report_diagnostic): New.
(class diagnostic_context): Add friend class
diagnostic_text_output_format.
(diagnostic_context::get_urlifier): New accessor.
(diagnostic_context::print_any_cwe): Move decl...
(diagnostic_text_output_format::print_any_cwe): ...to here.
(diagnostic_context::print_any_rules): Move decl...
(diagnostic_text_output_format::print_any_rules): ...to here.
(diagnostic_context::print_option_information): Move decl...
(diagnostic_text_output_format::print_option_information): ...to
here.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Mon, 26 Aug 2024 16:24:22 +0000 (12:24 -0400)]
testsuite: add event IDs to multithreaded event plugin test
Add test coverage of "%@" in event messages in a multithreaded
execution path.
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-inline-events.c:
Update expected output.
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.py:
Likewise.
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-separate-events.c:
Likewise.
* gcc.dg/plugin/diagnostic_plugin_test_paths.c
(test_diagnostic_path::add_event_2): Return the id of the added
event.
(test_diagnostic_path::add_event_2_with_event_id): New.
(example_4): Add event IDs to the deadlock messages indicating
where the locks were acquired.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Mon, 26 Aug 2024 16:24:21 +0000 (12:24 -0400)]
testsuite: generalize support for Python tests for SARIF output
In r15-2354-g4d1f71d49e396c I added the ability to use Python to write
tests of SARIF output via a new "run-sarif-pytest" based
on "run-gcov-pytest", with a sarif.py support script in
testsuite/gcc.dg/sarif-output.
This followup patch:
(a) removes the limitation of such tests needing to be in
testsuite/gcc.dg/sarif-output by moving sarif.py to testsuite/lib
and adding logic to add that directory to PYTHONPATH when invoking
pytest.
(b) uses this to replace fragile regexp-based tests in
gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.c with
Python logic that verifies the structure within the generated JSON,
and to add test coverage for SARIF output relating to GCC plugins.
gcc/ChangeLog:
* diagnostic-format-sarif.cc: Add comments noting that we don't
yet capture any diagnostic_metadata::rules associated with a
diagnostic.
gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-metadata-sarif.c: New test,
based on diagnostic-test-metadata.c.
* gcc.dg/plugin/diagnostic-test-metadata-sarif.py: New script.
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.c:
Replace scan-sarif-file directives with run-sarif-pytest, to
run...
* gcc.dg/plugin/diagnostic-test-paths-multithreaded-sarif.py:
...this new test.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
diagnostic-test-metadata-sarif.c.
* gcc.dg/sarif-output/sarif.py: Move to...
* lib/sarif.py: ...here.
* lib/scansarif.exp (run-sarif-pytest): Prepend "lib" to
PYTHONPATH before running python scripts.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Simon Martin [Sun, 25 Aug 2024 19:59:31 +0000 (21:59 +0200)]
c++: Check template parameters in member class template specialization [PR115716]
We currently ICE upon the following invalid code, because we don't check
that the template parameters in a member class template specialization
are correct.
=== cut here ===
template <typename T> struct x {
template <typename U> struct y {
typedef T result2;
};
};
template<> template<typename U, typename> struct x<int>::y {
typedef double result2;
};
int main() {
x<int>::y<int>::result2 xxx2;
}
=== cut here ===
This patch fixes the PR by calling redeclare_class_template.
Bernd Edlinger [Sat, 24 Aug 2024 06:37:53 +0000 (08:37 +0200)]
Fix bootstrap errors due to enabling -gvariable-location-views
This recent change triggered various bootstrap errors, mostly on
x86 targets because line info advance address entries were output
in the wrong section table.
The switch to the wrong line table happened in dwarf2out_set_ignored_loc.
It must use the same section as the earlier called
dwarf2out_switch_text_section.
But also ft32-elf was affected, because the assembler choked on
something as simple as ".2byte .LM2-.LM1", but fortunately it is
able to use native location views, the configure test was just
not executed because the ft32 "nop" instruction was missing.
gcc/ChangeLog:
PR debug/116470
* configure.ac: Add the "nop" instruction for cpu type ft32.
* configure: Regenerate.
* dwarf2out.cc (dwarf2out_set_ignored_loc): Use the correct
line info section.
Tie together the two functions that ensure tail padding with
search_line_ssse3 via CPP_BUFFER_PADDING macro.
libcpp/ChangeLog:
* internal.h (CPP_BUFFER_PADDING): New macro; use it ...
* charset.cc (_cpp_convert_input): ...here, and ...
* files.cc (read_file_guts): ...here, and ...
* lex.cc (search_line_ssse3): ...here.
The following improves forwprop block reachability which I noticed
when debugging PR116460 and what is also noted in the comment. It
avoids processing blocks in natural loops determined unreachable,
thereby making the issue in PR116460 latent.
PR tree-optimization/116460
* tree-ssa-forwprop.cc (pass_forwprop::execute): Do not
process blocks in unreachable natural loops.
Richard Biener [Mon, 26 Aug 2024 11:21:57 +0000 (13:21 +0200)]
Delay edge removal in forwprop
SSA forwprop has switch simplification code that calls remove edge
and as side-effect releases dominator info. For a followup we want
to retain that so the following delays removing edges until the end
of the pass. As usual we have to deal with parts of the edge
vanishing due to EH/abnormal pruning so record edges as basic-block
index pairs and remove them only when they are still there.
* tree-ssa-forwprop.cc (simplify_gimple_switch_label_vec):
Delay removing edges and releasing dominator info, instead
record into edges_to_remove vector.
(simplify_gimple_switch): Pass through vector of to remove
edges.
(pass_forwprop::execute): Likewise. Remove queued edges.
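The idea of queuing edges as basic-block index pairs and removing only those that still exist can be sketched as follows. This is a toy model, not the tree-ssa-forwprop.cc code: `Edge`, `Cfg`, `remove_queued_edges` and `example_delayed_removal` are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <utility>
#include <vector>

// Hypothetical CFG: edges stored as (source, destination) block indices.
using Edge = std::pair<int, int>;
using Cfg = std::set<Edge>;

// Remove the queued edges at the end of the pass, skipping any that have
// already vanished. Returns how many edges were actually removed.
int remove_queued_edges(Cfg &cfg, const std::vector<Edge> &edges_to_remove) {
  int removed = 0;
  for (const Edge &e : edges_to_remove)
    removed += static_cast<int>(cfg.erase(e));  // erase returns 0 if absent
  return removed;
}

void example_delayed_removal() {
  Cfg cfg{{0, 1}, {0, 2}, {1, 3}, {2, 3}};
  std::vector<Edge> queue{{0, 2}, {2, 3}};
  cfg.erase(Edge{2, 3});  // e.g. pruned earlier by EH/abnormal cleanup
  assert(remove_queued_edges(cfg, queue) == 1);
  assert(cfg.size() == 2);
}
```

Recording index pairs rather than edge pointers is what makes the late removal safe here: a pair can always be looked up again, whereas a pointer to an already pruned edge would dangle.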
Pan Li [Sat, 24 Aug 2024 02:16:28 +0000 (10:16 +0800)]
Match: Add int type fits check for .SAT_ADD imm operand
This patch adds a strict check for the imm operand of .SAT_ADD
matching. Previously there was no type checking for the imm operand, which
could result in unexpected IL being caught by the .SAT_ADD pattern.
We leverage int_fits_type_p here to make sure the imm operand is
an integer that fits the result type of the .SAT_ADD. For example:
Fits uint8_t:
uint8_t a;
uint8_t sum = .SAT_ADD (a, 12);
uint8_t sum = .SAT_ADD (a, 12u);
uint8_t sum = .SAT_ADD (a, 126u);
uint8_t sum = .SAT_ADD (a, 128u);
uint8_t sum = .SAT_ADD (a, 228);
uint8_t sum = .SAT_ADD (a, 223u);
Does not fit uint8_t:
uint8_t a;
uint8_t sum = .SAT_ADD (a, -1);
uint8_t sum = .SAT_ADD (a, 256u);
uint8_t sum = .SAT_ADD (a, 257);
The following test suites passed for this patch:
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
gcc/ChangeLog:
* match.pd: Add int_fits_type_p check for .SAT_ADD imm operand.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_add_imm-11.c: Adjust test case for imm.
* gcc.target/riscv/sat_u_add_imm-12.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-15.c: Ditto.
* gcc.target/riscv/sat_u_add_imm-16.c: Ditto.
* gcc.target/riscv/sat_u_add_imm_type_check-1.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-10.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-11.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-12.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-13.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-14.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-15.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-16.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-17.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-18.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-19.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-2.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-20.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-21.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-22.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-23.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-24.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-25.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-26.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-27.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-28.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-29.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-3.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-30.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-31.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-32.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-33.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-34.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-35.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-36.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-37.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-38.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-39.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-4.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-40.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-41.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-42.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-43.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-44.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-45.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-46.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-47.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-48.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-49.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-5.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-50.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-51.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-52.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-6.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-7.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-8.c: New test.
* gcc.target/riscv/sat_u_add_imm_type_check-9.c: New test.
Andrew Pinski [Sun, 25 Aug 2024 17:10:06 +0000 (10:10 -0700)]
expand: Use the correct mode for store flags for popcount [PR116480]
When expanding popcount used for equal to 1 (or rather __builtin_stdc_has_single_bit),
the wrong mode was being used for the mode of the store flags. We were using the mode
of the argument to popcount but since popcount's return value is always int, the mode
of the expansion here should have been the mode of the return type rather than the argument.
Built and tested on aarch64-linux-gnu with no regressions.
Also bootstrapped and tested on x86_64-linux-gnu.
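As an illustration of the source pattern involved, the sketch below tests for a
single set bit with a popcount comparison. It assumes a GCC-compatible compiler
that provides __builtin_popcountll; the helper name is hypothetical. The point
to notice is that the argument is 64 bits wide while the builtin's result is a
plain int.

```cpp
// Hypothetical helper: true when exactly one bit of x is set.
// The argument is 64 bits wide, but __builtin_popcountll returns int,
// so the comparison against 1 is done on an int, not on the argument type.
constexpr bool has_single_bit_u64(unsigned long long x)
{
  return __builtin_popcountll(x) == 1;
}

static_assert(has_single_bit_u64(1ull << 40), "one bit set");
static_assert(!has_single_bit_u64(0), "no bits set");
static_assert(!has_single_bit_u64(6), "two bits set");
```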
PR middle-end/116480
gcc/ChangeLog:
* internal-fn.cc (expand_POPCOUNT): Use the correct mode
for store flags.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr116480-1.c: New test.
* gcc.dg/torture/pr116480-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Haochen Jiang [Mon, 26 Aug 2024 02:53:56 +0000 (10:53 +0800)]
i386: Add bf8 -> fp16 intrin
Since BF8 and FP16 have the same number of exponent bits, the type conversion
between them is just a cast of the fraction part. We will use a sequence
of instructions instead of new instructions to do that. For convenience,
intrins are also provided.
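A minimal sketch of why the conversion is cheap, working on raw bit patterns
only. It assumes BF8 is laid out as 1 sign, 5 exponent and 2 fraction bits and
FP16 as 1 sign, 5 exponent and 10 fraction bits; the function name is
hypothetical and this is not the instruction sequence the patch emits.

```cpp
#include <cstdint>

// Assumption: BF8 = 1 sign + 5 exponent + 2 fraction bits,
//             FP16 = 1 sign + 5 exponent + 10 fraction bits.
// Sign and exponent line up, so widening only appends eight zero
// fraction bits, which is a left shift by 8.
constexpr std::uint16_t bf8_bits_to_fp16_bits(std::uint8_t bf8)
{
  return static_cast<std::uint16_t>(static_cast<std::uint16_t>(bf8) << 8);
}

static_assert(bf8_bits_to_fp16_bits(0x3C) == 0x3C00, "1.0 maps to 1.0");
static_assert(bf8_bits_to_fp16_bits(0xC0) == 0xC000, "-2.0 maps to -2.0");
```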
Haochen Jiang [Mon, 26 Aug 2024 02:53:35 +0000 (10:53 +0800)]
i386: Refactor m512-check.h
After the introduction of AVX10, we still want to use the AVX512 helper
functions to avoid duplicating code. To reuse them, we need to refactor so
that each function definition happens under the correct ISA, which avoids ABI
warnings.
gcc/testsuite/ChangeLog:
* gcc.target/i386/m512-check.h: Wrap the function define with
correct vector size.
Pan Li [Sat, 3 Aug 2024 07:02:57 +0000 (07:02 +0000)]
RISC-V: Support IMM for operand 0 of ussub pattern
This patch allows an immediate (IMM) as operand 0 of the ussub pattern,
i.e. .SAT_SUB(1023, y), as in the example below.
Form 1:
#define DEF_SAT_U_SUB_IMM_FMT_1(T, IMM) \
T __attribute__((noinline)) \
sat_u_sub_imm##IMM##_##T##_fmt_1 (T y) \
{ \
return (T)IMM >= y ? (T)IMM - y : 0; \
}
DEF_SAT_U_SUB_IMM_FMT_1(uint64_t, 1023)
Before this patch:
sat_u_sub_imm82_uint64_t_fmt_1:
  li a5,82
  bgtu a0,a5,.L3
  sub a0,a5,a0
  ret
.L3:
  li a0,0
  ret
After this patch:
sat_u_sub_imm82_uint64_t_fmt_1:
  li a5,82
  sltu a4,a5,a0
  addi a4,a4,-1
  sub a0,a5,a0
  and a0,a4,a0
  ret
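The branchless sequence can be read as the following sketch, with each
statement annotated with the instruction it mirrors. The function name is
hypothetical and the code is only a model of the generated assembly, not code
from the patch.

```cpp
#include <cstdint>

// Model of the branchless code: compute imm - y, then mask the result
// to zero when the subtraction would have wrapped (imm < y).
constexpr std::uint64_t sat_u_sub_branchless(std::uint64_t imm, std::uint64_t y)
{
  std::uint64_t borrow = imm < y ? 1 : 0; // sltu a4,a5,a0
  std::uint64_t mask = borrow - 1;        // addi a4,a4,-1: all ones if imm >= y, else 0
  std::uint64_t diff = imm - y;           // sub  a0,a5,a0 (wraps when imm < y)
  return diff & mask;                     // and  a0,a4,a0
}

static_assert(sat_u_sub_branchless(82, 10) == 72, "no saturation");
static_assert(sat_u_sub_branchless(82, 100) == 0, "saturates to zero");
```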
The following test suite passes with this patch:
1. The full rv64gcv regression test.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_gen_unsigned_xmode_reg): Add new
func impl to gen xmode rtx reg from operand rtx.
(riscv_expand_ussub): Gen xmode reg for operand 1.
* config/riscv/riscv.md: Allow const_int for operand 1.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macro.
* gcc.target/riscv/sat_u_sub_imm-1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-1_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-1_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-2_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3_1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-3_2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-4.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-1.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-2.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-3.c: New test.
* gcc.target/riscv/sat_u_sub_imm-run-4.c: New test.
The below test is passed for this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: Add test helper macros.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-19.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-20.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-21.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-22.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-23.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-24.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-19.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-20.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-21.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-22.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-23.c: New test.
* gcc.target/riscv/rvv/autovec/unop/vec_sat_u_trunc-run-24.c: New test.
Pan Li [Sun, 25 Aug 2024 03:02:10 +0000 (11:02 +0800)]
RISC-V: Add testcases for unsigned scalar .SAT_TRUNC form 4
This patch adds test cases for the unsigned scalar quad and
oct .SAT_TRUNC form 4, i.e.:
Form 4:
#define DEF_SAT_U_TRUNC_FMT_4(NT, WT) \
NT __attribute__((noinline)) \
sat_u_trunc_##WT##_to_##NT##_fmt_4 (WT x) \
{ \
bool not_overflow = x <= (WT)(NT)(-1); \
return ((NT)x) | (NT)((NT)not_overflow - 1); \
}
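For a concrete feel of form 4, here is the same expression as a function
template with a few evaluated cases. The template name is hypothetical; the
body follows the macro above.

```cpp
#include <cstdint>

// Same expression as the macro: when x fits in NT the mask is 0 and the
// truncated value is returned; otherwise the mask is all ones (NT's maximum).
template <typename NT, typename WT>
constexpr NT sat_u_trunc_fmt_4(WT x)
{
  bool not_overflow = x <= static_cast<WT>(static_cast<NT>(-1));
  return static_cast<NT>(static_cast<NT>(x)
                         | static_cast<NT>(static_cast<NT>(not_overflow) - 1));
}

static_assert(sat_u_trunc_fmt_4<std::uint8_t, std::uint32_t>(200) == 200, "fits");
static_assert(sat_u_trunc_fmt_4<std::uint8_t, std::uint32_t>(300) == 255, "saturates");
static_assert(sat_u_trunc_fmt_4<std::uint16_t, std::uint64_t>(1ull << 40) == 65535,
              "saturates");
```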
The following test passes with this patch.
* The rv64gcv regression test.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_u_trunc-19.c: New test.
* gcc.target/riscv/sat_u_trunc-20.c: New test.
* gcc.target/riscv/sat_u_trunc-21.c: New test.
* gcc.target/riscv/sat_u_trunc-22.c: New test.
* gcc.target/riscv/sat_u_trunc-23.c: New test.
* gcc.target/riscv/sat_u_trunc-24.c: New test.
* gcc.target/riscv/sat_u_trunc-run-19.c: New test.
* gcc.target/riscv/sat_u_trunc-run-20.c: New test.
* gcc.target/riscv/sat_u_trunc-run-21.c: New test.
* gcc.target/riscv/sat_u_trunc-run-22.c: New test.
* gcc.target/riscv/sat_u_trunc-run-23.c: New test.
* gcc.target/riscv/sat_u_trunc-run-24.c: New test.
Xianmiao Qu [Sun, 25 Aug 2024 17:22:21 +0000 (11:22 -0600)]
[PATCH] Re-add calling emit_clobber in lower-subreg.cc's resolve_simple_move.
The previous patch:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=d8a6945c6ea22efa4d5e42fe1922d2b27953c8cd
aimed to eliminate redundant MOV instructions by removing the call to
emit_clobber in lower-subreg.cc's resolve_simple_move.
First, I found that another patch addresses this issue:
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=bdf2737cda53a83332db1a1a021653447b05a7e7
and even without removing the call to emit_clobber,
the instruction generation is still as expected.
Second, removing the CLOBBER expression will have side effects.
When there is no CLOBBER expression and only SUBREG assignments exist,
according to the logic of the 'df_lr_bb_local_compute' function,
the register will be added to the basic block LR IN set.
This will cause the register's lifetime to span the entire function,
resulting in increased register pressure. Taking the newly added test case
'gcc/testsuite/gcc.target/riscv/pr43644.c' as an example,
removing the CLOBBER expression causes some registers to be spilled.
gcc/:
* lower-subreg.cc (resolve_simple_move): Re-add calling emit_clobber
immediately before moving a multi-word register by parts.
gcc/testsuite/:
* gcc.target/riscv/pr43644.c: New test case.
Andi Kleen [Fri, 2 Aug 2024 03:10:27 +0000 (20:10 -0700)]
Support if conversion for switches
The gimple-if-to-switch pass converts if statements with
multiple equal checks on the same value to a switch. This breaks
vectorization which cannot handle switches.
Teach the tree-if-conv pass used by the vectorizer to handle
simple switch statements, like those created by if-to-switch earlier.
These are switches that only have a single non-default block.
They are handled similarly to COND in if conversion.
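As a source-level illustration only (the pass itself works on GIMPLE), the
sketch below shows a loop whose switch has a single non-default block, next to
a predicated form that computes the same result without control flow in the
loop body. All names are hypothetical.

```cpp
// Loop with a switch that has one non-default block shared by three labels.
constexpr int sum_with_switch(const int* a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
    switch (a[i])
      {
      case 1:
      case 3:
      case 7:
        sum += a[i]; // the single non-default block
        break;
      default:
        break;
      }
  return sum;
}

// Equivalent predicated form: the case labels become one condition and
// the block becomes a select, which is the shape a vectorizer can handle.
constexpr int sum_predicated(const int* a, int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
    {
      bool taken = a[i] == 1 || a[i] == 3 || a[i] == 7;
      sum += taken ? a[i] : 0;
    }
  return sum;
}

constexpr int switch_sample[] = {1, 2, 3, 4, 7, 9};
static_assert(sum_with_switch(switch_sample, 6) == 11, "1 + 3 + 7");
static_assert(sum_predicated(switch_sample, 6) == 11, "same result");
```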
This makes the vect-bitfield-read-1-not test fail. The test
checks for a bitfield analysis failing, but it actually
relied on the ifcvt erroring out early because the test
is using a switch. The if conversion still does not
work because the switch is not in a form that this
patch can handle, but it fails much later and the bitfield
analysis succeeds, which makes the test fail. I marked
it xfail because it doesn't seem to be testing what it wants
to test.
* gcc.dg/vect/vect-switch-ifcvt-1.c: New test.
* gcc.dg/vect/vect-switch-ifcvt-2.c: New test.
* gcc.dg/vect/vect-switch-search-line-fast.c: New test.
* gcc.dg/vect/vect-bitfield-read-1-not.c: Change to xfail.
Mark Harmstone [Sun, 11 Aug 2024 01:57:59 +0000 (02:57 +0100)]
Write CodeView information about static locals in optimized code
Write CodeView S_LDATA32 symbols for static locals in optimized code. We have
to handle these separately, as they come after the S_FRAMEPROC, plus you can't
have S_BLOCK32 symbols like you can in unoptimized code.
gcc/
* dwarf2codeview.cc (write_optimized_static_local_vars): New function.
(write_function): Call write_optimized_static_local_vars.
Mark Harmstone [Sun, 11 Aug 2024 01:48:00 +0000 (02:48 +0100)]
Write CodeView S_FRAMEPROC symbols
Write S_FRAMEPROC symbols, which aren't very useful but seem to be necessary
for Microsoft debuggers to function properly. These symbols come after S_LOCAL
symbols for optimized variables, but before S_REGISTER and S_REGREL32 for
unoptimized variables.
Mark Harmstone [Sun, 11 Aug 2024 01:23:26 +0000 (02:23 +0100)]
Write CodeView information about optimized stack variables
Outputs S_DEFRANGE_REGISTER_REL symbols for optimized local variables that are
on the stack, consisting of the stack register, the offset, and the code range
for which this applies.
Mark Harmstone [Thu, 8 Aug 2024 02:18:11 +0000 (03:18 +0100)]
Write CodeView information about enregistered optimized variables
Enable variable tracking when outputting CodeView debug information, and make
it so that we issue debug symbols for optimized variables in registers. This
consists of S_LOCAL symbols, which give the name and the type of local
variables, followed by S_DEFRANGE_REGISTER symbols for the register and the
code for which this applies.
gcc/
* dwarf2codeview.cc (enum cv_sym_type): Add S_LOCAL and
S_DEFRANGE_REGISTER.
(write_s_local): New function.
(write_defrange_register): New function.
(write_optimized_local_variable_loc): New function.
(write_optimized_local_variable): New function.
(write_optimized_function_vars): New function.
(write_function): Call write_optimized_function_vars if variable
tracking enabled.
* dwarf2out.cc (typedef var_loc_view): Move to dwarf2out.h.
(struct dw_loc_list_struct): Likewise.
* dwarf2out.h (typedef var_loc_view): Move from dwarf2out.cc.
(struct dw_loc_list_struct): Likewise.
* opts.cc (finish_options): Enable variable tracking for CodeView.
Roger Sayle [Sun, 25 Aug 2024 15:14:34 +0000 (09:14 -0600)]
i386: Update STV's gains for TImode arithmetic right shifts on AVX2.
This patch tweaks timode_scalar_chain::compute_convert_gain to better
reflect the expansion of V1TImode arithmetic right shifts by the i386
backend. The comment "see ix86_expand_v1ti_ashiftrt" appears after
"case ASHIFTRT" in compute_convert_gain, and the changes below attempt
to better match the logic used there.
The original motivating example is:
__int128 m1;
void foo()
{
m1 = (m1 << 8) >> 8;
}
which with -O2 -mavx2 we fail to convert to vector form due to the
inappropriate cost of the arithmetic right shift.
Instruction gain -16 for 7: {r103:TI=r101:TI>>0x8;clobber flags:CC;}
Total gain: -3
Chain #1 conversion is not profitable
This is reporting that the ASHIFTRT is four instructions worse using
vectors than in scalar form, which is incorrect as the AVX2 expansion
of this shift only requires three instructions (and the scalar form
requires two).
With more accurate costs in timode_scalar_chain::compute_convert_gain
we now see (with -O2 -mavx2):
Instruction gain -4 for 7: {r103:TI=r101:TI>>0x8;clobber flags:CC;}
Total gain: 9
Converting chain #1...
Jeff Law [Sun, 25 Aug 2024 13:16:50 +0000 (07:16 -0600)]
[committed] Fix assembly scan for RISC-V VLS tests
Surya's IRA patch from June slightly improves the code we generate for the
vls/calling-conventions tests on RISC-V. Specifically it removes an
unnecessary move from the instruction stream. This (of course) broke those
tests:
Jeff Law [Sun, 25 Aug 2024 13:06:45 +0000 (07:06 -0600)]
Turn off late-combine for a few risc-v specific tests
Just minor testsuite adjustments -- several of the shorten-memref tests are
slightly twiddled by the late-combine pass:
> Running /home/jlaw/test/gcc/gcc/testsuite/gcc.target/riscv/riscv.exp ...
> FAIL: gcc.target/riscv/shorten-memrefs-2.c -Os scan-assembler store1a:\n(\t?\\.[^\n]*\n)*\taddi
> XPASS: gcc.target/riscv/shorten-memrefs-3.c -Os scan-assembler-not load2a:\n.*addi[ \t]*[at][0-9],[at][0-9],[0-9]*
> FAIL: gcc.target/riscv/shorten-memrefs-5.c -Os scan-assembler store1a:\n(\t?\\.[^\n]*\n)*\taddi
> FAIL: gcc.target/riscv/shorten-memrefs-8.c -Os scan-assembler store:\n(\t?\\.[^\n]*\n)*\taddi\ta[0-7],a[0-7],1
This patch just turns off the late-combine pass for those tests. Locally I'd
adjusted all the shorten-memref patches, but a quick re-test shows that only 4
tests seem affected right now.
Anyway, pushing to the trunk to slightly clean up our test results.
Gaius Mulley [Sat, 24 Aug 2024 21:43:55 +0000 (22:43 +0100)]
modula2: Export all string to integral and fp number conversion functions
Export all string to integral and floating point number conversion functions
(atof, atoi, atol, atoll, strtod, strtof, strtold, strtol, strtoll, strtoul
and strtoull).
Iain Sandoe [Mon, 19 Aug 2024 19:50:54 +0000 (20:50 +0100)]
c++, coroutines: Look through initial_await target exprs [PR110635].
In the case that the initial awaiter returns an object, the initial await
can be a target expression and we need to look at its initializer to cast
the await_resume() to void and to wrap in a compound expression that sets
the initial_await_resume_called flag.
PR c++/110635
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::wrap_original_function_body): Look through
initial await target expressions to find the actual co_await_expr
that we need to update.
Iain Sandoe [Sun, 18 Aug 2024 21:54:50 +0000 (22:54 +0100)]
c++, coroutines: Rework handling of throwing_cleanups [PR102051].
In the fix for PR95822 (r11-7402) we set throwing_cleanup false in the top
level of the coroutine transform code. However, as the current PR shows,
that is not sufficient.
Any use of cxx_maybe_build_cleanup() can reset the flag, which causes the
check_return_expr () logic to try to add a guard variable and set it.
For the coroutine code, we need to handle the cleanups separately, since
the responsibility for them changes after the first resume point, which
we handle in the ramp exception processing.
Fix this by forcing the "throwing_cleanup" flag false right before the
processing of the return expression.
PR c++/102051
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Handle
"throwing_cleanup" here instead of ...
(cp_coroutine_transform::apply_transforms): ... here.
Iain Sandoe [Sun, 18 Aug 2024 13:54:38 +0000 (14:54 +0100)]
c++, coroutines: Fix ordering of return object conversions [PR115908].
[dcl.fct.def.coroutine]/7 says:
The expression promise.get_return_object() is used to initialize the returned
reference or prvalue result object of a call to a coroutine. The call to
get_return_object is sequenced before the call to initial_suspend and is
invoked at most once.
The issue is about when any conversions are carried out if the type of
the g_r_o call is not the same as the ramp return. Currently, we have been
doing this by materialising the g_r_o return value and passing that to
finish_return_expr() which handles the necessary conversions and checks.
As the PR shows, this does not work as expected.
In the revised version we carry out the work of the conversions when
initialising the return slot (with the same facilities that are used by
finish_return_expr()). We do this before the call that initiates the
coroutine body, satisfying the requirements for one call before initial
suspend.
The return expression becomes a trivial 'return <retval>'.
This simplifies the ramp logic considerably, since we no longer need to
keep track of the temporarily-materialised g_r_o value.
PR c++/115908
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Rework the return
value initialisation to initialise the return slot always from
get_return_object, even if that implies carrying out conversions
to do so.
We have been requiring the get_return_on_allocation_fail() call to have the
same type as the ramp. This is not intended by the standard, so relax that
to allow anything convertible to the ramp return.
PR c++/109682
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Allow for cases where
get_return_on_allocation_fail has a type convertible to the ramp
return type.
Iain Sandoe [Sat, 17 Aug 2024 14:47:58 +0000 (15:47 +0100)]
c++, coroutines: Only allow void get_return_object if the ramp is void [PR100476].
Require that the value returned by get_return_object is convertible to
the ramp return. This means that the only time we allow a void
get_return_object, is when the ramp is also a void function.
We diagnose this early to allow us to exit the ramp build if the return
values are incompatible.
PR c++/100476
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Remove special
handling of void get_return_object expressions.
Iain Sandoe [Sat, 17 Aug 2024 11:49:41 +0000 (12:49 +0100)]
c++, coroutines: Fix handling of early exceptions [PR113773].
The responsibility for destroying part of the frame content (promise,
arg copies and the frame itself) transitions from the ramp to the
body of the coroutine once we reach the await_resume () for the
initial suspend.
We added the variable that flags the transition, but failed to act on
it. This corrects that so that the ramp only tries to run DTORs for
objects when an exception occurs before the initial suspend await
resume has started.
PR c++/113773
gcc/cp/ChangeLog:
* coroutines.cc
(cp_coroutine_transform::build_ramp_function): Only cleanup the
frame state on exceptions that occur before the initial await
resume has begun.
Iain Sandoe [Fri, 16 Aug 2024 16:56:57 +0000 (17:56 +0100)]
c++, coroutines: Separate allocator work from the ramp body build.
This splits out the building of the allocation and deallocation expressions
and runs them early in the ramp build, so that we can exit if they are not
usable, before we start building the ramp body.
Likewise move checks for other required resources to the beginning of the
ramp builder.
This is preparation for work needed to update the allocation/destruction
in cases where we have excess alignment of the promise or other saved frame
state.
gcc/cp/ChangeLog:
* call.cc (build_op_delete_call_1): Renamed and added a param
to allow the caller to prioritize two argument usual deleters.
(build_op_delete_call): New.
(build_coroutine_op_delete_call): New.
* coroutines.cc (coro_get_frame_dtor): Rename...
(build_coroutine_frame_delete_expr):... to this; simplify to
use build_op_delete_call for all cases.
(build_actor_fn): Use revised frame delete function.
(build_coroutine_frame_alloc_expr): New.
(cp_coroutine_transform::complete_ramp_function): Rename...
(cp_coroutine_transform::build_ramp_function): ... to this.
Reorder code to carry out checks for prerequisites before the
codegen. Split out the allocation/delete code.
(cp_coroutine_transform::apply_transforms): Use revised name.
* coroutines.h: Rename function.
* cp-tree.h (build_coroutine_op_delete_call): New.
Iain Sandoe [Wed, 14 Aug 2024 16:18:32 +0000 (17:18 +0100)]
c++, coroutines: Separate the analysis, ramp and outlined function synthesis.
This change is preparation for fixes to the ramp and codegen to follow.
The primary motivation is that we have three activities: analysis, ramp
synthesis and outlined coroutine body synthesis. These are currently
carried out in sequence in the 'morph_fn_to_coro' code, which means that
we are nesting the synthesis of the outlined coroutine body inside the
finish_function call for the original function (which becomes the ramp).
The revised code splits the three interests so that the analysis can be
used independently by the ramp and body synthesis. This avoids some issues
seen with global state that start/finish function use and allows us to
use more of the high-level APIs in fixing bugs.
The resultant implementation is more self-contained, and has less impact
on finish_function.
gcc/cp/ChangeLog:
* coroutines.cc (struct suspend_point_info, struct param_info,
struct local_var_info, struct susp_frame_data,
struct local_vars_frame_data): Move to coroutines.h.
(build_actor_fn): Use start/finish function APIs.
(build_destroy_fn): Likewise.
(coro_build_actor_or_destroy_function): No longer mark the
actor / destroyer as DECL_COROUTINE_P.
(coro_rewrite_function_body): Use class members.
(cp_coroutine_transform::wrap_original_function_body): Likewise.
(build_ramp_function): Replace by...
(cp_coroutine_transform::complete_ramp_function): ...this.
(cp_coroutine_transform::cp_coroutine_transform): New.
(cp_coroutine_transform::~cp_coroutine_transform): New.
(morph_fn_to_coro): Replace by...
(cp_coroutine_transform::apply_transforms): ...this.
(cp_coroutine_transform::finish_transforms): New.
* cp-tree.h (morph_fn_to_coro): Remove.
* decl.cc (emit_coro_helper): Remove.
(finish_function): Revise handling of coroutine transforms.
* coroutines.h: New file.
Iain Sandoe [Sat, 10 Aug 2024 11:43:36 +0000 (12:43 +0100)]
c++, coroutines: Split the ramp build into a separate function.
This is primarily preparation to partition the functionality of the
coroutine transform into analysis, ramp generation and then (later)
synthesis of the coroutine body. The patch does fix one latent
issue in the ordering of DTORs for frame parameter copies (to ensure
that they are processed in reverse order to the copy creation).
gcc/cp/ChangeLog:
* coroutines.cc (build_actor_fn): Arrange to apply any
required parameter copy DTORs in reverse order to their
creation.
(coro_rewrite_function_body): Handle revised param uses.
(morph_fn_to_coro): Split the ramp function completion
into a separate function.
(build_ramp_function): New.
Iain Sandoe [Thu, 22 Aug 2024 07:10:14 +0000 (08:10 +0100)]
c++, coroutines: Tidy up awaiter variable checks.
When we build an await expression, we might need to materialise the awaiter
if it is a prvalue. This re-implements this using core APIs instead of local
code.
gcc/cp/ChangeLog:
* coroutines.cc (build_co_await): Simplify checks for the cases that
we need to materialise an awaiter.
Jonathan Wakely [Fri, 23 Aug 2024 20:32:14 +0000 (21:32 +0100)]
libstdc++: Update and clarify Doxygen version requirements in manual
There are lots of bugs that affect libstdc++ output from Doxygen, so
using 1.9.6 or later is recommended. Give a lower minimum, because some
distros still use 1.9.1 and that will work, albeit suboptimally.
Patrick O'Neill [Mon, 19 Aug 2024 19:19:33 +0000 (12:19 -0700)]
RISC-V: Use encoded nelts when calling repeating_sequence_p
repeating_sequence_p operates directly on the encoded pattern and does
not derive elements using the .elt() accessor. Passing in the length of
the unencoded vector can cause an out-of-bounds read of the encoded
pattern.
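The hazard can be shown with a toy model. Everything below is hypothetical and
is not GCC's rvv_builder: the struct only mimics a vector that stores a short
encoded pattern while describing a longer logical vector, and the toy check
refuses an out-of-range length instead of reading past the stored elements.

```cpp
#include <cstddef>

// Toy model: only encoded_nelts elements are stored, although the
// vector logically has full_nelts elements.
struct ToyEncodedVector
{
  const int* encoded;
  std::size_t encoded_nelts;
  std::size_t full_nelts;
};

// True if the stored elements in [0, end) repeat with period step.
// It reads the encoded storage directly, so end must not exceed
// encoded_nelts; the toy rejects larger values rather than overrunning.
constexpr bool toy_repeating_sequence_p(const ToyEncodedVector& v,
                                        std::size_t end, std::size_t step)
{
  if (end > v.encoded_nelts)
    return false;
  for (std::size_t i = 0; i + step < end; ++i)
    if (v.encoded[i] != v.encoded[i + step])
      return false;
  return true;
}

constexpr int toy_pattern[] = {1, 2, 1, 2};
constexpr ToyEncodedVector toy_vec{toy_pattern, 4, 16};

// Correct: bound the scan by the encoded length.
static_assert(toy_repeating_sequence_p(toy_vec, toy_vec.encoded_nelts, 2), "");
// Passing the logical length would index past the four stored elements.
static_assert(!toy_repeating_sequence_p(toy_vec, toy_vec.full_nelts, 2), "");
```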
gcc/ChangeLog:
* config/riscv/riscv-v.cc (rvv_builder::can_duplicate_repeating_sequence_p):
Use encoded_nelts when calling repeating_sequence_p.
(rvv_builder::is_repeating_sequence): Ditto.
(rvv_builder::repeating_sequence_use_merge_profitable_p): Ditto.
Signed-off-by: Patrick O'Neill <patrick@rivosinc.com>
Manolis Tsamis [Tue, 20 Aug 2024 07:16:29 +0000 (09:16 +0200)]
ifcvt: Do not overwrite results in noce_convert_multiple_sets [PR116372, PR116405]
Now that more operations are allowed for noce_convert_multiple_sets,
it is possible that the same register appears multiple times as target
in a basic block. After noce_convert_multiple_sets_1 is called we
potentially also emit register moves from temporaries back to the
original targets. In some cases where the target registers overlap
with the block's condition, these register moves may overwrite
intermediate variables because they're emitted after the if-converted
code. To address this issue we now iterate backwards and keep track
of seen registers when emitting these final register moves.
Manolis Tsamis [Thu, 22 Aug 2024 09:59:11 +0000 (02:59 -0700)]
ifcvt: disallow call instructions in noce_convert_multiple_sets [PR116358]
Similar to not allowing jump instructions in the generated code, we
also shouldn't allow call instructions in noce_convert_multiple_sets.
In the case of PR116358 a libcall was generated from force_operand.
Peter Bergner [Fri, 23 Aug 2024 16:45:40 +0000 (11:45 -0500)]
rs6000: Fix PTImode handling in power8 swap optimization pass [PR116415]
Our power8 swap optimization pass has some special handling for optimizing
swaps of TImode variables. The test case reported in bugzilla uses a call
to __atomic_compare_exchange, which introduces a variable of PTImode and
that does not get the same treatment as TImode leading to wrong code
generation. The simple fix is to treat PTImode identically to TImode.
2024-08-23 Peter Bergner <bergner@linux.ibm.com>
gcc/
PR target/116415
* config/rs6000/rs6000.h (TI_OR_PTI_MODE): New define.
* config/rs6000/rs6000-p8swap.cc (rs6000_analyze_swaps): Use it to
handle PTImode identically to TImode.
gcc/testsuite/
PR target/116415
* gcc.target/powerpc/pr116415.c: New test.
Jonathan Wakely [Tue, 25 Jun 2024 20:58:34 +0000 (21:58 +0100)]
libstdc++: Implement LWG 3746 for std::optional
This avoids constraint recursion in operator<=> for std::optional.
The resolution was approved in Kona 2022.
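To show what "derived from optional" detection means, here is a C++17 sketch
using overload resolution. It is not the library's implementation, which adds a
C++20 concept; the names below are hypothetical. The first overload is viable
only when the argument points to some std::optional<U> or to a class derived
from one.

```cpp
#include <optional>
#include <type_traits>
#include <utility>

// Chosen when T is std::optional<U> or is derived from one: template
// argument deduction accepts a pointer to a derived class here.
template <typename U>
std::true_type toy_derived_from_optional_test(const std::optional<U>*);
// Fallback for every other type.
std::false_type toy_derived_from_optional_test(...);

template <typename T>
inline constexpr bool toy_is_derived_from_optional_v =
  decltype(toy_derived_from_optional_test(std::declval<const T*>()))::value;

struct ToyWrapped : std::optional<int> {};

static_assert(toy_is_derived_from_optional_v<std::optional<int>>, "");
static_assert(toy_is_derived_from_optional_v<ToyWrapped>, "");
static_assert(!toy_is_derived_from_optional_v<int>, "");
```

A comparison operator template taking optional<T> and an arbitrary U can then
exclude such U, so that checking its constraints does not lead back to the
same operator.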
libstdc++-v3/ChangeLog:
* include/std/optional (__is_derived_from_optional): New
concept.
(operator<=>): Use __is_derived_from_optional.
* testsuite/20_util/optional/relops/lwg3746.cc: New test.
Jonathan Wakely [Wed, 22 May 2024 15:49:31 +0000 (16:49 +0100)]
libstdc++: Optimize __try_use_facet for const types
LWG 436 confirmed that const-qualified types are valid arguments for
Facet template parameters (but volatile-qualified types are not). Use the
fast path in std::use_facet and std::has_facet for const T as well as T.
libstdc++-v3/ChangeLog:
* include/bits/locale_classes.tcc (__try_use_facet): Also avoid
dynamic_cast for const-qualified facet types.
Using std::is_constructible in the constraints introduces a spurious
dependency on the type being destructible, which should not be required
for constructing with an allocator. The test case shows a case where the
type has a private destructor, which can be destroyed by the allocator,
but std::is_destructible and std::is_constructible are false.
Similarly, using is_nothrow_constructible in the noexcept-specifiers
for the construct members of allocator_traits and std::allocator,
__gnu_cxx::__new_allocator, and __gnu_cxx::__malloc_allocator gives the
wrong answer if the type isn't destructible.
We need a new type trait to define those correctly, so that we only
check if the placement new-expression is nothrow after using
is_constructible to check that it would be well-formed.
Instead of just fixing the overly restrictive constraint to check for
placement new, rewrite allocator_traits in terms of 'if constexpr' using
variable templates and the detection idiom.
Although we can use 'if constexpr' and variable templates in C++11 with
appropriate uses of diagnostic pragmas, we can't have constexpr
functions with multiple return statements. This means that in C++11 mode
the _S_nothrow_construct and _S_nothrow_destroy helpers used for
noexcept-specifiers still need to be overloads using enable_if. Nearly
everything else can be simplified to reduce overload resolution and
enable_if checks.
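The mismatch between is_constructible and placement new can be sketched as
follows. The class and trait names are hypothetical and this is not the
library's __is_nothrow_new_constructible; it only checks that a placement
new-expression is well-formed, which does not need an accessible destructor.

```cpp
#include <new>
#include <type_traits>
#include <utility>

// A type that can be constructed by anyone but destroyed only by itself.
class ToyPrivateDtor
{
  ~ToyPrivateDtor() = default;
public:
  ToyPrivateDtor() {}
};

// Detects whether `::new (p) T(args...)` is a valid expression.
template <typename T, typename... Args>
auto toy_new_test(int)
  -> decltype(void(::new (std::declval<void*>()) T(std::declval<Args>()...)),
              std::true_type{});
template <typename T, typename... Args>
std::false_type toy_new_test(...);

template <typename T, typename... Args>
inline constexpr bool toy_is_new_constructible_v =
  decltype(toy_new_test<T, Args...>(0))::value;

// The standard traits also require destruction to be possible...
static_assert(!std::is_destructible_v<ToyPrivateDtor>, "");
static_assert(!std::is_constructible_v<ToyPrivateDtor>, "");
// ...while the placement new-expression on its own is fine.
static_assert(toy_is_new_constructible_v<ToyPrivateDtor>, "");
```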
libstdc++-v3/ChangeLog:
PR libstdc++/108619
* include/bits/alloc_traits.h (__allocator_traits_base): Add
variable templates for detecting which allocator operations are
supported.
(allocator_traits): Use 'if constexpr' instead of dispatching to
overloads constrained with enable_if.
(allocator_traits<allocator<T>>::construct): Use _Construct if
construct_at is not supported. Use
__is_nothrow_new_constructible for noexcept-specifier.
(allocator_traits<allocator<void>>::construct): Use
__is_nothrow_new_constructible for noexcept-specifier.
* include/bits/new_allocator.h (construct): Likewise.
* include/ext/malloc_allocator.h (construct): Likewise.
* include/std/type_traits (__is_nothrow_new_constructible): New
variable template.
* testsuite/20_util/allocator/89510.cc: Adjust expected results.
* testsuite/ext/malloc_allocator/89510.cc: Likewise.
* testsuite/ext/new_allocator/89510.cc: Likewise.
* testsuite/20_util/allocator_traits/members/108619.cc: New test.
Jonathan Wakely [Wed, 31 Jul 2024 15:32:44 +0000 (16:32 +0100)]
libstdc++: Only use std::time_put in std::format for non-C locales
When testing on Solaris I noticed that std/time/year/io.cc was FAILing
because the year 1642 was being formatted as "+(" by %Ey. This turns out
to be because we defer to std::time_put for modified conversion specs,
and std::time_put uses std::strftime, and that's undefined for years
before 1970. In particular, years before 1900 mean that the tm_year
field is negative, which then causes incorrect results from strftime on
at least Solaris and AIX.
I've raised the general problem with LWG, but we can fix the FAILing
test case (and probably improve performance slightly) by ignoring the E
and O modifiers when the formatting locale is the "C" locale. The
modifiers have no effect for the C locale, so we can just treat %Ey as
%y and format it directly. This doesn't fix anything when the formatting
locale isn't the C locale, but that case is not adequately tested, so
doesn't cause any FAIL right now!
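The arithmetic behind the failure and the direct formatting can be sketched as
below. Both helpers are hypothetical, and the second assumes a non-negative
year; they only show the numbers involved, not the library code.

```cpp
// struct tm stores the year as an offset from 1900, so years before
// 1900 give a negative tm_year.
constexpr int tm_year_field(int year)
{
  return year - 1900;
}

// %y in the C locale is the last two decimal digits of the year, which
// can be computed directly without going through strftime.
constexpr int two_digit_year(int year)
{
  return year % 100;
}

static_assert(tm_year_field(1642) == -258, "negative tm_year");
static_assert(two_digit_year(1642) == 42, "formats as 42");
```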
The naïve fix would be simply:
if (__mod)
if (auto __loc = _M_locale(__ctx); __loc != locale::classic())
// ...
However when the format string doesn't use the 'L' option, _M_locale
always returns locale::classic(). In that case, we make a copy of the
classic locale (which calls the non-inline copy constructor in
the library), then make another copy of the classic locale, then compare
the two. We can avoid all that by checking for the 'L' option first,
instead of letting _M_locale do that:
if (__mod && _M_spec._M_localized)
if (auto __loc = __ctx.locale(); __loc != locale::classic())
// ...
We could optimize this further if we had a __is_classic(__loc) function
that would do the __loc == locale::classic() check without making any
copies or non-inline calls. That would require examining the locale's
_M_impl member, and probably require checking its name, because the
locale::_S_classic singleton is not exported from the library.
For _M_S the change is slightly different from the other functions,
because if we skip using std::time_put for %OS then we fall through to
the code that potentially prints fractional seconds, but the %OS format
only prints whole seconds. So we need to format whole seconds directly
when not using std::time_put, instead of falling through to the code
below.
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__formatter_chrono::_M_C_y_Y):
Ignore modifiers unless the formatting locale is not the C
locale.
(__formatter_chrono::_M_d_e): Likewise.
(__formatter_chrono::_M_H_I): Likewise.
(__formatter_chrono::_M_m): Likewise.
(__formatter_chrono::_M_M): Likewise.
(__formatter_chrono::_M_S): Likewise.
(__formatter_chrono::_M_u_w): Likewise.
(__formatter_chrono::_M_U_V_W): Likewise.
Jonathan Wakely [Tue, 16 Jul 2024 08:43:06 +0000 (09:43 +0100)]
libstdc++: Define operator== for hash table iterators [PR115939]
Currently iterators for unordered containers do not directly define
operator== and operator!= overloads. Instead they rely on the base class
defining them, which is done so that iterator and const_iterator
comparisons work using the same overloads.
However this means a derived-to-base conversion is needed to call those
operators, and PR libstdc++/115939 shows that this can be ambiguous (for
-pedantic) when another overloaded operator could be used after an
implicit conversion.
This change defines operator== and operator!= directly for
_Node_iterator and _Node_const_iterator so that no derived-to-base
conversions are needed. The new overloads just forward to the base class
ones, so the implementation is still shared and doesn't need to be
duplicated.
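The forwarding arrangement can be modelled with two toy classes. The names are
hypothetical and far simpler than _Node_iterator; the sketch only shows a
derived iterator defining its own operator== and operator!= that hand off to
the base class comparison, so calling them needs no derived-to-base conversion
of the arguments.

```cpp
// Base class holding the shared comparison logic.
struct ToyNodeIterBase
{
  const int* cur;

  friend constexpr bool
  operator==(const ToyNodeIterBase& a, const ToyNodeIterBase& b) noexcept
  { return a.cur == b.cur; }

  friend constexpr bool
  operator!=(const ToyNodeIterBase& a, const ToyNodeIterBase& b) noexcept
  { return a.cur != b.cur; }
};

// Derived iterator with exact-match overloads that forward to the base.
struct ToyNodeIter : ToyNodeIterBase
{
  constexpr explicit ToyNodeIter(const int* p) : ToyNodeIterBase{p} {}

  friend constexpr bool
  operator==(const ToyNodeIter& a, const ToyNodeIter& b) noexcept
  {
    const ToyNodeIterBase& base_a = a;
    const ToyNodeIterBase& base_b = b;
    return base_a == base_b;
  }

  friend constexpr bool
  operator!=(const ToyNodeIter& a, const ToyNodeIter& b) noexcept
  {
    const ToyNodeIterBase& base_a = a;
    const ToyNodeIterBase& base_b = b;
    return base_a != base_b;
  }
};

constexpr int toy_nodes[2] = {10, 20};
static_assert(ToyNodeIter(&toy_nodes[0]) == ToyNodeIter(&toy_nodes[0]), "");
static_assert(ToyNodeIter(&toy_nodes[0]) != ToyNodeIter(&toy_nodes[1]), "");
```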
libstdc++-v3/ChangeLog:
PR libstdc++/115939
* include/bits/hashtable_policy.h (_Node_iterator): Add
operator== and operator!=.
(_Node_const_iterator): Likewise.
* testsuite/23_containers/unordered_map/115939.cc: New test.