Andrew Pinski [Wed, 2 Oct 2024 21:21:24 +0000 (14:21 -0700)]
aarch64: Fix early ra for -fno-delete-dead-exceptions [PR116927]
Early-RA was considering throwing instructions as being dead and removing
them even if -fno-delete-dead-exceptions was in use. This fixes that oversight.
Built and tested for aarch64-linux-gnu.
PR target/116927
gcc/ChangeLog:
* config/aarch64/aarch64-early-ra.cc (early_ra::is_dead_insn): Insns
that throw are not dead with -fno-delete-dead-exceptions.
gcc/testsuite/ChangeLog:
* g++.dg/torture/pr116927-1.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
François Dumont [Thu, 9 Nov 2023 18:06:52 +0000 (19:06 +0100)]
libstdc++: [_Hashtable] Fix some implementation inconsistencies
Get rid of the different usages of the mutable keyword except in
_Prime_rehash_policy where it is preserved for ABI compatibility reasons.
Fix comment to explain that we need the computation of bucket index noexcept
to be able to rehash the container when needed.
For Standard instantiations through std::unordered_xxx containers we already
force caching of hash code when hash functor is not noexcept so it is guaranteed.
The static_assert purpose in _Hashtable on _M_bucket_index is thus limited
to usages of _Hashtable with exotic _Hashtable_traits.
libstdc++-v3/ChangeLog:
* include/bits/hashtable_policy.h (_NodeBuilder<>::_S_build): Remove
const qualification on _NodeGenerator instance.
(_ReuseOrAllocNode<>::operator()(_Args&&...)): Remove const qualification.
(_ReuseOrAllocNode<>::_M_nodes): Remove mutable.
(_Insert_base<>::_M_insert_range): Remove _NodeGetter const qualification.
(_Hash_code_base<>::_M_bucket_index(const _Hash_node_value<>&, size_t)):
Simplify noexcept declaration, we already static_assert that _RangeHash functor
is noexcept.
* include/bits/hashtable.h: Rework comments. Remove const qualifier on
_NodeGenerator& arguments.
David Malcolm [Thu, 3 Oct 2024 02:05:03 +0000 (22:05 -0400)]
diagnostics: support SARIF 2.2 output, undocumented for now [PR116301]
GCC currently supports outputting SARIF v2.1.0.
Version 2.2 of the SARIF spec is not yet official, but the draft has
already gained features we might want to use.
This patch extends the SARIF output code to accept an enum sarif_version
parameter internally, representing 2.1.0 or a prerelease of 2.2.
The patch updates the SARIF output selftests so that they are run for
all such versions.
I hope to expose this "properly" via the mechanism described
in comment #13 of PR116613. In the meantime, the patch adds a new
-fdiagnostics-format=sarif-file-2.2-prerelease
for use by the DejaGnu testsuite, deliberately left undocumented for
now.
The copy of the 2.2 draft schema in the testsuite was downloaded from
https://raw.githubusercontent.com/oasis-tcs/sarif-spec/refs/tags/2.2-prerelease-2024-08-08/sarif-2.2/schema/sarif-2-2.schema.json
The patch adds support for capturing related locations within an ICE
notification for SARIF 2.2 onwards, thus capturing "include chain"
information for SARIF-based reports of ICEs that occur within a
header; see https://github.com/oasis-tcs/sarif-spec/issues/540
The patch does *not* add support for the "scannedFile" role, leaving it
to followup work; see https://github.com/oasis-tcs/sarif-spec/issues/459
gcc/ChangeLog:
PR other/116301
* common.opt (sarif-file-2.2-prerelease): New value for
-fdiagnostics-format=.
* diagnostic-format-sarif.cc
(sarif_location_manager::sarif_location_manager): Move
initialization of m_related_locations_arr here from sarif_result's
ctor.
(sarif_location_manager::add_related_location): Implement for
base class, taking sarif_result's implementation. Add "builder"
param.
(sarif_location_manager::m_related_locations_arr): Move here from
class sarif_result.
(class sarif_result): Move m_related_locations_arr field and
add_related_location vfunc to class sarif_location_manager.
(sarif_builder::get_version): New accessor.
(sarif_builder::m_version): New field.
(sarif_invocation::add_notification_for_ice): Call
process_worklist on the notification for SARIF 2.2 and later.
(sarif_location_manager::process_worklist_item): Pass builder to
calls to add_related_location.
(sarif_result::on_nested_diagnostic): Likewise.
(sarif_result::on_diagram): Likewise.
(sarif_ice_notification::add_related_location): Add builder param.
For SARIF 2.2 and later chain up to base class impl so that
notifications get related locations.
(sarif_builder::sarif_builder): Add "version" param.
(SARIF_SCHEMA): Delete in favor of...
(sarif_version_to_url): New function.
(SARIF_VERSION): Delete in favor of...
(sarif_version_to_property): New function.
(make_top_level_object): Update to use m_version for "$schema" and
"version".
(sarif_output_format::sarif_output_format): Add "version" param.
(sarif_stream_output_format::sarif_stream_output_format):
Likewise.
(sarif_file_output_format::sarif_file_output_format): Likewise.
(diagnostic_output_format_init_sarif_stderr): Likewise.
(diagnostic_output_format_init_sarif_file): Likewise.
(diagnostic_output_format_init_sarif_stream): Likewise.
(selftest::test_sarif_diagnostic_context): Likewise.
(selftest::test_make_location_object): Likewise.
(selftest::test_simple_log): Likewise. Update schema and version
tests accordingly.
(selftest::test_simple_log_2): Add "version" param.
(selftest::test_message_with_embedded_link): Likewise.
(selftest::run_tests_per_version): New, based on the
for_each_line_table_case calls in...
(selftest::diagnostic_format_sarif_cc_tests): Add loop over sarif
versions. Replace for_each_line_table_case calls with one
call to run_tests_per_version.
* diagnostic-format-sarif.h: Include "diagnostic-format.h".
(enum class sarif_version): New.
(diagnostic_output_format_init_sarif_stderr): Move to here from
diagnostic-format.h. Add "version" param.
(diagnostic_output_format_init_sarif_file): Likewise.
(diagnostic_output_format_init_sarif_stream): Likewise.
* diagnostic-format.h: Include "diagnostic.h".
(diagnostic_output_format_init_sarif_stderr): Move from here to
diagnostic-format-sarif.h.
* diagnostic.cc: Define INCLUDE_MEMORY.
Include "diagnostic-format-sarif.h".
(diagnostic_output_format_init): Pass sarif_version::v2_1_0 to
existing SARIF options.
Add case DIAGNOSTICS_OUTPUT_FORMAT_SARIF_FILE_2_2_PRERELEASE.
* diagnostic.h (enum diagnostics_output_format): Add
DIAGNOSTICS_OUTPUT_FORMAT_SARIF_FILE_2_2_PRERELEASE.
gcc/testsuite/ChangeLog:
PR other/116301
* gcc.dg/plugin/crash-test-ice-in-header-sarif-2.1.c: New test.
* gcc.dg/plugin/crash-test-ice-in-header-sarif-2.2.c: New test.
* gcc.dg/plugin/crash-test-ice-in-header-sarif-2_1.py: Support
script for new test.
* gcc.dg/plugin/crash-test-ice-in-header-sarif-2_2.py: Likewise.
* gcc.dg/plugin/crash-test-ice-in-header.h: New header.
* gcc.dg/plugin/plugin.exp: Add the new tests.
* lib/sarif-schema-2.2-prerelease-2024-08-08.json: New schema
file.
* lib/scansarif.exp (verify-sarif-file): Add optional argument for
specifying which version of the schema to validate against,
supporting "2.1" and "2.2", defaulting to the former.
Update the test name to capture the version of the schema tested
against.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Andrew Pinski [Tue, 1 Oct 2024 18:34:00 +0000 (18:34 +0000)]
phiopt: Fix VCE moving by rewriting it into cast [PR116098]
Phiopt match_and_simplify might move a well defined VCE assign statement
from being conditional to being unconditional; that VCE might no longer
be defined. It will need a rewrite into a cast instead.
This adds the rewriting code to move_stmt for the VCE case.
This is enough to fix the issue at hand. It should also be using rewrite_to_defined_overflow
but first I need to move the check to see if a rewrite is needed into its own function
and that is causing issues (see https://gcc.gnu.org/pipermail/gcc-patches/2024-September/663938.html).
Plus this version is easiest to backport.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/116098
gcc/ChangeLog:
* tree-ssa-phiopt.cc (move_stmt): Rewrite VCEs from integer to integer
types to cast.
gcc/testsuite/ChangeLog:
* c-c++-common/torture/pr116098-2.c: New test.
* g++.dg/torture/pr116098-1.C: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
testsuite/52641 - Make gcc.dg/strict-flex-array-3.c work on int != 32 bits.
PR testsuite/52641
gcc/testsuite/
* gcc.dg/strict-flex-array-3.c (expect) [AVR]: Use custom
version due to AVR-LibC limitations.
(stuff): Use __SIZEOF_INT__ instead of hard-coded values.
middle-end: Fix ifcvt predicate generation for masked function calls
Up until now, due to a latent bug in the code for the ifcvt pass,
irrespective of the branch taken in a conditional statement, the
original condition for the if statement was used in masking the
function call.
This patch ensures that the correct predicate mask generation is
carried out such that, upon autovectorization, the correct vector
lanes are selected in the vectorized function call.
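A scalar sketch of the idea, not the code in tree-if-conv.cc: each masked call must use the predicate of its own branch, so the call in the else branch is masked with the negated condition. The names on_true, on_false, masked_call and if_converted are hypothetical and exist only for this illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins for the functions called in each branch.
constexpr int on_true(int x)  { return x + 100; }
constexpr int on_false(int x) { return -x; }

// Emulates a masked vector call: only lanes whose mask bit is set are written.
template <std::size_t N, typename F>
constexpr void masked_call(std::array<int, N>& out,
                           const std::array<int, N>& in,
                           const std::array<bool, N>& mask, F f)
{
  for (std::size_t i = 0; i < N; ++i)
    if (mask[i])
      out[i] = f(in[i]);
}

// If-converted form of:
//   for (i...) if (cond[i]) out[i] = on_true(in[i]); else out[i] = on_false(in[i]);
template <std::size_t N>
constexpr std::array<int, N> if_converted(const std::array<int, N>& in,
                                          const std::array<bool, N>& cond)
{
  std::array<int, N> out{};
  std::array<bool, N> not_cond{};
  for (std::size_t i = 0; i < N; ++i)
    not_cond[i] = !cond[i];
  masked_call(out, in, cond, on_true);       // then-branch: mask is cond
  masked_call(out, in, not_cond, on_false);  // else-branch: mask is !cond, not cond
  return out;
}

constexpr std::array<int, 4> kIn{1, 2, 3, 4};
constexpr std::array<bool, 4> kCond{true, false, true, false};
constexpr auto kOut = if_converted(kIn, kCond);
static_assert(kOut[0] == 101 && kOut[1] == -2, "each lane takes its own branch");
```

Passing cond to both masked calls would leave the else lanes with values from the wrong branch, which is the kind of lane mix-up the message describes.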
gcc/ChangeLog:
* tree-if-conv.cc (predicate_statements): Fix handling of
predicated function calls.
Andre Vieira [Wed, 2 Oct 2024 14:14:40 +0000 (15:14 +0100)]
arm: Prevent ICE when doloop dec_set is not PLUS expr
This patch refactors and fixes an issue where arm_mve_dlstp_check_dec_counter
was making an assumption about the form of what a candidate for a dec_insn
should be, which caused an ICE.
This dec_insn is the instruction that decreases the loop counter inside a
decrementing loop and we expect it to have the following form:
(set (reg CONDCOUNT)
(plus (reg CONDCOUNT)
(const_int)))
Where CONDCOUNT is the loop counter, and const_int is the negative constant
used to decrement it.
This patch also improves our search for a valid dec_insn. Before this patch
we'd only look for a dec_insn inside the loop header if the loop latch was
empty. We now also search the loop header if the loop latch is not empty but
the last instruction is not a valid dec_insn. This could potentially be improved
to search all instructions inside the loop latch.
gcc/ChangeLog:
* config/arm/arm.cc (check_dec_insn): New helper function containing
code hoisted from...
(arm_mve_dlstp_check_dec_counter): ... here. Use check_dec_insn to
check the validity of the candidate dec_insn.
Simon Martin [Wed, 2 Oct 2024 13:32:37 +0000 (15:32 +0200)]
c++: Fix regression introduced by r15-3796 [PR116722]
Jason pointed out that the fix I made for PR116722 via r15-3796
introduces a regression when running constexpr-dynamic10.C with
-fimplicit-constexpr.
The problem is that my change makes us leave cxx_eval_call_expression
early, and bypass the call to cxx_eval_thunk_call (through a recursive
call to cxx_eval_call_expression) that used to emit an error for that
testcase with -fimplicit-constexpr.
This patch emits the error if !ctx->quiet before bailing out because the
{con,de}structor belongs to a class with virtual bases.
PR c++/116722
gcc/cp/ChangeLog:
* constexpr.cc (cxx_bind_parameters_in_call): When !ctx->quiet,
emit error before bailing out due to a call to {con,de}structor
for a class with virtual bases.
when inserting code to determine if var is power of two. If the target
doesn't support expanding the builtin as special instructions switch
conversion relies on this whole pattern being expanded as bitmagic.
However, it is possible that other GIMPLE optimizations move the two
statements of the pattern apart. In that case the builtin becomes a
libgcc call in the final binary. The call is slow and in case of
freestanding programs can result in linking error (this bug was
originally found while compiling Linux kernel).
This patch modifies switch conversion to insert the bitmagic
(var ^ (var - 1)) > (var - 1) instead of the builtin.
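A minimal sketch of that expression, assuming an unsigned 32-bit operand (the commit message does not state the type here). It is compared against a bit-counting reference; is_pow2, is_pow2_reference and agree_up_to are names made up for this illustration.

```cpp
#include <cstdint>

// The bit trick from the message: true exactly when var has a single bit set.
// For var == 0, var - 1 wraps to all ones and the comparison is false.
constexpr bool is_pow2(std::uint32_t var)
{
  return (var ^ (var - 1)) > (var - 1);
}

// Reference: count the set bits one at a time.
constexpr bool is_pow2_reference(std::uint32_t var)
{
  int bits = 0;
  for (; var != 0; var >>= 1)
    bits += var & 1;
  return bits == 1;
}

// Checks that both definitions agree on every value in [0, limit].
constexpr bool agree_up_to(std::uint32_t limit)
{
  for (std::uint32_t v = 0; v <= limit; ++v)
    if (is_pow2(v) != is_pow2_reference(v))
      return false;
  return true;
}

static_assert(is_pow2(1u) && is_pow2(8u), "powers of two are accepted");
static_assert(!is_pow2(0u) && !is_pow2(6u), "zero and non-powers are rejected");
static_assert(agree_up_to(4096), "matches the bit-counting reference");
```

Because the expression uses only xor, subtraction and a comparison, it needs no popcount support and cannot turn into a library call.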
gcc/ChangeLog:
PR tree-optimization/116616
* tree-switch-conversion.cc (can_pow2p): Remove this function.
(gen_pow2p): Generate bitmagic instead of a builtin. Remove the
TYPE parameter.
(switch_conversion::is_exp_index_transform_viable): Don't call
can_pow2p.
(switch_conversion::exp_index_transform): Call gen_pow2p without
the TYPE parameter.
* tree-switch-conversion.h: Remove
m_exp_index_transform_pow2p_type.
gcc/testsuite/ChangeLog:
PR tree-optimization/116616
* gcc.target/i386/switch-exp-transform-1.c: Don't test for
presence of the POPCOUNT internal fn call.
Richard Biener [Wed, 2 Oct 2024 07:39:50 +0000 (09:39 +0200)]
Speedup iterative_hash_template_arg
Using iterative_hash_object is expensive compared to using
iterative_hash_hashval_t which is fit for integer sized values.
The following reduces the number of perf cycles spent in
iterative_hash_template_arg and iterative_hash combined by 20%.
gcc/cp/
* pt.cc (iterative_hash_template_arg): Avoid using
iterative_hash_object.
Richard Biener [Wed, 2 Oct 2024 11:40:59 +0000 (13:40 +0200)]
Adjust gcc.dg/vect/slp-12a.c
We can now SLP the loop. There's PR116583 tracking that this still
fails for VLA vectors when load-lanes doesn't support a group of
size 8. We can't express this right now so the testcase keeps
FAILing for aarch64 with SVE (but passes now for riscv).
Richard Biener [Wed, 2 Oct 2024 11:39:14 +0000 (13:39 +0200)]
Adjust expectation for gcc.dg/vect/slp-19c.c
We can now vectorize the first loop with SLP when using V2SImode
vectors since then we can handle the non-power-of-two interleaving.
We can also SLP the second loop reliably now after adding induction
support for VLA vectors.
The testcase in PR114855 shows profile prediction to evaluate
the same SSA def via expr_expected_value for each condition or
switch in a function. The following patch caches the expected
value (and probability/predictor) for each visited SSA def,
also protecting against recursion and thus obsoleting the visited
bitmap. This reduces the time spent in branch prediction from
1.2s to 0.3s, though callgrind which was how I noticed this
seems to be comparatively very much happier about the change than
this number suggests.
PR tree-optimization/114855
* predict.cc (ssa_expected_value): New global.
(expr_expected_value): Do not take bitmap.
(expr_expected_value_1): Likewise. Use ssa_expected_value
to cache results for a SSA def.
(tree_predict_by_opcode): Adjust.
(tree_estimate_probability): Manage ssa_expected_value.
(tree_guess_outgoing_edge_probabilities): Likewise.
Jonathan Wakely [Tue, 24 Sep 2024 22:20:56 +0000 (23:20 +0100)]
libstdc++: Populate std::time_get::get's %c format for C locale
We were using the empty string "" for D_T_FMT and ERA_D_T_FMT in the C
locale, instead of "%a %b %e %T %Y" as the C standard requires. Set it
correctly for each locale implementation that defines time_members.cc.
We can also explicitly set the _M_era_xxx pointers to the same values as
the corresponding _M_xxx ones, rather than setting them to point to
identical string literals. This doesn't rely on the compiler merging
string literals, and makes it more explicit that they're the same in the
C locale.
libstdc++-v3/ChangeLog:
* config/locale/dragonfly/time_members.cc
(__timepunct<char>::_M_initialize_timepunc)
(__timepunct<wchar_t>::_M_initialize_timepunc): Set
_M_date_time_format for C locale. Set %Ex formats to the same
values as the %x formats.
* config/locale/generic/time_members.cc: Likewise.
* config/locale/gnu/time_members.cc: Likewise.
* testsuite/22_locale/time_get/get/char/5.cc: New test.
* testsuite/22_locale/time_get/get/wchar_t/5.cc: New test.
Jonathan Wakely [Fri, 6 Sep 2024 20:41:47 +0000 (21:41 +0100)]
libstdc++: Fix rounding in chrono::parse
I noticed that chrono::parse was using duration_cast and time_point_cast
to convert the parsed value to the result. Those functions truncate
towards zero, which is not generally what you want. Especially for
negative times before the epoch, where truncating towards zero rounds
"up" towards the next duration/time_point. Using chrono::round is
typically better, as that rounds to nearest.
However, while testing the fix I realised that rounding to the nearest
can give surprising results in some cases. For example if we parse a
chrono::sys_days using chrono::parse("%F %T", "2024-09-22 18:34:56", tp)
then we will round up to the next day, i.e. sys_days(2024y/09/23). That
seems surprising, and I think 2024-09-22 is what most users would
expect.
This change attempts to provide a hybrid rounding heuristic where we use
chrono::round for the general case, but when the result has a period
that is one of minutes, hours, days, weeks, or years then we truncate
towards negative infinity using chrono::floor. This means that we
truncate "2024-09-22 18:34:56" to the start of the current
minute/hour/day/week/year, instead of rounding up to 2024-09-23, or to
18:35, or 19:00. For a period of months chrono::round is used, because
the months duration is defined as a twelfth of a year, which is not
actually the length of any calendar month. We don't want to truncate to
a whole number of "months" if that can actually go from e.g. 2023-03-01
to 2023-01-31, because February is shorter than chrono::months(1).
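The difference between the three conversions can be seen on plain durations. This is only a sketch using the standard C++17 functions, not the __detail::__round helper added by the patch; days_t is a local alias chosen so the example does not need C++20's chrono::days.

```cpp
#include <chrono>
#include <ratio>

// Local stand-in for chrono::days, so that C++17 is enough.
using days_t = std::chrono::duration<int, std::ratio<86400>>;

// 18:34:56 expressed as a duration since midnight.
constexpr std::chrono::seconds kEvening =
    std::chrono::hours(18) + std::chrono::minutes(34) + std::chrono::seconds(56);

constexpr int kCast  = std::chrono::duration_cast<days_t>(kEvening).count();
constexpr int kRound = std::chrono::round<days_t>(kEvening).count();
constexpr int kFloor = std::chrono::floor<days_t>(kEvening).count();

// One second before the epoch.
constexpr std::chrono::seconds kBeforeEpoch{-1};
constexpr int kCastNeg  = std::chrono::duration_cast<days_t>(kBeforeEpoch).count();
constexpr int kFloorNeg = std::chrono::floor<days_t>(kBeforeEpoch).count();

static_assert(kRound == 1, "round goes to the next day after midday");
static_assert(kCast == 0 && kFloor == 0, "cast and floor stay on the same day");
static_assert(kCastNeg == 0, "duration_cast truncates towards zero, i.e. upwards");
static_assert(kFloorNeg == -1, "floor truncates towards negative infinity");
```

For positive values duration_cast and floor agree; they only differ before the epoch, which is why the heuristic uses floor rather than duration_cast for the coarse periods.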
libstdc++-v3/ChangeLog:
* include/bits/chrono_io.h (__detail::__use_floor): New
function.
(__detail::__round): New function.
(from_stream): Use __detail::__round.
* testsuite/std/time/clock/file/io.cc: Check for expected
rounding in parse.
* testsuite/std/time/clock/gps/io.cc: Likewise.
Jonathan Wakely [Tue, 1 Oct 2024 09:43:43 +0000 (10:43 +0100)]
libstdc++: Fix -Wlong-long warning in <bits/postypes.h>
For 32-bit targets __INT64_TYPE__ expands to long long, which gives a
pedwarn for C++98 mode, causing:
FAIL: 17_intro/headers/c++1998/all_pedantic_errors.cc -std=gnu++98 (test for excess errors)
Excess errors:
.../bits/postypes.h:64: error: ISO C++ 1998 does not support 'long long' [-Wlong-long]
The following patch implements the clang -Wheader-guard warning, which warns
if a valid multiple inclusion header guard's #ifndef/#if !defined directive
is immediately (no other non-line directives nor other (non-comment)
tokens in between) followed by #define directive for some different macro,
which in get_suggestion rules is close enough to the actual header guard
macro (i.e. likely misspelling), the #define is object-like with empty
definition (I've followed what clang implements) and the macro isn't defined
later on (at least not on the final #endif at the end of a header).
In this case it emits a warning, so that
#ifndef STDIO_H
#define STDOI_H
...
#endif
or similar misspellings can be caught.
clang enables this warning by default, but I've put it into -Wall instead
as it still seems to be a style warning, nothing more severe; if a header
doesn't survive multiple inclusion because of the misspelling, users will
get different diagnostics.
2024-10-02 Jakub Jelinek <jakub@redhat.com>
PR preprocessor/96842
libcpp/
* include/cpplib.h (struct cpp_options): Add warn_header_guard member.
(enum cpp_warning_reason): Add CPP_W_HEADER_GUARD enumerator.
* internal.h (struct cpp_reader): Add mi_def_cmacro, mi_loc and
mi_def_loc members.
(_cpp_defined_macro_p): Constify type pointed by argument type.
Formatting fix.
* init.cc (cpp_create_reader): Clear
CPP_OPTION (pfile, warn_header_guard).
* directives.cc (struct if_stack): Add def_loc and mi_def_cmacro
members.
(DIRECTIVE_TABLE): Add IF_COND flag to define.
(do_define): Set ifs->mi_def_cmacro on a define immediately following
#ifndef directive for the guard. Clear pfile->mi_valid. Formatting
fix.
(do_endif): Copy over pfile->mi_def_cmacro and pfile->mi_def_loc
if ifs->mi_def_cmacro is set and pfile->mi_cmacro isn't a defined
macro.
(push_conditional): Clear mi_def_cmacro and mi_def_loc members.
* files.cc (_cpp_pop_file_buffer): Emit -Wheader-guard diagnostics.
gcc/
* doc/invoke.texi (Wheader-guard): Document.
gcc/c-family/
* c.opt (Wheader-guard): New option.
* c.opt.urls: Regenerated.
* c-ppoutput.cc (init_pp_output): Initialize also cb->get_suggestion.
gcc/testsuite/
* c-c++-common/cpp/Wheader-guard-1.c: New test.
* c-c++-common/cpp/Wheader-guard-1-1.h: New test.
* c-c++-common/cpp/Wheader-guard-1-2.h: New test.
* c-c++-common/cpp/Wheader-guard-1-3.h: New test.
* c-c++-common/cpp/Wheader-guard-1-4.h: New test.
* c-c++-common/cpp/Wheader-guard-1-5.h: New test.
* c-c++-common/cpp/Wheader-guard-1-6.h: New test.
* c-c++-common/cpp/Wheader-guard-1-7.h: New test.
* c-c++-common/cpp/Wheader-guard-1-8.h: New test.
* c-c++-common/cpp/Wheader-guard-1-9.h: New test.
* c-c++-common/cpp/Wheader-guard-1-10.h: New test.
* c-c++-common/cpp/Wheader-guard-1-11.h: New test.
* c-c++-common/cpp/Wheader-guard-1-12.h: New test.
* c-c++-common/cpp/Wheader-guard-2.c: New test.
* c-c++-common/cpp/Wheader-guard-2.h: New test.
* c-c++-common/cpp/Wheader-guard-3.c: New test.
* c-c++-common/cpp/Wheader-guard-3.h: New test.
Jakub Jelinek [Wed, 2 Oct 2024 08:14:50 +0000 (10:14 +0200)]
opts: Fix up regenerate-opt-urls dependencies
It seems that we currently require
1) enabling at least c,c++,fortran,d in --enable-languages
2) first doing make html
before one can successfully regenerate-opt-urls, otherwise without 2)
one gets
make regenerate-opt-urls
make: *** No rule to make target '/home/jakub/src/gcc/obj12x/gcc/HTML/gcc-15.0.0/gcc/Option-Index.html', needed by 'regenerate-opt-urls'. Stop.
or say if not configuring d after make html one still gets
make regenerate-opt-urls
make: *** No rule to make target '/home/jakub/src/gcc/obj12x/gcc/HTML/gcc-15.0.0/gdc/Option-Index.html', needed by 'regenerate-opt-urls'. Stop.
Now, I believe neither 1) nor 2) is really necessary.
The regenerate-opt-urls goal has dependency on 3 Option-Index.html files,
but those files don't have dependencies how to generate them.
make html has dependency on $(HTMLS_BUILD) which adds
$(build_htmldir)/gcc/index.html and lang.html among other things, where
the former actually builds not just index.html but also Option-Index.html
and tons of other files, and lang.html is filled in by configure depending
on configured languages, so sometimes will include gfortran.html and
sometimes d.html.
The following patch adds dependencies of the Option-Index.html on their
corresponding index.html files and that is all that seems to be needed,
make regenerate-opt-urls then works even without prior make html and
even if just a subset of c/c++, fortran and d is enabled.
2024-10-02 Jakub Jelinek <jakub@redhat.com>
* Makefile.in ($(OPT_URLS_HTML_DEPS)): Add dependencies of the
Option-Index.html files on the corresponding index.html files.
Don't mention the requirement that all languages that have their own
HTML manuals be enabled.
Andrew Pinski [Tue, 1 Oct 2024 21:48:19 +0000 (14:48 -0700)]
backprop: Fix deleting of a phi node [PR116922]
The problem here is remove_unused_var is called on a name that is
defined by a phi node but it deletes it like removing a normal statement.
remove_phi_node should be called rather than gsi_remove for phinodes.
Note there is a possibility of using simple_dce_from_worklist instead
but that is for another day.
gcc.target/powerpc/p9-vec-length-full-8.c was expecting all loops to
use -with-len fully masked vectorization to avoid epilogues because
the loops needed peeling for gaps. With SLP we have improved things
here and the loops using V2D[IF]mode no longer need peeling for gaps
since the target can compose those vectors from two scalars and
in turn we generate better code and not need an epilogue either
(the iteration count divides by the VF).
Richard Biener [Tue, 1 Oct 2024 13:17:18 +0000 (15:17 +0200)]
tree-optimization/116654 - missed dr_explicit_realign[_optimized] with SLP
With single-lane SLP we fail to use the power realign loads causing
some testsuite FAILs. r14-2430-g4736ddd11874fe exempted SLP of
non-grouped accesses because that could have been only splats
where the scheme isn't used anyway, but now with single-lane SLP
it can be contiguous accesses.
PR tree-optimization/116654
* tree-vect-data-refs.cc (vect_supportable_dr_alignment):
Treat non-grouped accesses like non-SLP.
The tests below passed for this patch.
* The rv64gcv full regression test.
It is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48H.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_s_sub-2-i16.c: New test.
* gcc.target/riscv/sat_s_sub-2-i32.c: New test.
* gcc.target/riscv/sat_s_sub-2-i64.c: New test.
* gcc.target/riscv/sat_s_sub-2-i8.c: New test.
* gcc.target/riscv/sat_s_sub-run-2-i16.c: New test.
* gcc.target/riscv/sat_s_sub-run-2-i32.c: New test.
* gcc.target/riscv/sat_s_sub-run-2-i64.c: New test.
* gcc.target/riscv/sat_s_sub-run-2-i8.c: New test.
Introduce two new unspecs, UNSPEC_COND_SMAX and UNSPEC_COND_SMIN,
corresponding to rtl operators smax and smin. UNSPEC_COND_SMAX is used
to generate fmaxnm instruction and UNSPEC_COND_SMIN is used to generate
fminnm instruction.
With these new unspecs, we can generate SVE2 max/min instructions using
existing generic unpredicated and predicated instruction patterns that
use optab attribute. Thus, we have removed specialised instruction
patterns for max/min instructions that were using
SVE_COND_FP_MAXMIN_PUBLIC iterator.
No new test cases as the existing test cases should be enough to test
this refactoring.
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md
(<fmaxmin><mode>3): Remove this instruction pattern.
(cond_<fmaxmin><mode>): Remove this instruction pattern.
* config/aarch64/iterators.md: New unspecs and changes to
iterators and attrs to use the new unspecs.
Thomas Koenig [Sun, 29 Sep 2024 14:52:51 +0000 (16:52 +0200)]
Implement MAXVAL and MINVAL for UNSIGNED.
gcc/fortran/ChangeLog:
* check.cc (int_or_real_or_char_or_unsigned_check_f2003): New function.
(gfc_check_minval_maxval): Use it.
* trans-intrinsic.cc (gfc_conv_intrinsic_minmaxval): Handle
initial values for UNSIGNED.
* gfortran.texi: Document MINVAL and MAXVAL for unsigned.
libgfortran/ChangeLog:
* Makefile.am: Add minval and maxval files.
* Makefile.in: Regenerated.
* gfortran.map: Add new functions.
* generated/maxval_m1.c: New file.
* generated/maxval_m16.c: New file.
* generated/maxval_m2.c: New file.
* generated/maxval_m4.c: New file.
* generated/maxval_m8.c: New file.
* generated/minval_m1.c: New file.
* generated/minval_m16.c: New file.
* generated/minval_m2.c: New file.
* generated/minval_m4.c: New file.
* generated/minval_m8.c: New file.
Eric Botcazou [Tue, 1 Oct 2024 15:54:00 +0000 (17:54 +0200)]
Fix wrong code out of NRV + RSO + inlining
The testcase is miscompiled with -O -flto because the three optimizations
NRV + RSO + inlining are applied to the same call: when the LHS of the call
is marked write-only before inlining, it will keep the mark after inlining
although it may be read in GIMPLE from that point on.
The fix is to apply the removal of the store, that would have been applied
later if the call was not inlined, right before inlining, which will prevent
the problematic references to the LHS from being generated during inlining.
gcc/
* tree-inline.cc (expand_call_inline): Remove the store to the
return slot if it is a global variable that is only written to.
gcc/testsuite/
* gnat.dg/lto28.adb: New test.
* gnat.dg/lto28_pkg1.ads: New helper.
* gnat.dg/lto28_pkg2.ads: Likewise.
* gnat.dg/lto28_pkg2.adb: Likewise.
* gnat.dg/lto28_pkg3.ads: Likewise.
From 06a370a0a2329dd4da0ffcab7c35ea7df2353baf Mon Sep 17 00:00:00 2001
From: Jim Lin <jim@andestech.com>
Date: Tue, 1 Oct 2024 14:42:56 +0800
Subject: [PATCH] RISC-V/libgcc: Fix incorrect and missing .cfi_offset for
__riscv_save_[0-3] on RV32.
libgcc/ChangeLog:
* config/riscv/save-restore.S: Fix .cfi_offset for
__riscv_save_[0-3] on RV32.
P2985R0 (C++26) introduces std::is_virtual_base_of; this is the compiler
builtin that will back up the library trait (which strictly requires
compiler support).
The name has been chosen to match LLVM/MSVC's, as per the discussion
here:
https://github.com/llvm/llvm-project/issues/98310
The actual user-facing type trait in libstdc++ will be added later.
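A hedged usage sketch of the builtin named in the ChangeLog below, __builtin_is_virtual_base_of. The class names are invented for the example, and detecting the builtin through __has_builtin is an assumption about the compiler in use; where the builtin is not reported, the checks are skipped.

```cpp
struct Base {};
struct VirtualChild : virtual Base {};  // Base is a virtual base
struct PlainChild : Base {};            // Base is an ordinary base
struct GrandChild : VirtualChild {};    // inherits the virtual base indirectly

// Assumption: the compiler reports the builtin through __has_builtin.
#ifdef __has_builtin
#  if __has_builtin(__builtin_is_virtual_base_of)
#    define HAVE_IS_VIRTUAL_BASE_OF 1
#  endif
#endif

#ifdef HAVE_IS_VIRTUAL_BASE_OF
constexpr bool kHaveBuiltin = true;
constexpr bool kChecksPass =
    __builtin_is_virtual_base_of(Base, VirtualChild)
    && !__builtin_is_virtual_base_of(Base, PlainChild)
    && __builtin_is_virtual_base_of(Base, GrandChild);
#else
constexpr bool kHaveBuiltin = false;
constexpr bool kChecksPass = true;  // nothing to check on this compiler
#endif

static_assert(!kHaveBuiltin || kChecksPass,
              "virtual bases are distinguished from ordinary bases");
```

The argument order follows __is_base_of: the candidate base comes first and the derived class second.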
gcc/cp/ChangeLog:
* constraint.cc (diagnose_trait_expr): New diagnostic.
* cp-trait.def (IS_VIRTUAL_BASE_OF): New builtin.
* cp-tree.h (enum base_access_flags): Add a new flag to be
able to request a search for a virtual base class.
* cxx-pretty-print.cc (pp_cxx_userdef_literal): Update the
list of GNU extensions to the grammar.
* search.cc (struct lookup_base_data_s): Add a field to
request searching for a virtual base class.
(dfs_lookup_base): Add the ability to look for a virtual
base class.
(lookup_base): Forward the flag to dfs_lookup_base.
* semantics.cc (trait_expr_value): Implement the builtin
by calling lookup_base with the new flag.
(finish_trait_expr): Handle the new builtin.
gcc/ChangeLog:
* doc/extend.texi: Document the new
__builtin_is_virtual_base_of builtin; amend the docs for
__is_base_of.
gcc/testsuite/ChangeLog:
* g++.dg/ext/is_virtual_base_of.C: New test.
* g++.dg/ext/is_virtual_base_of_diagnostic.C: New test.
Signed-off-by: Giuseppe D'Angelo <giuseppe.dangelo@kdab.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
We should factor out the conversion here as that will allow a simplification to
`(t_3 != 0) & (c_4 != 0)`. Unlike most other types; `a ? b : CST` will simplify
for boolean result type to either `a | b` or `a & b` so allowing this conversion
for all operations will be always profitable.
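The boolean identities behind that claim can be checked exhaustively. This is a small sketch, not code from the patch; note that with CST equal to true the OR form involves the negated condition. The function name is made up for the illustration.

```cpp
// Checks, for all four combinations of a and b, that a boolean select with a
// constant arm equals a plain bitwise operation.
constexpr bool select_matches_bitops()
{
  const bool values[] = {false, true};
  for (bool a : values)
    for (bool b : values)
      {
        if ((a ? b : false) != (a & b))   // CST == false gives a & b
          return false;
        if ((a ? b : true) != (!a | b))   // CST == true gives !a | b
          return false;
      }
  return true;
}

static_assert(select_matches_bitops(), "select with a constant arm is a bit operation");
```

Since both forms are single operations with no control flow, factoring the conversion out of the PHI leaves something that later passes can fold without keeping the branch.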
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
Note on the phi-opt-7.c testcase change, we are now able to optimize this
and remove the if due to the factoring out now so this is an improvement.
PR tree-optimization/116890
gcc/ChangeLog:
* tree-ssa-phiopt.cc (factor_out_conditional_operation): Conversions
from bool should also be considered as wanting to happen.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phi-opt-7.c: Update testcase for no ifs left.
* gcc.dg/tree-ssa/phi-opt-42.c: New test.
* gcc.dg/tree-ssa/phi-opt-43.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Gaius Mulley [Tue, 1 Oct 2024 13:26:31 +0000 (14:26 +0100)]
PR modula2/116918 -fswig correct syntax
This patch fixes the syntax for the generated swig interface file.
The % characters in fprintf require escaping.
gcc/m2/ChangeLog:
PR modula2/116918
* gm2-compiler/M2Swig.mod (AnnotateProcedure): Capitalize
the generated comment, split comment into multiple lines and
terminate the comment with ". */".
(DoCheckUnbounded): Escape the % character with %%.
(DoWriteFile): Ditto.
The ACLE defines a new scalar type, __mfp8. This is an opaque 8-bit type that
can only be used by fp8 intrinsics. Additionally, the mfloat8_t type is made
available in arm_neon.h and arm_sve.h as an alias of the same.
This implementation uses an unsigned INTEGER_TYPE, with precision 8 to
represent __mfp8. Conversions to int and other types are disabled via the
TARGET_INVALID_CONVERSION hook.
Additionally, operations that are typically available to integer types are
disabled via TARGET_INVALID_UNARY_OP and TARGET_INVALID_BINARY_OP hooks.
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_mfp8_type_node): Add node
for __mfp8 type.
(aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type.
(aarch64_init_fp8_types): New function to initialise fp8 types and
register with language backends.
* config/aarch64/aarch64.cc (aarch64_mangle_type): Add ABI mangling for
new type.
(aarch64_invalid_conversion): Add function implementing
TARGET_INVALID_CONVERSION hook that blocks conversion to and from the
__mfp8 type.
(aarch64_invalid_unary_op): Add function implementing TARGET_INVALID_UNARY_OP
hook that blocks operations on __mfp8 other than &.
(aarch64_invalid_binary_op): Extend TARGET_INVALID_BINARY_OP hook to disallow
operations on __mfp8 type.
(TARGET_INVALID_CONVERSION): Add define.
(TARGET_INVALID_UNARY_OP): Likewise.
* config/aarch64/aarch64.h (aarch64_mfp8_type_node): Add node for __mfp8
type.
(aarch64_mfp8_ptr_type_node): Add node for __mfp8 pointer type.
* config/aarch64/arm_private_fp8.h (mfloat8_t): Add typedef.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/fp8_mangling.C: New tests exercising mangling.
* g++.target/aarch64/fp8_scalar_typecheck_2.C: New tests in C++.
* gcc.target/aarch64/fp8_scalar_1.c: New tests in C.
* gcc.target/aarch64/fp8_scalar_typecheck_1.c: Likewise.
Richard Biener [Tue, 1 Oct 2024 11:35:58 +0000 (13:35 +0200)]
tree-optimization/116902 - vectorizer load hoisting breaks UID order #2
This is another case of load hoisting breaking UID order in the
preheader, this time between two hoistings. The easiest way out is
to do what we do for the main stmt - copy instead of move.
PR tree-optimization/116902
PR tree-optimization/116842
* tree-vect-stmts.cc (sort_after_uid): Remove again.
(hoist_defs_of_uses): Copy defs instead of hoisting them so
we can zero their UID.
(vectorizable_load): Separate analysis and transform call,
do transform on the stmt copy.
Richard Biener [Tue, 1 Oct 2024 08:37:16 +0000 (10:37 +0200)]
tree-optimization/116906 - unsafe PRE with never executed edges
When we're computing ANTIC for PRE we treat edges to not yet visited
blocks as having a maximum ANTIC solution to get at an optimistic
solution in the iteration. That assumes the edges visited eventually
execute. This is a wrong assumption that can lead to wrong code
(and not only non-optimality) when possibly trapping expressions
are involved as the testcases in the PR show. The following mitigates
this by pruning trapping expressions from ANTIC computed when
maximum sets are involved.
PR tree-optimization/116906
* tree-ssa-pre.cc (prune_clobbered_mems): Add clean_traps
argument.
(compute_antic_aux): Direct prune_clobbered_mems to prune
all traps when any MAX solution was involved in the ANTIC
computation.
(compute_partial_antic_aux): Adjust.
* gcc.dg/pr116906-1.c: New testcase.
* gcc.dg/pr116906-2.c: Likewise.
Jakub Jelinek [Tue, 1 Oct 2024 07:52:20 +0000 (09:52 +0200)]
range-cache: Fix ranger ICE if number of bbs increases [PR116899]
Ranger cache during initialization reserves number of basic block slots
in the m_workback vector (in an inefficient way by doing create (0)
and safe_grow_cleared (n) and truncate (0) rather than say create (n)
or reserve (n) after create). The problem is that some passes split bbs and/or
create new basic blocks and so when one is unlucky, the quick_push into that
vector can ICE.
The following patch replaces those 4 quick_push calls with safe_push.
I've also gathered some statistics from compiling 63 gcc sources (picked those
that depend on gimple-range-cache.h just because I had to rebuild them once
for the instrumentation), and that showed that in 81% of cases nothing has
been pushed into the vector at all (and note, not everything was small, there
were even cases with 10000+ basic blocks), another 18.5% of cases had just 1-4
elements in the vector at most, 0.08% had 5-8 and 19 out of 305386 cases
had at most 9-11 elements, nothing more. So, IMHO reserving the number of basic
blocks in the vector is a waste of compile time memory and with the safe_push
calls, the patch just initializes the vector to vNULL.
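The difference between the two push flavours described above can be sketched with a toy container. This is only an illustration under stated assumptions: toy_worklist and its members are hypothetical names, it is built on std::vector rather than GCC's vec, and where the real quick_push would fail an assertion (the ICE) the toy merely returns false.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a worklist vector; this is NOT GCC's vec<>.
class toy_worklist
{
public:
  // Fix the number of slots up front, as the old constructor did.
  void reserve (std::size_t n)
  {
    m_slots = n;
    m_data.reserve (n);
  }

  // Push without growing.  Returns false at the point where the real
  // quick_push would fail an assertion for lack of a reserved slot.
  bool quick_push (int bb_index)
  {
    if (m_data.size () >= m_slots)
      return false;
    m_data.push_back (bb_index);
    return true;
  }

  // Push, growing the storage on demand.
  void safe_push (int bb_index)
  {
    m_data.push_back (bb_index);
    if (m_data.size () > m_slots)
      m_slots = m_data.size ();
  }

  std::size_t length () const { return m_data.size (); }

private:
  std::vector<int> m_data;
  std::size_t m_slots = 0;
};

// Usage: two slots are reserved, then a pass "creates" a third block.
void toy_worklist_demo ()
{
  toy_worklist w;
  w.reserve (2);
  assert (w.quick_push (0));
  assert (w.quick_push (1));
  assert (!w.quick_push (2));  // no slot left: the real code would ICE
  w.safe_push (2);             // grows instead
  assert (w.length () == 3);
}
```

A worklist that never calls reserve behaves like the vNULL-initialized vector in the patch: it holds no storage until the first safe_push.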
2024-10-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/116899
* gimple-range-cache.cc (ranger_cache::ranger_cache): Set m_workback
to vNULL instead of creating it, growing and then truncating.
(ranger_cache::fill_block_cache): Use safe_push rather than quick_push
on m_workback.
(ranger_cache::range_from_dom): Likewise.
Jakub Jelinek [Tue, 1 Oct 2024 07:49:49 +0000 (09:49 +0200)]
range-cache: Fix ICE on SSA_NAME with def_stmt not yet in the IL [PR116898]
Some passes like the bitint lowering queue some statements on edges and only
commit them at the end of the pass. If they use ranger at the same time,
the ranger might see such SSA_NAMEs and ICE on those. The following patch
instead just punts on them.
2024-10-01 Jakub Jelinek <jakub@redhat.com>
PR middle-end/116898
* gimple-range-cache.cc (ranger_cache::block_range): If a SSA_NAME
with NULL def_bb isn't SSA_NAME_IS_DEFAULT_DEF, return false instead
of failing assertion. Formatting fix.
libstdc++-v3: Fix signed-overflow warning for newlib/ctype_base.h, PR116895
There are 100+ regressions when running the g++ testsuite for newlib
targets (probably excepting ARM-based ones), e.g. cris-elf, after commit
r15-3859-g63a598deb0c9fc "libstdc++: #ifdef out #pragma GCC
system_header", which effectively no longer silences warnings for
gcc-installed system headers. Some of these regressions are fixed by
r15-3928. For the remaining ones, there's in g++.log:
FAIL: g++.old-deja/g++.robertl/eb79.C -std=c++26 (test for excess errors)
Excess errors:
/gccobj/cris-elf/libstdc++-v3/include/cris-elf/bits/ctype_base.h:50:53: \
warning: overflow in conversion from 'int' to 'std::ctype_base::mask' \
{aka 'char'} changes value from '151' to '-105' [-Woverflow]
This is because the _B macro in newlib's ctype.h (from where the
"_<letter>" macros come) is bit 7, the sign-bit of 8-bit types:
#define _B 0200
Using it in an int-expression that is then truncated to 8 bits will
"change" the value to negative for a default-signed char. If this
code was created from scratch, it should have been an unsigned type,
however it's not advisable to change the type of mask as this affects
the API. The least ugly option seems to be to silence the warning by
explicit casts in the initializer, and for consistency, doing it for
all members.
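The effect of the cast can be sketched as follows. This is a minimal illustration, not the contents of ctype_base.h: toy_mask, the TOY_* constants and toy_has_bit are hypothetical names, the mask is assumed to be a signed 8-bit type, and only the 0200 value is taken from the text above (the other bit values are assumed so that their union gives the 151 seen in the warning).

```cpp
#include <cassert>

// Assumption: an 8-bit signed mask, as on targets where plain char is signed.
typedef signed char toy_mask;

// 0200 is the _B value quoted above.  The other bit values are assumed
// here so that the union reproduces the 151 from the warning.
constexpr int TOY_U = 01;
constexpr int TOY_L = 02;
constexpr int TOY_N = 04;
constexpr int TOY_P = 020;
constexpr int TOY_B = 0200;

static_assert ((TOY_P | TOY_U | TOY_L | TOY_N | TOY_B) == 151,
               "int value before narrowing");

// The explicit cast states that the narrowing is intended, which is
// what keeps -Woverflow quiet for this initializer.
constexpr toy_mask toy_print
  = static_cast<toy_mask> (TOY_P | TOY_U | TOY_L | TOY_N | TOY_B);

// True if BIT is set in M; the mask is promoted back to int for the test.
constexpr bool toy_has_bit (toy_mask m, int bit)
{
  return (m & bit) != 0;
}

void toy_mask_demo ()
{
  assert (toy_print == -105);               // 151 - 256 on two's-complement targets
  assert (toy_has_bit (toy_print, TOY_B));  // bit 7 survives the narrowing
  assert (!toy_has_bit (toy_print, 010));   // an unrelated bit stays clear
}
```

The stored value is negative, but every bit test still gives the same answer, which is why keeping the type and adding casts leaves the API unchanged.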
PR libstdc++/116895
* config/os/newlib/ctype_base.h: Avoid signed-overflow warnings by
explicitly casting initializer expressions to mask.
Marek Polacek [Wed, 18 Sep 2024 19:44:31 +0000 (15:44 -0400)]
c++: concept in default argument [PR109859]
1) We're hitting the assert in cp_parser_placeholder_type_specifier.
It says that if it turns out to be false, we should do error() instead.
Do so, then.
2) lambda-targ8.C should compile fine, though. The problem was that
local_variables_forbidden_p wasn't cleared when we're about to parse
the optional template-parameter-list for a lambda in a default argument.
PR c++/109859
gcc/cp/ChangeLog:
* parser.cc (cp_parser_lambda_declarator_opt): Temporarily clear
local_variables_forbidden_p.
(cp_parser_placeholder_type_specifier): Turn an assert into an
error.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-defarg3.C: New test.
* g++.dg/cpp2a/lambda-targ8.C: New test.
Eric Botcazou [Mon, 30 Sep 2024 19:04:18 +0000 (21:04 +0200)]
Fix internal error during inlining after ICF pass
The problem is that the ICF pass identifies two functions, one of which has
a static chain while the other does not. The fix is simply to prevent this
identification from occurring.
David Malcolm [Mon, 30 Sep 2024 15:48:30 +0000 (11:48 -0400)]
diagnostics: return text buffer from test_show_locus [PR116613]
As work towards supporting multiple diagnostic outputs (where each
output has its own pretty_printer), avoid referencing dc.m_printer
throughout the selftests of diagnostic-show-locus.cc. Instead
have test_diagnostic_context::test_show_locus return the result
buffer, hiding the specifics of which printer is in use in such
test cases.
No functional change intended.
gcc/ChangeLog:
PR other/116613
* diagnostic-show-locus.cc
(selftest::test_diagnostic_show_locus_unknown_location): Move call
to dc.test_show_locus into ASSERT_STREQ, and compare against its
result, rather than explicitly using dc.m_printer.
(selftest::test_one_liner_simple_caret): Likewise.
(selftest::test_one_liner_no_column): Likewise.
(selftest::test_one_liner_caret_and_range): Likewise.
(selftest::test_one_liner_multiple_carets_and_ranges): Likewise.
(selftest::test_one_liner_fixit_insert_before): Likewise.
(selftest::test_one_liner_fixit_insert_after): Likewise.
(selftest::test_one_liner_fixit_remove): Likewise.
(selftest::test_one_liner_fixit_replace): Likewise.
(selftest::test_one_liner_fixit_replace_non_equal_range):
Likewise.
(selftest::test_one_liner_fixit_replace_equal_secondary_range):
Likewise.
(selftest::test_one_liner_fixit_validation_adhoc_locations):
Likewise.
(selftest::test_one_liner_many_fixits_1): Likewise.
(selftest::test_one_liner_many_fixits_2): Likewise.
(selftest::test_one_liner_labels): Likewise.
(selftest::test_one_liner_simple_caret_utf8): Likewise.
(selftest::test_one_liner_multiple_carets_and_ranges_utf8):
Likewise.
(selftest::test_one_liner_fixit_insert_before_utf8): Likewise.
(selftest::test_one_liner_fixit_insert_after_utf8): Likewise.
(selftest::test_one_liner_fixit_remove_utf8): Likewise.
(selftest::test_one_liner_fixit_replace_utf8): Likewise.
(selftest::test_one_liner_fixit_replace_non_equal_range_utf8):
Likewise.
(selftest::test_one_liner_fixit_replace_equal_secondary_range_utf8):
Likewise.
(selftest::test_one_liner_fixit_validation_adhoc_locations_utf8):
Likewise.
(selftest::test_one_liner_many_fixits_1_utf8): Likewise.
(selftest::test_one_liner_many_fixits_2_utf8): Likewise.
(selftest::test_one_liner_labels_utf8): Likewise.
(selftest::test_one_liner_colorized_utf8): Likewise.
(selftest::test_add_location_if_nearby): Likewise.
(selftest::test_diagnostic_show_locus_fixit_lines): Likewise.
(selftest::test_overlapped_fixit_printing): Likewise.
(selftest::test_overlapped_fixit_printing_utf8): Likewise.
(selftest::test_overlapped_fixit_printing_2): Likewise.
(selftest::test_fixit_insert_containing_newline): Likewise.
(selftest::test_fixit_insert_containing_newline_2): Likewise.
(selftest::test_fixit_replace_containing_newline): Likewise.
(selftest::test_fixit_deletion_affecting_newline): Likewise.
(selftest::test_tab_expansion): Likewise.
(selftest::test_escaping_bytes_1): Likewise.
(selftest::test_escaping_bytes_2): Likewise.
(selftest::test_line_numbers_multiline_range): Likewise.
* selftest-diagnostic.cc
(selftest::test_diagnostic_context::test_show_locus): Return the
formatted text of m_printer.
* selftest-diagnostic.h
(selftest::test_diagnostic_context::test_show_locus): Convert
return type from void to const char *.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Mon, 30 Sep 2024 15:48:30 +0000 (11:48 -0400)]
diagnostics: require callers of diagnostic_show_locus to be explicit about the printer [PR116613]
As work towards supporting multiple diagnostic outputs (where each
output has its own pretty_printer), update diagnostic_show_locus
so that the pretty_printer must always be explicitly passed in.
No functional change intended.
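The shape of this kind of change, moving from an optional pointer with a fallback to a mandatory reference, can be sketched as below. All names (toy_printer, toy_context, show_locus_old, show_locus_new) are hypothetical stand-ins for illustration, not the GCC declarations.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins; not the GCC classes.
struct toy_printer { std::string buffer; };
struct toy_context { toy_printer m_printer; };

// Old shape: the printer is optional, and a null pointer silently
// falls back to the context's own printer.
void show_locus_old (toy_context *ctx, const char *text,
                     toy_printer *pp = nullptr)
{
  assert (ctx);
  if (!pp)
    pp = &ctx->m_printer;
  pp->buffer += text;
}

// New shape: the printer is a reference, so every caller has to say
// which one it means and there is no fallback logic left.
void show_locus_new (toy_context &, const char *text, toy_printer &pp)
{
  pp.buffer += text;
}

void toy_show_locus_demo ()
{
  toy_context dc;
  toy_printer other;
  show_locus_old (&dc, "a");        // lands in dc.m_printer implicitly
  show_locus_new (dc, "b", other);  // destination is spelled out
  assert (dc.m_printer.buffer == "a");
  assert (other.buffer == "b");
}
```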
gcc/c-family/ChangeLog:
PR other/116613
* c-format.cc (selftest::test_type_mismatch_range_labels):
Explicitly pass in dc.m_printer to diagnostic_show_locus.
gcc/ChangeLog:
PR other/116613
* diagnostic-show-locus.cc (diagnostic_context::maybe_show_locus):
Convert param "pp" from * to &. Drop logic for using the
context's m_printer when the param is null.
* diagnostic.h (diagnostic_context::maybe_show_locus): Convert
param "pp" from * to &.
(diagnostic_show_locus): Drop default "nullptr" value for pp
param. Assert that it and context are nonnull. Pass pp by
reference to maybe_show_locus.
gcc/testsuite/ChangeLog:
PR other/116613
* gcc.dg/plugin/expensive_selftests_plugin.c (test_richloc):
Explicitly pass in dc.m_printer to diagnostic_show_locus.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Mon, 30 Sep 2024 15:48:29 +0000 (11:48 -0400)]
diagnostics: isolate diagnostic_context with interface classes [PR116613]
As work towards supporting multiple diagnostic outputs (where each
output has its own pretty_printer), avoid passing around
diagnostic_context to the various printing routines, so that we
can be more explicit about which pretty_printer is in use.
Introduce a set of "policy" classes that capture the parts of
diagnostic_context that are needed, and use these rather than
diagnostic_context *. Pass around pretty_printer & rather than
taking value from context. Split out the pretty_printer-using code
from class layout into a new class layout_printer, separating the
responsibilities of determining layout when quoting source versus
actually printing the source.
No functional change intended.
gcc/analyzer/ChangeLog:
PR other/116613
* program-point.cc (function_point::print_source_line): Replace
call to diagnostic_show_locus with a call to
diagnostic_source_print_policy::print.
gcc/ChangeLog:
PR other/116613
* diagnostic-format-json.cc (json_from_expanded_location): Replace
call to diagnostic_context::converted_column with call to
diagnostic_column_policy::converted_column.
* diagnostic-format-sarif.cc
(sarif_builder::make_location_object): Replace call to
diagnostic_show_locus with call to
diagnostic_source_print_policy::print.
* diagnostic-format-text.cc (get_location_text): Replace call to
diagnostic_context::get_location_text with call to
diagnostic_column_policy::get_location_text.
(diagnostic_text_output_format::report_current_module): Replace call
to diagnostic_context::converted_column with call to
diagnostic_column_policy::converted_column.
* diagnostic-format-text.h
(diagnostic_text_output_format::diagnostic_output_format):
Initialize m_column_policy.
(diagnostic_text_output_format::get_column_policy): New.
(diagnostic_text_output_format::m_column_policy): New.
* diagnostic-path.cc (class path_print_policy): New.
(event_range::maybe_add_event): Replace diagnostic_context param
with path_print_policy.
(event_range::print): Convert "pp" from * to &. Convert first
param of start_span callback from diagnostic_context to
diagnostic_location_print_policy.
(path_summary::path_summary): Convert first param from
diagnostic_text_output_format to path_print_policy. Add
colorize param. Update for changes to
event_range::maybe_add_event.
(thread_event_printer::print_swimlane_for_event_range): Assert
that pp is non-null. Update for change to event_range::print.
(diagnostic_text_output_format::print_path): Pass
path_print_policy to path_summary's ctor.
(selftest::test_empty_path): Likewise.
(selftest::test_intraprocedural_path): Likewise.
(selftest::test_interprocedural_path_1): Likewise.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.
(selftest::test_control_flow_1): Likewise.
(selftest::test_control_flow_2): Likewise.
(selftest::test_control_flow_3): Likewise.
(selftest::assert_cfg_edge_path_streq): Likewise.
(selftest::test_control_flow_5): Likewise.
(selftest::test_control_flow_6): Likewise.
* diagnostic-show-locus.cc (colorizer::set_range): Update for
change to m_pp.
(colorizer::m_pp): Convert from * to &.
(class layout): Add friend class layout_printer and move various
decls to it.
(layout::m_pp): Drop field.
(layout::m_policy): Rename to...
(layout::m_char_policy): ...this.
(layout::m_colorizer): Move field to class layout_printer.
(layout::m_diagnostic_path_p): Drop field.
(class layout_printer): New class, by refactoring class layout.
(colorizer::colorizer): Convert "pp" param from * to &.
(colorizer::set_named_color): Update for above change.
(colorizer::begin_state): Likewise.
(colorizer::finish_state): Likewise.
(make_policy): Rename to...
(make_char_policy): ...this, and update param from
diagnostic_context to diagnostic_source_print_policy.
(layout::layout): Update param from diagnostic_context to
diagnostic_source_print_policy. Drop params "diagnostic_kind" and
"pp", moving these and other material to class layout_printer.
(layout::maybe_add_location_range): Update for renamed field.
(layout::print_gap_in_line_numbering): Convert to...
(layout_printer::print_gap_in_line_numbering): ...this.
(layout::calculate_x_offset_display): Update for renamed field.
(layout::print_source_line): Convert to...
(layout_printer::print_source_line): ...this.
(layout::print_leftmost_column): Convert to...
(layout_printer::print_leftmost_column): ...this.
(layout::start_annotation_line): Convert to...
(layout_printer::start_annotation_line): ...this.
(layout::print_annotation_line): Convert to...
(layout_printer::print_annotation_line): ...this.
(layout::print_any_labels): Convert to...
(layout_printer::print_any_labels): ...this.
(layout::print_leading_fixits): Convert to...
(layout_printer::print_leading_fixits): ...this.
(layout::print_trailing_fixits): Convert to...
(layout_printer::print_trailing_fixits): ...this.
(layout::print_newline): Convert to...
(layout_printer::print_newline): ...this.
(layout::get_state_at_point): Make const.
(layout::get_x_bound_for_row): Make const.
(layout::move_to_column): Convert to...
(layout_printer::move_to_column): ...this.
(layout::show_ruler): Convert to...
(layout_printer::show_ruler): ...this.
(layout::print_line): Convert to...
(layout_printer::print_line): ...this.
(layout::print_any_right_to_left_edge_lines): Convert to...
(layout_printer::print_any_right_to_left_edge_lines): ...this.
(layout::print_any_right_to_left_edge_lines): Likewise.
(layout_printer::layout_printer): New.
(layout::update_any_effects): Delete, moving logic to
layout_printer::print.
(gcc_rich_location::add_location_if_nearby): Update param from
diagnostic_context to diagnostic_source_print_policy. Add
overload taking a diagnostic_context.
(diagnostic_context::maybe_show_locus): Move handling of null
pretty_printer here, from layout ctor. Convert call to
diagnostic_context::show_locus to
diagnostic_source_print_policy::print.
(diagnostic_source_print_policy::diagnostic_source_print_policy):
New.
(diagnostic_context::show_locus): Convert to...
(diagnostic_source_print_policy::print): ...this. Convert pp
from * to &.
(layout_printer::print): New, based on material in
diagnostic_context::show_locus.
(selftest::make_char_policy): New.
(selftest::test_display_widths): Update for above changes.
(selftest::test_offset_impl): Likewise.
(selftest::test_layout_x_offset_display_utf8): Likewise.
(selftest::test_layout_x_offset_display_tab): Likewise.
(selftest::test_diagnostic_show_locus_unknown_location): Use
test_diagnostic_context::test_show_locus rather than
diagnostic_show_locus.
(selftest::test_one_liner_no_column): Likewise.
(selftest::test_one_liner_caret_and_range): Likewise.
(selftest::test_one_liner_multiple_carets_and_ranges): Likewise.
(selftest::test_one_liner_fixit_insert_before): Likewise.
(selftest::test_one_liner_fixit_insert_after): Likewise.
(selftest::test_one_liner_fixit_remove): Likewise.
(selftest::test_one_liner_fixit_replace): Likewise.
(selftest::test_one_liner_fixit_replace_non_equal_range):
Likewise.
(selftest::test_one_liner_fixit_replace_equal_secondary_range):
Likewise.
(selftest::test_one_liner_fixit_validation_adhoc_locations):
Likewise.
(selftest::test_one_liner_many_fixits_1): Likewise.
(selftest::test_one_liner_many_fixits_2): Likewise.
(selftest::test_one_liner_labels): Likewise.
(selftest::test_one_liner_simple_caret_utf8): Likewise.
(selftest::test_one_liner_caret_and_range_utf8): Likewise.
(selftest::test_one_liner_multiple_carets_and_ranges_utf8):
Likewise.
(selftest::test_one_liner_fixit_insert_before_utf8): Likewise.
(selftest::test_one_liner_fixit_insert_after_utf8): Likewise.
(selftest::test_one_liner_fixit_remove_utf8): Likewise.
(selftest::test_one_liner_fixit_replace_utf8): Likewise.
(selftest::test_one_liner_fixit_replace_non_equal_range_utf8):
Likewise.
(selftest::test_one_liner_fixit_replace_equal_secondary_range_utf8):
Likewise.
(selftest::test_one_liner_fixit_validation_adhoc_locations_utf8):
Likewise.
(selftest::test_one_liner_many_fixits_1_utf8): Likewise.
(selftest::test_one_liner_many_fixits_2_utf8): Likewise.
(selftest::test_one_liner_labels_utf8): Likewise.
(selftest::test_one_liner_colorized_utf8): Likewise.
(selftest::test_add_location_if_nearby): Likewise.
(selftest::test_diagnostic_show_locus_fixit_lines): Likewise.
(selftest::test_overlapped_fixit_printing): Likewise.
(selftest::test_overlapped_fixit_printing_utf8): Likewise.
(selftest::test_overlapped_fixit_printing_2): Likewise.
(selftest::test_fixit_insert_containing_newline): Likewise.
(selftest::test_fixit_insert_containing_newline_2): Likewise.
(selftest::test_fixit_replace_containing_newline): Likewise.
(selftest::test_fixit_deletion_affecting_newline): Likewise.
(selftest::test_tab_expansion): Likewise.
(selftest::test_escaping_bytes_1): Likewise.
(selftest::test_escaping_bytes_2): Likewise.
(selftest::test_line_numbers_multiline_range): Likewise.
* diagnostic.cc
(diagnostic_column_policy::diagnostic_column_policy): New.
(diagnostic_context::converted_column): Convert to...
(diagnostic_column_policy::converted_column): ...this.
(diagnostic_context::get_location_text): Convert to...
(diagnostic_column_policy::get_location_text): ...this, adding
"show_column" param.
(diagnostic_location_print_policy::diagnostic_location_print_policy):
New ctors.
(default_diagnostic_start_span_fn): Convert param from
diagnostic_context * to const diagnostic_location_print_policy &.
Add "pp" param.
(selftest::assert_location_text): Update for above changes.
(selftest::test_diagnostic_get_location_text): Rename to...
(selftest::test_get_location_text): ...this.
(selftest::c_diagnostic_cc_tests): Update for renaming.
* diagnostic.h (class diagnostic_location_print_policy): New
forward decl.
(class diagnostic_source_print_policy): New forward decl.
(diagnostic_start_span_fn): Convert first param from
diagnostic_context * to const diagnostic_location_print_policy &
and add pretty_printer * param.
(class diagnostic_column_policy): New.
(class diagnostic_location_print_policy): New.
(class diagnostic_source_print_policy): New.
(class diagnostic_context): Add friend class
diagnostic_source_print_policy.
(diagnostic_context::converted_column): Drop decl in favor of
diagnostic_column_policy::converted_column.
(diagnostic_context::get_location_text): Drop decl in favor of
diagnostic_column_policy::get_location_text.
(diagnostic_context::show_locus): Drop decl in favor of
diagnostic_source_print_policy::print.
(default_diagnostic_start_span_fn): Update for change to
diagnostic_start_span_fn.
* gcc-rich-location.h (class diagnostic_source_print_policy): New
forward decl.
(gcc_rich_location::add_location_if_nearby): Convert first param
from diagnostic_context to diagnostic_source_print_policy. Add
overload taking diagnostic_context.
* selftest-diagnostic.cc
(selftest::test_diagnostic_context::test_diagnostic_context): Turn
off colorization.
(selftest::test_diagnostic_context::start_span_cb): Update for
change to callback type.
(test_diagnostic_context::test_show_locus): New.
* selftest-diagnostic.h
(selftest::test_diagnostic_context::start_span_cb): Update for
change to callback type.
(test_diagnostic_context::test_show_locus): New decl.
gcc/fortran/ChangeLog:
PR other/116613
* error.cc (gfc_diagnostic_build_locus_prefix): Convert first
param from diagnostic_context * to
const diagnostic_location_print_policy &. Add colorize param.
Likewise for the "two expanded_locations" overload.
(gfc_diagnostic_text_starter): Update for above changes.
(gfc_diagnostic_start_span): Update for change to callback type.
gcc/testsuite/ChangeLog:
PR other/116613
* gcc.dg/plugin/diagnostic_group_plugin.c
(test_diagnostic_start_span_fn): Update for change to callback
type.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Mon, 30 Sep 2024 15:48:29 +0000 (11:48 -0400)]
diagnostics: avoid using diagnostic_context's m_printer [PR116613]
As work towards supporting multiple diagnostic outputs (where each
output has its own pretty_printer), avoid using diagnostic_context's
m_printer field. Instead, use the output format's printer. Currently
this *is* the dc's printer, but eventually it might not be.
No functional change intended.
gcc/ChangeLog:
PR other/116613
* diagnostic-format-json.cc (diagnostic_output_format_init_json):
Pass in the format. Use the format's printer when disabling
colorization. Move the call to set_output_format into here.
(diagnostic_output_format_init_json_stderr): Update for above
change.
(diagnostic_output_format_init_json_file): Likewise.
* diagnostic-format-sarif.cc
(diagnostic_output_format_init_sarif): Use the format's printer
when disabling colorization.
* diagnostic-path.cc (selftest::test_empty_path): Use the
text_output's printer.
(selftest::test_intraprocedural_path): Likewise.
(selftest::test_interprocedural_path_1): Likewise.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.
(selftest::test_control_flow_1): Likewise.
(selftest::test_control_flow_2): Likewise.
(selftest::test_control_flow_3): Likewise.
(selftest::assert_cfg_edge_path_streq): Likewise.
(selftest::test_control_flow_5): Likewise.
(selftest::test_control_flow_6): Likewise.
gcc/testsuite/ChangeLog:
PR other/116613
* gcc.dg/plugin/diagnostic_group_plugin.c
(test_output_format::on_begin_group): Use get_printer () rather
than accessing m_context.m_printer.
(test_output_format::on_end_group): Likewise.
* gcc.dg/plugin/diagnostic_plugin_xhtml_format.c
(xhtml_builder::m_printer): New field.
(xhtml_builder::xhtml_builder): Add "pp" param and use it to
initialize m_printer.
(xhtml_builder::on_report_diagnostic): Drop "context" param.
(xhtml_builder::make_element_for_diagnostic): Likewise. Use
this->m_printer rather than the context's m_printer. Pass
m_printer to call to diagnostic_show_locus.
(xhtml_builder::emit_diagram): Drop "context" param.
(xhtml_output_format::on_report_diagnostic): Drop context param
from call to m_builder.
(xhtml_output_format::on_diagram): Likewise.
(xhtml_output_format::xhtml_output_format): Pass result of
get_printer as printer for builder.
(diagnostic_output_format_init_xhtml): Use the fmt's printer
rather than the context's.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Mon, 30 Sep 2024 15:48:29 +0000 (11:48 -0400)]
diagnostics: use "%e" to avoid intermediate strings [PR116613]
Various diagnostics build an intermediate string, potentially with
colorization, and then use this in a diagnostic message.
This won't work if we have multiple diagnostic sinks, where some might
be colorized and some not.
This patch reworks such places using "%e" and pp_element subclasses, so
that any colorization happens within report_diagnostic's call to
pp_format.
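Why an intermediate string cannot serve several sinks, and how a deferred element can, is sketched below. This is a toy model under stated assumptions: toy_sink, toy_element and toy_report are hypothetical names and do not reproduce pp_element, pp_format or report_diagnostic.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical output sink with its own colorization setting.
struct toy_sink
{
  bool colorize;
  std::string text;
};

// Hypothetical element that is rendered once per sink, so the
// colorization decision is taken at formatting time.
struct toy_element
{
  std::string name;

  void add_to (toy_sink &sink) const
  {
    if (sink.colorize)
      sink.text += "\x1b[1m" + name + "\x1b[0m";
    else
      sink.text += name;
  }
};

// Format "PREFIX" followed by ELT separately for each sink.
void toy_report (std::vector<toy_sink> &sinks, const std::string &prefix,
                 const toy_element &elt)
{
  for (toy_sink &s : sinks)
    {
      s.text += prefix;
      elt.add_to (s);
    }
}

void toy_report_demo ()
{
  std::vector<toy_sink> sinks = { { true, "" }, { false, "" } };
  toy_report (sinks, "unused ", toy_element { "foo" });
  assert (sinks[0].text == "unused \x1b[1mfoo\x1b[0m");
  assert (sinks[1].text == "unused foo");
}
```

Had the caller built "foo" into a string up front, it would have had to pick one colorization setting for both sinks.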
gcc/analyzer/ChangeLog:
PR other/116613
* kf-analyzer.cc: Include "pretty-print-markup.h".
(kf_analyzer_dump_escaped::impl_call_pre): Defer colorization
choices by eliminating the construction of an intermediate string,
replacing it with a new pp_element subclass via "%e".
gcc/ChangeLog:
PR other/116613
* attribs.cc: Include "pretty-print-markup.h".
(decls_mismatched_attributes): Defer colorization choices by
replacing printing to a pretty_printer * param with appending
to a vec of strings.
(maybe_diag_alias_attributes): As above, replacing pretty_printer
with usage of pp_markup::comma_separated_quoted_strings and "%e"
in two places.
* attribs.h (decls_mismatched_attributes): Update decl.
* gimple-ssa-warn-access.cc: Include "pretty-print-markup.h".
(pass_waccess::maybe_warn_memmodel): Defer colorization choices by
replacing printing to a pretty_printer * param with use of
pp_markup::comma_separated_quoted_strings and "%e".
(pass_waccess::maybe_warn_memmodel): Likewise, replacing printing
to a temporary buffer.
* pretty-print-markup.h
(class pp_markup::comma_separated_quoted_strings): New.
* pretty-print.cc
(pp_markup::comma_separated_quoted_strings::add_to_phase_2): New.
(selftest::test_pp_printf_within_pp_element): New.
(selftest::test_comma_separated_quoted_strings): New.
(selftest::pretty_print_cc_tests): Call the new tests.
gcc/cp/ChangeLog:
PR other/116613
* pt.cc: Include "pretty-print-markup.h".
(warn_spec_missing_attributes): Defer colorization choices by
replacing printing to a pretty_printer * param with appending
to a vec of strings. Replace pretty_printer with usage of
pp_markup::comma_separated_quoted_strings and "%e".
gcc/testsuite/ChangeLog:
PR other/116613
* c-c++-common/analyzer/escaping-1.c: Update expected results to
remove type information from C++ results. Previously we were
using %qD with default_tree_printer, which used
lang_hooks.decl_printable_name, whereas now we're using %qD with
a clone of the cxx_pretty_printer.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Mon, 30 Sep 2024 15:48:28 +0000 (11:48 -0400)]
diagnostics: fix memory leak in SARIF selftests
"make selftest-valgrind" was complaining about leaks of artifact objects
in SARIF's selftest::test_make_location_object:
-fself-test: 7638695 pass(es) in 89.999249 seconds
==3306525==
==3306525== HEAP SUMMARY:
==3306525== in use at exit: 1,215,639 bytes in 2,808 blocks
==3306525== total heap usage: 2,860,898 allocs, 2,858,090 frees, 1,336,446,579 bytes allocated
==3306525==
==3306525== 11,728 (1,536 direct, 10,192 indirect) bytes in 16 blocks are definitely lost in loss record 353 of 375
==3306525== at 0x514FE7D: operator new(unsigned long) (vg_replace_malloc.c:342)
==3306525== by 0x36E5FD2: sarif_builder::get_or_create_artifact(char const*, diagnostic_artifact_role, bool) (diagnostic-format-sarif.cc:2884)
==3306525== by 0x36E3D57: sarif_builder::maybe_make_physical_location_object(unsigned int, diagnostic_artifact_role, int, content_renderer const*) (diagnostic-format-sarif.cc:2097)
==3306525== by 0x36E34CE: sarif_builder::make_location_object(sarif_location_manager&, rich_location const&, logical_location const*, diagnostic_artifact_role) (diagnostic-format-sarif.cc:1922)
==3306525== by 0x36E72C6: selftest::test_make_location_object(selftest::line_table_case const&) (diagnostic-format-sarif.cc:3500)
==3306525== by 0x375609B: selftest::for_each_line_table_case(void (*)(selftest::line_table_case const&)) (input.cc:3898)
==3306525== by 0x36E9668: selftest::diagnostic_format_sarif_cc_tests() (diagnostic-format-sarif.cc:3910)
==3306525== by 0x3592A11: selftest::run_tests() (selftest-run-tests.cc:100)
==3306525== by 0x17DBEF3: toplev::run_self_tests() (toplev.cc:2268)
==3306525== by 0x17DC2BF: toplev::main(int, char**) (toplev.cc:2376)
==3306525== by 0x36A1919: main (main.cc:39)
==3306525==
==3306525== 12,400 (1,536 direct, 10,864 indirect) bytes in 16 blocks are definitely lost in loss record 355 of 375
==3306525== at 0x514FE7D: operator new(unsigned long) (vg_replace_malloc.c:342)
==3306525== by 0x36E5FD2: sarif_builder::get_or_create_artifact(char const*, diagnostic_artifact_role, bool) (diagnostic-format-sarif.cc:2884)
==3306525== by 0x36E2323: sarif_builder::sarif_builder(diagnostic_context&, line_maps const*, char const*, bool) (diagnostic-format-sarif.cc:1500)
==3306525== by 0x36E70AA: selftest::test_make_location_object(selftest::line_table_case const&) (diagnostic-format-sarif.cc:3469)
==3306525== by 0x375609B: selftest::for_each_line_table_case(void (*)(selftest::line_table_case const&)) (input.cc:3898)
==3306525== by 0x36E9668: selftest::diagnostic_format_sarif_cc_tests() (diagnostic-format-sarif.cc:3910)
==3306525== by 0x3592A11: selftest::run_tests() (selftest-run-tests.cc:100)
==3306525== by 0x17DBEF3: toplev::run_self_tests() (toplev.cc:2268)
==3306525== by 0x17DC2BF: toplev::main(int, char**) (toplev.cc:2376)
==3306525== by 0x36A1919: main (main.cc:39)
==3306525==
==3306525== LEAK SUMMARY:
==3306525== definitely lost: 3,072 bytes in 32 blocks
==3306525== indirectly lost: 21,056 bytes in 368 blocks
==3306525== possibly lost: 0 bytes in 0 blocks
==3306525== still reachable: 1,191,511 bytes in 2,408 blocks
==3306525== suppressed: 0 bytes in 0 blocks
==3306525== Reachable blocks (those to which a pointer was found) are not shown.
==3306525== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==3306525==
==3306525== For lists of detected and suppressed errors, rerun with: -s
==3306525== ERROR SUMMARY: 2 errors from 2 contexts (suppressed: 0 from 0)
Fixed thusly.
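The ownership pattern behind this kind of fix can be sketched as follows: a destructor deletes whatever is still held, and the map is emptied once ownership has moved elsewhere. The sketch is an assumption-laden toy; toy_artifact, toy_builder and take_artifacts are hypothetical names and do not reproduce sarif_builder.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical artifact type that counts live instances so leaks show up.
struct toy_artifact
{
  explicit toy_artifact (std::string f) : filename (std::move (f)) { ++live; }
  toy_artifact (const toy_artifact &) = delete;
  toy_artifact &operator= (const toy_artifact &) = delete;
  ~toy_artifact () { --live; }

  std::string filename;
  static int live;
};
int toy_artifact::live = 0;

class toy_builder
{
public:
  toy_builder () = default;
  toy_builder (const toy_builder &) = delete;
  toy_builder &operator= (const toy_builder &) = delete;

  // Delete whatever was never handed over.
  ~toy_builder ()
  {
    for (auto &kv : m_artifacts)
      delete kv.second;
  }

  toy_artifact &get_or_create_artifact (const std::string &filename)
  {
    auto it = m_artifacts.find (filename);
    if (it != m_artifacts.end ())
      return *it->second;
    toy_artifact *a = new toy_artifact (filename);
    m_artifacts[filename] = a;
    return *a;
  }

  // Hand ownership to the caller and empty the map, so that the
  // destructor does not delete the same objects a second time.
  std::vector<toy_artifact *> take_artifacts ()
  {
    std::vector<toy_artifact *> out;
    for (auto &kv : m_artifacts)
      out.push_back (kv.second);
    m_artifacts.clear ();
    return out;
  }

private:
  std::map<std::string, toy_artifact *> m_artifacts;
};

void toy_builder_demo ()
{
  {
    toy_builder b;
    b.get_or_create_artifact ("a.c");
    b.get_or_create_artifact ("a.c");  // reuses the existing object
    b.get_or_create_artifact ("b.c");
    assert (toy_artifact::live == 2);
  }  // destructor frees both
  assert (toy_artifact::live == 0);
}
```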
gcc/ChangeLog:
* diagnostic-format-sarif.cc (sarif_builder::~sarif_builder): New,
deleting any remaining artifact objects.
(sarif_builder::make_run_object): Empty the artifact map.
* ordered-hash-map.h (ordered_hash_map::empty): New.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
autovectorizer: Test autovectorization of different dot-prod modes.
Given the novel treatment of the dot product optab as a conversion, we
are now able to target different relationships between output modes and
input modes.
This is made clearer by way of example. Previously, on AArch64, the
following loop was vectorizable:
uint32_t udot4(int n, uint8_t* data) {
uint32_t sum = 0;
for (int i=0; i<n; i+=1)
sum += data[i] * data[i];
return sum;
}
while the following was not:
uint32_t udot2(int n, uint16_t* data) {
uint32_t sum = 0;
for (int i=0; i<n; i+=1)
sum += data[i] * data[i];
return sum;
}
Under the new treatment of the dot product optab, they are both now
vectorizable.
This adds the relevant target-agnostic check to ensure this behavior
in the autovectorizer, gated behind the new check_effective_target
`vect_dotprod_hisi' as well as a runtime check targeting aarch64.
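The difference between the two loops can be pictured with a scalar model of a single 32-bit accumulator lane. This is only an illustrative sketch: the helper names `udot4_lane` and `udot2_lane` are hypothetical and are not part of the patch or of any GCC interface. The operands are widened to 32 bits before multiplying so that the 16-bit case does not overflow `int`.

```c
#include <stdint.h>

/* Four-way dot product: four 8-bit products are accumulated
   into one 32-bit lane (the relationship udot4 relies on).  */
uint32_t udot4_lane(uint32_t acc, const uint8_t a[4], const uint8_t b[4])
{
    for (int i = 0; i < 4; i++)
        acc += (uint32_t)a[i] * (uint32_t)b[i];
    return acc;
}

/* Two-way dot product: two 16-bit products are accumulated
   into one 32-bit lane (the relationship udot2 relies on).  */
uint32_t udot2_lane(uint32_t acc, const uint16_t a[2], const uint16_t b[2])
{
    for (int i = 0; i < 2; i++)
        acc += (uint32_t)a[i] * (uint32_t)b[i];
    return acc;
}
```

In both cases the accumulator is 32 bits wide; what changes is the input element width, and therefore how many products feed each lane.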
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.
i386: Fix dot_prod backend patterns for mmx and sse targets
Following the migration of the dot_prod optab from a direct to a
conversion-type optab, ensure all back-end patterns incorporate the
second machine mode into pattern names.
arm: Fix arm backend-use of (u|s|us)dot_prod patterns
Given recent changes to the dot_prod standard pattern name, this patch
fixes the arm back-end by implementing the following changes:
1. Add 2nd mode to all patterns relating to the dot-product in .md
files.
2. redirect the single-mode CODE_FOR_neon_(u|s|us)dot<mode> values
generated from `arm_neon_builtins.def' to their new 2-mode
equivalents via means of simple aliases, as per the following example:
* config/arm/neon.md (<sup>dot_prod<vsi2qi>): Renamed to...
(<sup>dot_prod<mode><vsi2qi>): ...this.
(neon_<sup>dot<vsi2qi>): Renamed to...
(neon_<sup>dot<mode><vsi2qi>): ...this.
(neon_usdot<vsi2qi>): Renamed to...
(neon_usdot<mode><vsi2qi>): ...this.
(usdot_prod<vsi2qi>): Renamed to...
(usdot_prod<mode><vsi2qi>): ...this.
* config/arm/arm-builtins.cc
(CODE_FOR_neon_sdotv8qi): Define as alias to
new CODE_FOR_neon_sdotv2siv8qi.
(CODE_FOR_neon_udotv8qi): Define as alias to
new CODE_FOR_neon_udotv2siv8qi.
(CODE_FOR_neon_usdotv8qi): Define as alias to
new CODE_FOR_neon_usdotv2siv8qi.
(CODE_FOR_neon_sdotv16qi): Define as alias to
new CODE_FOR_neon_sdotv4siv16qi.
(CODE_FOR_neon_udotv16qi): Define as alias to
new CODE_FOR_neon_udotv4siv16qi.
(CODE_FOR_neon_usdotv16qi): Define as alias to
new CODE_FOR_neon_usdotv4siv16qi.
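The aliasing described in the entries above can be sketched with plain preprocessor macros. This is a toy model under stated assumptions, not the actual contents of arm-builtins.cc: the enum `insn_code_sketch` and the function `sdot_v8qi_icode` are hypothetical stand-ins, and only the signed variants are modeled.

```c
/* Stand-in for the generated insn_code enum; only the new
   two-mode names exist as enumerators.  */
enum insn_code_sketch {
    CODE_FOR_neon_sdotv2siv8qi,
    CODE_FOR_neon_sdotv4siv16qi
};

/* The old single-mode names are kept alive as aliases of the
   new two-mode names.  */
#define CODE_FOR_neon_sdotv8qi  CODE_FOR_neon_sdotv2siv8qi
#define CODE_FOR_neon_sdotv16qi CODE_FOR_neon_sdotv4siv16qi

/* Hypothetical user that still spells the old name.  */
enum insn_code_sketch sdot_v8qi_icode(void)
{
    return CODE_FOR_neon_sdotv8qi;
}
```

The appeal of this approach is that code generated from the single-mode names keeps compiling unchanged while the patterns themselves carry both modes.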
aarch64: Fix aarch64 backend-use of (u|s|us)dot_prod patterns
Given recent changes to the dot_prod standard pattern name, this patch
fixes the aarch64 back-end by implementing the following changes:
1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
2. Rewrite initialization and function expansion mechanism for simd
builtins.
3. Fix all direct calls to back-end `dot_prod' patterns in SVE
builtins.
Finally, given that it is now possible for the compiler to
differentiate between the two- and four-way dot product, we add a test
to ensure that autovectorization picks up on dot-product patterns
where the result is twice the width of the operands.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md
(<sur>dot_prod<vsi2qi><vczle><vczbe>): Renamed to...
(<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
(usdot_prod<vsi2qi><vczle><vczbe>): Renamed to...
(usdot_prod<mode><vsi2qi><vczle><vczbe>): ...this.
(<su>sadv16qi): Adjust call to gen_udot_prod to take second mode.
(popcount<mode2>): Fix use of `udot_prod_optab'.
* config/aarch64/aarch64-sve.md
(<sur>dot_prod<vsi2qi>): Renamed to...
(<sur>dot_prod<mode><vsi2qi>): ...this.
(@<sur>dot_prod<vsi2qi>): Renamed to...
(@<sur>dot_prod<mode><vsi2qi>): ...this.
(<su>sad<vsi2qi>): Adjust call to gen_udot_prod to take second mode.
* config/aarch64/aarch64-sve2.md
(@aarch64_sve_<sur>dotvnx4sivnx8hi): Renamed to...
(<sur>dot_prodvnx4sivnx8hi): ...this.
* config/aarch64/aarch64-simd-builtins.def: Modify macro
expansion-based initialization and expansion
of (u|s|us)dot_prod builtins.
* config/aarch64/aarch64-builtins.cc
(CODE_FOR_aarch64_sdot_prodv8qi): Define as alias to
new CODE_FOR_sdot_prodv2siv8qi.
(CODE_FOR_aarch64_udot_prodv8qi): Define as alias to
new CODE_FOR_udot_prodv2siv8qi.
(CODE_FOR_aarch64_usdot_prodv8qi): Define as alias to
new CODE_FOR_usdot_prodv2siv8qi.
(CODE_FOR_aarch64_sdot_prodv16qi): Define as alias to
new CODE_FOR_sdot_prodv4siv16qi.
(CODE_FOR_aarch64_udot_prodv16qi): Define as alias to
new CODE_FOR_udot_prodv4siv16qi.
(CODE_FOR_aarch64_usdot_prodv16qi): Define as alias to
new CODE_FOR_usdot_prodv4siv16qi.
* config/aarch64/aarch64-sve-builtins-base.cc
(svdot_impl::expand): s/direct/convert/ in
`convert_optab_handler_for_sign' function call.
(svusdot_impl::expand): add second mode argument in call to
`code_for_dot_prod'.
* config/aarch64/aarch64-sve-builtins.cc
(function_expander::convert_optab_handler_for_sign): New class
method.
* config/aarch64/aarch64-sve-builtins.h
(class function_expander): Add prototype for new
`convert_optab_handler_for_sign' method.
autovectorizer: Add basic support for convert optabs
Given the shift from modeling dot products as direct optabs to
treating them as conversion optabs, we make necessary changes to the
autovectorizer code to ensure that given the relevant tree code,
together with the input and output data modes, we can retrieve the
relevant optab and subsequently the insn_code for it.
gcc/ChangeLog:
* gimple-match-exports.cc (directly_supported_p): Add overload
for conversion-type optabs.
* gimple-match.h (directly_supported_p): Add new function
prototype.
* optabs.cc (expand_widen_pattern_expr): Make the
DOT_PROD_EXPR tree code use `find_widening_optab_handler' to
retrieve icode.
* tree-vect-loop.cc (vect_is_emulated_mixed_dot_prod): make it
call conversion-type overloaded `directly_supported_p'.
* tree-vect-patterns.cc (vect_supportable_conv_optab_p): New.
(vect_recog_dot_prod_pattern): s/direct/conv/ in call to
`vect_supportable_direct_optab_p'.
optabs: Make all `*dot_prod_optab's modeled as conversions
Given the specification in the GCC internals manual defines the
{u|s}dot_prod<m> standard name as taking "two signed elements of the
same mode, adding them to a third operand of wider mode", there is
currently ambiguity in the relationship between the mode of the first
two arguments and that of the third.
This vagueness means that, in theory, different modes may be
supportable in the third argument. This flexibility would allow for a
given backend to add to the accumulator a different number of
vectorized products, e.g. a backend may provide instructions for both:
as is now seen in the SVE2.1 extension to AArch64. In spite of the
aforementioned flexibility, modeling the dot-product operation as a
direct optab means that we have no way to encode both input and the
accumulator data modes into the backend pattern name, which prevents
us from harnessing this flexibility.
We therefore make all dot_prod optabs conversions, allowing, for
example, for the encoding of both 2-way and 4-way dot product backend
patterns.
gcc/ChangeLog:
* optabs.def (sdot_prod_optab): Convert from OPTAB_D to
OPTAB_CD.
(udot_prod_optab): Likewise.
(usdot_prod_optab): Likewise.
* doc/md.texi (Standard Names): update entries for u,s and us
dot_prod names.
Richard Biener [Mon, 30 Sep 2024 11:38:28 +0000 (13:38 +0200)]
tree-optimization/116879 - failure to recognize non-empty latch
When we relaxed the vectorizer's constraint on loop structure, verifying
the emptiness of the latch became too loose, as can be seen in the case
of PR116879 where the latch effectively consists of two basic blocks,
one of them an unmerged forwarder that is not empty.
PR tree-optimization/116879
* tree-vect-loop.cc (vect_analyze_loop_form): Scan all
blocks that form the latch.
Tamar Christina [Mon, 30 Sep 2024 12:06:24 +0000 (13:06 +0100)]
middle-end: check explicitly for external or constants when checking for loop invariant [PR116817]
The previous check for whether a value was external tested
!vect_get_internal_def (vinfo, var), but this of course isn't completely right,
as such values could also be reductions etc.
This changes the check to look explicitly at externals and constants.
Note that reductions remain unhandled here, but we don't support codegen of
boolean reductions today anyway.
So when we do, this will have to be handled in lowering as well.
gcc/ChangeLog:
PR tree-optimization/116817
* tree-vect-patterns.cc (vect_recog_bool_pattern): Check for const or
externals.
gcc/testsuite/ChangeLog:
PR tree-optimization/116817
* g++.dg/vect/pr116817.cc: New test.
Richard Biener [Sat, 28 Sep 2024 12:02:18 +0000 (14:02 +0200)]
tree-optimization/116842 - vectorizer load hoisting breaks UID order
The following fixes the case when vectorizing a load hoists an invariant
load and dependent stmts, thereby breaking UID order of said stmts.
While we duplicate the load we just move the dependences.
PR tree-optimization/116842
* tree-vect-stmts.cc (hoist_defs_of_uses): Sort stmts to hoist
after UID to avoid breaking vect_stmt_dominates_stmt_p.
Richard Biener [Fri, 27 Sep 2024 11:50:31 +0000 (13:50 +0200)]
tree-optimization/116785 - relax volatile handling in PTA
When there are volatile-qualified stores we do not have to treat the
destination as pointing to ANYTHING. It's only when reading from
it that we want to treat the resulting pointers as pointing to ANYTHING.
PR tree-optimization/116785
* tree-ssa-structalias.cc (get_constraint_for_1): Only
volatile qualified reads produce ANYTHING.
Richard Biener [Thu, 26 Sep 2024 13:41:59 +0000 (15:41 +0200)]
tree-optimization/116850 - corrupt post-dom info
Path isolation computes post-dominators on demand but can end up
splitting blocks after that, wrecking it. We can delay splitting
of blocks until we no longer need the post-dom info which is what
the following patch does to solve the issue.
PR tree-optimization/116850
* gimple-ssa-isolate-paths.cc (bb_split_points): New global.
(insert_trap): Delay BB splitting if post-doms are computed.
(find_explicit_erroneous_behavior): Process delayed BB
splitting after releasing post dominators.
(gimple_ssa_isolate_erroneous_paths): Do not free post-dom
info here.
The following test suites pass with this patch.
* The rv64gcv full regression test.
* The x86 bootstrap test.
* The x86 full regression test.
gcc/ChangeLog:
* match.pd: Add case 1 matching pattern for signed SAT_SUB.
* tree-ssa-math-opts.cc (gimple_signed_integer_sat_sub): Add new
decl for generated SAT_SUB matching func.
(match_unsigned_saturation_sub): Rename to...
(match_saturation_sub): ...this, and add signed SAT_SUB matching.
(math_opts_dom_walker::after_dom_children): Leverage the named
match func for both the unsigned and signed SAT_SUB.
The following test passes with this patch.
* The rv64gcv full regression test.
This is a test-only patch and obvious up to a point; I will commit it
directly if there are no comments in the next 48 hours.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/sat_arith.h: Add test helper macros.
* gcc.target/riscv/sat_arith_data.h: Add test data for SAT_SUB.
* gcc.target/riscv/sat_s_sub-1-i16.c: New test.
* gcc.target/riscv/sat_s_sub-1-i32.c: New test.
* gcc.target/riscv/sat_s_sub-1-i64.c: New test.
* gcc.target/riscv/sat_s_sub-1-i8.c: New test.
* gcc.target/riscv/sat_s_sub-run-1-i16.c: New test.
* gcc.target/riscv/sat_s_sub-run-1-i32.c: New test.
* gcc.target/riscv/sat_s_sub-run-1-i64.c: New test.
* gcc.target/riscv/sat_s_sub-run-1-i8.c: New test.
After this patch:
sat_s_sub_int8_t_fmt_1:
sub a4,a0,a1
xor a5,a0,a4
xor a1,a0,a1
and a5,a5,a1
srli a5,a5,7
andi a5,a5,1
srai a0,a0,63
xori a3,a0,127
neg a0,a5
addi a5,a5,-1
and a3,a3,a0
and a0,a4,a5
or a0,a0,a3
slliw a0,a0,24
sraiw a0,a0,24
ret
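For readers who want the arithmetic behind that sequence, here is a minimal C sketch of signed saturating subtraction on int8_t. It is an assumption about what the generated code computes, not the source of the test case: the function name `sat_s_sub_i8` is hypothetical, and the work is done on unsigned values to avoid signed overflow in the sketch itself. Overflow occurs exactly when x and y differ in sign and the wrapped difference differs in sign from x, which corresponds to the xor/xor/and/shift steps in the assembly.

```c
#include <stdint.h>

/* Signed saturating subtraction for int8_t (illustrative sketch).  */
int8_t sat_s_sub_i8(int8_t x, int8_t y)
{
    uint8_t ux = (uint8_t)x;
    uint8_t uy = (uint8_t)y;
    uint8_t ud = (uint8_t)(ux - uy);            /* wrapped difference */

    /* Sign bit set iff x and y differ in sign AND the wrapped
       difference differs in sign from x, i.e. signed overflow.  */
    unsigned overflow = ((unsigned)(ux ^ ud) & (unsigned)(ux ^ uy)) >> 7;

    /* Saturate towards the sign of x: INT8_MIN if x < 0, else INT8_MAX.  */
    uint8_t sat = (x < 0) ? 0x80u : 0x7Fu;
    uint8_t res = overflow ? sat : ud;

    /* Map the 8-bit pattern back to int8_t without relying on
       implementation-defined narrowing.  */
    return (int8_t)(res < 128 ? (int)res : (int)res - 256);
}
```

For example, 100 - (-100) saturates to 127 and -100 - 100 saturates to -128, while in-range results such as 5 - 3 are returned unchanged.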
The following test suite passes with this patch.
* The rv64gcv full regression test.
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_expand_sssub): Add new func
decl for expanding signed SAT_SUB.
* config/riscv/riscv.cc (riscv_expand_sssub): Add new func impl
for expanding signed SAT_SUB.
* config/riscv/riscv.md (sssub<mode>3): Add new pattern sssub
for scalar signed integer.
Jakub Jelinek [Sun, 29 Sep 2024 19:52:32 +0000 (21:52 +0200)]
cselib: Discard useless locs of preserved VALUEs [PR116627]
remove_useless_values iteratively discards useless locs (locs of
cselib_val which refer to non-preserved VALUEs with no locations),
which in turn can make further values useless until no further VALUEs
are made useless and then discards the useless VALUEs.
Preserved VALUEs (something done during var-tracking only I think)
live in a different hash table, cselib_preserved_hash_table rather
than cselib_hash_table. cselib_find_slot first looks up slot in
cselib_preserved_hash_table and only if not found looks it up in
cselib_hash_table (and INSERTs only into the latter), whereas preservation
of a VALUE results in move of a cselib_val from the latter to the former
hash table.
The testcase in the PR (apparently too fragile, it only reproduces on 14
branch with various flags on a single arch, not on trunk) ICEs, because
we have a preserved VALUE (QImode with (const_int 0) as one of the locs).
In a different BB SImode r2 is looked up, a non-preserved VALUE is created
for it, and the code added in r13-2916 also attempts to look up SUBREGs of that
in narrower modes, among those QImode, so it adds to that SImode r2
non-preserved VALUE a new loc of (subreg:QI (value:SI) 0). That SImode
value is considered useless, so remove_useless_values discards it, but
nothing discarded it from the preserved VALUE's loc_list, so when looking
something up in the hash table we ICE trying to dereference CSELIB_VAL
of the discarded VALUE.
I think we need to discard useless locs even from the preserved VALUEs.
That IMHO shouldn't create any further useless VALUEs, the preserved
VALUEs are never useless, so we don't need to iterate with it, can do it
just once, but IMHO it needs to be done because actually
discard_useless_values.
The following patch does that.
2024-09-29 Jakub Jelinek <jakub@redhat.com>
PR target/116627
* cselib.cc (remove_useless_values): Discard useless locs
even from preserved cselib_vals in cselib_preserved_hash_table
hash table.
Tested by running "make info pdf html" and looking at the pdf and html output. I used the comment on "gcc/config/sh.cc:sh_print_operand()", SH's TARGET_PRINT_OPERAND function, as a guide.
[PATCH] Avoid integer overflow in gcc.dg/cpp/charconst-3.c (PR testsuite/116806)
The intermediate expression (unsigned char) '\234' * scale overflows
int on int16 targets, causing the test case to fail there. Fixed by
performing the arithmetic in unsigned type, as suggested by Andrew Pinski.
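The effect of doing the arithmetic in an unsigned type can be sketched as follows. The helper name `scale_char` and the scale value 256 used in the note after the block are assumptions for illustration; the actual expression in charconst-3.c is not reproduced here.

```c
/* Multiply a character value by a scale factor in unsigned int.
   With a 16-bit int, 156 * 256 = 39936 exceeds INT_MAX (32767),
   so the signed product would overflow; the unsigned product
   fits in 16 bits (maximum 65535) and is well defined.  */
unsigned int scale_char(unsigned char c, unsigned int scale)
{
    return (unsigned int)c * scale;
}
```

Here `(unsigned char) '\234'` has the value 156, so `scale_char((unsigned char) '\234', 256u)` yields 39936 regardless of whether int is 16 or 32 bits wide.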
Regression tested on x86_64-pc-linux-gnu, and on an out-of-tree 16-bit
target with simulator. Manually checked the generated code for pdp11
and xstormy16.
Ok for trunk? (I don't have commit rights so I'd need help committing it.)
gcc/testsuite/
PR testsuite/116806
* gcc.dg/cpp/charconst-3.c: Perform arithmetic in unsigned
type to avoid integer overflow.
Jovan Vukic [Sun, 29 Sep 2024 16:06:43 +0000 (10:06 -0600)]
[PATCH v2] RISC-V: Improve code generation for select of consecutive constants
Based on the valuable feedback I received, I decided to implement the patch
in the RTL pipeline. Since a similar optimization already exists in
simplify_binary_operation_1, I chose to generalize my original approach
and place it directly below that code.
The expression (X xor C1) + C2 is simplified to X xor (C1 xor C2) under
the conditions described in the patch. This is a more general optimization,
but it still applies to the RISC-V case, which was my initial goal:
long f1(long x, long y) {
return (x > y) ? 2 : 3;
}
Before the patch, the generated assembly is:
f1(long, long):
sgt a0,a0,a1
xori a0,a0,1
addi a0,a0,2
ret
After the patch, the generated assembly is:
f1(long, long):
sgt a0,a0,a1
xori a0,a0,3
ret
The patch optimizes cases like x LT/GT y ? 2 : 3 (and x GE/LE y ? 3 : 2),
as initially intended. Since this optimization is more general, I noticed
it also optimizes cases like x < CONST ? 3 : 2 when CONST < 0. I’ve added
tests for these cases as well.
A bit of logic behind the patch: The equality A + B == (A ^ B) + 2 * (A & B)
always holds. This simplifies to A ^ B if 2 * (A & B) == 0.
In our case, we have A == X ^ C1, B == C2 and X is either 0 or 1.
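The reasoning above can be checked with a small C sketch. The helper names are hypothetical and the constants C1 == 1, C2 == 2 are taken from the f1 example; the exact conditions the patch tests for are not shown in this message. With X in {0, 1}, X ^ 1 is also 0 or 1, so it shares no set bits with 2, the carry term 2 * (A & B) vanishes, and the addition behaves like an xor.

```c
#include <stdint.h>

/* A + B written as xor plus carry: (A ^ B) + 2 * (A & B).
   Unsigned arithmetic keeps the identity valid modulo 2^32.  */
uint32_t add_via_xor_and(uint32_t a, uint32_t b)
{
    return (a ^ b) + 2u * (a & b);
}

/* Shape before the patch: (X ^ C1) + C2 with C1 == 1, C2 == 2.  */
long select_before(long x)
{
    return (x ^ 1) + 2;
}

/* Shape after the patch: X ^ (C1 ^ C2) == X ^ 3.  */
long select_after(long x)
{
    return x ^ 3;
}

/* The source-level function from the example.  */
long f1(long x, long y)
{
    return (x > y) ? 2 : 3;
}
```

For X == 1 both forms give 2 and for X == 0 both give 3, matching `f1` when X stands for the result of `x > y`.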
PR target/108038
gcc/ChangeLog:
* simplify-rtx.cc (simplify_context::simplify_binary_operation_1): New
simplification.
Thomas Koenig [Sat, 28 Sep 2024 20:29:56 +0000 (22:29 +0200)]
Implement FINDLOC for UNSIGNED.
gcc/fortran/ChangeLog:
* check.cc (intrinsic_type_check): Handle unsigned.
(gfc_check_findloc): Likewise.
* gfortran.texi: Include FINDLOC in unsigned documentation.
* iresolve.cc (gfc_resolve_findloc): Use INTEGER version
for UNSIGNED.