late-combine relies on df, which for -O0 is only initialised late
(pass_df_initialize_no_opt, after split1). Other df-based passes
cope with this by requiring optimize > 0, so this patch does the
same for late-combine.
gcc/
PR rtl-optimization/115677
* late-combine.cc (pass_late_combine::gate): New function.
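The gating idea can be illustrated with a minimal, self-contained sketch. This is not GCC's code: rtl_pass_sketch, late_combine_sketch, run_if_enabled and the optimize_level variable are hypothetical stand-ins for GCC's pass classes and its global optimization level. Only the "optimize > 0" condition is taken from the description above.

```cpp
#include <cassert>

// Hypothetical stand-in for GCC's global optimization level.
static int optimize_level = 0;

// Hypothetical miniature of a pass interface with a gate.
struct rtl_pass_sketch {
  virtual ~rtl_pass_sketch() = default;
  virtual bool gate() const { return true; }  // passes run by default
  virtual void execute() = 0;
};

struct late_combine_sketch : rtl_pass_sketch {
  int runs = 0;

  // The pass depends on dataflow info that, in this model, is only
  // assumed to be ready when optimizing, so it is gated accordingly.
  bool gate() const override { return optimize_level > 0; }

  void execute() override {
    assert(optimize_level > 0);  // the gate guarantees this
    ++runs;
  }
};

// Runs the pass only if its gate allows it; returns whether it ran.
inline bool run_if_enabled(rtl_pass_sketch& pass) {
  if (!pass.gate())
    return false;
  pass.execute();
  return true;
}
```

With optimize_level left at 0 the pass is skipped entirely, which mirrors how the other df-based passes avoid running at -O0.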
s390: Check for ADDR_REGS in s390_decompose_addrstyle_without_index
An explicit check for address registers was not required so far since
during register allocation the processing of address constraints was
sufficient. However, address constraints themselves do not check for
REGNO_OK_FOR_{BASE,INDEX}_P. Thus, with the newly introduced
late-combine pass in r15-1579-g792f97b44ffc5e we generate new insns with
invalid address registers which aren't fixed up afterwards.
Fixed by explicitly checking for address registers in
s390_decompose_addrstyle_without_index such that those new insns are
rejected.
gcc/ChangeLog:
PR target/115634
* config/s390/s390.cc (s390_decompose_addrstyle_without_index):
Check for ADDR_REGS in s390_decompose_addrstyle_without_index.
Richard Biener [Thu, 27 Jun 2024 09:26:08 +0000 (11:26 +0200)]
tree-optimization/115669 - fix SLP reduction association
The following avoids associating a reduction path as that might
get STMT_VINFO_REDUC_IDX out-of-sync with the SLP operand order.
This is a latent issue with SLP reductions but now easily exposed
as we're doing single-lane SLP reductions.
Once we have switched to SLP only, we can move and update this meta-data.
PR tree-optimization/115669
* tree-vect-slp.cc (vect_build_slp_tree_2): Do not reassociate
chains that participate in a reduction.
Jonathan Wakely [Tue, 11 Jun 2024 15:45:43 +0000 (16:45 +0100)]
libstdc++: Fix std::codecvt<wchar_t, char, mbstate_t> for empty dest [PR37475]
For the GNU locale model, codecvt::do_out and codecvt::do_in incorrectly
return 'ok' when the destination range is empty. That happens because
detecting incomplete output is done in the loop body, and the loop is
never even entered if to == to_end.
By restructuring the loop condition so that we check the output range
separately, we can ensure that for a non-empty source range, we always
enter the loop at least once, and detect if the destination range is too
small.
The loops also seem easier to reason about if we return immediately on
any error, instead of checking the result twice on every iteration. We
can use an RAII type to restore the locale before returning, which also
simplifies all the other member functions.
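The restructured loop can be sketched with a toy converter. This is only an illustration under stated assumptions, not the libstdc++ code: conv_result, locale_guard, current_locale and toy_out are hypothetical names, and the "conversion" just copies bytes. It shows the two ideas described above: the loop condition tests the source range only, so an empty destination is detected inside the loop, and an RAII guard restores state on every early return.

```cpp
#include <cassert>

enum class conv_result { ok, partial, error };

// Hypothetical stand-in for the thread's current C locale.
static int current_locale = 0;

// RAII helper: switch the "locale" now, restore it on every exit path.
struct locale_guard {
  explicit locale_guard(int tmp) : saved(current_locale) { current_locale = tmp; }
  ~locale_guard() { current_locale = saved; }
  locale_guard(const locale_guard&) = delete;
  locale_guard& operator=(const locale_guard&) = delete;
  int saved;
};

// Toy conversion: copies bytes and treats negative bytes as invalid input.
conv_result toy_out(const char*& from, const char* from_end,
                    char*& to, char* to_end) {
  locale_guard guard(1);
  while (from < from_end) {          // the condition looks at the source only
    if (to == to_end)
      return conv_result::partial;   // destination empty or exhausted
    if (static_cast<signed char>(*from) < 0)
      return conv_result::error;     // return immediately on any error
    *to++ = *from++;
  }
  assert(from == from_end);          // the whole source was consumed
  return conv_result::ok;
}
```

Because the output check sits inside the loop body, a non-empty source with an empty destination yields partial, while an empty source still yields ok.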
libstdc++-v3/ChangeLog:
PR libstdc++/37475
* config/locale/gnu/codecvt_members.cc (Guard): New RAII type.
(do_out, do_in): Return partial if the destination is empty but
the source is not. Use Guard to restore locale on scope exit.
Return immediately on any conversion error.
(do_encoding, do_max_length, do_length): Use Guard.
* testsuite/22_locale/codecvt/in/char/37475.cc: New test.
* testsuite/22_locale/codecvt/in/wchar_t/37475.cc: New test.
* testsuite/22_locale/codecvt/out/char/37475.cc: New test.
* testsuite/22_locale/codecvt/out/wchar_t/37475.cc: New test.
Alexandre Oliva [Thu, 27 Jun 2024 10:22:48 +0000 (07:22 -0300)]
[libstdc++] [testsuite] defer to check_vect_support* [PR115454]
The newly-added testcase overrides the default dg-do action set by
check_vect_support_and_set_flags (in libstdc++-dg/conformance.exp), so
it attempts to run the test even if runtime vector support is not
available.
Remove the explicit dg-do directive, so that the default is honored,
and the test is run if vector support is found, and only compiled
otherwise.
for libstdc++-v3/ChangeLog
PR libstdc++/115454
* testsuite/experimental/simd/pr115454_find_last_set.cc: Defer
to check_vect_support_and_set_flags's default dg-do action.
Jonathan Wakely [Wed, 26 Jun 2024 19:22:54 +0000 (20:22 +0100)]
libstdc++: Fix std::format for chrono::duration with unsigned rep [PR115668]
Using std::chrono::abs is only valid if numeric_limits<rep>::is_signed
is true, so using it unconditionally made it ill-formed to format a
duration with an unsigned rep.
The duration formatter might as well negate the duration itself instead of
using chrono::abs, because it already needs to check for a negative
value.
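A minimal sketch of the approach, assuming an integer rep: sign_and_count is a hypothetical helper, not the formatter in chrono_io.h. It checks for a negative value and negates the duration itself, so it never needs chrono::abs and stays well-formed for an unsigned rep.

```cpp
#include <cassert>
#include <chrono>
#include <limits>
#include <string>

// Hypothetical helper: renders an optional '-' followed by the magnitude
// of the tick count, without calling std::chrono::abs.
template <typename Rep, typename Period>
std::string sign_and_count(std::chrono::duration<Rep, Period> d) {
  // The most negative signed value cannot be negated; this sketch
  // simply excludes it.
  assert(!std::numeric_limits<Rep>::is_signed ||
         d.count() != std::numeric_limits<Rep>::min());

  std::string out;
  if (d < std::chrono::duration<Rep, Period>::zero()) {
    out += '-';
    d = -d;  // unary minus is valid for signed and unsigned reps
  }
  out += std::to_string(d.count());
  return out;
}
```

For an unsigned rep the comparison against zero is never true, so the negation branch is simply dead code rather than a compile error.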
libstdc++-v3/ChangeLog:
PR libstdc++/115668
* include/bits/chrono_io.h (formatter<duration<R,P>, C>::format):
Do not use chrono::abs.
* testsuite/20_util/duration/io.cc: Check formatting a duration
with unsigned rep.
Jonathan Wakely [Tue, 18 Jun 2024 19:57:13 +0000 (20:57 +0100)]
libstdc++: Enable more debug assertions during constant evaluation [PR111250]
Some of our debug assertions expand to nothing unless
_GLIBCXX_ASSERTIONS is defined, which means they are not checked during
constant evaluation. By making them unconditionally expand to a
__glibcxx_assert expression they will be checked during constant
evaluation. This allows us to diagnose more instances of undefined
behaviour at compile-time, such as accessing a vector past-the-end.
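The effect can be sketched outside the library. The names below (SKETCH_ASSERT, sketch_assert_fail, small_array, sample) are hypothetical and only model the idea: when the check is always part of the expression, a failing check during constant evaluation reaches a non-constexpr function, so the compiler has to reject the constant expression.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Not constexpr on purpose: reaching this call during constant
// evaluation makes the enclosing expression non-constant.
inline void sketch_assert_fail() {
  assert(!"range check failed");
  std::abort();
}

// Hypothetical stand-in for an assertion that is always expanded.
#define SKETCH_ASSERT(cond) ((cond) ? void() : sketch_assert_fail())

template <typename T, std::size_t N>
struct small_array {
  T elems[N];

  constexpr const T& operator[](std::size_t i) const {
    return SKETCH_ASSERT(i < N), elems[i];
  }
};

constexpr small_array<int, 3> sample{{1, 2, 3}};

// In range: accepted as a constant expression.
static_assert(sample[2] == 3, "in-range access is a constant expression");

// Past the end: would be diagnosed at compile time if uncommented.
// static_assert(sample[3] == 0, "");
```

If the macro expanded to nothing, the out-of-range access would not be checked by the macro at all, which is the gap the change closes for the library's own assertions.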
libstdc++-v3/ChangeLog:
PR libstdc++/111250
* include/debug/assertions.h (__glibcxx_requires_non_empty_range)
(__glibcxx_requires_nonempty, __glibcxx_requires_subscript):
Define to __glibcxx_assert expressions or to debug mode
__glibcxx_check_xxx expressions.
* testsuite/23_containers/array/element_access/constexpr_c++17.cc:
Add checks for out-of-bounds accesses in constant expressions.
* testsuite/23_containers/vector/element_access/constexpr.cc:
Likewise.
Eric Botcazou [Wed, 12 Jun 2024 14:05:57 +0000 (16:05 +0200)]
ada: Remove last uses of System.Address_Operations in runtime library
This completes the switch from using System.Address_Operations to using only
System.Storage_Elements in the runtime library. The remaining uses were for
simple optimizations that can be done by the optimizer alone.
gcc/ada/
* libgnat/s-carsi8.adb: Remove clauses for System.Address_Operations
and use only operations of System.Storage_Elements for addresses.
* libgnat/s-casi16.adb: Likewise.
* libgnat/s-casi32.adb: Likewise.
* libgnat/s-casi64.adb: Likewise.
* libgnat/s-casi128.adb: Likewise.
* libgnat/s-carun8.adb: Likewise.
* libgnat/s-caun16.adb: Likewise.
* libgnat/s-caun32.adb: Likewise.
* libgnat/s-caun64.adb: Likewise.
* libgnat/s-caun128.adb: Likewise.
* libgnat/s-geveop.adb: Likewise.
Eric Botcazou [Tue, 11 Jun 2024 17:29:22 +0000 (19:29 +0200)]
ada: Add missing dimension information for target names
It is computed from the Etype of N_Target_Name nodes.
gcc/ada/
* sem_ch5.adb (Analyze_Target_Name): Call Analyze_Dimension on the
node once the Etype is set.
* sem_dim.adb (OK_For_Dimension): Set to True for N_Target_Name.
(Analyze_Dimension): Call Analyze_Dimension_Has_Etype for it.
Martin Clochard [Fri, 7 Jun 2024 09:44:45 +0000 (11:44 +0200)]
ada: Overridden operation field not correctly set for controlling result wrappers
Implicit wrapper overridings generated for functions with
controlling result when deriving with null extension may
have field Overridden_Operation incorrectly set, when making
several such derivations in succession. This happens because
overridings were assumed to come from source, and entities
generated by Derive_Subprograms were also assumed to be
derived from source subprograms. Overridden_Operation could
be set to the entity generated by Derive_Subprograms for the
same type, resulting in a cycle between Overridden_Operation
and Alias fields, causing non-termination in GNATprove.
gcc/ada/
* sem_ch6.adb (Check_Overriding_Indicator): Remove Comes_From_Source filter.
(New_Overloaded_Entity): Move up special case of LSP_Subprogram,
and remove Comes_From_Source filter.
Eric Botcazou [Wed, 5 Jun 2024 21:19:53 +0000 (23:19 +0200)]
ada: Implement first half of Generalized Finalization
This implements the first half of the Generalized Finalization proposal,
namely the Finalizable aspect as well as its optional relaxed semantics
for the finalization operations, but the latter part is only implemented
for dynamically allocated objects.
In accordance with the spirit, if not the letter, of the proposal, this
implements the finalizable types declared with strict semantics for the
finalization operations as a direct generalization of controlled types,
which in turn makes it possible to reimplement the latter types in terms
of the former types and ensures full interoperability between them.
The relaxed semantics for the finalization operations is also a direct
generalization of the GNAT pragma No_Heap_Finalization for dynamically
allocated objects, in that it extends the effects of the pragma to all
access types designating the finalizable type, instead of just applying
them to library-level named access types.
gcc/ada/
* aspects.ads (Aspect_Id): Add Aspect_Finalizable.
(Implementation_Defined_Aspect): Add True for Aspect_Finalizable.
(Operational_Aspect): Add True for Aspect_Finalizable.
(Aspect_Argument): Add Expression for Aspect_Finalizable.
(Is_Representation_Aspect): Add False for Aspect_Finalizable.
(Aspect_Names): Add Name_Finalizable for Aspect_Finalizable.
(Aspect_Delay): Add Always_Delay for Aspect_Finalizable.
* checks.adb: Add with and use clauses for Sem_Elab.
(Install_Primitive_Elaboration_Check): Call Is_Controlled_Procedure.
* einfo.ads (Has_Relaxed_Finalization): Document new flag.
(Is_Controlled_Active): Update documentation.
* exp_aggr.adb (Generate_Finalization_Actions): Replace Find_Prim_Op
with Find_Controlled_Prim_Op for Name_Finalize.
* exp_attr.adb (Expand_N_Attribute_Reference) <Finalization_Size>:
Return 0 if the prefix type has relaxed finalization.
* exp_ch3.adb (Build_Equivalent_Record_Aggregate): Return Empty if
the type needs finalization.
(Expand_Freeze_Record_Type): Call Find_Controlled_Prim_Op instead of
Find_Prim_Op for Name_{Adjust,Initialize,Finalize}.
Call Make_Finalize_Address_Body for all controlled types.
* exp_ch4.adb (Insert_Dereference_Action): Do not generate a call to
Adjust_Controlled_Dereference if the designated type has relaxed
finalization.
* exp_ch6.adb (Needs_BIP_Collection): Return false for an untagged
type that has relaxed finalization.
* exp_ch7.adb (Allows_Finalization_Collection): Return false if the
designated type has relaxed finalization.
(Check_Visibly_Controlled): Call Find_Controlled_Prim_Op instead of
Find_Prim_Op.
(Make_Adjust_Call): Likewise.
(Make_Deep_Record_Body): Likewise.
(Make_Final_Call): Likewise.
(Make_Init_Call): Likewise.
* exp_disp.adb (Set_All_DT_Position): Remove obsolete warning.
* exp_util.ads: Add with and use clauses for Snames.
(Find_Prim_Op): Add precondition.
(Find_Controlled_Prim_Op): New function declaration.
(Name_Of_Controlled_Prim_Op): Likewise.
* exp_util.adb: Remove with and use clauses for Snames.
(Build_Allocate_Deallocate_Proc): Do not build finalization actions
if the designated type has relaxed finalization.
(Find_Controlled_Prim_Op): New function.
(Find_Last_Init): Call Find_Controlled_Prim_Op instead of
Find_Prim_Op.
(Name_Of_Controlled_Prim_Op): New function.
* freeze.adb (Freeze_Entity.Freeze_Record_Type): Propagate the
Has_Relaxed_Finalization flag from components.
* gen_il-fields.ads (Opt_Field_Enum): Add Has_Relaxed_Finalization.
* gen_il-gen-gen_entities.adb (Entity_Kind): Likewise.
* sem_aux.adb (Is_By_Reference_Type): Return true for all controlled
types.
* sem_ch3.adb (Build_Derived_Record_Type): Do not special case types
declared in Ada.Finalization.
(Record_Type_Definition): Propagate the Has_Relaxed_Finalization
flag from components.
* sem_ch13.adb (Analyze_Aspects_At_Freeze_Point): Also process the
Finalizable aspect.
(Analyze_Aspect_Specifications): Likewise. Call Flag_Non_Static_Expr
in more cases.
(Check_Aspect_At_Freeze_Point): Likewise.
(Inherit_Aspects_At_Freeze_Point): Likewise.
(Resolve_Aspect_Expressions): Likewise.
(Resolve_Finalizable_Argument): New procedure.
(Validate_Finalizable_Aspect): Likewise.
* sem_elab.ads: Add with and use clauses for Snames.
(Is_Controlled_Procedure): New function declaration.
* sem_elab.adb: Remove with and use clauses for Snames.
(Is_Controlled_Proc): Move to...
(Is_Controlled_Procedure): ...here and rename.
(Check_A_Call): Call Find_Controlled_Prim_Op instead of
Find_Prim_Op.
(Is_Finalization_Procedure): Likewise.
* sem_util.ads (Propagate_Controlled_Flags): Update documentation.
* sem_util.adb (Is_Fully_Initialized_Type): Replace call to
Find_Optional_Prim_Op with Find_Controlled_Prim_Op.
Call Has_Null_Extension only for derived tagged types.
(Propagate_Controlled_Flags): Propagate Has_Relaxed_Finalization.
* snames.ads-tmpl (Name_Finalizable): New name.
(Name_Relaxed_Finalization): Likewise.
* libgnat/s-finroo.ads (Root_Controlled): Add Finalizable aspect.
* doc/gnat_rm/gnat_language_extensions.rst: Document implementation
of Generalized Finalization.
* gnat_rm.texi: Regenerate.
* gnat_ugn.texi: Regenerate.
Hu, Lin1 [Wed, 6 Mar 2024 11:58:48 +0000 (19:58 +0800)]
vect: support direct conversion under x86-64-v3.
gcc/ChangeLog:
PR target/107432
* config/i386/i386-expand.cc (ix86_expand_trunc_with_avx2_noavx512f):
New function to generate a series of suitable insns.
* config/i386/i386-protos.h (ix86_expand_trunc_with_avx2_noavx512f):
Define new function.
* config/i386/sse.md: Extend trunc<mode><mode>2 for x86-64-v3.
(ssebytemode): Add V8HI.
(PMOV_DST_MODE_2_AVX2): New mode iterator.
(PMOV_SRC_MODE_3_AVX2): Ditto.
* config/i386/mmx.md
(trunc<mode><mmxhalfmodelower>2): Ditto.
(avx512vl_trunc<mode><mmxhalfmodelower>2): Ditto.
(truncv2si<mode>2): Ditto.
(avx512vl_truncv2si<mode>2): Ditto.
(mmxbytemode): New mode attr.
Hu, Lin1 [Wed, 28 Feb 2024 10:11:55 +0000 (18:11 +0800)]
vect: Support v4hi -> v4qi.
gcc/ChangeLog:
PR target/107432
* config/i386/mmx.md
(VI2_32_64): New mode iterator.
(mmxhalfmode): New mode attr.
(mmxhalfmodelower): Ditto.
(truncv2hiv2qi2): Extend mode v4hi and change name from
truncv2hiv2qi to trunc<mode><mmxhalfmodelower>2.
gcc/testsuite/ChangeLog:
PR target/107432
* gcc.target/i386/pr107432-1.c: Modify test.
* gcc.target/i386/pr107432-6.c: Add test.
* gcc.target/i386/pr108938-3.c: Add the -mno-avx option for now,
since the new truncv4hiv4qi support affects the bswap
optimization, and open a bugzilla for it.
Hu, Lin1 [Thu, 1 Feb 2024 07:15:01 +0000 (15:15 +0800)]
vect: generate suitable convert insn for int -> int, float -> float and int <-> float.
gcc/ChangeLog:
PR target/107432
* tree-vect-generic.cc
(expand_vector_conversion): Support convert for int -> int,
float -> float and int <-> float.
* tree-vect-stmts.cc (vectorizable_conversion): Wrap the
indirect convert part.
(supportable_indirect_convert_operation): New function.
* tree-vectorizer.h (supportable_indirect_convert_operation):
Define the new function.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_arith.h: Add helper
test macros.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_binary_scalar.h: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_trunc-run-3.c: New test.
Xi Ruoyao [Sat, 15 Jun 2024 10:29:43 +0000 (18:29 +0800)]
LoongArch: Tweak IOR rtx_cost for bstrins
Consider
c &= 0xfff;
a &= ~0xfff;
b &= ~0xfff;
a |= c;
b |= c;
This can be done with 2 bstrins instructions. But we need to recognize
it in loongarch_rtx_costs or the compiler will not propagate "c & 0xfff"
forward.
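The equivalence can be checked with a small model. This is a sketch, not compiler code: bstrins below models the bit-field insert as "replace bits [msb:lsb] of rd with the low bits of rj", and via_masks is a hypothetical name for the mask sequence shown above applied to one destination.

```cpp
#include <cassert>
#include <cstdint>

// Model of a bstrins-style insert: bits [msb:lsb] of rd are replaced by
// bits [msb-lsb:0] of rj; all other bits of rd are kept.
std::uint64_t bstrins(std::uint64_t rd, std::uint64_t rj,
                      unsigned msb, unsigned lsb) {
  assert(lsb <= msb && msb < 64);
  const unsigned width = msb - lsb + 1;
  const std::uint64_t field =
      width == 64 ? ~UINT64_C(0) : (UINT64_C(1) << width) - 1;
  const std::uint64_t mask = field << lsb;
  return (rd & ~mask) | ((rj << lsb) & mask);
}

// The mask/or sequence from the example, for a single destination.
std::uint64_t via_masks(std::uint64_t a, std::uint64_t c) {
  c &= 0xfff;
  a &= ~UINT64_C(0xfff);
  a |= c;
  return a;
}
```

In this model, via_masks(a, c) equals bstrins(a, c, 11, 0) for any inputs, so each of the two destinations needs just one insert once "c & 0xfff" is propagated forward.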
gcc/ChangeLog:
* config/loongarch/loongarch.cc:
(loongarch_use_bstrins_for_ior_with_mask): Split the main logic
into ...
(loongarch_use_bstrins_for_ior_with_mask_1): ... here.
(loongarch_rtx_costs): Special-case IORs that can be
implemented with bstrins.
liuhongt [Mon, 24 Jun 2024 09:53:22 +0000 (17:53 +0800)]
Fix wrong cost of MEM when addr is a lea.
416.gamess regressed 4-6% on x86_64 since my r15-882-g1d6199e5f8c1c0.
The commit adjusted the rtx_cost of MEM to reduce the cost of (add op0 disp).
But the cost of ADDR can be cheaper than that of XEXP (addr, 0) when it is a
lea, which is the case in the PR. This patch adjusts rtx_cost to handle only
reg + disp; the other forms are basically all LEAs, which do not have the
additional cost of an ADD.
gcc/ChangeLog:
PR target/115462
* config/i386/i386.cc (ix86_rtx_costs): Make cost of MEM (reg +
disp) just a little bit more than MEM (reg).
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr115462.c: New test.
Pan Li [Wed, 26 Jun 2024 01:28:05 +0000 (09:28 +0800)]
Internal-fn: Support new IFN SAT_TRUNC for unsigned scalar int
This patch adds the middle-end representation for saturating
truncation, i.e. the result of the truncation is set to the max
value of the narrow type on overflow. It matches patterns similar
to the one below.
Form 1:
#define DEF_SAT_U_TRUC_FMT_1(WT, NT) \
NT __attribute__((noinline)) \
sat_u_truc_##T##_fmt_1 (WT x) \
{ \
bool overflow = x > (WT)(NT)(-1); \
return ((NT)x) | (NT)-overflow; \
}
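The branchless form can be tried as a standalone template. This is a sketch assuming unsigned types only; sat_u_trunc and sat_u_trunc_ref are hypothetical names and are not part of the patch. The first mirrors the expression in Form 1, the second is a plain clamp used as a cross-check.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <type_traits>

// Reference: clamp to the narrow type's maximum, then narrow.
template <typename NT, typename WT>
NT sat_u_trunc_ref(WT x) {
  const WT max = std::numeric_limits<NT>::max();
  return x > max ? static_cast<NT>(max) : static_cast<NT>(x);
}

// Branchless form: OR the truncated value with all-ones on overflow.
template <typename NT, typename WT>
NT sat_u_trunc(WT x) {
  static_assert(std::is_unsigned<NT>::value && std::is_unsigned<WT>::value,
                "sketch covers unsigned types only");
  static_assert(sizeof(NT) < sizeof(WT), "NT must be narrower than WT");

  const bool overflow = x > static_cast<WT>(static_cast<NT>(-1));
  // 0 when there is no overflow, all bits set otherwise.
  const NT all_ones_if_overflow = static_cast<NT>(-static_cast<int>(overflow));
  const NT result =
      static_cast<NT>(static_cast<NT>(x) | all_ones_if_overflow);

  assert(result == sat_u_trunc_ref<NT>(x));
  return result;
}
```

Values that fit in the narrow type pass through unchanged, and anything larger collapses to the narrow type's maximum.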
For example, truncated uint16_t to uint8_t, we have
Before this patch:
__attribute__((noinline))
uint32_t sat_u_truc_T_fmt_1 (uint64_t x)
{
_Bool overflow;
unsigned int _1;
unsigned int _2;
unsigned int _3;
uint32_t _6;
The following tests pass with this patch:
*. The rv64gcv full regression tests.
*. The rv64gcv build with glibc.
*. The x86 bootstrap tests.
*. The x86 full regression tests.
gcc/ChangeLog:
* internal-fn.def (SAT_TRUNC): Add new signed IFN sat_trunc as
unary_convert.
* match.pd: Add new matching pattern for unsigned int sat_trunc.
* optabs.def (OPTAB_CL): Add unsigned and signed optab.
* tree-ssa-math-opts.cc (gimple_unsigned_integer_sat_trunc): Add
new decl for the matching pattern generated func.
(match_unsigned_saturation_trunc): Add new func impl to match
the .SAT_TRUNC.
(math_opts_dom_walker::after_dom_children): Add .SAT_TRUNC match
function under BIT_IOR_EXPR case.
This patch improves the pattern match to recognize the above
as a truncate after the .SAT_SUB pattern. We then get a pattern
similar to the one below, and the first 3 dead stmts are eliminated.
The following tests pass with this patch.
1. The rv64gcv full regression tests.
2. The rv64gcv build with glibc.
3. The x86 bootstrap tests.
4. The x86 full regression tests.
gcc/ChangeLog:
* match.pd: Add convert description for minus and capture.
* tree-vect-patterns.cc (vect_recog_build_binary_gimple_call): Add
new logic to handle the case where in_type is incompatible with out_type, as
well as rename from.
(vect_recog_build_binary_gimple_stmt): Rename to.
(vect_recog_sat_add_pattern): Leverage above renamed func.
(vect_recog_sat_sub_pattern): Ditto.
Richard Biener [Wed, 26 Jun 2024 17:23:26 +0000 (19:23 +0200)]
tree-optimization/115652 - amend last fix
The previous fix breaks in the degenerate case when the discovered
last_stmt is equal to the first stmt in the block since then we
undo a required stmt advancement.
PR tree-optimization/115652
* tree-vect-slp.cc (vect_schedule_slp_node): Only insert
at the start of the block if that strictly dominates
the discovered dependent stmt.
Jonathan Wakely [Tue, 25 Jun 2024 22:59:19 +0000 (23:59 +0100)]
libstdc++: Add script to update docs for a new release branch
This should be run on a release branch after branching from trunk.
Various links and references to trunk in the docs will be updated to
refer to the new release branch.
Jonathan Wakely [Thu, 20 Jun 2024 21:17:08 +0000 (22:17 +0100)]
libstdc++: Remove duplicate test
We currently have 808590.cc which only runs for C++98 mode, and
808590-cxx11.cc which only runs for C++11 and later, but have almost
identical content (except for a defaulted special member in the C++11
one, to suppress a -Wdeprecated-copy warning).
This was done originally to ensure that the test ran for both C++98 mode
and C++11 mode, because the logic being tested was different enough to
need both to be tested. But it's trivial to run all tests in multiple
-std modes now, using GLIBCXX_TESTSUITE_STDS, so we don't need two
separate tests. We can remove one of the tests and allow the other one
to run in any -std mode.
libstdc++-v3/ChangeLog:
* testsuite/20_util/specialized_algorithms/uninitialized_copy/808590.cc:
Copy defaulted assignment operator from 808590-cxx11.cc to
suppress a warning.
* testsuite/20_util/specialized_algorithms/uninitialized_copy/808590-cxx11.cc:
Removed.
Jonathan Wakely [Thu, 6 Jun 2024 10:50:06 +0000 (11:50 +0100)]
libstdc++: Work around some PSTL test failures for debug mode [PR90276]
This addresses one known failure due to a bug in the upstream tests, and
a number of timeouts due to the algorithms running much more slowly with
debug mode checks enabled.
libstdc++-v3/ChangeLog:
PR libstdc++/90276
* testsuite/25_algorithms/pstl/alg_sorting/partial_sort.cc
[_GLIBCXX_DEBUG]: Add xfail-run-if for debug mode.
* testsuite/25_algorithms/pstl/alg_nonmodifying/nth_element.cc
[_GLIBCXX_DEBUG]: Reduce size of test data.
* testsuite/25_algorithms/pstl/alg_sorting/includes.cc:
Likewise.
* testsuite/25_algorithms/pstl/alg_sorting/set_util.h:
Likewise.
Jonathan Wakely [Tue, 30 Apr 2024 08:52:13 +0000 (09:52 +0100)]
libstdc++: Fix std::chrono::tzdb to work with vanguard format
I found some issues in the std::chrono::tzdb parser by testing the
tzdata "vanguard" format, which uses new features that aren't enabled in
the "main" and "rearguard" data formats.
Since 2024a the keyword "minimum" is no longer valid for the FROM and TO
fields in a Rule line, which means that "m" is now a valid abbreviation
for "maximum". Previously we expected either "mi" or "ma". For backwards
compatibility, a FROM field beginning with "mi" is still supported and
is treated as 1900. The "maximum" keyword is only allowed in TO now,
because it makes no sense in FROM. To support these changes the
minmax_year and minmax_year2 classes for parsing FROM and TO are
replaced with a single years_from_to class that reads both fields.
The vanguard format makes use of %z in Zone FORMAT fields, which caused
an exception to be thrown from ZoneInfo::set_abbrev because no % or /
characters were expected when a Zone doesn't use a named Rule. The
ZoneInfo::to(sys_info&) function now uses format_abbrev_str to replace
any %z with the current offset. Although format_abbrev_str also checks
for %s and STD/DST formats, those only make sense when a named Rule is
in effect, so won't occur when ZoneInfo::to(sys_info&) is used.
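The %z expansion can be sketched as follows. This is not the format_abbrev_str in tzdb.cc: expand_percent_z is a hypothetical helper, and it assumes the shortest-form convention described in the tz documentation, where the offset is written as +hh, +hhmm or +hhmmss depending on which fields are non-zero.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: replaces the first "%z" in a Zone FORMAT string
// with the UTC offset, using the shortest form that loses no information.
std::string expand_percent_z(const std::string& format, long offset_seconds) {
  const std::string::size_type pos = format.find("%z");
  if (pos == std::string::npos)
    return format;  // nothing to expand

  const char sign = offset_seconds < 0 ? '-' : '+';
  const long secs = offset_seconds < 0 ? -offset_seconds : offset_seconds;
  assert(secs < 100L * 3600);  // keeps the hour field at two digits

  const long hh = secs / 3600;
  const long mm = secs / 60 % 60;
  const long ss = secs % 60;

  char buf[64];
  if (ss != 0)
    std::snprintf(buf, sizeof buf, "%c%02ld%02ld%02ld", sign, hh, mm, ss);
  else if (mm != 0)
    std::snprintf(buf, sizeof buf, "%c%02ld%02ld", sign, hh, mm);
  else
    std::snprintf(buf, sizeof buf, "%c%02ld", sign, hh);

  std::string out = format;
  out.replace(pos, 2, buf);
  return out;
}
```

A format without %z is returned unchanged, which matches the case of a Zone whose FORMAT is a fixed abbreviation.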
This change also implements a feature that has always been missing from
time_zone::_M_get_sys_info: finding the Rule that is active before the
specified time point, so that we can correctly handle %s in the FORMAT
for the first new sys_info that gets created. This requires implementing
a poorly documented feature of zic, to get the LETTERS field from a
later transition, as described at
https://mm.icann.org/pipermail/tz/2024-April/058891.html
In order for this to work we need to be able to distinguish an empty
letters field (as used by CE%sT where the variable part is either empty
or "S") from "the letters field is not known for this transition". The
tzdata file uses "-" for an empty letters field, which libstdc++ was
previously replacing with "" when the Rule was parsed. Instead, we now
preserve the "-" in the Rule object, so that "" can be used for the case
where we don't know the letters (and so need to decide it).
libstdc++-v3/ChangeLog:
* src/c++20/tzdb.cc (minmax_year, minmax_year2): Remove.
(years_from_to): New class replacing minmax_year and
minmax_year2.
(format_abbrev_str, select_std_or_dst_abbrev): Move earlier in
the file. Handle "-" for letters.
(ZoneInfo::to): Use format_abbrev_str to expand %z.
(ZoneInfo::set_abbrev): Remove exception. Change parameter from
reference to value.
(operator>>(istream&, Rule&)): Do not clear letters when it
contains "-".
(time_zone::_M_get_sys_info): Add missing logic to find the Rule
in effect before the time point.
* testsuite/std/time/tzdb/1.cc: Adjust for vanguard format using
"GMT" as the Zone name, not as a Link to "Etc/GMT".
* testsuite/std/time/time_zone/sys_info_abbrev.cc: New test.
Richard Biener [Tue, 25 Jun 2024 12:04:31 +0000 (14:04 +0200)]
tree-optimization/115629 - missed tail merging
The following fixes a missed tail-merging observed for the testcase
in PR115629. The issue is that deps_ok_for_redirect rejects the merge
unless it computes that both blocks would be valid prevailing blocks.
The following instead makes sure to record the working block as
prevailing. Also, stmt comparison fails for indirect references
and does not handle memory references thoroughly, failing to unify
array indices and indirected pointers. The following attempts to
fix this.
PR tree-optimization/115629
* tree-ssa-tail-merge.cc (gimple_equal_p): Handle
memory references better.
(deps_ok_for_redirect): Handle the case where not both blocks
are considered valid prevailing blocks.
Patrick O'Neill [Tue, 25 Jun 2024 21:14:18 +0000 (14:14 -0700)]
RISC-V: Update testcase comments to point to PSABI rather than Table A.6
Table A.6 was originally the source of truth for the recommended mappings.
Point to the PSABI doc since the memory model mappings have been moved there.
Patrick O'Neill [Tue, 25 Jun 2024 21:14:17 +0000 (14:14 -0700)]
RISC-V: Consolidate amo testcase variants
Many riscv/amo/ testcases use check-function-bodies. These testcases can be
consolidated with related testcases (memory ordering variants) without affecting
the assertions.
Give functions descriptive names so testsuite failures are obvious from the
'FAIL:' line.
Carl Love [Fri, 21 Jun 2024 15:56:36 +0000 (11:56 -0400)]
rs6000, change altivec*-runnable.c test file names
Changed the names of the test files.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/altivec-1-runnable.c: Change the name to
altivec-38.c.
* gcc.target/powerpc/altivec-2-runnable.c: Change the name to
p8vector-builtin-9.c.
Jeff Law [Wed, 26 Jun 2024 13:20:29 +0000 (07:20 -0600)]
[committed] Remove compromised sh test
Surya's recent patch to IRA improves the code for sh/pr54602-1.c slightly.
Specifically it's able to eliminate a save/restore in the prologue/epilogue and
a bit of register shuffling.
As a result there literally aren't any insns that can be used to fill the delay
slot of the return, so a nop gets emitted and the test fails.
Given there literally aren't any insns to move into the delay slot, the best
course of action is to just drop the test.
Jeff Law [Wed, 26 Jun 2024 12:59:26 +0000 (06:59 -0600)]
[committed][RISC-V] Fix expected output for thead store pair test
Surya's patch to IRA has improved the code we generate for one of the thead
store pair tests for both rv32 and rv64. This patch adjusts the expectations
of that test.
I've verified that the test now passes on rv32 and rv64 in my tester. Pushing
to the trunk.
Richard Biener [Wed, 26 Jun 2024 07:25:27 +0000 (09:25 +0200)]
tree-optimization/115652 - adjust insertion gsi for SLP
The following adjusts how SLP computes the insertion location. In
particular it advanced the insert iterator of the found last_stmt.
The vectorizer will later insert stmts _before_ it. But we also
have the constraint that possibly masked ops may not be scheduled
outside of the loop and as we do not model the loop mask in the
SLP graph we have to adjust for that. The following moves this
to after the advance since it isn't compatible with that as the
current GIMPLE_COND exception shows. The PR is about in-order
reduction vectorization which also isn't happy when that's the
very first stmt.
PR tree-optimization/115652
* tree-vect-slp.cc (vect_schedule_slp_node): Advance the
iterator based on last_stmt only for vector defs.
Make gcov aware of which edges are the true/false edges so that it can
more accurately reconstruct the CFG. There are plenty of bits left in
arc_info, and this opens the door to richer reporting.
Jørgen Kvalsvik [Tue, 25 Jun 2024 06:41:45 +0000 (08:41 +0200)]
Use the term MC/DC in help for gcov --conditions
Without key terms like "masking" and "MC/DC" it is not at all obvious
what --conditions actually reports on, and there is no easy path for the
user to figure out. By at least including the two key terms MC/DC and
masking users have something to search for.
Andre Vieira [Wed, 26 Jun 2024 10:07:01 +0000 (11:07 +0100)]
arm: make arm_predict_doloop_p reject loops with calls
With the introduction of low overhead loops we defined arm_predict_doloop_p.
This is meant to be a lightweight check that rules out loops we are not
considering for doloop optimization, and it is used by other passes to prevent
optimizations that may hurt the doloop optimization later on. The check is
meant to be lightweight because it is used by pre-RTL optimizations, meaning
we can't do the same checks that doloop does.
After the definition of arm_predict_doloop_p, when testing for armv8.1-m.main,
tree-ssa/ivopts-3.c failed the scan-dump check as the dump now matched an extra
'!= 0' introduced by:
Doloop cmp iv use: if (ivtmp_1 != 0)
Predict loop 1 can perform doloop optimization later.
where previously we had:
Predict doloop failure due to target specific checks.
and after this patch:
Predict doloop failure due to call in loop.
Predict doloop failure due to target specific checks.
Added a copy of the original tree-ssa/ivopts-3.c as a target-specific test to
check for the new dump message.
gcc/ChangeLog:
* config/arm/arm.cc (arm_predict_doloop_p): Reject loops with function
calls that are not builtins.
Kyrylo Tkachov [Wed, 26 Jun 2024 07:42:11 +0000 (09:42 +0200)]
[aarch64] Add support for -mcpu=grace
This adds support for the NVIDIA Grace CPU to aarch64.
We reuse the tuning decisions for the Neoverse V2 core, but include a
number of architecture features that are not enabled by default in
-mcpu=neoverse-v2.
This allows Grace users to more simply target the CPU with -mcpu=grace
rather than remembering what extensions to tag on top of
-mcpu=neoverse-v2.
Bootstrapped and tested on aarch64-none-linux-gnu.
gcc/
* config/aarch64/aarch64-cores.def (grace): New entry.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document the above.
Evgeny Karpov [Tue, 25 Jun 2024 21:59:35 +0000 (21:59 +0000)]
i386: Remove declaration of unused functions
The patch fixes the issue introduced in
https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=63512c72df09b43d56ac7680cdfd57a66d40c636
and reported at
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655599.html .
The patch fixes the issue with compilation on x86_64-gnu-linux
when warnings for unused functions are treated as errors.
Kewen Lin [Wed, 26 Jun 2024 07:16:17 +0000 (02:16 -0500)]
rs6000: Fix wrong RTL patterns for vector merge high/low short on LE
Commit r12-4496 changed some define_expands and define_insns
for vector merge high/low short, which are altivec_vmrg[hl]h.
These defines are mainly for the built-in functions vec_merge{h,l}
and some internal gen function needs. These functions should
take endianness into account. Taking vec_mergeh as an example, PVIPR
defines it as "Merges the first halves (in element order)
of two vectors"; note that it is in element order. So it is
mapped to vmrghh on BE and to vmrglh on LE.
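The element-order semantics can be modelled on plain arrays, independent of endianness. This is only a sketch: v8hi, merge_high_model and merge_low_model are hypothetical names, and the second function assumes that vec_mergel is the analogous merge of the second halves.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using v8hi = std::array<short, 8>;  // eight 16-bit elements

// Model of vec_mergeh: interleave the first halves in element order.
v8hi merge_high_model(const v8hi& a, const v8hi& b) {
  v8hi r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[2 * i] = a[i];
    r[2 * i + 1] = b[i];
  }
  assert(r[0] == a[0] && r[1] == b[0]);  // starts with a's first element
  return r;
}

// Model of vec_mergel (assumed analogous): interleave the second halves.
v8hi merge_low_model(const v8hi& a, const v8hi& b) {
  v8hi r{};
  for (std::size_t i = 0; i < 4; ++i) {
    r[2 * i] = a[4 + i];
    r[2 * i + 1] = b[4 + i];
  }
  return r;
}
```

The model has a single definition for both endiannesses; which machine insn implements it is a separate question, which is why the insn mapping differs between BE and LE.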
Although the mapped insns are different, as discussed in
PR106069, the RTL pattern should still be the same. That held
before commit r12-4496, but since that commit the patterns
differ between BE and LE.
Similar to the 32-bit element case in the commit log of r15-1504, this
16-bit element pattern on LE doesn't actually match what the
underlying insn is intended to represent, so once an optimization
like combine makes changes based on it, it causes
unexpected consequences. The newly constructed test case
pr106069-2.c is a typical example of this issue for element type
short.
So this patch is to fix the wrong RTL pattern, ensure the
associated RTL patterns become the same as before which can
have the same semantic as their mapped insns. With the
proposed patch, the expanders like altivec_vmrghh expands
into altivec_vmrghh_direct_be or altivec_vmrglh_direct_le
depending on endianness, "direct" can easily show which
insn would be generated, _be and _le are mainly for the
different RTL patterns as endianness.
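As a rough illustration of the element-order semantics quoted above, the
following scalar model interleaves halves of two vectors. It is a sketch,
not GCC code: merge_high_model, merge_low_model and merge_model_example are
hypothetical names, and the "last halves" behaviour of vec_mergel is assumed
by analogy with the vec_mergeh description. The model gives the same result
on BE and LE; only the insn chosen to implement it differs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference model, not GCC code: interleave the first halves of
// a and b, taken in element order (the vec_mergeh description above).
template <typename T, std::size_t N>
std::array<T, N> merge_high_model(const std::array<T, N>& a,
                                  const std::array<T, N>& b) {
  static_assert(N % 2 == 0, "element count must be even");
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N / 2; ++i) {
    r[2 * i] = a[i];
    r[2 * i + 1] = b[i];
  }
  return r;
}

// Counterpart for the last halves (assumed vec_mergel semantics).
template <typename T, std::size_t N>
std::array<T, N> merge_low_model(const std::array<T, N>& a,
                                 const std::array<T, N>& b) {
  static_assert(N % 2 == 0, "element count must be even");
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N / 2; ++i) {
    r[2 * i] = a[N / 2 + i];
    r[2 * i + 1] = b[N / 2 + i];
  }
  return r;
}

// Usage: eight 16-bit elements, as in the "short" case.
inline void merge_model_example() {
  const std::array<std::int16_t, 8> a{0, 1, 2, 3, 4, 5, 6, 7};
  const std::array<std::int16_t, 8> b{10, 11, 12, 13, 14, 15, 16, 17};
  const std::array<std::int16_t, 8> high{0, 10, 1, 11, 2, 12, 3, 13};
  const std::array<std::int16_t, 8> low{4, 14, 5, 15, 6, 16, 7, 17};
  assert(merge_high_model(a, b) == high);
  assert(merge_low_model(a, b) == low);
}
```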
Co-authored-by: Xionghu Luo <xionghuluo@tencent.com>
PR target/106069
PR target/115355
gcc/ChangeLog:
* config/rs6000/altivec.md (altivec_vmrghh_direct): Rename to ...
(altivec_vmrghh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
(altivec_vmrghh_direct_le): New define_insn.
(altivec_vmrglh_direct): Rename to ...
(altivec_vmrglh_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
(altivec_vmrglh_direct_le): New define_insn.
(altivec_vmrghh): Adjust by calling gen_altivec_vmrghh_direct_be
for BE and gen_altivec_vmrglh_direct_le for LE.
(altivec_vmrglh): Adjust by calling gen_altivec_vmrglh_direct_be
for BE and gen_altivec_vmrghh_direct_le for LE.
(vec_widen_umult_hi_v16qi): Adjust the call to
gen_altivec_vmrghh_direct by gen_altivec_vmrghh for BE
and by gen_altivec_vmrglh for LE.
(vec_widen_smult_hi_v16qi): Likewise.
(vec_widen_umult_lo_v16qi): Adjust the call to
gen_altivec_vmrglh_direct by gen_altivec_vmrglh for BE
and by gen_altivec_vmrghh for LE.
(vec_widen_smult_lo_v16qi): Likewise.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghh_direct by
CODE_FOR_altivec_vmrghh_direct_be for BE and
CODE_FOR_altivec_vmrghh_direct_le for LE. And replace
CODE_FOR_altivec_vmrglh_direct by
CODE_FOR_altivec_vmrglh_direct_be for BE and
CODE_FOR_altivec_vmrglh_direct_le for LE.
Kewen Lin [Wed, 26 Jun 2024 07:16:17 +0000 (02:16 -0500)]
rs6000: Fix wrong RTL patterns for vector merge high/low char on LE
Commit r12-4496 changed some define_expands and define_insns
for vector merge high/low char, namely altivec_vmrg[hl]b.
These patterns mainly serve the built-in functions vec_merge{h,l}
and some internal gen function needs. These functions must
take endianness into account. Taking vec_mergeh as an example,
PVIPR defines it as "Merges the first halves (in element order)
of two vectors"; note that it says element order. So it is
mapped to vmrghb on BE and to vmrglb on LE.
Although the mapped insns differ, as discussed in PR106069
the RTL pattern should still be the same. That held before
commit r12-4496, which changed it into different patterns
on BE and LE.
Similar to the 32-bit element case in the commit log of r15-1504,
this 8-bit element pattern on LE doesn't actually match what the
underlying insn is intended to represent, so once an optimization
such as combine transforms code based on it, the result can be
unexpected. The newly constructed test case pr106069-1.c is a
typical example of this issue.
So this patch fixes the wrong RTL pattern and ensures the
associated RTL patterns are the same as before, so that they
have the same semantics as their mapped insns. With this
patch, expanders like altivec_vmrghb expand into
altivec_vmrghb_direct_be or altivec_vmrglb_direct_le
depending on endianness. "direct" shows which insn is
generated, while _be and _le distinguish the RTL patterns
by endianness.
Co-authored-by: Xionghu Luo <xionghuluo@tencent.com>
PR target/106069
PR target/115355
gcc/ChangeLog:
* config/rs6000/altivec.md (altivec_vmrghb_direct): Rename to ...
(altivec_vmrghb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
(altivec_vmrghb_direct_le): New define_insn.
(altivec_vmrglb_direct): Rename to ...
(altivec_vmrglb_direct_be): ... this. Add condition BYTES_BIG_ENDIAN.
(altivec_vmrglb_direct_le): New define_insn.
(altivec_vmrghb): Adjust by calling gen_altivec_vmrghb_direct_be
for BE and gen_altivec_vmrglb_direct_le for LE.
(altivec_vmrglb): Adjust by calling gen_altivec_vmrglb_direct_be
for BE and gen_altivec_vmrghb_direct_le for LE.
* config/rs6000/rs6000.cc (altivec_expand_vec_perm_const): Replace
CODE_FOR_altivec_vmrghb_direct by
CODE_FOR_altivec_vmrghb_direct_be for BE and
CODE_FOR_altivec_vmrghb_direct_le for LE. And replace
CODE_FOR_altivec_vmrglb_direct by
CODE_FOR_altivec_vmrglb_direct_be for BE and
CODE_FOR_altivec_vmrglb_direct_le for LE.
Alexandre Oliva [Wed, 26 Jun 2024 05:08:27 +0000 (02:08 -0300)]
[libstdc++] [testsuite] no libatomic for vxworks
libatomic hasn't been ported to vxworks. Most of the stdatomic.h and
<atomic> underlying requirements are provided by builtins and libgcc,
and the vxworks libc already provides remaining __atomic symbols, so
porting libatomic doesn't seem to make sense.
However, some of the target arch-only tests in
add_options_for_libatomic cover vxworks targets, so we end up
attempting to link libatomic in, even though it's not there.
Preempt those too-broad tests.
Co-Authored-By: Marc Poulhiès <poulhies@adacore.com>
for libstdc++-v3/ChangeLog
* testsuite/lib/dg-options.exp (add_options_for_libatomic):
None for *-*-vxworks*.
Alexandre Oliva [Wed, 26 Jun 2024 05:08:18 +0000 (02:08 -0300)]
[testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
The test was too optimistic, alas. We used to vectorize shifts by
clamping the shift counts below the bit width of the types (e.g. at 15
for 16-bit vector elements), but (uint16_t)32768 >> (uint16_t)16 is
well defined (because of promotion to 32-bit int) and must yield 0,
not 1 (as before the fix).
Unfortunately, in the gimple model of vector units, such large shift
counts wouldn't be well-defined, so we won't vectorize such shifts any
more, unless we can tell they're in range or undefined.
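The difference can be reproduced with scalars. This is a minimal sketch,
assuming int is at least 32 bits wide; the function names are hypothetical
and the clamped variant only models the old vectorized behaviour described
above, it is not taken from the vectorizer.

```cpp
#include <cassert>
#include <cstdint>

// Scalar shift as the language defines it: both operands are promoted
// to int (assumed to be at least 32 bits) before the shift is done.
inline std::uint16_t shift_right_promoted(std::uint16_t x, std::uint16_t n) {
  return static_cast<std::uint16_t>(x >> n);  // well defined for n < 32
}

// Model of the old vectorized behaviour: the count is clamped to 15,
// one below the bit width of the 16-bit element type.
inline std::uint16_t shift_right_clamped(std::uint16_t x, std::uint16_t n) {
  const unsigned count = n > 15 ? 15u : n;
  return static_cast<std::uint16_t>(x >> count);
}

// Usage: the two models disagree once the count reaches the bit width.
inline void shift_example() {
  assert(shift_right_promoted(32768, 16) == 0);  // the required result
  assert(shift_right_clamped(32768, 16) == 1);   // what clamping yields
  assert(shift_right_promoted(32768, 15) == shift_right_clamped(32768, 15));
}
```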
So the test that expected the vectorization we no longer performed
needs to be adjusted. Instead of nobbling the test, Richard Earnshaw
suggested annotating the test with the expected ranges so as to enable
the optimization, and Christophe Lyon suggested a further
simplification.
Co-Authored-By: Richard Earnshaw <Richard.Earnshaw@arm.com>
for gcc/testsuite/ChangeLog
liuhongt [Thu, 20 Jun 2024 04:41:13 +0000 (12:41 +0800)]
Optimize a < 0 ? -1 : 0 to (signed)a >> 31.
Try to optimize x < 0 ? -1 : 0 into (signed) x >> 31
and x < 0 ? 1 : 0 into (unsigned) x >> 31.
Move the optimization done in ix86_expand_int_vcond to match.pd.
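The identities can be checked per element with scalars. This is a sketch
only: it uses 32-bit elements, the function names are hypothetical, and it
relies on >> of a negative signed value being an arithmetic shift, which GCC
documents and which C++20 guarantees.

```cpp
#include <cassert>
#include <cstdint>

// x < 0 ? -1 : 0, written as a select.
inline std::int32_t neg_mask_select(std::int32_t x) { return x < 0 ? -1 : 0; }

// The same value via an arithmetic shift that replicates the sign bit.
inline std::int32_t neg_mask_shift(std::int32_t x) { return x >> 31; }

// x < 0 ? 1 : 0, written as a select.
inline std::int32_t neg_flag_select(std::int32_t x) { return x < 0 ? 1 : 0; }

// The same value via a logical shift that moves the sign bit to bit 0.
inline std::int32_t neg_flag_shift(std::int32_t x) {
  return static_cast<std::int32_t>(static_cast<std::uint32_t>(x) >> 31);
}

// Usage: both forms agree on boundary values.
inline void sign_shift_example() {
  const std::int32_t samples[] = {INT32_MIN, -2, -1, 0, 1, 2, INT32_MAX};
  for (std::int32_t x : samples) {
    assert(neg_mask_select(x) == neg_mask_shift(x));
    assert(neg_flag_select(x) == neg_flag_shift(x));
  }
}
```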
gcc/ChangeLog:
PR target/114189
* match.pd: Simplify a < 0 ? -1 : 0 to (signed) a >> 31 and a <
0 ? 1 : 0 to (unsigned) a >> 31 for vector integer type.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx2-pr115517.c: New test.
* gcc.target/i386/avx512-pr115517.c: New test.
* g++.target/i386/avx2-pr115517.C: New test.
* g++.target/i386/avx512-pr115517.C: New test.
* g++.dg/tree-ssa/pr88152-1.C: Adjust testcase.
This moves all of the uses of global_dc within diagnostic.cc (including
the definition) to a new diagnostic-global-context.cc. My intent is to
make clearer those parts of our internal API that implicitly use
global_dc, and to perhaps avoid linking global_dc into a future
libdiagnostics.so.
gcc/ChangeLog:
* diagnostic-path.cc (class path_label): Add m_path field,
and use it to replace all uses of global_dc.
(event_range::event_range): Add "ctxt" param and use it to
construct m_path_label.
(event_range::maybe_add_event): Add "ctxt" param and pass it to
gcc_rich_location::add_location_if_nearby.
(path_summary::path_summary): Add "ctxt" param and pass it to
event_range::maybe_add_event.
(diagnostic_context::print_path): Pass *this to path_summary ctor.
(selftest::test_empty_path): Use "dc" when constructing
path_summary rather than implicitly using global_dc.
(selftest::test_intraprocedural_path): Likewise.
(selftest::test_interprocedural_path_1): Likewise.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.
(selftest::test_control_flow_1): Likewise.
(selftest::test_control_flow_2): Likewise.
(selftest::test_control_flow_3): Likewise.
(selftest::assert_cfg_edge_path_streq): Likewise.
(selftest::test_control_flow_5): Likewise.
(selftest::test_control_flow_6): Likewise.
(selftest::diagnostic_path_cc_tests): Eliminate use of global_dc.
* diagnostic-show-locus.cc
(gcc_rich_location::add_location_if_nearby): Add "ctxt" param and
use it instead of implicitly using global_dc.
(selftest::test_add_location_if_nearby): Use
test_diagnostic_context rather than implicitly using global_dc.
* diagnostic.cc (pedantic_warning_kind): Delete macro.
(permissive_error_kind): Delete macro.
(permissive_error_option): Delete macro.
(diagnostic_context::diagnostic_enabled): Remove use of
permissive_error_option.
(diagnostic_context::report_diagnostic): Remove use of
pedantic_warning_kind.
(diagnostic_impl): Convert to...
(diagnostic_context::diagnostic_impl): ...this.
(diagnostic_n_impl): Convert to...
(diagnostic_context::diagnostic_n_impl): ...this.
(emit_diagnostic): Explicitly use global_dc for method call.
(emit_diagnostic_valist): Likewise.
(emit_diagnostic_valist_meta): Likewise.
(inform): Likewise.
(inform_n): Likewise.
(warning): Likewise.
(warning_at): Likewise.
(warning_meta): Likewise.
(warning_n): Likewise.
(pedwarn): Likewise.
(permerror): Likewise.
(permerror_opt): Likewise.
(error): Likewise.
(error_n): Likewise.
(error_at): Likewise.
(error_meta): Likewise.
(sorry): Likewise.
(sorry_at): Likewise.
(fatal_error): Likewise.
(internal_error): Likewise.
(internal_error_no_backtrace): Likewise.
* diagnostic.h (diagnostic_context::diagnostic_impl): New decl.
(diagnostic_context::diagnostic_n_impl): New decl.
* gcc-rich-location.h (gcc_rich_location::add_location_if_nearby):
Add "ctxt" param.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
David Malcolm [Wed, 26 Jun 2024 00:26:21 +0000 (20:26 -0400)]
testsuite: use check-jsonschema for validating .sarif files [PR109360]
As reported here:
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655434.html
the schema validation I added for generated .sarif files in r15-1541-ga84fe222029ff2 used the "jsonschema" command line tool, which
has been deprecated by more recent versions of the Python 3 "jsonschema"
module.
This patch updates the validation to use the more recent
"check-jsonschema" command line tool, from the Python 3 "check-jsonschema"
module, fixing the testsuite FAILs due to the deprecation message.
As an added bonus, the output on validation failures is *much* nicer, e.g.
if I undo r15-1540-g9f4fdc3acebcf6, the error messages begin like this:
verify-sarif-file: res: Schema validation errors were encountered.
diagnostic-format-sarif-file-bad-utf8-pr109098-1.c.sarif::$.runs[0].results[0].locations[0].physicalLocation.region.startColumn: 0 is less than the minimum of 1
diagnostic-format-sarif-file-bad-utf8-pr109098-1.c.sarif::$.runs[0].results[0].relatedLocations[0].physicalLocation.region.startColumn: 0 is less than the minimum of 1
diagnostic-format-sarif-file-bad-utf8-pr109098-1.c.sarif::$.runs[0].results[0].relatedLocations[1].physicalLocation.region.startColumn: 0 is less than the minimum of 1
diagnostic-format-sarif-file-bad-utf8-pr109098-1.c.sarif::$.runs[0].results[0].relatedLocations[2].physicalLocation.region.startColumn: 0 is less than the minimum of 1
child process exited abnormally
FAIL: c-c++-common/diagnostic-format-sarif-file-bad-utf8-pr109098-1.c -Wc++-compat (test .sarif output against SARIF schema)
Tested with Python 3.8 with check_jsonschema 0.28.6
gcc/ChangeLog:
PR testsuite/109360
* doc/install.texi (Python3 modules): Update SARIF validation
requirement to use check-jsonschema rather than jsonschema.
gcc/testsuite/ChangeLog:
PR testsuite/109360
* lib/scansarif.exp (verify-sarif-file): Use check-jsonschema
rather than jsonschema, updating the invocation accordingly.
* lib/target-supports.exp (check_effective_target_jsonschema): Convert
to...
(check_effective_target_check_jsonschema): ...this.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
Dispatching to partial specializations doesn't really seem to offer much
benefit here. The __is_trivial(T) condition is a compile-time constant
so the untaken branches are dead code and don't cost us anything.
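A minimal sketch of the idea, assuming C++17: a single function template
branches on a compile-time constant instead of dispatching to a helper class
template with partial specializations. The names default_construct_range and
Counter are hypothetical, and std::is_trivial_v stands in for the
__is_trivial(T) built-in; this is not the libstdc++ code.

```cpp
#include <cassert>
#include <new>
#include <type_traits>

// Hypothetical helper, not the libstdc++ function: default-construct
// the elements of [b, e) in one function. The condition is a constant,
// so only one branch is instantiated for a given T.
template <typename T>
void default_construct_range(T* b, T* e) {
  if constexpr (std::is_trivial_v<T>) {
    for (; b != e; ++b) *b = T();  // value-initialise by assignment
  } else {
    for (; b != e; ++b) ::new (static_cast<void*>(b)) T();
  }
}

// Hypothetical non-trivial type used to exercise the second branch.
struct Counter {
  int v;
  Counter() : v(7) {}
};

// Usage: one call site covers trivial and non-trivial element types.
inline void default_construct_example() {
  int ints[3] = {1, 2, 3};
  default_construct_range(ints, ints + 3);
  assert(ints[0] == 0 && ints[1] == 0 && ints[2] == 0);

  Counter counters[2];
  counters[0].v = 0;
  counters[1].v = 0;
  default_construct_range(counters, counters + 2);
  assert(counters[0].v == 7 && counters[1].v == 7);
}
```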
libstdc++-v3/ChangeLog:
* include/bits/valarray_array.h (_Array_default_ctor): Remove.
(__valarray_default_construct): Inline it into here.
(_Array_init_ctor): Remove.
(__valarray_fill_construct): Inline it into here.
(_Array_copy_ctor): Remove.
(__valarray_copy_construct(const T*, const T*, T*)): Inline it
into here.
(__valarray_copy_construct(const T*, size_t, size_t, T*)):
Use _GLIBCXX17_CONSTEXPR for constant condition.
Gaius Mulley [Tue, 25 Jun 2024 22:11:29 +0000 (23:11 +0100)]
modula2: tidyup remove unused procedures and unused parameters
This patch removes M2GenGCC.mod:QuadCondition and
M2Quads.mod:GenQuadOTypeUniquetok. It also removes unused parameter
WalkAction for all FoldIf procedures.
Marek Polacek [Mon, 17 Jun 2024 21:53:12 +0000 (17:53 -0400)]
c++: ICE with __has_unique_object_representations [PR115476]
Here we started to ICE with r13-25: in check_trait_type, for "X[]" we
return true here:
if (kind == 1 && TREE_CODE (type) == ARRAY_TYPE && !TYPE_DOMAIN (type))
return true; // Array of unknown bound. Don't care about completeness.
and then end up crashing in record_has_unique_obj_representations:
4836 if (cur != wi::to_offset (sz))
because sz is null.
https://eel.is/c++draft/type.traits#tab:meta.unary.prop-row-47-column-3-sentence-1
says that the preconditions for __has_unique_object_representations are:
"T shall be a complete type, cv void, or an array of unknown bound" and
that "For an array type T, the same result as
has_unique_object_representations_v<remove_all_extents_t<T>>" so T[]
should be treated as T. So we should use kind==2 for the trait.
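The quoted rule can be stated as a compile-time check through the standard
library trait. This is only a sketch with complete element types (the ICE
itself is not reproduced here); Pair and unbounded_array_matches_element are
hypothetical names.

```cpp
#include <type_traits>

// Hypothetical complete class type used as an array element.
struct Pair {
  int a;
  int b;
};

// For an array of unknown bound T[], the trait should give the same
// answer as for the element type with all extents removed.
template <typename T>
constexpr bool unbounded_array_matches_element =
    std::has_unique_object_representations_v<T[]> ==
    std::has_unique_object_representations_v<std::remove_all_extents_t<T[]>>;

static_assert(unbounded_array_matches_element<int>, "int[] vs int");
static_assert(unbounded_array_matches_element<Pair>, "Pair[] vs Pair");
static_assert(unbounded_array_matches_element<float>, "float[] vs float");
```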
PR c++/115476
gcc/cp/ChangeLog:
* semantics.cc (finish_trait_expr)
<case CPTK_HAS_UNIQUE_OBJ_REPRESENTATIONS>: Move below to call
check_trait_type with kind==2.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1z/has-unique-obj-representations4.C: New test.
Sergei Lewis [Tue, 25 Jun 2024 21:26:14 +0000 (15:26 -0600)]
[PATCH v2 3/3] RISC-V: cmpmem for RISCV with V extension
So this is the cmpmem patch from Sergei, updated for the trunk.
Updates included adjusting the existing cmpmemsi expander to
conditionally try expansion via vector. And a minor testsuite
adjustment to turn off vector expansion in one test that is primarily
focused on vset optimization and ensuring we don't have extras.
I've spun this in my tester successfully and just want to see a clean
run through precommit CI before moving forward.
Jeff
gcc/ChangeLog:
* config/riscv/riscv-protos.h (riscv_vector::expand_vec_cmpmem): New
function declaration.
* config/riscv/riscv-string.cc (riscv_vector::expand_vec_cmpmem): New
function.
* config/riscv/riscv.md (cmpmemsi): Try riscv_vector::expand_vec_cmpmem
for constant lengths.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/base/cmpmem-1.c: New codegen tests
* gcc.target/riscv/rvv/base/cmpmem-2.c: New execution tests
* gcc.target/riscv/rvv/base/cmpmem-3.c: New codegen tests
* gcc.target/riscv/rvv/base/cmpmem-4.c: New codegen tests
* gcc.target/riscv/rvv/autovec/vls/misalign-1.c: Turn off vector mem* and
str* handling.
Marek Polacek [Fri, 14 Jun 2024 21:50:29 +0000 (17:50 -0400)]
c++: ICE with generic lambda and pack expansion [PR115425]
In r13-272 we hardened the *_PACK_EXPANSION and *_ARGUMENT_PACK macros.
That trips up here because make_pack_expansion returns error_mark_node
and we access that with PACK_EXPANSION_LOCAL_P.
PR c++/115425
gcc/cp/ChangeLog:
* pt.cc (tsubst_pack_expansion): Return error_mark_node if
make_pack_expansion doesn't work out.
Marek Polacek [Tue, 18 Jun 2024 14:50:49 +0000 (10:50 -0400)]
c++: ICE with __dynamic_cast redecl [PR115501]
Since r13-3299, build_dynamic_cast_1 calls pushdecl which calls
duplicate_decls and that in this testcase emits the "conflicting
declaration" error and returns error_mark_node, so the subsequent
build_cxx_call crashes on the error_mark_node.
PR c++/115501
gcc/cp/ChangeLog:
* rtti.cc (build_dynamic_cast_1): Return if dcast_fn is erroneous.
ira: Scale save/restore costs of callee save registers with block frequency
In assign_hard_reg(), when computing the costs of the hard registers, the
cost of saving/restoring a callee-save hard register in prolog/epilog is
taken into consideration. However, this cost is not scaled with the entry
block frequency. Without scaling, the cost of saving/restoring is quite
small and this can result in a callee-save register being chosen by
assign_hard_reg() even though there are free caller-save registers
available. Assigning a callee save register to a pseudo that is live
in the entire function and across a call will cause shrink wrap to fail.
Gaius Mulley [Tue, 25 Jun 2024 17:35:22 +0000 (18:35 +0100)]
PR modula2/115536 Expression is evaluated incorrectly when encountering relops and indirection
This fix ensures that we only call BuildRelOpFromBoolean if we are
inside a constant expression (where no indirection can be used).
The fix creates a temporary variable when a boolean is created from
a relop in other cases.
The previous pattern implementation would not work if the operands required
dereferencing in non-const expressions. Comparisons of relop results
in a constant expression are resolved by constant propagation, basic
block analysis and dead code removal. After the quadruples have been
optimized, only one assignment to the boolean variable remains for
const expressions. The patch removes all quadruple pattern checking for
boolean expressions, which makes the implementation more generic.
gcc/m2/ChangeLog:
PR modula2/115536
* gm2-compiler/M2BasicBlock.def (GetBasicBlockScope): New procedure.
(GetBasicBlockStart): Ditto.
(GetBasicBlockEnd): Ditto.
(IsBasicBlockFirst): New procedure function.
* gm2-compiler/M2BasicBlock.mod (ConvertQuads2BasicBlock): Allow
conditional boolean quads to be removed.
(GetBasicBlockScope): Implement new procedure.
(GetBasicBlockStart): Ditto.
(GetBasicBlockEnd): Ditto.
(IsBasicBlockFirst): Implement new procedure function.
* gm2-compiler/M2GCCDeclare.def (FoldConstants): New parameter
declaration.
* gm2-compiler/M2GCCDeclare.mod (FoldConstants): New parameter
declaration.
(DeclareTypesConstantsProceduresInRange): Recreate basic blocks
after resolving constant expressions.
(CodeBecomes): Guard IsVariableSSA with IsVar.
* gm2-compiler/M2GenGCC.def (ResolveConstantExpressions): New
parameter declaration.
* gm2-compiler/M2GenGCC.mod (FoldIfLess): Remove relop pattern
detection.
(FoldIfGre): Ditto.
(FoldIfLessEqu): Ditto.
(FoldIfGreEqu): Ditto.
(FoldIfIn): Ditto.
(FoldIfNotIn): Ditto.
(FoldIfEqu): Ditto.
(FoldIfNotEqu): Ditto.
(FoldBecomes): Add BasicBlock parameter and allow conditional
boolean becomes to be folded in the first basic block.
(ResolveConstantExpressions): Reimplement.
* gm2-compiler/M2Quads.def (IsConstQuad): New procedure function.
(IsConditionalBooleanQuad): Ditto.
* gm2-compiler/M2Quads.mod (IsConstQuad): Implement new procedure function.
(IsConditionalBooleanQuad): Ditto.
(MoveWithMode): Use GenQuadOTypetok.
(IsInitialisingConst): Rewrite using OpUsesOp1.
(OpUsesOp1): New procedure function.
(doBuildAssignment): Mark des as a VarConditional.
(ConvertBooleanToVariable): Call PutVarConditional.
(DumpQuadSummary): New procedure.
(BuildRelOpFromBoolean): Updated debugging and improved comments.
(BuildRelOp): Only call BuildRelOpFromBoolean if we are in a const
expression and both operands are boolean relops.
(GenQuadOTypeUniquetok): New procedure.
(BackPatch): Correct comment.
* gm2-compiler/SymbolTable.def (PutVarConditional): New procedure.
(IsVarConditional): New procedure function.
* gm2-compiler/SymbolTable.mod (PutVarConditional): Implement new
procedure.
(IsVarConditional): Implement new procedure function.
(SymConstVar): New field IsConditional.
(SymVar): New field IsConditional.
(MakeVar): Initialize IsConditional field.
(MakeConstVar): Initialize IsConditional field.
* gm2-compiler/M2Swig.mod (DoBasicBlock): Change parameters to
use BasicBlock.
* gm2-compiler/M2Code.mod (SecondDeclareAndOptimize): Use iterator
to FoldConstants over basic block list.
* gm2-compiler/M2SymInit.mod (AppendEntry): Replace parameters
with BasicBlock.
* gm2-compiler/P3Build.bnf (Relation): Call RecordOp for #, <> and =.
gcc/testsuite/ChangeLog:
PR modula2/115536
* gm2/iso/const/pass/constbool4.mod: New test.
* gm2/iso/const/pass/constbool5.mod: New test.
* gm2/iso/run/pass/condtest2.mod: New test.
* gm2/iso/run/pass/condtest3.mod: New test.
* gm2/iso/run/pass/condtest4.mod: New test.
* gm2/iso/run/pass/condtest5.mod: New test.
* gm2/iso/run/pass/constbool4.mod: New test.
Jeff Law [Tue, 25 Jun 2024 17:22:01 +0000 (11:22 -0600)]
[committed] Fix fr30-elf newlib build failure with late-combine
So the late combine work has exposed a latent bug in the fr30 port.
The fr30 "call" instruction is pc-relative with a *very* limited range, 12 bits
to be precise.
With such a limited range it's hard to see how we could ever consistently use it
in the compiler, with the possible exception of self-recursion. Even for a
call to a locally binding function -ffunction-sections and linker placement of
functions may separate the caller/callee. Code generation seemed to be using
indirect forms pretty consistently, though the RTL would allow direct calls.
With late-combine some of those indirects would be optimized into direct calls.
This naturally led to out of range scenarios.
With the fr30 port slated for removal unless it gets updated to use LRA, and
given the fundamental problems with direct calls, I took the shortest path to
keep things working -- namely forcing all calls to be indirect.
Tested in my tester with no regressions (and fixes the newlib build failure
with late-combine enabled). Pushed to the trunk.
gcc/
* config/fr30/constraints.md (Q): Remove unused constraint.
* config/fr30/predicates.md (call_operand): Remove unused predicate.
* config/fr30/fr30.md (call, call_value): Turn into expanders and
force the call address into a register.
(*call, *call_value): Adjust to only allow indirect calls. Adjust
output template accordingly.
Patrick Palka [Tue, 25 Jun 2024 16:59:24 +0000 (12:59 -0400)]
c++: alias CTAD and copy deduction guide [PR115198]
Here we're neglecting to update DECL_NAME during the alias CTAD guide
transformation, which causes copy_guide_p to return false for the
transformed copy deduction guide since DECL_NAME is still __dguide_C
with TREE_TYPE C<B, T> but it should be __dguide_A with TREE_TYPE A<T>
(i.e. C<false, T>). This ultimately results in ambiguity during
overload resolution between the copy deduction guide and the copy ctor guide.
This patch makes us update DECL_NAME of a transformed guide accordingly
during alias/inherited CTAD.
PR c++/115198
gcc/cp/ChangeLog:
* pt.cc (alias_ctad_tweaks): Update DECL_NAME of the transformed
guides.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/class-deduction-alias22.C: New test.
Patrick Palka [Tue, 25 Jun 2024 14:42:21 +0000 (10:42 -0400)]
c++: using non-dep array var of unknown bound [PR115358]
For a non-dependent array variable of unknown bound, it seems we need to
try instantiating its definition upon use in a template context for the
sake of proper checking and typing of the overall expression, like we do for
function specializations with deduced return type.
PR c++/115358
gcc/cp/ChangeLog:
* decl2.cc (mark_used): Call maybe_instantiate_decl for an array
variable with unknown bound.
* semantics.cc (finish_decltype_type): Remove now redundant
handling of array variables with unknown bound.
* typeck.cc (cxx_sizeof_expr): Likewise.
Sandra Loosemore [Tue, 25 Jun 2024 13:54:43 +0000 (13:54 +0000)]
Fix PR c/115587, uninitialized variable in c_parser_omp_loop_nest
This function had a reference to an uninitialized variable on the
error path. The problem was diagnosed by clang but not gcc. It seems
the cleanest solution is to initialize all the loop-clause variables
at the point of declaration rather than at different places in the
code.
The C++ front end didn't have this problem, but I've made similar
changes there to keep the code in sync.
gcc/c/ChangeLog:
PR c/115587
* c-parser.cc (c_parser_omp_loop_nest): Move initializations to
point of declaration.
gcc/cp/ChangeLog:
PR c/115587
* parser.cc (cp_parser_omp_loop_nest): Move initializations to
point of declaration.
libatomic: Add rcpc3 128-bit atomic operations for AArch64
The introduction of the optional RCPC3 architectural extension for
Armv8.2-A upwards provides additional support for the release
consistency model, introducing the Load-Acquire RCpc Pair Ordered, and
Store-Release Pair Ordered operations in the form of LDIAPP and STILP.
These operations are single-copy atomic on cores which also implement
LSE2 and, as such, support for these operations is added to Libatomic
and employed accordingly when the LSE2 and RCPC3 features are detected
in a given core at runtime.
libatomic/ChangeLog:
* config/linux/aarch64/atomic_16.S (libat_load_16): Add LRCPC3
variant.
(libat_store_16): Likewise.
* config/linux/aarch64/host-config.h (HWCAP2_LRCPC3): New.
(LSE2_LRCPC3_ATOP): Previously LSE2_ATOP. New ifuncs guarded
under it.
(has_rcpc3): New.
for details. This change was different from the others in that the
original call was to simplify_subreg rather than simplify_lowpart_subreg.
The old code would therefore go on to do the force_reg for more cases
than the new code would.
gcc/
* expmed.cc (store_bit_field_using_insv): Revert earlier change
to use force_subreg instead of simplify_gen_subreg.
YunQiang Su [Wed, 19 Jun 2024 17:20:36 +0000 (01:20 +0800)]
MIPS: Implement vcond_mask optabs for MSA
Currently, we have `mips_expand_vec_cond_expr`, which calculates
cmp_res first. We can just add an extra argument that asks it
to use operands[3] as cmp_res instead of calculating it from operands[4]
and operands[5].
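A scalar sketch of the relationship, assuming the usual meaning of a vector
condition mask (a nonzero lane selects the first value). All names here are
hypothetical, and the functions do not mirror the real signature of
mips_expand_vec_cond_expr; the point is only that the compare-then-select
form can reuse the select step once the mask is already available.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Vec = std::array<std::int32_t, 4>;

// Mask-driven select: a nonzero mask lane picks if_true, else if_false.
inline Vec select_model(const Vec& if_true, const Vec& if_false,
                        const Vec& mask) {
  Vec r{};
  for (std::size_t i = 0; i < r.size(); ++i)
    r[i] = mask[i] != 0 ? if_true[i] : if_false[i];
  return r;
}

// Comparison result as a mask: all ones (-1) where x < y, else 0.
inline Vec less_than_mask(const Vec& x, const Vec& y) {
  Vec r{};
  for (std::size_t i = 0; i < r.size(); ++i) r[i] = x[i] < y[i] ? -1 : 0;
  return r;
}

// Compare-then-select form: compute the mask first, then reuse the select.
inline Vec cond_lt_model(const Vec& if_true, const Vec& if_false,
                         const Vec& x, const Vec& y) {
  return select_model(if_true, if_false, less_than_mask(x, y));
}

// Usage: a precomputed mask and a fresh comparison give the same result.
inline void cond_mask_example() {
  const Vec a{1, 2, 3, 4};
  const Vec b{10, 20, 30, 40};
  const Vec x{0, 5, 0, 5};
  const Vec y{1, 1, 1, 1};
  const Vec mask{-1, 0, -1, 0};
  const Vec expected{1, 20, 3, 40};
  assert(less_than_mask(x, y) == mask);
  assert(select_model(a, b, mask) == expected);
  assert(cond_lt_model(a, b, x, y) == expected);
}
```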
gcc
* config/mips/mips.cc (mips_expand_vec_cond_expr): Add extra
argument to indicate that operands[3] is cmp_res already.
* config/mips/mips-protos.h (mips_expand_vec_cond_expr): Ditto.
* config/mips/mips-msa.md (vcond_mask): Define new expand.
(vcondu): Use mips_expand_vec_cond_expr with 4th argument.
(vcond): Ditto.
Evgeny Karpov [Sat, 8 Jun 2024 13:49:17 +0000 (13:49 +0000)]
aarch64: Add DLL import/export to AArch64 target
This patch reuses the MinGW implementation to enable DLL import/export
functionality for the aarch64-w64-mingw32 target. It also modifies
environment configurations for MinGW.
Evgeny Karpov [Mon, 24 Jun 2024 12:44:58 +0000 (12:44 +0000)]
aarch64: Add selectany attribute handling
This patch extends the aarch64 attributes list with the selectany
attribute for the aarch64-w64-mingw32 target and reuses the mingw
implementation to handle it.
Evgeny Karpov [Mon, 24 Jun 2024 12:43:05 +0000 (12:43 +0000)]
Rename functions for reuse in AArch64
This patch renames functions related to dllimport/dllexport and
selectany functionality. These functions will be reused in the
aarch64-w64-mingw32 target.