In s390_expand_insv(), when generating code for ICM et al., src is a MEM,
and gen_lowpart might force src into a register such that we end up with
patterns which no longer match. Use adjust_address() instead in
order to preserve a MEM.
Furthermore, it is not straightforward to enforce a subreg. For
example, in case of a paradoxical subreg, gen_lowpart() may return a
register. In order to compensate for this, s390_gen_lowpart_subreg() emits
a reference to a pseudo which does not coincide with its definition
which is wrong. Additionally, if dest is a paradoxical subreg, then do
not try to emit a strict_low_part since it could mean that dest was not
initialized even though this might be fixed up later by init-regs.
Splitters for insns *get_tp_64, *zero_extendhisi2_31,
*zero_extendqisi2_31, *zero_extendqihi2_31 are applied after reload.
Thus, operands[0] is a hard register and gen_lowpart (m, operands[0])
just returns the hard register for mode m which is fine to use as an
argument for strict_low_part, i.e., we do not need to enforce subregs
here since after reload subregs are supposed to be eliminated anyway.
This fixes gcc.dg/torture/pr111821.c.
gcc/ChangeLog:
* config/s390/s390-protos.h (s390_gen_lowpart_subreg): Remove.
* config/s390/s390.cc (s390_gen_lowpart_subreg): Remove.
(s390_expand_insv): Use adjust_address() and emit a
strict_low_part only in case of a natural subreg.
* config/s390/s390.md: Use gen_lowpart() instead of
s390_gen_lowpart_subreg().
doc: Add more alias option and reorder Intel CPU -march documentation
This patch is backported from GCC15 with some tweaks.
Since r15-3539, there are requests coming in to add other alias option
documentation. This patch will add all of them, including corei7, corei7-avx,
core-avx-i, core-avx2, atom, slm, gracemont and emeraldrapids.
Also in the patch, I reordered that part of documentation, currently all
the CPUs/products are just all over the place. I regrouped them by
date-to-now products (since the very first CPU to latest Panther Lake), P-core
(since the clients become hybrid cores, starting from Sapphire Rapids) and
E-core (since Bonnell to latest Clearwater Forest). In GCC14 and
earlier GCC, Xeon Phi CPUs are still there; I put them after E-core
CPUs.
And in the patch, I refined the product names in documentation.
gcc/ChangeLog:
* doc/invoke.texi: Add corei7, corei7-avx, core-avx-i,
core-avx2, atom, slm, gracemont and emeraldrapids. Reorder
the -march documentation by splitting them into date-to-now
products, P-core, E-core and Xeon Phi. Refine the product names in
documentation.
Richard Biener [Mon, 26 Aug 2024 11:50:00 +0000 (13:50 +0200)]
tree-optimization/116460 - ICE with DCE in forwprop
The following avoids removing stmts with defs that might still have
uses in the IL before calling simple_dce_from_worklist which might
remove those as that will wreck debug stmt generation. Instead first
perform use-based DCE and then remove stmts which may have uses in
code that CFG cleanup will remove. This requires tracking stmts
in to_remove by their SSA def so we can check whether it was removed
before without running into the issue that PHIs can be ggc_free()d
upon removal. So this adds to_remove_defs in addition to to_remove
which has to stay to track GIMPLE_NOPs we want to elide.
PR tree-optimization/116460
* tree-ssa-forwprop.cc (pass_forwprop::execute): First do
simple_dce_from_worklist and then remove stmts in to_remove.
Track defs to be removed in to_remove_defs.
Richard Biener [Tue, 11 Jun 2024 11:11:08 +0000 (13:11 +0200)]
middle-end/115426 - wrong gimplification of "rm" asm output operand
When the operand is gimplified to an extract of a register or a
register we have to disallow memory as we otherwise fail to
gimplify it properly. Instead of
Andrew Pinski [Thu, 22 Aug 2024 00:41:38 +0000 (17:41 -0700)]
fold: Fix `a * 1j` if a has side effects [PR116454]
The problem here was a missing save_expr around arg0 since
it is used twice, once in REALPART_EXPR and once in IMAGPART_EXPR.
This adds the save_expr and reformats the code slightly so it is a
little easier to understand. It excludes the case when arg0 is
a COMPLEX_EXPR since in that case we'll end up with the distinct
real and imaginary parts. This is important to retain early
optimization in some testcases.
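To illustrate why the operand must be evaluated only once, here is a minimal C++ sketch. It is not the fold-const.cc code; `evaluations`, `next_value` and `times_i` are hypothetical names made up for this example. Multiplying re + im*i by i gives -im + re*i, so both parts of the operand are read, and binding the operand to a local first plays the role that save_expr plays in the fold.

```cpp
#include <cassert>
#include <complex>

// Hypothetical operand with a side effect: every call bumps a counter.
int evaluations = 0;

std::complex<double> next_value()
{
  ++evaluations;
  return {3.0, 4.0};
}

// (re + im*i) * i == -im + re*i.  Both parts of the operand are needed,
// so the operand is bound to a local first (the analogue of save_expr)
// instead of being evaluated once per part.
template <typename F>
std::complex<double> times_i(F operand)
{
  const std::complex<double> saved = operand();  // evaluated exactly once
  return {-saved.imag(), saved.real()};
}
```

Calling `times_i(next_value)` leaves `evaluations` at 1; a version that called `operand()` separately for the real and imaginary parts would run the side effect twice.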
Bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR middle-end/116454
gcc/ChangeLog:
* fold-const.cc (fold_binary_loc): Fix `a * +-1i`
by wrapping arg0 with save_expr when it is not COMPLEX_EXPR.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr116454-1.c: New test.
* gcc.dg/torture/pr116454-2.c: New test.
Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>
Co-Authored-By: Richard Biener <rguenther@suse.de>
(cherry picked from commit b07f8a301158e53717b8688cc8ea430b6f02574c)
Richard Biener [Wed, 21 Aug 2024 11:56:40 +0000 (13:56 +0200)]
tree-optimization/116380 - bogus SSA update with loop distribution
When updating LC PHIs after copying loops we have to handle defs
defined outside of the loop appropriately (by not setting them to
NULL ...). This mimics how we handle this in the SSA updating
code of the vectorizer.
The following tries to address that the vectorizer fails to have
precise knowledge of argument and return calling conventions and
views some accesses as loads and stores that are not.
This is mainly important when doing basic-block vectorization as
otherwise loop indexing would force such arguments to memory.
On x86 the reduction in the number of apparent loads and stores
often dominates cost analysis so the following tries to mitigate
this aggressively by adjusting only the scalar load and store
cost, reducing them to the cost of a simple scalar statement,
but not touching the vector access cost which would be much
harder to estimate. Thereby we err on the side of not performing
basic-block vectorization.
PR tree-optimization/116274
* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Cost scalar loads
and stores as simple scalar stmts when they access a non-global,
not address-taken variable that doesn't have BLKmode assigned.
Andrew Pinski [Wed, 7 Aug 2024 16:36:38 +0000 (09:36 -0700)]
aarch64/testsuite: Add testcases for recently fixed PRs
The commit for PR 116258 added an x86_64-specific testcase;
I thought it would be a good idea to add an aarch64 testcase too.
Since it also fixed VLA vectors, add an SVE testcase as well.
Pushed as obvious after a test for aarch64-linux-gnu.
PR middle-end/116258
PR middle-end/116259
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr116258.c: New test.
* gcc.target/aarch64/sve/pr116259-1.c: New test.
Richard Biener [Thu, 18 Jul 2024 11:35:33 +0000 (13:35 +0200)]
middle-end/115641 - invalid address construction
fold_truth_andor_1 via make_bit_field_ref builds an address of
a CALL_EXPR which isn't valid GENERIC and later causes an ICE.
The following simply avoids the folding for f ().a != 1 || f ().b != 2
as it is a premature optimization anyway. The alternative would
have been to build a TARGET_EXPR around the call. To get this far
f () has to be const as otherwise the two calls are not semantically
equivalent for the optimization.
PR middle-end/115641
* fold-const.cc (decode_field_reference): If the inner
reference isn't something we can take the address of, fail.
H.J. Lu [Fri, 6 Sep 2024 12:24:07 +0000 (05:24 -0700)]
x86-64: Don't use temp for argument in a TImode register
Don't use temp for a PARALLEL BLKmode argument of an EXPR_LIST expression
in a TImode register. Otherwise, the TImode variable will be put in
the GPR save area which guarantees only 8-byte alignment.
gcc/
PR target/116621
* config/i386/i386.cc (ix86_gimplify_va_arg): Don't use temp for
a PARALLEL BLKmode container of an EXPR_LIST expression in a
TImode register.
gcc/testsuite/
PR target/116621
* gcc.target/i386/pr116621.c: New test.
Marek Polacek [Tue, 3 Sep 2024 21:01:48 +0000 (17:01 -0400)]
c++: ICE with TTP [PR96097]
We crash when dependent_type_p gets a TEMPLATE_TYPE_PARM outside
a template. That happens here because in
  template <template <typename T, typename T::type TT> typename X>
  void func() {}
  template <typename U, int I>
  struct Y {};
  void g() { func<Y>(); }
when performing overload resolution for func<Y>() we have to check
if U matches T and I matches TT. So we wind up in
coerce_template_template_parm/PARM_DECL. TREE_TYPE (arg) is int
so we try to substitute TT's type, which is T::type. But we have
nothing to substitute T with. And we call make_typename_type where
ctx is still T, which checks dependent_scope_p and we trip the assert.
It should work to always perform the substitution in a template context.
If the result still contains template parameters, we cannot say if they
match.
PR c++/96097
gcc/cp/ChangeLog:
* pt.cc (coerce_template_template_parm): Increment
processing_template_decl before calling tsubst.
Jakub Jelinek [Thu, 12 Sep 2024 16:22:21 +0000 (18:22 +0200)]
c++: Disable deprecated/unavailable diagnostics when creating thunks for methods with such attributes [PR116636]
On the following testcase, we emit false positive warnings/errors about using
the deprecated or unavailable methods when creating thunks for them, even
when nothing (in the testcase so far) actually used those.
The following patch temporarily disables those diagnostics when creating
the thunks.
2024-09-12 Jakub Jelinek <jakub@redhat.com>
PR c++/116636
* method.cc: Include decl.h.
(use_thunk): Temporarily change deprecated_state to
UNAVAILABLE_DEPRECATED_SUPPRESS.
Jakub Jelinek [Tue, 10 Sep 2024 16:32:58 +0000 (18:32 +0200)]
c++: Fix get_member_function_from_ptrfunc with -fsanitize=bounds [PR116449]
The following testcase is miscompiled, because
get_member_function_from_ptrfunc
emits something like
  (((FUNCTION.__pfn & 1) != 0)
   ? ptr + FUNCTION.__delta + FUNCTION.__pfn - 1
   : FUNCTION.__pfn) (ptr + FUNCTION.__delta, ...)
or so, so FUNCTION tree is used there 5 times. There is
  if (TREE_SIDE_EFFECTS (function)) function = save_expr (function);
but in this case function doesn't have side-effects, just nested ARRAY_REFs.
Now, if all the FUNCTION trees would be shared, it would work fine,
FUNCTION is evaluated in the first operand of COND_EXPR; but unfortunately
that isn't the case, both the BIT_AND_EXPR shortening and conversion to
bool done for build_conditional_expr actually unshare_expr that first
expression, but none of the other 4 are unshared. With -fsanitize=bounds,
.UBSAN_BOUNDS calls are added to the ARRAY_REFs and use save_expr to avoid
evaluating the argument multiple times, but because that FUNCTION tree is
first used in the second argument of COND_EXPR (i.e. conditionally), the
SAVE_EXPR initialization is done just there and then the third argument
of COND_EXPR just uses the uninitialized temporary and so does the first
argument computation as well.
The following patch fixes that by doing save_expr even if !TREE_SIDE_EFFECTS,
but to avoid doing that too often only if !nonvirtual and if the expression
isn't a simple decl.
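For orientation, the source-level shape being described looks roughly like the following C++ sketch. This is not the testcase from the PR; `Base`, `Derived`, `table` and `call_through_table` are hypothetical names. The function expression is a nested array reference with no side effects, so the TREE_SIDE_EFFECTS check alone would not wrap it in a save_expr.

```cpp
#include <cassert>

struct Base {
  virtual ~Base() = default;
  virtual int id() const { return 1; }
};

struct Derived : Base {
  int id() const override { return 2; }
};

using Getter = int (Base::*)() const;

// A two-dimensional table of pointers to member functions.  Indexing it
// yields nested array references, which have no side effects of their own.
Getter table[2][2] = {{&Base::id, &Base::id}, {&Base::id, &Base::id}};

// The callee expression table[i][j] is what the compiler has to reuse
// several times when it expands the pointer-to-member call.
int call_through_table(const Base& obj, int i, int j)
{
  return (obj.*table[i][j])();
}
```

Because `id` is virtual, the call goes through the virtual branch of the expansion shown above, which is the branch where the pointer-to-member value is read more than once.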
2024-09-10 Jakub Jelinek <jakub@redhat.com>
PR c++/116449
* typeck.cc (get_member_function_from_ptrfunc): Use save_expr
on instance_ptr and function even if it doesn't have side-effects,
as long as it isn't a decl.
Jakub Jelinek [Sat, 7 Sep 2024 07:36:53 +0000 (09:36 +0200)]
libiberty: Fix up > 64K section handling in simple_object_elf_copy_lto_debug_section [PR116614]
cat abc.C
#define A(n) struct T##n {} t##n;
#define B(n) A(n##0) A(n##1) A(n##2) A(n##3) A(n##4) A(n##5) A(n##6) A(n##7) A(n##8) A(n##9)
#define C(n) B(n##0) B(n##1) B(n##2) B(n##3) B(n##4) B(n##5) B(n##6) B(n##7) B(n##8) B(n##9)
#define D(n) C(n##0) C(n##1) C(n##2) C(n##3) C(n##4) C(n##5) C(n##6) C(n##7) C(n##8) C(n##9)
#define E(n) D(n##0) D(n##1) D(n##2) D(n##3) D(n##4) D(n##5) D(n##6) D(n##7) D(n##8) D(n##9)
E(1) E(2) E(3)
int main () { return 0; }
./xg++ -B ./ -o abc{.o,.C} -flto -flto-partition=1to1 -O2 -g -fdebug-types-section -c
./xgcc -B ./ -o abc{,.o} -flto -flto-partition=1to1 -O2
(not included in testsuite as it takes a while to compile) FAILs with
lto-wrapper: fatal error: Too many copied sections: Operation not supported
compilation terminated.
/usr/bin/ld: error: lto-wrapper failed
collect2: error: ld returned 1 exit status
The following patch fixes that. Most of the 64K+ section support for
reading and writing was already there years ago (and especially reading used
quite often already) and a further bug fixed in it in the PR104617 fix.
Yet, the fix isn't solely about removing the
  if (new_i - 1 >= SHN_LORESERVE)
    {
      *err = ENOTSUP;
      return "Too many copied sections";
    }
5 lines, the missing part was that the function only handled reading of
the .symtab_shndx section but not copying/updating of it.
If the result has less than 64K-epsilon sections, that actually wasn't
needed, but e.g. with -fdebug-types-section one can exceed that pretty
easily (reported to us on WebKitGtk build on ppc64le).
Updating the section is slightly more complicated, because it basically
needs to be done in lock step with updating the .symtab section, if one
doesn't need to use SHN_XINDEX in there, the section should (or should be
updated to) contain SHN_UNDEF entry, otherwise needs to have whatever would
be otherwise stored but couldn't fit. But repeating due to that all the
symtab decisions what to discard and how to rewrite it would be ugly.
So, the patch instead emits the .symtab_shndx section (or sections) last
and prepares the content during the .symtab processing and in a second
pass when going just through .symtab_shndx sections just uses the saved
content.
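The lock-step relationship between a .symtab entry and its .symtab_shndx slot can be sketched as below. This is an illustration only, not the libiberty code; the namespace, `SymbolSectionRef`, `encode_section_index` and `decode_section_index` are hypothetical names, while the three constants carry the values defined by the ELF specification. The sketch covers real section indices only; reserved values such as SHN_ABS or SHN_COMMON are stored in st_shndx directly and are not handled here.

```cpp
#include <cassert>
#include <cstdint>

namespace elf_sketch {

// Values from the ELF specification.
constexpr std::uint16_t SHN_UNDEF = 0;
constexpr std::uint16_t SHN_LORESERVE = 0xFF00;
constexpr std::uint16_t SHN_XINDEX = 0xFFFF;

struct SymbolSectionRef {
  std::uint16_t st_shndx;     // field stored in the .symtab entry
  std::uint32_t shndx_entry;  // matching slot in .symtab_shndx
};

// Split a real section index into the two places it may live.
SymbolSectionRef encode_section_index(std::uint32_t index)
{
  if (index < SHN_LORESERVE)
    // Fits in 16 bits: the side-table slot holds SHN_UNDEF.
    return {static_cast<std::uint16_t>(index), SHN_UNDEF};
  // Too large: st_shndx holds the escape value, the slot the real index.
  return {SHN_XINDEX, index};
}

// Recover the real section index from the pair.
std::uint32_t decode_section_index(const SymbolSectionRef& ref)
{
  return ref.st_shndx == SHN_XINDEX ? ref.shndx_entry : ref.st_shndx;
}

}  // namespace elf_sketch
```

Since every symbol's slot depends on the decision made for that symbol in .symtab, computing the side-table content while processing .symtab and writing it out afterwards avoids making each decision twice.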
2024-09-07 Jakub Jelinek <jakub@redhat.com>
PR lto/116614
* simple-object-elf.c (SHN_COMMON): Align comment with neighbouring
comments.
(SHN_HIRESERVE): Use uppercase hex digits instead of lowercase for
consistency.
(simple_object_elf_find_sections): Formatting fixes.
(simple_object_elf_fetch_attributes): Likewise.
(simple_object_elf_attributes_merge): Likewise.
(simple_object_elf_start_write): Likewise.
(simple_object_elf_write_ehdr): Likewise.
(simple_object_elf_write_shdr): Likewise.
(simple_object_elf_write_to_file): Likewise.
(simple_object_elf_copy_lto_debug_section): Likewise. Don't fail for
new_i - 1 >= SHN_LORESERVE, instead arrange in that case to copy
over .symtab_shndx sections, though emit those last and compute their
section content when processing associated .symtab sections. Handle
simple_object_internal_read failure even in the .symtab_shndx reading
case.
Jonathan Wakely [Tue, 10 Sep 2024 13:36:26 +0000 (14:36 +0100)]
libstdc++: Only use std::ios_base_library_init() for ELF [PR116159]
The undefined std::ios_base_library_init() symbol that is referenced by
<iostream> is only supposed to be used for targets where symbol
versioning is supported.
The mingw-w64 target defaults to --enable-symvers=gnu due to using GNU
ld but doesn't actually support symbol versioning. This means it tries
to emit references to the std::ios_base_library_init() symbol, which
isn't really defined in the library. This causes problems when using lld
to link user binaries.
Disable the undefined symbol reference for non-ELF targets.
libstdc++-v3/ChangeLog:
PR libstdc++/116159
* include/std/iostream (ios_base_library_init): Only define for
ELF targets.
* src/c++98/ios_init.cc (ios_base_library_init): Likewise.
Jonathan Wakely [Tue, 10 Sep 2024 13:25:41 +0000 (14:25 +0100)]
libstdc++: std::string move assignment should not use POCCA trait [PR116641]
The changes to implement LWG 2579 (r10-327-gdb33efde17932f) made
std::string::assign use the propagate_on_container_copy_assignment
(POCCA) trait, for consistency with operator=(const basic_string&).
However, this also unintentionally affected operator=(basic_string&&)
which calls assign(str) to make a deep copy when performing a move is
not possible. The fix is for the move assignment operator to call
_M_assign(str) instead of assign(str), as this just does the deep copy
and doesn't check the POCCA trait first.
The bug only affects the unlikely/useless combination of POCCA==true and
POCMA==false, but we should fix it for correctness anyway. It should
also make move assignment slightly cheaper to compile and execute,
because we skip the extra code in assign(const basic_string&).
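The decision a move assignment has to make can be sketched with allocator traits alone. This is a simplified model and not the basic_string.h code; `MoveAction`, `move_assign_action` and `TaggedAlloc` are hypothetical names. The point is that the fallback path is a deep copy that keeps the destination's allocator, so POCCA is never consulted.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <type_traits>

enum class MoveAction { steal_storage, deep_copy_keep_allocator };

// Decide how a move assignment proceeds.  Only POCMA and allocator
// equality matter; the copy-assignment trait (POCCA) plays no part.
template <typename Alloc>
MoveAction move_assign_action(const Alloc& lhs, const Alloc& rhs)
{
  using traits = std::allocator_traits<Alloc>;
  if (traits::propagate_on_container_move_assignment::value || lhs == rhs)
    return MoveAction::steal_storage;
  return MoveAction::deep_copy_keep_allocator;
}

// Hypothetical stateful allocator with the unusual trait combination
// POCCA == true and POCMA == false.
template <typename T>
struct TaggedAlloc {
  using value_type = T;
  using propagate_on_container_copy_assignment = std::true_type;
  using propagate_on_container_move_assignment = std::false_type;

  int tag;
  explicit TaggedAlloc(int t) : tag(t) {}

  T* allocate(std::size_t n) { return std::allocator<T>().allocate(n); }
  void deallocate(T* p, std::size_t n) { std::allocator<T>().deallocate(p, n); }

  friend bool operator==(const TaggedAlloc& a, const TaggedAlloc& b)
  { return a.tag == b.tag; }
  friend bool operator!=(const TaggedAlloc& a, const TaggedAlloc& b)
  { return !(a == b); }
};
```

With two `TaggedAlloc<char>` objects carrying different tags the sketch selects the deep copy, which is the path where the destination string must keep its own allocator.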
libstdc++-v3/ChangeLog:
PR libstdc++/116641
* include/bits/basic_string.h (operator=(basic_string&&)): Call
_M_assign instead of assign.
* testsuite/21_strings/basic_string/allocator/116641.cc: New
test.
Jonathan Wakely [Tue, 30 Apr 2024 08:52:13 +0000 (09:52 +0100)]
libstdc++: Fix std::chrono::tzdb to work with vanguard format
I found some issues in the std::chrono::tzdb parser by testing the
tzdata "vanguard" format, which uses new features that aren't enabled in
the "main" and "rearguard" data formats.
Since 2024a the keyword "minimum" is no longer valid for the FROM and TO
fields in a Rule line, which means that "m" is now a valid abbreviation
for "maximum". Previously we expected either "mi" or "ma". For backwards
compatibility, a FROM field beginning with "mi" is still supported and
is treated as 1900. The "maximum" keyword is only allowed in TO now,
because it makes no sense in FROM. To support these changes the
minmax_year and minmax_year2 classes for parsing FROM and TO are
replaced with a single years_from_to class that reads both fields.
The vanguard format makes use of %z in Zone FORMAT fields, which caused
an exception to be thrown from ZoneInfo::set_abbrev because no % or /
characters were expected when a Zone doesn't use a named Rule. The
ZoneInfo::to(sys_info&) function now uses format_abbrev_str to replace
any %z with the current offset. Although format_abbrev_str also checks
for %s and STD/DST formats, those only make sense when a named Rule is
in effect, so won't occur when ZoneInfo::to(sys_info&) is used.
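As an illustration of the %z expansion, the sketch below formats a UT offset in the shortest of the forms +hh, +hhmm or +hhmmss that loses no information, which is how zic documents %z. It is not the tzdb.cc implementation; `two_digits`, `format_utc_offset` and `expand_percent_z` are hypothetical names, and only the first %z in the string is replaced.

```cpp
#include <cassert>
#include <string>

// Zero-pad a value in [0, 99] to two digits.
std::string two_digits(long value)
{
  assert(value >= 0 && value < 100);
  return (value < 10 ? "0" : "") + std::to_string(value);
}

// Format an offset in seconds as +hh, +hhmm or +hhmmss, whichever is the
// shortest form that does not drop a non-zero field.
std::string format_utc_offset(long offset_seconds)
{
  const bool negative = offset_seconds < 0;
  const long magnitude = negative ? -offset_seconds : offset_seconds;
  const long hours = magnitude / 3600;
  const long minutes = magnitude / 60 % 60;
  const long seconds = magnitude % 60;

  std::string text(1, negative ? '-' : '+');
  text += two_digits(hours);
  if (minutes != 0 || seconds != 0)
    text += two_digits(minutes);
  if (seconds != 0)
    text += two_digits(seconds);
  return text;
}

// Replace the first "%z" in a Zone FORMAT string with the offset text.
std::string expand_percent_z(const std::string& format, long offset_seconds)
{
  std::string result = format;
  const std::string::size_type pos = result.find("%z");
  if (pos != std::string::npos)
    result.replace(pos, 2, format_utc_offset(offset_seconds));
  return result;
}
```

For example, a zone three hours west of UT whose FORMAT is just "%z" gets the abbreviation "-03".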
Since making this change on trunk, the tzdata-2024b release started
using %z in the main format, not just vanguard. This makes a backport to
release branches necessary (see PR 116657).
This change also implements a feature that has always been missing from
time_zone::_M_get_sys_info: finding the Rule that is active before the
specified time point, so that we can correctly handle %s in the FORMAT
for the first new sys_info that gets created. This requires implementing
a poorly documented feature of zic, to get the LETTERS field from a
later transition, as described at
https://mm.icann.org/pipermail/tz/2024-April/058891.html
In order for this to work we need to be able to distinguish an empty
letters field (as used by CE%sT where the variable part is either empty
or "S") from "the letters field is not known for this transition". The
tzdata file uses "-" for an empty letters field, which libstdc++ was
previously replacing with "" when the Rule was parsed. Instead, we now
preserve the "-" in the Rule object, so that "" can be used for the case
where we don't know the letters (and so need to decide it).
libstdc++-v3/ChangeLog:
* src/c++20/tzdb.cc (minmax_year, minmax_year2): Remove.
(years_from_to): New class replacing minmax_year and
minmax_year2.
(format_abbrev_str, select_std_or_dst_abbrev): Move earlier in
the file. Handle "-" for letters.
(ZoneInfo::to): Use format_abbrev_str to expand %z.
(ZoneInfo::set_abbrev): Remove exception. Change parameter from
reference to value.
(operator>>(istream&, Rule&)): Do not clear letters when it
contains "-".
(time_zone::_M_get_sys_info): Add missing logic to find the Rule
in effect before the time point.
* testsuite/std/time/tzdb/1.cc: Adjust for vanguard format using
"GMT" as the Zone name, not as a Link to "Etc/GMT".
* testsuite/std/time/time_zone/sys_info_abbrev.cc: New test.
Patrick Palka [Thu, 15 Aug 2024 14:23:54 +0000 (10:23 -0400)]
c++: c->B::m access resolved through current inst [PR116320]
Here when checking the access of (the injected-class-name) B in c->B::m
at parse time, we notice its context B (now the type) is a base of the
object type C<T>, so we proceed to use C<T> as the effective qualifying
type. But this C<T> is the dependent specialization not the primary
template type, so it has empty TYPE_BINFO, which leads to a segfault later
from perform_or_defer_access_check.
The reason the DERIVED_FROM_P (B, C<T>) test guarding this code path works
despite C<T> having empty TYPE_BINFO is because of its currently_open_class
logic (added in r9-713-gd9338471b91bbe) which replaces a dependent
specialization with the primary template type if we're inside it. So the
safest fix seems to be to call currently_open_class in the caller as well.
PR c++/116320
gcc/cp/ChangeLog:
* semantics.cc (check_accessibility_of_qualified_id): Try
currently_open_class when using the object type as the
effective qualifying type.
Patrick Palka [Sat, 10 Aug 2024 01:15:25 +0000 (21:15 -0400)]
c++: inherited CTAD fixes [PR116276]
This implements the overlooked inherited vs non-inherited guide
tiebreaker from P2582R1. This requires tracking inherited-ness of a
guide, for which it seems natural to reuse the lang_decl_fn::context
field which for a constructor tracks its inherited-ness.
This patch also works around CLASSTYPE_CONSTRUCTORS not reliably
returning all inherited constructors (due to some using-decl handling
quirks in push_class_level_binding) by iterating over TYPE_FIELDS
instead.
This patch also makes us recognize another written form of inherited
constructor, 'using Base<T>::Base::Base' whose USING_DECL_SCOPE is a
TYPENAME_TYPE.
PR c++/116276
gcc/cp/ChangeLog:
* call.cc (joust): Implement P2582R1 inherited vs non-inherited
guide tiebreaker.
* cp-tree.h (lang_decl_fn::context): Document usage in
deduction_guide_p FUNCTION_DECLs.
(inherited_guide_p): Declare.
* pt.cc (inherited_guide_p): Define.
(set_inherited_guide_context): Define.
(alias_ctad_tweaks): Use set_inherited_guide_context.
(inherited_ctad_tweaks): Recognize some inherited constructors
whose scope is a TYPENAME_TYPE.
(ctor_deduction_guides_for): For C++23 inherited CTAD, iterate
over TYPE_FIELDS instead of CLASSTYPE_CONSTRUCTORS to recognize
all inherited constructors.
gcc/testsuite/ChangeLog:
* g++.dg/cpp23/class-deduction-inherited4.C: Remove an xfail.
* g++.dg/cpp23/class-deduction-inherited5.C: New test.
* g++.dg/cpp23/class-deduction-inherited6.C: New test.
Patrick Palka [Sat, 3 Aug 2024 13:05:05 +0000 (09:05 -0400)]
libstdc++: use concrete return type for std::forward_like
Inspired by https://github.com/llvm/llvm-project/issues/101614 this
inverts the relationship between forward_like and __like_t so that
forward_like is defined in terms of __like_t and with a concrete return
type. __like_t in turn is defined via partial specializations that
pattern match on the const- and reference-ness of T.
This turns out to be more SFINAE friendly and significantly cheaper
to compile than the previous implementation.
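The shape of that approach can be sketched as follows. The names `like_impl`, `like_t` and `forward_like_sketch` are hypothetical stand-ins chosen for this example, and the code is a simplified model rather than a copy of include/bits/move.h. The partial specializations match on the const- and reference-ness of the first argument, and the function simply casts to the resulting concrete type.

```cpp
#include <cassert>
#include <type_traits>

// Primary template is never defined: the first argument is always a
// reference and the second always an lvalue reference.
template <typename T, typename U>
struct like_impl;

template <typename T, typename U>
struct like_impl<T&, U&> { using type = U&; };

template <typename T, typename U>
struct like_impl<const T&, U&> { using type = const U&; };

template <typename T, typename U>
struct like_impl<T&&, U&> { using type = U&&; };

template <typename T, typename U>
struct like_impl<const T&&, U&> { using type = const U&&; };

// Reference collapsing turns T into a reference and U into an lvalue
// reference before the specializations are matched.
template <typename T, typename U>
using like_t = typename like_impl<T&&, U&>::type;

// Concrete return type: no deduction from the function body is needed.
template <typename T, typename U>
constexpr like_t<T, U> forward_like_sketch(U&& x) noexcept
{
  return static_cast<like_t<T, U>>(x);
}
```

Because the return type is spelled out in the declaration, a substitution failure in `like_t` happens in the immediate context, which is what makes this form friendlier to SFINAE.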
libstdc++-v3/ChangeLog:
* include/bits/move.h (__like_impl): New metafunction.
(__like_t): Redefine in terms of __like_impl.
(forward_like): Redefine in terms of __like_t.
* testsuite/20_util/forward_like/2_neg.cc: Don't expect
error outside the immediate context anymore.
H.J. Lu [Tue, 27 Aug 2024 20:11:39 +0000 (13:11 -0700)]
ipa: Don't disable function parameter analysis for fat LTO
Update analyze_parms not to disable function parameter analysis for
-ffat-lto-objects. Tested on x86-64, there are no differences in zstd
with "-O2 -flto=auto -g" vs "-O2 -flto=auto -g -ffat-lto-objects".
PR ipa/116410
* ipa-modref.cc (analyze_parms): Always analyze function parameter
for LTO.
Jakub Jelinek [Tue, 3 Sep 2024 08:20:44 +0000 (10:20 +0200)]
lower-bitint: Fix up __builtin_{add,sub}_overflow{,_p} bitint lowering [PR116501]
The following testcase is miscompiled. The problem is in the last_ovf step.
The second operand has signed _BitInt(513) type but has the MSB clear,
so range_to_prec returns 512 for it (i.e. it fits into unsigned
_BitInt(512)). Because of that the last step actually doesn't need to get
the most significant bit from the second operand, but the code was deciding
what to use purely from TYPE_UNSIGNED (type1) - if unsigned, use 0,
otherwise sign-extend the last processed bit; but that in this case was set.
We don't want to treat the positive operand as if it was negative regardless
of the bit below that precision, and precN >= 0 indicates that the operand
is in the [0, inf) range.
2024-09-03 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/116501
* gimple-lower-bitint.cc (bitint_large_huge::lower_addsub_overflow):
In the last_ovf case, use build_zero_cst operand not just when
TYPE_UNSIGNED (typeN), but also when precN >= 0.
Jakub Jelinek [Wed, 7 Aug 2024 18:14:31 +0000 (20:14 +0200)]
Don't call clean_symbol_name in create_tmp_var_name [PR116219]
SRA adds fancy names like offset$D94316$_M_impl$D93629$_M_start
where the numbers in there are DECL_UIDs if there are unnamed
FIELD_DECLs etc.
Because -g0 vs. -g can cause differences between the exact DECL_UID
values (add bigger gaps in between them, corresponding decls should
still be ordered the same based on DECL_UID) we make sure such
decls have DECL_NAMELESS set and depending on exact options either don't
dump such names at all or dump_fancy_name sanitizes the D123456$ parts in
there to Dxxxx$.
Unfortunately in tons of places we then use get_name to grab either user
names or these SRA created names and use that as argument to
create_tmp_var{,_name,_raw} to base other artificial temporary names on
that. Those are DECL_NAMELESS too, but unfortunately create_tmp_var_name
starting with
https://gcc.gnu.org/git/?p=gcc.git&a=commit;h=725494f6e4121eace43b7db1202f8ecbf52a8276
calls clean_symbol_name which replaces the $s in there with _s and thus
dump_fancy_name doesn't sanitize it anymore.
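The interaction can be shown with a small model. The functions below are hypothetical simplifications written for this example, not the GCC implementations of dump_fancy_name or clean_symbol_name: `sanitize_uid_parts` rewrites every "D<digits>$" run to "Dxxxx$", and `dollars_to_underscores` stands in for the cleaning step that turns `$` into `_`.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Replace every "D<digits>$" run with "Dxxxx$" so that dumps do not depend
// on the exact DECL_UID values.
std::string sanitize_uid_parts(const std::string& name)
{
  std::string out;
  std::string::size_type i = 0;
  while (i < name.size())
    {
      if (name[i] == 'D')
        {
          std::string::size_type j = i + 1;
          while (j < name.size()
                 && std::isdigit(static_cast<unsigned char>(name[j])))
            ++j;
          // Require at least one digit and a terminating '$'.
          if (j > i + 1 && j < name.size() && name[j] == '$')
            {
              out += "Dxxxx$";
              i = j + 1;
              continue;
            }
        }
      out += name[i++];
    }
  return out;
}

// Stand-in for the cleaning step: every '$' becomes '_'.
std::string dollars_to_underscores(std::string name)
{
  for (char& c : name)
    if (c == '$')
      c = '_';
  return name;
}
```

Once the `$` terminators have been turned into `_`, the "D<digits>$" pattern no longer occurs, so the UID digits pass through the sanitizer unchanged and can differ between -g0 and -g dumps.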
I don't see any discussion of that commit (originally to TM branch, later
merged) on the mailing list, but from
DECL_NAME (new_decl)
= create_tmp_var_name (IDENTIFIER_POINTER (DECL_NAME (old_decl)));
- SET_DECL_ASSEMBLER_NAME (new_decl, NULL_TREE);
+ SET_DECL_ASSEMBLER_NAME (new_decl, DECL_NAME (new_decl));
snippet elsewhere in that commit it seems create_tmp_var_name was used at
that point also to determine function names of clones, so presumably the
clean_symbol_name at that point was to ensure the symbol could be emitted
into assembly, maybe in case DECL_NAME is something like C++ operators or
whatever could have there undesirable characters.
Anyway, we haven't done that for years; GCC 4.5 already uses
clone_function_name for such purposes, which starts from the
DECL_ASSEMBLER_NAME of the old function and appends, based on supportable
symbol suffix separators, the
separator and some suffix and/or number, so that part doesn't go through
create_tmp_var_name.
I don't see problems with having the $ and . etc. characters in the names
intended just to make dumps more readable, after all, we already are using
those in the SRA created names. Those names shouldn't make it into the
assembly in any way, neither debug info nor assembly labels.
There is one theoretical case, where the gimplifier promotes automatic
vars into TREE_STATIC ones and therefore those can then appear in assembly,
just in case it would be on e.g. SRA created names and regimplified later.
Because no cases of promotion of DECL_NAMELESS vars to static was observed in
{x86_64,i686,powerpc64le}-linux bootstraps/regtests, the code simply uses
C.NNN names for DECL_NAMELESS vars like it does for !DECL_NAME vars.
Richi mentioned on IRC that the non-cleaned up names might make things
harder to feed stuff back to the GIMPLE FE, but if so, I think it should be
the dumping for GIMPLE FE purposes that cleans those up (but at that point
it should also verify if some such cleaned up names don't collide with
others and somehow deal with those).
2024-08-07 Jakub Jelinek <jakub@redhat.com>
PR c++/116219
* gimple-expr.cc (remove_suffix): Formatting fixes.
(create_tmp_var_name): Don't call clean_symbol_name.
* gimplify.cc (gimplify_init_constructor): When promoting automatic
DECL_NAMELESS vars to static, don't preserve their DECL_NAME.
Tamar Christina [Thu, 5 Sep 2024 09:36:02 +0000 (10:36 +0100)]
testsuite: remove -fwrapv from signbit-5.c
The meaning of the testcase was changed by passing it -fwrapv. The reason for
the test failures on some platforms was that the test was testing some
implementation defined behavior wrt INT_MIN in generic code.
Instead of using -fwrapv this just removes the border case from the test so
all the values now have a defined semantic. It still relies on the handling of
shifting a negative value right, but that wasn't changed with -fwrapv anyway.
The -fwrapv case is being handled already by other testcases.
gcc/testsuite/ChangeLog:
* gcc.dg/signbit-5.c: Remove -fwrapv and change INT_MIN to INT_MIN+1.
Jonathan Wakely [Mon, 2 Sep 2024 11:16:49 +0000 (12:16 +0100)]
libstdc++: Fix error handling in fs::hard_link_count for Windows
The recent change to use auto_win_file_handle for
std::filesystem::hard_link_count caused a regression. The
std::error_code argument should be cleared if no error occurs, but this
no longer happens. Add a call to ec.clear() in fs::hard_link_count to
fix this.
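The contract at issue is the usual one for functions taking a std::error_code&: set it on failure and clear it on success. A minimal sketch, in which `link_count_sketch` and its `query_succeeds` parameter are hypothetical stand-ins for the real file query:

```cpp
#include <cassert>
#include <cstdint>
#include <system_error>

// Hypothetical stand-in for a query that may fail.  On failure it reports
// the error through ec; on success it must reset ec, because the caller may
// pass in an error_code that still holds an error from an earlier call.
std::uintmax_t link_count_sketch(bool query_succeeds, std::error_code& ec) noexcept
{
  if (!query_succeeds)
    {
      ec = std::make_error_code(std::errc::no_such_file_or_directory);
      return static_cast<std::uintmax_t>(-1);
    }
  ec.clear();  // without this, a stale error would look like a new failure
  return 1;
}
```

A caller that reuses one error_code across several calls relies on the clear; otherwise a successful call following a failed one still appears to have failed.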
Also change the auto_win_file_handle class to take a reference to the
std::error_code and set it if an error occurs, to slightly simplify the
control flow in the fs::equiv_files function.
libstdc++-v3/ChangeLog:
* src/c++17/fs_ops.cc (auto_win_file_handle): Add error_code&
member and set it if CreateFileW or GetFileInformationByHandle
fails.
(fs::equiv_files) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Simplify
control flow.
(fs::hard_link_count) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Clear ec
on success.
* testsuite/27_io/filesystem/operations/hard_link_count.cc:
Check error handling.
Jonathan Wakely [Tue, 30 Jul 2024 09:55:55 +0000 (10:55 +0100)]
libstdc++: Fix overwriting files with fs::copy_file on Windows
There are no inode numbers on Windows filesystems, so stat_type::st_ino
is always zero and the check for equivalent files in do_copy_file was
incorrectly identifying distinct files as equivalent. This caused
copy_file to incorrectly report errors when trying to overwrite existing
files.
The fs::equivalent function already does the right thing on Windows, so
factor that logic out into a new function that can be reused by
fs::copy_file.
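Why the inode comparison misfires can be seen in a reduced model. `StatSketch` and `same_file_by_inode` are hypothetical names, and the struct keeps only the two fields that matter for the check.

```cpp
#include <cassert>

// Reduced model of the stat fields used by an inode-based identity check.
struct StatSketch {
  unsigned long st_dev;
  unsigned long st_ino;
};

// Classic POSIX-style check: same device and same inode number.
bool same_file_by_inode(const StatSketch& a, const StatSketch& b)
{
  return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}
```

If st_ino is always zero, any two files on the same device compare as the same file, so an attempt to overwrite an existing destination looks like copying a file onto itself.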
The tests for fs::copy_file were quite inadequate, so this also adds
checks for that function's error conditions.
libstdc++-v3/ChangeLog:
* src/c++17/fs_ops.cc (auto_win_file_handle): Change constructor
parameter from const path& to const wchar_t*.
(fs::equiv_files): New function.
(fs::equivalent): Use equiv_files.
* src/filesystem/ops-common.h (fs::equiv_files): Declare.
(do_copy_file): Use equiv_files.
* src/filesystem/ops.cc (fs::equiv_files): Define.
(fs::copy, fs::equivalent): Use equiv_files.
* testsuite/27_io/filesystem/operations/copy.cc: Test
overwriting directory contents recursively.
* testsuite/27_io/filesystem/operations/copy_file.cc: Test
overwriting existing files.
libstdc++: Fix fs::hard_link_count behaviour on MinGW [PR113663]
std::filesystem::hard_link_count() always returns 1 on
mingw-w64ucrt-11.0.1-r3 on Windows 10 19045
hard_link_count() queries _wstat64() on MinGW-w64
The MSFT documentation claims _wstat64() will always return 1 on *non*-NTFS volumes
https://learn.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-2013/14h5k7ff(v=vs.120)
My tests suggest that is not always true -
hard_link_count()/_wstat64() still returns 1 on NTFS.
GetFileInformationByHandle does return the correct result of 2.
Please see the PR for a minimal repro.
This patch changes the Windows implementation to always call
GetFileInformationByHandle.
PR libstdc++/113663
libstdc++-v3/ChangeLog:
* src/c++17/fs_ops.cc (fs::equivalent): Moved helper class
auto_handle to anonymous namespace as auto_win_file_handle.
(fs::hard_link_count): Changed Windows implementation to use
information provided by GetFileInformationByHandle which is more
reliable.
* testsuite/27_io/filesystem/operations/hard_link_count.cc: New
test.
Signed-off-by: "Lennox" Shou Hao Ho <lennoxhoe@gmail.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
(cherry picked from commit 658193658f05e9a8ebf0bce8bab15555f43bfee1)
Jonathan Wakely [Mon, 2 Sep 2024 10:29:13 +0000 (11:29 +0100)]
libstdc++: Specialize std::disable_sized_sentinel_for for std::move_iterator [PR116549]
LWG 3736 added a partial specialization of this variable template for
two std::move_iterator types. This is needed for the case where the
types satisfy std::sentinel_for and are subtractable, but do not model
the semantic requirements of std::sized_sentinel_for.
libstdc++-v3/ChangeLog:
PR libstdc++/116549
* include/bits/stl_iterator.h (disable_sized_sentinel_for):
Define specialization for two move_iterator types, as per LWG
3736.
* testsuite/24_iterators/move_iterator/lwg3736.cc: New test.
Dhruv Chawla [Mon, 26 Aug 2024 05:39:19 +0000 (11:09 +0530)]
libstdc++: Add missing feature-test macro in various headers
version.syn#2 requires various headers to define
__cpp_lib_allocator_traits_is_always_equal. Currently, only <memory>
defines this macro. Implement fixes for the other headers as well.
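A minimal way to observe the macro from a header other than <memory> is sketched below. The names macro_visible_from_vector and vector_header_defines_macro are hypothetical, and <vector> is used on the assumption that it is among the headers covered by the fix. Whether the flag ends up true depends on the library version in use.

```cpp
#include <vector>   // deliberately included first, before <memory>

// Record, at this point in the translation unit, whether <vector> alone
// made the feature-test macro visible.
#ifdef __cpp_lib_allocator_traits_is_always_equal
constexpr bool macro_visible_from_vector = true;
#else
constexpr bool macro_visible_from_vector = false;
#endif

#include <cassert>
#include <memory>

// Hypothetical helper: reports whether the macro was visible after
// including only <vector>.
bool vector_header_defines_macro()
{
  // The trait itself does not depend on the macro and holds for
  // std::allocator.
  assert(std::allocator_traits<std::allocator<int>>::is_always_equal::value);
  return macro_visible_from_vector;
}
```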
Jonathan Wakely [Tue, 20 Aug 2024 15:52:22 +0000 (16:52 +0100)]
libstdc++: Fix std::variant to reject array types [PR116381]
For the backport, rejecting array types is only done in strict modes.
libstdc++-v3/ChangeLog:
PR libstdc++/116381
* include/std/variant (variant): Fix conditions for
static_assert to match the spec.
* testsuite/20_util/variant/types_neg.cc: New test.
Andrew Carlotti [Thu, 26 Oct 2023 14:45:15 +0000 (15:45 +0100)]
aarch64: Fix ls64 intrinsic availability
The availability of ls64 intrinsics and data types was determined
solely by the globally specified architecture features, which did not
reflect any changes specified in target pragmas or attributes.
This patch removes the initialisation-time guards for the intrinsics,
and replaces them with checks at use time. We also get better error
messages when ls64 is not available (matching the existing error
messages for SVE intrinsics).
The data512_t type is made always available; this is consistent with the
present behaviour for Neon fp16/bf16 types.
gcc/ChangeLog:
PR target/112108
* config/aarch64/aarch64-builtins.cc (handle_arm_acle_h): Remove
feature check at initialisation.
(aarch64_general_check_builtin_call): Check ls64 intrinsics.
* config/aarch64/arm_acle.h (data512_t): Make always available.
gcc/testsuite/ChangeLog:
PR target/112108
* gcc.target/aarch64/acle/ls64_guard-1.c: New test.
* gcc.target/aarch64/acle/ls64_guard-2.c: New test.
* gcc.target/aarch64/acle/ls64_guard-3.c: New test.
* gcc.target/aarch64/acle/ls64_guard-4.c: New test.
Andrew Carlotti [Tue, 18 Jul 2023 19:09:38 +0000 (20:09 +0100)]
aarch64: Fix memtag intrinsic availability
The availability of memtag intrinsics and data types was determined
solely by the globally specified architecture features, which did not
reflect any changes specified in target pragmas or attributes.
This patch removes the initialisation-time guards for the intrinsics,
and replaces them with checks at use time. It also removes the macro
indirection from the header file - this simplifies the header, and
allows the missing extension error reporting to find the user-facing
intrinsic names.
PR target/112108
* gcc.target/aarch64/acle/memtag_guard-1.c: New test.
* gcc.target/aarch64/acle/memtag_guard-2.c: New test.
* gcc.target/aarch64/acle/memtag_guard-3.c: New test.
* gcc.target/aarch64/acle/memtag_guard-4.c: New test.
Andrew Carlotti [Thu, 26 Oct 2023 14:43:44 +0000 (15:43 +0100)]
aarch64: Fix tme intrinsic availability
The availability of tme intrinsics was previously gated at both
initialisation time (using global target options) and usage time
(accounting for function-specific target options). This patch removes
the check at initialisation time, and also moves the intrinsics out of
the header file to allow for better error messages (matching the
existing error messages for SVE intrinsics).
PR target/112108
* gcc.target/aarch64/acle/tme_guard-1.c: New test.
* gcc.target/aarch64/acle/tme_guard-2.c: New test.
* gcc.target/aarch64/acle/tme_guard-3.c: New test.
* gcc.target/aarch64/acle/tme_guard-4.c: New test.
Andrew Carlotti [Tue, 13 Aug 2024 15:15:11 +0000 (16:15 +0100)]
aarch64: Refactor check_required_extensions
Replace TARGET_GENERAL_REGS_ONLY check with an explicit check that
aarch64_isa_flags enables all required extensions. This will be more
flexible when repurposing this function for non-SVE intrinsics.
This hook allows the BFD linker plugin to distinguish calls to
claim_file_handler that know the object is being used by the linker
(from ldmain.c:add_archive_element), from calls that don't know it's
being used by the linker (from elf_link_is_defined_archive_symbol); in
the latter case, the plugin should avoid including the unused LTO archive
members in link output. To get the proper support for archives with LTO
common symbols, the linker fix
lto: Don't include unused LTO archive members in output
is required.
PR lto/116361
* lto-plugin.c (claim_file_handler_v2): Rename claimed to
can_be_claimed. Include the LTO object only if it is known to
be included in link output.
The non-optimized version of the intrinsic has a typo in its mask type,
which causes the high bits of __mmask32 to be unexpectedly zeroed.
The current 1b test does not fail under O0 because the testcase itself is
wrong. We need to include avx512-mask-type.h after SIZE is defined, or
the mask type will always be __mmask8. I will write a separate patch to
fix that on trunk ONLY.
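The effect of the typo can be modelled without AVX512 hardware. The sketch below uses plain integer aliases as stand-ins for the mask types, assuming __mmask8 and __mmask32 are 8-bit and 32-bit unsigned integers. The alias and function names are hypothetical and are not the intrinsic definitions.

```cpp
#include <cassert>   // for the checks in the usage note
#include <cstdint>

// Stand-ins for the real mask types (assumption: 8-bit and 32-bit
// unsigned integers).
using mask8_t  = std::uint8_t;
using mask32_t = std::uint32_t;

// Models the typo: a 32-bit mask passes through an 8-bit type, so only
// the low 8 bits survive.
constexpr mask32_t through_wrong_type(mask32_t m)
{
  return static_cast<mask32_t>(static_cast<mask8_t>(m));
}

// Models the fix: the mask keeps its full width.
constexpr mask32_t through_right_type(mask32_t m)
{
  return m;
}
```

For the input 0xFFFF00FFu, through_wrong_type yields 0xFFu while through_right_type returns the value unchanged.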
gcc/ChangeLog:
* config/i386/avx512fp16intrin.h
(_mm512_mask_fpclass_ph_mask): Correct mask type to __mmask32.
(_mm512_fpclass_ph_mask): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/i386/avx512fp16-vfpclassph-1c.c: New test.
liuhongt [Thu, 29 Aug 2024 03:39:20 +0000 (11:39 +0800)]
Check avx upper register for parallel.
A function argument or return value in BLK mode is put in a
parallel with an expr_list, and the expr_list contains the real mode
and registers.
The current ix86_check_avx_upper_register only checks SSE_REG_P and
fails to handle that case. The patch extends the check to each subrtx.
gcc/ChangeLog:
PR target/116512
* config/i386/i386.cc (ix86_check_avx_upper_register): Iterate
subrtx to scan for avx upper register.
(ix86_check_avx_upper_stores): Inline old
ix86_check_avx_upper_register.
(ix86_avx_u128_mode_needed): Ditto, and replace
FOR_EACH_SUBRTX with call to new
ix86_check_avx_upper_register.
Xi Ruoyao [Mon, 6 May 2024 03:39:14 +0000 (11:39 +0800)]
i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]
After r14-811, "call *nop@GOTPCREL(%rip)" is only generated with
-mno-direct-extern-access, even with --enable-default-pie. So the r13-1614
change to this file is no longer valid.
liuhongt [Thu, 15 Aug 2024 04:54:07 +0000 (12:54 +0800)]
Align ix86_{move_max,store_max} with vectorizer.
When none of -mprefer-vector-width, avx256_optimal/avx128_optimal,
avx256_store_by_pieces/avx512_store_by_pieces is specified, GCC sets
ix86_{move_max,store_max} to the maximum available vector length, except
in the AVX case.
So for -mavx2, the vectorizer chooses 256-bit vectors, but 128-bit moves
are used for struct copies. This mismatch can cause a store-to-load
forwarding (STLF) issue.
The patch fixes that.
gcc/ChangeLog:
* config/i386/i386-options.cc (ix86_option_override_internal):
set ix86_{move_max,store_max} to PVW_AVX256 when TARGET_AVX
instead of PVW_AVX128.
Alexandre Oliva [Wed, 26 Jun 2024 05:08:18 +0000 (02:08 -0300)]
[testsuite] [arm] [vect] adjust mve-vshr test [PR113281]
The test was too optimistic, alas. We used to vectorize shifts by
clamping the shift counts below the bit width of the types (e.g. at 15
for 16-bit vector elements), but (uint16_t)32768 >> (uint16_t)16 is
well defined (because of promotion to 32-bit int) and must yield 0,
not 1 (as before the fix).
Unfortunately, in the gimple model of vector units, such large shift
counts wouldn't be well-defined, so we won't vectorize such shifts any
more, unless we can tell they're in range or undefined.
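The arithmetic above can be checked with a small sketch. It assumes a 32-bit int; the function names are hypothetical, and clamped_shift only models the old clamping behaviour described above rather than reproducing the vectorizer's code.

```cpp
#include <cassert>
#include <cstdint>

// Scalar semantics: both operands are promoted to int before the shift,
// so a count of 16 is in range for a 16-bit value.
std::uint16_t scalar_shift(std::uint16_t value, std::uint16_t count)
{
  assert(count < 32); // stay within the width of the promoted type
  return static_cast<std::uint16_t>(value >> count);
}

// Model of the old behaviour: the count is clamped below the element
// width (at 15 for 16-bit elements) before shifting.
std::uint16_t clamped_shift(std::uint16_t value, std::uint16_t count)
{
  const std::uint16_t clamped
    = static_cast<std::uint16_t>(count > 15 ? 15 : count);
  return static_cast<std::uint16_t>(value >> clamped);
}
```

With these definitions, scalar_shift(32768, 16) yields 0, while clamped_shift(32768, 16) yields 1.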
So the test that expected the vectorization we no longer perform
needs to be adjusted. Instead of nobbling the test, Richard Earnshaw
suggested annotating the test with the expected ranges so as to enable
the optimization, and Christophe Lyon suggested a further
simplification.
Co-Authored-By: Richard Earnshaw <Richard.Earnshaw@arm.com>
for gcc/testsuite/ChangeLog
Marek Polacek [Thu, 15 Aug 2024 15:53:10 +0000 (11:53 -0400)]
c++: fix ICE in convert_nontype_argument [PR116384]
Here we ICE since r14-8291 in C++11/C++14 modes. Fortunately
this is an easy one.
The important bit of r14-8291 is this:
@@ -20056,9 +20071,12 @@ tsubst_expr (tree t, tree args, tsubst_flags_t complain, tree in_decl)
             RETURN (retval);
           }
         if (IMPLICIT_CONV_EXPR_NONTYPE_ARG (t))
-          /* We'll pass this to convert_nontype_argument again, we don't need
-             to actually perform any conversion here. */
-          RETURN (expr);
+          {
+            tree r = convert_nontype_argument (type, expr, complain);
+            if (r == NULL_TREE)
+              r = error_mark_node;
+            RETURN (r);
+          }
which obviously means that instead of returning right away we go
to convert_nontype_argument. When type is error_mark_node and we're
in C++17, in convert_nontype_argument we go down this path:
Georg-Johann Lay [Sun, 18 Aug 2024 13:00:55 +0000 (15:00 +0200)]
AVR: target/116407 - Fix linker error "relocation truncated to fit".
Some text peepholes output extra instructions prior to a branch
instruction, which increases the jump offset of backward branches.
PR target/116407
gcc/
* config/avr/avr-protos.h (avr_jump_mode): Add an int argument.
* config/avr/avr.cc (avr_jump_mode): Add an int argument to increase
the computed jump offset of backwards branches.
* config/avr/avr.md (*dec-and-branchhi!=-1, *dec-and-branchsi!=-1):
Increase the jump offset used by avr_jump_mode() as needed.
gcc/testsuite/
* gcc.target/avr/torture/pr116407-2.c: New test.
* gcc.target/avr/torture/pr116407-4.c: New test.
testsuite: Verify -fshort-enums and -fno-short-enums in pr33738.C
For some targets, like Cortex-M on arm-none-eabi, -fshort-enums is
enabled by default. For these targets, the test case fails as
sizeof(Alpha) < sizeof(int).
To make the test case behave identically for targets that enable
-fshort-enums and those that do not, add -fno-short-enums in the test
case and verify that the warning is not emitted. Then also create a copy
and run the test with -fshort-enums and verify that the warning is
emitted.
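The size difference behind this can be seen with a small sketch. The enumerators below are illustrative and not taken from the test case; only the comparison sizeof(Alpha) < sizeof(int) comes from the description above.

```cpp
#include <cassert>   // for the checks in the usage note
#include <cstddef>

// Illustrative enumeration; the enumerator names are assumptions.
enum Alpha { ZERO = 0, ONE, TWO, THREE };

// Size chosen by the compiler for the enumeration.
constexpr std::size_t alpha_size = sizeof(Alpha);

// True when the enumeration is narrower than int, which is what
// -fshort-enums is expected to produce for an enumeration this small.
constexpr bool alpha_is_short = sizeof(Alpha) < sizeof(int);
```

Compiling this once with -fshort-enums and once with -fno-short-enums and comparing alpha_is_short shows which mode a given target defaults to.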
Regtested on x86_64-pc-linux-gnu and arm-none-eabi.
gcc/testsuite/ChangeLog:
* g++.dg/warn/pr33738.C: Added -fno-short-enums.
* g++.dg/warn/pr33738-2.C: Duplicate g++.dg/warn/pr33738.C with
-fshort-enums and removed xfail.