Mikael Morin [Sat, 25 Feb 2023 20:37:46 +0000 (21:37 +0100)]
fortran: Reuse associated_dummy memory if previously allocated [PR108923]
This avoids making the associted_dummy field point to a new memory chunk
if it's already pointing somewhere, in which case doing so would leak the
previously allocated chunk.
PR fortran/108923
gcc/fortran/ChangeLog:
* intrinsic.cc (get_intrinsic_dummy_arg,
set_intrinsic_dummy_arg): Rename the former to the latter.
Remove the return value, add a reference to the lhs as argument,
and do the pointer assignment inside the function. Don't do
it if the pointer is already non-NULL.
(sort_actual): Update caller.
Mikael Morin [Fri, 24 Feb 2023 21:11:17 +0000 (22:11 +0100)]
fortran: Plug leak of associated_dummy memory. [PR108923]
This fixes a memory leak by accompanying the release of
gfc_actual_arglist elements' memory with a release of the
associated_dummy field memory (if allocated).
Actual argument copy is adjusted as well so that each copy can free
its field independently.
PR fortran/108923
gcc/fortran/ChangeLog:
* expr.cc (gfc_free_actual_arglist): Free associated_dummy
memory.
(gfc_copy_actual_arglist): Make a copy of the associated_dummy
field if it is set in the original element.
Andrew Pinski [Wed, 2 Nov 2022 15:56:31 +0000 (15:56 +0000)]
Fix PR 105532: match.pd patterns calling tree_nonzero_bits with vector types
Even though this PR was reported with an ubsan issue, the problem is
tree_nonzero_bits is being called with an expression which is a vector type.
This fixes three patterns I noticed which does that.
And adds a testcase for one of the patterns.
Committed after a bootstrapped and tested on x86_64-linux-gnu with no regressions
gcc/ChangeLog:
PR tree-optimization/105532
* match.pd (~(X >> Y) -> ~X >> Y): Check if it is an integral
type before calling tree_nonzero_bits.
(popcount(X) + popcount(Y)): Likewise.
(popcount(X&C1)): Likewise.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/vector-shift-1.c: New test.
Andrew Pinski [Tue, 7 Feb 2023 23:09:40 +0000 (23:09 +0000)]
tree-optimization: [PR108684] ICE in verify_ssa due to simple_dce_from_worklist
In simple_dce_from_worklist, we were removing an inline-asm which had a vdef.
We should not be removing inline-asm which have a vdef as this code
does not check to the store.
This fixes that oversight. This was a latent bug exposed recently
by both VRP and removal of stores to static starting to use
simple_dce_from_worklist.
Backported after bootstrapped and tested on x86_64-linux-gnu with no regressions.
PR tree-optimization/108684
gcc/ChangeLog:
* tree-ssa-dce.cc (simple_dce_from_worklist):
Check all ssa names and not just non-vdef ones
before accepting the inline-asm.
Call unlink_stmt_vdef on the statement before
removing it.
gcc/testsuite/ChangeLog:
* gcc.c-torture/compile/dce-inline-asm-1.c: New test.
* gcc.c-torture/compile/dce-inline-asm-2.c: New test.
* gcc.dg/tree-ssa/pr108684-1.c: New test.
Tobias Burnus [Wed, 1 Mar 2023 12:53:09 +0000 (13:53 +0100)]
OpenMP/Fortran: Fix handling of optional is_device_ptr + bind(C) [PR108546]
For is_device_ptr, optional checks should only be done before calling
libgomp, afterwards they are NULL either because of absent or, by
chance, because it is unallocated or unassociated (for pointers/allocatables).
Additionally, it fixes an issue with explicit mapping for 'type(c_ptr)'.
PR middle-end/108546
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_trans_omp_clauses): Fix mapping of
type(C_ptr) variables.
gcc/ChangeLog:
* omp-low.cc (lower_omp_target): Remove optional handling
on the receiver side, i.e. inside target (data), for
use_device_ptr.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/is_device_ptr-3.f90: New test.
* testsuite/libgomp.fortran/use_device_ptr-optional-4.f90: New test.
Marek Polacek [Wed, 8 Feb 2023 19:02:48 +0000 (14:02 -0500)]
c++: ICE initing lifetime-extended constexpr var [PR107079]
We ICE on the simple:
struct X { const X* x = this; };
constexpr const X& x = X{};
where store_init_value initializes 'x' with
&TARGET_EXPR <D.2768, {.x=(const struct X *) &<PLACEHOLDER_EXPR struct X>}>
but we must lifetime-extend via extend_ref_init_temps and we get
_ZGR1x_.x = (const struct X *) &<PLACEHOLDER_EXPR struct X> >>>;, (const struct X &) &_ZGR1x_;
Since 'x' was declared constexpr, we do cxx_constant_init and we hit
the preeval code added in r269003 while evaluating the INIT_EXPR:
_ZGR1x_.x = (const struct X *) &<PLACEHOLDER_EXPR struct X> >>>
but we have no ctx.ctor or ctx.object here so lookup_placeholder won't
find anything that could replace X and we ICE on
7861 /* A placeholder without a referent. We can get here when
7862 checking whether NSDMIs are noexcept, or in massage_init_elt;
7863 just say it's non-constant for now. */
7864 gcc_assert (ctx->quiet);
because cxx_constant_init means !ctx->quiet. It's not correct that
there isn't a referent. I think the following patch is a pretty
straightforward fix: pass the _ZGR var down to maybe_constant_init so
that it can replace the PLACEHOLDER_EXPR with _ZGR1x_.
The commented assert in the test doesn't pass: we complain that _ZGR1x_
isn't a constexpr variable because we don't implement DR2126 (PR101588).
PR c++/107079
gcc/cp/ChangeLog:
* call.cc (set_up_extended_ref_temp): Pass var to maybe_constant_init.
Marek Polacek [Fri, 3 Mar 2023 16:24:24 +0000 (11:24 -0500)]
c++: error with constexpr operator() [PR107939]
Similarly to PR107938, this also started with r11-557, whereby cp_finish_decl
can call check_initializer even in a template for a constexpr initializer.
Here we are rejecting
extern const Q q;
template<int>
constexpr auto p = q(0);
even though q has a constexpr operator(). It's deemed non-const by
decl_maybe_constant_var_p because even though 'q' is const it is not
of integral/enum type.
If fun is not a function pointer, we don't know if we're using it as an
lvalue or rvalue, so with this patch we pass 'any' for want_rval. With
that, p_c_e/VAR_DECL doesn't flat out reject the underlying VAR_DECL.
PR c++/107939
gcc/cp/ChangeLog:
* constexpr.cc (potential_constant_expression_1) <case CALL_EXPR>: Pass
'any' when recursing on a VAR_DECL and not a pointer to function.
gcc/testsuite/ChangeLog:
* g++.dg/cpp1y/var-templ74.C: Remove dg-error.
* g++.dg/cpp1y/var-templ77.C: New test.
Patrick Palka [Fri, 3 Mar 2023 16:37:02 +0000 (11:37 -0500)]
c++: thinko in extract_local_specs [PR108998]
In order to fix PR100295, r13-4730-g18499b9f848707 attempted to make
extract_local_specs walk the given pattern twice, ignoring unevaluated
operands the first time around so that we prefer to process a local
specialization in an evaluated context if it appears in one (we process
each local specialization once even if it appears multiple times in the
pattern).
But there's a thinko in the patch, namely that we don't actually walk
the pattern twice since we don't clear the visited set for the second
walk (to avoid processing a local specialization twice) and so the root
node (and any node leading up to an unevaluated operand) is considered
visited already. So the patch effectively made extract_local_specs
ignore unevaluated operands altogether, which this testcase demonstrates
isn't quite safe (extract_local_specs never sees 'aa' and we don't record
its local specialization, so later we try to specialize 'aa' on the spot
with the args {{int},{17}} which causes us to nonsensically substitute
its auto with 17.)
This patch fixes this by refining the second walk to start from the
trees we skipped over during the first walk.
PR c++/108998
gcc/cp/ChangeLog:
* pt.cc (el_data::skipped_trees): New data member.
(extract_locals_r): Push to skipped_trees any unevaluated
contexts that we skipped over.
(extract_local_specs): For the second walk, start from each
tree in skipped_trees.
Patrick Palka [Fri, 23 Dec 2022 16:17:45 +0000 (11:17 -0500)]
c++: get_nsdmi in template context [PR108116]
Here during ahead of time checking of C{}, we indirectly call get_nsdmi
for C::m from finish_compound_literal, which in turn calls
break_out_target_exprs for C::m's (non-templated) initializer, during
which we build a call to A::~A and check expr_noexcept_p for it (from
build_vec_delete_1). But this is all done with processing_template_decl
set, so the built A::~A call is templated (whose form was recently
changed by r12-6897-gdec8d0e5fa00ceb2) which expr_noexcept_p doesn't
expect, and we crash.
This patch fixes this by clearing processing_template_decl before
the call to break_out_target_exprs from get_nsdmi. And since it more
generally seems we shouldn't be seeing (or producing) non-templated
trees in break_out_target_exprs, this patch also adds an assert to
that effect.
PR c++/108116
gcc/cp/ChangeLog:
* constexpr.cc (maybe_constant_value): Clear
processing_template_decl before calling break_out_target_exprs.
* init.cc (get_nsdmi): Likewise.
* tree.cc (break_out_target_exprs): Assert processing_template_decl
is cleared.
Patrick Palka [Fri, 23 Dec 2022 14:18:37 +0000 (09:18 -0500)]
c++: template friend with variadic constraints [PR107853]
When instantiating a constrained hidden template friend, we substitute
into its template-head requirements in tsubst_friend_function. For this
substitution we use the template's full argument vector whose outer
levels correspond to the instantiated class's arguments and innermost
level corresponds to the template's own level-lowered generic arguments.
But for A<int>::f here, for which the relevant argument vector is
{{int}, {Us...}}, the substitution into (C<Ts, Us> && ...) triggers the
assert in use_pack_expansion_extra_args_p since one argument is a pack
expansion and the other isn't.
And for A<int, int>::f, for which the relevant argument vector is
{{int, int}, {Us...}}, the use_pack_expansion_extra_args_p assert would
also trigger but we first get a bogus "mismatched argument pack lengths"
error from tsubst_pack_expansion.
Sidestepping the question of whether tsubst_pack_expansion should be
able to handle such substitutions, it seems we can work around this by
using only the instantiated class's arguments and not also the template
friend's own generic arguments, which is consistent with how we normally
substitute into the signature of a member template.
PR c++/107853
gcc/cp/ChangeLog:
* constraint.cc (maybe_substitute_reqs_for): Substitute into
the template-head requirements of a template friend using only
its outer arguments via outer_template_args.
* cp-tree.h (outer_template_args): Declare.
* pt.cc (outer_template_args): Define, factored out and
generalized from ...
(ctor_deduction_guides_for): ... here.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-friend12.C: New test.
* g++.dg/cpp2a/concepts-friend13.C: New test.
Patrick Palka [Tue, 29 Nov 2022 14:55:21 +0000 (09:55 -0500)]
c++: explicit specialization and trailing requirements [PR107864]
Here we're crashing when using the explicit specialization of the
function template g with trailing requirements ultimately because
earlier decls_match (called indirectly from register_specialization) for
for the explicit specialization returned false since the template has
trailing requirements whereas the specialization doesn't.
In r12-2230-gddd25bd1a7c8f4, we fixed a similar issue concerning template
requirements instead of trailing requirements. We could extend that fix
to ignore trailing requirement mismatches for explicit specializations
as well, but it seems cleaner to just propagate constraints from the
specialized template to the specialization when declaring an explicit
specialization so that decls_match will naturally return true in this
case. And it looks like determine_specialization already does this,
albeit inconsistently (only when specializing a non-template member
function of a class template as in cpp2a/concepts-explicit-spec4.C).
So this patch makes determine_specialization consistently propagate
constraints from the specialized template to the specialization, which
in turn lets us get rid of the function_requirements_equivalent_p special
case added by r12-2230.
PR c++/107864
gcc/cp/ChangeLog:
* decl.cc (function_requirements_equivalent_p): Don't check
DECL_TEMPLATE_SPECIALIZATION.
* pt.cc (determine_specialization): Propagate constraints when
specializing a function template too. Simplify by using
add_outermost_template_args.
Patrick Palka [Thu, 3 Nov 2022 19:35:18 +0000 (15:35 -0400)]
c++: requires-expr and access checking [PR107179]
Like during satisfaction, we also need to avoid deferring access checks
during substitution of a requires-expr because the outcome of an access
check can determine the value of the requires-expr. Otherwise (in
deferred access checking contexts such as within a base-clause), the
requires-expr may evaluate to the wrong result, and along the way a
failed access check may leak out from it into a non-SFINAE context and
cause a hard error (as in the below testcase).
PR c++/107179
gcc/cp/ChangeLog:
* constraint.cc (tsubst_requires_expr): Make sure we're not
deferring access checks.
Xi Ruoyao [Thu, 2 Mar 2023 10:05:23 +0000 (18:05 +0800)]
LoongArch: Stop -mfpu from silently breaking ABI [PR109000]
In the toolchain convention, we describe -mfpu= as:
"Selects the allowed set of basic floating-point instructions and
registers. This option should not change the FP calling convention
unless it's necessary."
Though not explicitly stated, the rationale of this rule is to allow
combinations like "-mabi=lp64s -mfpu=64". This will be useful for
running applications with LP64S/F ABI on a double-float-capable
LoongArch hardware and using a math library with LP64S/F ABI but native
double float HW instructions, for a better performance.
And now a case in Linux kernel has again proven the usefulness of this
kind of combination. The AMDGPU DCN kernel driver needs to perform some
floating-point operation, but the entire kernel uses LP64S ABI. So the
translation units of the AMDGPU DCN driver need to be compiled with
-mfpu=64 (the kernel lacks soft-FP routines in libgcc), but -mabi=lp64s
(or you can't link it with the other part of the kernel).
Unfortunately, currently GCC uses TARGET_{HARD,SOFT,DOUBLE}_FLOAT to
determine the floating calling convention. This causes "-mfpu=64"
silently allow using $fa* to pass parameters and return values EVEN IF
-mabi=lp64s is used. To make things worse, the generated object file
has SOFT-FLOAT set in the eflags field so the linker will happily link
it with other LP64S ABI object files, but obviously this will lead to
bad results at runtime. And for now all loongarch64 CPU models (-march
settings) implies -mfpu=64 on by default, so the issue makes a single
"-mabi=lp64s" option basically broken (fortunately most projects for eg
the Linux kernel have used -msoft-float which implies both -mabi=lp64s
and -mfpu=none as we've recommended in the toolchain convention doc).
The fix is simple: use TARGET_*_FLOAT_ABI instead.
I consider this a bug fix: the behavior difference from the toolchain
convention doc is a bug, and generating object files with SOFT-FLOAT
flag but parameters/return values passed through FPRs is definitely a
bug.
Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk and
release/gcc-12 branch?
gcc/ChangeLog:
PR target/109000
* config/loongarch/loongarch.h (FP_RETURN): Use
TARGET_*_FLOAT_ABI instead of TARGET_*_FLOAT.
(UNITS_PER_FP_ARG): Likewise.
gcc/testsuite/ChangeLog:
PR target/109000
* gcc.target/loongarch/flt-abi-isa-1.c: New test.
* gcc.target/loongarch/flt-abi-isa-2.c: New test.
* gcc.target/loongarch/flt-abi-isa-3.c: New test.
* gcc.target/loongarch/flt-abi-isa-4.c: New test.
Harald Anlauf [Mon, 20 Feb 2023 20:28:09 +0000 (21:28 +0100)]
Fortran: improve checking of character length specification [PR96025]
gcc/fortran/ChangeLog:
PR fortran/96025
* parse.cc (check_function_result_typed): Improve type check of
specification expression for character length and return status.
(parse_spec): Use status from above.
* resolve.cc (resolve_fntype): Prevent use of invalid specification
expression for character length.
gcc/testsuite/ChangeLog:
PR fortran/96025
* gfortran.dg/pr96025.f90: New test.
Marek Polacek [Sat, 4 Mar 2023 18:14:01 +0000 (13:14 -0500)]
c++: variable template and targ deduction [PR108550]
In this test, we get a bogus error because we failed to deduce the auto in
constexpr auto is_pointer_v = is_pointer<Tp>::value;
to bool. Then ensure_literal_type_for_constexpr_object thinks the object
isn't literal and an error is reported.
This is another case of the interaction between tf_partial and 'auto',
where the auto was not reduced so the deduction failed. In more detail:
we have
Wrap1<int>()
in the code and we need to perform OR -> fn_type_unification. The targ
list is incomplete, so we do
tsubst_flags_t ecomplain = complain | tf_partial | tf_fndecl_type;
fntype = tsubst (TREE_TYPE (fn), explicit_targs, ecomplain, NULL_TREE);
where TREE_TYPE (fn) is struct integral_constant <T402> (void). Then
we substitute the return type, which results in tsubsting is_pointer_v<int>.
is_pointer_v is a variable template with a placeholder type:
template <class Tp>
constexpr auto is_pointer_v = is_pointer<Tp>::value;
so we find ourselves in lookup_and_finish_template_variable. tf_partial is
still set, so finish_template_variable -> instantiate_template -> tsubst
won't reduce the level of auto. But then we do mark_used which eventually
calls do_auto_deduction which clears tf_partial, because we want to replace
the auto now. But we hadn't reduced auto's level so this fails. And
since we're not in an immediate context, we emit a hard error.
I suppose that when we reach lookup_and_finish_template_variable it's
probably time to clear tf_partial. (I added an assert and our testsuite
doesn't have a test whereby we get to lookup_and_finish_template_variable
while tf_partial is still active.)
Marek Polacek [Wed, 22 Feb 2023 20:17:03 +0000 (15:17 -0500)]
c-family: avoid compile-time-hog in c_genericize [PR108880]
This fixes a compile-time hog with UBSan. This only happened in cc1 but
not cc1plus. The problem is ultimately that c_genericize_control_stmt/
STATEMENT_LIST -> walk_tree_1 doesn't use a hash_set to remember visited
nodes, so it kept on recursing for a long time. We should be able to
use the pset that c_genericize created. We just need to use walk_tree
instead of walk_tree_w_d so that the pset is explicit.
PR c/108880
gcc/c-family/ChangeLog:
* c-gimplify.cc (c_genericize_control_stmt) <case STATEMENT_LIST>: Pass
pset to walk_tree_1.
(c_genericize): Call walk_tree with an explicit pset.
Marek Polacek [Wed, 1 Mar 2023 19:28:46 +0000 (14:28 -0500)]
c++: ICE with -Wmismatched-tags and member template [PR106259]
-Wmismatched-tags warns about the (harmless) struct/class mismatch.
For, e.g.,
template<typename T> struct A { };
class A<int> a;
it works by adding A<T> to the class2loc hash table while parsing the
class-head and then, while parsing the elaborate type-specifier, we
add A<int>. At the end of c_parse_file we go through the table and
warn about the class-key mismatches. In this PR we crash though; we
have
template<typename T> struct A {
template<typename U> struct W { };
};
struct A<int>::W<int> w; // #1
where while parsing A and #1 we've stashed
A<T>
A<T>::W<U>
A<int>::W<int>
into class2loc. Then in class_decl_loc_t::diag_mismatched_tags TYPE
is A<int>::W<int>, and specialization_of gets us A<int>::W<U>, which
is not in class2loc, so we crash on gcc_assert (cdlguide). But it's
OK not to have found A<int>::W<U>, we should just look one "level" up,
that is, A<T>::W<U>.
It's important to handle class specializations, so e.g.
template<>
struct A<char> {
template<typename U>
class W { };
};
where W's class-key is different than in the primary template above,
so we should warn depending on whether we're looking into A<char>
or into a different instantiation.
PR c++/106259
gcc/cp/ChangeLog:
* parser.cc (class_decl_loc_t::diag_mismatched_tags): If the first
lookup of SPEC didn't find anything, try to look for
most_general_template.
where TYPE_PTRMEM_CLASS_TYPE (type) is going to crash since the type
is ptrdiff_type_node. We could just add a TYPE_PTRMEM_P check before
accessing TYPE_PTRMEM_CLASS_TYPE but I think it's nicer to explain why
we couldn't evaluate the expression.
PR c++/107574
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression): Emit an error when
a PTRMEM_CST cannot be evaluated.
Marek Polacek [Thu, 23 Feb 2023 22:54:47 +0000 (17:54 -0500)]
c++: ICE with constexpr variable template [PR107938]
Since r11-557, cp_finish_decl can call check_initializer even in
a template for a constexpr initializer. That ultimately leads to
convert_for_assignment and check_address_or_pointer_of_packed_member,
where we crash, because it doesn't expect that the CALL_EXPR is
a function object. Q has a constexpr operator(), but since we're
in a template, q(0) is a CALL_EXPR whose CALL_EXPR_FN is just
a VAR_DECL; it hasn't been converted to Q::operator<int>(&q, 0) yet.
I propose to robustify check_address_or_pointer_of_packed_member.
var-templ74.C has an XFAIL, subject to 107939.
I noticed that our -Waddress-of-packed-member tests weren't testing
member functions, added thus. (I was tempted to check
FUNCTION_POINTER_TYPE_P but that doesn't include METHOD_TYPE.)
This is a follow-up to commit a4c6bd0821099f6b8c0f64a96ffd9d01a025c413
introducing a runtime check for alignment for 16 byte atomic
compare-exchange, load, and store.
libatomic/ChangeLog:
* config/s390/cas_n.c: New file.
* config/s390/load_n.c: New file.
* config/s390/store_n.c: New file.
Martin Liska [Fri, 17 Feb 2023 14:11:02 +0000 (15:11 +0100)]
asan: adjust module name for global variables
As mentioned in the PR, when we use LTO, we wrongly use ltrans output
file name as a module name of a global variable. That leads to a
non-reproducible output.
After the suggested change, we emit context name of normal global
variables. And for artificial variables (like .Lubsan_data3), we use
aux_base_name (e.g. "./a.ltrans0.ltrans").
PR sanitizer/108834
gcc/ChangeLog:
* asan.cc (asan_add_global): Use proper TU name for normal
global variables (and aux_base_name for the artificial one).
gcc/testsuite/ChangeLog:
* c-c++-common/asan/global-overflow-1.c: Test line and column
info for a global variable.
Kewen Lin [Tue, 14 Feb 2023 02:03:26 +0000 (20:03 -0600)]
rs6000/test: Adjust some test cases on partial vector [PR96373]
As Richard pointed out in [1] and the testing on Power10, the
proposed fix for PR96373 requires some updates on a few rs6000
test cases which adopt partial vector. This patch is to fix
all of them with one extra option "-fno-trapping-math" as
Richard suggested.
Besides, the original test case also failed on Power10 without
Richard's proposed fix, this patch adds it together for a bit
better testing coverage.
Jonathan Wakely [Wed, 31 Aug 2022 12:57:34 +0000 (13:57 +0100)]
libstdc++: Add noexcept-specifier to std::reference_wrapper::operator()
This isn't required by the standard, but there's an LWG issue suggesting
to add it.
Also use __invoke_result instead of result_of, to match the spec in
recent standards.
libstdc++-v3/ChangeLog:
* include/bits/refwrap.h (reference_wrapper::operator()): Add
noexcept-specifier and use __invoke_result instead of result_of.
* testsuite/20_util/reference_wrapper/invoke-noexcept.cc: New test.
Jonathan Wakely [Thu, 2 Feb 2023 14:06:40 +0000 (14:06 +0000)]
libstdc++: Fix std::filesystem errors with -fkeep-inline-functions [PR108636]
With -fkeep-inline-functions there are linker errors when including
<filesystem>. This happens because there are some filesystem::path
constructors defined inline which call non-exported functions defined in
the library. That's usually not a problem, because those constructors
are only called by code that's also inside the library. But when the
header is compiled with -fkeep-inline-functions those inline functions
are emitted even though they aren't called. That then creates an
undefined reference to the other library internsl. The fix is to just
move the private constructors into the library where they are called.
That way they are never even seen by users, and so not compiled even if
-fkeep-inline-functions is used.
Marek Polacek [Thu, 16 Feb 2023 22:41:24 +0000 (17:41 -0500)]
c++: ICE with redundant capture [PR108829]
Here we crash in is_capture_proxy:
/* Location wrappers should be stripped or otherwise handled by the
caller before using this predicate. */
gcc_checking_assert (!location_wrapper_p (decl));
We only crash with the redundant capture:
int abyPage = [=, abyPage] { ... }
because prune_lambda_captures is only called when there was a default
capture, and with [=] only abyPage won't be in LAMBDA_EXPR_CAPTURE_LIST.
The problem is that LAMBDA_CAPTURE_EXPLICIT_P wasn't propagated
correctly and so var_to_maybe_prune proceeded where it shouldn't.
Co-Authored by: Patrick Palka <ppalka@redhat.com>
PR c++/108829
gcc/cp/ChangeLog:
* pt.cc (prepend_one_capture): Set LAMBDA_CAPTURE_EXPLICIT_P.
(tsubst_lambda_expr): Pass LAMBDA_CAPTURE_EXPLICIT_P to
prepend_one_capture.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-108829-2.C: New test.
* g++.dg/cpp0x/lambda/lambda-108829.C: New test.
Alex Coplan [Mon, 6 Feb 2023 14:32:21 +0000 (14:32 +0000)]
aarch64: Fix up bfmlal lane pattern [PR104921]
As the testcase shows, this pattern had an incorrect constraint leading
to GCC's output getting rejected by the assembler.
This patch fixes the constraint accordingly.
The test is split into two: one that can run without bf16 support from
the assembler and another that checks that the output actually assembles
when such support is available.
gcc/ChangeLog:
PR target/104921
* config/aarch64/aarch64-simd.md (aarch64_bfmlal<bt>_lane<q>v4sf):
Use correct constraint for operand 3.
gcc/testsuite/ChangeLog:
PR target/104921
* gcc.target/aarch64/pr104921-1.c: New test.
* gcc.target/aarch64/pr104921-2.c: New test.
* gcc.target/aarch64/pr104921.x: Include file for new tests.
Xi Ruoyao [Mon, 13 Feb 2023 10:38:53 +0000 (18:38 +0800)]
LoongArch: Fix multiarch tuple canonization
Multiarch tuple will be coded in file or directory names in
multiarch-aware distros, so one ABI should have only one multiarch
tuple. For example, "--target=loongarch64-linux-gnu --with-abi=lp64s"
and "--target=loongarch64-linux-gnusf" should both set multiarch tuple
to "loongarch64-linux-gnusf". Before this commit,
"--target=loongarch64-linux-gnu --with-abi=lp64s --disable-multilib"
will produce wrong result (loongarch64-linux-gnu).
A recent LoongArch psABI revision mandates "loongarch64-linux-gnu" to be
used for -mabi=lp64d (instead of "loongarch64-linux-gnuf64") for some
non-technical reason [1]. Note that we cannot make
"loongarch64-linux-gnuf64" an alias for "loongarch64-linux-gnu" because
to implement such an alias, we must create thousands of symlinks in the
distro and doing so would be completely unpractical. This commit also
aligns GCC with the revision.
Tested by building cross compilers with --enable-multiarch and multiple
combinations of --target=loongarch64-linux-gnu*, --with-abi=lp64{s,f,d},
and --{enable,disable}-multilib; and run "xgcc --print-multiarch" then
manually verify the result with eyesight.
* config.gcc (triplet_abi): Set its value based on $with_abi,
instead of $target.
(la_canonical_triplet): Set it after $triplet_abi is set
correctly.
* config/loongarch/t-linux (MULTILIB_OSDIRNAMES): Make the
multiarch tuple for lp64d "loongarch64-linux-gnu" (without
"f64" suffix).
* include/experimental/bits/simd_x86.h
(_SimdImplX86::_S_not_equal_to, _SimdImplX86::_S_less)
(_SimdImplX86::_S_less_equal): Do not call
__builtin_is_constant_evaluated in constexpr-if.
* include/experimental/bits/simd.h
(_SimdWrapper::_M_is_constprop_none_of)
(_SimdWrapper::_M_is_constprop_all_of): Return false unless the
computed result still satisfies __builtin_constant_p.
Eric Botcazou [Wed, 15 Feb 2023 22:32:12 +0000 (23:32 +0100)]
Fix PR target/90458
This is the incompatibility of -fstack-clash-protection with Windows SEH.
Now the Windows ports always enable TARGET_STACK_PROBE, which means that
the stack is always probed (out of line) so -fstack-clash-protection does
nothing more.
gcc/
PR target/90458
* config/i386/i386.cc (ix86_compute_frame_layout): Disable the
effects of -fstack-clash-protection for TARGET_STACK_PROBE.
(ix86_expand_prologue): Likewise.
Marek Polacek [Tue, 14 Feb 2023 22:05:44 +0000 (17:05 -0500)]
warn-access: wrong -Wdangling-pointer with labels [PR106080]
-Wdangling-pointer warns when the address of a label escapes. This
causes grief in OCaml (<https://github.com/ocaml/ocaml/issues/11358>) as
well as in the kernel:
<https://bugzilla.kernel.org/show_bug.cgi?id=215851> because it uses
to get the PC. -Wdangling-pointer is documented to warn about pointers
to objects. However, it uses is_auto_decl which checks DECL_P, but DECL_P
is also true for a label/enumerator/function declaration, none of which is
an object. Rather, it should use auto_var_p which correctly checks VAR_P
and PARM_DECL.
PR middle-end/106080
gcc/ChangeLog:
* gimple-ssa-warn-access.cc (is_auto_decl): Remove. Use auto_var_p
instead.
gcc/testsuite/ChangeLog:
* c-c++-common/Wdangling-pointer-10.c: New test.
* c-c++-common/Wdangling-pointer-9.c: New test.
Marek Polacek [Fri, 10 Feb 2023 22:26:57 +0000 (17:26 -0500)]
c++: fix ICE in joust_maybe_elide_copy [PR106675]
joust_maybe_elide_copy checks that the last conversion in the ICS for
the first argument is ck_ref_bind, which is reasonable, because we've
checked that we're dealing with a copy/move constructor. But it can
also happen that we couldn't figure out which conversion function is
better to convert the argument, as in this testcase: joust couldn't
decide if we should go with
operator foo &()
or
operator foo const &()
so we get a ck_ambig, which then upsets joust_maybe_elide_copy. Since
a ck_ambig can validly occur, I think we should just return early, as
in the patch below.
PR c++/106675
gcc/cp/ChangeLog:
* call.cc (joust_maybe_elide_copy): Return false for ck_ambig.
"vec_vsubcuqP" should be "vec_vsubcuq", this typo caused
us to define vec_vsubcuqP in rs6000-vecdefines.h instead
of vec_vsubcuq, so that compiler is not able to realize
the built-in function name vec_vsubcuq any more.
Co-authored-By: Andrew Pinski <apinski@marvell.com>
PR target/108396
gcc/ChangeLog:
* config/rs6000/rs6000-overload.def (VEC_VSUBCUQ): Fix typo
vec_vsubcuqP with vec_vsubcuq.
Kewen Lin [Wed, 18 Jan 2023 08:34:19 +0000 (02:34 -0600)]
rs6000: Teach rs6000_opaque_type_invalid_use_p about gcall [PR108348]
PR108348 shows one special case that MMA opaque types are
used in function arguments and treated as pass by reference,
it results in one copying from argument to a temp variable,
since this copying happens before rs6000_function_arg check,
it can cause ICE without MMA support then. This patch is to
teach function rs6000_opaque_type_invalid_use_p to check if
any function argument in a gcall stmt has the invalid use of
MMA opaque types.
btw, I checked the handling on return value, it doesn't have
this kind of issue as its checking and error emission is quite
early, so this doesn't handle function return value.
PR target/108348
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses of MMA opaque type in function arguments.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr108348-1.c: New test.
* gcc.target/powerpc/pr108348-2.c: New test.
Kewen Lin [Mon, 16 Jan 2023 08:15:39 +0000 (02:15 -0600)]
rs6000: Teach rs6000_opaque_type_invalid_use_p about inline asm [PR108272]
As PR108272 shows, there are some invalid uses of MMA opaque
types in inline asm statements. This patch is to teach the
function rs6000_opaque_type_invalid_use_p for inline asm,
check and error any invalid use of MMA opaque types in input
and output operands.
PR target/108272
gcc/ChangeLog:
* config/rs6000/rs6000.cc (rs6000_opaque_type_invalid_use_p): Add the
support for invalid uses in inline asm, factor out the checking and
erroring to lambda function check_and_error_invalid_use.
gcc/testsuite/ChangeLog:
* gcc.target/powerpc/pr108272-1.c: New test.
* gcc.target/powerpc/pr108272-2.c: New test.
* gcc.target/powerpc/pr108272-3.c: New test.
* gcc.target/powerpc/pr108272-4.c: New test.
Some package builds enable -fstack-protector and -Werror. Since
-fstack-protector is not supported on hppa because the stack grows
up, these packages must check for the warning generated by
-fstack-protector and suppress it on hppa. This is problematic
since hppa is the only significant architecture where the stack
grows up.
2022-12-16 John David Anglin <danglin@gcc.gnu.org>
Jakub Jelinek [Tue, 31 Jan 2023 08:20:34 +0000 (09:20 +0100)]
i386: Fix up -Wuninitialized warnings in avx512erintrin.h [PR105593]
As reported in the PR, there are some -Wuninitialized warnings in
avx512erintrin.h. One can see that by compiling sse-23.c testcase with
-Wuninitialized (or when actually using those intrinsics).
Those 6 spots use an uninitialized variable and pass it as one of the
argument to a builtin with constant mask -1, because there is no unmasked
builtin. It is true that expansion of those builtins into RTL will see
mask is all ones and ignore the unneeded argument, but -Wuninitialized
is diagnosed on GIMPLE and on GIMPLE these builtins are just builtin calls.
avx512fintrin.h and other headers use in these cases the _mm*_undefined_* ()
intrinsics, like:
return (__m512i) __builtin_ia32_psrav8di_mask ((__v8di) __X,
(__v8di) __Y,
(__v8di)
_mm512_undefined_epi32 (),
(__mmask8) -1);
etc. and the following patch does the same for avx512erintrin.h.
With the recent changes in C++ FE and the _mm*_undefined_* intrinsics,
we don't emit -Wuninitialized warnings for those (previously we didn't
just in C due to self-initialization). Of course we could also
just self-initialize these uninitialized vars and add the #pragma GCC
diagnostic dances around it, but using the intrinsics is consistent with
the rest and IMHO cleaner.
2023-01-31 Jakub Jelinek <jakub@redhat.com>
PR c++/105593
* config/i386/avx512erintrin.h (_mm512_exp2a23_round_pd,
_mm512_exp2a23_round_ps, _mm512_rcp28_round_pd, _mm512_rcp28_round_ps,
_mm512_rsqrt28_round_pd, _mm512_rsqrt28_round_ps): Use
_mm512_undefined_pd () or _mm512_undefined_ps () instead of using
uninitialized automatic variable __W.
* gcc.target/i386/sse-23.c: Add -Wuninitialized to dg-options.
Jakub Jelinek [Mon, 16 Jan 2023 08:41:38 +0000 (09:41 +0100)]
x86: Avoid -Wuninitialized warnings on _mm*_undefined_* in C++ [PR105593]
In https://gcc.gnu.org/pipermail/gcc-patches/2023-January/609844.html
I've posted a patch to allow ignoring -Winit-self using GCC diagnostic
pragmas, such that one can mark self-initialization as intentional
disabling of -Wuninitialized warnings.
The following incremental patch uses that in the x86 intrinsic
headers.
Jakub Jelinek [Mon, 16 Jan 2023 08:40:14 +0000 (09:40 +0100)]
c, c++: Allow ignoring -Winit-self through pragmas [PR105593]
As mentioned in the PR, various x86 intrinsics need to return
an uninitialized vector. Currently they use self initialization
to avoid -Wuninitialized warnings, which works fine in C, but
doesn't work in C++ where -Winit-self is enabled in -Wall.
We don't have an attribute to mark a variable as knowingly
uninitialized (the uninitialized attribute exists but means
something else, only in the -ftrivial-auto-var-init context),
and trying to suppress either -Wuninitialized or -Winit-self
inside of the _mm_undefined_ps etc. intrinsic definitions
doesn't work, one needs to currently disable through pragmas
-Wuninitialized warning at the point where _mm_undefined_ps etc.
result is actually used, but that goes against the intent of
those intrinsics.
The -Winit-self warning option actually doesn't do any warning,
all we do is record a suppression for -Winit-self if !warn_init_self
on the decl definition and later look that up in uninit pass.
The following patch changes those !warn_init_self tests which
are true only based on the command line option setting, not based
on GCC diagnostic pragma overrides to
!warning_enabled_at (DECL_SOURCE_LOCATION (decl), OPT_Winit_self)
such that it takes them into account.
2023-01-16 Jakub Jelinek <jakub@redhat.com>
PR c++/105593
gcc/c/
* c-parser.cc (c_parser_initializer): Check warning_enabled_at
at the DECL_SOURCE_LOCATION (decl) for OPT_Winit_self instead
of warn_init_self.
gcc/cp/
* decl.cc (cp_finish_decl): Check warning_enabled_at
at the DECL_SOURCE_LOCATION (decl) for OPT_Winit_self instead
of warn_init_self.
gcc/testsuite/
* c-c++-common/Winit-self3.c: New test.
* c-c++-common/Winit-self4.c: New test.
* c-c++-common/Winit-self5.c: New test.
because of the check on line 718 -- (int) i is not i. So -Wno-init-self
doesn't disable the warning as it's supposed to.
The following patch fixes it by moving the suppress_warning call from
c_gimplify_expr to the front ends, at points where we haven't created
the NOP_EXPR yet.
* c-parser.cc (c_parser_initializer): Add new tree parameter. Use it.
Call suppress_warning.
(c_parser_declaration_or_fndef): Pass d down to c_parser_initializer.
(c_parser_omp_declare_reduction): Pass omp_priv down to
c_parser_initializer.
Jakub Jelinek [Tue, 24 Jan 2023 10:28:00 +0000 (11:28 +0100)]
c++: Handle structured bindings like anon unions in initializers [PR108474]
As reported by Andrew Pinski, structured bindings (with the exception
of the ones using std::tuple_{size,element} and get which are really
standalone variables in addition to the binding one) also use
DECL_VALUE_EXPR and needs the same treatment in static initializers.
On Sun, Jan 22, 2023 at 07:19:07PM -0500, Jason Merrill wrote:
> Though, actually, why not instead fix expand_expr_real_1 (and staticp) to
> look through DECL_VALUE_EXPR?
Doing it when emitting the initializers seems to be too late to me,
we in various spots try to put parts of the static var DECL_INITIAL expressions
into the IL, or e.g. for varpool purposes remember which vars are referenced
there.
This patch moves it to record_reference, which is called from varpool_node::analyze
and so about the same time as gimplification of the bodies which also
replaces DECL_VALUE_EXPRs.
2023-01-24 Jakub Jelinek <jakub@redhat.com>
PR c++/108474
* cp-gimplify.cc (cp_fold_r): Handle structured bindings
vars like anon union artificial vars.
* g++.dg/cpp1z/decomp57.C: New test.
* g++.dg/cpp1z/decomp58.C: New test.
Jakub Jelinek [Thu, 19 Jan 2023 09:00:51 +0000 (10:00 +0100)]
forwprop: Further fixes for simplify_rotate [PR108440]
As mentioned in the simplify_rotate comment, for e.g.
((T) ((T2) X << (Y & (B - 1)))) | ((T) ((T2) X >> ((-Y) & (B - 1))))
we already emit
X r<< (Y & (B - 1))
as replacement. This PR is about the
((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
forms if T2 is wider than T. Unlike e.g.
(X << Y) OP (X >> (B - Y))
which is valid just for Y in [1, B - 1], the above 2 forms are actually
valid and do the rotates for Y in [0, B] - for Y 0 the X value is preserved
by the left shift and right logical shift by B adds just zeros (but because
the shift is in wider precision B is still valid shift count), while for
Y equal to B X is preserved through the latter shift and the former adds
just zeros.
Now, it is unclear if we in the middle-end treat rotates with rotate count
equal or larger than precision as UB or not, unlike shifts there are less
reasons to do so, but e.g. expansion of X r<< Y if there is no rotate optab
for the mode is emitted as (X << Y) | (((unsigned) X) >> ((-Y) & (B - 1)))
and so with UB on Y == B.
The following patch does multiple things:
1) for the above 2, asks the ranger if Y could be equal to B and if so,
instead of using X r<< Y uses X r<< (Y & (B - 1))
2) for the
((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
forms that were fixed 2 days ago it only punts if Y might be in the
[B,B2-1] range but isn't known to be in the
[0,B][2*B,2*B][3*B,3*B]... range. Because for Y which is a multiple
of B but smaller than B2 it acts as a rotate too, left shift provides
0 and (-Y) & (B - 1) is 0 and so preserves X. Though, for the cases
where Y is not known to be in [0,B-1] the patch also uses
X r<< (Y & (B - 1)) rather than X r<< Y
3) as discussed with Aldy, instead of using global ranger it uses a pass
specific copy but lazily created on first simplify_rotate that needs it;
this e.g. handles rotate inside of if body where the guarding condition
limits the shift count to some range which will not work with the
global ranger (unless there is some SSA_NAME to attach the range to).
Note, e.g. on x86 X r<< (Y & (B - 1)) and X r<< Y actually emit the
same assembly because rotates work the same even for larger rotate counts,
but that is handled only during combine.
2023-01-19 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108440
* tree-ssa-forwprop.cc: Include gimple-range.h.
(simplify_rotate): For the forms with T2 wider than T and shift counts of
Y and B - Y add & (B - 1) masking for the rotate count if Y could be equal
to B. For the forms with T2 wider than T and shift counts of
Y and (-Y) & (B - 1), don't punt if range could be [B, B2], but only if
range doesn't guarantee Y < B or Y = N * B. If range doesn't guarantee
Y < B, also add & (B - 1) masking for the rotate count. Use lazily created
pass specific ranger instead of get_global_range_query.
(pass_forwprop::execute): Disable that ranger at the end of pass if it has
been created.
* c-c++-common/rotate-10.c: New test.
* c-c++-common/rotate-11.c: New test.
Jakub Jelinek [Tue, 17 Jan 2023 11:14:25 +0000 (12:14 +0100)]
forwprop: Fix up rotate pattern matching [PR106523]
The comment above simplify_rotate roughly describes what patterns
are matched into what:
We are looking for X with unsigned type T with bitsize B, OP being
+, | or ^, some type T2 wider than T. For:
(X << CNT1) OP (X >> CNT2) iff CNT1 + CNT2 == B
((T) ((T2) X << CNT1)) OP ((T) ((T2) X >> CNT2)) iff CNT1 + CNT2 == B
transform these into:
X r<< CNT1
Or for:
(X << Y) OP (X >> (B - Y))
(X << (int) Y) OP (X >> (int) (B - Y))
((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
(X << Y) | (X >> ((-Y) & (B - 1)))
(X << (int) Y) | (X >> (int) ((-Y) & (B - 1)))
((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
transform these into (last 2 only if ranger can prove Y < B):
X r<< Y
The following testcase shows that 2 of these are problematic.
If T2 is wider than T, then the 2 which yse (-Y) & (B - 1) on one
of the shift counts but Y on the can do something different from
rotate. E.g.:
__attribute__((noipa)) unsigned char
f7 (unsigned char x, unsigned int y)
{
unsigned int t = x;
return (t << y) | (t >> ((-y) & 7));
}
if y is [0, 7], then it is a normal rotate, and if y is in [32, ~0U]
then it is UB, but for y in [9, 31] the left shift in this case
will never leave any bits in the result, while in a rotate they are
left there. Say for y 5 and x 0xaa the expression gives
0x55 which is the same thing as rotate, while for y 19 and x 0xaa
0x5, which is different.
Now, I believe the
((T) ((T2) X << Y)) OP ((T) ((T2) X >> (B - Y)))
((T) ((T2) X << (int) Y)) OP ((T) ((T2) X >> (int) (B - Y)))
forms are ok, because B - Y still needs to be a valid shift count,
and if Y > B then B - Y should be either negative or very large
positive (for unsigned types).
And similarly the last 2 cases above which use & (B - 1) on both
shift operands are definitely ok.
The following patch disables the
((T) ((T2) X << Y)) | ((T) ((T2) X >> ((-Y) & (B - 1))))
((T) ((T2) X << (int) Y)) | ((T) ((T2) X >> (int) ((-Y) & (B - 1))))
unless ranger says Y is not in [B, B2 - 1] range.
And, looking at it again this morning, actually the Y equal to B
case is still fine, if Y is equal to 0, then it is
(T) (((T2) X << 0) | ((T2) X >> 0))
and so X, for Y == B it is
(T) (((T2) X << B) | ((T2) X >> 0))
which is the same as
(T) (0 | ((T2) X >> 0))
which is also X. So instead of the [B, B2 - 1] range we could use
[B + 1, B2 - 1]. And, if we wanted to go further, even multiplies
of B are ok if they are smaller than B2, so we could construct a detailed
int_range_max if we wanted.
2023-01-17 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/106523
* tree-ssa-forwprop.cc (simplify_rotate): For the
patterns with (-Y) & (B - 1) in one operand's shift
count and Y in another, if T2 has wider precision than T,
punt if Y could have a value in [B, B2 - 1] range.
* c-c++-common/rotate-2.c (f5, f6, f7, f8, f13, f14, f15, f16,
f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
__builtin_unreachable about shift count.
* c-c++-common/rotate-2b.c: New test.
* c-c++-common/rotate-4.c (f5, f6, f7, f8, f13, f14, f15, f16,
f37, f38, f39, f40, f45, f46, f47, f48): Add assertions using
__builtin_unreachable about shift count.
* c-c++-common/rotate-4b.c: New test.
* gcc.c-torture/execute/pr106523.c: New test.
Jakub Jelinek [Sat, 14 Jan 2023 09:17:14 +0000 (10:17 +0100)]
c++: Avoid incorrect shortening of divisions [PR108365]
The following testcase is miscompiled, because we shorten the division
in a case where it should not be shortened.
Divisions (and modulos) can be shortened if it is unsigned division/modulo,
or if it is signed division/modulo where we can prove the dividend will
not be the minimum signed value or divisor will not be -1, because e.g.
on sizeof(long long)==sizeof(int)*2 && __INT_MAX__ == 0x7fffffff targets
(-2147483647 - 1) / -1 is UB
but
(int) (-2147483648LL / -1LL) is not, it is -2147483648.
The primary aim of both the C and C++ FE division/modulo shortening I assume
was for the implicit integral promotions of {,signed,unsigned} {char,short}
and because at this point we have no VRP information etc., the shortening
is done if the integral promotion is from unsigned type for the divisor
or if the dividend is an integer constant other than -1.
This works fine for char/short -> int promotions when char/short have
smaller precision than int - unsigned char -> int or unsigned short -> int
will always be a positive int, so never the most negative.
Now, the C FE checks whether orig_op0 is TYPE_UNSIGNED where op0 is either
the same as orig_op0 or that promoted to int, I think that works fine,
if it isn't promoted, either the division/modulo common type will have the
same precision as op0 but then the division/modulo is unsigned and so
without UB, or it will be done in wider precision (e.g. because op1 has
wider precision), but then op0 can't be minimum signed value. Or it has
been promoted to int, but in that case it was again from narrower type and
so never minimum signed int.
But the C++ FE was checking if op0 is a NOP_EXPR from TYPE_UNSIGNED.
First of all, not sure if the operand of NOP_EXPR couldn't be non-integral
type where TYPE_UNSIGNED wouldn't be meaningful, but more importantly,
even if it is a cast from unsigned integral type, we only know it can't be
minimum signed value if it is a widening cast, if it is same precision or
narrowing cast, we know nothing.
So, the following patch for the NOP_EXPR cases checks just in case that
it is from integral type and more importantly checks it is a widening
conversion.
2023-01-14 Jakub Jelinek <jakub@redhat.com>
PR c++/108365
* typeck.cc (cp_build_binary_op): For integral division or modulo,
shorten if type0 is unsigned, or op0 is cast from narrower unsigned
integral type or stripped_op1 is INTEGER_CST other than -1.
* g++.dg/opt/pr108365.C: New test.
* g++.dg/warn/pr108365.C: New test.
Jakub Jelinek [Thu, 24 Nov 2022 09:38:42 +0000 (10:38 +0100)]
libstdc++: Another merge from fast_float upstream [PR107468]
Upstream fast_float came up with a cheaper test for
fegetround () == FE_TONEAREST using one float addition, one subtraction
and one comparison. If we know we are rounding to nearest, we can use
fast path in more cases as before.
The following patch merges those changes into libstdc++.
Jakub Jelinek [Mon, 7 Nov 2022 14:17:21 +0000 (15:17 +0100)]
libstdc++: Update from latest fast_float [PR107468]
The following patch partially updates from fast_float trunk.
That way it gets fix for the incorrect rounding case,
PR107468 aka https://github.com/fastfloat/fast_float/issues/149
Using std::fegetround showed in benchmarks too slow, so instead of
doing that the patch limits the fast path where it uses floating
point multiplication rather than integral to cases where we can
prove there will be no rounding (the multiplication will be exact, not
just that the two multiplication or division operation arguments are
exactly representable).
2022-11-07 Jakub Jelinek <jakub@redhat.com>
PR libstdc++/107468
* src/c++17/fast_float/fast_float.h: Partial merge from fast_float 662497742fea7055f0e0ee27e5a7ddc382c2c38e commit.
* testsuite/20_util/from_chars/pr107468.cc: New test.
Andrew Pinski [Thu, 9 Feb 2023 15:03:54 +0000 (16:03 +0100)]
match.pd: When simplifying BFR of an insert, require a mode precision integral type [PR108688]
The same problem as PR 88739 has crept in but
this time in match.pd when simplifying bit_field_ref of
an bit_insert. That is we are generating a BIT_FIELD_REF
of a non-mode-precision integral type.
PR tree-optimization/108688
* match.pd (bit_field_ref [bit_insert]): Avoid generating
BIT_FIELD_REFs of non-mode-precision integral operands.
Jakub Jelinek [Wed, 8 Feb 2023 17:41:21 +0000 (18:41 +0100)]
vect-patterns: Fix up vect_widened_op_tree [PR108692]
The following testcase is miscompiled on aarch64-linux since r11-5160.
Given
<bb 3> [local count: 955630225]:
# i_22 = PHI <i_20(6), 0(5)>
# r_23 = PHI <r_19(6), 0(5)>
...
a.0_5 = (unsigned char) a_15;
_6 = (int) a.0_5;
b.1_7 = (unsigned char) b_17;
_8 = (int) b.1_7;
c_18 = _6 - _8;
_9 = ABS_EXPR <c_18>;
r_19 = _9 + r_23;
...
where SSA_NAMEs 15/17 have signed char, 5/7 unsigned char and rest is int
we first pattern recognize c_18 as
patt_34 = (a.0_5) w- (b.1_7);
which is still correct, 5/7 are unsigned char subtracted in wider type,
but then vect_recog_sad_pattern turns it into
SAD_EXPR <a_15, b_17, r_23>
which is incorrect, because 15/17 are signed char and so it is
sum of absolute signed differences rather than unsigned sum of
absolute unsigned differences.
The reason why this happens is that vect_recog_sad_pattern calls
vect_widened_op_tree with MINUS_EXPR, WIDEN_MINUS_EXPR on the
patt_34 = (a.0_5) w- (b.1_7); statement's vinfo and vect_widened_op_tree
calls vect_look_through_possible_promotion on the operands of the
WIDEN_MINUS_EXPR, which looks through the further casts.
vect_look_through_possible_promotion has careful code to stop when there
would be nested casts that need to be preserved, but the problem here
is that the WIDEN_*_EXPR operation itself has an implicit cast on the
operands already - in this case of WIDEN_MINUS_EXPR the unsigned char
5/7 SSA_NAMEs are widened to unsigned short before the subtraction,
and vect_look_through_possible_promotion obviously isn't told about that.
Now, I think when we see those WIDEN_{MULT,MINUS,PLUS}_EXPR codes, we had
to look through possible promotions already when creating those and so
vect_look_through_possible_promotion again isn't really needed, all we need
to do is arrange what that function will do if the operand isn't result
of any cast. Other option would be let vect_look_through_possible_promotion
know about the implicit promotion from the WIDEN_*_EXPR, but I'm afraid
that would be much harder.
2023-02-08 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/108692
* tree-vect-patterns.cc (vect_widened_op_tree): If rhs_code is
widened_code which is different from code, don't call
vect_look_through_possible_promotion but instead just check op is
SSA_NAME with integral type for which vect_is_simple_use is true
and call set_op on this_unprom.
Jakub Jelinek [Fri, 3 Feb 2023 20:37:27 +0000 (21:37 +0100)]
fortran: Fix up hash table usage in gfc_trans_use_stmts [PR108451]
The first testcase in the PR (which I haven't included in the patch because
it is unclear to me if it is supposed to be valid or not) ICEs since extra
hash table checking has been added recently. The problem is that
gfc_trans_use_stmts does
tree *slot = entry->decls->find_slot_with_hash (rent->use_name, hash,
INSERT);
if (*slot == NULL)
and later on doesn't store anything into *slot and continues. Another spot
a few lines later correctly clears the slot if it decides not to use the
slot, so the following patch does the same.
2023-02-03 Jakub Jelinek <jakub@redhat.com>
PR fortran/108451
* trans-decl.cc (gfc_trans_use_stmts): Call clear_slot before
doing continue.
Jakub Jelinek [Thu, 2 Feb 2023 08:54:18 +0000 (09:54 +0100)]
nested, openmp: Wrap OMP_CLAUSE_*_GIMPLE_SEQ into GIMPLE_BIND for declare_vars [PR108435]
When gimplifying OMP_CLAUSE_{LASTPRIVATE,LINEAR}_STMT, we wrap it always
into a GIMPLE_BIND, but when putting statements directly into
OMP_CLAUSE_{LASTPRIVATE,LINEAR}_GIMPLE_SEQ, we do it only if needed (there
are any temporaries that need to be declared in the sequence).
convert_nonlocal_omp_clauses was relying on the GIMPLE_BIND to be there always
because it called declare_vars on it.
The following patch wraps it into GIMPLE_BIND in tree-nested if we need to
declare_vars on it on demand.
2023-02-02 Jakub Jelinek <jakub@redhat.com>
PR middle-end/108435
* tree-nested.cc (convert_nonlocal_omp_clauses)
<case OMP_CLAUSE_LASTPRIVATE>: If info->new_local_var_chain and *seq
is not a GIMPLE_BIND, wrap the sequence into a new GIMPLE_BIND
before calling declare_vars.
(convert_nonlocal_omp_clauses) <case OMP_CLAUSE_LINEAR>: Merge
with the OMP_CLAUSE_LASTPRIVATE handling except for whether
seq is initialized to &OMP_CLAUSE_LASTPRIVATE_GIMPLE_SEQ (clause)
or &OMP_CLAUSE_LINEAR_GIMPLE_SEQ (clause).
Jakub Jelinek [Wed, 1 Feb 2023 11:52:52 +0000 (12:52 +0100)]
ree: Fix -fcompare-debug issues in combine_reaching_defs [PR108573]
The PR78437 r7-4871 changes made combine_reaching_defs punt on
WORD_REGISTER_OPERATIONS targets if a setter of smaller than word
register has wider uses. This unfortunately breaks -fcompare-debug,
because if such a use appears only in DEBUG_INSN(s), while all other
uses aren't wider than the setter, we can REE optimize it without -g
and not with -g.
Such decisions shouldn't be based on debug instructions. We could try
to reset them or adjust in some other way after we decide to perform the
change, but at least on the testcase which used to fail on riscv64-linux
the
(debug_insn 8 7 9 2 (var_location:HI s (minus:HI (subreg:HI (and:DI (reg:DI 10 a0 [160])
(const_int 1 [0x1])) 0)
(subreg:HI (ashiftrt:DI (reg/v:DI 9 s1 [orig:151 l ] [151])
(debug_expr:SI D#1)) 0))) "pr108573.c":12:5 -1
(nil))
clearly doesn't care about the upper bits and I have hard time imaging how
could one end up with DEBUG_INSN which actually cares about those upper
bits.
So, the following patch just ignores uses on DEBUG_INSNs in this case,
if we run into something where we'd need to do something further later on,
let's deal with it when we have a testcase for it.
2023-02-01 Jakub Jelinek <jakub@redhat.com>
PR debug/108573
* ree.cc (combine_reaching_defs): Don't return false for paradoxical
subregs in DEBUG_INSNs.
Jakub Jelinek [Wed, 1 Feb 2023 09:38:46 +0000 (10:38 +0100)]
c++, openmp: Handle some OMP_*/OACC_* constructs during constant expression evaluation [PR108607]
While potential_constant_expression_1 handled most of OMP_* codes (by saying that
they aren't potential constant expressions), OMP_SCOPE was missing in that list.
I've also added OMP_SCAN, though that is less important (similarly to OMP_SECTION
it ought to appear solely inside of OMP_{FOR,SIMD} resp. OMP_SECTIONS).
As the testcase shows, it isn't enough, potential_constant_expression_1
can catch only some cases, as soon as one uses switch or ifs where at least
one of the possible paths could be constant expression, we can run into the
same codes during cxx_eval_constant_expression, so this patch handles those
there as well.
2023-02-01 Jakub Jelinek <jakub@redhat.com>
PR c++/108607
* constexpr.cc (cxx_eval_constant_expression): Handle OMP_*
and OACC_* constructs as non-constant.
(potential_constant_expression_1): Handle OMP_SCAN and OMP_SCOPE.
Jakub Jelinek [Tue, 31 Jan 2023 09:12:19 +0000 (10:12 +0100)]
i386: Fix up ix86_convert_const_wide_int_to_broadcast [PR108599]
The following testcase is miscompiled. The problem is that during
RTL DSE we see a V4DI register is being loaded { 16, 16, 0, 0 }
value and DSE mostly works in terms of scalar modes, so it calls
movoi to set an OImode REG to (const_wide_int 0x100000000000000010)
and ix86_convert_const_wide_int_to_broadcast thinks it can compute
that value by broadcasting DImode 0x10. While it is true that
for TImode result the broadcast could be used, for OImode/XImode
it can't be, because all but the lowest 2 HOST_WIDE_INTs aren't
present (so are 0 or -1 depending on sign), not 0x10 in this case.
The function checks if the least significant HOST_WIDE_INT elt
of the CONST_WIDE_INT is broadcastable from QI/HI/SI/DImode and then
/* Check if OP can be broadcasted from VAL. */
for (int i = 1; i < CONST_WIDE_INT_NUNITS (op); i++)
if (val != CONST_WIDE_INT_ELT (op, i))
return nullptr;
That is needed of course, but nothing checks that
CONST_WIDE_INT_NUNITS (op) isn't too small for the mode in question.
I think if op would be 0 or -1, it ought to be never CONST_WIDE_INT,
but CONST_INT and so we can just punt whenever the number of
CONST_WIDE_INT elts is not the expected one.
2023-01-31 Jakub Jelinek <jakub@redhat.com>
PR target/108599
* config/i386/i386-expand.cc
(ix86_convert_const_wide_int_to_broadcast): Return nullptr if
CONST_WIDE_INT_NUNITS (op) times HOST_BITS_PER_WIDE_INT isn't
equal to bitsize of mode.
Jakub Jelinek [Tue, 31 Jan 2023 08:46:35 +0000 (09:46 +0100)]
bbpart: Fix up ICE on asm goto [PR108596]
On the following testcase we have asm goto in hot block with 2 successors,
one cold to which it both falls through and has one of the label
pointing to it and another hot successor with another label.
Now, during bbpart we want to ensure that no blocks from one partition fall
through into a block in a different partition. fix_up_fall_thru_edges
does that by temporarily clearing the EDGE_CROSSING on the fallthrough edge,
calling force_nonfallthru and then depending on whether it created a new
bb either set EDGE_CROSSING on the single successor edge from the new bb
(the new bb is kept in the same partition as the predecessor block), or
if no new bb has been created setting EDGE_CROSSING back on the fallthru
edge which has been forced non-EDGE_FALLTHRU.
For asm goto this doesn't always work, force_nonfallthru can create a new bb
and change the fallthrough edge to point to that, but if the original
fallthru destination block has its label referenced among the asm goto
labels, it will create a new non-fallthru edge for the label(s).
But because we've temporarily cheated and cleared EDGE_CROSSING on the edge,
it is cleared on the new edge as well, then the caller sees we've created
a new bb and just sets EDGE_CROSSING on the single fallthru edge from the
new bb. But the direct edge from cur_bb to fallthru edge's destination
isn't handled and fails afterwards consistency checks, because it crosses
partitions.
The following patch notes the case and sets EDGE_CROSSING on that edge too.
2023-01-31 Jakub Jelinek <jakub@redhat.com>
PR rtl-optimization/108596
* bb-reorder.cc (fix_up_fall_thru_edges): Handle the case where cur_bb
ends with asm goto and has a crossing fallthrough edge to the same bb
that contains at least one of its labels by restoring EDGE_CROSSING
flag even on possible edge from cur_bb to new_bb successor.