git.ipfire.org Git - thirdparty/gcc.git/log

c++: constexpr PMF conversion [PR105996]

Here, we were calling build_reinterpret_cast regardless of whether there was
actually a cast, and that now sets REINTERPRET_CAST_P. But that
optimization seems dodgy anyway, as it involves NOP_EXPR from one
RECORD_TYPE to another and we try to reserve NOP_EXPR for fundamental types.
And the generated code seems the same, so let's drop it. And also strip
location wrappers.

PR c++/105996

gcc/cp/ChangeLog:

* typeck.cc (build_ptrmemfunc): Drop 0-offset optimization
and location wrappers.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-pmf3.C: New test.

c++: DMI in template with virtual base [PR106890]

When parsing a default member init we just build a CONVERT_EXPR for
converting to a virtual base, and then expand that into the more complex
form when we actually use the DMI in a constructor.  But that wasn't working
for the template case where we are considering the conversion at the point
that the constructor needs the DMI instantiation, so it seemed like we were
in a constructor already.  And then when the other constructor tries to
reuse the instantiation, it sees uses of the first constructor's parameters,
and dies.  So ensure that we get the CONVERT_EXPR in this case, too.

PR c++/106890

gcc/cp/ChangeLog:

* init.cc (maybe_instantiate_nsdmi_init): Don't leave
current_function_decl set to a constructor.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/nsdmi-template25.C: New test.

c++: constant, array, lambda, template [PR108975]

When a lambda refers to a constant local variable in the enclosing scope, we
tentatively capture it, but if we end up pulling out its constant value, we
go back at the end of the lambda and prune any unneeded captures. Here
while parsing the template we decided that the dim capture was unneeded,
because we folded it away, but then we brought back the use in the template
trees that try to preserve the source representation with added type info.
So then when we tried to instantiate that use, we couldn't find what it was
trying to use, and crashed.

Fixed by not trying to prune when parsing a template; we'll prune at
instantiation time.

PR c++/108975

gcc/cp/ChangeLog:

* lambda.cc (prune_lambda_captures): Don't bother in a template.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/lambda/lambda-const11.C: New test.

c++: namespace-scoped friend in local class [PR69410]

do_friend was only considering class-qualified identifiers for the
qualified-id case, but we also need to skip local scope when there's an
explicit namespace scope.

PR c++/69410

gcc/cp/ChangeLog:

* friend.cc (do_friend): Handle namespace as scope argument.
* decl.cc (grokdeclarator): Pass down in_namespace.

gcc/testsuite/ChangeLog:

* g++.dg/lookup/friend24.C: New test.

c++: __func__ and local class DMI [PR105809]

As in 108242, we need to instantiate in the context of the enclosing
function, not after it's gone.

PR c++/105809

gcc/cp/ChangeLog:

* init.cc (get_nsdmi): Split out...
(maybe_instantiate_nsdmi_init): ...this function.
* cp-tree.h: Declare it.
* pt.cc (tsubst_expr): Use it.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-__func__3.C: New test.

c++: generic lambda, local class, __func__ [PR108242]

Here we are trying to do name lookup in a deferred instantiation of t() and
failing to find __func__. tsubst_expr already tries to instantiate members
of local classes, but was failing with the partial instantiation of generic
lambdas.

PR c++/108242

gcc/cp/ChangeLog:

* pt.cc (tsubst_expr) [TAG_DEFN]: Handle partial instantiation.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/lambda-generic-func2.C: New test.

c++: &enum::enumerator [PR101869]

We don't want to call build_offset_ref with an enum.

PR c++/101869

gcc/cp/ChangeLog:

* semantics.cc (finish_qualified_id_expr): Don't try to build a
pointer-to-member if the scope is an enumeration.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/enum43.C: New test.

c++: co_await and move-only type [PR105406]

Here we were building a temporary MoveOnlyAwaitable to hold the result of
evaluating 'o', but since 'o' is an lvalue we should build a reference
instead.

PR c++/105406

gcc/cp/ChangeLog:

* coroutines.cc (build_co_await): Handle lvalue 'o'.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/co-await-moveonly1.C: New test.

c++: co_await and initializer_list [PR103871]

When flatten_await_stmt processes the backing array for an initializer_list,
we call cp_build_modify_expr to initialize the promoted variable from the
TARGET_EXPR; that needs to be accepted.

PR c++/103871
PR c++/98056

gcc/cp/ChangeLog:

* typeck.cc (cp_build_modify_expr): Allow array initialization of
DECL_ARTIFICIAL variable.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/co-await-initlist1.C: New test.

c++: variable tmpl partial specialization [PR108468]

Generally we expect TPARMS_PRIMARY_TEMPLATE to be set, but sometimes it
isn't for partial instantiations. This ought to be improved, but it's
trivial to work around it in this case.

PR c++/108468

gcc/cp/ChangeLog:

* pt.cc (unify_pack_expansion): Check that TPARMS_PRIMARY_TEMPLATE
is non-null.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/var-templ78.C: New test.

c++: -Wreturn-type with if (true) throw [PR107310]

I removed this folding in GCC 12 because it was interfering with an
experiment of richi's, but that never went in and the change causes
regressions, so let's put it back.

This reverts commit r12-5638-ga3e75c1491cd2d.

PR c++/107310

gcc/cp/ChangeLog:

* cp-gimplify.cc (genericize_if_stmt): Restore folding
of constant conditions.

gcc/testsuite/ChangeLog:

* c-c++-common/Wimplicit-fallthrough-39.c: Adjust warning.
* g++.dg/warn/Wreturn-6.C: New test.

c++: class NTTP and nested anon union [PR108566]

We were failing to come up with the name for the anonymous union. It seems
like unfortunate redundancy, but the ABI does say that the name of an
anonymous union is its first named member.

PR c++/108566

gcc/cp/ChangeLog:

* mangle.cc (anon_aggr_naming_decl): New.
(write_unqualified_name): Use it.

gcc/testsuite/ChangeLog:

* g++.dg/abi/anon6.C: New test.

c++: fix debug info for array temporary [PR107154]

In the testcase the elaboration of the array init that happens at genericize
time was getting the location info for the end of the function; fixed by
doing the expansion at the location of the original expression.

PR c++/107154

gcc/cp/ChangeLog:

* cp-gimplify.cc (cp_genericize_init_expr): Use iloc_sentinel.
(cp_genericize_target_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/debug/dwarf2/lineno-array1.C: New test.

c++: signed __int128_t [PR108099]

The code for handling signed + typedef was breaking on __int128_t, because
it isn't a proper typedef: it doesn't have DECL_ORIGINAL_TYPE.

PR c++/108099

gcc/cp/ChangeLog:

* decl.cc (grokdeclarator): Handle non-typedef typedef_decl.

gcc/testsuite/ChangeLog:

* g++.dg/ext/int128-7.C: New test.

reassoc: Fix up another ICE with returns_twice call [PR109410]

The following testcase ICEs in reassoc, unlike the last case I've fixed
there here SSA_NAME_USED_IN_ABNORMAL_PHI is not the case anywhere.
build_and_add_sum places new statements after the later appearing definition
of an operand but if both operands are default defs or constants, we place
statement at the start of the function.

If the very first statement of a function is a call to returns_twice
function, this doesn't work though, because that call has to be the first
thing in its basic block, so the following patch splits the entry successor
edge such that the new statements are added into a different block from the
returns_twice call.

I think we should in stage1 reconsider such placements, I think it
unnecessarily enlarges the lifetime of the new lhs if its operand(s)
are used more than once in the function.  Unless something sinks those
again.  Would be nice to place it closer to the actual uses (or where
they will be placed).

2023-04-12  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/109410
* tree-ssa-reassoc.cc (build_and_add_sum): Split edge from entry
block if first statement of the function is a call to returns_twice
function.

* gcc.dg/pr109410.c: New test.

(cherry picked from commit 51856718a82ce60f067910d9037ca255645b37eb)

c++: Fix Solaris bootstraps across midnight

When working on the PR109040 fix, I wanted to test it on some
WORD_REGISTER_OPERATIONS target and tried sparc-solaris on GCC Farm.
My bootstrap failed in comparison failure on cp/module.o, because
Solaris date doesn't support the -r option and one stage's cp/module.o
was built before midnight and next stage's cp/module.o after midnight,
so they had different -DMODULE_VERSION= value.

Now, I think the advice (don't bootstrap at midnight) is something
we shouldn't have, so the following patch stores the module version
(still generated through the same way, date -r cp/module.cc
if it works, otherwise just date) into a temporary file, makes sure
that temporary file is updated when cp/module.cc source is updated
and when date -r doesn't work copies file from previous stage
if it is newer than cp/module.cc.

2023-04-12 Jakub Jelinek <jakub@redhat.com>

* Make-lang.in (s-cp-module-version): New target.
(cp/module.o): Depend on it.
(MODULE_VERSION): Remove variable.
(CFLAGS-cp/module.o): For -DMODULE_VERSION= argument just
cat s-cp-module-version.

(cherry picked from commit 56529056cb42baa382c40de7d239d02dbf72c94f)

libiberty: Make strstr.c in libiberty ANSI compliant

On Fri, Nov 13, 2020 at 11:53:43AM -0700, Jeff Law via Gcc-patches wrote:
>
> On 5/1/20 6:06 PM, Seija Kijin via Gcc-patches wrote:
> > The original code in libiberty says "FIXME" and then says it has not been
> > validated to be ANSI compliant. However, this patch changes the function to
> > match implementations that ARE compliant, and such code is in the public
> > domain.
> >
> > I ran the test results, and there are no test failures.
>
> Thanks.  This seems to be the standard "simple" strstr implementation.
> There's significantly faster implementations available, but I doubt it's
> worth the effort as the version in this file only gets used if there is
> no system strstr.c.

Except that PR109306 says the new version is non-compliant and
is certainly slower than what we used to have.  The only problem I see
on the old version (sure, it is not very fast version) is that for
strstr ("abcd", "") it returned "abcd"+4 rather than "abcd" because
strchr in that case changed p to point to the last character and then
strncmp returned 0.

The question reported in PR109306 is whether memcmp is required not to
access characters beyond the first difference or not.
For all of memcmp/strcmp/strncmp, C17 says:
"The sign of a nonzero value returned by the comparison functions memcmp, strcmp, and strncmp
is determined by the sign of the difference between the values of the first pair of characters (both
interpreted as unsigned char) that differ in the objects being compared."
but then in memcmp description says:
"The memcmp function compares the first n characters of the object pointed to by s1 to the first n
characters of the object pointed to by s2."
rather than something similar to strncmp wording:
"The strncmp function compares not more than n characters (characters that follow a null character
are not compared) from the array pointed to by s1 to the array pointed to by
s2."
So, while for strncmp it seems clearly well defined when there is zero
terminator before reaching the n, for memcmp it is unclear if say
int
memcmp (const void *s1, const void *s2, size_t n)
{
  int ret = 0;
  size_t i;
  const unsigned char *p1 = (const unsigned char *) s1;
  const unsigned char *p2 = (const unsigned char *) s2;

  for (i = n; i; i--)
    if (p1[i - 1] != p2[i - 1])
      ret = p1[i - 1] < p2[i - 1] ? -1 : 1;
  return ret;
}
wouldn't be valid implementation (one which always compares all characters
and just returns non-zero from the first one that differs).

So, shouldn't we just revert and handle the len == 0 case correctly?

I think almost nothing really uses it, but still, the old version
at least worked nicer with a fast strchr.
Could as well strncmp (p + 1, s2 + 1, len - 1) if that is preferred
because strchr already compared the first character.

2023-04-02  Jakub Jelinek  <jakub@redhat.com>

PR other/109306
* strstr.c: Revert the 2020-11-13 changes.
(strstr): Return s1 if len is 0.

(cherry picked from commit 1719fa40c4ee4def60a2ce2f27e17f8168cf28ba)

c++: Fix up ICE in build_min_non_dep_op_overload [PR109319]

The following testcase ICEs, because grok_array_decl during
processing_template_decl handling of a non-dependent subscript
emits a -Wcomma-subscript pedwarn, we decide to pass to the
single index argument the index expressions as if it was wrapped
with () around it, but then when preparing it for later instantiation
we don't actually take that into account and ICE on a mismatch of
number of index arguments (the overload expects a single index,
testcase has two index expressions in this case).
For non-dependent subscript which are builtin subscripts we also
emit the same pedwarn and don't ICE, but emit the same pedwarn
again whenever we instantiate it, which is also IMHO undesirable,
it is enough to warn once during parsing the template.

The following patch fixes it by turning even the original index expressions
(those which didn't go through make_args_non_dependent) into a single
index using comma expression(s).

2023-03-30 Jakub Jelinek <jakub@redhat.com>

PR c++/109319
* decl2.cc (grok_array_decl): After emitting a pedwarn for
-Wcomma-subscript, if processing_template_decl set orig_index_exp
to compound expr from orig_index_exp_list.

* g++.dg/cpp23/subscript14.C: New test.

(cherry picked from commit c016887c91a79d67b6a3c7e19b9219f5ab1e2a4d)

sanopt: Return TODO_cleanup_cfg if any .{UB,HWA,A}SAN_* calls were lowered [PR106190]

The following testcase ICEs, because without optimization eh lowering
decides not to duplicate finally block of try/finally and so we end up
with variable guarded cleanup.  The sanopt pass creates a cfg that ought
to be cleaned up (some IFN_UBSAN_* functions are lowered in this case with
constant conditions in gcond and when not allowing recovery some bbs which
end with noreturn calls actually have successor edges), but the cfg cleanup
is actually (it is -O0) done only during the optimized pass.  We notice
there that the d[1][a] = 0; statement which has an EH edge is unreachable
(because ubsan would always abort on the out of bounds d[1] access), remove
the EH landing pad and block, but because that block just sets a variable
and jumps to another one which tests that variable and that one is reachable
from normal control flow, the __builtin_eh_pointer (1) later in there is
kept in the IL and we ICE during expansion of that statement because the
EH region has been removed.

The following patch fixes it by doing the cfg cleanup already during
sanopt pass if we create something that might need it, while the EH
landing pad is then removed already during sanopt pass, there is ehcleanup
later and we don't ICE anymore.

2023-03-28  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/106190
* sanopt.cc (pass_sanopt::execute): Return TODO_cleanup_cfg if any
of the IFN_{UB,HWA,A}SAN_* internal fns are lowered.

* gcc.dg/asan/pr106190.c: New test.

(cherry picked from commit 39a43dc336561e0eba0de477b16c7355f19d84ee)

i386: Require just 32-bit alignment for SLOT_FLOATxFDI_387 -m32 -mpreferred-stack-boundary=2 DImode temporaries [PR109276]

The following testcase ICEs since r11-2259 because assign_386_stack_local
-> assign_stack_local -> ix86_local_alignment now uses 64-bit alignment
for DImode temporaries rather than 32-bit as before.
Most of the spots in the backend which ask for DImode temporaries are during
expansion and those apparently are handled fine with -m32
-mpreferred-stack-boundary=2, we dynamically realign the stack in that case
(most of the spots actually don't need that alignment but at least one
does), then 2 spots are in STV which I assume also work correctly.
But during splitting we can create a DImode slot which doesn't need to be
64-bit alignment (it is nicer for performance though), when we apparently
aren't able to detect it for dynamic stack realignment purposes.

The following patch just makes the slot 32-bit aligned in that rare case.

2023-03-28 Jakub Jelinek <jakub@redhat.com>

PR target/109276
* config/i386/i386.cc (assign_386_stack_local): For DImode
with SLOT_FLOATxFDI_387 and -m32 -mpreferred-stack-boundary=2 pass
align 32 rather than 0 to assign_stack_local.

* gcc.target/i386/pr109276.c: New test.

(cherry picked from commit 4b5ef857f5faf09f274c0a95c67faaa80d198124)

predict: Don't emit -Wsuggest-attribute=cold warning for functions which already have that attribute [PR105685]

In the following testcase, we predict baz to have cold
entry regardless of the user supplied attribute (as it call
unconditionally a cold function), but still issue
a -Wsuggest-attribute=cold warning despite it having that attribute
already.

The following patch avoids that.

2023-03-26 Jakub Jelinek <jakub@redhat.com>

PR ipa/105685
* predict.cc (compute_function_frequency): Don't call
warn_function_cold if function already has cold attribute.

* c-c++-common/cold-2.c: New test.

(cherry picked from commit 7eca91d4781bb3df941f25c30b971dac66ba1b3d)

tree-vect-generic: Fix up expand_vector_condition [PR109176]

The following testcase ICEs on aarch64-linux, because
expand_vector_condition attempts to piecewise lower SVE
  d_3 = a_1(D) < b_2(D);
  _5 = VEC_COND_EXPR <d_3, c_4(D), d_3>;
which isn't possible - nunits_for_known_piecewise_op ICEs but
the rest of the code assumes constant number of elements too.

expand_vector_condition attempts to find if a (rhs1) is a SSA_NAME
for comparison and calls expand_vec_cond_expr_p (type, TREE_TYPE (a1), code)
where a1 is one of the operands of the comparison and code is the comparison
code.  That one indeed isn't supported here, but what aarch64 SVE supports
are the individual statements, comparison (expand_vec_cmp_expr_p) and
expand_vec_cond_expr_p (type, TREE_TYPE (a), SSA_NAME), the latter because
that function starts with
  if (VECTOR_BOOLEAN_TYPE_P (cmp_op_type)
      && get_vcond_mask_icode (TYPE_MODE (value_type),
                               TYPE_MODE (cmp_op_type)) != CODE_FOR_nothing)
    return true;

In an earlier version of the patch (in the PR), we did this
  if (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (a))
      && expand_vec_cond_expr_p (type, TREE_TYPE (a), ERROR_MARK))
    return true;
before the code == SSA_NAME handling plus some further tweaks later.
While that fixed the ICE, it broke quite a few tests on x86 and some on
aarch64 too.  The problem is that expand_vector_comparison doesn't lower
comparisons which aren't supported and only feed VEC_COND_EXPR first operand
and expand_vector_condition succeeds for those, so with the above mentioned
change we'd verify the VEC_COND_EXPR is implementable using optab alone,
but nothing would verify the tcc_comparison which relied on
expand_vector_condition to verify.

So, the following patch instead queries whether optabs can handle the
comparison and VEC_COND_EXPR together (if a (rhs1) is a comparison;
otherwise as before it checks only the VEC_COND_EXPR) and if that fails,
also checks whether the two operations could be supported individually
and only if even that fails does the piecewise lowering.

2023-03-23  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/109176
* tree-vect-generic.cc (expand_vector_condition): If a has
vector boolean type and is a comparison, also check if both
the comparison and VEC_COND_EXPR could be successfully expanded
individually.

* gcc.target/aarch64/sve/pr109176.c: New test.

(cherry picked from commit 484c41c747d95f9cee15a33b75b32ae2e7eb45f3)

c++: Drop TREE_READONLY on vars (possibly) initialized by tls wrapper [PR109164]

The following two testcases are miscompiled, because we keep TREE_READONLY
on the vars even when they are (possibly) dynamically initialized by a TLS
wrapper function.  Normally cp_finish_decl drops TREE_READONLY from vars
which need dynamic initialization, but for TLS we do this kind of
initialization upon every access to those variables.  Keeping them
TREE_READONLY means e.g. PRE can hoist loads from those before loops
which contain the TLS wrapper calls, so we can access the TLS variables
before they are initialized.

2023-03-20  Jakub Jelinek  <jakub@redhat.com>

PR c++/109164
* cp-tree.h (var_needs_tls_wrapper): Declare.
* decl2.cc (var_needs_tls_wrapper): No longer static.
* decl.cc (cp_finish_decl): Clear TREE_READONLY on TLS variables
for which a TLS wrapper will be needed.

* g++.dg/tls/thread_local13.C: New test.
* g++.dg/tls/thread_local13-aux.cc: New file.
* g++.dg/tls/thread_local14.C: New test.
* g++.dg/tls/thread_local14-aux.cc: New file.

(cherry picked from commit 0a846340b99675d57fc2f2923a0412134eed09d3)

Daily bump.

PR target/108589 - Check REG_P for AARCH64_FUSE_ADDSUB_2REG_CONST1

This adds a check for REG_P on SET_DEST for the new idiom recognizer
for AARCH64_FUSE_ADDSUB_2REG_CONST1. The reported ICE is only
observable with checking=rtl.

Bootstrapped/regtested aarch64-linux, committed.

PR target/108589

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch_macro_fusion_pair_p): Check
REG_P on SET_DEST.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/pr108589.c: New test.

(cherry picked from commit a39c6ec97906766ad65d15d4856fd41121ee7a45)

aarch64: disable LDP via tuning structure for -mcpu=ampere1

AmpereOne (-mcpu=ampere1) breaks LDP instructions into two uops.
Given the chance that this causes instructions to slip into the next
decoding cycle and the additional overheads when handling
cacheline-crossing LDP instructions, we disable the generation of LDP
isntructions through the tuning structure from instruction combining
(such as in peephole2).

Given the code-density benefits in builtins and prologue/epilogue
expansion, we allow LDPs there.

This commit:
* adds a new tuning option AARCH64_EXTRA_TUNE_NO_LDP_COMBINE
* allows -moverride=tune=... to override this

These changes are benchmark-driven, yielding the following changes
(with a net-overall improvement):
   503.bwaves_r.      -0.88%
   507.cactuBSSN_r     0.35%
   508.namd_r          3.09%
   510.parest_r       -2.99%
   511.povray_r        5.54%
   519.lbm_r          15.83%
   521.wrf_r           0.56%
   526.blender_r       2.47%
   527.cam4_r          0.70%
   538.imagick_r       0.00%
   544.nab_r          -0.33%
   549.fotonik3d_r.   -0.42%
   554.roms_r          0.00%
   -------------------------
   = total             1.79%

Signed-off-by: Philipp Tomsich <philipp.tomsich@vrull.eu>
Co-Authored-By: Di Zhao <di.zhao@amperecomputing.com>
gcc/ChangeLog:

* config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION):
Add AARCH64_EXTRA_TUNE_NO_LDP_COMBINE.
* config/aarch64/aarch64.cc (aarch64_operands_ok_for_ldpstp):
Check for the above tuning option when processing loads.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/ampere1-no_ldp_combine.c: New test.

(cherry picked from commit f200c56787f2c6f93ffb739d57d01a294ab72f68)

aarch64: Don't trust TYPE_ALIGN for pointers [PR108910]

The aarch64 PCS rules ignore user alignment for scalars and
vectors and use the "natural" alignment of the type.  GCC tried
to calculate that natural alignment using:

  TYPE_ALIGN (TYPE_MAIN_VARIANT (type))

But as discussed in the PR, it's possible that the main variant
of a pointer type is an overaligned type (although that's usually
accidental).

This isn't known to be a problem for other types, so this patch
changes the bare minimum.  It might be that we need to ignore
TYPE_ALIGN in other cases too.

gcc/
PR target/108910
* config/aarch64/aarch64.cc (aarch64_function_arg_alignment): Do
not trust TYPE_ALIGN for pointer types; use POINTER_SIZE instead.

gcc/testsuite/
PR target/108910
* gcc.dg/torture/pr108910.c: New test.

(cherry picked from commit 66946624b96b762985de56444d726a0ebd4e0df5)

tree-optimization/109434 - bogus DSE of throwing call LHS

The byte tracking of call LHS didn't properly handle possibly
throwing calls correctly which cases bogus DSE and in turn, for the
testcase a bogus uninit diagnostic and (unreliable) wrong-code.

PR tree-optimization/109434
* tree-ssa-dse.cc (initialize_ao_ref_for_dse): Properly
handle possibly throwing calls when processing the LHS
and may-defs are not OK. Add mode to initialize a may-def.
(dse_optimize_stmt): Query may-defs.

* g++.dg/opt/pr109434.C: New testcase.

tree-optimization/109502 - vector conversion between mask and non-mask

The following fixes a check that should have rejected vectorizing
a conversion between a mask and non-mask type. Those should be
done via pattern statements.

PR tree-optimization/109502
* tree-vect-stmts.cc (vectorizable_assignment): Fix
check for conversion between mask and non-mask types.

* gcc.dg/vect/pr109502.c: New testcase.

(cherry picked from commit bf24f2db2841b97bc5e86bf9294d61eef32f83b3)

tree-optimization/109491 - ICE in expressions_equal_p

At some point I elided the NULL pointer check in expressions_equal_p
because it shouldn't be necessary not realizing that for example
TARGET_MEM_REF has optional operands we cannot substitute with
something non-NULL with the same semantics. The following does the
simple thing and restore the check removed in r11-4982.

PR tree-optimization/109491
* tree-ssa-sccvn.cc (expressions_equal_p): Restore the
NULL operands test.

(cherry picked from commit a37783de23c067d6a26374ff29c014e49604035c)

tree-optimization/109473 - ICE with reduction epilog adjustment op

The following makes sure to carry out the reduction epilog adjustment
in the original computation type which for pointers is an unsigned
integer type. There's a similar issue with signed vs. unsigned ops
and overflow which is fixed by this as well.

PR tree-optimization/109473
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Convert scalar result to the computation type before performing
the reduction adjustment.

* gcc.dg/vect/pr109473.c: New testcase.

(cherry picked from commit df7f55cb2ae550adeda339a57b657ebe1ad39367)

tree-optimization/109469 - SLP with returns-twice region start

The following avoids an SLP region starting with a returns-twice
call where we cannot insert stmts at the head.

PR tree-optimization/109469
* tree-vect-slp.cc (vect_slp_function): Skip region starts with
a returns-twice call.

* gcc.dg/torture/pr109469.c: New testcase.

(cherry picked from commit 2d7ad38707e1fd71193d440198cc0726092b9015)

lto/109263 - lto-wrapper and -g0 -ggdb

The following makes lto-wrapper deal with non-combined debug
disabling / enabling option combinations properly. Interestingly
-gno-dwarf also enables debug.

PR lto/109263
* lto-wrapper.cc (run_gcc): Parse alternate debug options
as well, they always enable debug.

(cherry picked from commit 4cbd5ef0350d8ab04993eb4c48ab80999fb4f358)

tree-optimization/109219 - avoid looking at STMT_SLP_TYPE

The following avoids looking at STMT_SLP_TYPE apart from the only
place needing it - transform and analysis of non-SLP loop stmts.
In particular it doesn't have a reliable meaning on SLP representatives
which are also passed as stmt_vinfo to vectorizable_* routines. The
proper way to check in those is to look for the slp_node argument
instead.

PR tree-optimization/109219
* tree-vect-loop.cc (vectorizable_reduction): Check
slp_node, not STMT_SLP_TYPE.
* tree-vect-stmts.cc (vectorizable_condition): Likewise.
* tree-vect-slp.cc (vect_slp_analyze_node_operations_1):
Remove assertion on STMT_SLP_TYPE.

* gcc.dg/torture/pr109219.c: New testcase.

(cherry picked from commit 26adc870e3675591050f37edab46850b97a3c122)

ipa/106124 - ICE with -fkeep-inline-functions and OpenMP

The testcases in this bug reveal cases where an early generated
type is collected because it was unused but gets attempted to
be recreated later when a late DIE for a function (an OpenMP
reduction) is created. That's unexpected and possibly the fault
of OpenMP but the following allows the re-creation of the context
type to succeed.

PR ipa/106124
* dwarf2out.cc (lookup_type_die): Reset TREE_ASM_WRITTEN
so we can re-create the DIE for the type if required.

* g++.dg/gomp/pr106124.C: New testcase.

(cherry picked from commit 36330e2e95564a360e6dbcfb4e7566d5c2177415)

ipa/105676 - pure attribute suggestion for const function

When a function is declared const (even though it technically
accesses memory), ipa-modref discovering pureness shouldn't end
up suggesting that attribute. The following thus exempts
'const' functions from ipa_make_function_pure handling.

PR ipa/105676
* ipa-pure-const.cc (ipa_make_function_pure): Skip also
for functions already being const.

* gcc.dg/pr105676.c: New testcase.

(cherry picked from commit 45e09c2eb9c2bdd34ef777e06ddc9908dd0664f9)

rs6000: Fix vector parity support [PR108699]

The failures on the original failed case builtin-bitops-1.c
and the associated test case pr108699.c here show that the
current support of parity vector mode is wrong on Power.
The hardware insns vprtyb[wdq] which operate on the least
significant bit of each byte per element, they doesn't match
what RTL opcode parity needs, but the current implementation
expands it with them wrongly.

This patch is to fix the handling with one more insn vpopcntb.

PR target/108699

gcc/ChangeLog:

* config/rs6000/altivec.md (*p9v_parity<mode>2): Rename to ...
(rs6000_vprtyb<mode>2): ... this.
* config/rs6000/rs6000-builtins.def (VPRTYBD): Replace parityv2di2 with
rs6000_vprtybv2di2.
(VPRTYBW): Replace parityv4si2 with rs6000_vprtybv4si2.
(VPRTYBQ): Replace parityv1ti2 with rs6000_vprtybv1ti2.
* config/rs6000/vector.md (parity<mode>2 with VEC_IP): Expand with
popcountv16qi2 and the corresponding rs6000_vprtyb<mode>2.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/p9-vparity.c: Add scan-assembler-not for vpopcntb
to distinguish parity byte from parity.
* gcc.target/powerpc/pr108699.c: New test.

(cherry picked from commit cdd2d6643f7fef40e335a7027edfea7276cde608)

rs6000: Fix vector_set_var_p9 by considering BE [PR108807]

As PR108807 exposes, the current handling in function
rs6000_expand_vector_set_var_p9 doesn't take care of big
endianness. Currently the function is to rotate the
target vector by moving element to-be-set to element 0,
set element 0 with the given val, then rotate back. To
get the permutation control vector for the rotation, it
makes use of lvsr and lvsl, but the element ordering is
different for BE and LE (like element 0 is the most
significant one on BE while the least significant one on
LE), this patch is to add consideration for BE and make
sure permutation control vectors for rotations are expected.

As tested, it helped to fix the below failures:

FAIL: gcc.target/powerpc/pr79251-run.p9.c execution test
FAIL: gcc.target/powerpc/pr89765-mc.c execution test
FAIL: gcc.target/powerpc/vsx-builtin-10d.c execution test
FAIL: gcc.target/powerpc/vsx-builtin-11d.c execution test
FAIL: gcc.target/powerpc/vsx-builtin-14d.c execution test
FAIL: gcc.target/powerpc/vsx-builtin-16d.c execution test
FAIL: gcc.target/powerpc/vsx-builtin-18d.c execution test
FAIL: gcc.target/powerpc/vsx-builtin-9d.c execution test
PR target/108807

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_expand_vector_set_var_p9): Fix gen
function for permutation control vector by considering big endianness.

(cherry picked from commit d634e6088f139ee700d79ec73b1ad6436096a6ff)

Daily bump.

Fortran: fix compile-time simplification of SET_EXPONENT [PR109511]

gcc/fortran/ChangeLog:

PR fortran/109511
* simplify.cc (gfc_simplify_set_exponent): Fix implementation of
compile-time simplification of intrinsic SET_EXPONENT for argument
X < 1 and for I < 0.

gcc/testsuite/ChangeLog:

PR fortran/109511
* gfortran.dg/set_exponent_1.f90: New test.

(cherry picked from commit fa4cb42870df60deb8888dbd51e2ddc6d6ab9e6a)

Daily bump.

Disable X86_TUNE_AVX256_MOVE_BY_PIECES and STORE_BY_PIECES for znver1-3

I have enabled SSE moves for znver1-3 since they are performance win on this
machine too (we avoid using loops or string operations which are more costy).
However as discussed in the PR log, this triggers bug in IRA and it was decided
it is better to not backport the fix.

gcc/ChangeLog:

2023-04-14 Jan Hubicka <hubicka@ucw.cz>

PR target/109137
* config/i386/x86-tune.def (X86_TUNE_AVX256_MOVE_BY_PIECES):
Remove znver1-3.
(X86_TUNE_AVX256_STORE_BY_PIECES): Remove znver1-3.

Daily bump.

aarch64: update ampere1 vectorization cost

The original submission of AmpereOne (-mcpu=ampere1) costs occurred
prior to exhaustive testing of vectorizable workloads against
hardware.

Adjust the vector costs to achieve the best results and more closely
match the underlying hardware.

gcc/ChangeLog:

* config/aarch64/aarch64.cc: Update vector costs for ampere1.

Co-Authored-By: Jiangning Liu <jiangning.liu@amperecomputing.com>
Co-Authored-By: Manolis Tsamis <manolis.tsamis@vrull.eu>
(cherry picked from commit ff1f2f2412bda118f7ddc10e69bd4284d9b24b9e)

Daily bump.

libstdc++: Ensure headers used by fast_float are included

This makes floating_from_chars.cc explicitly include all headers
that are used by the original fast_float amalgamation according to
r12-6647-gf5c8b82512f9d3, except:

  1. <cctype> since fast_float doesn't seem to use anything from it
  2. <cinttypes> since fast_float doesn't seem to use anything directly
     from it (this header also pulls in <cstdint>)
  3. <system_error> since std::errc is naturally already available
     from <charconv>

This avoids potential fast_float build failures on platforms for which
some required headers (in particular <cstdint>) end up not getting
transitively included from elsewhere.

libstdc++-v3/ChangeLog:

* src/c++17/floating_from_chars.cc: Include <algorithm>,
<iterator>, <limits> and <cstdint>.

(cherry picked from commit 13669111e7219ed1f71b2079c7b5794c11f6e3ce)

Daily bump.

Backport from master

2023-04-10  Michael Meissner  <meissner@linux.ibm.com>

gcc/

PR target/109067
* config/rs6000/rs6000.cc (create_complex_muldiv): Delete.
(init_float128_ieee): Delete code to switch complex multiply and divide
for long double.  Backport from master, 3/20/2023.
(complex_multiply_builtin_code): New helper function.
(complex_divide_builtin_code): Likewise.
(rs6000_mangle_decl_assembler_name): Add support for mangling the name
of complex 128-bit multiply and divide built-in functions.

gcc/testsuite/

PR target/109067
* gcc.target/powerpc/divic3-1.c: New test.  Backport from master,
3/20/2023.
* gcc.target/powerpc/divic3-2.c: Likewise.
* gcc.target/powerpc/mulic3-1.c: Likewise.
* gcc.target/powerpc/mulic3-2.c: Likewise.

Daily bump.

Fix typo in -param=vect-induction-float= attributes

There was a typo in the attributes of the option
-param=vect-induction-float= for IntegerRange.
This fixes that typo.

Committed to GCC 12 branch as obvious after a build/test.

gcc/ChangeLog:

PR tree-optimization/109427
* params.opt (-param=vect-induction-float=):
Fix option attribute typo for IntegerRange.

(cherry picked from commit 0f816116356fec32e3a3a2fb5af790a0438c5da4)

Daily bump.

vect: Make partial trapping ops use predication [PR96373]

PR96373 points out that a predicated SVE loop currently converts
trapping unconditional ops into unpredicated vector ops.  Doing
the operation on inactive lanes can then raise an exception.

As discussed in the PR trail, we aren't 100% consistent about
whether we preserve traps or not.  But the direction of travel
is clearly to improve that rather than live with it.  This patch
tries to do that for the SVE case.

Doing this regresses gcc.target/aarch64/sve/fabd_1.c.  I've added
-fno-trapping-math for now and filed PR108571 to track it.
A similar problem applies to fsubr_1.c.

I think this is likely to regress Power 10, since conditional
operations are only available for masked loops.  I think we'll
need to add -fno-trapping-math to any affected testcases,
but I don't have a Power 10 system to test on.

gcc/
PR tree-optimization/96373
PR tree-optimization/108979
* tree-vect-stmts.cc (vectorizable_operation): Predicate trapping
operations on the loop mask.  Reject partial vectors if this isn't
possible.  Don't mask operations on invariants.

gcc/testsuite/
PR tree-optimization/96373
PR tree-optimization/108571
PR tree-optimization/108979
* gcc.target/aarch64/sve/fabd_1.c: Add -fno-trapping-math.
* gcc.target/aarch64/sve/fsubr_1.c: Likewise.
* gcc.target/aarch64/sve/fmul_1.c: Expect predicate ops.
* gcc.target/aarch64/sve/fp_arith_1.c: Likewise.
* gfortran.dg/vect/pr108979.f90: New test.

aarch64: Restore vectorisation of vld1 inputs [PR109072]

Before GCC 12, we would vectorize:

  int32_t arr[] = { x, x, x, x };

at -O3.  Vectorizing the store on its own is often a loss, particularly
for integers, so g:4963079769c99c4073adfd799885410ad484cbbe suppressed it.
This was necessary to fix regressions from enabling vectorisation at -O2,

However, the vectorisation is important if the code subsequently loads
from the array using vld1:

  return vld1q_s32 (arr);

This approach of initialising an array and loading from it is the
recommend endian-agnostic way of constructing an ACLE vector.

As discussed in the PR notes, the general fix would be to fold the
store and load-back to a constructor (preferably before vectorisation).
But that's clearly not stage 4 material.

This patch instead delays folding vld1 until after inlining and
records which decls a vld1 loads from.  It then treats vector
stores to those decls as free, on the optimistic assumption that
they will be removed later.  The patch also brute-forces
vectorization of plain constructor+store sequences, since some
of the CPU costs make that (dubiously) expensive even when the
store is discounted.

Delaying folding showed that we were failing to update the vops.
The patch fixes that too.

Thanks to Tamar for discussion & help with testing.

gcc/
PR target/109072
* config/aarch64/aarch64-protos.h (aarch64_vector_load_decl): Declare.
* config/aarch64/aarch64.h (machine_function::vector_load_decls): New
variable.
* config/aarch64/aarch64-builtins.cc (aarch64_record_vector_load_arg):
New function.
(aarch64_general_gimple_fold_builtin): Delay folding of vld1 until
after inlining.  Record which decls are loaded from.  Fix handling
of vops for loads and stores.
* config/aarch64/aarch64.cc (aarch64_vector_load_decl): New function.
(aarch64_accesses_vector_load_decl_p): Likewise.
(aarch64_vector_costs::m_stores_to_vector_load_decl): New member
variable.
(aarch64_vector_costs::add_stmt_cost): If the function has a vld1
that loads from a decl, treat vector stores to those decls as
zero cost.
(aarch64_vector_costs::finish_cost): ...and in that case,
if the vector code does nothing more than a store, give the
prologue a zero cost as well.

gcc/testsuite/
PR target/109072
* gcc.target/aarch64/pr109072_1.c: New test.
* gcc.target/aarch64/pr109072_2.c: Likewise.

(cherry picked from commit fcb411564a655a01f759eea3bb16bfd1bc879bfd)

lra: Replace subregs in bare uses & clobbers [PR108681]

In this PR we had a write to one vector of a 4-vector tuple.
The vector had mode V1DI, and the target doesn't provide V1DI
moves, so this was converted into:

    (clobber (subreg:V1DI (reg/v:V4x1DI 92 [ b ]) 24))

followed by a DImode move.  (The clobber isn't really necessary
or helpful for a single word, but would be for wider moves.)

The subreg in the clobber survived until after RA:

    (clobber (subreg:V1DI (reg/v:V4x1DI 34 v2 [orig:92 b ] [92]) 24))

IMO this isn't well-formed.  If a subreg of a hard register simplifies
to a hard register, it should be replaced by the hard register.  If the
subreg doesn't simplify, then target-independent code can't be sure
which parts of the register are affected and which aren't.  A clobber
of such a subreg isn't useful and (again IMO) should just be removed.
Conversely, a use of such a subreg is effectively a use of the whole
inner register.

LRA has code to simplify subregs of hard registers, but it didn't
handle bare uses and clobbers.  The patch extends it to do that.

One question was whether the final_p argument to alter_subregs
should be true or false.  True is IMO dangerous, since it forces
replacements that might not be valid from a dataflow perspective,
and uses and clobbers only exist for dataflow.  As said above,
I think the correct way of handling a failed simplification would
be to delete clobbers and replace uses of subregs with uses of
the inner register.  But I didn't want to write untested code
to do that.

In the PR, the clobber caused an infinite loop in DCE, because
of a disagreement about what effect the clobber had.  But for
the reasons above, I think that was GIGO rather than a bug in
DF or DCE.

gcc/
PR rtl-optimization/108681
* lra-spills.cc (lra_final_code_change): Extend subreg replacement
code to handle bare uses and clobbers.

gcc/testsuite/
PR rtl-optimization/108681
* gcc.target/aarch64/pr108681.c: New test.

(cherry picked from commit 3cac06d84f334705ed0bce12fbc3a4cec4a8fd3b)

vect: Fix single def-use cycle for ifn reductions [PR108608]

The patch that added support for fmin/fmax reductions didn't
handle single def-use cycles. In some ways, this seems like
going out of our way to make things slower, but that's a
discussion for another day.

gcc/
PR tree-optimization/108608
* tree-vect-loop.cc (vect_transform_reduction): Handle single
def-use cycles that involve function calls rather than tree codes.

gcc/testsuite/
PR tree-optimization/108608
* gcc.dg/vect/pr108608.c: New test.
* gcc.target/aarch64/sve/pr108608-1.c: Likewise.

(cherry picked from commit 2bb444787ed17a9e786f544cdf51ee2ac6779ab2)

Avoid creating (const (reg ...)) [PR108603]

convert_memory_address_addr_space_1 has two modes: one in which it
tries to create a self-contained RTL expression (which might fail)
and one in which it can emit new instructions where necessary.

When handling a CONST, the function recurses into the CONST's
operand and then constifies the result. But that's only valid if
the result is still a self-contained expression. If new instructions
have been emitted, the expression will refer to the (non-constant)
results of those instructions.

In the PR, this caused us to emit a nonsensical (const (reg ...))
REG_EQUAL note.

gcc/
PR tree-optimization/108603
* explow.cc (convert_memory_address_addr_space_1): Only wrap
the result of a recursive call in a CONST if no instructions
were emitted.

gcc/testsuite/
PR tree-optimization/108603
* gcc.target/aarch64/sve/pr108603.c: New test.

(cherry picked from commit b09dc74801cf4e19bdf5fcd18a5cd53fc9e9ca9a)

rtl-ssa: Fix splitting of clobber groups [PR108508]

Since rtl-ssa isn't a real/native SSA representation, it has
to honour the constraints of the underlying rtl representation.
Part of this involves maintaining an rpo list of definitions
for each rtl register, backed by a splay tree where necessary
for quick lookup/insertion.

However, clobbers of a register don't act as barriers to
other clobbers of a register.  E.g. it's possible to move one
flag-clobbering instruction across an arbitrary number of other
flag-clobbering instructions.  In order to allow passes to do
that without quadratic complexity, the splay tree groups all
consecutive clobbers into groups, with only the group being
entered into the splay tree.  These groups in turn have an
internal splay tree of clobbers where necessary.

This means that, if we insert a new definition and use into
the middle of a sea of clobbers, we need to split the clobber
group into two groups.  This was quite a difficult condition
to trigger during development, and the PR shows that the code
to handle it had (at least) two bugs.

First, the process involves searching the clobber tree for
the split point.  This search can give either the previous
clobber (which will belong to the first of the split groups)
or the next clobber (which will belong to the second of the
split groups).  The code for the former case handled the
split correctly but the code for the latter case didn't.

Second, I'd forgotten to add the second clobber group to the
main splay tree. :-(

gcc/
PR rtl-optimization/108508
* rtl-ssa/accesses.cc (function_info::split_clobber_group): When
the splay tree search gives the first clobber in the second group,
make sure that the root of the first clobber group is updated
correctly.  Enter the new clobber group into the definition splay
tree.

gcc/testsuite/
PR rtl-optimization/108508
* gcc.target/aarch64/pr108508.c: New test.

(cherry picked from commit f4e1b46618ef3bd7933992ab79f663ab9112bb80)

vect: Fix voluntarily-masked negative conditionals [PR108430]

vectorizable_condition checks whether a COND_EXPR condition is used
elsewhere with a loop mask.  If so, it applies the loop mask to the
COND_EXPR too, to reduce the number of live masks and to increase the
chance of combining the AND with the comparison.

There is also code to do this for inverted conditions.  E.g. if
we have a < b ? c : d and something else is conditional on !(a < b)
(such as a load in d), we use !(a < b) ? d : c and apply the loop
mask to !(a < b).

This inversion relied on the function's bitop1/bitop2 mechanism.
However, that mechanism is skipped if the condition is split out of
the COND_EXPR as a separate statement.  This meant that we could end
up using the inverse of the intended condition.

There is a separate way of negating the condition when a mask
is being applied (which is also used for EXTRACT_LAST reductions).
This patch uses that instead.

As well as the testcase, this fixes aarch64/sve/vcond_{4,17}_run.c.

gcc/
PR tree-optimization/108430
* tree-vect-stmts.cc (vectorizable_condition): Fix handling
of inverted condition.

gcc/testsuite/
PR tree-optimization/108430
* gcc.target/aarch64/sve/pr108430.c: New test.

(cherry picked from commit 2a8ce4b52f5892a10a02b94d7be689e59a444ff6)

rtl-ssa: Extend m_num_defs to a full unsigned int [PR108086]

insn_info tried to save space by storing the number of
definitions in a 16-bit bitfield.  The justification was:

  // ...  FIRST_PSEUDO_REGISTER + 1
  // is the maximum number of accesses to hard registers and memory, and
  // MAX_RECOG_OPERANDS is the maximum number of pseudos that can be
  // defined by an instruction, so the number of definitions should fit
  // easily in 16 bits.

But while that reasoning holds (I think) for real instructions,
it doesn't hold for artificial instructions.  I don't think there's
any sensible higher limit we can use, so this patch goes for a full
unsigned int.

gcc/
PR rtl-optimization/108086
* rtl-ssa/insns.h (insn_info): Make m_num_defs a full unsigned int.
Adjust size-related commentary accordingly.

(cherry picked from commit cd41085a37b8288dbdfe0f81027ce04b978578f1)

Daily bump.

Change "long_double" into "long double" for C prototypes from Fortran.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (get_c_type_name): Fix "long_long"
type name to be "long long".

Daily bump.

IRA: Use minimal cost for hard register movement

This is the 2nd attempt to fix PR90706.  IRA calculates wrong AVR
costs for moving general hard regs of SFmode.  This was the reason for
spilling a pseudo in the PR.  In this patch we use smaller move cost
of hard reg in its natural and operand modes.

        PR rtl-optimization/90706

gcc/ChangeLog:

* ira-costs.cc: Include print-rtl.h.
(record_reg_classes, scan_one_insn): Add code to print debug info.
(record_operand_costs): Find and use smaller cost for hard reg
move.

gcc/testsuite/ChangeLog:

* gcc.target/avr/pr90706.c: New.

Daily bump.

Fix fc-prototypes usage with C_INT64_T and non LP64 Targets.

The problem here is we were outputing long_long instead of
"long long". This was just an oversight and a missing check.

Committed as obvious after a bootstrap/test on x86_64-linux-gnu.

gcc/fortran/ChangeLog:

* dump-parse-tree.cc (get_c_type_name): Fix "long_long"
type name to be "long long". Add a comment on why adding
2 to the name too.

libstdc++: Use std::remove_cv_t in std::optional::transform [PR109242]

We need to strip cv-qualifiers from the result of the callable passed to
std::optional::transform.

libstdc++-v3/ChangeLog:

PR libstdc++/109242
* include/std/optional (transform): Use std::remove_cv_t.
* testsuite/20_util/optional/monadic/pr109242.cc: New test.

(cherry picked from commit 31a909712014b75fc6ae2ca5eaa425f218bb5f32)

Daily bump.

analyzer: fix ICE on certain longjmp calls [PR109094]

PR analyzer/109094 reports an ICE in the analyzer seen on qemu's
target/i386/tcg/translate.c

The issue turned out to be that when handling a longjmp, the code
to pop the frames was generating an svalue for the result_decl of any
popped frame that had a non-void return type (and discarding it) leading
to "uninit" poisoned_svalue_diagnostic instances being saved since the
result_decl is only set by the greturn stmt.  Later, when checking the
feasibility of the path to these diagnostics, m_check_expr was evaluated
in the context of the frame of the longjmp, leading to an attempt to
evaluate the result_decl of each intervening frames whilst in the
context of the topmost frame, leading to an assertion failure in
frame_region::get_region_for_local here:

919 case RESULT_DECL:
920   gcc_assert (DECL_CONTEXT (expr) == m_fun->decl);
921   break;

This patch updates the analyzer's longjmp implementation so that it
doesn't attempt to generate svalues for the result_decls when popping
frames, fixing the assertion failure (and presumably fixing "uninit"
false positives in a release build).

Cherrypicked from r13-6749-g430d7d88c1a123.

gcc/analyzer/ChangeLog:
PR analyzer/109094
* region-model.cc (region_model::on_longjmp): Pass false for
new "eval_return_svalue" param of pop_frame.
(region_model::pop_frame): Add new "eval_return_svalue" param and
use it to suppress the call to get_rvalue on the result when
needed by on_longjmp.
* region-model.h (region_model::pop_frame): Add new
"eval_return_svalue" param.

gcc/testsuite/ChangeLog:
PR analyzer/109094
* gcc.dg/analyzer/setjmp-pr109094.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: fix uninit false +ves reading from DECL_HARD_REGISTER [PR108968]

Cherrypicked from r13-6749-g430d7d88c1a123.

gcc/analyzer/ChangeLog:
PR analyzer/108968
* region-model.cc (region_model::get_rvalue_1): Handle VAR_DECLs
with a DECL_HARD_REGISTER by returning UNKNOWN.

gcc/testsuite/ChangeLog:
PR analyzer/108968
* gcc.dg/analyzer/uninit-pr108968-register.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: fix further overzealous state purging [PR108733]

PR analyzer/108733 reports various false positives in qemu from
-Wanalyzer-use-of-uninitialized-value with __attribute__((cleanup))
at -O1 and above.

Root cause is that the state-purging code was failing to treat:
   _25 = MEM[(void * *)&val];
as a usage of "val", leading to it erroneously purging the
initialization of "val" along an execution path that didn't otherwise
use "val", apart from the  __attribute__((cleanup)).

Fixed thusly.

Integration testing on the patch show this change in the number of
diagnostics:
  -Wanalyzer-use-of-uninitialized-value
       coreutils-9.1: 18 -> 16 (-2)
          qemu-7.2.0: 87 -> 80 (-7)
where all that I investigated appear to have been false positives, hence
an improvement.

Cherrypicked from r13-5745-g77bb54b1b07add.

gcc/analyzer/ChangeLog:
PR analyzer/108733
* state-purge.cc (get_candidate_for_purging): Add ADDR_EXPR
and MEM_REF.

gcc/testsuite/ChangeLog:
PR analyzer/108733
* gcc.dg/analyzer/torture/uninit-pr108733.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: fix overzealous state purging with on-stack structs [PR108704]

PR analyzer/108704 reports many false positives seen from
-Wanalyzer-use-of-uninitialized-value on qemu's softfloat.c on code like
the following:

   struct st s;
   s = foo ();
   s = bar (s); // bogusly reports that s is uninitialized here

where e.g. "struct st" is "floatx80" in the qemu examples.

The root cause is overzealous purging of on-stack structs in the code I
added in r12-7718-gfaacafd2306ad7, where at:

s = bar (s);

state_purge_per_decl::process_point_backwards "sees" the assignment to 's'
and stops processing, effectively treating 's' as unneeded before this
stmt, not noticing the use of 's' in the argument.

Fixed thusly.

The patch greatly reduces the number of
-Wanalyzer-use-of-uninitialized-value warnings from my integration tests:
  ImageMagick-7.1.0-57:  10 ->  6   (-4)
              qemu-7.2: 858 -> 87 (-771)
         haproxy-2.7.1:   1 ->  0   (-1)
All of the above that I've examined appear to be false positives.

Cherrypicked from r13-5745-g77bb54b1b07add.

gcc/analyzer/ChangeLog:
PR analyzer/108704
* state-purge.cc (state_purge_per_decl::process_point_backwards):
Don't stop processing the decl if it's fully overwritten by
this stmt if it's also used by this stmt.

gcc/testsuite/ChangeLog:
PR analyzer/108704
* gcc.dg/analyzer/uninit-7.c: New test.
* gcc.dg/analyzer/uninit-pr108704.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

doc: add notes about limitations of -fanalyzer

Cherrypicked from r13-5613-ga90316c6ceddfb.

gcc/ChangeLog:
* doc/invoke.texi (Static Analyzer Options): Add notes about
limitations of -fanalyzer.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: use __attribute__((nonnull)) at top level of analysis [PR106325]

PR analyzer/106325 reports false postives from
-Wanalyzer-null-dereference on code like this:

__attribute__((nonnull))
void foo_a (Foo *p)
{
  foo_b (p);

  switch (p->type)
    {
      /* ... */
    }
}

where foo_b (p) has a:

  g_return_if_fail (p);

that expands to:

  if (!p)
    {
      return;
    }

The analyzer "sees" the comparison against NULL in foo_b, and splits the
analysis into the NULL and not-NULL cases; later, back in foo_a,  at
  switch (p->type)
it complains that p is NULL.

Previously we were only using __attribute__((nonnull)) as something to
complain about when it was violated; we weren't using it as a source of
knowledge.

This patch fixes things by making the analyzer respect
__attribute__((nonnull)) at the top-level of the analysis: any such
params are now assumed to be non-NULL, so that the analyzer assumes the
g_return_if_fail inside foo_b doesn't fail when called from foo_a

Doing so fixes the false positives.

Backported from r13-4520-gdcfc7ac94dbcf6.

gcc/analyzer/ChangeLog:
PR analyzer/106325
* region-model-manager.cc
(region_model_manager::get_or_create_null_ptr): New.
* region-model.cc (region_model::on_top_level_param): Add
"nonnull" param and make use of it.
(region_model::push_frame): When handling a top-level entrypoint
to the analysis, determine which params __attribute__((nonnull))
applies to, and pass to on_top_level_param.
* region-model.h (region_model_manager::get_or_create_null_ptr):
New decl.
(region_model::on_top_level_param): Add "nonnull" param.

gcc/testsuite/ChangeLog:
PR analyzer/106325
* gcc.dg/analyzer/attr-nonnull-pr106325.c: New test.
* gcc.dg/analyzer/attribute-nonnull.c (test_6): New.
(test_7): New.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: update internal docs

Cherrypicked from r13-4518-g14b0d6c4bd973c.

gcc/ChangeLog:
* doc/analyzer.texi: Drop out-of-date ideas for other checkers.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: handle comparisons against negated symbolic values [PR107948]

Cherrypicked from r13-4456-g0b737090a69624.

gcc/analyzer/ChangeLog:
PR analyzer/107948
* region-model-manager.cc
(region_model_manager::maybe_fold_binop): Fold (0 - VAL) to -VAL.
* region-model.cc (region_model::eval_condition): Handle e.g.
"-X <= 0" as equivalent to X >= 0".

gcc/testsuite/ChangeLog:
PR analyzer/107948
* gcc.dg/analyzer/feasibility-pr107948.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: fix folding of '(PTR + 0) => PTR' [PR105784]

Cherrypicked from r13-4398-g3a32fb2eaa761a.

gcc/analyzer/ChangeLog:
PR analyzer/105784
* region-model-manager.cc
(region_model_manager::maybe_fold_binop): For POINTER_PLUS_EXPR,
PLUS_EXPR and MINUS_EXPR, eliminate requirement that the final
type matches that of arg0 in favor of a cast.

gcc/testsuite/ChangeLog:
PR analyzer/105784
* gcc.dg/analyzer/torture/fold-ptr-arith-pr105784.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: fix feasibility false +ve on jumps through function ptrs [PR107582]

PR analyzer/107582 reports a false +ve from
-Wanalyzer-use-of-uninitialized-value where
the analyzer's feasibility checker erroneously decides
that point (B) in the code below is reachable, with "x" being
uninitialized there:

    pthread_cleanup_push(func, NULL);

    while (ret != ETIMEDOUT)
ret = rand() % 1000;

    /* (A): after the while loop  */

    if (ret != ETIMEDOUT)
      x = &z;

    pthread_cleanup_pop(1);

    if (ret == ETIMEDOUT)
      return 0;

    /* (B): after not bailing out  */

due to these contradictionary conditions somehow both holding:
  * (ret == ETIMEDOUT), at (A) (skipping the initialization of x), and
  * (ret != ETIMEDOUT), at (B)

The root cause is that after the while loop, state merger puts ret in
the exploded graph in an UNKNOWN state, and saves the diagnostic at (B).

Later, as we explore the feasibilty of reaching the enode for (B),
dynamic_call_info_t::update_model is called to push/pop the
frames for handling the call to "func" in pthread_cleanup_pop.
The "ret" at these nodes in the feasible_graph has a conjured_svalue for
"ret", and a constraint on it being either == *or* != ETIMEDOUT.

However dynamic_call_info_t::update_model blithely clobbers the
model with a copy from the exploded_graph, in which "ret" is UNKNOWN.

This patch fixes dynamic_call_info_t::update_model so that it
simulates pushing/popping a frame on the model we're working with,
preserving knowledge of the constraint on "ret", and enabling the
analyzer to "know" that the bail-out must happen.

Doing so fixes the false positive.

Cherrypicked from r13-4158-ga7aef0a5a2b7e2.

gcc/analyzer/ChangeLog:
PR analyzer/107582
* engine.cc (dynamic_call_info_t::update_model): Update the model
by pushing or pop a frame, rather than by clobbering it with the
model from the exploded_node's state.

gcc/testsuite/ChangeLog:
PR analyzer/107582
* gcc.dg/analyzer/feasibility-4.c: New test.
* gcc.dg/analyzer/feasibility-pr107582-1.c: New test.
* gcc.dg/analyzer/feasibility-pr107582-2.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: handle (NULL == &VAR) [PR107345]

Cherrypicked from r13-3468-g18faaeb3af42f3.

gcc/analyzer/ChangeLog:
PR analyzer/107345
* region-model.cc (region_model::eval_condition_without_cm):
Ensure that constants are on the right-hand side before checking
for them.

gcc/testsuite/ChangeLog:
PR analyzer/107345
* gcc.dg/analyzer/pr107345.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: fold -(-(VAL)) to VAL

Cherrypicked from r13-3075-g7f42f7adfa69fe.

gcc/analyzer/ChangeLog:
* region-model-manager.cc
(region_model_manager::maybe_fold_unaryop): Fold -(-(VAL)) to VAL.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: better fix for -Wanalyzer-use-of-uninitialized-value [PR106573]

Cherrypicked from r13-2053-gca123e019bb92f.

gcc/analyzer/ChangeLog:
PR analyzer/106573
* region-model.cc (region_model::on_call_pre): Use check_call_args
when ensuring that we call get_arg_svalue on all args. Remove
redundant call from handling for stdio builtins.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

analyzer: fix missing -Wanalyzer-use-of-uninitialized-value on special-cased functions [PR106573]

We were missing checks for uninitialized params on calls to functions
that the analyzer has hardcoded knowledge of - both for those that are
handled just by state machines, and for those that are handled in
region-model-impl-calls.cc (for those arguments for which the svalue
wasn't accessed in handling the call).

Fixed thusly.

Backported from r13-2007-gbddd8d86e3036e, dropping the test case
fd-uninit-1.c.

gcc/analyzer/ChangeLog:
PR analyzer/106573
* region-model.cc (region_model::on_call_pre): Ensure that we call
get_arg_svalue on all arguments.

gcc/testsuite/ChangeLog:
PR analyzer/106573
* gcc.dg/analyzer/error-uninit.c: New test.
* gcc.dg/analyzer/file-uninit-1.c: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

jit: update docs to reflect .c to .cc renaming

Cherrypicked from r13-1878-gb8ce0c4361c267.

gcc/jit/ChangeLog:
* docs/internals/index.rst: Remove reference to ".c" extensions
of source files.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

Daily bump.

libstdc++: Fix self-move for std::weak_ptr [PR108118]

I think an alternative fix would be something like:

_M_ptr = std::exchange(rhs._M_ptr, nullptr);
_M_refcount = std::move(rhs._M_refcount);

The standard's move-and-swap implementation generates smaller code at
all levels except -O0 and -Og, so it seems simplest to just do what the
standard says.

libstdc++-v3/ChangeLog:

PR libstdc++/108118
* include/bits/shared_ptr_base.h (weak_ptr::operator=):
Implement as move-and-swap exactly as specified in the standard.
* testsuite/20_util/weak_ptr/cons/self_move.cc: New test.

(cherry picked from commit 92eb0adc14a5f84acce7e5bc780b81b1544b24aa)

libstdc++: Add missing move in ranges::copy

This is needed to support a move-only output iterator when the input
iterators are specializations of __normal_iterator.

libstdc++-v3/ChangeLog:

* include/bits/ranges_algobase.h (__detail::__copy_or_move):
Move output iterator.
* testsuite/25_algorithms/copy/constrained.cc: Check copying to
move-only output iterator.

(cherry picked from commit 2ff0e62275b1c322a8b65f38f8336f37d31c30e4)

libstdc++: Deliver names of C functions in <stacktrace>

__cxa_demangle is only to demangle C++ names, for all C functions,
extern "C" functions, and including main it returns -2, in that case
just adapt the given name. Otherwise it's kept empty, which doesn't look
nice in the stacktrace.

libstdc++-v3/ChangeLog:

* include/std/stacktrace (stacktrace_entry::_S_demangle): Use
raw __name if __cxa_demangle could not demangle it.

Signed-off-by: Björn Schäpers <bjoern@hazardy.de>
(cherry picked from commit b1c839be8353edfb1951454be3c5a8150f771385)

libstdc++: Make operator<< for stacktraces less templated (LWG 3515)

This change was approved for C++23 last month.

libstdc++-v3/ChangeLog:

* include/std/stacktrace (operator<<): Only output to narrow
ostreams (LWG 3515).
* testsuite/19_diagnostics/stacktrace/synopsis.cc:

(cherry picked from commit 2327d9331430777006008ab3b051afe2b4fc15bd)

libstdc++: Add [[nodiscard]] to chrono conversion functions

Also add doxygen comments.

libstdc++-v3/ChangeLog:

* include/bits/chrono.h (duration_cast, floor, round, abs, ceil)
(time_point_cast): Add [[nodiscard]] attribute and doxygen
comments.
(treat_as_floating_point): Add doxygen commen.

(cherry picked from commit 646e979c43b8c84f0f70ea8f1709dfa2909726cd)

libstdc++: Change class-key for duration and time_point to class

We define these with the 'struct' keyword, but the standard uses
'class'. This results in warnings if users try to refer to them using
elaborated type specifiers.

libstdc++-v3/ChangeLog:

* include/bits/chrono.h (duration, time_point): Change 'struct'
to 'class'.

(cherry picked from commit 7eec3114ebe8d4c55c64b4e47546d3d8f95eb09b)

libstdc++: Add returns_nonnull to non-inline std::map detail [PR108554]

std::map uses a non-inline function to rebalance its tree and the
compiler can't see that it always returns a valid pointer (assuming
valid inputs, which is a precondition anyway). This can result in
-Wnull-derefernce warnings for valid code, because the compiler thinks
there is a path where the function returns null.

Adding the returns_nonnull attribute tells the compiler that is can't
happen. While we're doing that, we might as well also add a nonnull
attribute to the rebalancing functions too.

libstdc++-v3/ChangeLog:

PR libstdc++/108554
* include/bits/stl_tree.h (_Rb_tree_insert_and_rebalance): Add
nonnull attribute.
(_Rb_tree_rebalance_for_erase): Add nonnull and returns_nonnull
attributes.
* testsuite/23_containers/map/modifiers/108554.cc: New test.

(cherry picked from commit 3376467ce090aa0966d59ca3aea35db4f17a4b47)

libstdc++: Optimize std::bitset<N>::to_string

This makes to_string approximately twice as fast at any optimization
level. Instead of iterating through every bit, jump straight to the next
bit that is set, by using _Find_first and _Find_next.

libstdc++-v3/ChangeLog:

* include/std/bitset (bitset::_M_copy_to_string): Find set bits
instead of iterating over individual bits.

(cherry picked from commit ffb03fa12850df3a4f53435d5f20ff122c83732a)

libstdc++: Tweak TSan annotations for std::atomic<shared_ptr<T>>

Do not use the __tsan_mutex_not_static flag for annotation functions
where it's not a valid flag. Also use the try_lock and try_lock_failed
flags to more precisely annotate the CAS loop used to acquire a lock.

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_atomic.h (_GLIBCXX_TSAN_MUTEX_PRE_LOCK):
Replace with ...
(_GLIBCXX_TSAN_MUTEX_TRY_LOCK): ... this, add try_lock flag.
(_GLIBCXX_TSAN_MUTEX_TRY_LOCK_FAILED): New macro using
try_lock_failed flag
(_GLIBCXX_TSAN_MUTEX_POST_LOCK): Rename to ...
(_GLIBCXX_TSAN_MUTEX_LOCKED): ... this.
(_GLIBCXX_TSAN_MUTEX_PRE_UNLOCK): Remove invalid flag.
(_GLIBCXX_TSAN_MUTEX_POST_UNLOCK): Remove invalid flag.
(_Sp_atomic::_Atomic_count::lock): Use new macros.

(cherry picked from commit ecbdfa8b314e2c17da17511b86371f552bffd441)