git.ipfire.org Git - thirdparty/gcc.git/log

c++: Fix ICE on constexpr virtual function [PR117317]

Since C++20 virtual methods can be constexpr, and if they are
constexpr evaluated, we choose tentative_decl_linkage for those
defer their output and decide at_eof again.
On the following testcases we ICE though, because if
expand_or_defer_fn_1 decides to use tentative_decl_linkage, it
returns true and the caller in that case cals emit_associated_thunks,
where use_thunk which it calls asserts DECL_INTERFACE_KNOWN on the
thunk destination, which isn't the case for tentative_decl_linkage.

The following patch fixes the ICE by not emitting the thunks
for the DECL_DEFER_OUTPUT fns just yet but waiting until at_eof
time when we return to those.
Note, the second testcase ICEs already since r0-110035 with -std=c++0x
before it gets a chance to diagnose constexpr virtual method.

2024-11-08 Jakub Jelinek <jakub@redhat.com>

PR c++/117317
* semantics.cc (emit_associated_thunks): Do nothing for
!DECL_INTERFACE_KNOWN && DECL_DEFER_OUTPUT fns.

* g++.dg/cpp2a/pr117317-1.C: New test.
* g++.dg/cpp2a/pr117317-2.C: New test.

testsuite: arm: Use check-function-bodies in epilog-1.c test

Update test case for armv8.1-m.main that supports conditional
arithmetic.

armv7-m:
        push    {r4, lr}
        ldr     r4, .L6
        ldr     r4, [r4]
        lsls    r4, r4, #29
        it      mi
        addmi   r2, r2, #1
        bl      bar
        movs    r0, #0
        pop     {r4, pc}

armv8.1-m.main:
        push    {r3, r4, r5, lr}
        ldr     r4, .L5
        ldr     r5, [r4]
        tst     r5, #4
        csinc   r2, r2, r2, eq
        bl      bar
        movs    r0, #0
        pop     {r3, r4, r5, pc}

gcc/testsuite/ChangeLog:

* gcc.target/arm/epilog-1.c: Use check-function-bodies.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: arm: Use effective-target arm_libc_fp_abi for pr68620.c test

This fixes reported regression at
https://linaro.atlassian.net/browse/GNU-1407.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr68620.c: Use effective-target
arm_libc_fp_abi.
* lib/target-supports.exp: Define effective-target
arm_libc_fp_abi.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>
Co-authored-by: Richard Earnshaw <rearnsha@arm.com>

testsuite: arm: Allow vst1.32 instruction in pr40457-2.c

When building the test case with neon, the 'vst1.32' instruction is used
instead of 'strd'. Allow both variants to make the test pass.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr40457-2.c: Add vst1.32 as an allowed
instruction.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: arm: Use effective-target for pr84556.cc test

Using "dg-do run" with a selector overrides the default selector set by
vect.exp that picks between "dg-do run" and "dg-do compile" based on the
target's support for simd operations for Arm targets.
The actual selection of default operation is performed in
check_vect_support_and_set_flags.

gcc/testsuite/ChangeLog:

* g++.dg/vect/pr84556.cc: Change from "dg-do run" with selector
to instead use dg-require-effective-target with the same
selector.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

Enable gcc.dg/vect/vect-early-break_21.c on x86_64

The following also enables the testcase on x86 as it now has the
required cbranch.

* gcc.dg/vect/vect-early-break_21.c: Remove disabling of
x86_64 and i?86.

libstdc++: Simplify __detail::__distance_fw using 'if constexpr'

This uses 'if constexpr' instead of tag dispatching, removing the need
for a second call using that tag, and simplifying the overload set that
needs to be resolved for calls to __distance_fw.

libstdc++-v3/ChangeLog:

* include/bits/hashtable_policy.h (__distance_fw): Replace tag
dispatching with 'if constexpr'.

aarch64: Extend support for the AE family of Cortex CPUs

Implement -mcpu options for:

  - Cortex-A520AE
  - Cortex-A720AE
  - Cortex-R82AE

These all implement the same feature sets as their non-AE
counterparts, using the same scheduler and costs and differing only in
their respective part numbers.

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (cortex-a520ae,
cortex-a720ae, cortex-r82ae): Define new entries.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Document A520AE, A720AE and R82AE CPUs.

testsuite: arm: Use effective-target for nomve_fp_1 test

Test uses MVE, so add effective-target arm_fp requirement.

gcc/testsuite/ChangeLog:

* g++.target/arm/mve/general-c++/nomve_fp_1.c: Use
effective-target arm_fp.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

RISC-V: Add testcases for unsigned imm vec SAT_SUB form1

form1:
void __attribute__((noinline))             \
vec_sat_u_sub_imm##IMM##_##T##_fmt_1 (T *out, T *in, unsigned limit)  \
{                                                   \
  unsigned i;                                       \
  for (i = 0; i < limit; i++)                       \
    out[i] = (T)IMM >= in[i] ? (T)IMM - in[i] : 0;  \
}

Passed the rv64gcv full regression test.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vec_sat_data.h: add data for vec sat_sub.
* gcc.target/riscv/rvv/autovec/vec_sat_arith.h: add unsigned imm vec sat_sub form1.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-4.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-1.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-2.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-3.c: New test.
* gcc.target/riscv/rvv/autovec/binop/vec_sat_u_sub_imm-run-4.c: New test.

libstdc++: Improve comment for _Hashtable::_M_insert_unique_node

Clarify the effects if rehashing is needed. Document the __n_elt
parameter.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_M_insert_unique_node): Improve
comment.

libstdc++: Fix conversions to key/value types for hash table insertion [PR115285]

The conversions to key_type and value_type that are performed when
inserting into _Hashtable need to be fixed to do any required
conversions explicitly. The current code assumes that conversions from
the parameter to the key_type or value_type can be done implicitly,
which isn't necessarily true.

Remove the _S_forward_key function which doesn't handle all cases and
either forward the parameter if it already has type cv key_type, or
explicitly construct a temporary of type key_type.

Similarly, the _ConvertToValueType specialization for maps doesn't
handle all cases either, for std::pair arguments only some value
categories are handled. Remove _ConvertToValueType and for the _M_insert
function for unique keys, either forward the argument unchanged or
explicitly construct a temporary of type value_type.

For the _M_insert overload for non-unique keys we don't need any
conversion at all, we can just forward the argument directly to where we
construct a node.

libstdc++-v3/ChangeLog:

PR libstdc++/115285
* include/bits/hashtable.h (_Hashtable::_S_forward_key): Remove.
(_Hashtable::_M_insert_unique_aux): Replace _S_forward_key with
a static_cast to a type defined using conditional_t.
(_Hashtable::_M_insert): Replace _ConvertToValueType with a
static_cast to a type defined using conditional_t.
* include/bits/hashtable_policy.h (_ConvertToValueType): Remove.
* testsuite/23_containers/unordered_map/insert/115285.cc: New test.
* testsuite/23_containers/unordered_set/insert/115285.cc: New test.
* testsuite/23_containers/unordered_set/96088.cc: Adjust
expected number of allocations.

libstdc++: Define __is_pair variable template for C++11

libstdc++-v3/ChangeLog:

* include/bits/stl_pair.h (__is_pair): Define for C++11 and
C++14 as well.

libstdc++: Fix grammar in comment, again

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h (_Hashtable): Fix comment grammar.

aarch64: Fix gcc.target/aarch64/sme2/acle-asm/bfmlslb_f32.c

I missed a search-and-replace on this test, meaning that it was
duplicating bfmlalb_f32.c.

gcc/testsuite/
* gcc.target/aarch64/sme2/acle-asm/bfmlslb_f32.c: Replace bfmla*
with bfmls*

aarch64: Make PSEL dependent on SME rather than SME2

The svpsel_lane intrinsics were wrongly classified as SME2+ only,
rather than as base SME intrinsics. They should always be available
in streaming mode.

gcc/
* config/aarch64/aarch64-sve2.md (@aarch64_sve_psel<BHSD_BITS>)
(*aarch64_sve_psel<BHSD_BITS>_plus): Require TARGET_STREAMING
rather than TARGET_STREAMING_SME2.

gcc/testsuite/
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b16.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b16.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b32.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b32.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b64.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b64.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_b8.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_b8.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c16.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c16.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c32.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c32.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c64.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c64.c: ...here.
* gcc.target/aarch64/sme2/acle-asm/psel_lane_c8.c: Move to...
* gcc.target/aarch64/sme/acle-asm/psel_lane_c8.c: ...here.

aarch64: Restrict FCLAMP to SME2

There are two sets of patterns for FCLAMP: one set for single registers
and one set for multiple registers. The multiple-register set was
correctly gated on SME2, but the single-register set only required SME.
This doesn't matter for ACLE usage, since the intrinsic definitions
are correctly gated. But it does matter for automatic generation of
FCLAMP from separate minimum and maximum operations (either ACLE
intrinsics or autovectorised code).

gcc/
* config/aarch64/aarch64-sve2.md (@aarch64_sve_fclamp<mode>)
(*aarch64_sve_fclamp<mode>_x): Require TARGET_STREAMING_SME2
rather than TARGET_STREAMING_SME.

gcc/testsuite/
* gcc.target/aarch64/sme/clamp_3.c: Force sme2
* gcc.target/aarch64/sme/clamp_4.c: Likewise.
* gcc.target/aarch64/sme/clamp_5.c: New test.

bpf: avoid possible null deref in btf_ext_output [PR target/117447]

The BPF-specific .BTF.ext section is always generated for BPF programs
if -gbtf is specified, and generating it requires BTF information and
assumes that the BTF info has already been generated.

Compiling non-C languages to BPF is not supported, nor is generating
CTF/BTF for non-C. But, compiling another language like C++ to BPF
with -gbtf specified meant that we would try to generate the .BTF.ext
section anyway, and then ICE because no BTF information was available.

Add a check to bail out of btf_ext_output if the TU CTFC does not exist,
meaning no BTF info is available.

gcc/
PR target/117447
* config/bpf/btfext-out.cc (btf_ext_output): Bail if TU CTFC is null.

btf: check hash maps are non-null before emptying

These maps will always be non-null in btf_finalize under normal
circumstances, but be safe and verify that before trying to empty them.

gcc/
* btfout.cc (btf_finalize): Check that hash maps are non-null before
emptying them.

ifcombine: For short circuit case, allow 2 convert defining statements [PR85605]

r0-126134-g5d2a9da9a7f7c1 added support for circuiting and combing the ifs
into using either AND or OR. But it only allowed the inner condition
basic block having the conditional only. This changes to allow up to 2 defining
statements as long as they are just integer to integer conversions for
either the lhs or rhs of the conditional.

This should allow to use ccmp on aarch64 and x86_64 (APX) slightly more than before.

Boootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/85605

gcc/ChangeLog:

* tree-ssa-ifcombine.cc (can_combine_bbs_with_short_circuit): New function.
(ifcombine_ifandif): Use can_combine_bbs_with_short_circuit
instead of checking if iterator is one before the last statement.

gcc/testsuite/ChangeLog:

* g++.dg/tree-ssa/ifcombine-ccmp-1.C: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-7.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-8.c: New test.
* gcc.dg/tree-ssa/ssa-ifcombine-ccmp-9.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

VN: Lookup `val != 0` if we got back val when looking up the predicate for GIMPLE_COND [PR117414]

Sometimes we get back a full ssa name when looking up the comparison of the GIMPLE_COND
rather than a predicate. We then want to lookup the `val != 0` for the predicate.

Note this might happen with other boolean assignments and COND_EXPR but I am not sure
if it is as important; I have not found a testcase yet.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/117414

gcc/ChangeLog:

* tree-ssa-sccvn.cc (process_bb): Lookup
`val != 0` if got back a ssa name when looking the comparison.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/fre-predicated-4.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

VN: Handle `(A CMP B) !=/== 0` for predicates [PR117414]

After the last patch, we also want to record `(A CMP B) != 0`
as `(A CMP B)` and `(A CMP B) == 0` as `(A CMP B)` with the
true/false edges swapped.

This shows up more due to the new handling of
`(A | B) ==/!= 0` in insert_predicates_for_cond
as now we can notice these comparisons which were not seen before.

This is enough to fix the original issue in `gcc.dg/tree-ssa/pr111456-1.c`
and make sure we don't regress it when enhancing ifcombine.

This adds that predicate and allows us to optimize f
in fre-predicated-3.c.

Changes since v1:
* v2: Use vn_valueize.

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/117414

gcc/ChangeLog:

* tree-ssa-sccvn.cc (insert_predicates_for_cond): Handle `(A CMP B) !=/== 0`.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/fre-predicated-3.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

VN: Handle `(a | b) !=/== 0` for predicates [PR117414]

For `(a | b) == 0`, we can "assert" on the true edge that
both `a == 0` and `b == 0` but nothing on the false edge.
For `(a | b) != 0`, we can "assert" on the false edge that
both `a == 0` and `b == 0` but nothing on the true edge.
This adds that predicate and allows us to optimize f0, f1,
and f2 in fre-predicated-[12].c.

Changes since v1:
* v2: Use vn_valueize. Also canonicalize the comparison
      at the begining of insert_predicates_for_cond for
      constants to be on the rhs. Return early for
      non-ssa names on the lhs (after canonicalization).

Bootstrapped and tested on x86_64-linux-gnu.

PR tree-optimization/117414

gcc/ChangeLog:

* tree-ssa-sccvn.cc (insert_predicates_for_cond): Canonicalize the comparison.
Don't insert anything if lhs is not a SSA_NAME. Handle `(a | b) !=/== 0`.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/fre-predicated-1.c: New test.
* gcc.dg/tree-ssa/fre-predicated-2.c: New test.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

VN: Factor out inserting predicates for conditional

To make it easier to add more predicates in some cases,
factor out the code. Plus it makes the code slightly more
readable since it is not indented as much.

Bootstrapped and tested on x86_64.

gcc/ChangeLog:

* tree-ssa-sccvn.cc (insert_predicates_for_cond): New function, factored out from ...
(process_bb): Here.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

libstdc++: Tweak comments on includes in hashtable headers

std::is_permutation is only used in <bits/hashtable.h> not in
<bits/hashtable_policy.h>, so move the comment referring to it.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h: Add is_permutation to comment.
* include/bits/hashtable_policy.h: Remove it from comment.

libstdc++: Fix typo in comment in hashtable.h

And tweak grammar in a couple of comments.

libstdc++-v3/ChangeLog:

* include/bits/hashtable.h: Fix spelling in comment.

libgomp.texi: Document OpenMP's Interoperability Routines

libgomp/ChangeLog:

* libgomp.texi (OpenMP Technical Report 13): Remove 'iterator'
in 'map' clause of 'declare mapper' as it is already the list above.
(Interoperability Routines): Add.
(omp_target_memcpy_async, omp_target_memcpy_rect_async):
Document that depobj_list may be omitted in C++ and Fortran.

Unify registered_pp_pragmas and registered_pragmas

Until now, the structures that keep pragma information were different
when in preprocessing only mode and in normal mode. This change unifies
both so that the space and name of a pragma are always registered and
can be queried easily at a later time.

gcc/c-family/ChangeLog:

* c-pragma.cc (struct pragma_pp_data): Use (struct internal_pragma_handler);
(c_register_pragma_1): Always register name and space for all pragmas.
(c_invoke_pragma_handler): Adapt.
(c_invoke_early_pragma_handler): Likewise.
(c_pp_invoke_early_pragma_handler): Likewise.

Disable gather/scatter for non-first vectorized epilogue

We currently make vect_check_gather_scatter happy by replacing SSA
name references in DR_REF for gather/scatter DRs but the replacement
process only works once since for the second epilogue we have SSA
names from the first epilogue in DR_REF but as we copied from the
original loop the SSA mapping doesn't work.

The following simply punts for non-first epilogues, those gather/scatter
recognized by patterns to IFNs are already analyzed and should work
fine.

* tree-vect-data-refs.cc (vect_check_gather_scatter): Refuse
to analyze DR_REF if from an epilogue that's not first.
* tree-vect-loop.cc (update_epilogue_loop_vinfo): Add comment
how the substitution in DR_REF is broken.

Add LOOP_VINFO_MAIN_LOOP_INFO

The following introduces LOOP_VINFO_MAIN_LOOP_INFO alongside
LOOP_VINFO_ORIG_LOOP_INFO so one can have both access to the main
vectorized loop info and the preceeding vectorized epilogue.
This is critical for correctness as we need to disallow never
executed epilogues by costing in vect_analyze_loop_costing as
we assume those do not exist when deciding to add a skip-vector
edge during peeling.  The patch also changes how multiple vector
epilogues are handled - instead of the epilogue_vinfos array in
the main loop info we now record the single epilogue_vinfo there
and further epilogues in the epilogue_vinfo member of the
epilogue info.  This simplifies code.

* tree-vectorizer.h (_loop_vec_info::main_loop_info): New.
(LOOP_VINFO_MAIN_LOOP_INFO): Likewise.
(_loop_vec_info::epilogue_vinfo): Change from epilogue_vinfos
from array to single element.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
main_loop_info and epilogue_vinfo.  Remove epilogue_vinfos
allocation.
(_loop_vec_info::~_loop_vec_info): Do not release epilogue_vinfos.
(vect_create_loop_vinfo): Rename parameter, set
LOOP_VINFO_MAIN_LOOP_INFO.
(vect_analyze_loop_1): Rename parameter.
(vect_analyze_loop_costing): Properly distinguish between
the main vector loop and the preceeding epilogue.
(vect_analyze_loop): Change for epilogue_vinfos no longer
being a vector.
* tree-vect-loop-manip.cc (vect_do_peeling): Simplify and
thereby handle a vector epilogue of a vector epilogue.

Add LOOP_VINFO_DRS_ADVANCED_BY

The following remembers how we advanced DRs when vectorizing an
epilogue. When we want to vectorize the epilogue of such epilogue
we have to retain that advancement and add the advancement for this
vectorized epilogue. Due to the way we copy and re-associate
stmt_vec_infos and DRs recording this advancement and re-applying
it for the next epilogue is simplest.

* tree-vectorizer.h (_loop_vec_info::drs_advanced_by): New.
(LOOP_VINFO_DRS_ADVANCED_BY): Likewise.
* tree-vect-loop.cc (_loop_vec_info::_loop_vec_info): Initialize
drs_advanced_by.
(update_epilogue_loop_vinfo): Remember the DR advancement made.
(vect_transform_loop): Accumulate past advancements.

Check LOOP_VINFO_PEELING_FOR_GAPS on epilog is supported

We need to check that an epilogue doesn't require LOOP_VINFO_PEELING_FOR_GAPS
in case the main loop didn't (the other way around is OK), the
computation whether the epilog is executed or not gets our of sync
otherwise.

* tree-vect-loop.cc (vect_analyze_loop_2): Move
vect_analyze_loop_costing after check whether we can do
peeling. Add check on LOOP_VINFO_PEELING_FOR_GAPS for
epilogues.

testsuite: Fix up pr116725.c test [PR116725]

On Fri, Oct 18, 2024 at 02:05:59PM -0400, Antoni Boucher wrote:
>             PR target/116725
>             * gcc.target/i386/pr116725.c: Add test using those AVX builtins.

This test FAILs for me, as I don't have the latest gas around and the test
is dg-do assemble, so doesn't need just fixed compiler, but also assembler
which supports those instructions.

The following patch adds effective target directives to ensure assembler
supports those too.

2024-11-07  Jakub Jelinek  <jakub@redhat.com>

PR target/116725
* gcc.target/i386/pr116725.c: Add dg-require-effective-target
avx512{dq,fp16,vl}.

openmp: Fix max_vf testcases with -march=cascadelake

Apparently we need to explicitly disable AVX, not just enabled SSE, to
guarentee the 16-lane vectors we need for the pattern match.

libgomp/ChangeLog:

* testsuite/libgomp.c/max_vf-1.c: Add -mno-avx.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/max_vf-1.c: Add -mno-avx.

Doc: Add doc for standard name mask_len_strided_load{store}m

This patch would like to add doc for the below 2 standard names.

1. strided load: v = mask_len_strided_load (ptr, stried, mask, len, bias)
2. strided store: mask_len_stried_store (ptr, stride, v, mask, len, bias)

gcc/ChangeLog:

* doc/md.texi: Add doc for mask_len_stried_load{store}.

Signed-off-by: Pan Li <pan2.li@intel.com>
Co-Authored-By: Juzhe-Zhong <juzhe.zhong@rivai.ai>

rtl-optimization/117467 - 33% compile-time in rest of compilation

ext-dce uses TV_NONE, that's not OK for a pass taking 33% compile-time.
The following adds a timevar to it for proper blaming.

PR rtl-optimization/117467
* timevar.def (TV_EXT_DCE): New.
* ext-dce.cc (pass_data_ext_dce): Use TV_EXT_DCE.

i386: Support cstorebf4 with native bf16 comi

We recently supports cbranchbf4 with AVX10_2 native bf16 comi
instructions, so do similar to cstorebf4.

gcc/ChangeLog:

* config/i386/i386.md (cstorebf4): Use vcomsbf16 under
TARGET_AVX10_2_256 and -fno-trapping-math.
(cbranchbf4): Adjust formatting.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-comibf-3.c: New test.
* gcc.target/i386/avx10_2-comibf-4.c: Likewise.

i386: Modify regexp of pr117304-1.c

Since the test doesn't care if the hint is correct,
modify the regexp of the hint part to avoid future
changes to the hint that would cause the test to fail.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr117304-1.c: Modify regexp.

limit ifcombine stmt moving and adjust flow info

It became apparent that conditions could be combined that had deep SSA
dependency trees, that might thus require moving lots of statements.
Set a hard upper bound for now, hopefully to be replaced by a
dynamically computed bound, based on probabilities and costs.

Also reset flow sensitive info and avoid introducing undefined
behavior when moving stmts from under guarding conditions.

Finally, rework the preexisting reset of flow sensitive info and
avoidance of undefined behavior to be done when needed on all affected
inner blocks: reset flow info whenever enclosing conditions change,
and avoid undefined behavior whenever enclosing conditions become
laxer.

for  gcc/ChangeLog

* tree-ssa-ifcombine.cc
(ifcombine_rewrite_to_defined_overflow): New.
(ifcombine_replace_cond): Reject conds that would require
moving too many stmts.  Reset flow sensitive info and avoid
undefined behavior in moved stmts.  Reset flow sensitive info
in all inner blocks when the outer condition changes, and
avoid undefined behavior whenever the outer condition becomes
laxer, adapted and moved from...
(pass_tree_ifcombine::execute): ... here.

handle TRUTH_ANDIF cond exprs in ifcombine_replace_cond

The upcoming move of fold_truth_andor to ifcombine brings with it the
possibility of TRUTH_ANDIF cond exprs. Handle them by splitting the
cond so as to best use both BB insertion points, but only if they're
contiguous.

for gcc/ChangeLog

* tree-ssa-ifcombine.c (ifcombine_replace_cond): Support
TRUTH_ANDIF cond exprs.

ifcombine across noncontiguous blocks

Rework ifcombine to support merging conditions from noncontiguous
blocks.  This depends on earlier preparation changes.

The function that attempted to ifcombine a block with its immediate
predecessor, tree_ssa_ifcombine_bb, now loops over dominating blocks
eligible for ifcombine, attempting to combine with them.

The function that actually drives the combination of a pair of blocks,
tree_ssa_ifcombine_bb_1, now takes an additional parameter: the
successor of outer that leads to inner.

The function that recognizes if_then_else patterns is modified to
enable testing without distinguishing between then and else, or to
require nondegenerate conditions, that aren't worth combining with.

for  gcc/ChangeLog

* tree-ssa-ifcombine.cc (recognize_if_then_else): Support
relaxed then/else testing; require nondegenerate condition
otherwise.
(tree_ssa_ifcombine_bb_1): Add outer_succ_bb parm, use it
instead of inner_cond_bb.  Adjust callers.
(tree_ssa_ifcombine_bb): Loop over dominating outer blocks
eligible for ifcombine.
(pass_tree_ifcombine::execute): Noted potential need for
changes to the post-combine logic.

extend ifcombine_replace_cond to handle noncontiguous ifcombine

Prepare to handle noncontiguous ifcombine, introducing logic to modify
the outer condition when needed.  There are two cases worth
mentioning:

- when blocks are noncontiguous, we have to place the combined
  condition in the outer block to avoid pessimizing carefully crafted
  short-circuited tests;

- even when blocks are contiguous, we prepare for situations in which
  the combined condition has two tests, one to be placed in outer and
  the other in inner.  This circumstance will not come up when
  noncontiguous ifcombine is first enabled, but it will when
  an improved fold_truth_andor is integrated with ifcombine.

Combining the condition from inner into outer may require moving SSA
DEFs used in the inner condition, and the changes implement this as
well.

for  gcc/ChangeLog

* tree-ssa-ifcombine.cc: Include bitmap.h.
(ifcombine_mark_ssa_name): New.
(struct ifcombine_mark_ssa_name_t): New.
(ifcombine_mark_ssa_name_walk): New.
(ifcombine_replace_cond): Prepare to handle noncontiguous and
split-condition ifcombine.

adjust update_profile_after_ifcombine for noncontiguous ifcombine

Prepare for ifcombining noncontiguous blocks, adding (still unused)
logic to the ifcombine profile updater to handle such cases.

for gcc/ChangeLog

* tree-ssa-ifcombine.cc (known_succ_p): New.
(update_profile_after_ifcombine): Handle noncontiguous blocks.

introduce ifcombine_replace_cond

Refactor ifcombine_ifandif, moving the common code from the various
paths that apply the combined condition to a new function.

for gcc/ChangeLog

* tree-ssa-ifcombine.cc (ifcombine_replace_cond): Factor out
of...
(ifcombine_ifandif): ... this. Leave it for the above to
gimplify and invert the condition.

drop redundant ifcombine_ifandif parm

In preparation to changes that may modify both inner and outer
conditions in ifcombine, drop the redundant parameter result_inv, that
is always identical to inner_inv.

for gcc/ChangeLog

* tree-ssa-ifcombine.cc (ifcombine_ifandif): Drop redundant
result_inv parm. Adjust all callers.

allow vuses in ifcombine blocks

Disallowing vuses in blocks for ifcombine is too strict, and it
prevents usefully moving fold_truth_andor into ifcombine. That
tree-level folder has long ifcombined loads, absent other relevant
side effects.

for gcc/ChangeLog

* tree-ssa-ifcombine.c (bb_no_side_effects_p): Allow vuses,
but not vdefs.

[testsuite] disable PIE on ia32 on more tests

Multiple tests fail on ia32 with -fPIE enabled by default because of
different call sequences required by the call-saved PIC register
(no-callee-saved-*.c), uses of the constant pool instead of computing
constants (pr100865-*.c), and unexpected matches of esp in get_pc_thunk
(sse2-stv-1.c). Disable PIE on them, to match the expectations.

for gcc/testsuite/ChangeLog

* gcc.target/i386/no-callee-saved-13.c: Disable PIE on ia32.
* gcc.target/i386/no-callee-saved-14.c: Likewise.
* gcc.target/i386/no-callee-saved-15.c: Likewise.
* gcc.target/i386/no-callee-saved-17.c: Likewise.
* gcc.target/i386/pr100865-1.c: Likewise.
* gcc.target/i386/pr100865-7a.c: Likewise.
* gcc.target/i386/pr100865-7c.c: Likewise.
* gcc.target/i386/sse2-stv-1.c: Likewise.

[testsuite] fix pr70321.c PIC expectations

When we select a non-bx get_pc_thunk, we get an extra mov to set up
the PIC register before the abort call. Expect that mov or a
get_pc_thunk.bx call.

for gcc/testsuite/ChangeLog

* gcc.target/i386/pr70321.c: Cope with non-bx get_pc_thunk.

RISC-V: Add testcases for signed imm SAT_ADD form1

This patch adds testcase for form1, as shown below:

T __attribute__((noinline))                  \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x)             \
{                                            \
  T sum = (UT)x + (UT)IMM;                     \
  return (x ^ IMM) < 0                         \
    ? sum                                    \
    : (sum ^ x) >= 0                         \
      ? sum                                  \
      : x < 0 ? MIN : MAX;                   \
}

Passed the rv64gcv regression test.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/testsuite/ChangeLog:

* gcc.target/riscv/sat_arith.h: Support signed
imm SAT_ADD form1.
* gcc.target/riscv/sat_s_add_imm-1-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-3-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-4.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-1.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-2.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-3.c: New test.
* gcc.target/riscv/sat_s_add_imm-run-4.c: New test.

Match:Support signed imm SAT_ADD form1

This patch would like to support .SAT_ADD when one of the op
is singed IMM.

Form1:
T __attribute__((noinline))                  \
sat_s_add_imm_##T##_fmt_1##_##INDEX (T x)             \
{                                            \
  T sum = (UT)x + (UT)IMM;                     \
  return (x ^ IMM) < 0                         \
    ? sum                                    \
    : (sum ^ x) >= 0                         \
      ? sum                                  \
      : x < 0 ? MIN : MAX;                   \
}

Take below form1 as example:
DEF_SAT_S_ADD_IMM_FMT_1(0, int8_t, uint8_t, -10, INT8_MIN, INT8_MAX)

Before this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
  int8_t sum;
  unsigned char x.0_1;
  unsigned char _2;
  signed char _4;
  int8_t _5;
  _Bool _9;
  signed char _10;
  signed char _11;
  signed char _12;
  signed char _14;
  signed char _16;

  <bb 2> [local count: 1073741824]:
  x.0_1 = (unsigned char) x_6(D);
  _2 = x.0_1 + 246;
  sum_7 = (int8_t) _2;
  _4 = x_6(D) ^ sum_7;
  _16 = x_6(D) ^ 9;
  _14 = _4 & _16;
  if (_14 < 0)
    goto <bb 3>; [41.00%]
  else
    goto <bb 4>; [59.00%]

  <bb 3> [local count: 259738147]:
  _9 = x_6(D) < 0;
  _10 = (signed char) _9;
  _11 = -_10;
  _12 = _11 ^ 127;

  <bb 4> [local count: 1073741824]:
  # _5 = PHI <sum_7(2), _12(3)>
  return _5;

}

After this patch:
__attribute__((noinline))
int8_t sat_s_add_imm_int8_t_fmt_1_0 (int8_t x)
{
  int8_t _5;

  <bb 2> [local count: 1073741824]:
  _5 = .SAT_ADD (x_6(D), -10); [tail call]
  return _5;

}

The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.

Signed-off-by: Li Xu <xuli1@eswincomputing.com>
gcc/ChangeLog:

* match.pd: Add the form1 of signed imm .SAT_ADD matching.
* tree-ssa-math-opts.cc (match_saturation_add): Add fold
convert for const_int to the type of operand 0.

Daily bump.

avx10_2-comibf-2.c: Require AVX10.2 support

Since avx10_2-comibf-2.c is a run test, require AVX10.2 support.

* gcc.target/i386/avx10_2-comibf-2.c: Require avx10_2 target.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

[PATCH v2] RISC-V: zero_extend(not) -> xor optimization [PR112398]

This patch adds optimization of the following patterns:

  (zero_extend:M (subreg:N (not:O==M (X:Q==M)))) ->
  (xor:M (zero_extend:M (subreg:N (X:M)), mask))
    ... where the mask is GET_MODE_MASK (N).

For the cases when X:M doesn't have any non-zero bits outside of mode N,
(zero_extend:M (subreg:N (X:M)) could be simplified to just (X:M)
and whole optimization will be:

  (zero_extend:M (subreg:N (not:M (X:M)))) ->
  (xor:M (X:M, mask))

Patch targets to handle code patterns like:
  not   a0,a0
  andi  a0,a0,0xff
to be optimized to:
  xori  a0,a0,255

Change was locally tested for x86_64 and AArch64 (as most common)
and for RV-64 and MIPS-32 targets (as having an effect from this optimization):
no regressions for all cases.

PR rtl-optimization/112398
gcc/ChangeLog:

* simplify-rtx.cc (simplify_context::simplify_unary_operation_1):
Simplify ZERO_EXTEND (SUBREG (NOT X)) to XOR (X, GET_MODE_MASK(SUBREG))
when X doesn't have any non-zero bits outside of SUBREG mode.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/pr112398.c: New test.

Signed-off-by: Alexey Merzlyakov <alexey.merzlyakov@samsung.com>

Darwin: Fix a narrowing warning.

cdtor_record needs to have an unsigned entry for the position in order to
match with vec_safe_length.

gcc/ChangeLog:

* config/darwin.cc (cdtor_record): Make position unsigned.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

openmp: Fix signed/unsigned warning

My previous patch broke things when building with Werror.

gcc/ChangeLog:

* omp-general.cc (omp_max_vf): Cast the constant to poly_uint64.

openmp: Add testcases for omp_max_vf

Ensure that the GOMP_MAX_VF does the right thing for explicit schedules, when
offloading is enabled ("target" directives are present), and is inactive
otherwise.

libgomp/ChangeLog:

* testsuite/libgomp.c/max_vf-1.c: New test.
* testsuite/libgomp.c/max_vf-2.c: New test.

gcc/testsuite/ChangeLog:

* gcc.dg/gomp/max_vf-1.c: New test.

openmp: Add IFN_GOMP_MAX_VF

Delay omp_max_vf call until after the host and device compilers have diverged
so that the max_vf value can be tuned exactly right on both variants.

This change means that the ompdevlow pass must be enabled for functions that
use OpenMP directives with both "simd" and "schedule" enabled.

gcc/ChangeLog:

* internal-fn.cc (expand_GOMP_MAX_VF): New function.
* internal-fn.def (GOMP_MAX_VF): New internal function.
* omp-expand.cc (omp_adjust_chunk_size): Emit IFN_GOMP_MAX_VF when
called in offload context, otherwise assume host context.
* omp-offload.cc (execute_omp_device_lower): Expand IFN_GOMP_MAX_VF.

openmp: use offload max_vf for chunk_size

The chunk size for SIMD loops should be right for the current device; too big
allocates too much memory, too small is inefficient. Getting it wrong doesn't
actually break anything though.

This patch attempts to choose the optimal setting based on the context. Both
host-fallback and device will get the same chunk size, but device performance
is the most important in this case.

gcc/ChangeLog:

* omp-expand.cc (is_in_offload_region): New function.
(omp_adjust_chunk_size): Add pass-through "offload" parameter.
(get_ws_args_for): Likewise.
(determine_parallel_type): Use is_in_offload_region to adjust call to
get_ws_args_for.
(expand_omp_for_generic): Likewise.
(expand_omp_for_static_chunk): Likewise.

openmp: Tune omp_max_vf for offload targets

If requested, return the vectorization factor appropriate for the offload
device, if any.

This change gives a significant speedup in the BabelStream "dot" benchmark on
amdgcn.

The omp_adjust_chunk_size usecase is set "false", for now, but I intend to
change that in a follow-up patch.

Note that NVPTX SIMT offload does not use this code-path.

gcc/ChangeLog:

* gimple-loop-versioning.cc (loop_versioning::loop_versioning): Set
omp_max_vf to offload == false.
* omp-expand.cc (omp_adjust_chunk_size): Likewise.
* omp-general.cc (omp_max_vf): Add "offload" parameter, and detect
amdgcn offload devices.
* omp-general.h (omp_max_vf): Likewise.
* omp-low.cc (lower_rec_simd_input_clauses): Pass offload state to
omp_max_vf.

Add details output for assume processing.

The Assume pass simply produces results, with no indication of how it
arrived as the results it gets. Add some output to the details listing.

The only functional change is when gori is used to calculate a range
more than once (ie, multiple uses), we now load the merged range rather
than just using the last calculated one.

* tree-assume.cc (assume_query::assume_query): Add debug output.
(assume_query::update_parms): Likewise.
(assume_query::calculate_phi): Likewise.
(assume_query::calculate_op): Likewise. Also pick up any
merged path values.
(assume_query::calculate_stmt): Likewise.

testsuite: add infinite recursion test case [PR63388]

gcc/testsuite/ChangeLog:
PR c++/63388
* g++.dg/analyzer/infinite-recursion-pr63388.C: New test.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: fix typo in comment

gcc/ChangeLog:
* diagnostic.h (class diagnostic_context): Fix typo in leading
comment.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libstdc++: Deprecate useless <cxxx> compatibility headers for C++17

These headers make no sense for C++ programs, because they either define
different content to the corresponding <xxx.h> C header, or define
nothing at all in namespace std. They were all deprecated in C++17, so
add deprecation warnings to them, which can be disabled with
-Wno-deprecated. For C++20 and later these headers are no longer in the
standard at all, so compiling with _GLIBCXX_USE_DEPRECATED defined to 0
will give an error when they are included.

Because #warning is non-standard before C++23 we need to use pragmas to
ignore -Wc++23-extensions for the -Wsystem-headers -pedantic case.

One g++ test needs adjustment because it includes <ciso646>, but that
can be made conditional on the __cplusplus value without any reduction
in test coverage.

For the library tests, consolidate the std_c++0x_neg.cc XFAIL tests into
the macros.cc test, using dg-error with a { target c++98_only }
selector. This avoids having two separate test files, one for C++98 and
one for everything later. Also add tests for the <xxx.h> headers to
ensure that they behave as expected and don't give deprecated warnings.

libstdc++-v3/ChangeLog:

* doc/xml/manual/evolution.xml: Document deprecations.
* doc/html/*: Regenerate.
* include/c_compatibility/complex.h (_GLIBCXX_COMPLEX_H): Move
include guard to start of file. Include <complex> directly
instead of <ccomplex>.
* include/c_compatibility/tgmath.h: Include <cmath> and
<complex> directly, instead of <ctgmath>.
* include/c_global/ccomplex: Add deprecated #warning for C++17
and #error for C++20 if _GLIBCXX_USE_DEPRECATED == 0.
* include/c_global/ciso646: Likewise.
* include/c_global/cstdalign: Likewise.
* include/c_global/cstdbool: Likewise.
* include/c_global/ctgmath: Likewise.
* include/c_std/ciso646: Likewise.
* include/precompiled/stdc++.h: Do not include ccomplex,
ciso646, cstdalign, cstdbool, or ctgmath in C++17 and later.
* testsuite/18_support/headers/cstdalign/macros.cc: Check for
warnings and errors for unsupported dialects.
* testsuite/18_support/headers/cstdbool/macros.cc: Likewise.
* testsuite/26_numerics/headers/ctgmath/complex.cc: Likewise.
* testsuite/27_io/objects/char/1.cc: Do not include <ciso646>.
* testsuite/27_io/objects/wchar_t/1.cc: Likewise.
* testsuite/18_support/headers/cstdbool/std_c++0x_neg.cc: Removed.
* testsuite/18_support/headers/cstdalign/std_c++0x_neg.cc: Removed.
* testsuite/26_numerics/headers/ccomplex/std_c++0x_neg.cc: Removed.
* testsuite/26_numerics/headers/ctgmath/std_c++0x_neg.cc: Removed.
* testsuite/18_support/headers/ciso646/macros.cc: New test.
* testsuite/18_support/headers/ciso646/macros.h.cc: New test.
* testsuite/18_support/headers/cstdbool/macros.h.cc: New test.
* testsuite/26_numerics/headers/ccomplex/complex.cc: New test.
* testsuite/26_numerics/headers/ccomplex/complex.h.cc: New test.
* testsuite/26_numerics/headers/ctgmath/complex.h.cc: New test.

gcc/testsuite/ChangeLog:

* g++.old-deja/g++.other/headers1.C: Do not include ciso646 for
C++17 and later.

libstdc++: Move include guards to start of headers

libstdc++-v3/ChangeLog:

* include/c_compatibility/complex.h (_GLIBCXX_COMPLEX_H): Move
include guard to start of the header.
* include/c_global/ctgmath (_GLIBCXX_CTGMATH): Likewise.

libstdc++: More user-friendly failed assertions from shared_ptr dereference

Currently dereferencing an empty shared_ptr prints a complicated
internal type in the assertion message:

include/bits/shared_ptr_base.h:1377: std::__shared_ptr_access<_Tp, _Lp, <anonymous>, <anonymous> >::element_type& std::__shared_ptr_access<_Tp, _Lp, <anonymous>, <anonymous> >::operator*() const [with _Tp = std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack; __gnu_cxx::_Lock_policy _Lp = __gnu_cxx::_S_atomic; bool <anonymous> = false; bool <anonymous> = false; element_type = std::filesystem::__cxx11::recursive_directory_iterator::_Dir_stack]: Assertion '_M_get() != nullptr' failed.

Users don't care about any of the _Lp and <anonymous> template
parameters, so this is unnecessarily verbose.

We can simplify it to something that only mentions "shared_ptr_deref"
and the element type:

include/bits/shared_ptr_base.h:1371: _Tp* std::__shared_ptr_deref(_Tp*) [with _Tp = filesystem::__cxx11::recursive_directory_iterator::_Dir_stack]: Assertion '__p != nullptr' failed.

libstdc++-v3/ChangeLog:

* include/bits/shared_ptr_base.h (__shared_ptr_deref): New
function template.
(__shared_ptr_access, __shared_ptr_access<>): Use it.

libstdc++: Enable debug assertions for filesystem directory iterators

Several member functions of filesystem::directory_iterator and
filesystem::recursive_directory_iterator currently dereference their
shared_ptr data member without checking for non-null. Because they use
operator-> and that function only uses _GLIBCXX_DEBUG_PEDASSERT rather
than __glibcxx_assert there is no assertion even when the library is
built with _GLIBCXX_ASSERTIONS defined. This means that dereferencing
invalid directory iterators gives an unhelpful segfault.

By using (*p). instead of p-> we get an assertion when the library is
built with _GLIBCXX_ASSERTIONS, with a "_M_get() != nullptr" message.

libstdc++-v3/ChangeLog:

* src/c++17/fs_dir.cc (fs::directory_iterator::operator*): Use
shared_ptr::operator* instead of shared_ptr::operator->.
(fs::recursive_directory_iterator::options): Likewise.
(fs::recursive_directory_iterator::depth): Likewise.
(fs::recursive_directory_iterator::recursion_pending): Likewise.
(fs::recursive_directory_iterator::operator*): Likewise.
(fs::recursive_directory_iterator::disable_recursion_pending):
Likewise.

ipcp don't propagate where not needed

This patch disables propagation of ipcp information into partitions
where all instances of the node are marked to be inlined.

Motivation:
Incremental LTO needs stable values between compilations to be
effective. This requirement fails with following example:

void heavily_used_function(int);
...
heavily_used_function(__LINE__);

Ipcp creates long list of all __LINE__ arguments, and then
propagates it with every function clone, even though for inlined
functions this information is not useful.

gcc/ChangeLog:

* ipa-prop.cc (write_ipcp_transformation_info): Disable
uneeded value propagation.
* lto-cgraph.cc (lto_symtab_encoder_encode): Default values.
(lto_symtab_encoder_always_inlined_p): New.
(lto_set_symtab_encoder_not_always_inlined): New.
(add_node_to): Set always inlined.
* lto-streamer.h (struct lto_encoder_entry): New field.
(lto_symtab_encoder_always_inlined_p): New.

store-merging: Apply --param=store-merging-max-size= in more spots [PR117439]

Store merging assumes a merged region won't be too large.  The assumption is
e.g. in using inappropriate types in various spots (e.g. int for bit sizes
and bit positions in a few spots, or unsigned for the total size in bytes of
the merged region), in doing XNEWVEC for the whole total size of the merged
region and preparing everything in there and even that XALLOCAVEC in two
spots.  The last case is what was breaking the test below in the patch,
64MB XALLOCAVEC is just too large, but even with that fixed I think we just
shouldn't be merging gigabyte large merge groups.

We already have --param=store-merging-max-size= parameter, right now with
65536 bytes maximum (if needed, we could raise that limit a little bit).
That parameter is currently used when merging two adjacent stores, if the
size of the already merged bitregion together with the new store's bitregion
is above that limit, we don't merge those.
I guess initially that was sufficient, at that time a store was always
limited to MAX_BITSIZE_MODE_ANY_INT bits.
But later on we've added support for empty ctors ({} and even later
{CLOBBER}) and also added another spot where we merge further stores into
the merge group, if there is some overlap, we can merge various other stores
in one coalesce_immediate_stores iteration.
And, we weren't applying the --param=store-merging-max-size= parameter
in either of those cases.  So a single store can be gigabytes long, and
if there is some overlap, we can extend the region again to gigabytes in
size.

The following patch attempts to apply that parameter even in those cases.
So, if testing if it should merge the merged group with info (we've already
punted if those together are above the parameter) and some other stores,
the first two hunks just punt if that would make the merge group too large.
And the third hunk doesn't even add stores which are over the limit.

2024-11-06  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/117439
* gimple-ssa-store-merging.cc
(imm_store_chain_info::coalesce_immediate_stores): Punt if merging of
any of the additional overlapping stores would result in growing the
bitregion size over param_store_merging_max_size.
(pass_store_merging::process_store): Terminate all aliasing chains
for stores with bitregion larger than param_store_merging_max_size.

* g++.dg/opt/pr117439.C: New test.

store-merging: Don't use sub_byte_op_p mode for empty_ctor_p unless necessary [PR117439]

encode_tree_to_bitpos uses the more expensive sub_byte_op_p mode in which
it has to allocate a buffer and do various extra work like shifting the bits
etc. if bitlen or bitpos aren't multiples of BITS_PER_UNIT, or if bitlen
doesn't have corresponding integer mode.
The last case is explained later in the comments:
  /* The native_encode_expr machinery uses TYPE_MODE to determine how many
     bytes to write.  This means it can write more than
     ROUND_UP (bitlen, BITS_PER_UNIT) / BITS_PER_UNIT bytes (for example
     write 8 bytes for a bitlen of 40).  Skip the bytes that are not within
     bitlen and zero out the bits that are not relevant as well (that may
     contain a sign bit due to sign-extension).  */
Now, we've later added empty_ctor_p support, either {} CONSTRUCTOR
or {CLOBBER}, which doesn't use native_encode_expr at all, just memset,
so that case doesn't need those fancy games unless bitlen or bitpos
aren't multiples of BITS_PER_UNIT (unlikely, but let's pretend it is
possible).

The following patch makes us use the fast path even for empty_ctor_p
which occupy full bytes, we can just memset that in the provided buffer and
don't need to XALLOCAVEC another buffer.

This patch in itself fixes the testcase from the PR (which was about using
huge XALLLOCAVEC), but I want to do some other changes, to be posted in a
next patch.

2024-11-06  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/117439
* gimple-ssa-store-merging.cc (encode_tree_to_bitpos): For
empty_ctor_p use !sub_byte_op_p even if bitlen doesn't have an
integral mode.

Fortran: F2008 passing of internal procs to a proc pointer [PR117434]

2024-11-06 Paul Thomas <pault@gcc.gnu.org>

gcc/fortran
PR fortran/117434
* interface.cc (gfc_compare_actual_formal): Skip 'Expected a
procedure pointer error' if the formal argument typespec has an
interface and the type of the actual arg is BT_PROCEDURE.

gcc/testsuite/
PR fortran/117434
* gfortran.dg/proc_ptr_54.f90: New test. This is temporarily
compile-only until one one seven four five five is fixed.
* gfortran.dg/proc_ptr_55.f90: New test.
* gfortran.dg/proc_ptr_56.f90: New test.

i386: Add OPTION_MASK_ISA2_EVEX512 for some AVX512 instructions.

gcc/ChangeLog:

PR target/117304
* config/i386/i386-builtin.def: Add OPTION_MASK_ISA2_EVEX512 for some
AVX512 512-bits instructions.

gcc/testsuite/ChangeLog:

PR target/117304
* gcc.target/i386/pr117304-1.c: New test.

Intel MOVRS tests: Also scan (%e.x)

Since x32 uses (%reg32), instead of (%r.x), also scan (%e.x).

* gcc.target/i386/avx10_2-512-movrs-1.c: Also scan (%e.x).
* gcc.target/i386/avx10_2-movrs-1.c: Likewise.
* gcc.target/i386/movrs-1.c: Likewise.

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

gcc.target/i386/apx-ndd.c: Also scan (%edi)

Since x32 uses (%edi), instead of (%rdi), also scan (%edi).

* gcc.target/i386/apx-ndd.c: Also scan (%edi).

Signed-off-by: H.J. Lu <hjl.tools@gmail.com>

Daily bump.

fortran: dynamically allocate error_buffer [PR117442]

PR fortran/117442 reports a crash on exit of f951 when configured
with --enable-gather-detailed-mem-stats.

The crash happens if any diagnostics were ever buffered into
error_buffer.  The root cause is that error_buffer is statically
allocated and thus has a non-trivial destructor called at exit.
If error_buffer's diagnostic_buffer ever buffered anything, then
a diagnostic_per_format_buffer will have been created for the
buffer per-output-sink, and the destructors for these call
into the mem-stats subsystem, which has already beeen cleaned up.

The simplest fix is to allocate error_buffer on the heap, rather
that statically, which fixes the crash.

There's a comment about error_buffer:

  /* pp_error_buffer is statically allocated.  This simplifies memory
     management when using gfc_push/pop_error. */

added by Manu in r6-1748-g5862c189c2c3c2 while fixing PR fortran/66528.
The comment appears to be out of date.  I've tested maxerrors.f90 under
valgrind, and it's clean with the patch.

gcc/fortran/ChangeLog:
PR fortran/117442
* error.cc (error_buffer): Convert to a pointer so it can be
heap-allocated.
(gfc_error_now): Update for error_buffer being heap-allocated.
(gfc_clear_error): Likewise.
(gfc_error_flag_test): Likewise.
(gfc_error_check): Likewise.
(gfc_push_error): Likewise.
(gfc_pop_error): Likewise.
(gfc_diagnostics_init): Allocate error_buffer on the heap, rather
than statically.
(gfc_diagnostics_finish): Delete error_buffer.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

match: Fix comment for `X != 0 ? X + ~0 : 0` transformation

Just a small coment fix, the `(` was in the wrong location,
making it look it was transforming into `(X - X) != 0`
rather than `X - (X != 0)`.

Pushed as obvious after a quick build for x86_64-linux-gnu.

gcc/ChangeLog:

* match.pd (X != 0 ? X + ~0 : 0): Fix comment.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

testsuite: arm: Use effective-target for pr68620 and pr78041 tests

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr68620.c: Use effective-target arm_neon.
* gcc.target/arm/pr78041.c: Use effective-target arm_arch_v7a.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: arm: Relax register selection [PR116623]

Since r15-1619-g3b9b8d6cfdf, test5 and test8 fails due to that "ip"
might be used and r3 might be moved to another register for later
dereference.

gcc/testsuite/ChangeLog:

PR testsuite/116623
* gcc.target/arm/mve/dlstp-compile-asm-2.c: Align test5 and
test8 with changes in r15-1619-g3b9b8d6cfdf.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

testsuite: arm: Use effective-target for pr98636.c test

The test case assumes that -mfp16-format=alternative is accepted for the
target, but not all targets support this flag. One such target is
Cortex-M85 that does support FP16, but not the alternative format.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr98636.c: Use effective-target
arm_fp16_alternative.

Signed-off-by: Torbjörn SVENSSON <torbjorn.svensson@foss.st.com>

c: gimplefe: Only allow an identifier before ? [PR117445]

Since r13-707-g68e0063397ba82, COND_EXPR/VEC_COND_EXPR has not
allowed a comparison as the first operand but the gimple front-end
was not updated for this change and you would error out later on.
An assert was added with r15-4791-gb60031e8f9f8fe which meant an ICE
would happen from the gimple FE.
This removes support for parsing of the `?:` expressions except for an
identifier.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/c/ChangeLog:

PR c/117445
* gimple-parser.cc (c_parser_gimple_statement): Remove
support for comparisons before the querry (`?`) token.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

PR target/117449: Restrict vector rotate match and split to pre-reload

The vector rotate splitter has some logic to deal with post-reload splitting
but not all cases in aarch64_emit_opt_vec_rotate are post-reload-safe.
In particular the ROTATE+XOR expansion for TARGET_SHA3 can create RTL that
can later be simplified to a simple ROTATE post-reload, which would then
match the insn again and try to split it.
So do a clean split pre-reload and avoid going down this path post-reload
by restricting the insn_and_split to can_create_pseudo_p ().

Bootstrapped and tested on aarch64-none-linux.

Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/

PR target/117449
* config/aarch64/aarch64-simd.md (*aarch64_simd_rotate_imm<mode>):
Match only when can_create_pseudo_p ().
* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Assume
can_create_pseudo_p ().

gcc/testsuite/

PR target/117449
* gcc.c-torture/compile/pr117449.c: New test.

testsuite: Fix up gcc.target/powerpc/safe-indirect-jump-3.c test [PR117444]

The test safe-indirect-jump-3.c FAILs on powerpc64le-linux with the change
in jump table generation behavior with commit r15-4756-g06bc3a734e8890,
since it is compiled without optimization and expects jump tables to be
generated. Add an explicit -fjump-tables to dg-options to get the old
behavior back.

2024-11-05 Peter Bergner <bergner@linux.ibm.com>

gcc/testsuite/
PR testsuite/117444
* gcc.target/powerpc/safe-indirect-jump-3.c: Add -fjump-tables to
dg-options.

c++: allow array mem-init with -fpermissive [PR116634]

We've accidentally accepted this forever (at least as far back as 4.7), but
it's always been ill-formed; this was PR59465. And we didn't accept it for
scalar types. But rather than switch to a hard error for this code, let's
give a permerror so affected code can continue to work with -fpermissive.

PR c++/116634

gcc/cp/ChangeLog:

* init.cc (can_init_array_with_p): Allow PR59465 case with
permerror.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/aggr-init1.C: Expect warning with -fpermissive.
* g++.dg/init/array62.C: Adjust diagnostic.
* g++.dg/init/array63.C: Adjust diagnostic.
* g++.dg/init/array64.C: Adjust diagnostic.

Deprecate the ARM simulator

The ARM simulator is no longer able to simulator modern ARM cores, so it
is being deprecated. Once this change has been active for a while - and
assuming that no problems have been found - the ARm simulator codebase
will be removed.

2024-11-05 Nick Clifton <nickc@redhat.com>

* configure.ac: Add sim to noconfigdirs for ARM targets.
* configure: Regenerate.

c++: Fix crash during NRV optimization with invalid input [PR117099, PR117129]

PR117099 and PR117129 are ICEs upon invalid code that happen when NRVO
is activated, and both due to the fact that we don't consistently set
current_function_return_value to error_mark_node upon error in
finish_return_expr.

This patch fixes this inconsistency which fixes both cases since we skip
calling finalize_nrv when current_function_return_value is
error_mark_node.

PR c++/117099
PR c++/117129

gcc/cp/ChangeLog:

* typeck.cc (check_return_expr): Upon error, set
current_function_return_value to error_mark_node.

gcc/testsuite/ChangeLog:

* g++.dg/parse/crash78.C: New test.
* g++.dg/parse/crash78a.C: New test.
* g++.dg/parse/crash79.C: New test.

c++: Don't crash upon invalid placement new operator [PR117101]

We currently crash upon the following invalid code (notice the "void
void**" parameter)

=== cut here ===
using size_t = decltype(sizeof(int));
void *operator new(size_t, void void **p) noexcept { return p; }
int x;
void f() {
int y;
new (&y) int(x);
}
=== cut here ===

The problem is that in this case, we end up with a NULL_TREE parameter
list for the new operator because of the error, and (1) coerce_new_type
wrongly complains about the first parameter type not being size_t,
(2) std_placement_new_fn_p blindly accesses the parameter list, hence a
crash.

This patch does NOT address #1 since we can't easily distinguish between
a new operator declaration without parameters from one with erroneous
parameters (and it's not worth the risk to refactor and break things for
an error recovery issue) hence a dg-bogus in new52.C, but it does
address #2 and the ICE by simply checking the first parameter against
NULL_TREE.

It also adds a new testcase checking that we complain about new
operators with no or invalid first parameters, since we did not have
any.

PR c++/117101

gcc/cp/ChangeLog:

* init.cc (std_placement_new_fn_p): Check first_arg against
NULL_TREE.

gcc/testsuite/ChangeLog:

* g++.dg/init/new52.C: New test.
* g++.dg/init/new53.C: New test.

c++: Defer -fstrong-eval-order processing to template instantiation time [PR117158]

Since r10-3793-g1a37b6d9a7e57c, we ICE upon the following valid code
with -std=c++17 and above

=== cut here ===
struct Base {
  unsigned int *intarray;
};
template <typename T> struct Sub : public Base {
  bool Get(int i) {
    return (Base::intarray[++i] == 0);
  }
};
=== cut here ===

The problem is that from c++17 on, we use -fstrong-eval-order and need
to wrap the array access expression into a SAVE_EXPR. We do so at
template declaration time, and end up calling contains_placeholder_p
with a SCOPE_REF, that it does not handle well.

This patch fixes this by deferring the wrapping into SAVE_EXPR to
instantiation time for templates, when the SCOPE_REF will have been
turned into a COMPONENT_REF.

PR c++/117158

gcc/cp/ChangeLog:

* typeck.cc (cp_build_array_ref): Only wrap array expression
into a SAVE_EXPR at template instantiation time.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/eval-order13.C: New test.
* g++.dg/parse/crash77.C: New test.

testsuite: fix testcase pr110279-1.c

The test case is for targets that support FMA. Previously
the "target" selector is missed in dg-final command.

gcc/testsuite/ChangeLog:
PR tree-optimization/110279
* gcc.dg/pr110279-1.c: add target selector.

Support vector float_extend from __bf16 to float.

It's supported by vector permutation with zero vector.

gcc/ChangeLog:

* config/i386/i386-expand.cc
(ix86_expand_vector_bf2sf_with_vec_perm): New function.
* config/i386/i386-protos.h
(ix86_expand_vector_bf2sf_with_vec_perm): New Declare.
* config/i386/mmx.md (extendv2bfv2sf2): New expander.
* config/i386/sse.md (extend<sf_cvt_bf16_lower><mode>2):
Ditto.
(VF1_AVX512BW): New mode iterator.
(sf_cvt_bf16): Add V4SF.
(sf_cvt_bf16_lower): New mode attr.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bw-extendbf2sf.c: New test.
* gcc.target/i386/sse2-extendbf2sf.c: New test.

Support vector float_truncate for SF to BF.

Generate native instruction whenever possible, otherwise use vector
permutation with odd indices.

gcc/ChangeLog:

* config/i386/i386-expand.cc
(ix86_expand_vector_sf2bf_with_vec_perm): New function.
* config/i386/i386-protos.h
(ix86_expand_vector_sf2bf_with_vec_perm): New declare.
* config/i386/mmx.md (truncv2sfv2bf2): New expander.
* config/i386/sse.md (truncv4sfv4bf2): Ditto.
(truncv8sfv8bf2): Ditto.
(truncv16sfv16bf2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bf16-truncsfbf.c: New test.
* gcc.target/i386/avx512bw-truncsfbf.c: New test.
* gcc.target/i386/ssse3-truncsfbf.c: New test.

c++: Mark replaceable global operator new/delete with const std::nothrow_t& argument as DECL_IS_REPLACEABLE_OPERATOR [PR117370]

cxx_init_decl_processing predeclares 12 out of the 20 replaceable global
new/delete operators and sets DECL_IS_REPLACEABLE_OPERATOR on those.
But it doesn't handle the remaining 8, in particular
void* operator new(std::size_t, const std::nothrow_t&) noexcept;
void* operator new[](std::size_t, const std::nothrow_t&) noexcept;
void operator delete(void*, const std::nothrow_t&) noexcept;
void operator delete[](void*, const std::nothrow_t&) noexcept;
void* operator new(std::size_t, std::align_val_t, const std::nothrow_t&) noexcept;
void* operator new[](std::size_t, std::align_val_t, const std::nothrow_t&) noexcept;
void operator delete(void*, std::align_val_t, const std::nothrow_t&) noexcept;
void operator delete[](void*, std::align_val_t, const std::nothrow_t&) noexcept;

The following patch sets that flag during grok_op_properties for those, so
that they don't need to be predeclared.
The patch doesn't fix the whole PR, as some work is needed on the CDDCE side
too, unlike the throwing operator new case the if (ptr) conditional around
operator delete isn't removed by VRP and so we need to handle conditional
delete for unconditional new.

2024-11-05 Jakub Jelinek <jakub@redhat.com>

PR c++/117370
* cp-tree.h (is_std_class): Declare.
* constexpr.cc (is_std_class): New function.
(is_std_allocator): Use it.
* decl.cc (grok_op_properties): Mark global replaceable
operator new/delete operators with const std::nothrow_t & last
argument with DECL_IS_REPLACEABLE_OPERATOR.

i386: Handling exception input of __builtin_ia32_prefetch. [PR117416]

op1 should be between 0 and 2. Add an error handler, and op3 should be 0
or 1, raise a warning, when op3 is an invalid value.

gcc/ChangeLog:

PR target/117416
* config/i386/i386-expand.cc (ix86_expand_builtin): Raise warning when
op1 isn't in range of [0, 2] and set op1 as const0_rtx, and raise
warning when op3 isn't in range of [0, 1].

gcc/testsuite/ChangeLog:

PR target/117416
* gcc.target/i386/pr117416-1.c: New test.
* gcc.target/i386/pr117416-2.c: Ditto.

middle-end/117433 - ICE with gimple BLKmode reg copy

When we end up expanding a SSA name copy with BLKmode regs which can
happen for vectors, possibly wrapped in a NOP-conversion or
a PAREN_EXPR and we are not optimizing we can end up with two
BLKmode MEMs that expand_gimple_stmt_1 doesn't properly handle
when expanding, trying to emit_move_insn them. Looking at store_expr
which what expand_gimple_stmt_1 is really doing reveals a lot of
magic that's missing. It eventually falls back to emit_block_move
(store_expr isn't exported), so this is what I ended up using here
given I think we'll only have BLKmode "registers" for vectors.

PR middle-end/117433
* cfgexpand.cc (expand_gimple_stmt_1): Use emit_block_move
when moving temp to BLKmode target.

* gcc.dg/pr117433.c: New testcase.

aarch64: remove falkor-tag-collision-avoidance pass

This code is not well tested and there is only a single testcase
(gcc.target/aarch64/pr94530.c) which only enables this code but it
is testing to make sure there is no ICE.

The falkor cores have not been supported from Qualcomm from 2019 or so
either. And I don't have a way to test these cores internally either.

Bootstrapped and tested on aarch64-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-passes.def: Don't add pass_tag_collision_avoidance.
* config/aarch64/aarch64-protos.h (make_pass_tag_collision_avoidance): Remove.
* config/aarch64/aarch64-tuning-flags.def (RENAME_LOAD_REGS): Remove.
* config/aarch64/tuning_models/qdf24xx.h (qdf24xx_tunings): Set tuning flags to
AARCH64_EXTRA_TUNE_NONE.
* config/aarch64/falkor-tag-collision-avoidance.cc: Removed.
* config/aarch64/t-aarch64 (falkor-tag-collision-avoidance.o): Remove.
* config.gcc (aarch64*-*-*): Remove falkor-tag-collision-avoidance.o from extra_objs.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

aarch64: Remove scheduling models for falkor and saphira

These 2 qualcomm cores have been long gone in that Qualcomm has not
supported since at least 2019. Removing them will make it easier I think
to change the insn type attributes instead of keeping them up todate.

Note this does not remove the cores, just the schedule models.

Bootstrapped and tested on aarch64-linux-gnu.

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (falkor): Use cortex-a57 scheduler.
(saphira): Likewise.
* config/aarch64/aarch64.md: Don't include falkor.md and saphira.md.
* config/aarch64/falkor.md: Removed.
* config/aarch64/saphira.md: Removed.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

i386: Utilize VCOMSBF16 for BF16 Comparisons with AVX10.2

This patch enables the use of the VCOMSBF16 instruction from AVX10.2 for
efficient BF16 comparisons.

gcc/ChangeLog:

* config/i386/i386-expand.cc (ix86_expand_branch): Handle BFmode
when TARGET_AVX10_2_256 is enabled.
(ix86_prepare_fp_compare_args): Use SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P.
(ix86_expand_fp_movcc): Ditto.
(ix86_expand_fp_compare): Handle BFmode under IX86_FPCMP_COMI.
* config/i386/i386.cc (ix86_multiplication_cost): Use
SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P.
(ix86_division_cost): Ditto.
(ix86_rtx_costs): Ditto.
(ix86_vector_costs::add_stmt_cost): Ditto.
* config/i386/i386.h (SSE_FLOAT_MODE_SSEMATH_OR_HF_P): Rename to ...
(SSE_FLOAT_MODE_SSEMATH_OR_HFBF_P): ...this, and add BFmode.
* config/i386/i386.md (*cmpibf): New define_insn.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx10_2-comibf-1.c: New test.
* gcc.target/i386/avx10_2-comibf-2.c: Ditto.

Handle T_HRESULT types in CodeView records

Follow MSVC in having a special type value, T_HRESULT, for (signed)
longs that have been typedef'd with the name "HRESULT". This is so that
the debugger can display user-friendly constant names when debugging COM
code.

gcc/
* dwarf2codeview.cc (get_type_num_typedef): New function.
(get_type_num): Call get_type_num_typedef.
* dwarf2codeview.h (T_HRESULT): Define.

Write LF_POINTER CodeView types for pointers to member functions or data

Translate DW_TAG_ptr_to_member_type DIEs into special extended
LF_POINTER CodeView types.

gcc/
* dwarf2codeview.cc (struct codeview_custom_type): Add new fields to
lf_pointer struct in union.
(write_lf_pointer): Write containing_class and ptr_to_mem_type if
applicable.
(get_type_num_subroutine_type): Write correct containing_class_type if
this is a pointer to a member function.
(get_type_num_ptr_to_member_type): New function.
(get_type_num): Call get_type_num_ptr_to_member_type.
* dwarf2codeview.h (CV_PTR_MODE_MASK, CV_PTR_MODE_PMEM): Define.
(CV_PTR_MODE_PMFUNC, CV_PMTYPE_D_Single, CV_PMTYPE_F_Single): Likewise.

Write LF_BCLASS records in CodeView struct definitions

When writing the CodeView type definition for a struct, translate
DW_TAG_inheritance DIEs into LF_BCLASS records, to record which other
structs this one inherits from.

gcc/
* dwarf2codeview.cc (enum cv_leaf_type): Add LF_BCLASS.
(struct codeview_subtype): Add lf_bclass to union.
(write_cv_padding): Add declaration.
(write_lf_fieldlist): Handle LF_BCLASS records.
(add_struct_inheritance): New function.
(get_type_num_struct): Call add_struct_inheritance.

c++/modules: Merge default arguments [PR99274]

When merging a newly imported declaration with an existing declaration
we don't currently propagate new default arguments, which causes issues
when modularising header units. This patch adds logic to propagate
default arguments to existing declarations on import, and error if the
defaults do not match.

PR c++/99274

gcc/cp/ChangeLog:

* module.cc (trees_in::is_matching_decl): Merge default
arguments.
* tree.cc (cp_tree_equal) <AGGR_INIT_EXPR>: Handle unification
of AGGR_INIT_EXPRs with new VAR_DECL slots.

gcc/testsuite/ChangeLog:

* g++.dg/modules/lambda-7.h: Skip ODR-violating declaration when
testing ODR deduplication.
* g++.dg/modules/lambda-7_b.C: Note we're testing ODR
deduplication.
* g++.dg/modules/default-arg-1_a.H: New test.
* g++.dg/modules/default-arg-1_b.C: New test.
* g++.dg/modules/default-arg-2_a.H: New test.
* g++.dg/modules/default-arg-2_b.C: New test.
* g++.dg/modules/default-arg-3.h: New test.
* g++.dg/modules/default-arg-3_a.H: New test.
* g++.dg/modules/default-arg-3_b.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Patrick Palka <ppalka@redhat.com>
Reviewed-by: Jason Merrill <jason@redhat.com>