git.ipfire.org Git - thirdparty/gcc.git/log

libstdc++: Make __gnu_debug::vector usable in constant expressions [PR109536]

This makes constexpr std::vector (mostly) work in Debug Mode. All safe
iterator instrumentation and checking is disabled during constant
evaluation, because it requires mutex locks and calls to non-inline
functions defined in libstdc++.so. It should be OK to disable the safety
checks, because most UB should be detected during constant evaluation
anyway.

We could try to enable the full checking in constexpr, but it would mean
wrapping all the non-inline functions like _M_attach with an inline
_M_constexpr_attach that does the iterator housekeeping inline without
mutex locks when called for constant evaluation, and calls the
non-inline function at runtime. That could be done in future if we find
that we've lost safety or useful checking by disabling the safe
iterators.

There are a few test failures in C++20 mode, which I'm unable to
explain. The _Safe_iterator::operator++() member gives errors for using
non-constexpr functions during constant evaluation, even though those
functions are guarded by std::is_constant_evaluated() checks. The same
code works fine for C++23 and up.
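
As a rough illustration of the guard pattern described above (not the actual
libstdc++ code), the sketch below skips a runtime-only bookkeeping call during
constant evaluation. The names checked_cursor, runtime_attach and
g_attach_calls are hypothetical. __builtin_is_constant_evaluated() is the
GCC/Clang builtin that std::is_constant_evaluated() is built on; it is used
here so that the sketch does not need C++20 library headers.

```cpp
#include <cassert>

// Hypothetical stand-in for a non-inline, runtime-only hook (think _M_attach).
static int g_attach_calls = 0;
void runtime_attach() { ++g_attach_calls; }

struct checked_cursor {
  int pos = 0;

  constexpr checked_cursor& operator++() {
    // Do the bookkeeping only at run time; constant evaluation skips it.
    if (!__builtin_is_constant_evaluated())
      runtime_attach();
    ++pos;
    return *this;
  }
};

// Usable in a constant expression because the guarded call is never evaluated.
constexpr int advance_twice() {
  checked_cursor c;
  ++c;
  ++c;
  return c.pos;
}

static_assert(advance_twice() == 2, "evaluated at compile time");

// Runtime use: the bookkeeping hook runs once per increment.
int runtime_demo() {
  g_attach_calls = 0;
  checked_cursor c;
  ++c;
  assert(c.pos == 1);
  return g_attach_calls;
}
```

The same operator++ serves both contexts; only the side-effecting hook is
bypassed at compile time, which is the trade-off the patch makes for the safe
iterators.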

libstdc++-v3/ChangeLog:

PR libstdc++/109536
* include/bits/c++config (__glibcxx_constexpr_assert): Remove
macro.
* include/bits/stl_algobase.h (__niter_base, __copy_move_a)
(__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
(__lexicographical_compare_aux): Add constexpr to overloads for
debug mode iterators.
* include/debug/helper_functions.h (__unsafe): Add constexpr.
* include/debug/macros.h (_GLIBCXX_DEBUG_VERIFY_COND_AT): Remove
macro, folding it into ...
(_GLIBCXX_DEBUG_VERIFY_AT_F): ... here. Do not use
__glibcxx_constexpr_assert.
* include/debug/safe_base.h (_Safe_iterator_base): Add constexpr
to some member functions. Omit attaching, detaching and checking
operations during constant evaluation.
* include/debug/safe_container.h (_Safe_container): Likewise.
* include/debug/safe_iterator.h (_Safe_iterator): Likewise.
* include/debug/safe_iterator.tcc (__niter_base, __copy_move_a)
(__copy_move_backward_a, __fill_a, __fill_n_a, __equal_aux)
(__lexicographical_compare_aux): Add constexpr.
* include/debug/vector (_Safe_vector, vector): Add constexpr.
Omit safe iterator operations during constant evaluation.
* testsuite/23_containers/vector/bool/capacity/constexpr.cc:
Remove dg-xfail-if for debug mode.
* testsuite/23_containers/vector/bool/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/bool/cons/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/element_access/1.cc:
Likewise.
* testsuite/23_containers/vector/bool/element_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/assign/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/bool/modifiers/swap/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/capacity/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/cmp_c++20.cc: Likewise.
* testsuite/23_containers/vector/cons/constexpr.cc: Likewise.
* testsuite/23_containers/vector/data_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/element_access/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/assign/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/modifiers/swap/constexpr.cc:
Likewise.
* testsuite/23_containers/vector/cons/destructible_debug_neg.cc:
Adjust dg-error line number.

tree-optimization/113018 - ICE with BB reduction vectorization

When BB reduction vectorization picks up a chain with an ASM def
in it and that's inside the vectorized region we fail to get its
LHS. Instead of trying to get the correct def the following
avoids vectorizing such def and instead keeps it as def to add
in the epilog.

PR tree-optimization/113018
* tree-vect-slp.cc (vect_slp_check_for_roots): Only start
SLP discovery from stmts with a LHS.

c++: Implement P2582R1, CTAD from inherited constructors

This patch implements C++23 class template argument deduction from
inherited constructors, the mechanism for which relies on alias
CTAD which we already fully support.  The process for transforming
the return type of an inherited guide is specified in terms of a
partially specialized class template, but this patch implements it in
a simpler way, effectively performing ahead of time deduction instead
of instantiation time deduction.  I wasn't able to find an example for
which this implementation strategy makes a difference, but I didn't
look very hard.  Support seems good enough to advertise as complete
but there doesn't seem to be a feature-test macro update for this
feature yet.  There should be no functional change before C++23 mode.

There's a couple of FIXMEs, one in inherited_ctad_tweaks for recognizing
more forms of inherited constructors, and one in deduction_guides_for for
making the cache aware of base-class dependencies.

gcc/cp/ChangeLog:

* cp-tree.h (type_targs_deducible_from): Adjust return type.
* pt.cc (alias_ctad_tweaks): Also handle C++23 inherited CTAD.
(inherited_ctad_tweaks): Define.
(type_targs_deducible_from): Return the deduced arguments or
NULL_TREE instead of a bool.  Handle 'tmpl' being a TREE_LIST
representing a synthetic alias template.
(ctor_deduction_guides_for): Do inherited_ctad_tweaks for each
USING_DECL in C++23 mode.
(deduction_guides_for): Add FIXME for stale cache entries in
light of inherited CTAD.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/class-deduction67.C: Accept in C++23 mode.
* g++.dg/cpp23/class-deduction-inherited1.C: New test.
* g++.dg/cpp23/class-deduction-inherited2.C: New test.
* g++.dg/cpp23/class-deduction-inherited3.C: New test.
* g++.dg/cpp23/class-deduction-inherited4.C: New test.

tree-optimization/112793 - SLP of constant/external code-generated twice

The following makes the attempt at code-generating a constant/external
SLP node twice well-formed as that can happen when partitioning BB
vectorization attempts where we keep constants/externals unpartitioned.

PR tree-optimization/112793
* tree-vect-slp.cc (vect_schedule_slp_node): Already
code-generated constant/external nodes are OK.

* g++.dg/vect/pr112793.cc: New testcase.

analyzer: cleanups [PR112655]

Avoid copying eedges in infinite_loop::infinite_loop.

Use initializer lists in the various places reported in
PR analyzer/112655 (apart from coord_test's ctor, which
would require nontrivial refactoring).
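
A minimal sketch of the two idioms, using hypothetical types (edge and
loop_info are illustrative stand-ins, not the analyzer's classes): the
constructor takes the vector by rvalue reference and moves it into the member
through a member initializer list, so no copy of the elements is made.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct edge { int src; int dst; };  // hypothetical stand-in for an exploded edge

class loop_info {
public:
  // Take ownership of the caller's vector instead of copying it.
  explicit loop_info(std::vector<edge> &&edges)
    : m_edges(std::move(edges)) {}

  std::size_t num_edges() const { return m_edges.size(); }

private:
  std::vector<edge> m_edges;
};

// The caller moves its local vector when constructing the instance.
loop_info make_loop_info() {
  std::vector<edge> edges;
  edges.push_back({0, 1});
  edges.push_back({1, 0});
  return loop_info(std::move(edges));
}
```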

gcc/analyzer/ChangeLog:
PR analyzer/112655
* infinite-loop.cc (infinite_loop::infinite_loop): Pass eedges
via rvalue reference rather than by value.
(starts_infinite_loop_p): Move eedges when constructing an
infinite_loop instance.
* sm-file.cc (fileptr_state_machine::fileptr_state_machine): Use
initializer list for states.
* sm-sensitive.cc
(sensitive_state_machine::sensitive_state_machine): Likewise.
* sm-signal.cc (signal_state_machine::signal_state_machine):
Likewise.
* sm-taint.cc (taint_state_machine::taint_state_machine):
Likewise.
* varargs.cc (va_list_state_machine::va_list_state_machine): Likewise.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

aarch64: Improve handling of accumulators in early-ra

Being very simplistic, early-ra just models an allocno's live range
as a single interval.  This doesn't work well for single-register
accumulators that are updated multiple times in a loop, since in
SSA form, each intermediate result will be a separate SSA name and
will remain separate from the accumulator even after out-of-ssa.
This means that in something like:

  for (;;)
    {
      x = x + ...;
      x = x + ...;
    }

the first definition of x and the second use will be a separate pseudo
from the "main" loop-carried pseudo.
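
For concreteness, a source-level loop of that shape might look like the
following. This is only an illustrative sketch with a hypothetical function
name; it is not the new accumulators_1.c test, and whether early-ra sees
separate pseudos for it depends on the target and options.

```cpp
#include <cassert>

// One scalar accumulator, updated twice per iteration.
float sum_pairs(const float *a, const float *b, int n) {
  float x = 0.0f;
  for (int i = 0; i < n; ++i) {
    x = x + a[i];  // first update: an intermediate result
    x = x + b[i];  // second update: becomes the loop-carried value
  }
  return x;
}
```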

A real RA would fix this by keeping general, segmented live ranges.
But that feels like a slippery slope in this context.

This patch instead looks for sharability at a more local level,
as described in the comments.  It's a bit hackish, but hopefully
not too much.

The patch also contains some small tweaks that are needed to make
the new and existing tests pass:

- fix a case where a pseudo that was only moved was wrongly treated
  as not an FPR candidate

- fix some bookkeeping related to is_strong_copy_src

- use the number of FPR preferences as a tiebreaker when sorting colors

I fully expect that we'll need to be more aggressive at skipping the
early-ra allocation.  For example, it probably makes sense to refuse any
allocation that involves an FPR move.  But I'd like to keep collecting
examples of where things go wrong first, so that hopefully we can improve
the cases with strided registers or structures.

gcc/
* config/aarch64/aarch64-early-ra.cc (allocno_info::is_equiv): New
member variable.
(allocno_info::equiv_allocno): Replace with...
(allocno_info::related_allocno): ...this member variable.
(allocno_info::chain_prev): Put into an enum with...
(allocno_info::last_use_point): ...this new member variable.
(color_info::num_fpr_preferences): New member variable.
(early_ra::m_shared_allocnos): Likewise.
(allocno_info::is_shared): New member function.
(allocno_info::is_equiv_to): Likewise.
(early_ra::dump_allocnos): Dump sharing information.  Tweak column
widths.
(early_ra::fpr_preference): Check ALLOWS_NONFPR before returning -2.
(early_ra::start_new_region): Handle m_shared_allocnos.
(early_ra::create_allocno_group): Set related_allocno rather than
equiv_allocno.
(early_ra::record_allocno_use): Likewise.  Detect multiple calls
for the same program point.  Update last_use_point and is_equiv.
Clear is_strong_copy_src rather than is_strong_copy_dest.
(early_ra::record_allocno_def): Use related_allocno rather than
equiv_allocno.  Update last_use_point.
(early_ra::valid_equivalence_p): Replace with...
(early_ra::find_related_start): ...this new function.
(early_ra::record_copy): Look for cases where a destination copy chain
can be shared with the source allocno.
(early_ra::find_strided_accesses): Update for equiv_allocno->
related_allocno change.  Only call consider_strong_copy_src_chain
at the head of a copy chain.
(early_ra::is_chain_candidate): Skip shared allocnos.  Update for
new representation of equivalent allocnos.
(early_ra::chain_allocnos): Update for new representation of
equivalent allocnos.
(early_ra::try_to_chain_allocnos): Likewise.
(early_ra::merge_fpr_info): New function, split out from...
(early_ra::set_single_color_rep): ...here.
(early_ra::form_chains): Handle shared allocnos.
(early_ra::process_copies): Count the number of FPR preferences.
(early_ra::cmp_decreasing_size): Rename to...
(early_ra::cmp_allocation_order): ...this.  Sort equal-sized groups
by the number of FPR preferences.
(early_ra::finalize_allocation): Handle shared allocnos.
(early_ra::process_region): Reset chain_prev as well as chain_next.

gcc/testsuite/
* gcc.target/aarch64/sve/accumulators_1.c: New test.
* gcc.target/aarch64/sve/acle/asm/create2_1.c: Allow the moves to
be in any order.
* gcc.target/aarch64/sve/acle/asm/create3_1.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/create4_1.c: Likewise.

strub: indirect volatile parms in wrappers

Arrange for strub internal wrappers to pass volatile arguments by
reference to the wrapped bodies.

for gcc/ChangeLog

PR middle-end/112938
* ipa-strub.cc (pass_ipa_strub::execute): Pass volatile args
by reference to internal strub wrapped bodies.

for gcc/testsuite/ChangeLog

PR middle-end/112938
* gcc.dg/strub-internal-volatile.c: Check indirection of
volatile args.

strub: handle volatile promoted args in internal strub [PR112938]

When generating code for an internal strub wrapper, don't clear the
DECL_NOT_GIMPLE_REG_P flag of volatile args, and gimplify them both
before and after any conversion.

While at that, move variable TMP into narrower scopes so that it's
more trivial to track where ARG lives.

for  gcc/ChangeLog

PR middle-end/112938
* ipa-strub.cc (pass_ipa_strub::execute): Handle promoted
volatile args in internal strub.  Simplify.

for  gcc/testsuite/ChangeLog

PR middle-end/112938
* gcc.dg/strub-internal-volatile.c: New.

[committed] Fix m68k testcase for c99

More fallout from the c99 conversion. The m68k specific test pr63347.c calls
exit and abort without a prototype in scope. This patch turns them into
__builtin calls avoiding the error.

Bootstrapped and regression tested on m68k-linux-gnu, pushed to the trunk.

gcc/testsuite
* gcc.target/m68k/pr63347.c: Call __builtin_abort and __builtin_exit
instead of abort and exit.

In 'gcc/gimple-ssa-sccopy.cc', '#define INCLUDE_ALGORITHM' instead of '#include <algorithm>'

... to avoid issues such as:

    In file included from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/xmmintrin.h:34:0,
                     from [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/x86intrin.h:31,
                     from [...]/i686-pc-linux-gnu/include/c++/5.2.0/i686-pc-linux-gnu/64/bits/opt_random.h:33,
                     from [...]/i686-pc-linux-gnu/include/c++/5.2.0/random:50,
                     from [...]/i686-pc-linux-gnu/include/c++/5.2.0/bits/stl_algo.h:66,
                     from [...]/i686-pc-linux-gnu/include/c++/5.2.0/algorithm:62,
                     from [...]/source-gcc/gcc/gimple-ssa-sccopy.cc:32:
    [...]/lib/gcc/i686-pc-linux-gnu/5.2.0/include/mm_malloc.h:42:12: error: attempt to use poisoned "malloc"
         return malloc (size);
                ^
    make[2]: *** [Makefile:1197: gimple-ssa-sccopy.o] Error 1

Minor fix-up for commit cd794c3961017703a4d2ca0e854ea23b3d4b6373
"A new copy propagation and PHI elimination pass".

gcc/
* gimple-ssa-sccopy.cc: '#define INCLUDE_ALGORITHM' instead of
'#include <algorithm>'.

build: Add libgrust as compilation modules

Define the libgrust directory as a host compilation module as well as
for targets. Disable target libgrust if we're not building target
libstdc++.

ChangeLog:

* Makefile.def: Add libgrust as host & target module.
* configure.ac: Add libgrust to host tools list. Add libgrust to
noconfigdirs if we're not building target libstdc++.
* Makefile.in: Regenerate.
* configure: Regenerate.

gcc/rust/ChangeLog:

* config-lang.in: Add libgrust as a target module for the rust
language.

Co-authored-by: Thomas Schwinge <thomas@codesourcery.com>
Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>

libgrust: Add libproc_macro and build system

Add some dummy files in libproc_macro along with its build system.

libgrust/ChangeLog:

* Makefile.am: New file.
* Makefile.in: Generate.
* configure.ac: New file.
* configure: Generate.
* aclocal.m4: Generate.
* libproc_macro/Makefile.am: New file.
* libproc_macro/proc_macro.cc: New file.
* libproc_macro/proc_macro.h: New file.
* libproc_macro/Makefile.in: Generate.

contrib/ChangeLog:

* gcc_update: Add libgrust file dependencies.

Co-authored-by: Arthur Cohen <arthur.cohen@embecosm.com>
Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>

Revert "RISC-V: Add avail interface into function_group_info"

This reverts commit ce7e66787b5b4ad385b21756da5a89171d233ddc.
Will refactor this part in the same way as aarch64 sve.

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-functions.def (DEF_RVV_FUNCTION):
Revert changes.
(read_vl): Ditto.
(vlenb): Ditto.
(vsetvl): Ditto.
(vsetvlmax): Ditto.
(vle): Ditto.
(vse): Ditto.
(vlm): Ditto.
(vsm): Ditto.
(vlse): Ditto.
(vsse): Ditto.
(vluxei8): Ditto.
(vluxei16): Ditto.
(vluxei32): Ditto.
(vluxei64): Ditto.
(vloxei8): Ditto.
(vloxei16): Ditto.
(vloxei32): Ditto.
(vloxei64): Ditto.
(vsuxei8): Ditto.
(vsuxei16): Ditto.
(vsuxei32): Ditto.
(vsuxei64): Ditto.
(vsoxei8): Ditto.
(vsoxei16): Ditto.
(vsoxei32): Ditto.
(vsoxei64): Ditto.
(vleff): Ditto.
(vadd): Ditto.
(vsub): Ditto.
(vrsub): Ditto.
(vneg): Ditto.
(vwaddu): Ditto.
(vwsubu): Ditto.
(vwadd): Ditto.
(vwsub): Ditto.
(vwcvt_x): Ditto.
(vwcvtu_x): Ditto.
(vzext): Ditto.
(vsext): Ditto.
(vadc): Ditto.
(vmadc): Ditto.
(vsbc): Ditto.
(vmsbc): Ditto.
(vand): Ditto.
(vor): Ditto.
(vxor): Ditto.
(vnot): Ditto.
(vsll): Ditto.
(vsra): Ditto.
(vsrl): Ditto.
(vnsrl): Ditto.
(vnsra): Ditto.
(vncvt_x): Ditto.
(vmseq): Ditto.
(vmsne): Ditto.
(vmsltu): Ditto.
(vmslt): Ditto.
(vmsleu): Ditto.
(vmsle): Ditto.
(vmsgtu): Ditto.
(vmsgt): Ditto.
(vmsgeu): Ditto.
(vmsge): Ditto.
(vminu): Ditto.
(vmin): Ditto.
(vmaxu): Ditto.
(vmax): Ditto.
(vmul): Ditto.
(vmulh): Ditto.
(vmulhu): Ditto.
(vmulhsu): Ditto.
(vdivu): Ditto.
(vdiv): Ditto.
(vremu): Ditto.
(vrem): Ditto.
(vwmul): Ditto.
(vwmulu): Ditto.
(vwmulsu): Ditto.
(vmacc): Ditto.
(vnmsac): Ditto.
(vmadd): Ditto.
(vnmsub): Ditto.
(vwmaccu): Ditto.
(vwmacc): Ditto.
(vwmaccsu): Ditto.
(vwmaccus): Ditto.
(vmerge): Ditto.
(vmv_v): Ditto.
(vsaddu): Ditto.
(vsadd): Ditto.
(vssubu): Ditto.
(vssub): Ditto.
(vaaddu): Ditto.
(vaadd): Ditto.
(vasubu): Ditto.
(vasub): Ditto.
(vsmul): Ditto.
(vssrl): Ditto.
(vssra): Ditto.
(vnclipu): Ditto.
(vnclip): Ditto.
(vfadd): Ditto.
(vfsub): Ditto.
(vfrsub): Ditto.
(vfadd_frm): Ditto.
(vfsub_frm): Ditto.
(vfrsub_frm): Ditto.
(vfwadd): Ditto.
(vfwsub): Ditto.
(vfwadd_frm): Ditto.
(vfwsub_frm): Ditto.
(vfmul): Ditto.
(vfdiv): Ditto.
(vfrdiv): Ditto.
(vfmul_frm): Ditto.
(vfdiv_frm): Ditto.
(vfrdiv_frm): Ditto.
(vfwmul): Ditto.
(vfwmul_frm): Ditto.
(vfmacc): Ditto.
(vfnmsac): Ditto.
(vfmadd): Ditto.
(vfnmsub): Ditto.
(vfnmacc): Ditto.
(vfmsac): Ditto.
(vfnmadd): Ditto.
(vfmsub): Ditto.
(vfmacc_frm): Ditto.
(vfnmacc_frm): Ditto.
(vfmsac_frm): Ditto.
(vfnmsac_frm): Ditto.
(vfmadd_frm): Ditto.
(vfnmadd_frm): Ditto.
(vfmsub_frm): Ditto.
(vfnmsub_frm): Ditto.
(vfwmacc): Ditto.
(vfwnmacc): Ditto.
(vfwmsac): Ditto.
(vfwnmsac): Ditto.
(vfwmacc_frm): Ditto.
(vfwnmacc_frm): Ditto.
(vfwmsac_frm): Ditto.
(vfwnmsac_frm): Ditto.
(vfsqrt): Ditto.
(vfsqrt_frm): Ditto.
(vfrsqrt7): Ditto.
(vfrec7): Ditto.
(vfrec7_frm): Ditto.
(vfmin): Ditto.
(vfmax): Ditto.
(vfsgnj): Ditto.
(vfsgnjn): Ditto.
(vfsgnjx): Ditto.
(vfneg): Ditto.
(vfabs): Ditto.
(vmfeq): Ditto.
(vmfne): Ditto.
(vmflt): Ditto.
(vmfle): Ditto.
(vmfgt): Ditto.
(vmfge): Ditto.
(vfclass): Ditto.
(vfmerge): Ditto.
(vfmv_v): Ditto.
(vfcvt_x): Ditto.
(vfcvt_xu): Ditto.
(vfcvt_rtz_x): Ditto.
(vfcvt_rtz_xu): Ditto.
(vfcvt_f): Ditto.
(vfcvt_x_frm): Ditto.
(vfcvt_xu_frm): Ditto.
(vfcvt_f_frm): Ditto.
(vfwcvt_x): Ditto.
(vfwcvt_xu): Ditto.
(vfwcvt_rtz_x): Ditto.
(vfwcvt_rtz_xu): Ditto.
(vfwcvt_f): Ditto.
(vfwcvt_x_frm): Ditto.
(vfwcvt_xu_frm): Ditto.
(vfncvt_x): Ditto.
(vfncvt_xu): Ditto.
(vfncvt_rtz_x): Ditto.
(vfncvt_rtz_xu): Ditto.
(vfncvt_f): Ditto.
(vfncvt_rod_f): Ditto.
(vfncvt_x_frm): Ditto.
(vfncvt_xu_frm): Ditto.
(vfncvt_f_frm): Ditto.
(vredsum): Ditto.
(vredmaxu): Ditto.
(vredmax): Ditto.
(vredminu): Ditto.
(vredmin): Ditto.
(vredand): Ditto.
(vredor): Ditto.
(vredxor): Ditto.
(vwredsum): Ditto.
(vwredsumu): Ditto.
(vfredusum): Ditto.
(vfredosum): Ditto.
(vfredmax): Ditto.
(vfredmin): Ditto.
(vfredusum_frm): Ditto.
(vfredosum_frm): Ditto.
(vfwredosum): Ditto.
(vfwredusum): Ditto.
(vfwredosum_frm): Ditto.
(vfwredusum_frm): Ditto.
(vmand): Ditto.
(vmnand): Ditto.
(vmandn): Ditto.
(vmxor): Ditto.
(vmor): Ditto.
(vmnor): Ditto.
(vmorn): Ditto.
(vmxnor): Ditto.
(vmmv): Ditto.
(vmclr): Ditto.
(vmset): Ditto.
(vmnot): Ditto.
(vcpop): Ditto.
(vfirst): Ditto.
(vmsbf): Ditto.
(vmsif): Ditto.
(vmsof): Ditto.
(viota): Ditto.
(vid): Ditto.
(vmv_x): Ditto.
(vmv_s): Ditto.
(vfmv_f): Ditto.
(vfmv_s): Ditto.
(vslideup): Ditto.
(vslidedown): Ditto.
(vslide1up): Ditto.
(vslide1down): Ditto.
(vfslide1up): Ditto.
(vfslide1down): Ditto.
(vrgather): Ditto.
(vrgatherei16): Ditto.
(vcompress): Ditto.
(vundefined): Ditto.
(vreinterpret): Ditto.
(vlmul_ext): Ditto.
(vlmul_trunc): Ditto.
(vset): Ditto.
(vget): Ditto.
(vcreate): Ditto.
(vlseg): Ditto.
(vsseg): Ditto.
(vlsseg): Ditto.
(vssseg): Ditto.
(vluxseg): Ditto.
(vloxseg): Ditto.
(vsuxseg): Ditto.
(vsoxseg): Ditto.
(vlsegff): Ditto.
* config/riscv/riscv-vector-builtins.cc (DEF_RVV_FUNCTION): Ditto.
* config/riscv/riscv-vector-builtins.h (struct function_group_info): Ditto.
* config/riscv/t-riscv: Ditto.
* config/riscv/riscv-vector-builtins-avail.h: Removed.

libgrust: Add entry for maintainers

ChangeLog:

* MAINTAINERS: Add maintainers for libgrust.

contrib/ChangeLog:

* gcc-changelog/git_commit.py: Add libgrust.

Co-authored-by: Arthur Cohen <arthur.cohen@embecosm.com>
Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>

libgrust: Add ChangeLog file

libgrust/ChangeLog:

* ChangeLog: New file.

Signed-off-by: Pierre-Emmanuel Patry <pierre-emmanuel.patry@embecosm.com>

match.pd: Simplify (t * u) / (t * v) [PR112994]

On top of the previously posted patch, this simplifies, say, (x * 16) / (x * 4)
into 4.  Unlike the previous pattern, this is something we didn't fold
previously on GENERIC, so I think it shouldn't be all wrapped with #if
GIMPLE.  The question whether there should be fold_overflow_warning for the
TYPE_OVERFLOW_UNDEFINED case remains.
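
A small sketch of the expression in question, with a hypothetical function
name.  It assumes x is nonzero and that neither product overflows, which is
what the simplification relies on for signed types.

```cpp
#include <cassert>

// Written out literally; for nonzero x with no overflow in either product,
// the result equals 16 / 4.
int ratio_of_products(int x) {
  return (x * 16) / (x * 4);
}
```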

2023-12-14  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/112994
* match.pd ((t * u) / (t * v) -> (u / v)): New simplification.

* gcc.dg/tree-ssa/pr112994-2.c: New test.

match.pd: Simplify (t * u) / v -> t * (u / v) [PR112994]

The following testcase is optimized just on GENERIC (using
      strict_overflow_p = false;
      if (TREE_CODE (arg1) == INTEGER_CST
          && (tem = extract_muldiv (op0, arg1, code, NULL_TREE,
                                    &strict_overflow_p)) != 0)
        {
          if (strict_overflow_p)
            fold_overflow_warning (("assuming signed overflow does not occur "
                                    "when simplifying division"),
                                   WARN_STRICT_OVERFLOW_MISC);
          return fold_convert_loc (loc, type, tem);
        }
) but not on GIMPLE.

An earlier version of the patch regressed the test
+FAIL: gcc.dg/Wstrict-overflow-3.c correct warning (test for warnings, line 12)
because we are indeed assuming that signed overflow does not occur
when simplifying division in there.

This version of the patch (which provides the simplification only
for GIMPLE) fixes that.
And/or we could add the
            fold_overflow_warning (("assuming signed overflow does not occur "
                                    "when simplifying division"),
                                   WARN_STRICT_OVERFLOW_MISC);
call into the simplification, but in that case IMHO it should go into
the (t * u) / u -> t simplification as well, there we assume the exact
same thing (of course, in both cases only in the spots where we don't
verify it through ranger that it never overflows).

Guarding the whole simplification to GIMPLE only IMHO makes sense because
the above mentioned folding does it for GENERIC (and extract_muldiv even
handles far more cases, dunno how many from that we should be doing on
GIMPLE in match.pd and what could be done elsewhere; e.g. extract_muldiv
can handle (x * 16 + y * 32) / 8 -> x * 2 + y * 4 etc.).
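
To make the two rewrites concrete, here is a sketch with hypothetical function
names that spells out each original expression next to its folded form.  The
pairs agree only when the multiplications do not overflow, which is the
assumption discussed above.

```cpp
#include <cassert>

// (t * u) / v  versus  t * (u / v), with u = 16 and v = 4.
int scaled_div(int x)        { return (x * 16) / 4; }
int scaled_div_folded(int x) { return x * (16 / 4); }

// The extract_muldiv example: (x * 16 + y * 32) / 8 versus x * 2 + y * 4.
int mixed(int x, int y)        { return (x * 16 + y * 32) / 8; }
int mixed_folded(int x, int y) { return x * 2 + y * 4; }
```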

Dunno about the fold_overflow_warning, I always have doubts about why
such a warning is useful to users.

2023-12-14  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/112994
* match.pd ((t * 2) / 2 -> t): Adjust comment to use u instead of 2.
Punt without range checks if TYPE_OVERFLOW_SANITIZED.
((t * u) / v -> t * (u / v)): New simplification.

* gcc.dg/tree-ssa/pr112994-1.c: New test.

A new copy propagation and PHI elimination pass

This patch adds the strongly-connected copy propagation (SCCOPY) pass.
It is a lightweight GIMPLE copy propagation pass that also removes some
redundant PHI statements. It handles degenerate PHIs, e.g.:

_5 = PHI <_1>;
_6 = PHI <_6, _6, _1, _1>;
_7 = PHI <16, _7>;
// Replaces occurrences of _5 and _6 by _1 and _7 by 16

It also handles more complicated situations, e.g.:

_8 = PHI <_9, _10>;
_9 = PHI <_8, _10>;
_10 = PHI <_8, _9, _1>;
// Replaces occurrences of _8, _9 and _10 by _1
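
The core check behind both examples can be sketched as follows.  This is a
simplified model, not the code in gimple-ssa-sccopy.cc: it assumes the group
of mutually dependent PHIs is already known (the sketch does not discover
strongly connected components itself), and phi_map and common_outside_value
are hypothetical names.  A group can be replaced by a single value when all
arguments coming from outside the group are that same value.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// PHI result name -> list of PHI arguments (names or constants, as text).
using phi_map = std::map<std::string, std::vector<std::string>>;

// Return the single value that every PHI in `group` can be replaced by,
// or an empty string if the group has zero or several outside arguments.
std::string common_outside_value(const phi_map &phis,
                                 const std::set<std::string> &group) {
  std::set<std::string> outside;
  for (const std::string &name : group) {
    auto it = phis.find(name);
    if (it == phis.end())
      return std::string();  // not a PHI we know about
    for (const std::string &arg : it->second)
      if (!group.count(arg))
        outside.insert(arg);  // argument defined outside the group
  }
  return outside.size() == 1 ? *outside.begin() : std::string();
}
```

With the PHIs from the second example and the group {_8, _9, _10}, the only
outside argument is _1, so the function returns "_1".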

gcc/ChangeLog:

* Makefile.in: Added sccopy pass.
* passes.def: Added sccopy pass before LTO streaming and before
RTL expansion.
* tree-pass.h (make_pass_sccopy): Added sccopy pass.
* gimple-ssa-sccopy.cc: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/sccopy-1.c: New test.

Signed-off-by: Filip Kastl <fkastl@suse.cz>

SRA: Relax requirements to use build_reconstructed_reference (PR 111807)

This patch half-reverts 3aaf704bca3e and replaces it with a fix with
relaxed requirements for invoking build_reconstructed_reference in
build_ref_for_model.

build_ref_for_model/build_ref_for_offset is used in two slightly
different contexts. The first is when we are looking at an assignment
like

   p->field_A.field_B = s.field_B;

and we have replacements for e.g. s.field_B.field_C.field_D and we
want to store them directly to p->field_A.field_B.field_C.field_D (as
opposed to going through s or using a MEM_REF based in
p->field_A.field_B).  In this case, the offset of the
"model" (s.field_B.field_C.field_D) within this can be different than
offset within the LHS that we want to reach (field_C.field_D within
the "base" p->field_A.field_B).  Patch 3aaf704bca3e has caused us to
unnecessarily create MEM_REFs for these situations.  These uses of
build_ref_for_model work with the relaxed condition just fine.

The second, problematic, context is when somewhere in the function we
have an assignment

  s.field_A = t.field_A.field_B;

and we are creating an access structure to represent s.field_A.field_B
even if it is not actually accessed in the original input.  This is
done after scanning the entire function body and we need to construct
a "universal" reference to s.field_A.field_B.  In this case the "base"
is "s" and it has to be the DECL itself and not some reference for it
because for arbitrary references we need a GSI pointing to a statement,
which we don't have; the reference is supposed to be universal.

But then using build_ref_for_model and within it
build_reconstructed_reference misbehaves if the expression contains
any ARRAY_REFs.  In the first case those are fine because as we
eventually reach the aggregate type that matches a real LHS or RHS, we
know we can just bolt the rest of the references onto it and end up
with the correct overall reference.  However when dealing with

   s.array[1].field_A = s.array[2].field_B;

we cannot just bolt array[2] reference when we want array[1] but that
is exactly what happens when we use build_reconstructed_reference and
keep it walking all the way to s.

I was considering making all users of the second kind use
build_ref_for_offset directly instead of build_ref_for_model but the latter
also handles COMPONENT_REFs to bit-fields which the former does not.
Therefore I have decided to use the NULL-ness of GSI as an indicator
of how strict we need to be.  I have changed the function comment to
reflect that.

I have been able to observe disambiguation improvements with this patch
over current master: we successfully manage a few more
aliasing_component_refs_p disambiguations when compiling cc1, going
from:

  Alias oracle query stats:
    refs_may_alias_p: 94354287 disambiguations, 106279231 queries
    ref_maybe_used_by_call_p: 1572511 disambiguations, 95618222 queries
    call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
    stmt_kills_ref_p: 142342 kills, 8407309 queries
    nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
    nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries
    aliasing_component_refs_p: 67090 disambiguations, 3081766 queries
    TBAA oracle: 22675296 disambiguations 61781978 queries
                 14045969 are in alias set 0
                 10997085 queries asked about the same object
                 153 queries asked about the same alias set
                 0 access volatile
                 12485774 are dependent in the DAG
                 1577701 are aritificially in conflict with void *

  Modref stats:
    modref kill: 832 kills, 19399 queries
    modref use: 50760 disambiguations, 1825109 queries
    modref clobber: 1371014 disambiguations, 40152535 queries
    5190238 tbaa queries (0.129263 per modref query)
    1341663 base compares (0.033414 per modref query)

  PTA query stats:
    pt_solution_includes: 36784427 disambiguations, 46141175 queries
    pt_solutions_intersect: 4519387 disambiguations, 17081996 queries

to:

  Alias oracle query stats:
    refs_may_alias_p: 94354083 disambiguations, 106278948 queries
    ref_maybe_used_by_call_p: 1572511 disambiguations, 95618018 queries
    call_may_clobber_ref_p: 649273 disambiguations, 659371 queries
    stmt_kills_ref_p: 142342 kills, 8407310 queries
    nonoverlapping_component_refs_p: 19 disambiguations, 10227 queries
    nonoverlapping_refs_since_match_p: 15665 disambiguations, 52585 must overlaps, 68893 queries
    aliasing_component_refs_p: 67104 disambiguations, 3081781 queries
    TBAA oracle: 22676608 disambiguations 61782455 queries
                 14044948 are in alias set 0
                 10998619 queries asked about the same object
                 153 queries asked about the same alias set
                 0 access volatile
                 12484882 are dependent in the DAG
                 1577245 are aritificially in conflict with void *

  Modref stats:
    modref kill: 832 kills, 19399 queries
    modref use: 50760 disambiguations, 1825106 queries
    modref clobber: 1371028 disambiguations, 40152504 queries
    5190319 tbaa queries (0.129265 per modref query)
    1341403 base compares (0.033408 per modref query)

  PTA query stats:
    pt_solution_includes: 36784449 disambiguations, 46141210 queries
    pt_solutions_intersect: 4519320 disambiguations, 17082083 queries

gcc/ChangeLog:

2023-12-13  Martin Jambor  <mjambor@suse.cz>

PR tree-optimization/111807
* tree-sra.cc (build_ref_for_model): Allow offset smaller than
model->offset when gsi is non-NULL.  Adjust function comment.

Force broadcast constant to mem for vec_dup{v4di,v8si,v4df,v8df} when TARGET_AVX2 is not available.

vpbroadcastd/vpbroadcastq are available only under TARGET_AVX2, but the
vec_dup{v4di,v8si} patterns are available under AVX with a memory operand.
Putting the constant in a register therefore causes LRA/Reload to
generate a spill and reload.

gcc/ChangeLog:

PR target/112992
* config/i386/i386-expand.cc
(ix86_convert_const_wide_int_to_broadcast): Don't convert to
broadcast for vec_dup{v4di,v8si} when TARGET_AVX2 is not
available.
(ix86_broadcast_from_constant): Allow broadcast for V4DI/V8SI
when !TARGET_AVX2 since it will be forced to memory later.
(ix86_expand_vector_move): Force constant to mem for
vec_dup{v8si,v4di} when TARGET_AVX2 is not available.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr100865-7a.c: Adjust testcase.
* gcc.target/i386/pr100865-7c.c: Ditto.
* gcc.target/i386/pr112992.c: New test.

RISC-V: Add failed SLP testcase

After the recent RVV cost model tweak, I found that this PR has been fixed.

Add the testcase.  Committed.

PR target/112387

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr112387.c: New test.

tree-optimization/110640 - testcase for fixed bug

PR tree-optimization/110640
* gcc.dg/torture/pr110640.c: New testcase.

testsuite: Fix up target-enter-data-1.c on 32-bit targets

struct bar { int num_vectors; double *vectors; };

is 16 bytes only on 64-bit targets; on 32-bit ones it is just 8 bytes,
so the explicit matching of the * 16 multiplication only works on the
former.
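
A quick way to see the size dependence, as a sketch: under the common layout
where pointers are aligned to their own size and int is no larger than a
pointer, the struct occupies two pointer-sized slots (16 bytes with 8-byte
pointers, 8 bytes with 4-byte pointers).  expected_bar_size is a hypothetical
helper, and the layout assumption does not hold for every ABI.

```cpp
#include <cassert>
#include <cstddef>

struct bar { int num_vectors; double *vectors; };

// Two pointer-sized slots: the int (padded up to pointer alignment where
// needed) followed by the pointer itself.
constexpr std::size_t expected_bar_size() {
  return 2 * sizeof(double *);
}
```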

2023-12-14 Jakub Jelinek <jakub@redhat.com>

* c-c++-common/gomp/target-enter-data-1.c: Match also sizeof bar on
32-bit targets - 8 bytes - rather than just 16 bytes.

testsuite: Fix up pr112904.C test [PR112904]

On Fri, Dec 08, 2023 at 03:12:00PM +0800, liuhongt wrote:
> * g++.target/i386/pr112904.C: New test.

The new test FAILs on i686-linux and even on x86_64-linux I think
it doesn't actually test what was reported, unless one performs testing
with -march= for some XOP enabled CPU or -mxop.

The following patch fixes that, tested on x86_64-linux with
make check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m32/-mno-sse/-mno-mmx,-m64\} i386.exp=pr112904.C'

2023-12-14 Jakub Jelinek <jakub@redhat.com>

PR target/112904
* g++.target/i386/pr112904.C: Add dg-do compile, dg-options -mxop
and for ia32 also dg-additional-options -mmmx.

c++: Fix tinst_level::to_list [PR112968]

With valgrind checking, there are various errors reported on some C++26
libstdc++ tests, like:
==2009913== Conditional jump or move depends on uninitialised value(s)
==2009913==    at 0x914C59: gt_ggc_mx_lang_tree_node(void*) (gt-cp-tree.h:107)
==2009913==    by 0x8AB7A5: gt_ggc_mx_tinst_level(void*) (gt-cp-pt.h:32)
==2009913==    by 0xB89B25: ggc_mark_root_tab(ggc_root_tab const*) (ggc-common.cc:75)
==2009913==    by 0xB89DF4: ggc_mark_roots() (ggc-common.cc:104)
==2009913==    by 0x9D6311: ggc_collect(ggc_collect) (ggc-page.cc:2227)
==2009913==    by 0xDB70F6: execute_one_pass(opt_pass*) (passes.cc:2738)
==2009913==    by 0xDB721F: execute_pass_list_1(opt_pass*) (passes.cc:2755)
==2009913==    by 0xDB7258: execute_pass_list(function*, opt_pass*) (passes.cc:2766)
==2009913==    by 0xA55525: cgraph_node::analyze() (cgraphunit.cc:695)
==2009913==    by 0xA57CC7: analyze_functions(bool) (cgraphunit.cc:1248)
==2009913==    by 0xA5890D: symbol_table::finalize_compilation_unit() (cgraphunit.cc:2555)
==2009913==    by 0xEB02A1: compile_file() (toplev.cc:473)

I think the problem is in the tinst_level::to_list optimization from 2018.
That function returns a TREE_LIST with TREE_PURPOSE/TREE_VALUE filled in.
Either it freshly allocates using build_tree_list (NULL, NULL); + stores
TREE_PURPOSE/TREE_VALUE, that case is fine (the whole tree_list object
is zeros, except for TREE_CODE set to TREE_LIST and TREE_PURPOSE/TREE_VALUE
modified later; the above also means in particular TREE_TYPE of it is NULL
and TREE_CHAIN is NULL and both are accessible/initialized even in valgrind
annotations).
Or it grabs a TREE_LIST node from a freelist.
If defined(ENABLE_GC_CHECKING), the object is still all zeros except
for TREE_CODE/TREE_PURPOSE/TREE_VALUE like in the fresh allocation case
(but unlike the build_tree_list case in the valgrind annotations
TREE_TYPE and TREE_CHAIN are marked as uninitialized).
If !defined(ENABLE_GC_CHECKING), I believe the actual memory content
is that everything but TREE_CODE/TREE_PURPOSE/TREE_VALUE/TREE_CHAIN is
zeros and TREE_CHAIN is something random (whatever next entry is in the
freelist, nothing overwrote it) and from valgrind POV again,
TREE_TYPE and TREE_CHAIN are marked as uninitialized.

When using the other freelist instantiations (pending_template and
tinst_level) I believe everything is correct (from valgrind POV it marks
the whole pending_template or tinst_level as uninitialized, but the
caller initializes it all).

One way to fix this would be let tinst_level::to_list not store just
  TREE_PURPOSE (ret) = tldcl;
  TREE_VALUE (ret) = targs;
but also
  TREE_TYPE (ret) = NULL_TREE;
  TREE_CHAIN (ret) = NULL_TREE;
Though, that seems like wasted effort in the build_tree_list case to me.

So, the following patch instead does that TREE_CHAIN = NULL_TREE store only
in the case where it isn't already done (and likewise for TREE_TYPE just to
be sure) and marks both TREE_CHAIN and TREE_TYPE as initialized (the former
because it is stored at that spot, the latter because we never really touch
TREE_TYPE of a TREE_LIST anywhere and so the NULL gets stored into the
freelist and restored from there, except for ENABLE_GC_CHECKING where it is
poisoned and then cleared again).
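
The underlying hazard can be shown with a toy freelist.  This is only a
sketch: Node, FreeList, release and reuse are hypothetical names, not the
freelist<tree_node> code in pt.cc.  It shows why a node taken back from the
list needs its link field cleared before the caller sees it.

```cpp
// Toy stand-in for a TREE_LIST node; the field names are illustrative.
struct Node {
  Node *chain = nullptr;  // doubles as the freelist link while the node is free
  Node *type = nullptr;
  int purpose = 0;
  int value = 0;
};

struct FreeList {
  Node *head = nullptr;

  // Put a node on the list; its chain field now points at the next free node.
  constexpr void release(Node *n) {
    n->chain = head;
    head = n;
  }

  // Take a node off the list and clear the fields the caller does not set,
  // so no stale freelist link leaks out through chain.
  constexpr Node *reuse() {
    Node *n = head;
    if (n == nullptr)
      return nullptr;
    head = n->chain;
    n->chain = nullptr;
    n->type = nullptr;
    return n;
  }
};

constexpr bool reused_node_is_clean() {
  Node a, b;
  FreeList fl;
  fl.release(&a);
  fl.release(&b);        // b.chain now points at a
  Node *n = fl.reuse();  // hands back b
  return n == &b && n->chain == nullptr && n->type == nullptr;
}

static_assert(reused_node_is_clean(), "reused node carries no stale link");
```

Without the two stores in reuse, the returned node would still carry the
pointer to the next free entry in its chain field.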

2023-12-14  Jakub Jelinek  <jakub@redhat.com>

PR c++/112968
* pt.cc (freelist<tree_node>::reinit): Make whole obj->common
defined for valgrind annotations rather than just obj->base,
and do it even for ENABLE_GC_CHECKING.  If not ENABLE_GC_CHECKING,
clear TREE_CHAIN (obj) and TREE_TYPE (obj).

RISC-V: Add RVV builtin vectorization cost model

This patch fixes PR111153:

        ble     a1,zero,.L8
        addiw   a5,a1,-1
        li      a4,4
        addi    sp,sp,-16
        mv      a2,a0
        sext.w  a3,a1
        bleu    a5,a4,.L9
        srliw   a4,a3,2
        slli    a4,a4,4
        mv      a5,a0
        add     a4,a4,a0
        vsetivli        zero,4,e32,m1,ta,ma
        vmv.v.i v1,0
        vse32.v v1,0(sp)
.L4:
        vle32.v v1,0(a5) ---> This loop always processes 4 elements, which is OK for VLEN = 128 bits, but wastes a large share of the compute units when VLEN > 128 bits
        vle32.v v2,0(sp)
        addi    a5,a5,16
        vadd.vv v1,v2,v1
        vse32.v v1,0(sp)
        bne     a4,a5,.L4
        ld      a5,0(sp)
        lw      a4,0(sp)
        andi    a1,a1,-4
        srai    a5,a5,32
        addw    a5,a4,a5
        lw      a4,8(sp)
        addw    a5,a5,a4
        ld      a4,8(sp)
        srai    a4,a4,32
        addw    a0,a5,a4
        beq     a3,a1,.L15
.L3:
        subw    a3,a3,a1
        slli    a5,a1,32
        slli    a3,a3,32
        srli    a3,a3,32
        srli    a5,a5,30
        add     a2,a2,a5
        vsetvli a5,a3,e8,mf4,tu,mu
        vsetvli a4,zero,e32,m1,ta,ma
        sub     a1,a3,a5
        vmv.v.i v1,0
        vsetvli zero,a3,e32,m1,tu,ma
        vle32.v v2,0(a2)
        vmv.v.v v1,v2
        bne     a3,a5,.L21
.L7:
        vsetvli a4,zero,e32,m1,ta,ma
        vmv.s.x v2,zero
        vredsum.vs      v1,v1,v2
        vmv.x.s a5,v1
        addw    a0,a0,a5
.L15:
        addi    sp,sp,16
        jr      ra
.L21:
        slli    a5,a5,2
        add     a2,a2,a5
        vsetvli zero,a1,e32,m1,tu,ma
        vle32.v v2,0(a2)
        vadd.vv v1,v1,v2
        j       .L7
.L8:
        li      a0,0
        ret
.L9:
        li      a1,0
        li      a0,0
        j       .L3

The root cause is that we were missing an RVV builtin vectorization cost model.

After this patch:

ble a1,zero,.L4
vsetvli a5,zero,e32,m1,ta,ma
vmv.v.i v1,0
.L3:
vsetvli a5,a1,e32,m1,tu,ma
vle32.v v2,0(a0)
slli a4,a5,2
sub a1,a1,a5
add a0,a0,a4
vadd.vv v1,v2,v1
bne a1,zero,.L3
li a5,0
vsetivli zero,1,e32,m1,ta,ma
vmv.s.x v2,a5
vsetvli a5,zero,e32,m1,ta,ma
vredsum.vs v1,v1,v2
vmv.x.s a0,v1
ret
.L4:
li a0,0
ret
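
The testcase source is not reproduced in this message.  The generated code
(a zeroed accumulator, vadd.vv in the loop, a final vredsum.vs) is
consistent with an integer sum reduction, so the sketch below assumes a loop
of that shape; sum_array and demo_sum are illustrative names only.

```cpp
// Assumed shape of the reduction: a0 holds the pointer, a1 the element count.
constexpr int sum_array(const int *a, int n) {
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += a[i];  // the vectorizer turns this into vadd.vv plus a final vredsum.vs
  return sum;
}

// Small driver so the loop can be checked without any RVV hardware.
constexpr int demo_sum() {
  int data[] = {1, 2, 3, 4, 5};
  return sum_array(data, 5);
}

static_assert(demo_sum() == 15, "1 + 2 + 3 + 4 + 5");
```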

PR target/111153

gcc/ChangeLog:

* config/riscv/riscv-protos.h (struct common_vector_cost): New struct.
(struct scalable_vector_cost): Ditto.
(struct cpu_vector_cost): Ditto.
* config/riscv/riscv-vector-costs.cc (costs::add_stmt_cost): Add RVV
builtin vectorization cost.
* config/riscv/riscv.cc (struct riscv_tune_param): Ditto.
(get_common_costs): New function.
(riscv_builtin_vectorization_cost): Ditto.
(TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): New targethook.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: New test.

[committed] Minor testsuite fallout from c99 changes

The alpha port failed its weekly test due to a lack of a prototype for the
syscall() routine. Fixed thusly and pushed to the trunk.

gcc/testsuite
* gcc.c-torture/execute/20001229-1.c: Prototype syscall().

Daily bump.

c++: fix cpp0x/constexpr-ex1.C in C++23

Since r14-6505 I see:

FAIL: g++.dg/cpp0x/constexpr-ex1.C  -std=c++23  at line 91 (test for errors, line 89)
FAIL: g++.dg/cpp0x/constexpr-ex1.C  -std=c++23 (test for excess errors)
FAIL: g++.dg/cpp0x/constexpr-ex1.C  -std=c++26  at line 91 (test for errors, line 89)
FAIL: g++.dg/cpp0x/constexpr-ex1.C  -std=c++26 (test for excess errors)

and it wasn't fixed by r14-6511.  So I'm fixing it with the below.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/constexpr-ex1.C: Adjust expected diagnostic line.

aarch64: SVE/NEON Bridging intrinsics

ACLE has added intrinsics to bridge between SVE and Neon.

The NEON_SVE Bridge adds intrinsics that allow conversions between NEON and
SVE vectors.

This patch adds support to GCC for the following 3 intrinsics:
svset_neonq, svget_neonq and svdup_neonq

gcc/ChangeLog:

* config.gcc: Adds new header to config.
* config/aarch64/aarch64-builtins.cc (enum aarch64_type_qualifiers):
Moved to header file.
(ENTRY): Likewise.
(enum aarch64_simd_type): Likewise.
(struct aarch64_simd_type_info): Remove static.
(GTY): Likewise.
* config/aarch64/aarch64-c.cc (aarch64_pragma_aarch64):
Defines pragma for arm_neon_sve_bridge.h.
* config/aarch64/aarch64-protos.h:
Add handle_arm_neon_sve_bridge_h
* config/aarch64/aarch64-sve-builtins-base.h: New intrinsics.
* config/aarch64/aarch64-sve-builtins-base.cc
(class svget_neonq_impl): New intrinsic implementation.
(class svset_neonq_impl): Likewise.
(class svdup_neonq_impl): Likewise.
(NEON_SVE_BRIDGE_FUNCTION): New intrinsics.
* config/aarch64/aarch64-sve-builtins-functions.h
(NEON_SVE_BRIDGE_FUNCTION): Defines macro for NEON_SVE_BRIDGE
functions.
* config/aarch64/aarch64-sve-builtins-shapes.h: New shapes.
* config/aarch64/aarch64-sve-builtins-shapes.cc
(parse_element_type): Add NEON element types.
(parse_type): Likewise.
(struct get_neonq_def): Defines function shape for get_neonq.
(struct set_neonq_def): Defines function shape for set_neonq.
(struct dup_neonq_def): Defines function shape for dup_neonq.
* config/aarch64/aarch64-sve-builtins.cc
(DEF_SVE_TYPE_SUFFIX): Changed to be called through
SVE_NEON macro.
(DEF_SVE_NEON_TYPE_SUFFIX): Defines
macro for NEON_SVE_BRIDGE type suffixes.
(DEF_NEON_SVE_FUNCTION): Defines
macro for NEON_SVE_BRIDGE functions.
(function_resolver::infer_neon128_vector_type): Infers type suffix
for overloaded functions.
(handle_arm_neon_sve_bridge_h): Handles #pragma arm_neon_sve_bridge.h.
* config/aarch64/aarch64-sve-builtins.def
(DEF_SVE_NEON_TYPE_SUFFIX): Macro for handling neon_sve type suffixes.
(bf16): Replace entry with neon-sve entry.
(f16): Likewise.
(f32): Likewise.
(f64): Likewise.
(s8): Likewise.
(s16): Likewise.
(s32): Likewise.
(s64): Likewise.
(u8): Likewise.
(u16): Likewise.
(u32): Likewise.
(u64): Likewise.
* config/aarch64/aarch64-sve-builtins.h
(GCC_AARCH64_SVE_BUILTINS_H): Include aarch64-builtins.h.
(ENTRY): Add aarch64_simd_type definition.
(enum aarch64_simd_type): Add neon information to type_suffix_info.
(struct type_suffix_info): New function.
* config/aarch64/aarch64-sve.md
(@aarch64_sve_get_neonq_<mode>): New intrinsic insn for big endian.
(@aarch64_sve_set_neonq_<mode>): Likewise.
* config/aarch64/iterators.md: Add UNSPEC_SET_NEONQ.
* config/aarch64/aarch64-builtins.h: New file.
* config/aarch64/aarch64-neon-sve-bridge-builtins.def: New file.
* config/aarch64/arm_neon_sve_bridge.h: New file.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/sve/acle/asm/test_sve_acle.h: Add include
arm_neon_sve_bridge header file.
* gcc.dg/torture/neon-sve-bridge.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/dup_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/get_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_bf16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_f64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_s8.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u16.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u32.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u64.c: New test.
* gcc.target/aarch64/sve/acle/asm/set_neonq_u8.c: New test.
* gcc.target/aarch64/sve/acle/general-c/dup_neonq_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/get_neonq_1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/set_neonq_1.c: New test.

c++: note other candidates when diagnosing deletedness

With the previous two patches in place, we can now extend our
deletedness diagnostic to note the other considered candidates, e.g.:

  deleted.C: In function 'int main()':
  deleted.C:10:4: error: use of deleted function 'void f(int)'
     10 |   f(0);
        |   ~^~~
  deleted.C:5:6: note: declared here
      5 | void f(int) = delete;
        |      ^
  deleted.C:5:6: note: candidate: 'void f(int)' (deleted)
  deleted.C:6:6: note: candidate: 'void f(...)'
      6 | void f(...);
        |      ^
  deleted.C:7:6: note: candidate: 'void f(int, int)'
      7 | void f(int, int);
        |      ^
  deleted.C:7:6: note:   candidate expects 2 arguments, 1 provided

These notes are controlled by a new command line flag
-fdiagnostics-all-candidates which also controls whether we note
ignored candidates more generally.
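
The overload set in the diagnostic above can also be probed without
triggering the error.  The sketch below reuses the three declarations of f
from deleted.C and adds a hypothetical trait, can_call_f (not part of the
patch or its tests), which relies on a call to a deleted function being a
substitution failure in an unevaluated context.

```cpp
#include <type_traits>
#include <utility>

void f(int) = delete;
void f(...);
void f(int, int);

struct S {};

// Hypothetical helper: true when f(T) resolves to a usable function.
template <class T, class = void>
struct can_call_f : std::false_type {};

template <class T>
struct can_call_f<T, decltype(void(f(std::declval<T>())))> : std::true_type {};

// f(int) wins overload resolution for an int argument, and it is deleted.
static_assert(!can_call_f<int>::value, "deleted f(int) is selected");
// A long converts to int, which still beats the ellipsis candidate.
static_assert(!can_call_f<long>::value, "deleted f(int) is still the best match");
// Only f(...) is viable for S, so the call is fine.
static_assert(can_call_f<S>::value, "f(...) is selected");
```

Deleted functions take part in overload resolution like any other
candidate, which is why listing the remaining candidates is useful.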

gcc/ChangeLog:

* doc/invoke.texi (C++ Dialect Options): Document
-fdiagnostics-all-candidates.

gcc/c-family/ChangeLog:

* c.opt: Add -fdiagnostics-all-candidates.

gcc/cp/ChangeLog:

* call.cc (print_z_candidates): Only print ignored candidates
when -fdiagnostics-all-candidates is set, otherwise suggest
the flag.
(build_over_call): When diagnosing deletedness, note
other candidates only if -fdiagnostics-all-candidates is
set, otherwise suggest the flag.

gcc/testsuite/ChangeLog:

* g++.dg/overload/error6.C: Pass -fdiagnostics-all-candidates.
* g++.dg/cpp0x/deleted16.C: New test.
* g++.dg/cpp0x/deleted16a.C: New test.
* g++.dg/overload/error6a.C: New test.

c++: remember candidates that we ignored

During overload resolution, we sometimes outright ignore a function in
the overload set and leave no trace of it in the candidates list, for
example when we find a perfect non-template candidate we discard all
function templates, or when the callee is a template-id we discard all
non-template functions.  We should still however make note of these
non-viable functions when diagnosing overload resolution failure, but
that's not possible if they're not present in the returned candidates
list.

To that end, this patch reworks add_candidates to add such ignored
functions to the list.  The new rr_ignored rejection reason is somewhat
of a catch-all; we could perhaps split it up into more specific rejection
reasons, but I leave that as future work.
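
Both situations can be reproduced with two overloads.  This is a minimal
sketch; pick is a made-up name and the return values only mark which
overload was chosen.

```cpp
// Non-template candidate.
constexpr int pick(int) { return 1; }

// Function template candidate.
template <class T>
constexpr int pick(T) { return 2; }

// A perfect non-template match: the template is not considered further.
static_assert(pick(0) == 1, "non-template chosen");

// The callee is a template-id: the non-template function is ignored.
static_assert(pick<int>(0) == 2, "template chosen for a template-id call");

// No perfect non-template match: the template deduces T = long and wins.
static_assert(pick(0L) == 2, "template chosen on exact match");
```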

gcc/cp/ChangeLog:

* call.cc (enum rejection_reason_code): Add rr_ignored.
(add_ignored_candidate): Define.
(ignored_candidate_p): Define.
(add_template_candidate_real): Do add_ignored_candidate
instead of returning NULL.
(splice_viable): Put ignored (non-viable) candidates last.
(print_z_candidate): Handle ignored candidates.
(build_new_function_call): Refine shortcut that calls
cp_build_function_call_vec now that non-templates can
appear in the candidate list for a template-id call.
(add_candidates): Replace 'bad_fns' overload with 'bad_cands'
candidate list.  When not considering a candidate, add it
to the list as an ignored candidate.  Add all 'bad_cands'
to the overload set as well.

gcc/testsuite/ChangeLog:

* g++.dg/diagnostic/param-type-mismatch-2.C: Rename template
function test_7 that (maybe accidentally) shares the same name
as its non-template callee.
* g++.dg/overload/error6.C: New test.

c++: sort candidates according to viability

This patch:

  * changes splice_viable to move the non-viable candidates to the end
    of the list instead of removing them outright
  * makes tourney move the best candidate to the front of the candidate
    list
  * adjusts print_z_candidates to preserve our behavior of printing only
    viable candidates when diagnosing ambiguity
  * adds a parameter to print_z_candidates to control this default behavior
    (the follow-up patch will want to print all candidates when diagnosing
    deletedness)

Thus after this patch we have access to the entire candidate list through
the best viable candidate.
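
The reordering amounts to a stable partition by viability.  The sketch
below is a toy model over a std::array, not the linked-list code in call.cc;
Candidate, sort_by_viability and the id values are assumptions made for
illustration.

```cpp
#include <array>
#include <cstddef>

struct Candidate {
  int id;
  bool viable;
};

// Keep every candidate, viable ones first, preserving relative order.
template <std::size_t N>
constexpr std::array<Candidate, N>
sort_by_viability(const std::array<Candidate, N> &in) {
  std::array<Candidate, N> out{};
  std::size_t k = 0;
  for (const Candidate &c : in)
    if (c.viable)
      out[k++] = c;
  for (const Candidate &c : in)
    if (!c.viable)
      out[k++] = c;
  return out;
}

constexpr std::array<Candidate, 3> cands{{{1, false}, {2, true}, {3, true}}};
constexpr auto sorted = sort_by_viability(cands);

// The non-viable candidate is moved to the end instead of being dropped.
static_assert(sorted[0].id == 2 && sorted[1].id == 3 && sorted[2].id == 1,
              "viable candidates first, non-viable kept at the end");
```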

This change also happens to fix diagnostics for the below testcase where
we currently neglect to note the third candidate, since the presence of
the two unordered non-strictly viable candidates causes splice_viable to
prematurely get rid of the non-viable third candidate.

gcc/cp/ChangeLog:

* call.cc: Include "tristate.h".
(splice_viable): Sort the candidate list according to viability.
Don't remove non-viable candidates from the list.
(print_z_candidates): Add defaulted only_viable_p parameter.
By default only print non-viable candidates if there is no
viable candidate.
(tourney): Ignore non-viable candidates.  Move the true champ to
the front of the candidates list, and update 'candidates' to
point to the front.  Rename champ_compared_to_predecessor to
previous_worse_champ.

gcc/testsuite/ChangeLog:

* g++.dg/overload/error5.C: New test.

c++: unifying constants vs their type [PR99186, PR104867]

When unifying constants we need to treat constants of different types
but same value as different in light of auto template parameters since
otherwise e.g. A<1> will unify with A<1u> (where A's template-head is
template<auto>). This patch fixes this in a minimal way; it seems we
could get away with just using template_args_equal here, as we do in the
default case, or even just cp_tree_equal since the CONVERT_EXPR_P loop
seems to be dead code, but that's a simplification we could consider
during next stage 1.
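
For reference, the language-level rule looks like this.  The sketch does not
reproduce the testcases from the PRs; holds_int is a hypothetical helper
that simply deduces the auto parameter and inspects its type.

```cpp
#include <type_traits>

template <auto V>
struct A {};

// Same value, different type: these are distinct specializations.
static_assert(!std::is_same<A<1>, A<1u>>::value, "A<1> is not A<1u>");

// Deduction recovers the type together with the value.
template <auto V>
constexpr bool holds_int(A<V>) {
  return std::is_same<decltype(V), int>::value;
}

static_assert(holds_int(A<1>{}), "V deduced as int");
static_assert(!holds_int(A<1u>{}), "V deduced as unsigned int");
```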

PR c++/99186
PR c++/104867

gcc/cp/ChangeLog:

* pt.cc (unify) <case INTEGER_CST>: Compare types as well.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/nontype-auto23.C: New test.
* g++.dg/cpp1z/nontype-auto24.C: New test.

c++: unifying FUNCTION_DECLs [PR93740]

unify currently always returns success when unifying two FUNCTION_DECLs
(due to the is_overloaded_fn deferment within the default case), which
means for the below testcase we incorrectly unify &A::foo and &A::bar
leading to deduction failure for the index_of calls due to a bogus base
class ambiguity.

This patch makes unify handle FUNCTION_DECL naturally like other decls.

PR c++/93740

gcc/cp/ChangeLog:

* pt.cc (unify) <case FUNCTION_DECL>: Handle it like FIELD_DECL
and TEMPLATE_DECL.

gcc/testsuite/ChangeLog:

* g++.dg/template/ptrmem34.C: New test.

c-family: rename warn_for_address_or_pointer_of_packed_member

Following the last patch, let's rename the functions to reflect the change
in behavior.

gcc/c-family/ChangeLog:

* c-warn.cc (check_address_or_pointer_of_packed_member):
Rename to check_address_of_packed_member.
(check_and_warn_address_or_pointer_of_packed_member):
Rename to check_and_warn_address_of_packed_member.
(warn_for_address_or_pointer_of_packed_member):
Rename to warn_for_address_of_packed_member.
* c-common.h: Adjust.

gcc/c/ChangeLog:

* c-typeck.cc (convert_for_assignment): Adjust call to
warn_for_address_of_packed_member.

gcc/cp/ChangeLog:

* call.cc (convert_for_arg_passing)
* typeck.cc (convert_for_assignment): Adjust call to
warn_for_address_of_packed_member.

c-family: -Waddress-of-packed-member and casts

-Waddress-of-packed-member, in addition to the documented warning about
actually taking the address of a packed member, also warns about casting
from a pointer to a TYPE_PACKED type to a pointer to a type with greater
alignment.

This wrongly warns if the source is a pointer to enum when -fshort-enums
is on, since that is also represented by TYPE_PACKED.

And there's already -Wcast-align to catch casting from pointer to less
aligned type (packed or otherwise) to pointer to more aligned type; even
apart from the enum problem, this seems like a somewhat arbitrary subset of
that warning.

So, this patch removes the undocumented type-based warning from
-Waddress-of-packed-member. Some of the tests where the warning is
desirable I changed to use -Wcast-align=strict instead. The ones that
require -Wno-incompatible-pointer-types I just removed.
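
A small illustration of the two situations, using the GNU packed attribute.
Packet and read_value are made-up names and the checks assume a 4-byte int.

```cpp
#include <cstddef>

struct __attribute__((packed)) Packet {
  char tag;
  int value;  // sits at offset 1, so it is misaligned for an int
};

static_assert(offsetof(Packet, value) == 1, "no padding before value");
static_assert(alignof(Packet) < alignof(int), "packed type is less aligned than int");

// Reading the member by value is fine: the compiler emits an unaligned load.
constexpr int read_value(const Packet &p) { return p.value; }

// What the warnings are about (left as comments so this compiles cleanly):
//   int *q = &p.value;   // address of a packed member: -Waddress-of-packed-member
//   int *r = (int *) &p; // cast to a more aligned pointee: -Wcast-align=strict
static_assert(read_value(Packet{'x', 42}) == 42, "value read back intact");
```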

gcc/c-family/ChangeLog:

* c-warn.cc (check_address_or_pointer_of_packed_member):
Remove warning based on TYPE_PACKED.

gcc/testsuite/ChangeLog:

* c-c++-common/Waddress-of-packed-member-1.c: Don't expect
a warning on the cast cases.
* c-c++-common/pr51628-35.c: Use -Wcast-align=strict.
* g++.dg/warn/Waddress-of-packed-member3.C: Likewise.
* gcc.dg/pr88928.c: Likewise.
* gcc.dg/pr51628-20.c: Removed.
* gcc.dg/pr51628-21.c: Removed.
* gcc.dg/pr51628-25.c: Removed.

OpenMP: Pointers and member mappings

This patch changes the mapping node arrangement used for array components
of derived types in order to accommodate changes made in the previous
patch, particularly the use of "GOMP_MAP_ATTACH_DETACH" for pointer-typed
derived-type members instead of "GOMP_MAP_ALWAYS_POINTER".

We change the mapping nodes used for a derived-type mapping like this:

  type T
  integer, pointer, dimension(:) :: arrptr
  end type T

  type(T) :: tvar
  [...]
  !$omp target map(tofrom: tvar%arrptr)

So that the nodes used look like this:

  1) map(to: tvar%arrptr)   -->
  GOMP_MAP_TO [implicit]  *tvar%arrptr%data  (the array data)
  GOMP_MAP_TO_PSET        tvar%arrptr        (the descriptor)
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

  2) map(tofrom: tvar%arrptr(3:8))   -->
  GOMP_MAP_TOFROM         *tvar%arrptr%data(3)  (size 8-3+1, etc.)
  GOMP_MAP_TO_PSET        tvar%arrptr
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data      (bias 3, etc.)

In this case, we can determine in the front-end that the
whole-array/pointer mapping (1) is only needed to map the pointer
-- so we drop it entirely.  (Note also that we set -- early -- the
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P flag for whole-array-via-pointer
mappings. See below.)

In the middle end, we process mappings using the struct sibling-list
handling machinery by moving the "GOMP_MAP_TO_PSET" node from the middle
of the group of three mapping nodes to the proper sorted position after
the GOMP_MAP_STRUCT mapping:

  GOMP_MAP_STRUCT   tvar     (len: 1)
  GOMP_MAP_TO_PSET  tvar%arr (size: 64, etc.)  <--. moved here
  [...]                                           |
  GOMP_MAP_TOFROM         *tvar%arrptr%data(3) ___|
  GOMP_MAP_ATTACH_DETACH  tvar%arrptr%data

In another case, if we have an array of derived-type values "dtarr",
and mappings like:

  i = 1
  j = 1
  map(to: dtarr(i)%arrptr) map(tofrom: dtarr(j)%arrptr(3:8))

We still map the same way, but this time we cannot prove that the base
expressions "dtarr(i)" and "dtarr(j)" are the same in the front-end.
So we keep both mappings, but we move the "[implicit]" mapping of the
full-array reference to the end of the clause list in gimplify.cc (by
adjusting the topological sorting algorithm):

  GOMP_MAP_STRUCT         dtvar  (len: 2)
  GOMP_MAP_TO_PSET        dtvar(i)%arrptr
  GOMP_MAP_TO_PSET        dtvar(j)%arrptr
  [...]
  GOMP_MAP_TOFROM         *dtvar(j)%arrptr%data(3)  (size: 8-3+1)
  GOMP_MAP_ATTACH_DETACH  dtvar(j)%arrptr%data
  GOMP_MAP_TO [implicit]  *dtvar(i)%arrptr%data(1)  (size: whole array)
  GOMP_MAP_ATTACH_DETACH  dtvar(i)%arrptr%data

Always moving "[implicit]" full-array mappings after array-section
mappings (without that bit set) means that we'll avoid copying the whole
array unnecessarily -- even in cases where we can't prove that the arrays
are the same.

The patch also fixes some bugs with "enter data" and "exit data"
directives with this new mapping arrangement.  Also now if you have
mappings like this:

  #pragma omp target enter data map(to: dv, dv%arr(1:20))

The whole of the derived-type variable "dv" is mapped, so the
GOMP_MAP_TO_PSET for the array-section mapping can be dropped:

  GOMP_MAP_TO            dv

  GOMP_MAP_TO            *dv%arr%data
  GOMP_MAP_TO_PSET       dv%arr <-- deleted (array section mapping)
  GOMP_MAP_ATTACH_DETACH dv%arr%data

To accommodate recent changes to mapping nodes made by
Tobias, this version of the patch avoids using GOMP_MAP_TO_PSET
for "exit data" directives, in favour of using the "correct"
GOMP_MAP_RELEASE/GOMP_MAP_DELETE kinds during early expansion.  A new
flag is introduced so the middle-end knows when the latter two kinds
are being used specifically for an array descriptor.

This version of the patch fixes "omp target exit data" handling
for GOMP_MAP_DELETE, and adds pretty-printing dump output
for the OMP_CLAUSE_RELEASE_DESCRIPTOR flag (for a little extra
clarity).

Also I noticed the handling of descriptors on *OpenACC*
exit-data directives was inconsistent, so I've made those use
GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the new flag in the same way as
OpenMP too.  In the end it doesn't actually matter to the runtime,
which handles GOMP_MAP_RELEASE/GOMP_MAP_DELETE/GOMP_MAP_TO_PSET for
array descriptors on OpenACC "exit data" directives the same, anyway,
and doing it this way in the FE avoids needless divergence.

I've added a couple of new tests (gomp/target-enter-exit-data.f90 and
goacc/enter-exit-data-2.f90).

2023-12-07  Julian Brown  <julian@codesourcery.com>

gcc/fortran/
* dependency.cc (gfc_omp_expr_prefix_same): New function.
* dependency.h (gfc_omp_expr_prefix_same): Add prototype.
* gfortran.h (gfc_omp_namelist): Add "duplicate_of" field to "u2"
union.
* trans-openmp.cc (dependency.h): Include.
(gfc_trans_omp_array_section): Adjust mapping node arrangement for
array descriptors.  Use GOMP_MAP_TO_PSET or
GOMP_MAP_RELEASE/GOMP_MAP_DELETE with the OMP_CLAUSE_RELEASE_DESCRIPTOR
flag set.
(gfc_symbol_rooted_namelist): New function.
(gfc_trans_omp_clauses): Check subcomponent and subarray/element
accesses elsewhere in the clause list for pointers to derived types or
array descriptors, and adjust or drop mapping nodes appropriately.
Adjust for changes to mapping node arrangement.
(gfc_trans_oacc_executable_directive): Pass code op through.

gcc/
* gimplify.cc (omp_map_clause_descriptor_p): New function.
(build_omp_struct_comp_nodes, omp_get_attachment, omp_group_base): Use
above function.
(omp_tsort_mapping_groups): Process nodes that have
OMP_CLAUSE_MAP_RUNTIME_IMPLICIT_P set after those that don't.  Add
enter_exit_data parameter.
(omp_resolve_clause_dependencies): Remove GOMP_MAP_TO_PSET mappings if
we're mapping the whole containing derived-type variable.
(omp_accumulate_sibling_list): Adjust GOMP_MAP_TO_PSET handling.
Remove GOMP_MAP_ALWAYS_POINTER handling.
(gimplify_scan_omp_clauses): Pass enter_exit argument to
omp_tsort_mapping_groups.  Don't adjust/remove GOMP_MAP_TO_PSET
mappings for derived-type components here.
* tree.h (OMP_CLAUSE_RELEASE_DESCRIPTOR): New macro.
* tree-pretty-print.cc (dump_omp_clause): Show
OMP_CLAUSE_RELEASE_DESCRIPTOR in dump output (with
GOMP_MAP_TO_PSET-like syntax).

gcc/testsuite/
* gfortran.dg/goacc/enter-exit-data-2.f90: New test.
* gfortran.dg/goacc/finalize-1.f: Adjust scan output.
* gfortran.dg/gomp/map-9.f90: Adjust scan output.
* gfortran.dg/gomp/map-subarray-2.f90: New test.
* gfortran.dg/gomp/map-subarray.f90: New test.
* gfortran.dg/gomp/target-enter-exit-data.f90: New test.

libgomp/
* testsuite/libgomp.fortran/map-subarray.f90: New test.
* testsuite/libgomp.fortran/map-subarray-2.f90: New test.
* testsuite/libgomp.fortran/map-subarray-3.f90: New test.
* testsuite/libgomp.fortran/map-subarray-4.f90: New test.
* testsuite/libgomp.fortran/map-subarray-6.f90: New test.
* testsuite/libgomp.fortran/map-subarray-7.f90: New test.
* testsuite/libgomp.fortran/map-subarray-8.f90: New test.
* testsuite/libgomp.fortran/map-subcomponents.f90: New test.
* testsuite/libgomp.fortran/struct-elem-map-1.f90: Adjust for
descriptor-mapping changes.  Remove XFAIL.

OpenMP/OpenACC: Rework clause expansion and nested struct handling

This patch reworks clause expansion in the C, C++ and (to a lesser
extent) Fortran front ends for OpenMP and OpenACC mapping nodes used in
GPU offloading support.

At present a single clause may be turned into several mapping nodes,
or have its mapping type changed, in several places scattered through
the front- and middle-end.  The analysis relating to which particular
transformations are needed for some given expression has become quite hard
to follow.  Briefly, we manipulate clause types in the following places:

1. During parsing, in c_omp_adjust_map_clauses.  Depending on a set of
    rules, we may change a FIRSTPRIVATE_POINTER (etc.) mapping into
    ATTACH_DETACH, or mark the decl addressable.

2. In semantics.cc or c-typeck.cc, clauses are expanded in
    handle_omp_array_sections (called via {c_}finish_omp_clauses), or in
    finish_omp_clauses itself.  The two cases are for processing array
    sections (the former), or non-array sections (the latter).

3. In gimplify.cc, we build sibling lists for struct accesses, which
    groups and sorts accesses along with their struct base, creating
    new ALLOC/RELEASE nodes for pointers.

4. In gimplify.cc:gimplify_adjust_omp_clauses, mapping nodes may be
    adjusted or created.

This patch doesn't completely disrupt this scheme, though clause
types are no longer adjusted in c_omp_adjust_map_clauses (step 1).
Clause expansion in step 2 (for C and C++) now uses a single, unified
mechanism, parts of which are also reused for analysis in step 3.

Rather than the kind-of "ad-hoc" pattern matching on addresses that is
currently used to expand clauses, a new method for analysing addresses is
introduced.  This does a recursive-descent tree walk on expression nodes,
and emits a vector of tokens describing each "part" of the address.
This tokenized address can then be translated directly into mapping nodes,
with the assurance that no part of the expression has been inadvertently
skipped or misinterpreted.  In this way, all the variations of ways
pointers, arrays, references and component accesses might be combined
can be teased apart into easily-understood cases - and we know we've
"parsed" the whole address before we start analysis, so the right code
paths can easily be selected.

For example, a simple access "arr[idx]" might parse as:

  base-decl access-indexed-array

or "mystruct->foo[x]" with a pointer "foo" component might parse as:

  base-decl access-pointer component-selector access-pointer
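
The idea can be sketched with a toy expression chain.  None of the types
below are GCC's: Token, Expr, TokenList, tokenize and matches are
hypothetical, and the real tokenizer walks tree nodes rather than a linked
chain.  The token names follow the two examples above.

```cpp
#include <initializer_list>

enum class Token { BaseDecl, AccessPointer, ComponentSelector, AccessIndexedArray };

// One step of an address expression; base is the sub-expression it applies to.
struct Expr {
  Token token;
  const Expr *base;
};

constexpr int max_tokens = 8;

struct TokenList {
  Token items[max_tokens] = {};
  int count = 0;
};

// Recursive descent: emit the tokens of the base first, then this node's.
constexpr void tokenize(const Expr &e, TokenList &out) {
  if (e.base != nullptr)
    tokenize(*e.base, out);
  if (out.count < max_tokens)
    out.items[out.count++] = e.token;
}

constexpr TokenList tokens_of(const Expr &e) {
  TokenList t;
  tokenize(e, t);
  return t;
}

constexpr bool matches(const TokenList &got, std::initializer_list<Token> want) {
  if (got.count != static_cast<int>(want.size()))
    return false;
  int i = 0;
  for (Token t : want)
    if (got.items[i++] != t)
      return false;
  return true;
}

// "arr[idx]"
constexpr Expr arr{Token::BaseDecl, nullptr};
constexpr Expr arr_idx{Token::AccessIndexedArray, &arr};

// "mystruct->foo[x]" with a pointer "foo" component
constexpr Expr mystruct{Token::BaseDecl, nullptr};
constexpr Expr deref{Token::AccessPointer, &mystruct};
constexpr Expr foo{Token::ComponentSelector, &deref};
constexpr Expr foo_x{Token::AccessPointer, &foo};

static_assert(matches(tokens_of(arr_idx),
                      {Token::BaseDecl, Token::AccessIndexedArray}),
              "base-decl access-indexed-array");
static_assert(matches(tokens_of(foo_x),
                      {Token::BaseDecl, Token::AccessPointer,
                       Token::ComponentSelector, Token::AccessPointer}),
              "base-decl access-pointer component-selector access-pointer");
```

Because the whole address is tokenized before any mapping node is built, a
consumer can reject or dispatch on the full token sequence up front.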

A key observation is that support for "array" bases, e.g. accesses
whose root nodes are not structures, but describe scalars or arrays,
and also *one-level deep* structure accesses, have first-class support
in gimplify and beyond.  Expressions that use deeper struct accesses
or e.g. multiple indirections were more problematic: some cases worked,
but lots of cases didn't.  This patch reimplements the support for those
in gimplify.cc, again using the new "address tokenization" support.

An expression like "mystruct->foo->bar[0:10]" used in a mapping node will
translate the right-hand access directly in the front-end.  The base for
the access will be "mystruct->foo".  This is handled recursively in
gimplify.cc -- there may be several accesses of "mystruct"'s members
on the same directive, so the sibling-list building machinery can be
used again.  (This was already being done for OpenACC, but the new
implementation differs somewhat in details, and is more robust.)

For OpenMP, in the case where the base pointer itself,
i.e. "mystruct->foo" here, is NOT mapped on the same directive, we
create a "fragile" mapping.  This turns the "foo" component access
into a zero-length allocation (which is a new feature for the runtime,
so support has been added there too).

A couple of changes have been made to how mapping clauses are turned
into mapping nodes:

The first change is based on the observation that it is probably never
correct to use GOMP_MAP_ALWAYS_POINTER for component accesses (e.g. for
references), because if the containing struct is already mapped on the
target then the host version of the pointer in question will be corrupted
if the struct is copied back from the target.  This patch removes all
such uses, across each of C, C++ and Fortran.

The second change is to the way that GOMP_MAP_ATTACH_DETACH nodes
are processed during sibling-list creation.  For OpenMP, for pointer
components, we must map the base pointer separately from an array section
that uses the base pointer, so e.g. we must have both "map(mystruct.base)"
and "map(mystruct.base[0:10])" mappings.  These create nodes such as:

  GOMP_MAP_TOFROM mystruct.base
  G_M_TOFROM *mystruct.base [len: 10*elemsize] G_M_ATTACH_DETACH mystruct.base

Instead of using the first of these directly when building the struct
sibling list then skipping the group using GOMP_MAP_ATTACH_DETACH,
leading to:

  GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_TOFROM mystruct.base

we now introduce a new "mini-pass", omp_resolve_clause_dependencies, that
drops the GOMP_MAP_TOFROM for the base pointer, marks the second group
as having had a base-pointer mapping, then omp_build_struct_sibling_lists
can create:

  GOMP_MAP_STRUCT mystruct [len: 1] GOMP_MAP_ALLOC mystruct.base [len: ptrsize]

This ends up working better in many cases, particularly those involving
references.  (The "alloc" space is immediately overwritten by a pointer
attachment, so this is mildly more efficient than a redundant TO mapping
at runtime also.)
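
For reference, the source-level pattern discussed above (a pointer
component mapped together with an array section based on it) might look
like the following sketch.  The struct S and the function bump_and_sum
are illustrative names only.  If the file is compiled without -fopenmp
the pragma is ignored and the loop simply runs on the host.

```cpp
#include <cassert>

struct S {
  int *base;  // pointer component used as the base of an array section
};

// Increment ten elements inside a target region and return their sum.
int bump_and_sum()
{
  int storage[10];
  for (int i = 0; i < 10; i++)
    storage[i] = i;

  S mystruct;
  mystruct.base = storage;

  // Map both the base pointer and the array section that uses it.
  #pragma omp target map(tofrom: mystruct.base, mystruct.base[0:10])
  for (int i = 0; i < 10; i++)
    mystruct.base[i] += 1;

  // The section is mapped "tofrom", so the host array sees the updates.
  int sum = 0;
  for (int i = 0; i < 10; i++)
    sum += storage[i];
  return sum;
}
```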

There is support in the address tokenizer for "arbitrary" base expressions
which aren't rooted at a decl, but that is not used at present because
such addresses are disallowed at parse time.

In the front-ends, the address tokenization machinery is mostly only
used for clause expansion and not for diagnostics at present.  It could
be used for those too, which would allow more of my previous "address
inspector" implementation to be removed.

The new bits in gimplify.cc work with OpenACC also.

This version of the patch addresses several first-pass review comments
from Tobias, and fixes a few previously-missed cases for manually-managed
ragged array mappings (including cases using references).  Some arbitrary
differences between handling of clause expansion for C vs. C++ have also
been fixed, and some fragments from later in the patch series have been
moved forward (where they were useful for fixing bugs).  Several new
test cases have been added.

2023-11-29  Julian Brown  <julian@codesourcery.com>

gcc/c-family/
* c-common.h (c_omp_region_type): Add C_ORT_EXIT_DATA,
C_ORT_OMP_EXIT_DATA and C_ORT_ACC_TARGET.
(omp_addr_token): Add forward declaration.
(c_omp_address_inspector): New class.
* c-omp.cc (c_omp_adjust_map_clauses): Mark decls addressable here, but
do not change any mapping node types.
(c_omp_address_inspector::unconverted_ref_origin,
c_omp_address_inspector::component_access_p,
c_omp_address_inspector::check_clause,
c_omp_address_inspector::get_root_term,
c_omp_address_inspector::map_supported_p,
c_omp_address_inspector::get_origin,
c_omp_address_inspector::maybe_unconvert_ref,
c_omp_address_inspector::maybe_zero_length_array_section,
c_omp_address_inspector::expand_array_base,
c_omp_address_inspector::expand_component_selector,
c_omp_address_inspector::expand_map_clause): New methods.
(omp_expand_access_chain): New function.

gcc/c/
* c-parser.cc (c_parser_oacc_all_clauses): Add TARGET_P parameter. Use
to select region type for c_finish_omp_clauses call.
(c_parser_oacc_loop): Update calls to c_parser_oacc_all_clauses.
(c_parser_oacc_compute): Likewise.
(c_parser_omp_target_data, c_parser_omp_target_enter_data): Support
ATTACH kind.
(c_parser_omp_target_exit_data): Support DETACH kind.
(check_clauses): Handle GOMP_MAP_POINTER and GOMP_MAP_ATTACH here.
* c-typeck.cc (handle_omp_array_sections_1,
handle_omp_array_sections, c_finish_omp_clauses): Use
c_omp_address_inspector class and OMP address tokenizer to analyze and
expand map clause expressions.  Fix some diagnostics.  Fix "is OpenACC"
condition for C_ORT_ACC_TARGET addition.

gcc/cp/
* parser.cc (cp_parser_oacc_all_clauses): Add TARGET_P parameter. Use
to select region type for finish_omp_clauses call.
(cp_parser_omp_target_data, cp_parser_omp_target_enter_data): Support
GOMP_MAP_ATTACH kind.
(cp_parser_omp_target_exit_data): Support GOMP_MAP_DETACH kind.
(cp_parser_oacc_declare): Update call to cp_parser_oacc_all_clauses.
(cp_parser_oacc_loop): Update calls to cp_parser_oacc_all_clauses.
(cp_parser_oacc_compute): Likewise.
* pt.cc (tsubst_expr): Use C_ORT_ACC_TARGET for call to
tsubst_omp_clauses for OpenACC compute regions.
* semantics.cc (cp_omp_address_inspector): New class, derived from
c_omp_address_inspector.
(handle_omp_array_sections_1, handle_omp_array_sections,
finish_omp_clauses): Use cp_omp_address_inspector class and OMP address
tokenizer to analyze and expand OpenMP map clause expressions.  Fix
some diagnostics.  Support C_ORT_ACC_TARGET.
(finish_omp_target): Handle GOMP_MAP_POINTER.

gcc/fortran/
* trans-openmp.cc (gfc_trans_omp_array_section): Add OPENMP parameter.
Use GOMP_MAP_ATTACH_DETACH instead of GOMP_MAP_ALWAYS_POINTER for
derived type components.
(gfc_trans_omp_clauses): Update calls to gfc_trans_omp_array_section.

gcc/
* gimplify.cc (build_struct_comp_nodes): Don't process
GOMP_MAP_ATTACH_DETACH "middle" nodes here.
(omp_mapping_group): Add REPROCESS_STRUCT and FRAGILE booleans for
nested struct handling.
(omp_strip_components_and_deref, omp_strip_indirections): Remove
functions.
(omp_get_attachment): Handle GOMP_MAP_DETACH here.
(omp_group_last): Handle GOMP_MAP_*, GOMP_MAP_DETACH,
GOMP_MAP_ATTACH_DETACH groups for "exit data" of reference-to-pointer
component array sections.
(omp_gather_mapping_groups_1): Initialise reprocess_struct and fragile
fields.
(omp_group_base): Handle GOMP_MAP_ATTACH_DETACH after GOMP_MAP_STRUCT.
(omp_index_mapping_groups_1): Skip reprocess_struct groups.
(omp_get_nonfirstprivate_group, omp_directive_maps_explicitly,
omp_resolve_clause_dependencies, omp_first_chained_access_token): New
functions.
(omp_check_mapping_compatibility): Adjust accepted node combinations
for "from" clauses using release instead of alloc.
(omp_accumulate_sibling_list): Add GROUP_MAP, ADDR_TOKENS, FRAGILE_P,
REPROCESSING_STRUCT, ADDED_TAIL parameters.  Use OMP address tokenizer
to analyze addresses.  Reimplement nested struct handling, and
implement "fragile groups".
(omp_build_struct_sibling_lists): Adjust for changes to
omp_accumulate_sibling_list.  Recalculate bias for ATTACH_DETACH nodes
after GOMP_MAP_STRUCT nodes.
(gimplify_scan_omp_clauses): Call omp_resolve_clause_dependencies.  Use
OMP address tokenizer.
(gimplify_adjust_omp_clauses_1): Use build_fold_indirect_ref_loc
instead of build_simple_mem_ref_loc.
* omp-general.cc (omp-general.h, tree-pretty-print.h): Include.
(omp_addr_tokenizer): New namespace.
(omp_addr_tokenizer::omp_addr_token): New.
(omp_addr_tokenizer::omp_parse_component_selector,
omp_addr_tokenizer::omp_parse_ref,
omp_addr_tokenizer::omp_parse_pointer,
omp_addr_tokenizer::omp_parse_access_method,
omp_addr_tokenizer::omp_parse_access_methods,
omp_addr_tokenizer::omp_parse_structure_base,
omp_addr_tokenizer::omp_parse_structured_expr,
omp_addr_tokenizer::omp_parse_array_expr,
omp_addr_tokenizer::omp_access_chain_p,
omp_addr_tokenizer::omp_accessed_addr): New functions.
(omp_parse_expr, debug_omp_tokenized_addr): New functions.
* omp-general.h (omp_addr_tokenizer::access_method_kinds,
omp_addr_tokenizer::structure_base_kinds,
omp_addr_tokenizer::token_type,
omp_addr_tokenizer::omp_addr_token,
omp_addr_tokenizer::omp_access_chain_p,
omp_addr_tokenizer::omp_accessed_addr): New.
(omp_addr_token, omp_parse_expr): New.
* omp-low.cc (scan_sharing_clauses): Skip error check for references
to pointers.
* tree.h (OMP_CLAUSE_ATTACHMENT_MAPPING_ERASED): New macro.

gcc/testsuite/
* c-c++-common/gomp/clauses-2.c: Fix error output.
* c-c++-common/gomp/target-implicit-map-2.c: Adjust scan output.
* c-c++-common/gomp/target-50.c: Adjust scan output.
* c-c++-common/gomp/target-enter-data-1.c: Adjust scan output.
* g++.dg/gomp/static-component-1.C: New test.
* gcc.dg/gomp/target-3.c: Adjust scan output.
* gfortran.dg/gomp/map-9.f90: Adjust scan output.

libgomp/
* target.c (gomp_map_pointer): Modify zero-length array section
pointer handling.
(gomp_attach_pointer): Likewise.
(gomp_map_fields_existing): Use gomp_map_0len_lookup.
(gomp_attach_pointer): Allow attaching null pointers (or Fortran
"unassociated" pointers).
(gomp_map_vars_internal): Handle zero-sized struct members.  Add
diagnostic for unmapped struct pointer members.
* testsuite/libgomp.c-c++-common/baseptrs-1.c: New test.
* testsuite/libgomp.c-c++-common/baseptrs-2.c: New test.
* testsuite/libgomp.c-c++-common/baseptrs-6.c: New test.
* testsuite/libgomp.c-c++-common/baseptrs-7.c: New test.
* testsuite/libgomp.c-c++-common/ptr-attach-2.c: New test.
* testsuite/libgomp.c-c++-common/target-implicit-map-2.c: Fix missing
"free".
* testsuite/libgomp.c-c++-common/target-implicit-map-5.c: New test.
* testsuite/libgomp.c-c++-common/target-map-zlas-1.c: New test.
* testsuite/libgomp.c++/class-array-1.C: New test.
* testsuite/libgomp.c++/baseptrs-3.C: New test.
* testsuite/libgomp.c++/baseptrs-4.C: New test.
* testsuite/libgomp.c++/baseptrs-5.C: New test.
* testsuite/libgomp.c++/baseptrs-8.C: New test.
* testsuite/libgomp.c++/baseptrs-9.C: New test.
* testsuite/libgomp.c++/ref-mapping-1.C: New test.
* testsuite/libgomp.c++/target-48.C: New test.
* testsuite/libgomp.c++/target-49.C: New test.
* testsuite/libgomp.c++/target-exit-data-reftoptr-1.C: New test.
* testsuite/libgomp.c++/target-lambda-1.C: Update for OpenMP 5.2
semantics.
* testsuite/libgomp.c++/target-this-3.C: Likewise.
* testsuite/libgomp.c++/target-this-4.C: Likewise.
* testsuite/libgomp.fortran/struct-elem-map-1.f90: Add temporary XFAIL.
* testsuite/libgomp.fortran/target-enter-data-6.f90: Likewise.

OpenMP/OpenACC: Reindent TO/FROM/_CACHE_ stanza in {c_}finish_omp_clause

This patch trivially adds braces and reindents the
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza in
c_finish_omp_clause and finish_omp_clause, in preparation for the
following patch (to clarify the diff a little).

2022-09-13 Julian Brown <julian@codesourcery.com>

gcc/c/
* c-typeck.cc (c_finish_omp_clauses): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.

gcc/cp/
* semantics.cc (finish_omp_clause): Add braces and reindent
OMP_CLAUSE_TO/OMP_CLAUSE_FROM/OMP_CLAUSE__CACHE_ stanza.

libcpp: Fix valgrind errors on pr88974.c [PR112956]

On the c-c++-common/cpp/pr88974.c testcase I'm seeing
==600549== Conditional jump or move depends on uninitialised value(s)
==600549==    at 0x1DD3A05: cpp_get_token_1(cpp_reader*, unsigned int*) (macro.cc:3050)
==600549==    by 0x1DBFC7F: _cpp_parse_expr (expr.cc:1392)
==600549==    by 0x1DB9471: do_if(cpp_reader*) (directives.cc:2087)
==600549==    by 0x1DBB4D8: _cpp_handle_directive (directives.cc:572)
==600549==    by 0x1DCD488: _cpp_lex_token (lex.cc:3682)
==600549==    by 0x1DD3A97: cpp_get_token_1(cpp_reader*, unsigned int*) (macro.cc:2936)
==600549==    by 0x7F7EE4: scan_translation_unit (c-ppoutput.cc:350)
==600549==    by 0x7F7EE4: preprocess_file(cpp_reader*) (c-ppoutput.cc:106)
==600549==    by 0x7F6235: c_common_init() (c-opts.cc:1280)
==600549==    by 0x704C8B: lang_dependent_init (toplev.cc:1837)
==600549==    by 0x704C8B: do_compile (toplev.cc:2135)
==600549==    by 0x704C8B: toplev::main(int, char**) (toplev.cc:2306)
==600549==    by 0x7064BA: main (main.cc:39)
error.  The problem is that _cpp_lex_direct can leave result->src_loc
uninitialized in some cases and later on we use that location_t.

_cpp_lex_direct essentially does:
  cppchar_t c;
...
  cpp_token *result = pfile->cur_token++;

fresh_line:
  result->flags = 0;
...
  if (buffer->need_line)
    {
      if (pfile->state.in_deferred_pragma)
        {
          result->type = CPP_PRAGMA_EOL;
          ... // keeps result->src_loc uninitialized;
          return result;
        }
      if (!_cpp_get_fresh_line (pfile))
        {
          result->type = CPP_EOF;
          if (!pfile->state.in_directive && !pfile->state.parsing_args)
            {
              result->src_loc = pfile->line_table->highest_line;
              ...
            }
          ... // otherwise result->src_loc is sometimes uninitialized here
          return result;
        }
      ...
    }
...
  result->src_loc = pfile->line_table->highest_line;
...
  c = *buffer->cur++;
  switch (c)
    {
...
    case '\n':
...
      buffer->need_line = true;
      if (pfile->state.in_deferred_pragma)
        {
          result->type = CPP_PRAGMA_EOL;
...
          return result;
        }
      goto fresh_line;
...
    }
...
So, if _cpp_lex_direct is called without buffer->need_line initially set,
result->src_loc is always initialized (and hundreds of tests actually rely
on the exact value it has), even when c == '\n' and we set that flag later
on and goto fresh_line.  The CPP_PRAGMA_EOL case has separate handling
there and doesn't goto.
But if _cpp_lex_direct is called with buffer->need_line initially set and
we either decide to return a CPP_PRAGMA_EOL token, or getting a new line
fails for some reason and we return a CPP_EOF token while in directive or
parsing args state, result->src_loc is left uninitialized and can be
whatever the allocation left there.

The following patch attempts to keep the status quo: use the value that was
returned previously if it was initialized (i.e. we went through the
goto fresh_line; statement in c == '\n' handling) and only initialize
result->src_loc if it was uninitialized before.

2023-12-13  Jakub Jelinek  <jakub@redhat.com>

PR preprocessor/112956
* lex.cc (_cpp_lex_direct): Initialize c to 0.
For CPP_PRAGMA_EOL tokens and if c == 0 also for CPP_EOF
set result->src_loc to highest locus.

Fix 'libgomp/config/linux/allocator.c' 'size_t' vs. '%ld' format string mismatch

Fix-up for commit 348874f0baac0f22c98ab11abbfa65fd172f6bdd
"libgomp: basic pinned memory on Linux", which may result in build failures
as follows, for example, for the '-m32' multilib of x86_64-pc-linux-gnu:

    In file included from [...]/source-gcc/libgomp/config/linux/allocator.c:31:
    [...]/source-gcc/libgomp/config/linux/allocator.c: In function ‘linux_memspace_alloc’:
    [...]/source-gcc/libgomp/config/linux/allocator.c:70:26: error: format ‘%ld’ expects argument of type ‘long int’, but argument 3 has type ‘size_t’ {aka ‘unsigned int’} [-Werror=format=]
       70 |           gomp_debug (0, "libgomp: failed to pin %ld bytes of"
          |                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
       71 |                       " memory (ulimit too low?)\n", size);
          |                                                      ~~~~
          |                                                      |
          |                                                      size_t {aka unsigned int}
    [...]/source-gcc/libgomp/libgomp.h:186:29: note: in definition of macro ‘gomp_debug’
      186 |       (gomp_debug) ((KIND), __VA_ARGS__); \
          |                             ^~~~~~~~~~~
    [...]/source-gcc/libgomp/config/linux/allocator.c:70:52: note: format string is defined here
       70 |           gomp_debug (0, "libgomp: failed to pin %ld bytes of"
          |                                                  ~~^
          |                                                    |
          |                                                    long int
          |                                                  %d
    cc1: all warnings being treated as errors
    make[9]: *** [allocator.lo] Error 1
    make[9]: Leaving directory `[...]/build-gcc/x86_64-pc-linux-gnu/32/libgomp'
    [...]

Fix this in the same way as used elsewhere in libgomp.
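
The commit message does not spell out the fix.  As a sketch of two
common ways to print a 'size_t' portably (not necessarily the form used
in libgomp), one can either cast the argument to match the conversion
specifier or use the C99 'z' length modifier.  The helper names below
are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>

// Option 1: cast the argument so that it matches "%lu" on every target,
// including ones where size_t is 'unsigned int'.
int format_with_cast(char *buf, std::size_t buflen, std::size_t size)
{
  return std::snprintf(buf, buflen, "failed to pin %lu bytes",
                       static_cast<unsigned long>(size));
}

// Option 2: use the 'z' length modifier, which is defined for size_t.
int format_with_zu(char *buf, std::size_t buflen, std::size_t size)
{
  return std::snprintf(buf, buflen, "failed to pin %zu bytes", size);
}
```

Both variants avoid the '-Wformat' diagnostic shown above because the
argument type agrees with the format on 32-bit and 64-bit targets alike.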

libgomp/
* config/linux/allocator.c (linux_memspace_alloc): Fix 'size_t'
vs. '%ld' format string mismatch.

c++: TARGET_EXPR location in default arg [PR96997]

My r14-6505-g52b4b7d7f5c7c0 change to copy the location in
build_aggr_init_expr reopened PR96997; let's fix it properly this time, by
clearing the location like we do for other trees.

PR c++/96997

gcc/cp/ChangeLog:

* tree.cc (bot_manip): Check data.clear_location for TARGET_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/debug/cleanup2.C: New test.

Revert "testsuite: fix g++.dg/pr112822.C"

This reverts commit d2b269ce30d77dbfc6c28c75887c330d4698b132.

PR modula2/112921 missing modules shortreal shortstr shortconv convstringshort

For completeness here are four SHORTREAL modules which match their
LONGREAL and REAL counterparts. The datatype SHORTREAL is a GNU
extension and these modules were missing.

gcc/m2/ChangeLog:

PR modula2/112921
* gm2-libs-iso/ConvStringShort.def: New file.
* gm2-libs-iso/ConvStringShort.mod: New file.
* gm2-libs-iso/ShortConv.def: New file.
* gm2-libs-iso/ShortConv.mod: New file.
* gm2-libs-iso/ShortMath.def: New file.
* gm2-libs-iso/ShortMath.mod: New file.
* gm2-libs-iso/ShortStr.def: New file.
* gm2-libs-iso/ShortStr.mod: New file.

libgm2/ChangeLog:

PR modula2/112921
* libm2iso/Makefile.am (M2DEFS): Add ConvStringShort.def,
ShortConv.def, ShortMath.def and ShortStr.def.
(M2MODS): Add ConvStringShort.mod,
ShortConv.mod, ShortMath.mod and ShortStr.mod.
* libm2iso/Makefile.in: Regenerate.

gcc/testsuite/ChangeLog:

PR modula2/112921
* gm2/iso/run/pass/shorttest.mod: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

c++: End lifetime of objects in constexpr after destructor call [PR71093]

This patch adds checks for using objects after they've been manually
destroyed via explicit destructor call. Currently this is only
implemented for 'top-level' objects; FIELD_DECLs and individual elements
of arrays will need a lot more work to track correctly and are left for
a future patch.

The other limitation is that destruction of parameter objects is checked
too 'early', happening at the end of the function call rather than the
end of the owning full-expression as they should be for consistency;
see cpp2a/constexpr-lifetime2.C. This is because I wasn't able to find a
good way to link the constructed parameter declarations with the
variable declarations that are actually destroyed later on to propagate
their lifetime status, so I'm leaving this for a later patch.

PR c++/71093

gcc/cp/ChangeLog:

* constexpr.cc (constexpr_global_ctx::get_value_ptr): Don't
return NULL_TREE for objects we're initializing.
(constexpr_global_ctx::destroy_value): Rename from remove_value.
Only mark real variables as outside lifetime.
(constexpr_global_ctx::clear_value): New function.
(destroy_value_checked): New function.
(cxx_eval_call_expression): Defer complaining about non-constant
arg0 for operator delete. Use remove_value_safe.
(cxx_fold_indirect_ref_1): Handle conversion to 'as base' type.
(outside_lifetime_error): Include name of object we're
accessing.
(cxx_eval_store_expression): Handle clobbers. Improve error
messages.
(cxx_eval_constant_expression): Use remove_value_safe. Clear
bind variables before entering body.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-lifetime1.C: Improve error message.
* g++.dg/cpp1y/constexpr-lifetime2.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime3.C: Likewise.
* g++.dg/cpp1y/constexpr-lifetime4.C: Likewise.
* g++.dg/cpp2a/bitfield2.C: Likewise.
* g++.dg/cpp2a/constexpr-new3.C: Likewise. New check.
* g++.dg/cpp1y/constexpr-lifetime7.C: New test.
* g++.dg/cpp2a/constexpr-lifetime1.C: New test.
* g++.dg/cpp2a/constexpr-lifetime2.C: New test.

Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>

c++: fix in-charge parm in constexpr

I was puzzled by the proposed patch for PR71093 specifically ignoring the
in-charge parameter; the problem turned out to be that when
cxx_eval_call_expression jumps from the clone to the cloned function, it
assumes that the latter has the same parameters, and so the in-charge parm
doesn't get an argument. Since a class with vbases can't have constexpr
'tors there isn't actually a need for an in-charge parameter in a
destructor, but we used to use it for deleting destructors and never removed
it. I have a patch to do that for GCC 15, but for now let's work around it.

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Handle missing in-charge
argument.

c++: constant direct-initialization [PR108243]

When testing the proposed patch for PR71093 I noticed that it changed the
diagnostic for consteval-prop6.C. I then noticed that the diagnostic wasn't
very helpful either way; it was complaining about modification of the 'x'
variable, but it's not a problem to initialize a local variable with a
consteval constructor as long as the value is actually constant; we want to
know why the value isn't constant. And then it turned out that this also
fixed a missed-optimization bug in the testsuite.

PR c++/108243

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_outermost_constant_expr): Turn
a constructor CALL_EXPR into a TARGET_EXPR.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/consteval-prop6.C: Adjust diagnostic.
* g++.dg/opt/is_constant_evaluated3.C: Remove xfails.

c++: copy location to AGGR_INIT_EXPR

When building an AGGR_INIT_EXPR from a CALL_EXPR, we shouldn't lose location
information.

gcc/cp/ChangeLog:

* tree.cc (build_aggr_init_expr): Copy EXPR_LOCATION.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-nsdmi7b.C: Adjust line.
* g++.dg/template/copy1.C: Likewise.

testsuite: fix g++.dg/pr112822.C

gcc/testsuite/ChangeLog:

* g++.dg/pr112822.C: Require C++17.

amdgcn: Work around XNACK register allocation problem

The extra register pressure is causing infinite loops in some cases, especially
at -O0. I have not yet observed any issue on devices that have AVGPRs for
spilling, and XNACK is only really useful on those devices anyway, so change
the defaults.

gcc/ChangeLog:

* config/gcn/gcn-hsa.h (NO_XNACK): Change the defaults.
* config/gcn/gcn-opts.h (enum hsaco_attr_type): Add HSACO_ATTR_DEFAULT.
* config/gcn/gcn.cc (gcn_option_override): Set the default flag_xnack.
* config/gcn/gcn.opt: Add -mxnack=default.
* doc/invoke.texi: Document the -mxnack default.

amdgcn: Support XNACK mode

The XNACK feature allows memory load instructions to restart safely following
a page-miss interrupt. This is useful for shared-memory devices, like APUs,
and to implement OpenMP Unified Shared Memory.

To support the feature we must be able to set the appropriate meta-data and
set the load instructions to early-clobber. When the port supports scheduling
of s_waitcnt instructions there will be further requirements.

gcc/ChangeLog:

* config/gcn/gcn-hsa.h (NO_XNACK): Ignore missing -march.
(XNACKOPT): Match on/off; ignore any.
* config/gcn/gcn-valu.md (gather<mode>_insn_1offset<exec>):
Add xnack compatible alternatives.
(gather<mode>_insn_2offsets<exec>): Likewise.
* config/gcn/gcn.cc (gcn_option_override): Permit -mxnack for devices
other than Fiji and gfx1030.
(gcn_expand_epilogue): Remove early-clobber problems.
(gcn_hsa_declare_function_name): Obey -mxnack setting.
* config/gcn/gcn.md (xnack): New attribute.
(enabled): Rework to include "xnack" attribute.
(*movbi): Add xnack compatible alternatives.
(*mov<mode>_insn): Likewise.
(*mov<mode>_insn): Likewise.
(*mov<mode>_insn): Likewise.
(*movti_insn): Likewise.
* config/gcn/gcn.opt (-mxnack): Change the default to "any".
* doc/invoke.texi: Remove placeholder notice for -mxnack.

aarch64 testsuite: Check entire .arch string

Add a terminating newline to various tests, and add missing
extensions to some test strings. The current output is broken for
options_set_4.c, so this test is left unchanged, to be fixed in a
subsequent patch.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/cpunative/native_cpu_18.c: Add \+nopauth\n
* gcc.target/aarch64/options_set_7.c: Add \+crc\n
* gcc.target/aarch64/options_set_8.c: Add \+crc\+nodotprod\n
* gcc.target/aarch64/cpunative/native_cpu_0.c: Add \n
* gcc.target/aarch64/cpunative/native_cpu_1.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_2.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_3.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_4.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_5.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_6.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_7.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_8.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_9.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_10.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_11.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_12.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_13.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_14.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_15.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_16.c: Ditto.
* gcc.target/aarch64/cpunative/native_cpu_17.c: Ditto.
* gcc.target/aarch64/options_set_1.c: Ditto.
* gcc.target/aarch64/options_set_2.c: Ditto.
* gcc.target/aarch64/options_set_3.c: Ditto.
* gcc.target/aarch64/options_set_5.c: Ditto.
* gcc.target/aarch64/options_set_6.c: Ditto.
* gcc.target/aarch64/options_set_9.c: Ditto.
* gcc.target/aarch64/options_set_11.c: Ditto.
* gcc.target/aarch64/options_set_12.c: Ditto.
* gcc.target/aarch64/options_set_13.c: Ditto.
* gcc.target/aarch64/options_set_14.c: Ditto.
* gcc.target/aarch64/options_set_15.c: Ditto.
* gcc.target/aarch64/options_set_16.c: Ditto.
* gcc.target/aarch64/options_set_17.c: Ditto.
* gcc.target/aarch64/options_set_18.c: Ditto.
* gcc.target/aarch64/options_set_19.c: Ditto.
* gcc.target/aarch64/options_set_20.c: Ditto.
* gcc.target/aarch64/options_set_21.c: Ditto.
* gcc.target/aarch64/options_set_22.c: Ditto.
* gcc.target/aarch64/options_set_23.c: Ditto.
* gcc.target/aarch64/options_set_24.c: Ditto.
* gcc.target/aarch64/options_set_25.c: Ditto.
* gcc.target/aarch64/options_set_26.c: Ditto.

aarch64: Add missing driver-aarch64 dependencies

gcc/ChangeLog:

* config/aarch64/x-aarch64: Add missing dependencies.

libgomp: basic pinned memory on Linux

Implement the OpenMP pinned memory trait on Linux hosts using the mlock
syscall. Pinned allocations are performed using mmap, not malloc, to ensure
that they can be unpinned safely when freed.

This implementation will work OK for page-scale allocations, and finer-grained
allocations will be implemented in a future patch.
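
The mmap-plus-mlock approach described above can be sketched as follows.
This is a minimal illustration under Linux/POSIX assumptions, not the
code added to libgomp; pinned_alloc and pinned_free are hypothetical
names.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

// Allocate page-granular anonymous memory and lock it into RAM.
// Returns nullptr if either the mapping or the pinning fails.
void *pinned_alloc(std::size_t size)
{
  void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (p == MAP_FAILED)
    return nullptr;
  if (mlock(p, size) != 0) {  // e.g. the locked-memory limit is too low
    munmap(p, size);
    return nullptr;
  }
  return p;
}

// Unmapping the region also removes its memory lock, so no separate
// munlock call is needed here.
int pinned_free(void *p, std::size_t size)
{
  return munmap(p, size);
}
```

Because the whole mapping is released at once, no pinned page can remain
shared with an unrelated allocation, which is the property that makes
unpinning on free safe.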

libgomp/ChangeLog:

* allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
(omp_init_allocator): Use MEMSPACE_VALIDATE to check pinning.
(omp_aligned_alloc): Add pinning to all MEMSPACE_* calls.
(omp_aligned_calloc): Likewise.
(omp_realloc): Likewise.
(omp_free): Likewise.
* config/linux/allocator.c: New file.
* config/nvptx/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
(MEMSPACE_VALIDATE): Add PIN.
* config/gcn/allocator.c (MEMSPACE_ALLOC): Add PIN.
(MEMSPACE_CALLOC): Add PIN.
(MEMSPACE_REALLOC): Add PIN.
(MEMSPACE_FREE): Add PIN.
* libgomp.texi: Switch pinned trait to supported.
(MEMSPACE_VALIDATE): Add PIN.
* testsuite/libgomp.c/alloc-pinned-1.c: New test.
* testsuite/libgomp.c/alloc-pinned-2.c: New test.
* testsuite/libgomp.c/alloc-pinned-3.c: New test.
* testsuite/libgomp.c/alloc-pinned-4.c: New test.

Co-Authored-By: Thomas Schwinge <thomas@codesourcery.com>

testsuite: Add dg-do compile target c++17 directive for testcase [PR112822]

Add a dg-do compile target directive that restricts the test case to
C++17 or later.

2023-12-13 Peter Bergner <bergner@linux.ibm.com>

gcc/testsuite/
PR tree-optimization/112822
* g++.dg/pr112822.C: Add dg-do compile target c++17 directive.

RISC-V: Refine test cases for both PR112929 and PR112988

Refine the test cases for:

* Name convention.
* Add run case.

These test cases used to cause out-of-bounds writes to the stack
and therefore showed unreliable behavior.  Depending on the
execution environment they can either pass or fail.  As of now,
with the latest QEMU version, they will pass even without the
underlying issue fixed.  As the test case is known to have
caused the problem before, we keep it as a run test case for
future reference.

PR target/112929
PR target/112988

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr112929.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr112929-1.c: ...here.
* gcc.target/riscv/rvv/vsetvl/pr112988.c: Moved to...
* gcc.target/riscv/rvv/vsetvl/pr112988-1.c: ...here.
* gcc.target/riscv/rvv/vsetvl/pr112929-2.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112988-2.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

aarch64 testsuite: Only run aarch64-ssve tests once

gcc/testsuite/ChangeLog:

* g++.target/aarch64/sve/aarch64-ssve.exp:

ARC: Add *extvsi_n_0 define_insn_and_split for PR 110717.

This patch improves the code generated for bitfield sign extensions on
ARC cpus without a barrel shifter.

Compiling the following test case:

int foo(int x) { return (x<<27)>>27; }

with -O2 -mcpu=em, generates two loops:

foo: mov     lp_count,27
        lp      2f
        add     r0,r0,r0
        nop
2:      # end single insn loop
        mov     lp_count,27
        lp      2f
        asr     r0,r0
        nop
2:      # end single insn loop
        j_s     [blink]

and the closely related test case:

struct S { int a : 5; };
int bar (struct S *p) { return p->a; }

generates the slightly better:

bar: ldb_s   r0,[r0]
        mov_s   r2,0    ;3
        add3    r0,r2,r0
        sexb_s  r0,r0
        asr_s   r0,r0
        asr_s   r0,r0
        j_s.d   [blink]
        asr_s   r0,r0

which uses 6 instructions to perform this particular sign extension.
It turns out that sign extensions can always be implemented using at
most three instructions on ARC (without a barrel shifter) using the
idiom ((x&mask)^msb)-msb [as described in section "2-5 Sign Extension"
of Henry Warren's book "Hacker's Delight"].  Using this, the sign
extensions above on ARC's EM both become:

        bmsk_s  r0,r0,4
        xor     r0,r0,16
        sub     r0,r0,16

which takes about 3 cycles, compared to the ~112 cycles for the loops
in foo.
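
The ((x&mask)^msb)-msb idiom can be checked on the host with a small
sketch like the one below.  The function name sign_extend_low is
illustrative only.  The arithmetic is done on unsigned values to avoid
signed overflow, and the final conversion back to a signed type relies
on modular behaviour (guaranteed since C++20, and what GCC does in
earlier modes).

```cpp
#include <cassert>
#include <cstdint>

// Sign-extend the low 'bits' bits of x (1 <= bits <= 31) using
// ((x & mask) ^ msb) - msb.
std::int32_t sign_extend_low(std::int32_t x, unsigned bits)
{
  assert(bits >= 1 && bits <= 31);
  const std::uint32_t msb = std::uint32_t(1) << (bits - 1);  // sign bit of the field
  const std::uint32_t mask = (msb << 1) - 1;                 // low 'bits' bits set
  const std::uint32_t r = ((static_cast<std::uint32_t>(x) & mask) ^ msb) - msb;
  return static_cast<std::int32_t>(r);
}
```

For the 5-bit field used in the examples, mask is 31 and msb is 16,
matching the bmsk_s/xor/sub sequence above.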

2023-12-13  Roger Sayle  <roger@nextmovesoftware.com>
    Jeff Law  <jlaw@ventanamicro.com>

gcc/ChangeLog
* config/arc/arc.md (*extvsi_n_0): New define_insn_and_split to
implement SImode sign extract using an AND, XOR and MINUS sequence.

gcc/testsuite/ChangeLog
* gcc.target/arc/extvsi-1.c: New test case.
* gcc.target/arc/extvsi-2.c: Likewise.

RISC-V: Add crypto vector implied ISA info.

Because the crypto vector extensions depend on the Vector extension,
add the implied ISA info for the corresponding crypto vector extensions.

gcc/ChangeLog:

* common/config/riscv/riscv-common.cc: Modify implied ISA info.
* config/riscv/arch-canonicalize: Add crypto vector implied info.

libstdc++: Fix regression in std::format output of %Y for negative years

The change in r14-6468-ga01462ae8bafa8 was only supposed to apply to %C
formats, not %Y.

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_C_y_Y): Do
not round century down for %Y formats.

gettext: disable install, docs targets, libasprintf, threads

This fixes issues reported by David Edelsohn <dje.gcc@gmail.com>, and by
Eric Gallager <egallager@gcc.gnu.org>.

ChangeLog:

* Makefile.def (gettext): Disable (via missing)
{install-,}{pdf,html,info,dvi} and TAGS targets. Set no_install
to true. Add --disable-threads --disable-libasprintf. Drop the
lib_path (as there are no shared libs).
* Makefile.in: Regenerate.

download_prerequisites: add --only-gettext

contrib/ChangeLog:

* download_prerequisites
<arg parse>: Parse --only-gettext.
(echo_archives): Check only_gettext and stop early if true.
(helptext): Document --only-gettext.

RISC-V: Postpone full available optimization [VSETVL PASS]

Fix a VSETVL bug where the AVL is polluted:

.L15:
        li      a3,9
        lui     a4,%hi(s)
        sw      a3,%lo(j)(t2)
        sh      a5,%lo(s)(a4) <-- a4 holds the address of s
        beq     t0,zero,.L42
        sw      t5,8(t4)
        vsetvli zero,a4,e8,m8,ta,ma  <<--- a4 as avl

Actually, this vsetvl is redundant.
The root cause is that we include the full available optimization in the LCM local data computation.

The full available optimization should run after the LCM computation.

PR target/112929
PR target/112988

gcc/ChangeLog:

* config/riscv/riscv-vsetvl.cc
(pre_vsetvl::compute_lcm_local_properties): Remove full available.
(pre_vsetvl::pre_global_vsetvl_info): Add full available optimization.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/vsetvl/pr112929.c: New test.
* gcc.target/riscv/rvv/vsetvl/pr112988.c: New test.

RISC-V: Fix dynamic lmul tests depended on abi

Some toolchain configs would report:
fatal error: gnu/stubs-ilp32.h: No such file or directory

Fix method suggested by Juzhe-Zhong

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/riscv_vector.h: New file.

Signed-off-by: demin.han <demin.han@starfivetech.com>

Middle-end: Adjust decrement IV style partial vectorization COST model

Before this patch, a simple conversion case produced the following RVV codegen:

foo:
        ble     a2,zero,.L8
        addiw   a5,a2,-1
        li      a4,6
        bleu    a5,a4,.L6
        srliw   a3,a2,3
        slli    a3,a3,3
        add     a3,a3,a0
        mv      a5,a0
        mv      a4,a1
        vsetivli        zero,8,e16,m1,ta,ma
.L4:
        vle8.v  v2,0(a5)
        addi    a5,a5,8
        vzext.vf2       v1,v2
        vse16.v v1,0(a4)
        addi    a4,a4,16
        bne     a3,a5,.L4
        andi    a5,a2,-8
        beq     a2,a5,.L10
.L3:
        slli    a4,a5,32
        srli    a4,a4,32
        subw    a2,a2,a5
        slli    a2,a2,32
        slli    a5,a4,1
        srli    a2,a2,32
        add     a0,a0,a4
        add     a1,a1,a5
        vsetvli zero,a2,e16,m1,ta,ma
        vle8.v  v2,0(a0)
        vzext.vf2       v1,v2
        vse16.v v1,0(a1)
.L8:
        ret
.L10:
        ret
.L6:
        li      a5,0
        j       .L3

The vectorized code goes through the first loop:

        vsetivli        zero,8,e16,m1,ta,ma
.L4:
        vle8.v  v2,0(a5)
        addi    a5,a5,8
        vzext.vf2       v1,v2
        vse16.v v1,0(a4)
        addi    a4,a4,16
        bne     a3,a5,.L4

Each iteration processes 8 elements.

For scalable vectorization this is fine when VLEN = 128 bits.
But on a CPU with VLEN > 128 bits it wastes resources. For example, with VLEN = 256 bits
only half of the vector unit is working and the other half is idle.

After investigation, I realized that I forgot to adjust COST for SELECT_VL.
So, adjust COST for SELECT_VL style length vectorization. We adjust COST from 3 to 2, since
after this patch the codegen is:

foo:
        ble     a2,zero,.L5
.L3:
        vsetvli a5,a2,e16,m1,ta,ma     -----> SELECT_VL cost.
        vle8.v  v2,0(a0)
        slli    a4,a5,1                -----> additional shift of outcome SELECT_VL for memory address calculation.
        vzext.vf2       v1,v2
        sub     a2,a2,a5
        vse16.v v1,0(a1)
        add     a0,a0,a5
        add     a1,a1,a4
        bne     a2,zero,.L3
.L5:
        ret
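
To illustrate why the SELECT_VL loop scales with VLEN while the fixed
8-element loop does not, here is a rough Python model of both loops. All
function names are hypothetical, and the model assumes vsetvli returns
min(AVL, VLMAX); real hardware is allowed to choose a different vl when
the remaining count lies between VLMAX and 2*VLMAX:

```python
def vlmax(vlen_bits: int, sew_bits: int = 16) -> int:
    """Elements per vector register at LMUL=1 (e16,m1 in the listings)."""
    return vlen_bits // sew_bits


def widen_select_vl(src, vlen_bits):
    """Zero-extend u8 -> u16 with a SELECT_VL-style loop.

    Returns (dst, iterations).
    """
    dst = []
    pos, remaining, iterations = 0, len(src), 0
    while remaining != 0:                       # bne a2,zero,.L3
        vl = min(remaining, vlmax(vlen_bits))   # vsetvli a5,a2,... (simplified)
        dst.extend(b & 0xFF for b in src[pos:pos + vl])  # vle8.v/vzext.vf2/vse16.v
        pos += vl                               # add a0,a0,a5
        remaining -= vl                         # sub a2,a2,a5
        iterations += 1
    return dst, iterations


def fixed_main_loop_iterations(n: int, lanes: int = 8) -> int:
    """Trips of the 'before' main loop, which always handles 8 elements."""
    return n // lanes


data = list(range(64))
print(fixed_main_loop_iterations(len(data)))  # 8, regardless of VLEN
print(widen_select_vl(data, 128)[1])          # 8
print(widen_select_vl(data, 256)[1])          # 4
```

Under these assumptions the fixed-width loop needs the same number of
trips on any VLEN, whereas the SELECT_VL loop halves its trip count when
VLEN doubles.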

This patch is a simple fix that I previously forgot to make.

Ok for trunk ?

If not, I am going to adjust cost in backend cost model.

PR target/111317

gcc/ChangeLog:

* tree-vect-loop.cc (vect_estimate_min_profitable_iters): Adjust for COST for decrement IV.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/pr111317.c: New test.

lower-bitint: Fix lowering of non-_BitInt to _BitInt cast merged with some wider cast [PR112940]

The following testcase ICEs, because a PHI argument from latch edge
uses a SSA_NAME set only in a conditionally executed block inside of the
loop.
This happens when we have some outer cast which lowers its operand several
times, under some condition with variable index, under different condition
with some constant index, otherwise something else, and then there is
an inner cast from non-_BitInt integer (or small/middle one).  Such cast
in certain conditions is emitted by initializing some SSA_NAMEs in the
initialization statements before loops (say for casts from <= limb size
precision by computing a SSA_NAME for the first limb and then extension
of it for the later limbs) and uses the prepare_data_in_out function
to create a PHI node.  Such function is passed the value (constant or
SSA_NAME) to use in the PHI argument from the pre-header edge, but for
the latch edge it always created a new SSA_NAME and then caller emitted
in the following 3 spots an extra assignment to set that SSA_NAME to
whatever value we want from the latch edge.  In all these 3 cases
the argument from the latch edge is known already before the loop though,
either constant or SSA_NAME computed in pre-header as well.
But the need to emit an assignment combined with the handle_operand done
in a conditional basic block results in the SSA verification failure.

The following patch fixes it by extending the prepare_data_in_out method,
so that when the latch edge argument is known before (constant or computed
in pre-header), we can just use it directly and avoid the extra assignment
that would normally be hopefully optimized away later to what we now emit
directly.

2023-12-13  Jakub Jelinek  <jakub@redhat.com>

PR tree-optimization/112940
* gimple-lower-bitint.cc (struct bitint_large_huge): Add another
argument to prepare_data_in_out method defaulted to NULL_TREE.
(bitint_large_huge::handle_operand): Pass another argument to
prepare_data_in_out instead of emitting an assignment to set it.
(bitint_large_huge::prepare_data_in_out): Add VAL_OUT argument.
If non-NULL, use it as PHI argument instead of creating a new
SSA_NAME.
(bitint_large_huge::handle_cast): Pass rext as another argument
to 2 prepare_data_in_out calls instead of emitting assignments
to set them.

* gcc.dg/bitint-53.c: New test.

attribs: Fix valgrind failures on -Wno-attributes* tests [PR112953]

The r14-6076 change changed the allocation of attribute tables from
table = new attribute_spec[2];
to
table = new attribute_spec { ... };
with
ignored_attributes_table.safe_push (table);
later in both cases, but didn't change the corresponding delete in
free_attr_data, which means valgrind is unhappy about that:
FAIL: c-c++-common/Wno-attributes-2.c  -Wc++-compat  (test for excess errors)
Excess errors:
==974681== Mismatched free() / delete / delete []
==974681==    at 0x484965B: operator delete[](void*) (vg_replace_malloc.c:1103)
==974681==    by 0x707434: free_attr_data() (attribs.cc:318)
==974681==    by 0xCFF8A4: compile_file() (toplev.cc:454)
==974681==    by 0x704D23: do_compile (toplev.cc:2150)
==974681==    by 0x704D23: toplev::main(int, char**) (toplev.cc:2306)
==974681==    by 0x7064BA: main (main.cc:39)
==974681==  Address 0x51dffa0 is 0 bytes inside a block of size 40 alloc'd
==974681==    at 0x4845FF5: operator new(unsigned long) (vg_replace_malloc.c:422)
==974681==    by 0x70A040: handle_ignored_attributes_option(vec<char*, va_heap, vl_ptr>*) (attribs.cc:301)
==974681==    by 0x7FA089: handle_pragma_diagnostic_impl<false, false> (c-pragma.cc:934)
==974681==    by 0x7FA089: handle_pragma_diagnostic(cpp_reader*) (c-pragma.cc:1028)
==974681==    by 0x75814F: c_parser_pragma(c_parser*, pragma_context, bool*) (c-parser.cc:14707)
==974681==    by 0x784A85: c_parser_external_declaration(c_parser*) (c-parser.cc:2027)
==974681==    by 0x785223: c_parser_translation_unit (c-parser.cc:1900)
==974681==    by 0x785223: c_parse_file() (c-parser.cc:26713)
==974681==    by 0x7F6331: c_common_parse_file() (c-opts.cc:1301)
==974681==    by 0xCFF87D: compile_file() (toplev.cc:446)
==974681==    by 0x704D23: toplev::main(int, char**) (toplev.cc:2306)
==974681==    by 0x7064BA: main (main.cc:39)

2023-12-13  Jakub Jelinek  <jakub@redhat.com>

PR middle-end/112953
* attribs.cc (free_attr_data): Use delete x rather than delete[] x.

i386: Fix ICE on __builtin_ia32_pabsd128 without lhs [PR112962]

The following patch fixes ICE on the testcase in similar way to how
other folded builtins are handled in ix86_gimple_fold_builtin when
they don't have a lhs; these builtins are const or pure, so normally
DCE would remove them later, but with -O0 that isn't guaranteed to
happen, and during expansion if they are marked TREE_SIDE_EFFECTS
it might still be attempted to be expanded.
This removes them right away during the folding.

Initially I wanted to also change all gsi_replace last args in that function
to true, but Andrew pointed to PR107209, so I've kept them as is.

2023-12-13 Jakub Jelinek <jakub@redhat.com>

PR target/112962
* config/i386/i386.cc (ix86_gimple_fold_builtin): For shifts
and abs without lhs replace with nop.

* gcc.target/i386/pr112962.c: New test.

Avoid losing MEM_REF offset in MEM_EXPR adjustment for stack slot sharing

When investigating PR111591 with respect to TBAA and stack slot sharing
I noticed we're eventually scrapping a [TARGET_]MEM_REF offset when
rewriting the VAR_DECL base of the MEM_EXPR to use a pointer to the
partition instead. The following makes sure to preserve that.

* emit-rtl.cc (set_mem_attributes_minus_bitpos): Preserve
the offset when rewriting an existing MEM_REF base for
stack slot sharing.

tree-optimization/112991 - re-do PR112961 fix

The following does away with the fake edge adding as in the original
PR112961 fix and instead exposes handling of entry PHIs as additional
parameter of the region VN run.

PR tree-optimization/112991
PR tree-optimization/112961
* tree-ssa-sccvn.h (do_rpo_vn): Add skip_entry_phis argument.
* tree-ssa-sccvn.cc (do_rpo_vn): Likewise.
(do_rpo_vn_1): Likewise, merge with auto-processing.
(run_rpo_vn): Adjust.
(pass_fre::execute): Likewise.
* tree-if-conv.cc (tree_if_conversion): Revert last change.
Value-number latch block but disable value-numbering of
entry PHIs.
* tree-ssa-uninit.cc (execute_early_warn_uninitialized): Adjust.

* gcc.dg/torture/pr112991.c: New testcase.

tree-optimization/112990 - unsupported VEC_PERM from match pattern

The following avoids creating an unsupported VEC_PERM after vector
lowering from the pattern merging a bit-insert from a bit-field-ref
to a VEC_PERM. For the already existing s390 testcase we get
TImode vectors which later ICE during attempted expansion of
a vec_perm_const.

PR tree-optimization/112990
* match.pd (bit_insert @0 (BIT_FIELD_REF @1 ..) ..):
Restrict to vector modes after lowering.

middle-end/111591 - explain why TBAA doesn't need adjustment

While tidying the prototype patch I've done for the reduced testcase
in PR111591 and in that process trying to produce a testcase that
is miscompiled by stack slot coalescing and the TBAA info that
remains un-altered I've realized we do not need to adjust TBAA info.

The following documents this in the place we adjust points-to info
which we do need to adjust.

PR middle-end/111591
* cfgexpand.cc (update_alias_info_with_stack_vars): Document
why not adjusting TBAA info on accesses is OK.

multiflags: fix doc warning properly

Rather than a dubious fix for a dubious warning, namely adding a
period after a parenthesized @xref because the warning demands it, use
@pxref that is meant for exactly this case. Thanks to Joseph Myers
for introducing me to it.

for gcc/ChangeLog

* doc/invoke.texi (multiflags): Drop extraneous period, use
@pxref instead.

aarch64: Implement the ACLE instruction/data prefetch functions.

Implement the ACLE data and instruction prefetch functions[1] with the
following signatures:

  1. Data prefetch intrinsics:
  ----------------------------
  void __pldx (/*constant*/ unsigned int /*access_kind*/,
               /*constant*/ unsigned int /*cache_level*/,
               /*constant*/ unsigned int /*retention_policy*/,
               void const volatile *addr);

  void __pld (void const volatile *addr);

  2. Instruction prefetch intrinsics:
  -----------------------------------
  void __plix (/*constant*/ unsigned int /*cache_level*/,
               /*constant*/ unsigned int /*retention_policy*/,
               void const volatile *addr);

  void __pli (void const volatile *addr);

`__pldx' affords the programmer more fine-grained control over the
data prefetch behaviour than the analogous GCC builtin
`__builtin_prefetch', and allows access to the "SLC" cache level.

While `__builtin_prefetch' chooses both cache-level and retention
policy automatically via the optional `locality' parameter, `__pldx'
expects 2 (mandatory) arguments to explicitly define the desired
cache-level and retention policies.

`__plix' on the other hand, generates a code prefetch instruction and
so extends functionality on aarch64 targets beyond that which is
exposed by `__builtin_prefetch'.

`__pld' and `__pli' do prefetch of data and instructions,
respectively, using default values for both cache-level and retention
policies.

Bootstrapped and tested on aarch64-none-linux-gnu.

[1] https://arm-software.github.io/acle/main/acle.html#memory-prefetch-intrinsics

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.cc:
(AARCH64_PLD): New enum aarch64_builtins entry.
(AARCH64_PLDX): Likewise.
(AARCH64_PLI): Likewise.
(AARCH64_PLIX): Likewise.
(aarch64_init_prefetch_builtin): New.
(aarch64_general_init_builtins): Call prefetch init function.
(aarch64_expand_prefetch_builtin): New.
(aarch64_general_expand_builtin):  Add prefetch expansion.
(require_const_argument): New.
* config/aarch64/aarch64.md (UNSPEC_PLDX): New.
(aarch64_pldx): Likewise.
* config/aarch64/arm_acle.h (__pld): Likewise.
(__pli): Likewise.
(__plix): Likewise.
(__pldx): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/builtin_pld_pli.c: New.
* gcc.target/aarch64/builtin_pld_pli_illegal.c: New.

range: Workaround different type precision between _Float128 and long double [PR112788]

As PR112788 shows, on rs6000 with -mabi=ieeelongdouble the type _Float128
has a different type precision (128) from that of type long double
(127), but they actually have the same underlying mode, so they should
have the same precision, as the mode indicates the same real type format
ieee_quad_format.

It's not sensible to have two such types with the same mode but
different type precisions; a fix attempt was posted at [1].
As discussed there, there are some historical reasons and
practical issues. Considering that we are past stage 1 and that this also
affected the build as reported, this patch temporarily works around
it. I thought about introducing a hookpod, but that seems a bit overkill;
assuming that scalar float types with the same mode have the same
precision looks sensible.

[1] https://inbox.sourceware.org/gcc-patches/718677e7-614d-7977-312d-05a75e1fd5b4@linux.ibm.com/

PR tree-optimization/112788

gcc/ChangeLog:

* value-range.h (range_compatible_p): Workaround same type mode but
different type precision issue for rs6000 scalar float types
_Float128 and long double.

i386: Fix PR110790 testcase

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110790-2.c: Change scan-assembler from shrq
to shr\[qx\].

rs6000: using pli for constant splitting

For constant building e.g. r120=0x66666666, which does not fit 'li' or 'lis',
'pli' is used to build this constant via 'emit_move_insn'.

While for a complicated constant, e.g. 0x6666666666666666ULL, when using
'rs6000_emit_set_long_const' to split the constant recursively, it fails to
use 'pli' to build the half part constant: 0x66666666.

'rs6000_emit_set_long_const' could be updated to use 'pli' to build half
part of the constant when necessary. For example: 0x6666666666666666ULL,
"pli 3,1717986918; rldimi 3,3,32,0" can be used.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add code to use
pli for 34bit constant.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/const-build-1.c: New test.
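
As a rough illustration of the example in this commit message, the Python
sketch below models only the case where both 32-bit halves of the
constant are equal. The function names are hypothetical, and the real
rs6000_emit_set_long_const handles many more cases than this:

```python
def rldimi_32_0(dest: int, src: int) -> int:
    """Model of `rldimi dest,src,32,0`.

    Inserts the low 32 bits of src into the high 32 bits of dest and
    keeps the low 32 bits of dest.
    """
    return ((src << 32) & 0xFFFFFFFF00000000) | (dest & 0xFFFFFFFF)


def build_repeated_halves(value: int):
    """Return (instructions, result) if both halves match, else None."""
    low = value & 0xFFFFFFFF
    high = (value >> 32) & 0xFFFFFFFF
    if low != high:
        return None
    # Any 32-bit unsigned value fits the 34-bit signed immediate of pli.
    reg = low                    # pli 3,<low>
    reg = rldimi_32_0(reg, reg)  # rldimi 3,3,32,0
    return [f"pli 3,{low}", "rldimi 3,3,32,0"], reg


insns, result = build_repeated_halves(0x6666666666666666)
print("; ".join(insns))  # pli 3,1717986918; rldimi 3,3,32,0
print(hex(result))       # 0x6666666666666666
```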

rs6000: accurate num_insns_constant_gpr

Trunk GCC can now build more constants with two instructions,
e.g. "li/lis; xori/xoris/rldicl/rldicr/rldic",
so num_insns_constant should be updated accordingly.

Function "rs6000_emit_set_long_const" is used to build complicated
constants, and "num_insns_constant_gpr" is used to compute how
many instructions are needed to build the constant. So, these
two functions should be aligned.

The idea of this patch is to reuse "rs6000_emit_set_long_const" to
compute and record the number of instructions (when only the count is
being computed, no instructions are emitted).
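
The design can be sketched with a toy Python model in which one routine
both emits and counts, so the two can never disagree. The names set_const
and num_insns are hypothetical, and the model covers only constants that
fit "li" or "lis; ori"; it is not the logic of the real rs6000 functions:

```python
from typing import Callable, Optional


def set_const(value: int, emit: Optional[Callable[[str], None]] = None) -> int:
    """Toy constant builder.

    Returns the instruction count; instructions are passed to `emit`
    only when a callback is supplied.
    """
    count = 0

    def step(insn: str) -> None:
        nonlocal count
        count += 1
        if emit is not None:
            emit(insn)

    if -(1 << 15) <= value < (1 << 15):
        step(f"li 3,{value}")
    elif -(1 << 31) <= value < (1 << 31):
        step(f"lis 3,{value >> 16}")
        if value & 0xFFFF:
            step(f"ori 3,3,{value & 0xFFFF}")
    else:
        raise NotImplementedError("wider constants are outside this toy model")
    return count


def num_insns(value: int) -> int:
    """Count instructions by running the builder without an emit callback."""
    return set_const(value)


emitted = []
print(set_const(0x12345678, emitted.append))  # 2
print(emitted)                                # ['lis 3,4660', 'ori 3,3,22136']
print(num_insns(0x12345678))                  # 2
```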

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_set_long_const): Add new
parameter to record number of instructions to build the constant.
(num_insns_constant_gpr): Call rs6000_emit_set_long_const to compute
num_insn.

Daily bump.

c++: class hotness attribute and member template

The FUNCTION_DECL check ignored member function templates.

gcc/cp/ChangeLog:

* class.cc (propagate_class_warmth_attribute): Handle
member templates.

gcc/testsuite/ChangeLog:

* g++.dg/ext/attr-hotness.C: Add member templates.

Co-authored-by: Jason Xu <rxu@DRWHoldings.com>

RISC-V: Apply vla vs. vls mode heuristic vector COST model

This patch applies the vla vs. vls mode heuristic, which fixes the following FAILs:
FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize
scan-assembler-not vset
FAIL: gcc.target/riscv/rvv/autovec/pr111751.c -O3 -ftree-vectorize
scan-assembler-times li\\s+[a-x0-9]+,0\\s+ret 2

The root cause of these FAILs is that we failed to pick VLS mode for the vectorization.

Before this patch:

foo2:
        addi    sp,sp,-208
        addi    a2,sp,64
        addi    a5,sp,128
        lui     a6,%hi(.LANCHOR0)
        sd      ra,200(sp)
        addi    a6,a6,%lo(.LANCHOR0)
        mv      a0,a2
        mv      a1,a5
        li      a3,16
        mv      a4,sp
        vsetivli        zero,8,e64,m8,ta,ma
        vle64.v v8,0(a6)
        vse64.v v8,0(a2)
        vse64.v v8,0(a5)
.L4:
        vsetvli a5,a3,e32,m1,ta,ma
        slli    a2,a5,2
        vle32.v v2,0(a1)
        vle32.v v1,0(a0)
        sub     a3,a3,a5
        vadd.vv v1,v1,v2
        vse32.v v1,0(a4)
        add     a1,a1,a2
        add     a0,a0,a2
        add     a4,a4,a2
        bne     a3,zero,.L4
        lw      a4,128(sp)
        lw      a5,64(sp)
        addw    a5,a5,a4
        lw      a4,0(sp)
        bne     a4,a5,.L5
        lw      a4,132(sp)
        lw      a5,68(sp)
        addw    a5,a5,a4
        lw      a4,4(sp)
        bne     a4,a5,.L5
        lw      a4,136(sp)
        lw      a5,72(sp)
        addw    a5,a5,a4
        lw      a4,8(sp)
        bne     a4,a5,.L5
        lw      a4,140(sp)
        lw      a5,76(sp)
        addw    a5,a5,a4
        lw      a4,12(sp)
        bne     a4,a5,.L5
        lw      a4,144(sp)
        lw      a5,80(sp)
        addw    a5,a5,a4
        lw      a4,16(sp)
        bne     a4,a5,.L5
        lw      a4,148(sp)
        lw      a5,84(sp)
        addw    a5,a5,a4
        lw      a4,20(sp)
        bne     a4,a5,.L5
        lw      a4,152(sp)
        lw      a5,88(sp)
        addw    a5,a5,a4
        lw      a4,24(sp)
        bne     a4,a5,.L5
        lw      a4,156(sp)
        lw      a5,92(sp)
        addw    a5,a5,a4
        lw      a4,28(sp)
        bne     a4,a5,.L5
        lw      a4,160(sp)
        lw      a5,96(sp)
        addw    a5,a5,a4
        lw      a4,32(sp)
        bne     a4,a5,.L5
        lw      a4,164(sp)
        lw      a5,100(sp)
        addw    a5,a5,a4
        lw      a4,36(sp)
        bne     a4,a5,.L5
        lw      a4,168(sp)
        lw      a5,104(sp)
        addw    a5,a5,a4
        lw      a4,40(sp)
        bne     a4,a5,.L5
        lw      a4,172(sp)
        lw      a5,108(sp)
        addw    a5,a5,a4
        lw      a4,44(sp)
        bne     a4,a5,.L5
        lw      a4,176(sp)
        lw      a5,112(sp)
        addw    a5,a5,a4
        lw      a4,48(sp)
        bne     a4,a5,.L5
        lw      a4,180(sp)
        lw      a5,116(sp)
        addw    a5,a5,a4
        lw      a4,52(sp)
        bne     a4,a5,.L5
        lw      a4,184(sp)
        lw      a5,120(sp)
        addw    a5,a5,a4
        lw      a4,56(sp)
        bne     a4,a5,.L5
        lw      a4,188(sp)
        lw      a5,124(sp)
        addw    a5,a5,a4
        lw      a4,60(sp)
        bne     a4,a5,.L5
        ld      ra,200(sp)
        li      a0,0
        addi    sp,sp,208
        jr      ra
.L5:
        call    abort

After this patch:

        li      a0,0
        ret

The heuristic leverages the ARM SVE one; it is fully tested, and we confirmed that we have the same
behavior as ARM SVE GCC and RVV Clang.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (costs::analyze_loop_vinfo): New function.
(costs::record_potential_vls_unrolling): Ditto.
(costs::prefer_unrolled_loop): Ditto.
(costs::better_main_loop_than_p): Ditto.
(costs::add_stmt_cost): Ditto.
* config/riscv/riscv-vector-costs.h (enum cost_type_enum): New enum.
* config/riscv/t-riscv: Add new include files.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/pr111313.c: Adapt test.
* gcc.target/riscv/rvv/autovec/vls/shift-3.c: Ditto.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-1.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-10.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-11.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-12.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-2.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-3.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-4.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-5.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-6.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-7.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-8.c: New test.
* gcc.dg/vect/costmodel/riscv/rvv/vla_vs_vls-9.c: New test.

RISC-V: Refactor Dynamic LMUL codes

This patch refactors dynamic LMUL to remove the following variable:
static hash_map<class loop *, autovec_info> loop_autovec_infos;

which keeps growing on-the-fly.

gcc/ChangeLog:

* config/riscv/riscv-vector-costs.cc (get_current_lmul): Remove it.
(compute_estimated_lmul): New function.
(costs::costs): Refactor.
(costs::preferred_new_lmul_p): Ditto.
(preferred_new_lmul_p): Ditto.
(costs::better_main_loop_than_p): Ditto.
* config/riscv/riscv-vector-costs.h (struct autovec_info): Remove it.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul-mixed-1.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-3.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul1-6.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-1.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-2.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-3.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-4.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul2-6.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-1.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-2.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-3.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-4.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-6.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-7.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-8.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul4-9.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-1.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-10.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-2.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-3.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-4.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-5.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-6.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-7.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-8.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/dynamic-lmul8-9.c: Adapt test.
* gcc.dg/vect/costmodel/riscv/rvv/pr111848.c: Adapt test.

testsuite: Add testcase for already fixed PR [PR112822]

Adding a testcase for PR112822 to ensure we won't regress.

2023-12-12 Peter Bergner <bergner@linux.ibm.com>

gcc/testsuite/
PR tree-optimization/112822
* g++.dg/pr112822.C: New test.

libstdc++: Fix std::format("{}", 'c')

When I added a fast path for std::format("{}", x) in
r14-5587-g41a5ea4cab2c59 I forgot to handle char separately from other
integral types. That caused std::format("{}", 'c') to return "99"
instead of "c".

libstdc++-v3/ChangeLog:

* include/std/format (__do_vformat_to): Handle char separately
from other integral types.
* testsuite/std/format/functions/format.cc: Check for expected
output for char and bool arguments.
* testsuite/std/format/string.cc: Check that 0 filling is
rejected for character and string formats.

libstdc++: Fix std::format output of %C for negative years

During discussion of LWG 4022 I noticed that we do not correctly
implement floored division for the century. We were just truncating
towards zero, rather than applying the floor function. For negative
values that rounds the wrong way.
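
The difference is easy to see in a small Python sketch (the function
names are illustrative only and are not the libstdc++ implementation):

```python
def century_truncated(year: int) -> int:
    """Divide by 100, truncating towards zero (the old, wrong behaviour)."""
    q = abs(year) // 100
    return q if year >= 0 else -q


def century_floored(year: int) -> int:
    """Divide by 100 with floored division, as %C requires."""
    return year // 100  # Python's // already rounds towards negative infinity


for year in (1999, -1, -100, -101):
    print(year, century_truncated(year), century_floored(year))
# 1999 19 19
# -1 0 -1
# -100 -1 -1
# -101 -1 -2
```

The two results differ only for negative years that are not exact
multiples of 100.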

libstdc++-v3/ChangeLog:

* include/bits/chrono_io.h (__formatter_chrono::_M_C_y_Y): Fix
rounding for negative centuries.
* testsuite/std/time/year/io.cc: Check %C for negative years.

libstdc++: Remove redundant -std flags from Makefile

In r14-4060-gc4baeaecbbf7d0 I moved some files from src/c++98 to
src/c++11 but I didn't remove the redundant -std=gnu++11 flags for those
files. The flags aren't needed now, because AM_CXXFLAGS for that
directory already uses -std=gnu++11. This removes them.

libstdc++-v3/ChangeLog:

* src/c++11/Makefile.am: Remove redundant -std=gnu++11 flags.
* src/c++11/Makefile.in: Regenerate.

SRA: Force gimple operand in an additional corner case (PR 112822)

PR 112822 revealed a corner case in load_assign_lhs_subreplacements
where it creates invalid gimple: an assignment where on the LHS there
is a complex variable which however is not a gimple register because
it has partial defs and on the right hand side there is a
VIEW_CONVERT_EXPR. This patch invokes force_gimple_operand_gsi on
such statements (like it already does when both sides of a generated
assignment have partial definitions).

gcc/ChangeLog:

2023-12-12 Martin Jambor <mjambor@suse.cz>

PR tree-optimization/112822
* tree-sra.cc (load_assign_lhs_subreplacements): Invoke
force_gimple_operand_gsi also when LHS has partial stores and RHS is a
VIEW_CONVERT_EXPR.

PR modula2/112984 Compiling program with -Wpedantic shows warning in libraries

This patch tidies up the library modules so that -Wpedantic does not
generate any warnings (apart from two procedures with legitimate infinite
loops).

gcc/m2/ChangeLog:

PR modula2/112984
* gm2-libs-coroutines/SYSTEM.mod: Remove redundant import of memcpy.
* gm2-libs-iso/ClientSocket.mod: Remove redundant import of IOConsts.
* gm2-libs-iso/IOChan.mod: Remove redundant import of IOConsts.
* gm2-libs-iso/IOLink.mod: Remove redundant import of IOChan and SYSTEM.
* gm2-libs-iso/IOResult.mod: Remove redundant import of IOChan.
* gm2-libs-iso/LongIO.mod: Remove redundant import of writeString.
* gm2-libs-iso/LongWholeIO.mod: Remove redundant import of IOChan.
* gm2-libs-iso/M2RTS.mod: Remove redundant import of ADDRESS.
* gm2-libs-iso/MemStream.mod: Remove redundant import of ADDRESS.
* gm2-libs-iso/RTdata.mod: Remove redundant import of DeviceTablePtr.
* gm2-libs-iso/RTfio.mod: Remove redundant import of DeviceTablePtr.
* gm2-libs-iso/RTgen.mod: Remove redundant import of DeviceTablePtr.
* gm2-libs-iso/RealIO.mod: Remove redundant import of writeString.
* gm2-libs-iso/RndFile.mod: Remove redundant import of SYSTEM.
* gm2-libs-iso/SYSTEM.mod: Remove redundant import of memcpy.
* gm2-libs-iso/ShortWholeIO.mod: Remove redundant import of IOConsts.
* gm2-libs-iso/TextIO.mod: Remove redundant import of IOChan.
* gm2-libs-iso/TextUtil.mod: Remove redundant import of IOChan.
* gm2-libs-iso/WholeIO.mod: Remove redundant import of IOChan.
* gm2-libs-log/BitByteOps.mod: Remove redundant import of BYTE.
* gm2-libs-log/FileSystem.mod: Remove redundant import of BYTE and ADDRESS.
* gm2-libs-log/InOut.mod: Remove redundant import of String.
* gm2-libs-log/RealConversions.mod: Remove redundant import of StringToLongreal.
* gm2-libs/FIO.mod: Remove redundant import of SIZE.
* gm2-libs/FormatStrings.mod: Remove redundant import of String
and ConCatChar.
* gm2-libs/IO.mod: Remove redundant import of SIZE.
* gm2-libs/Indexing.mod: Remove redundant import of ADDRESS.
* gm2-libs/M2Dependent.mod: Remove redundant import of SIZE.
* gm2-libs/M2RTS.mod: Remove redundant import of ADDRESS.
* gm2-libs/OptLib.mod: Remove redundant import of DynamicStrings.
* gm2-libs/SYSTEM.mod: Remove redundant import of memcpy.
* gm2-libs/StringConvert.mod: Remove redundant import of String.

libgm2/ChangeLog:

* libm2iso/Makefile.am (libm2iso_la_M2FLAGS): Added line breaks.
* libm2iso/Makefile.in: Regenerate.
* libm2log/Makefile.am (libm2log_la_M2FLAGS): Added line breaks.
* libm2log/Makefile.in: Regenerate.
* libm2pim/Makefile.am (libm2pim_la_M2FLAGS): Added line breaks.
* libm2pim/Makefile.in: Regenerate.

gcc/testsuite/ChangeLog:

PR modula2/112984
* gm2/switches/pedantic/pass/hello.mod: New test.
* gm2/switches/pedantic/pass/switches-pedantic-pass.exp: New test.

Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>

LoongArch: testsuite: Remove XFAIL in vect-ftint-no-inexact.c

After r14-6455 this no longer fails.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/vect-ftint-no-inexact.c (xfail): Remove.

testsuite: fix is_nothrow_default_constructible8.C

This testcase uses variable templates, a C++14 feature.

gcc/testsuite/ChangeLog:

* g++.dg/ext/is_nothrow_constructible8.C: Require C++14.

tree: add to clobber_kind

In discussion of PR71093 it came up that more clobber_kind options would be
useful within the C++ front-end.

gcc/ChangeLog:

* tree-core.h (enum clobber_kind): Rename CLOBBER_EOL to
CLOBBER_STORAGE_END. Add CLOBBER_STORAGE_BEGIN,
CLOBBER_OBJECT_BEGIN, CLOBBER_OBJECT_END.
* gimple-lower-bitint.cc
* gimple-ssa-warn-access.cc
* gimplify.cc
* tree-inline.cc
* tree-ssa-ccp.cc: Adjust for rename.
* tree-pretty-print.cc: And handle new values.

gcc/cp/ChangeLog:

* call.cc (build_trivial_dtor_call): Use CLOBBER_OBJECT_END.
* decl.cc (build_clobber_this): Take clobber_kind argument.
(start_preparsed_function): Pass CLOBBER_OBJECT_BEGIN.
(begin_destructor_body): Pass CLOBBER_OBJECT_END.

gcc/testsuite/ChangeLog:

* gcc.dg/pr87052.c: Adjust expected CLOBBER output.

Co-authored-by: Nathaniel Shead <nathanieloshead@gmail.com>

aarch64,arm: Fix branch-protection= parsing

Refactor the parsing to have a single API and fix a few parsing issues:

- Different handling of "bti+none" and "none+bti": these should be
  rejected because "none" can only appear alone.

- Empty tokens were accepted, as in "bti++pac-ret" or "bti+"; this bug
  was caused by using strtok_r.

- Memory got leaked (str_root was never freed). And two buffers got
  allocated when one is enough.

The callbacks now have no failure mode, only parsing can fail and
all failures are handled locally.  The "-mbranch-protection=" vs
"target("branch-protection=")" difference in the error message is
handled by a separate argument to aarch_validate_mbranch_protection.
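
The two parsing rules above (no empty tokens, "none" only on its own) can
be sketched in Python as follows. The function names are hypothetical,
and the sketch does not validate the protection names themselves or model
subtypes, which the real parser also has to do:

```python
def split_protection(spec: str) -> list:
    """Split a branch-protection string on '+' and apply the two rules."""
    tokens = spec.split("+")  # unlike strtok_r, split keeps empty fields
    if any(tok == "" for tok in tokens):
        raise ValueError(f"empty token in {spec!r}")
    if "none" in tokens and len(tokens) > 1:
        raise ValueError("'none' must appear alone")
    return tokens


def is_valid_protection(spec: str) -> bool:
    """True if spec passes both rules."""
    try:
        split_protection(spec)
    except ValueError:
        return False
    return True


for spec in ("none", "bti+pac-ret", "bti+none", "none+bti", "bti++pac-ret", "bti+"):
    print(repr(spec), is_valid_protection(spec))
```

Splitting on the separator keeps empty fields visible, so "bti+" and
"bti++pac-ret" can be rejected explicitly instead of being silently
accepted.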

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options): Update.
(aarch64_handle_attr_branch_protection): Update.
* config/arm/aarch-common-protos.h (aarch_parse_branch_protection):
Remove.
(aarch_validate_mbranch_protection): Add new argument.
* config/arm/aarch-common.cc (aarch_handle_no_branch_protection):
Update.
(aarch_handle_standard_branch_protection): Update.
(aarch_handle_pac_ret_protection): Update.
(aarch_handle_pac_ret_leaf): Update.
(aarch_handle_pac_ret_b_key): Update.
(aarch_handle_bti_protection): Update.
(aarch_parse_branch_protection): Remove.
(next_tok): New.
(aarch_validate_mbranch_protection): Rewrite.
* config/arm/aarch-common.h (struct aarch_branch_protect_type):
Add field "alone".
* config/arm/arm.cc (arm_configure_build_target): Update.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/branch-protection-attr.c: Update.
* gcc.target/aarch64/branch-protection-option.c: Update.

aarch64,arm: Remove accepted_branch_protection_string

On aarch64 this caused an ICE with pragma push_options since

  commit ae54c1b09963779c5c3914782324ff48af32e2f1
  Author:     Wilco Dijkstra <wilco.dijkstra@arm.com>
  CommitDate: 2022-06-01 18:13:57 +0100

  AArch64: Cleanup option processing code

The failure is at pop_options:

internal compiler error: ‘global_options’ are modified in local context

On arm the variable was unused.

gcc/ChangeLog:

* config/aarch64/aarch64.cc (aarch64_override_options_after_change_1):
Do not override branch_protection options.
(aarch64_override_options): Remove accepted_branch_protection_string.
* config/arm/aarch-common.cc (BRANCH_PROTECT_STR_MAX): Remove.
(aarch_parse_branch_protection): Remove
accepted_branch_protection_string.
* config/arm/arm.cc: Likewise.

tree-optimization/112736 - avoid overread with non-grouped SLP load

The following avoids over/under-read of storage when vectorizing
a non-grouped load with SLP.  Instead of forcing peeling for gaps
use a smaller load for the last vector which might access excess
elements.  This builds upon the existing optimization avoiding
peeling for gaps, generalizing it to all gap widths leaving a
power-of-two remaining number of elements (but it doesn't replace
or improve that particular case at this point).

I wonder if the poly relational compares I set up are good enough
to guarantee /* remain should now be > 0 and < nunits.  */.

There is existing test coverage that runs into /* DR will be unused.  */
always when the gap is wider than nunits.  Compared to the
existing gap == nunits/2 case this only adjusts the load that will
cause the overrun at the end, not every load.  Apart from the
poly relational compares it should reliably cover these cases but
I'll leave it for stage1 to remove.
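
The cases described above can be modelled with plain integers.  The
sketch below is a simplified illustration, not the logic in
vectorizable_load: it ignores poly_int sizes and alignment, and the
names load_kind, load_plan and plan_vector_load are hypothetical.
Given the total element count, the trailing gap, and the vector width,
it decides what the load at a given vector index should do.

```c
#include <assert.h>
#include <stdbool.h>

enum load_kind
{
  LOAD_FULL,     /* All lanes are elements the scalar loop accesses.  */
  LOAD_SHORT,    /* Last vector: load only the remaining elements.  */
  LOAD_UNUSED,   /* Vector lies entirely within the gap.  */
  LOAD_FALLBACK  /* Remainder is not a power of two; the model keeps the
                    full-width load, so peeling for gaps is still needed.  */
};

struct load_plan
{
  enum load_kind kind;
  unsigned elems;   /* Number of elements the load would read.  */
};

static bool
is_pow2 (unsigned x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Plan the load for vector number IDX when TOTAL elements are covered by
   vectors of NUNITS lanes and the last GAP elements are never accessed.  */
struct load_plan
plan_vector_load (unsigned total, unsigned gap, unsigned nunits,
                  unsigned idx)
{
  assert (nunits > 0 && gap <= total);

  unsigned needed = total - gap;   /* Elements actually accessed.  */
  unsigned start = idx * nunits;   /* First element of this vector.  */
  struct load_plan plan = { LOAD_FULL, nunits };

  if (start >= needed)
    {
      plan.kind = LOAD_UNUSED;
      plan.elems = 0;
    }
  else if (needed - start < nunits)
    {
      unsigned remain = needed - start;   /* > 0 and < nunits.  */
      if (is_pow2 (remain))
        {
          plan.kind = LOAD_SHORT;
          plan.elems = remain;
        }
      else
        plan.kind = LOAD_FALLBACK;
    }
  return plan;
}
```

With total = 8, gap = 2 and nunits = 4, vector 0 is a full load and
vector 1 becomes a two-element load.  With gap = 6 the second vector is
unused, and with gap = 5 the remainder of three elements is not a power
of two, so the model falls back.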

PR tree-optimization/112736
* tree-vect-stmts.cc (vectorizable_load): Extend optimization
to avoid peeling for gaps to handle single-element non-groups
we now allow with SLP.

* gcc.dg/torture/pr112736.c: New testcase.

ipa/92606 - properly handle no_icf attribute for variables

The following adds no_icf handling for variables where the attribute
was rejected. It also fixes the check for no_icf by checking both
the source decl and the target decl.
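
As a usage illustration, the attribute can be placed on a variable as
sketched below.  The table and accessor names are hypothetical, and
compilers without this change are expected to warn that the attribute
is ignored on variables rather than reject the code.

```c
#include <assert.h>

/* Two read-only tables with identical contents.  They are candidates for
   merging by identical code folding; no_icf asks the compiler to keep
   table_a as a separate object.  */
static const int table_a[4] __attribute__ ((no_icf)) = { 1, 2, 3, 4 };
static const int table_b[4] = { 1, 2, 3, 4 };

const int *
get_table_a (void)
{
  return &table_a[0];
}

const int *
get_table_b (void)
{
  return &table_b[0];
}

/* Usage: both tables hold the same values but are distinct objects.  */
void
check_tables (void)
{
  assert (get_table_a ()[2] == get_table_b ()[2]);
  assert (get_table_a () != get_table_b ());
}
```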

PR ipa/92606
gcc/c-family/
* c-attribs.cc (handle_noicf_attribute): Also allow the
attribute on global variables.

gcc/
* ipa-icf.cc (sem_item_optimizer::merge_classes): Check
both source and alias for the no_icf attribute.
* doc/extend.texi (no_icf): Document variable attribute.

tree-optimization/112961 - include latch in if-conversion CSE

The following makes sure to also process the (empty) latch when
performing CSE on the if-converted loop body. That's important
to get all uses of copies propagated out on the backedge as well.
To avoid CSE on the PHI nodes themselves which is prohibitive
(see PR90402) this temporarily adds a fake entry edge to the loop.

PR tree-optimization/112961
* tree-if-conv.cc (tree_if_conversion): Instead of excluding
the latch block from VN, add a fake entry edge.

* g++.dg/vect/pr112961.cc: New testcase.

testsuite: Fix up test directive syntax errors

I've noticed
+ERROR: gcc.dg/gomp/pr87887-1.c: syntax error in target selector ".-4" for " dg-warning 13 "unsupported return type ‘struct S’ for ‘simd’ functions" { target aarch64*-*-* } .-4 "
+ERROR: gcc.dg/gomp/pr87887-1.c: syntax error in target selector ".-4" for " dg-warning 13 "unsupported return type ‘struct S’ for ‘simd’ functions" { target aarch64*-*-* } .-4 "
+ERROR: gcc.dg/gomp/pr89246-1.c: syntax error in target selector ".-4" for " dg-warning 11 "unsupported argument type ‘__int128’ for ‘simd’ functions" { target aarch64*-*-* } .-4 "
+ERROR: gcc.dg/gomp/pr89246-1.c: syntax error in target selector ".-4" for " dg-warning 11 "unsupported argument type ‘__int128’ for ‘simd’ functions" { target aarch64*-*-* } .-4 "
+ERROR: gcc.dg/gomp/simd-clones-2.c: unmatched open quote in list for " dg-final 19 { scan-tree-dump "_ZGVnN2ua32vl_setArray" "optimized { target aarch64*-*-* } } "
+ERROR: gcc.dg/gomp/simd-clones-2.c: unmatched open quote in list for " dg-final 19 { scan-tree-dump "_ZGVnN2ua32vl_setArray" "optimized { target aarch64*-*-* } } "
regressions. The following patch fixes those.

2023-12-12 Jakub Jelinek <jakub@redhat.com>

* gcc.dg/gomp/pr87887-1.c: Add missing comment argument to dg-warning.
* gcc.dg/gomp/pr89246-1.c: Likewise.
* gcc.dg/gomp/simd-clones-2.c: Add missing " after dump name.

Only allow (int)trunc(x) to (int)x simplification with -ffp-int-builtin-inexact [PR107723]

With -fno-fp-int-builtin-inexact, trunc is not allowed to raise
FE_INEXACT and it should produce an integral result (if the input is not
NaN or Inf). Thus FE_INEXACT should not be raised.

But (int)x may raise FE_INEXACT when x is a non-integer, non-NaN, and
non-Inf value. C23 recommends doing so in a footnote.

Thus we should not simplify (int)trunc(x) to (int)x if
-fno-fp-int-builtin-inexact is in effect.
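
The resulting condition can be written as a small predicate.  This is a
sketch of the rule as stated in the ChangeLog entry for this change, not
the code in convert.cc; the function name and its boolean parameters are
hypothetical stand-ins for the two compiler flags.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if (int)trunc(x) may be simplified to (int)x.  The
   simplification is skipped only when trunc must not raise FE_INEXACT
   (-fno-fp-int-builtin-inexact) and floating-point exceptions are
   observable (-ftrapping-math).  */
bool
may_simplify_int_trunc (bool fp_int_builtin_inexact, bool trapping_math)
{
  if (!fp_int_builtin_inexact && trapping_math)
    return false;
  return true;
}

/* Usage: only one of the four flag combinations blocks the rewrite.  */
void
check_int_trunc_rule (void)
{
  assert (!may_simplify_int_trunc (false, true));
  assert (may_simplify_int_trunc (true, true));
}
```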

gcc/ChangeLog:

PR middle-end/107723
* convert.cc (convert_to_integer_1) [case BUILT_IN_TRUNC]: Break
early if !flag_fp_int_builtin_inexact and flag_trapping_math.

gcc/testsuite/ChangeLog:

PR middle-end/107723
* gcc.dg/torture/builtin-fp-int-inexact-trunc.c: New test.

aarch64: Add dg-options to prfm_imm_offset_2.c

gcc/testsuite/
* gcc.target/aarch64/prfm_imm_offset_2.c: Add dg-options.