libstdc++: Fix std::barrier for constant initialization [PR118395]

The std::barrier constructor should be constexpr, which means we need to
defer the dynamic allocation if the constructor is called during
constant-initialization. We can defer it to the first call to
barrier::arrive, using compare-and-swap on an atomic<T*> (instead of the
unique_ptr<T[]> currently used).
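
A minimal sketch of the deferred-allocation idea, not the libstdc++
code. The names Slot, LazySlots, get and global_slots are hypothetical,
and for simplicity this sketch always defers the allocation to first
use, whereas the commit only defers it when the constructor is
constant-evaluated.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for one element of the barrier's state array.
struct Slot { int tickets = 0; };

class LazySlots
{
  std::ptrdiff_t count_;
  std::atomic<Slot*> slots_{nullptr};   // null until first use

public:
  // No allocation here, so the constructor can be constexpr and a
  // namespace-scope object can be constant-initialized.
  constexpr explicit LazySlots(std::ptrdiff_t count) : count_(count) { }

  LazySlots(const LazySlots&) = delete;
  LazySlots& operator=(const LazySlots&) = delete;

  ~LazySlots() { delete[] slots_.load(std::memory_order_acquire); }

  // Return the array, allocating it on the first call.
  Slot* get()
  {
    Slot* p = slots_.load(std::memory_order_acquire);
    if (p == nullptr)
      {
        Slot* fresh = new Slot[count_];
        // Publish our array unless another thread got there first.
        if (slots_.compare_exchange_strong(p, fresh,
                                           std::memory_order_acq_rel,
                                           std::memory_order_acquire))
          p = fresh;
        else
          delete[] fresh;   // lost the race; p now holds the winner's array
      }
    return p;
  }
};

// Constant-initialized: the constructor does no dynamic work.
static LazySlots global_slots(4);
```

If two threads race on the first call, both allocate but only one
pointer is published, and the loser frees its array.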

Also add precondition checks to the constructor and arrive member
function. Also implement the proposed resolution of LWG 3898.

libstdc++-v3/ChangeLog:

PR libstdc++/118395
PR libstdc++/108974
PR libstdc++/98749
* include/std/barrier (__tree_barrier): Use default
member-initializers. Change _M_state member from
unique_ptr<__state_t[]> to atomic<__state_t*>. Add
no_unique_address attribute to _M_completion.
(__tree_barrier::_M_arrive): Load value from _M_state.
(__tree_barrier::_M_invoke_completion): New member function to
ensure a throwing completion function will terminate, as
proposed in LWG 3898.
(__tree_barrier::max): Reduce by one to avoid overflow.
(__tree_barrier::__tree_barrier): Add constexpr. Qualify call to
std::move. Remove mem-initializers made unnecessary by default
member-initializers. Add precondition check. Only allocate state
array if not constant evaluated.
(__tree_barrier::arrive): Add precondition check. Do deferred
initialization of _M_state if needed.
(barrier): Add static_assert, as proposed in LWG 3898.
(barrier::barrier): Add constexpr.
* testsuite/30_threads/barrier/cons.cc: New test.
* testsuite/30_threads/barrier/lwg3898.cc: New test.

libstdc++: Optimise std::latch::arrive_and_wait

We don't need to wait if we know the counter has reached zero.
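
As an illustration only (the class SpinLatch is hypothetical and waits by
yielding rather than by atomic wait/notify as std::latch does), the
fast path could look like this: the value returned by the decrement
tells us whether we were the last to arrive.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

class SpinLatch
{
  std::atomic<std::ptrdiff_t> count_;

public:
  explicit SpinLatch(std::ptrdiff_t expected) : count_(expected) { }

  void count_down()
  { count_.fetch_sub(1, std::memory_order_release); }

  bool try_wait() const
  { return count_.load(std::memory_order_acquire) == 0; }

  void wait() const
  {
    while (!try_wait())
      std::this_thread::yield();
  }

  // Returns true if the call had to wait, false if it took the fast path.
  bool arrive_and_wait()
  {
    // fetch_sub returns the previous value, so 1 means we just reached zero.
    if (count_.fetch_sub(1, std::memory_order_acq_rel) == 1)
      return false;
    wait();
    return true;
  }
};
```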

libstdc++-v3/ChangeLog:

* include/std/latch (latch::arrive_and_wait): Optimise.

libstdc++: Move atomic wait/notify entry points into the library

This moves the implementation details of atomic wait/notify functions
into the library, so that only a small API surface is exposed to users.

This also fixes some race conditions present in the design for proxied
waits:

- The stores to _M_ver in __notify_impl must be protected by the mutex,
  and the loads from _M_ver in __wait_impl and __wait_until_impl to
  check for changes must also be protected by the mutex. This ensures
  that checking _M_ver for updates and waiting on the condition_variable
  happens atomically. Otherwise it's possible to have: _M_ver == old
  happens-before {++_M_ver; cv.notify;} which happens-before cv.wait.
  That scenario results in a missed notification, and so the waiting
  function never wakes. This wasn't a problem for Linux, because the
  futex wait call re-checks the _M_ver value before sleeping, so the
  increment cannot interleave between the check and the wait.

- The initial load from _M_ver that reads the 'old' value used for the
  _M_ver == old checks must be done before loading and checking the
  value of the atomic variable. Otherwise it's possible to have:
  var.load() == val happens-before {++_M_ver; _M_cv.notify_all();}
  happens-before {old = _M_ver; lock mutex; if (_M_ver == old) cv.wait}.
  This results in the waiting thread seeing the already-incremented
  value of _M_ver and then waiting for it to change again, which doesn't
  happen. This race was present even for Linux, because using a futex
  instead of mutex+condvar doesn't prevent the increment from happening
  before the waiting thread checks for the increment.

The first race can be solved locally in the waiting and notifying
functions, by acquiring the mutex lock earlier in the function. The
second race cannot be fixed locally, because the load of the atomic
variable and the check for updates to _M_ver happen in different
functions (one in a function template in the headers and one in the
library). We do have an _M_old data member in the __wait_args_base
struct which was previously only used for non-proxy waits using a futex.
We can add a new entry point into the library to look up the waitable
state for the address and then load its _M_ver into the _M_old member.
This allows the inline function template to ensure that loading _M_ver
happens-before testing whether the atomic variable has been changed, so
that we can reliably tell if _M_ver changes after we've already tested
the atomic variable. This isn't 100% reliable, because _M_ver could be
incremented 2^32 times and wrap back to the same value, but that seems
unlikely in practice. If/when we support waiting on user-defined
predicates (which could execute long enough for _M_ver to wrap) we might
want to always wait with a timeout, so that we get a chance to re-check
the predicate even in the rare case that _M_ver wraps.
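
The ordering described above can be sketched with a mutex and a
condition variable. This is a simplified model, not the code in
atomic.cc: ProxyState, load_version, wait_for_change and
wait_while_equal are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>

struct ProxyState
{
  std::mutex mx;
  std::condition_variable cv;
  unsigned ver = 0;            // guarded by mx

  unsigned load_version()
  {
    std::lock_guard<std::mutex> lock(mx);
    return ver;
  }

  // The comparison with old and the sleep happen under the same lock,
  // so an increment cannot slip in between them (first race).
  void wait_for_change(unsigned old)
  {
    std::unique_lock<std::mutex> lock(mx);
    cv.wait(lock, [&] { return ver != old; });
  }

  void notify_all()
  {
    {
      std::lock_guard<std::mutex> lock(mx);
      ++ver;                   // store to ver is protected by the mutex
    }
    cv.notify_all();
  }
};

inline void
wait_while_equal(const std::atomic<int>& var, int val, ProxyState& st)
{
  for (;;)
    {
      // Snapshot the version before testing the variable (second race).
      unsigned old = st.load_version();
      if (var.load(std::memory_order_acquire) != val)
        return;
      st.wait_for_change(old);
    }
}
```

If a notification arrives after the snapshot but before the wait, the
version no longer equals old and wait_for_change returns at once.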

Another change is to make the __wait_until_impl function take a
__wait_clock_t::duration instead of a __wait_clock_t::time_point, so
that the __wait_until_impl function doesn't depend on the symbol name of
chrono::steady_clock. Inside the library it can be converted back to a
time_point for the clock. This would potentially allow using a different
clock, if we made a different __abi_version in the __wait_args imply
waiting with a different clock.
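
A rough sketch of that split, with hypothetical names (wait_clock,
expired_impl, expired): the function standing in for the library entry
point mentions only a duration in its signature and rebuilds the
time_point internally.

```cpp
#include <cassert>
#include <chrono>

using wait_clock = std::chrono::steady_clock;  // stand-in for __wait_clock_t

// "Library side": the signature names only a duration type, not the clock.
// A real implementation would block until the deadline; this sketch only
// reports whether the deadline has already passed.
inline bool expired_impl(wait_clock::duration deadline_since_epoch)
{
  wait_clock::time_point deadline(deadline_since_epoch);
  return wait_clock::now() >= deadline;
}

// "Header side": strip the clock from the time_point before the call.
inline bool expired(wait_clock::time_point deadline)
{ return expired_impl(deadline.time_since_epoch()); }
```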

This also adds a void* to the __wait_args_base structure, so that
__wait_impl can store the __waitable_state* in there the first time it's
looked up for a given wait, so that it doesn't need to be retrieved
again on each loop. This requires passing the __wait_args_base structure
by non-const reference.

The __waitable_state::_S_track function can be removed now that it's all
internal to the library, and namespace-scope RAII types added for
locking and tracking contention.

libstdc++-v3/ChangeLog:

* config/abi/pre/gnu.ver: Add new symbol version and exports.
* include/bits/atomic_timed_wait.h (__platform_wait_until): Move
to atomic.cc.
(__cond_wait_until, __spin_until_impl): Likewise.
(__wait_until_impl): Likewise. Change __wait_args_base parameter
to non-const reference and change third parameter to
__wait_clock_t::duration.
(__wait_until): Change __wait_args_base parameter to non-const
reference. Call time_since_epoch() to get duration from
time_point.
(__wait_for): Change __wait_args_base parameter to non-const
reference.
(__atomic_wait_address_until): Call _M_prep_for_wait_on on args.
(__atomic_wait_address_for): Likewise.
(__atomic_wait_address_until_v): Qualify call to avoid ADL. Do
not forward __vfn.
* include/bits/atomic_wait.h (__platform_wait_uses_type): Use
alignof(T) not alignof(T*).
(__futex_wait_flags, __platform_wait, __platform_notify)
(__waitable_state, __spin_impl, __notify_impl): Move to
atomic.cc.
(__wait_impl): Likewise. Change __wait_args_base parameter to
non-const reference.
(__wait_args_base::_M_wait_state): New data member.
(__wait_args_base::_M_prep_for_wait_on): New member function.
(__wait_args_base::_M_load_proxy_wait_val): New member
function.
(__wait_args_base::_S_memory_order_for): Remove member function.
(__atomic_wait_address): Call _M_prep_for_wait_on on args.
(__atomic_wait_address_v): Qualify call to avoid ADL.
* src/c++20/Makefile.am: Add new file.
* src/c++20/Makefile.in: Regenerate.
* src/c++20/atomic.cc: New file.
* testsuite/17_intro/headers/c++1998/49745.cc: Remove XFAIL for
C++20 and later.
* testsuite/29_atomics/atomic/wait_notify/100334.cc: Remove use
of internal implementation details.
* testsuite/util/testsuite_abi.cc: Add GLIBCXX_3.4.35 version.

libstdc++: Rename __waiter_pool_impl to __waitable_state

The name __waiter_pool_impl is misleading. An object of that type is a
member of the pool, not the pool itself, and it's not an "impl" of
any abstract base class or generic concept. Just call it
__waitable_state since it maintains the state used for waiting/notifying
a waitable atomic object.

Similarly, rename _S_impl_for to _S_state_for.

Once these functions move into the shared library they won't be exported
and so the naming won't matter much anyway.

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h (__wait_until_impl): Adjust
to use new naming.
* include/bits/atomic_wait.h (__waiter_pool_impl): Rename to
__waitable_state.
(__waiter_pool_impl::_S_wait): Rename to _M_waiters.
(__waiter_pool_impl::_S_impl_for): Rename to _S_state_for.
(__waiter_pool_impl::_S_track): Adjust to use new naming.
(__wait_impl, __notify_impl): Likewise.
* testsuite/29_atomics/atomic/wait_notify/100334.cc: Adjust to
use new naming.

libstdc++: Rename __atomic_compare to __atomic_eq

This is an equality comparison rather than a three-way comparison like
memcmp and <=>, so name it more precisely.

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h
(__atomic_wait_address_until_v): Replace __atomic_compare with
__atomic_eq.
(__atomic_wait_address_for_v): Likewise.
* include/bits/atomic_wait.h (__atomic_compare): Rename to
__atomic_eq.
(__atomic_wait_address_v): Replace __atomic_compare with
__atomic_eq.

libstdc++: Remove reinterpret_cast uses in atomic wait/notify

We can pass around void* instead of casting incompatible pointers to
__platform_wait_t*, and then only static_cast to __platform_wait_t* when
we know that's valid.
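
For illustration, with hypothetical names (platform_wait_t as a plain
int, load_wait_word), the pattern is to carry the address as const void*
and cast only on the branch where it is known to point at a wait word:

```cpp
#include <cassert>

using platform_wait_t = int;   // stand-in for __platform_wait_t

// addr may point at any object. It is only converted back to
// platform_wait_t* when the wait is not proxied, i.e. when the caller
// guarantees that it really points at a platform_wait_t.
inline platform_wait_t
load_wait_word(const void* addr, bool proxied, const platform_wait_t* proxy)
{
  if (proxied)
    return *proxy;             // addr is never dereferenced here
  return *static_cast<const platform_wait_t*>(addr);
}
```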

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h (__wait_until_impl): Change
first parameter to const void* and then static_cast to const
__platform_wait_t* when not using proxied wait.
(__wait_until): Change first parameter to const void*.
(__wait_for): Likewise.
(__atomic_wait_address_until): Remove reinterpret_cast and allow
address to implicitly convert to const void* instead.
(__atomic_wait_address_for): Likewise.
* include/bits/atomic_wait.h (__wait_impl, __notify_impl):
Change first parameter to const void* and then static_cast to
const __platform_wait_t* when not using proxied wait.
(__atomic_wait_address, __atomic_notify_address): Remove
reinterpret_cast and allow address to implicitly convert to
const void* instead.

libstdc++: Simplify futex wrapper functions for atomic wait/notify

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h (__platform_wait): Change function
template to a normal function. The parameter is always
__platform_wait_t* which is just int* for this implementation of
the function.
(__platform_notify): Likewise.

libstdc++: Fix time_point conversion in atomic timed waits

Even if a time_point already uses the right clock, we might still need
to convert it to use the expected duration. Calling __to_wait_clock will
perform that conversion, so use that even when the clock is correct.
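
A small sketch of such a conversion, assuming the time_point already
uses the wait clock (the real __to_wait_clock also handles other
clocks). The names wait_clock, picoseconds and to_wait_clock are
illustrative. Rounding up with chrono::ceil means a converted deadline
is never earlier than the requested one.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>
#include <type_traits>

using wait_clock = std::chrono::steady_clock;
// A duration finer than the wait clock's on common implementations.
using picoseconds = std::chrono::duration<long long, std::pico>;

template<typename Dur>
wait_clock::time_point
to_wait_clock(const std::chrono::time_point<wait_clock, Dur>& tp)
{
  if constexpr (std::is_same_v<Dur, wait_clock::duration>)
    return tp;                                   // already the right type
  else
    return std::chrono::ceil<wait_clock::duration>(tp);
}
```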

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h (__to_wait_clock): Do not use
chrono::ceil if clock and duration are already correct type.
(__wait_until): Always call __to_wait_clock.

libstdc++: Fix race condition in new atomic notify code

When using a proxy object for atomic waiting and notifying operations,
we need to ensure that the _M_ver value is always incremented by a
notifying operation, even if we return early without doing the futex
wake syscall. Otherwise we get missed wake-ups because the notifying
thread doesn't modify the value that other threads are doing a futex
wait on.
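
Sketched with hypothetical names (ProxyWord, waiters, wake_calls), where
a counter stands in for the futex wake syscall, the fix amounts to
bumping the version before the early return rather than after it:

```cpp
#include <atomic>
#include <cassert>

struct ProxyWord
{
  std::atomic<unsigned> ver{0};
  std::atomic<int> waiters{0};
  int wake_calls = 0;          // counts simulated futex wake syscalls

  void notify_all()
  {
    // Always change the value that waiters compare against...
    ver.fetch_add(1, std::memory_order_seq_cst);
    // ...and only then decide whether the wake-up call can be skipped.
    if (waiters.load(std::memory_order_seq_cst) == 0)
      return;
    ++wake_calls;              // stands in for the futex wake
  }
};
```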

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h (__notify_impl): Increment the
proxy value before returning early for the uncontended case.

libstdc++: Various fixes for atomic wait/notify code

Pass __wait_args_base by const reference instead of const pointer. I
don't see a reason it needs to be passed by pointer to the internals.
We can also avoid constructing a __wait_args from __wait_args_base in
some places, instead just using the latter directly.

The code using the __wait_flags bitmask type is broken, because the
__spin_only constant includes the __do_spin element. This means that
testing (__args & __wait_flags::__spin_only) will be inadvertently true
when only __do_spin is set. This causes the __wait_until_impl function
to never actually wait on the futex (or condition variable), turning all
uses of that function into expensive busy spins. Change __spin_only to
be a single bit (i.e. a bitmask element) and adjust the places where
that bit is set so that they also use the __do_spin element.
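
The problem can be reproduced with a toy pair of enums. The enumerator
values below are made up for the example and are not the ones used by
__wait_flags.

```cpp
#include <cassert>

// Broken layout: spin_only is a combination that includes do_spin.
enum class BrokenFlags : unsigned { do_spin = 1, spin_only = 2 | do_spin };

// Fixed layout: every enumerator is a single bit.
enum class FixedFlags : unsigned { do_spin = 1, spin_only = 2 };

// True if set and bit have any bit in common.
template<typename E>
constexpr bool test(E set, E bit)
{
  return (static_cast<unsigned>(set) & static_cast<unsigned>(bit)) != 0;
}
```

With the broken layout, test(do_spin, spin_only) is true even though
only do_spin was requested. With the fixed layout, callers that want to
spin without blocking set both bits explicitly.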

Update the __args._M_old value when looping in __atomic_wait_address, so
that the next wait doesn't fail spuriously.

With the new __atomic_wait_address logic, the value function needs to
return the correct type, not just a bool. Without that change, the
boolean value returned by the value function is used as the value
passed to the futex wait, but that means we're comparing (_M_a == 0) to
_M_a and so can block on the futex when we shouldn't, and then never
wake up.

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h (__cond_wait_impl): Add
missing inline keyword.
(__spin_until_impl): Change parameter from pointer to reference.
Replace make_pair with list-initialization.  Initialize variable
for return value.
(__wait_until_impl): Likewise. Remove some preprocessor
conditional logic. Use _S_track for contention tracking.
Avoid unnecessary const_cast.
(__wait_until): Change parameter from pointer to reference.
Replace make_pair with list-initialization.
(__wait_for):  Change parameter from pointer to reference. Add
__do_spin flag to args.
* include/bits/atomic_wait.h (__waiter_pool_impl::_S_track): New
function returning an RAII object for contention tracking.
(__wait_flags): Do not set the __do_spin flag in the __spin_only
enumerator. Comment out the unused __abi_version_mask
enumerator.  Define operator| and operator|= overloads.
(__wait_args_base::operator&): Define.
(__wait_args::operator&, __wait_args::_S_default_flags): Remove.
(__wait_args::operator|, __wait_args::operator|=): Remove.
(__spin_impl): Change parameter from pointer to reference.
Replace make_pair call with list-initialization.
(__wait_impl): Likewise.  Remove some preprocessor conditional
logic.  Always store old value in __args._M_old. Avoid
unnecessary const_cast. Use _S_track.
(__notify_impl): Change parameter to reference. Remove some
preprocessor conditional logic.
(__atomic_wait_address): Add comment. Update __args._M_old on
each iteration.
(__atomic_wait_address_v): Add comment.
* include/std/latch (latch::wait): Adjust predicates for new
logic.
* testsuite/29_atomics/atomic_integral/wait_notify.cc: Improve
test.

libstdc++: Whitespace fixes in atomic wait/notify code

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h: Whitespace fixes.
* include/bits/atomic_wait.h: Likewise.

libstdc++: Pass __wait_args to internal API by const pointer

This change splits the __wait_args data members to a new struct
__wait_args_base and then passes that type by const pointer to the low
level implementation functions.

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h (__spin_until_impl): Accept
__wait_args as const __wait_args_base*.
(__wait_until_impl): Likewise.
(__wait_until): Likewise.
(__wait_for): Likewise.
(__atomic_wait_address_until): Pass __wait_args by address.
(__atomic_wait_address_for): Likewise.
* include/bits/atomic_wait.h (__wait_args_base): New struct.
(__wait_args): Derive from __wait_args_base.
(__wait_args::__wait_args()): Adjust ctors to call base ctor.
(__wait_args::__wait_args(const __wait_args_base&)): New ctor.
(__wait_args::operator|=): New method.
(__wait_args::_S_flags_for): Change return type to
__wait_flags.
(__spin_impl): Accept __wait_args as const __wait_args_base*.
(__wait_impl): Likewise.
(__notify_impl): Likewise.
(__atomic_wait_address): Pass __wait_args by address.
(__atomic_wait_address_v): Likewise.
(__atomic_notify_address): Likewise.

libstdc++: Atomic wait/notify ABI stabilization

This represents a major refactoring of the previous atomic::wait
and atomic::notify implementation detail. The aim of this change
is to simplify the implementation details and position the resulting
implementation so that much of the current header-only detail
can be moved into the shared library, while also accounting for
anticipated changes to wait/notify functionality for C++26.

The previous implementation implemented spin logic in terms of
the types __default_spin_policy, __timed_backoff_spin_policy, and
the free function __atomic_spin. These are replaced by
two new free functions: __spin_impl and __spin_until_impl. These
currently inline free functions are expected to be moved into the
libstdc++ shared library in a future commit.

The previous implementation derived untimed and timed wait
implementation detail from __detail::__waiter_pool_base. This
is-a relationship is removed in the new version and the previous
implementation detail is renamed to reflect this change. The
static _S_for member has been renamed as well to indicate that it
returns the __waiter_pool_impl entry in the static 'side table'
for a given awaited address.

This new implementation replaces all of the non-templated waiting
detail of __waiter_base, __waiter_pool, __waiter, __enters_wait, and
__bare_wait with the __wait_impl free function, and the supporting
__wait_flags enum and __wait_args struct. This currently inline free
function is expected to be moved into the libstdc++ shared library
in a future commit.

This new implementation replaces all of the non-templated notifying
detail of __waiter_base, __waiter_pool, and __waiter with the
__notify_impl free function. This currently inline free function
is expected to be moved into the libstdc++ shared library in a
future commit.

The __atomic_wait_address template function is updated to account
for the above changes and to support the expected C++26 change to
pass the most recent observed value to the caller supplied predicate.

A new non-templated __atomic_wait_address_v free function is added
that only works for atomic types that operate only on __platform_wait_t
and requires the caller to supply a memory order. This is intended
to be the simplest code path for such types.

The __atomic_wait_address_v template function is now implemented in
terms of the new __atomic_wait_address template and continues to accept
a user supplied "value function" to retrieve the current value of
the atomic.

The __atomic_notify_address template function is updated to account
for the above changes.

The template __platform_wait_until_impl is renamed to
__platform_wait_until. The previous __platform_wait_until template is
deleted and the functionality it provided is moved to the new template
function __wait_until. A similar change is made to the
__cond_wait_until_impl/__cond_wait_until implementation.

This new implementation similarly replaces all of the non-templated
waiting detail of __timed_waiter_pool, __timed_waiter, etc. with
the new __wait_until_impl free function. This currently inline free
function is expected to be moved into the libstdc++ shared library
in a future commit.

This implementation replaces all templated waiting functions that
manage clock conversion as well as relative waiting (wait_for) with
the new template functions __wait_until and __wait_for.

Similarly the previous implementation detail for the various
__atomic_wait_address_Xxx templates is adjusted to account for the
implementation changes outlined above.

All of the "bare wait" versions of __atomic_wait_Xxx have been removed
and replaced with a defaulted boolean __bare_wait parameter on the
new version of these templates.

libstdc++-v3/ChangeLog:

* include/bits/atomic_timed_wait.h
(__detail::__platform_wait_until_impl): Rename to
__platform_wait_until.
(__detail::__platform_wait_until): Remove previous
definition.
(__detail::__cond_wait_until_impl): Rename to
__cond_wait_until.
(__detail::__cond_wait_until): Remove previous
definition.
(__detail::__spin_until_impl): New function.
(__detail::__wait_until_impl): New function.
(__detail::__wait_until): New function.
(__detail::__wait_for): New function.
(__detail::__timed_waiter_pool): Remove type.
(__detail::__timed_backoff_spin_policy): Remove type.
(__detail::__timed_waiter): Remove type.
(__detail::__enters_timed_wait): Remove type alias.
(__detail::__bare_timed_wait): Remove type alias.
(__atomic_wait_address_until): Adjust to new implementation
detail.
(__atomic_wait_address_until_v): Likewise.
(__atomic_wait_address_bare): Remove.
(__atomic_wait_address_for): Adjust to new implementation
detail.
(__atomic_wait_address_for_v): Likewise.
(__atomic_wait_address_for_bare): Remove.
* include/bits/atomic_wait.h: Include bits/stl_pair.h.
(__detail::__default_spin_policy): Remove type.
(__detail::__atomic_spin): Remove function.
(__detail::__waiter_pool_base): Rename to __waiter_pool_impl.
Remove _M_notify. Rename _S_for to _S_impl_for.
(__detail::__waiter_base): Remove type.
(__detail::__waiter_pool): Remove type.
(__detail::__waiter): Remove type.
(__detail::__enters_wait): Remove type alias.
(__detail::__bare_wait): Remove type alias.
(__detail::__wait_flags): New enum.
(__detail::__wait_args): New struct.
(__detail::__wait_result_type): New type alias.
(__detail::__spin_impl): New function.
(__detail::__wait_impl): New function.
(__atomic_wait_address): Adjust to new implementation detail.
(__atomic_wait_address_v): Likewise.
(__atomic_notify_address): Likewise.
(__atomic_wait_address_bare): Delete.
(__atomic_notify_address_bare): Likewise.
* include/bits/semaphore_base.h: Adjust implementation to
use new __atomic_wait_address_v contract.
* include/std/barrier: Adjust implementation to use new
__atomic_wait contract.
* include/std/latch: Adjust implementation to use new
__atomic_wait contract.
* testsuite/29_atomics/atomic/wait_notify/100334.cc (main):
Adjust for __detail::__waiter_pool_base renaming.

rtl-ssa: Reject non-address uses of autoinc regs [PR120347]

As the rtl.texi documentation of RTX_AUTOINC expressions says:

  If a register used as the operand of these expressions is used in
  another address in an insn, the original value of the register is
  used.  Uses of the register outside of an address are not permitted
  within the same insn as a use in an embedded side effect expression
  because such insns behave differently on different machines and hence
  must be treated as ambiguous and disallowed.

late-combine was failing to follow this rule.  One option would have
been to enforce it during the substitution phase, like combine does.
This could either be a dedicated condition in the substitution code
or, more generally, an extra condition in can_merge_accesses.
(The latter would include extending is_pre_post_modify to uses.)

However, since the restriction applies to patterns rather than to
actions on patterns, the more robust fix seemed to be to test and reject
this case in (a subroutine of) rtl_ssa::recog.  We already do something
similar for hard-coded register clobbers.

Using vec_rtx_properties isn't the lightest-weight operation
out there.  I did wonder about relying on the is_pre_post_modify
flag of the definitions in the new_defs array, but that would
require callers that create new autoincs to set the flag before
calling recog.  Normally these flags are instead updated
automatically based on the final pattern.

Besides, recog itself has had to traverse the whole pattern,
and it is even less light-weight than vec_rtx_properties.
At least the pattern should be in cache.

The rtl-ssa fix showed up a mistake (of mine) in the rtl_properties
walker: try_to_add_src would drop all flags except IN_NOTE before
recursing into RTX_AUTOINC addresses.

RTX_AUTOINCs only occur in addresses, and so for them, the flags coming
into try_to_add_src are set by:

  unsigned int base_flags = flags & rtx_obj_flags::STICKY_FLAGS;
  ...
  if (MEM_P (x))
    {
      ...

      unsigned int addr_flags = base_flags | rtx_obj_flags::IN_MEM_STORE;
      if (flags & rtx_obj_flags::IS_READ)
        addr_flags |= rtx_obj_flags::IN_MEM_LOAD;
      try_to_add_src (XEXP (x, 0), addr_flags);
      return;
    }

This means that the only flags that can be set are:

- IN_NOTE (the sole member of STICKY_FLAGS)
- IN_MEM_STORE
- IN_MEM_LOAD

Thus dropping all flags except IN_NOTE had the effect of dropping
IN_MEM_STORE and IN_MEM_LOAD, and nothing else.  But those flags
are the ones that mark something as being part of a mem address.
The exclusion was therefore exactly wrong.
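
The effect of the old masking can be shown with a toy model. The bit
values and the helper names autoinc_flags_old, autoinc_flags_new and
in_address are invented for the example; only the flag names come from
rtx_obj_flags, and the actual change in rtlanal.cc may be shaped
differently.

```cpp
#include <cassert>

namespace toy_flags
{
  constexpr unsigned IN_NOTE      = 1u << 0;
  constexpr unsigned IN_MEM_LOAD  = 1u << 1;
  constexpr unsigned IN_MEM_STORE = 1u << 2;
  constexpr unsigned STICKY_FLAGS = IN_NOTE;
}

// Old behaviour: keep only the sticky flags when recursing into an
// autoinc address, which discards the "part of a mem address" marks.
constexpr unsigned autoinc_flags_old(unsigned flags)
{ return flags & toy_flags::STICKY_FLAGS; }

// Behaviour after the fix, as modelled here: the address flags survive.
constexpr unsigned autoinc_flags_new(unsigned flags)
{ return flags; }

constexpr bool in_address(unsigned flags)
{
  return (flags & (toy_flags::IN_MEM_LOAD | toy_flags::IN_MEM_STORE)) != 0;
}
```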

gcc/
PR rtl-optimization/120347
* rtlanal.cc (rtx_properties::try_to_add_src): Don't drop the
IN_MEM_LOAD and IN_MEM_STORE flags for autoinc registers.
* rtl-ssa/changes.cc (recog_level2): Check whether an
RTX_AUTOINCed register also appears outside of an address.

gcc/testsuite/
PR rtl-optimization/120347
* gcc.dg/torture/pr120347.c: New test.

OpenMP: C++ "declare mapper" support

This patch adds support for OpenMP 5.0 "declare mapper" functionality
for C++.  I've merged it to og13 based on the last version
posted upstream, with some minor changes due to the newly-added
'present' map modifier support.  There's also a fix to splay-tree
traversal in gimplify.cc:omp_instantiate_implicit_mappers, and this patch
omits the rearrangement of gimplify.cc:gimplify_{scan,adjust}_omp_clauses
that I separated out into its own patch and applied (to og13) already.
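
For readers unfamiliar with the directive, here is a small usage sketch
modelled on the style of the OpenMP examples document, not taken from
the new tests. The names Vec and sum are hypothetical. When OpenMP is
not enabled the pragmas are ignored and the loop simply runs on the
host.

```cpp
#include <cassert>
#include <cstddef>

struct Vec
{
  std::size_t len;
  double* data;
};

// Whenever a Vec is mapped, also map the array it points to.
#pragma omp declare mapper(Vec v) map(v, v.data[0:v.len])

inline double sum(Vec v)
{
  double total = 0.0;
  // The map(to: v) clause picks up the default mapper declared above.
#pragma omp target map(tofrom: total) map(to: v)
  for (std::size_t i = 0; i < v.len; ++i)
    total += v.data[i];
  return total;
}
```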

gcc/c-family/
* c-common.h (c_omp_region_type): Add C_ORT_DECLARE_MAPPER and
C_ORT_OMP_DECLARE_MAPPER codes.
(omp_mapper_list): Add forward declaration.
(c_omp_find_nested_mappers, c_omp_instantiate_mappers): Add prototypes.
* c-omp.cc (c_omp_find_nested_mappers): New function.
(remap_mapper_decl_info): New struct.
(remap_mapper_decl_1, omp_instantiate_mapper,
c_omp_instantiate_mappers): New functions.

gcc/cp/
* constexpr.cc (reduced_constant_expression_p): Add OMP_DECLARE_MAPPER
case.
(cxx_eval_constant_expression, potential_constant_expression_1):
Likewise.
* cp-gimplify.cc (cxx_omp_finish_mapper_clauses): New function.
* cp-objcp-common.h (LANG_HOOKS_OMP_FINISH_MAPPER_CLAUSES,
LANG_HOOKS_OMP_MAPPER_LOOKUP, LANG_HOOKS_OMP_EXTRACT_MAPPER_DIRECTIVE,
LANG_HOOKS_OMP_MAP_ARRAY_SECTION): Define langhooks.
* cp-tree.h (lang_decl_base): Add omp_declare_mapper_p field.  Recount
spare bits comment.
(DECL_OMP_DECLARE_MAPPER_P): New macro.
(omp_mapper_id): Add prototype.
(cp_check_omp_declare_mapper): Add prototype.
(omp_instantiate_mappers): Add prototype.
(cxx_omp_finish_mapper_clauses): Add prototype.
(cxx_omp_mapper_lookup): Add prototype.
(cxx_omp_extract_mapper_directive): Add prototype.
(cxx_omp_map_array_section): Add prototype.
* decl.cc (check_initializer): Add OpenMP declare mapper support.
(cp_finish_decl): Set DECL_INITIAL for OpenMP declare mapper var decls
as appropriate.
* decl2.cc (mark_used): Instantiate OpenMP "declare mapper" magic var
decls.
* error.cc (dump_omp_declare_mapper): New function.
(dump_simple_decl): Use above.
* parser.cc (cp_parser_omp_clause_map): Add KIND parameter.  Support
"mapper" modifier.
(cp_parser_omp_all_clauses): Add KIND argument to
cp_parser_omp_clause_map call.
(cp_parser_omp_target): Call omp_instantiate_mappers before
finish_omp_clauses.
(cp_parser_omp_declare_mapper): New function.
(cp_parser_omp_declare): Add "declare mapper" support.
* pt.cc (tsubst_decl): Adjust name of "declare mapper" magic var decls
once we know their type.
(tsubst_omp_clauses): Call omp_instantiate_mappers before
finish_omp_clauses, for target regions.
(tsubst_expr): Support OMP_DECLARE_MAPPER nodes.
(instantiate_decl): Instantiate initialiser (i.e. definition) for OpenMP
declare mappers.
* semantics.cc (gimplify.h): Include.
(omp_mapper_id, omp_mapper_lookup, omp_extract_mapper_directive,
cxx_omp_map_array_section, cp_check_omp_declare_mapper): New functions.
(finish_omp_clauses): Delete GOMP_MAP_PUSH_MAPPER_NAME and
GOMP_MAP_POP_MAPPER_NAME artificial clauses.
(omp_target_walk_data): Add MAPPERS field.
(finish_omp_target_clauses_r): Scan for uses of struct/union/class type
variables.
(finish_omp_target_clauses): Create artificial mapper binding clauses
for used structs/unions/classes in offload region.

gcc/fortran/
* parse.cc (tree.h, fold-const.h, tree-hash-traits.h): Add includes
(for additions to omp-general.h).

gcc/
* gimplify.cc (gimplify_omp_ctx): Add IMPLICIT_MAPPERS field.
(new_omp_context): Initialise IMPLICIT_MAPPERS hash map.
(delete_omp_context): Delete IMPLICIT_MAPPERS hash map.
(instantiate_mapper_info): New structs.
(remap_mapper_decl_1, omp_mapper_copy_decl, omp_instantiate_mapper,
omp_instantiate_implicit_mappers): New functions.
(gimplify_scan_omp_clauses): Handle MAPPER_BINDING clauses.
(gimplify_adjust_omp_clauses): Instantiate implicit declared mappers.
(gimplify_omp_declare_mapper): New function.
(gimplify_expr): Call above function.
* langhooks-def.h (lhd_omp_mapper_lookup,
lhd_omp_extract_mapper_directive, lhd_omp_map_array_section): Add
prototypes.
(LANG_HOOKS_OMP_FINISH_MAPPER_CLAUSES,
LANG_HOOKS_OMP_MAPPER_LOOKUP, LANG_HOOKS_OMP_EXTRACT_MAPPER_DIRECTIVE,
LANG_HOOKS_OMP_MAP_ARRAY_SECTION): Define macros.
(LANG_HOOK_DECLS): Add above macros.
* langhooks.cc (lhd_omp_mapper_lookup,
lhd_omp_extract_mapper_directive, lhd_omp_map_array_section): New
dummy functions.
* langhooks.h (lang_hooks_for_decls): Add OMP_FINISH_MAPPER_CLAUSES,
OMP_MAPPER_LOOKUP, OMP_EXTRACT_MAPPER_DIRECTIVE, OMP_MAP_ARRAY_SECTION
hooks.
* omp-general.h (omp_name_type<T>): Add templatized struct, hash type
traits (for omp_name_type<tree> specialization).
(omp_mapper_list<T>): Add struct.
* tree-core.h (omp_clause_code): Add OMP_CLAUSE__MAPPER_BINDING_.
* tree-pretty-print.cc (dump_omp_clause): Support GOMP_MAP_UNSET,
GOMP_MAP_PUSH_MAPPER_NAME, GOMP_MAP_POP_MAPPER_NAME artificial mapping
clauses.  Support OMP_CLAUSE__MAPPER_BINDING_ and OMP_DECLARE_MAPPER.
* tree.cc (omp_clause_num_ops, omp_clause_code_name): Add
OMP_CLAUSE__MAPPER_BINDING_.
* tree.def (OMP_DECLARE_MAPPER): New tree code.
* tree.h (OMP_DECLARE_MAPPER_ID, OMP_DECLARE_MAPPER_DECL,
OMP_DECLARE_MAPPER_CLAUSES): New defines.
(OMP_CLAUSE__MAPPER_BINDING__ID, OMP_CLAUSE__MAPPER_BINDING__DECL,
OMP_CLAUSE__MAPPER_BINDING__MAPPER): New defines.

include/
* gomp-constants.h (gomp_map_kind): Add GOMP_MAP_UNSET,
GOMP_MAP_PUSH_MAPPER_NAME, GOMP_MAP_POP_MAPPER_NAME artificial mapping
clause types.

gcc/testsuite/
* c-c++-common/gomp/map-6.c: Update error scan output.
* c-c++-common/gomp/declare-mapper-3.c: New test (only enabled for C++
for now).
* c-c++-common/gomp/declare-mapper-4.c: Likewise.
* c-c++-common/gomp/declare-mapper-5.c: Likewise.
* c-c++-common/gomp/declare-mapper-6.c: Likewise.
* c-c++-common/gomp/declare-mapper-7.c: Likewise.
* c-c++-common/gomp/declare-mapper-8.c: Likewise.
* c-c++-common/gomp/declare-mapper-9.c: Likewise.
* c-c++-common/gomp/declare-mapper-10.c: Likewise.
* c-c++-common/gomp/declare-mapper-12.c: Likewise.
* g++.dg/gomp/declare-mapper-1.C: New test.
* g++.dg/gomp/declare-mapper-2.C: New test.
* g++.dg/gomp/declare-mapper-3.C: New test.

libgomp/
* testsuite/libgomp.c++/declare-mapper-1.C: New test.
* testsuite/libgomp.c++/declare-mapper-2.C: New test.
* testsuite/libgomp.c++/declare-mapper-3.C: New test.
* testsuite/libgomp.c++/declare-mapper-4.C: New test.
* testsuite/libgomp.c++/declare-mapper-5.C: New test.
* testsuite/libgomp.c++/declare-mapper-6.C: New test.
* testsuite/libgomp.c++/declare-mapper-7.C: New test.
* testsuite/libgomp.c++/declare-mapper-8.C: New test.
* testsuite/libgomp.c-c++-common/declare-mapper-9.c: New test (only
enabled for C++ for now).
* testsuite/libgomp.c-c++-common/declare-mapper-10.c: Likewise.
* testsuite/libgomp.c-c++-common/declare-mapper-11.c: Likewise.
* testsuite/libgomp.c-c++-common/declare-mapper-12.c: Likewise.
* testsuite/libgomp.c-c++-common/declare-mapper-13.c: Likewise.
* testsuite/libgomp.c-c++-common/declare-mapper-14.c: Likewise.

Co-authored-by: Tobias Burnus <tburnus@baylibre.com>

c: fix ICE for mutually recursive structures [PR120381]

For invalid nesting of a structure definition in a definition
of itself or when using a rather obscure construction using statement
expressions, we can create mutually recursive pairs of non-identical
but compatible structure types. This can lead to invalid composite
types and an ICE. If we detect recursion even for swapped pairs
when forming composite types, this is avoided.

PR c/120381

gcc/c/ChangeLog:
* c-typeck.cc (composite_type_internal): Stop recursion for
swapped pairs.

gcc/testsuite/ChangeLog:
* gcc.dg/pr120381.c: New test.
* gcc.dg/gnu23-tag-composite-6.c: New test.

scc_copy: conditional return TODO_cleanup_cfg.

Only have cleanup cfg happen if scc copy did some propagation.
This should be a small compile time improvement by not doing cleanup
cfg if scc copy does nothing.

Also removes TODO_update_ssa since it should not be needed.
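
The idea can be sketched outside of GCC as follows. This is only an
illustration of the pattern, not the code in gimple-ssa-sccopy.cc:
kTodoCleanupCfg, propagate_copies and execute_pass are hypothetical
stand-ins, and the "IR" is just a vector of name indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GCC's TODO_cleanup_cfg flag; the value is arbitrary.
constexpr unsigned kTodoCleanupCfg = 1u << 0;

// Toy copy propagation: copy_of[i] == i means name i is not a copy;
// otherwise name i is a copy of name copy_of[i].  Each use of a copy is
// replaced by its source.  Returns true only if some use was rewritten.
bool propagate_copies(const std::vector<std::size_t>& copy_of,
                      std::vector<std::size_t>& uses) {
  bool changed = false;
  for (std::size_t& use : uses) {
    assert(use < copy_of.size());
    if (copy_of[use] != use) {
      use = copy_of[use];
      changed = true;
    }
  }
  return changed;
}

// The pass entry point requests CFG cleanup only when something changed.
unsigned execute_pass(const std::vector<std::size_t>& copy_of,
                      std::vector<std::size_t>& uses) {
  return propagate_copies(copy_of, uses) ? kTodoCleanupCfg : 0u;
}
```

Running execute_pass a second time on the same data returns 0, so the
(comparatively expensive) cleanup would be skipped.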

gcc/ChangeLog:

* gimple-ssa-sccopy.cc (scc_copy_prop::replace_scc_by_value): Return true
if something was done.
(scc_copy_prop::propagate): Return true if something was changed.
(pass_sccopy::execute): Return TODO_cleanup_cfg if a prop happened.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

[AUTOFDO] Merge profiles of clones before annotating

This patch adds support for merging profiles from multiple clones.
That is, when optimized binaries have clones such as IPA-CP or SRA
clones, the generated gcov will have profiled them separately.
Currently we pick one and ignore the rest. This patch fixes this by
merging the profiles.
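
A rough sketch of what merging clone profiles means, assuming clone
names carry a dot-separated suffix such as ".constprop.0" or ".isra.0".
The Profile layout, base_name and merge_clones are hypothetical and
much simpler than function_instance in auto-profile.cc.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical per-function profile: a total plus per-line sample counts.
struct Profile {
  long total = 0;
  std::map<int, long> line_counts;

  // Add another profile's samples into this one.
  void merge(const Profile& other) {
    total += other.total;
    for (const auto& [line, count] : other.line_counts)
      line_counts[line] += count;
  }
};

// Strip a clone suffix: "foo.constprop.0" -> "foo"; "foo" stays "foo".
std::string base_name(const std::string& name) {
  assert(!name.empty());
  return name.substr(0, name.find('.'));
}

// Fold every clone's profile into the entry for its base function,
// instead of keeping one clone and ignoring the rest.
std::map<std::string, Profile>
merge_clones(const std::map<std::string, Profile>& raw) {
  std::map<std::string, Profile> merged;
  for (const auto& [name, profile] : raw)
    merged[base_name(name)].merge(profile);
  return merged;
}
```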

gcc/ChangeLog:

* auto-profile.cc (function_instance::merge): New.
(autofdo_source_profile::read): Call merge.

Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>

Daily bump.

[AUTOFDO] Enable autofdo tests for aarch64

autofdo tests currently run only for x86. This patch makes them
run for aarch64 too. Verified that perf and create_gcov are running
as expected.

gcc/ChangeLog:

* config/aarch64/gcc-auto-profile: Make script executable.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp: Enable autofdo tests for aarch64.

Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>

forwprop: Add stats for memcpy->memset

As part of the review of copy prop for aggregates, it was
mentioned there should be some statistics added, and I noticed
the memcpy->memset was missing the statistics too. So this adds
that.

gcc/ChangeLog:

* tree-ssa-forwprop.cc (optimize_memcpy_to_memset): Adds
statistics when the statement changed.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

forwprop: Change test in loop of optimize_memcpy_to_memset

This was noticed in the review of the copy propagation for aggregates
patch: instead of checking for a NULL or a non-SSA name vuse,
we should check whether the vuse is a default name and stop
then.

Bootstrapped and tested on x86_64-linux-gnu.

gcc/ChangeLog:

* tree-ssa-forwprop.cc (optimize_memcpy_to_memset): Change check
from NULL/non-ssa name to default name.

Signed-off-by: Andrew Pinski <quic_apinski@quicinc.com>

diagnostics: fix PatternFly URL

gcc/ChangeLog:
* diagnostic-format-html.cc (HTML_STYLE): Fix PatternFly URL in
comment.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: reimplement html_token_printer in terms of xml::printer

No functional change intended.

gcc/ChangeLog:
* diagnostic-format-html.cc
(html_builder::make_element_for_diagnostic::html_token_printer):
Reimplement in terms of xml::printer.
(html_builder::make_element_for_diagnostic): Create an
xml::printer and use it with the html_token_printer.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: bulletproof html_builder::make_metadata_element

gcc/ChangeLog:
* diagnostic-format-html.cc (html_builder::make_metadata_element):
Gracefully handle the case where "url" is null.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: use unique_ptr for m_format_postprocessor

No functional change intended.

gcc/cp/ChangeLog:
* error.cc (cxx_format_postprocessor::clone): Update to use
unique_ptr.
(cxx_dump_pretty_printer::cxx_dump_pretty_printer): Likewise.
(cxx_initialize_diagnostics): Likewise.

gcc/ChangeLog:
* pretty-print.cc (pretty_printer::pretty_printer): Use "nullptr"
rather than "NULL". Remove explicit delete of
m_format_postprocessor.
* pretty-print.h (format_postprocessor::clone): Use unique_ptr.
(pretty_printer::set_format_postprocessor): New.
(pretty_printer::m_format_postprocessor): Use unique_ptr.
(pp_format_postprocessor): Update for use of unique_ptr, removing
reference from return type.
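
The ownership pattern can be sketched as below. The class names mirror
the ChangeLog but the bodies are hypothetical and heavily simplified;
in particular, that copying a printer clones its postprocessor is an
assumption of this sketch.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Simplified stand-in for the interface in pretty-print.h.
class format_postprocessor {
public:
  virtual ~format_postprocessor() = default;
  // Returning unique_ptr makes the transfer of ownership explicit.
  virtual std::unique_ptr<format_postprocessor> clone() const = 0;
};

// Hypothetical concrete postprocessor.
class cxx_like_postprocessor : public format_postprocessor {
public:
  std::unique_ptr<format_postprocessor> clone() const override {
    return std::make_unique<cxx_like_postprocessor>();
  }
};

// Simplified stand-in for pretty_printer.
class printer {
public:
  printer() = default;

  // Copying clones the postprocessor, if there is one.
  printer(const printer& other) {
    if (other.m_format_postprocessor) {
      m_format_postprocessor = other.m_format_postprocessor->clone();
      assert(m_format_postprocessor != nullptr);
    }
  }

  void set_format_postprocessor(std::unique_ptr<format_postprocessor> p) {
    m_format_postprocessor = std::move(p);
  }

  const format_postprocessor* get_format_postprocessor() const {
    return m_format_postprocessor.get();
  }

private:
  // Owned; no explicit delete is needed in a destructor.
  std::unique_ptr<format_postprocessor> m_format_postprocessor;
};
```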

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

libgomp: Add OpenACC's acc_memcpy_device{,_async} routines [PR93226]

libgomp/ChangeLog:

PR libgomp/93226
* libgomp-plugin.h (GOMP_OFFLOAD_openacc_async_dev2dev): New
prototype.
* libgomp.h (struct acc_dispatch_t): Add dev2dev_func.
(gomp_copy_dev2dev): New prototype.
* libgomp.map (OACC_2.6.1): New; add acc_memcpy_device{,_async}.
* libgomp.texi (acc_memcpy_device): New.
* oacc-mem.c (memcpy_tofrom_device): Change to take from/to
device boolean; use memcpy not memmove; add early return if
size == 0 or same device + same ptr.
(acc_memcpy_to_device, acc_memcpy_to_device_async,
acc_memcpy_from_device, acc_memcpy_from_device_async): Update.
(acc_memcpy_device, acc_memcpy_device_async): New.
* openacc.f90 (acc_memcpy_device, acc_memcpy_device_async):
Add interface.
* openacc_lib.h (acc_memcpy_device, acc_memcpy_device_async):
Likewise.
* openacc.h (acc_memcpy_device, acc_memcpy_device_async): Add
prototype.
* plugin/plugin-gcn.c (GOMP_OFFLOAD_openacc_async_host2dev):
Update comment.
(GOMP_OFFLOAD_openacc_async_dev2host): Update call.
(GOMP_OFFLOAD_openacc_async_dev2dev): New.
* plugin/plugin-nvptx.c (cuda_memcpy_dev_sanity_check): New.
(GOMP_OFFLOAD_dev2dev): Call it.
(GOMP_OFFLOAD_openacc_async_dev2dev): New.
* target.c (gomp_copy_dev2dev): New.
(gomp_load_plugin_for_device): Load dev2dev and async_dev2dev.
* testsuite/libgomp.oacc-c-c++-common/acc_memcpy_device-1.c: New test.
* testsuite/libgomp.oacc-fortran/acc_memcpy_device-1.f90: New test.

c++: xobj lambda 'this' capture [PR113563]

Various places were still making assumptions that we could get to the 'this'
capture through current_class_ref in a lambda op(), which is incorrect for
an explicit object op().

PR c++/113563

gcc/cp/ChangeLog:

* lambda.cc (build_capture_proxy): Check pointerness of the
member, not the proxy type.
(lambda_expr_this_capture): Don't assume current_class_ref.
(nonlambda_method_basetype): Likewise.
* semantics.cc (finish_non_static_data_member): Don't assume
TREE_TYPE (object) is set.
(finish_this_expr): Check current_class_type for lambda,
not current_class_ref.

gcc/testsuite/ChangeLog:

* g++.dg/cpp23/explicit-obj-lambda16.C: New test.

c++, coroutines: Make a check more specific [PR109283].

The check was intended to assert that we had visited contained
ternary expressions with embedded co_awaits, but had been made
too general - and therefore was ICEing on code that was actually
OK. Fixed by checking specifically that no co_awaits are embedded.

PR c++/109283

gcc/cp/ChangeLog:

* coroutines.cc (find_any_await): Only save the statement
pointer if the caller passes a place for it.
(flatten_await_stmt): When checking that ternary expressions
have been handled, also check that they contain a co_await.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr109283.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

c++: C++17 constexpr lambda and goto/static

We only want the error for these cases for functions explicitly declared
constexpr, but we still want to set invalid_constexpr on C++17 lambdas so
maybe_save_constexpr_fundef doesn't make them implicitly constexpr.

The potential_constant_expression_1 change isn't necessary for this test,
but still seems correct.

gcc/cp/ChangeLog:

* decl.cc (start_decl): Also set invalid_constexpr
for maybe_constexpr_fn.
* parser.cc (cp_parser_jump_statement): Likewise.
* constexpr.cc (potential_constant_expression_1): Ignore
goto to an artificial label.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-lambda29.C: New test.

Fortran: Make minor adjustment to error message.

PR fortran/120049

gcc/fortran/ChangeLog:

* check.cc (check_c_ptr_2): Rephrase error message
for clarity.

gcc/testsuite/ChangeLog:

* gfortran.dg/c_f_pointer_tests_6.f90: Adjust dg-error
directive.

i386: Add x86 FMV symbol tests

This is for testing the x86 mangling of FMV versioned function
assembly names.

gcc/testsuite/ChangeLog:

* g++.target/i386/mv-symbols1.C: New test.
* g++.target/i386/mv-symbols2.C: New test.
* g++.target/i386/mv-symbols3.C: New test.
* g++.target/i386/mv-symbols4.C: New test.
* g++.target/i386/mv-symbols5.C: New test.
* g++.target/i386/mvc-symbols1.C: New test.
* g++.target/i386/mvc-symbols2.C: New test.
* g++.target/i386/mvc-symbols3.C: New test.
* g++.target/i386/mvc-symbols4.C: New test.

Co-authored-by: Alfie Richards <alfie.richards@arm.com>

ppc: Add PowerPC FMV symbol tests.

This tests the mangling of function assembly names when annotated with
target_clones attributes.

gcc/testsuite/ChangeLog:

* g++.target/powerpc/mvc-symbols1.C: New test.
* g++.target/powerpc/mvc-symbols2.C: New test.
* g++.target/powerpc/mvc-symbols3.C: New test.
* g++.target/powerpc/mvc-symbols4.C: New test.

Co-authored-by: Alfie Richards <alfie.richards@arm.com>

OpenMP: Fix ICE and other issues in C/C++ metadirective error recovery.

The new testcase included in this patch used to ICE in gcc after
diagnosing the first error, and in g++ it only diagnosed the error in
the first metadirective, ignoring the second one. The solution is to
make error recovery in the C front end more like that in the C++ front
end, and remove the code in both front ends that previously tried to
skip all the way over the following statement (instead of just to the
end of the metadirective pragma) after an error.

gcc/c/ChangeLog
* c-parser.cc (c_parser_skip_to_closing_brace): New, copied from
the equivalent function in the C++ front end.
(c_parser_skip_to_end_of_block_or_statement): Pass false to
the error flag.
(c_parser_omp_context_selector): Immediately return error_mark_node
after giving an error that the integer trait property is invalid,
similarly to C++ front end.
(c_parser_omp_context_selector_specification): Likewise handle
error return from c_parser_omp_context_selector similarly to C++.
(c_parser_omp_metadirective): Do not call
c_parser_skip_to_end_of_block_or_statement after an error.

gcc/cp/ChangeLog
* parser.cc (cp_parser_omp_metadirective): Do not call
cp_parser_skip_to_end_of_block_or_statement after an error.

gcc/testsuite/ChangeLog
* c-c++-common/gomp/declare-variant-2.c: Adjust patterns now that
C and C++ now behave similarly.
* c-c++-common/gomp/metadirective-error-recovery.c: New.

OpenMP: Fix ICE in metadirective recovery after error [PR120180]

It's not clear whether a metadirective in a loop nest is supposed to
be valid, but GCC certainly shouldn't be ICE'ing after diagnosing it
as an error.

gcc/c/ChangeLog
PR c/120180
* c-parser.cc (c_parser_omp_metadirective): Only consume the
token if it is the expected close paren.

gcc/cp/ChangeLog
PR c/120180
* parser.cc (cp_parser_omp_metadirective): Only consume the
token if it is the expected close paren.

gcc/testsuite/ChangeLog
PR c/120180
* c-c++-common/gomp/pr120180.c: New.

c++, coroutines: Delete now unused code for parm guards.

Since r16-775-g18df4a10bc9694 we use nested cleanups to
handle parameter copy destructors in the ramp (and pass
a list of cleanups required to the actor, which will only
be invoked if the parameter copies were all correctly
built, and therefore does not need to guard destructors
either).

This deletes the provisions for frame parameter copy
destructor guards.

gcc/cp/ChangeLog:

* coroutines.cc (analyze_fn_parms): No longer
create a parameter copy guard var.
* coroutines.h (struct param_info): Remove the
entry for the parameter copy destructor guard.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

c++, coroutines: Fix identification of coroutine ramps [PR120453].

The existing implementation, incorrectly, tried to use DECL_RAMP_FN
in check_return_expr to determine if we are handling a ramp func.
However, that query is only set for the resume/destroy functions.

Replace the use of DECL_RAMP_FN with a new query.

PR c++/120453

gcc/cp/ChangeLog:

* cp-tree.h (DECL_RAMP_P): New.
* typeck.cc (check_return_expr): Use DECL_RAMP_P instead
of DECL_RAMP_FN.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/pr120453.C: New test.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

ipa: When inlining, don't combine PT JFs changing signedness (PR120295)

In GCC 15 we allowed jump-function generation code to skip over a
type-cast converting one integer to another as long as the latter can
hold all the values of the former or has at least the same precision.
This works well for IPA-CP where we do then evaluate each jump
function as we propagate values and value-ranges.  However, the
test-case in PR 120295 shows a problem with inlining, where we combine
pass-through jump-functions so that they are always relative to the
function which is the root of the inline tree.  Unfortunately, we are
happy to combine also those with type-casts to a different signedness,
which makes us use zero extension for the expected value ranges
where we should have used sign extension.  When the value-range which
then leads to wrong insertion of a call to builtin_unreachable is
being computed, the information about the existence of an intermediary
signed type has already been lost during previous inlining.

This patch simply blocks combining such jump-functions so that it is
back-portable to GCC 15.  Once we switch pass-through jump functions
to use a vector of operations rather than having room for just one, we
will be able to address this situation with adding an extra conversion
instead.
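
Why the intermediate signedness matters can be seen in a tiny example.
This is not the PR 120295 test case, just a sketch with hypothetical
helper names; the result of the out-of-range conversion to signed char
is what GCC produces (and what C++20 guarantees).

```cpp
// Value widened directly from an unsigned 8-bit type: zero extension.
constexpr int widen_unsigned(unsigned char v) { return v; }

// Same bits, but routed through an intermediate signed 8-bit type
// before widening: sign extension.
constexpr int widen_via_signed(unsigned char v) {
  return static_cast<signed char>(v);
}

// For inputs below 128 the two paths agree...
static_assert(widen_unsigned(100) == widen_via_signed(100));

// ...but not above: dropping the intermediate signed type would predict
// a value in [0, 255] where the real value is negative.
static_assert(widen_unsigned(200) == 200);
static_assert(widen_via_signed(200) == -56);
```

A value range derived as if the cast were not there would therefore be
wrong for half of the inputs.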

gcc/ChangeLog:

2025-05-19  Martin Jambor  <mjambor@suse.cz>

PR ipa/120295
* ipa-prop.cc (update_jump_functions_after_inlining): Do not
combine pass-through jump functions with type-casts changing
signedness.

gcc/testsuite/ChangeLog:

2025-05-19  Martin Jambor  <mjambor@suse.cz>

PR ipa/120295
* gcc.dg/ipa/pr120295.c: New test.

ipa: Fix whitespace when dumping VR in jump_functions

Lack of white space breaks the tree-visualisation structure and makes
the dump unnecessarily difficult to read.

gcc/ChangeLog:

2025-05-19 Martin Jambor <mjambor@suse.cz>

* ipa-prop.cc (ipa_dump_jump_function): Fix whitespace when
dumping IPA VRs.

libstdc++: Compare keys and values separately in flat_map::operator==

Instead of effectively doing a zipped comparison of the keys and values,
compare them separately to leverage the underlying containers' optimized
equality implementations.
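
The difference between the two strategies, sketched with a hypothetical
two-vector stand-in for flat_map (flat_storage, equal_zipped and
equal_separate are illustrative names, not libstdc++ code):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical simplified flat map: keys and values in parallel vectors.
struct flat_storage {
  std::vector<int> keys;
  std::vector<std::string> values;
};

// Zipped comparison: walk both containers in lockstep and compare each
// (key, value) pair.
bool equal_zipped(const flat_storage& a, const flat_storage& b) {
  assert(a.keys.size() == a.values.size());
  assert(b.keys.size() == b.values.size());
  if (a.keys.size() != b.keys.size())
    return false;
  for (std::size_t i = 0; i < a.keys.size(); ++i)
    if (a.keys[i] != b.keys[i] || a.values[i] != b.values[i])
      return false;
  return true;
}

// Separate comparison: delegate to each container's own operator==,
// which is free to use whatever fast path it has for its element type.
bool equal_separate(const flat_storage& a, const flat_storage& b) {
  return a.keys == b.keys && a.values == b.values;
}
```

Both functions give the same answer; only the second lets the
underlying containers pick their own comparison strategy.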

libstdc++-v3/ChangeLog:

* include/std/flat_map (_Flat_map_impl::operator==): Compare
keys and values separately.

Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Fix tuple/pair confusion with std::erase_if(flat_map) [PR120465]

std::erase_if for flat_map/multimap is implemented via ranges::erase_if
over a zip_view of the keys and values, the value_type of which is a
tuple, but the given predicate needs to be called with a pair (flat_map's
value_type). So use a projection to convert the tuple into a suitable
pair.
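
A C++17 sketch of the same idea without ranges: the storage is two
parallel vectors, and each entry is presented to the predicate as a
std::pair rather than a tuple. The real fix uses a projection with
ranges::remove_if; the hand-written loop, the names mini_flat_map and
erase_if_sketch, and the choice of a pair of references are assumptions
of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplified flat map: keys and values in parallel vectors.
struct mini_flat_map {
  std::vector<int> keys;
  std::vector<std::string> values;
};

// Remove every entry for which pred returns true; returns the number of
// erased entries.  The predicate is called with a pair, never a tuple.
template <typename Pred>
std::size_t erase_if_sketch(mini_flat_map& m, Pred pred) {
  assert(m.keys.size() == m.values.size());
  const std::size_t n = m.keys.size();
  std::size_t out = 0;
  for (std::size_t i = 0; i < n; ++i) {
    // The "projection": view the i-th key and value as one pair.
    std::pair<const int&, std::string&> entry(m.keys[i], m.values[i]);
    if (!pred(entry)) {
      if (out != i) {
        m.keys[out] = m.keys[i];
        m.values[out] = std::move(m.values[i]);
      }
      ++out;
    }
  }
  m.keys.resize(out);
  m.values.resize(out);
  return n - out;
}
```

A predicate written against the pair interface, for example one that
tests entry.first, then works unchanged.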

PR libstdc++/120465

libstdc++-v3/ChangeLog:

* include/std/flat_map (_Flat_map_impl::_M_erase_if): Use a
projection with ranges::remove_if to pass a pair instead of
a tuple to the predicate.
* testsuite/23_containers/flat_map/1.cc (test07): Strengthen
to expect the argument passed to the predicate is a pair.
* testsuite/23_containers/flat_multimap/1.cc (test07): Likewise.

Co-authored-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>

libstdc++: Fix another 17_intro/names.cc failure on AIX

FAIL: 17_intro/names.cc -std=gnu++98 (test for excess errors)

Also fix typo in experimental/names.cc where I did #undef for the wrong
name in r16-901-gd1ced2a5ea6b09.

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc [_AIX] (a): Undefine.
* testsuite/experimental/names.cc [_AIX] (ptr): Undefine.

libstdc++: Fix lwg4084.cc test FAIL on AIX

On AIX printf formats a quiet NaN as "NaNQ" and it doesn't matter
whether %f or %F is used. Similarly, it always prints "INF" for
infinity, even when %f is used. Adjust a test that currently fails due
to this AIX-specific (and non-conforming) behaviour.

libstdc++-v3/ChangeLog:

* testsuite/22_locale/num_put/put/char/lwg4084.cc [_AIX]: Adjust
expected output for NaN and infinity.

libstdc++: Re-enable some XPASS tests for AIX

The deque shrink_to_fit.cc test always passes on AIX, I think it should
not have been disabled.

The 96088.cc tests pass for C++20 and later (I don't know why) so make
them require C++20, as they fail otherwise.

libstdc++-v3/ChangeLog:

* testsuite/23_containers/deque/capacity/shrink_to_fit.cc:
Remove dg-xfail-run-if for AIX.
* testsuite/23_containers/unordered_map/96088.cc: Replace
dg-xfail-run-if with dg-require-effective-target c++20.
* testsuite/23_containers/unordered_multimap/96088.cc: Likewise.
* testsuite/23_containers/unordered_multiset/96088.cc: Likewise.
* testsuite/23_containers/unordered_set/96088.cc: Likewise.

i386: Use Shuffles instead of shifts for Reduction in AMD znver4/5

On AMD znver4 and znver5 targets, vpshufd and vpsrldq have latencies of 1
and 2 and throughputs of 4 (2 for znver4) and 2, respectively. It is better to generate
shuffles instead of shifts wherever possible. In this patch we try to
generate appropriate shuffle instruction to copy higher half to lower
half instead of a simple right shift during horizontal vector reduction.
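
A portable scalar model of why the two forms are interchangeable for a
reduction. It uses a four-lane array instead of real vector
instructions; the lane selection chosen for the shuffle is an
assumption, not necessarily what emit_reduc_half emits.

```cpp
#include <array>

// Models a 128-bit vector of four 32-bit lanes.
using v4si = std::array<int, 4>;

constexpr v4si add(const v4si& a, const v4si& b) {
  return {a[0] + b[0], a[1] + b[1], a[2] + b[2], a[3] + b[3]};
}

// Byte-shift model: the upper half moves down, zeros are shifted in.
constexpr v4si upper_half_by_shift(const v4si& v) {
  return {v[2], v[3], 0, 0};
}

// Shuffle model: the upper half is copied down; the upper lanes hold
// leftovers that the reduction never reads.
constexpr v4si upper_half_by_shuffle(const v4si& v) {
  return {v[2], v[3], v[2], v[3]};
}

// One halving step followed by the final two-lane add.
constexpr int reduce_add(const v4si& v, bool use_shuffle) {
  const v4si half = use_shuffle ? upper_half_by_shuffle(v)
                                : upper_half_by_shift(v);
  const v4si sum = add(v, half);  // lanes 0 and 1 hold the partial sums
  return sum[0] + sum[1];
}

static_assert(reduce_add(v4si{1, 2, 3, 4}, false) == 10);
static_assert(reduce_add(v4si{1, 2, 3, 4}, true) == 10);
```

Only the lower lanes feed the result, so the cheaper instruction can be
used whenever the upper lanes are dead.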

gcc/ChangeLog:

* config/i386/i386-expand.cc (emit_reduc_half): Use shuffles to
generate reduc half for V4SI, similar modes.
* config/i386/i386.h (TARGET_SSE_REDUCTION_PREFER_PSHUF): New Macro.
* config/i386/x86-tune.def (X86_TUNE_SSE_REDUCTION_PREFER_PSHUF):
New tuning.

gcc/testsuite/ChangeLog:

* gcc.target/i386/reduc-pshuf.c: New test.

libstdc++: Disable -Wlong-long warnings in boost_concept_check.h

The _IntegerConcept, _SignedIntegerConcept and _UnsignedIntegerConcept
class templates are specialized for long long, which gives warnings with
-Wsystem-headers in C++98 mode.

libstdc++-v3/ChangeLog:

* include/bits/boost_concept_check.h: Disable -Wlong-long
warnings.
* testsuite/24_iterators/operations/prev_neg.cc: Adjust dg-error
line number.

libstdc++: Document that -std cannot be used in --target_board now

Only using GLIBCXX_TESTSUITE_STDS or v3_std_list works now.

libstdc++-v3/ChangeLog:

* doc/xml/manual/test.xml: Remove outdated documentation on
testing with -std options in --target_board.
* doc/html/manual/test.html: Regenerate.

ggc-page: Fix up build on non-USING_MMAP hosts [PR120464]

The r16-852 "Use optimize free lists for alloc_pages" change broke build
on non-USING_MMAP hosts.
I don't have access to one, so I've just added #undef USING_MMAP
before first use of that macro after the definitions.

There were 2 problems.  One was one missed G.free_pages
to free_list->free_pages replacement in #ifdef USING_MALLOC_PAGE_GROUPS
guarded code which resulted in obvious compile error.

Once fixed, there was an ICE during self-test and without self-test pretty
much on any garbage collection.
The problem is that the patch moved all of release_pages into new
do_release_pages and runs it for each freelist from the new release_pages
wrapper.  The #ifdef USING_MALLOC_PAGE_GROUPS code had two loops, one
which walked the entries in the freelist and freed the ones which had
unused group there and another which walked all the groups (regardless of
which freelist they belong to) and freed the unused ones.
With the change the first call to do_release_pages would free freelist
entries from the first freelist with unused groups, then free all unused
groups and then second and following would access already freed groups,
crashing there, and then walk again all groups looking for unused ones (but
there are guaranteed to be none).

So, this patch fixes it by moving the unused group freeing to the caller,
release_pages after all freelists are freed, and while at it, moves there
the statistics printout as well, we don't need to print separate info
for each of the freelist, previously we were emitting just one.

2025-05-29  Jakub Jelinek  <jakub@redhat.com>

PR bootstrap/120464
* ggc-page.cc (struct ggc_globals): Fix up comment formatting.
(find_free_list): Likewise.
(alloc_page): For defined(USING_MALLOC_PAGE_GROUPS) use
free_list->free_pages instead of G.free_pages.
(do_release_pages): Add n1 and n2 arguments, make them used.
Move defined(USING_MALLOC_PAGE_GROUPS) page group freeing to
release_pages and dumping of statistics as well.  Formatting fixes.
(release_pages): Adjust do_release_pages caller, move here
defined(USING_MALLOC_PAGE_GROUPS) page group freeing and dumping
of statistics.
(ggc_handle_finalizers): Fix up comment formatting and typo.

RISC-V: Add minimal support of double trap extension 1.0

Add support of double trap extension [1], enabling GCC
to recognize the following extensions at compile time.

New extensions:
- ssdbltrp
- smdbltrp

[1] https://github.com/riscv/riscv-double-trap/releases/download/v1.0/riscv-double-trap.pdf

gcc/ChangeLog:

* config/riscv/riscv-ext.def: New extensions
* config/riscv/riscv-ext.opt: Auto re-generated
* doc/riscv-ext.texi: Auto re-generated

gcc/testsuite/ChangeLog:

* gcc.target/riscv/arch-57.c: New test
* gcc.target/riscv/arch-58.c: New test

Signed-off-by: Jerry Zhang Jian <jerry.zhangjian@sifive.com>

Fortran: Fix ChangeLog.

PR fortran/119856

gcc/fortran/ChangeLog:

* ChangeLog: Fix PR number in log.

RISC-V: Add test for vec_duplicate + vmul.vv combine case 1 with GR2VR cost 0, 1 and 2

Add asm dump check test for vec_duplicate + vmul.vv combine to vmul.vx,
with GR2VR cost 0, 1 and 2.

The following test suites passed for this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm
check for vmul.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Add test for vec_duplicate + vmul.vv combine case 0 with GR2VR cost 0, 2 and 15

Add asm dump check test for vec_duplicate + vmul.vv combine to vmul.vx,
with GR2VR cost 0, 2 and 15.

The following test suites passed for this patch.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i16.c: Add asm check
for vmul.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_binary_data.h: Add test
data for vmul run test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmul-run-1-i16.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmul-run-1-i32.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmul-run-1-i64.c: New test.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_vmul-run-1-i8.c: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Combine vec_duplicate + vmul.vv to vmul.vx on GR2VR cost

This patch combines vec_duplicate + vmul.vv into
vmul.vx, as in the example code below.  The related pattern depends
on the cost of vec_duplicate from GR2VR.  Late-combine will
perform the combination if the cost of GR2VR is zero, and reject it
if the GR2VR cost is greater than zero.

Assume we have example code like below, where the GR2VR cost is 0.

  #define DEF_VX_BINARY(T, OP)                                        \
  void                                                                \
  test_vx_binary (T * restrict out, T * restrict in, T x, unsigned n) \
  {                                                                   \
    for (unsigned i = 0; i < n; i++)                                  \
      out[i] = in[i] OP x;                                            \
  }

  DEF_VX_BINARY(int32_t, |)

Before this patch:
  test_vx_binary_or_int32_t_case_0:
      beq a3,zero,.L8
      vsetvli a5,zero,e32,m1,ta,ma
      vmv.v.x v2,a2
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vmul.vv v1,v1,v2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L3

After this patch:
  test_vx_binary_or_int32_t_case_0:
      beq a3,zero,.L8
      slli    a3,a3,32
      srli    a3,a3,32
  .L3:
      vsetvli a5,a3,e32,m1,ta,ma
      vle32.v v1,0(a1)
      slli    a4,a5,2
      sub a3,a3,a5
      add a1,a1,a4
      vmul.vx v1,v1,a2
      vse32.v v1,0(a0)
      add a0,a0,a4
      bne a3,zero,.L3

The following test suites passed for this patch.
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/riscv-v.cc (expand_vx_binary_vec_dup_vec): Add
new case for MULT op.
(expand_vx_binary_vec_vec_dup): Ditto.
* config/riscv/riscv.cc (riscv_rtx_costs): Ditto.
* config/riscv/vector-iterators.md: Add new op mult to no_shift_vx_ops.

Signed-off-by: Pan Li <pan2.li@intel.com>

c++: add __is_*destructible builtins [PR107600]

Typically "does this class have a trivial destructor" is the wrong question
to ask, we rather want "can I destroy this class trivially", thus the
std::is_trivially_destructible standard trait. Let's provide a builtin for
it, and complain about asking whether a deleted destructor is trivial.

Clang and MSVC also have these traits.
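
The distinction shows up with a deleted destructor. The sketch below
uses the standard library traits rather than the new builtins, so it
does not depend on this patch; the struct names are illustrative.

```cpp
#include <type_traits>

struct Plain { int x; };
struct Deleted { ~Deleted() = delete; };
struct Custom { ~Custom() {} };
struct Throwing { ~Throwing() noexcept(false) {} };

// An ordinary aggregate can be destroyed trivially.
static_assert(std::is_trivially_destructible_v<Plain>);

// A deleted destructor is not user-provided, but the class cannot be
// destroyed at all, so "can I destroy this trivially" is false.
static_assert(!std::is_destructible_v<Deleted>);
static_assert(!std::is_trivially_destructible_v<Deleted>);

// A user-provided destructor is destructible and (implicitly) noexcept,
// but not trivial.
static_assert(std::is_destructible_v<Custom>);
static_assert(std::is_nothrow_destructible_v<Custom>);
static_assert(!std::is_trivially_destructible_v<Custom>);

// noexcept(false) only affects the nothrow trait.
static_assert(std::is_destructible_v<Throwing>);
static_assert(!std::is_nothrow_destructible_v<Throwing>);
```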

PR c++/107600

gcc/cp/ChangeLog:

* cp-trait.def (IS_DESTRUCTIBLE, IS_NOTHROW_DESTRUCTIBLE)
(IS_TRIVIALLY_DESTRUCTIBLE): New.
* constraint.cc (diagnose_trait_expr): Explain them.
* method.cc (destructible_expr): New.
(is_xible_helper): Use it.
* semantics.cc (finish_trait_expr): Handle new traits.
(trait_expr_value): Likewise. Complain about asking
whether a deleted dtor is trivial.

gcc/testsuite/ChangeLog:

* g++.dg/ext/is_destructible1.C: New test.

Daily bump.

[AUTOFDO] Fix autogen remake issue

Fix autogen issue introduced by
commit 86dc974cf30f926a014438a5fccdc9d41e26282b

ChangeLog:

* Makefile.def: Fix typo in cpu_type
* Makefile.tpl: Add cpu_type

Signed-off-by: Kugan Vivekanandarajah <kvivekananda@nvidia.com>

Set znver5 addss cost to 2 again

Since uses of addss for purposes other than modelling FP addition/subtraction should
be gone now, this patch sets the addss cost back to 2.

gcc/ChangeLog:

PR target/119298
* config/i386/x86-tune-costs.h (struct processor_costs): Set addss cost
back to 2.

Fortran: gfc_simplify_{cospi,sinpi} - fix for MPFR < 4.2.0

gcc/fortran/ChangeLog:

PR fortran/113152
* simplify.cc (gfc_simplify_cospi, gfc_simplify_sinpi): Avoid using
mpfr_fmod_ui in the MPFR < 4.2.0 version.

Fortran: Adjust handling of optional comma in FORMAT.

This change adjusts the error messages for optional commas
in format strings to give a warning at compile time unless
-std=legacy is used. This is more consistent with the
runtime library. A missing comma separator should not be
encouraged as it is non-standard Fortran.

PR fortran/119586

gcc/fortran/ChangeLog:

* io.cc: Set missing comma error checks to GFC_STD_LEGACY.

gcc/testsuite/ChangeLog:

* gfortran.dg/comma_format_extension_1.f: Update dg-options to
"-std=legacy".
* gfortran.dg/comma_format_extension_3.f: Likewise.
* gfortran.dg/continuation_13.f90: Likewise.

fortran: add constant input support for trig functions with half-revolutions

This patch introduces constant input support for trigonometric functions,
including those involving half-revolutions. Both valid and invalid inputs have
been thoroughly tested, as have mpfr versions greater than or equal to 4.2 and
less than 4.2.
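
For readers unfamiliar with the half-revolution functions: sinpi(x) is
sin(pi * x) and cospi(x) is cos(pi * x). A minimal sketch in plain
double arithmetic, assuming a reduce-modulo-2 approach; this is not the
MPFR-based code in simplify.cc, and the _sketch names are hypothetical.
Reducing first lets multiples of 1/2 give exact results, whereas
std::sin(pi * 1.0) typically returns a tiny nonzero value.

```cpp
#include <cassert>
#include <cmath>

constexpr double kPi = 3.14159265358979323846;

// sin(pi * x), with the argument reduced modulo 2 (one full revolution).
double sinpi_sketch(double x) {
  assert(std::isfinite(x));
  const double r = std::fmod(x, 2.0);  // exact, r in (-2, 2)
  if (r == 0.0 || r == 1.0 || r == -1.0) return 0.0;
  if (r == 0.5 || r == -1.5) return 1.0;
  if (r == -0.5 || r == 1.5) return -1.0;
  return std::sin(kPi * r);
}

// cos(pi * x); cosine is even, so the sign of the argument is dropped.
double cospi_sketch(double x) {
  assert(std::isfinite(x));
  const double r = std::fabs(std::fmod(x, 2.0));  // r in [0, 2)
  if (r == 0.5 || r == 1.5) return 0.0;
  if (r == 0.0) return 1.0;
  if (r == 1.0) return -1.0;
  return std::cos(kPi * r);
}
```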

Inspired by Steve's previous work, this patch also fixes subtle bugs revealed
by newly added test cases.

If this patch is merged, I plan to work on middle-end optimization support for
previously added GCC built-ins and libgfortran intrinsics.

PR fortran/113152

gcc/fortran/ChangeLog:

* gfortran.h (enum gfc_isym_id): Add new enum.
* intrinsic.cc (add_functions): Register new intrinsics. Change the call
from gfc_resolve_trigd{,2} to gfc_resolve_trig{,2}.
* intrinsic.h (gfc_simplify_acospi, gfc_simplify_asinpi,
gfc_simplify_atanpi, gfc_simplify_atan2pi,
gfc_simplify_cospi, gfc_simplify_sinpi, gfc_simplify_tanpi): New.
(gfc_resolve_trig): Rename from gfc_resolve_trigd.
(gfc_resolve_trig2): Rename from gfc_resolve_trigd2.
* iresolve.cc (gfc_resolve_trig): Rename from gfc_resolve_trigd.
(gfc_resolve_trig2): Rename from gfc_resolve_trigd2.
* mathbuiltins.def: Add 7 new math builtins and re-align.
* simplify.cc (gfc_simplify_acos, gfc_simplify_asin,
gfc_simplify_acosd, gfc_simplify_asind): Revise error message.
(gfc_simplify_acospi, gfc_simplify_asinpi,
gfc_simplify_atanpi, gfc_simplify_atan2pi,
gfc_simplify_cospi, gfc_simplify_sinpi, gfc_simplify_tanpi): New.

gcc/testsuite/ChangeLog:

* gfortran.dg/dec_math_3.f90: Test invalid input.
* gfortran.dg/dec_math_5.f90: Test valid output.
* gfortran.dg/dec_math_6.f90: New test.

Signed-off-by: Yuao Ma <c8ef@outlook.com>
Co-authored-by: Steven G. Kargl <kargl@gcc.gnu.org>

vect: Remove non-SLP paths in strided slp/elementwise.

This patch removes non-SLP paths in the
VMAT_STRIDED_SLP/VMAT_ELEMENTWISE part of vectorizable_load.

gcc/ChangeLog:

* tree-vect-stmts.cc (vectorizable_load): Remove non-SLP paths.

RISC-V: Avoid division by zero in check_builtin_call [PR120436].

In check_builtin_call we eventually perform a division by zero when no
vector modes are present. This patch just avoids the division in that
case.

PR target/120436

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-shapes.cc (struct vset_def):
Avoid division by zero.
(struct vget_def): Ditto.
* config/riscv/riscv-vector-builtins.h (struct function_group_info):
Use required_extensions_specified instead of duplicating code.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/pr120436.c: New test.

libgomp.fortran/metadirective-1.f90: Expect 'error:' for nvptx compile [PR118694]

This should have been part of commit r16-838-gb3d07ec7ac2ccd or
r16-883-g5d6ed6d604ff94 - all showing the same issue:
'!$omp target' followed by a metadirective with 'teams'; if
the metadirective cannot be early resolved, a diagnostic
error is shown about using directives between 'target' and
'teams'.

While the message is misleading, the problem is that the
host invokes 'target' differently when 'teams' is present;
in this case, host fallback + amdgcn offload require the
no-teams case, nvptx offload the teams case such that it
only can be resolved at runtime.

Mark the error as 'dg-bogus + xfail' to silence the FAIL,
when nvptx offloading is compiled for. (If not, the
metadirective can be resolved early during compilation.)

libgomp/ChangeLog:

PR middle-end/118694
* testsuite/libgomp.fortran/metadirective-1.f90: xfail when
compiling (also) for nvptx offloading as an error is then expected.

c++: modules and using-directives

We weren't representing 'using namespace' at all in modules, which broke
some of the <chrono> literals tests.

This only represents exported using-directives; others should be
irrelevant to importers, as any name lookup in the imported module that
would have cared about them was done while compiling the header unit.

I experimented with various approaches to representing them; this patch
handles them in read/write_namespaces, after the namespaces themselves. I
spent a while pondering how to deal with the depset code in order to connect
them, but then realized it would be simpler to refer to them based on their
index in the array of namespaces.

Any using-directives from an indirect import are ignored, so in an export
import, any imported using-directives are exported again.

gcc/cp/ChangeLog:

* module.cc (module_state::write_namespaces): Write
using-directives.
(module_state::read_namespaces): And read them.
* name-lookup.cc (add_using_namespace): Add overload. Build a
USING_DECL for modules.
(name_lookup::search_usings, name_lookup::queue_usings)
(using_directives_contain_std_p): Strip the USING_DECL.
* name-lookup.h: Declare it.
* parser.cc (cp_parser_import_declaration): Set MK_EXPORTING
for export import.

gcc/testsuite/ChangeLog:

* g++.dg/modules/namespace-8_a.C: New test.
* g++.dg/modules/namespace-8_b.C: New test.
* g++.dg/modules/namespace-9_a.C: New test.
* g++.dg/modules/namespace-9_b.C: New test.
* g++.dg/modules/namespace-10_a.C: New test.
* g++.dg/modules/namespace-10_b.C: New test.
* g++.dg/modules/namespace-10_c.C: New test.
* g++.dg/modules/namespace-11_a.C: New test.
* g++.dg/modules/namespace-11_b.C: New test.
* g++.dg/modules/namespace-11_c.C: New test.

RISC-V: Add test cases for avg_floor vaadd implementation

Add asm and run testcase for avg_floor vaadd implementation.

The following test suites pass for this patch series.
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/avg.h: New test.
* gcc.target/riscv/rvv/autovec/avg_data.h: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i16-from-i32.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i16-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i32-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i16.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i32.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-1-i8-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i32.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i16-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i32-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i16.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i32.c: New test.
* gcc.target/riscv/rvv/autovec/avg_floor-run-1-i8-from-i64.c: New test.
* gcc.target/riscv/rvv/autovec/avg_run.h: New test.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Reconcile the existing test for avg_floor

Some existing avg_floor tests need updating due to the change to
leverage vaadd.vv directly.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vls/avg-1.c: Update asm check
to vaadd.
* gcc.target/riscv/rvv/autovec/vls/avg-2.c: Ditto.
* gcc.target/riscv/rvv/autovec/vls/avg-3.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv32gcv.c: Ditto.
* gcc.target/riscv/rvv/autovec/widen/vec-avg-rv64gcv.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>

RISC-V: Leverage vaadd.vv for signed standard name avg_floor

The signed avg_floor exactly matches the semantics of the fixed-point
RVV insn vaadd with round down.  Thus, leverage it directly
to implement avg_floor.

The RVV spec is not very clear about how floating point and fixed
point differ in the rounding that discards least-significant
information.

For floating point, which is not two's complement, "discard
least-significant information" means truncation (round toward zero).
For example:

* 3.5 -> 3
* -2.3 -> -2

For fixed point, which is two's complement, "discard
least-significant information" means round down.  For
example:

* 3.5 -> 3
* -2.3 -> -3

vaadd rounds down, which exactly matches
the semantics of avg_floor.

The following test suites pass for this patch series.
* The rv64gcv full regression test.

gcc/ChangeLog:

* config/riscv/autovec.md (avg<v_double_trunc>3_floor): Add insn
expand to leverage vaadd directly.

Signed-off-by: Pan Li <pan2.li@intel.com>

Handle auto-fdo 0 more carefully

This patch fixes a few other places where auto-fdo 0 should not be treated as
actual 0 (i.e. probably never executed). Overall I think we should end up
combining static profile with auto-fdo profile where auto-fdo has 0 counts,
but that is something that should be benchmarked and first it is necessary to
get something benchmarkable out of auto-FDO.

gcc/ChangeLog:

* cgraph.cc (cgraph_edge::maybe_hot_p): For auto-fdo turn 0
to non-zero.
* ipa-cp.cc (cs_interesting_for_ipcp_p): Do not trust
auto-fdo 0.
* profile-count.cc (profile_count::adjust_for_ipa_scaling): Likewise.
(profile_count::from_gcov_type): Fix formatting.

Do not recompute profile when entry block has afdo count of 0

With normal profile feedback, checking that the entry block count is non-zero
is a quite reliable check for the presence of a non-0 profile in the body,
since the function body can only be executed if the entry block was executed.
With autofdo this is not true, since the entry block may just execute too few
times to be recorded. As a consequence we currently drop the AFDO profile
quite often. This patch fixes it.

gcc/ChangeLog:

* predict.cc (rebuild_frequencies): look harder for presence
of profile feedback.

aarch64: Enable newly implemented features for FUJITSU-MONAKA

This patch enables newly implemented features in GCC (FAMINMAX, FP8FMA,
FP8DOT2, FP8DOT4, LUT) for FUJITSU-MONAKA
processor (-mcpu=fujitsu-monaka).

2025-05-23 Yuta Mukai <mukai.yuta@fujitsu.com>

gcc/ChangeLog:

* config/aarch64/aarch64-cores.def (fujitsu-monaka): Update ISA
features.

Fix profile_probability quality of switch

This fixes an age-old bug I noticed only now: when prediction is completely
missing, switch cases all get equal probabilities, whose quality should be
GUESSED instead of ADJUSTED.

gcc/ChangeLog:

* predict.cc (set_even_probabilities): Set quality to guessed.

Do not erase static profile by 0 autofdo profile

This patch makes auto-fdo more careful about keeping info we have
from static profile prediction.

If all counters in function are 0, we can keep original auto-fdo profile.
Having an all-0 profile is not very useful, especially because 0 in autofdo is not
very informative and the code still may have been executed in the train run.
I added comment about adding GUESSED_GLOBAL0_AFDO which would still preserve
info that the function is not hot in the profile, but I would like to do this
incrementally.

If function has non-zero counters, we can still keep info about zero being
reliable from static prediction (i.e. after EH or with cold attribute).

gcc/ChangeLog:

* auto-profile.cc (update_count_by_afdo_count): New function.
(afdo_set_bb_count): Add debug output; only set count if it is
non-zero.
(afdo_find_equiv_class): Add debug output.
(afdo_calculate_branch_prob): Fix formatting.
(afdo_annotate_cfg): Add debug output; do not erase static
profile if autofdo profile is all 0.

doc: Fix extend.texi menu

commit 517c9487f8fdc4e4e90252a9365e5823259dc783
Author: Alejandro Colomar <alx@kernel.org>
Date: Thu May 22 01:15:36 2025 +0200

c: Add _Countof operator [PR117025]

broke gcc build on RHEL 9 when building texi files (with bundled
makeinfo 6.7):

gcc/doc/extend.texi:6: node `C Extensions' lacks menu item for
`_Countof' despite being its Up target

The same failure happens with makeinfo <= 6.7, but not with
makeinfo >= 6.8.

Fixed by adding the missing menu entries.

gcc/ChangeLog:

* doc/extend.texi (C Extensions): Add missing menu items.

For datarefs with big gap, split them into different groups.

The patch tries to fix a missed vectorization for the case below.

void
foo (int* a, int* restrict b)
{
    b[0] = a[0] * a[64];
    b[1] = a[65] * a[1];
    b[2] = a[2] * a[66];
    b[3] = a[67] * a[3];
    b[4] = a[68] * a[4];
    b[5] = a[69] * a[5];
    b[6] = a[6] * a[70];
    b[7] = a[7] * a[71];
}

In vect_analyze_data_ref_accesses, a[0], a[1], ..., a[7], a[64], ...,
a[71] are in the same group with a size of 71, which makes vectorization
unprofitable.

gcc/ChangeLog:

PR tree-optimization/119181
* tree-vect-data-refs.cc (vect_analyze_data_ref_accesses):
Split datarefs when there's a gap bigger than
MAX_BITSIZE_MODE_ANY_MODE.

gcc/testsuite/ChangeLog:

* gcc.dg/vect/bb-slp-pr119181.c: New test.

Daily bump.

c: Fix up a pasto in countof diagnostics [PR117025]

The C23 in there looks like a pasto; it should be C2Y.

2025-05-27 Jakub Jelinek <jakub@redhat.com>

PR c/117025
* c-parser.cc (c_parser_sizeof_or_countof_expression): Use
C2Y rather than C23 in pedwarn_c23.

Fortran: fix regression introduced by commit r16-914-g787a8dec1acedf

A last-minute cleanup before patch submission reordered the logic in a
way that should not have happened. This fixes it.

PR fortran/101735

gcc/fortran/ChangeLog:

* primary.cc (gfc_match_varspec): Correct order of logic.

libgcc: Add DPD support + fix big-endian support of _BitInt <-> dfp conversions

The following patch fixes
FAIL: gcc.dg/dfp/bitint-1.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-2.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-3.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-4.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-5.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-6.c (test for excess errors)
FAIL: gcc.dg/dfp/bitint-8.c (test for excess errors)
FAIL: gcc.dg/dfp/int128-1.c (test for excess errors)
FAIL: gcc.dg/dfp/int128-2.c (test for excess errors)
FAIL: gcc.dg/dfp/int128-4.c (test for excess errors)
on s390x-linux (with the 3 not yet posted patches).

The patch does multiple things:
1) the routines were written for the DFP BID (binary integer decimal)
   format which is used on all arches but powerpc*/s390* (those use
   DPD - densely packed decimal format); as most of the code is actually
   the same for both BID and DPD formats, I haven't copied the sources
   + slightly modified them, but added the DPD support directly, + renaming
   of the exported symbols from __bid_* prefixed to __dpd_* prefixed that
   GCC expects on the DPD targets
2) while testing that I've found some big-endian issues in the existing
   support
3) testing also revealed that in some cases __builtin_clzll (~msb) was
   called with msb set to all ones, so invoking UB; apparently on aarch64
   and x86 we were lucky and got some value that happened to work well,
   but that wasn't the case on s390x

For 1), the patch uses two ~ 2KB tables to speed up the decoding/encoding.
I haven't found such tables in what is added into libgcc.a, though they
are in libdecnumber/bid/bid2dpd_dpd2bid.h, but there they are just huge
and next to other huge tables - there is d2b which is like __dpd_d2bbitint
in the patch but it uses 64-bit entries rather than 16-bit, then there is
d2b2 with 64-bit entries like in d2b all multiplied by 1000, then d2b3
similarly multiplied by 1000000, then d2b4 similarly multiplied by
1000000000, then d2b5 similarly multiplied by 1000000000000ULL and
d2b6 similarly multiplied by 1000000000000000ULL.  Arguably it can
save some of the multiplications, but on the other side accesses memory
which is unlikely in the caches, and the 2048 bytes in the patch vs.
24 times more for d2b is IMHO significant.
For b2d, libdecnumber/bid/bid2dpd_dpd2bid.h has again b2d table like
__dpd_b2dbitint in the patch, except that it has 64-bit entries rather
than 16-bit (this time 1000 entries), but then has b2d2 which has the
same entries shifted left by 10, then b2d3 shifted left by 20, b2d4 shifted
left by 30 and b2d5 shifted left by 40.  I can understand for d2b paying
memory cost to speed up multiplications, but don't understand paying
extra 4 * 8 * 1000 bytes (+ 6 * 1000 bytes for b2d not using ushort)
just to avoid shifts.

2025-05-27  Jakub Jelinek  <jakub@redhat.com>

* config/t-softfp (softfp_bid_list): Don't guard with
$(enable_decimal_float) == bid.
* soft-fp/bitint.h (__bid_pow10bitint): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_pow10bitint.
(__dpd_d2bbitint, __dpd_b2dbitint): Declare.
* soft-fp/bitintpow10.c (__dpd_d2bbitint, __dpd_b2dbitint): New
variables.
* soft-fp/fixsdbitint.c (__bid_fixsdbitint): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_fixsdbitint.
Add DPD support.  Fix big-endian support.
* soft-fp/fixddbitint.c (__bid_fixddbitint): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_fixddbitint.
Add DPD support.  Fix big-endian support.
* soft-fp/fixtdbitint.c (__bid_fixtdbitint): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_fixtdbitint.
Add DPD support.  Fix big-endian support.
* soft-fp/fixsdti.c (__bid_fixsdbitint): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_fixsdbitint.
(__bid_fixsdti): For !defined(ENABLE_DECIMAL_BID_FORMAT) redefine to
__dpd_fixsdti.
* soft-fp/fixddti.c (__bid_fixddbitint): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_fixddbitint.
(__bid_fixddti): For !defined(ENABLE_DECIMAL_BID_FORMAT) redefine to
__dpd_fixddti.
* soft-fp/fixtdti.c (__bid_fixtdbitint): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_fixtdbitint.
(__bid_fixtdti): For !defined(ENABLE_DECIMAL_BID_FORMAT) redefine to
__dpd_fixtdti.
* soft-fp/fixunssdti.c (__bid_fixsdbitint): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_fixsdbitint.
(__bid_fixunssdti): For !defined(ENABLE_DECIMAL_BID_FORMAT) redefine
to __dpd_fixunssdti.
* soft-fp/fixunsddti.c (__bid_fixddbitint): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_fixddbitint.
(__bid_fixunsddti): For !defined(ENABLE_DECIMAL_BID_FORMAT) redefine
to __dpd_fixunsddti.
* soft-fp/fixunstdti.c (__bid_fixtdbitint): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_fixtdbitint.
(__bid_fixunstdti): For !defined(ENABLE_DECIMAL_BID_FORMAT) redefine
to __dpd_fixunstdti.
* soft-fp/floatbitintsd.c (__bid_floatbitintsd): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_floatbitintsd.
Add DPD support.  Avoid calling __builtin_clzll with 0 argument.  Fix
big-endian support.
* soft-fp/floatbitintdd.c (__bid_floatbitintdd): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_floatbitintdd.
Add DPD support.  Avoid calling __builtin_clzll with 0 argument.  Fix
big-endian support.
* soft-fp/floatbitinttd.c (__bid_floatbitinttd): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_floatbitinttd.
Add DPD support.  Avoid calling __builtin_clzll with 0 argument.  Fix
big-endian support.
* soft-fp/floattisd.c (__bid_floatbitintsd): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_floatbitintsd.
(__bid_floattisd): For !defined(ENABLE_DECIMAL_BID_FORMAT) redefine to
__dpd_floattisd.
* soft-fp/floattidd.c (__bid_floatbitintdd): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_floatbitintdd.
(__bid_floattidd): For !defined(ENABLE_DECIMAL_BID_FORMAT) redefine to
__dpd_floattidd.
* soft-fp/floattitd.c (__bid_floatbitinttd): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_floatbitinttd.
(__bid_floattitd): For !defined(ENABLE_DECIMAL_BID_FORMAT) redefine to
__dpd_floattitd.
* soft-fp/floatuntisd.c (__bid_floatbitintsd): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_floatbitintsd.
(__bid_floatuntisd): For !defined(ENABLE_DECIMAL_BID_FORMAT) redefine
to __dpd_floatuntisd.
* soft-fp/floatuntidd.c (__bid_floatbitintdd): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_floatbitintdd.
(__bid_floatuntidd): For !defined(ENABLE_DECIMAL_BID_FORMAT) redefine
to __dpd_floatuntidd.
* soft-fp/floatuntitd.c (__bid_floatbitinttd): For
!defined(ENABLE_DECIMAL_BID_FORMAT) redefine to __dpd_floatbitinttd.
(__bid_floatuntitd): For !defined(ENABLE_DECIMAL_BID_FORMAT) redefine
to __dpd_floatuntitd.

fold: DECL_VALUE_EXPR isn't simple [PR120400]

This PR noted that fold_truth_andor was wrongly changing && to & where the
RHS is a VAR_DECL with DECL_VALUE_EXPR; we can't assume that such can be
evaluated unconditionally.

To be more precise we could recurse into DECL_VALUE_EXPR, but that doesn't
seem worth bothering with since typical uses involve a COMPONENT_REF, which
is not simple.

PR c++/120400

gcc/ChangeLog:

* fold-const.cc (simple_operand_p): False for vars with
DECL_VALUE_EXPR.

c: Add -Wpedantic diagnostic for _Countof [PR117025]

It has been standardized in C2y.

PR c/117025

gcc/c/ChangeLog:

* c-parser.cc (c_parser_sizeof_or_countof_expression):
Add -Wpedantic diagnostic for _Countof in <= C23 mode.

gcc/testsuite/ChangeLog:

* gcc.dg/countof-compat.c: New test.
* gcc.dg/countof-no-compat.c: New test.
* gcc.dg/countof-pedantic.c: New test.
* gcc.dg/countof-pedantic-errors.c: New test.

Signed-off-by: Alejandro Colomar <alx@kernel.org>

c: Add <stdcountof.h> [PR117025]

PR c/117025

gcc/ChangeLog:

* Makefile.in (USER_H): Add <stdcountof.h>.
* ginclude/stdcountof.h: New file.

gcc/testsuite/ChangeLog:

* gcc.dg/countof-stdcountof.c: New test.

Signed-off-by: Alejandro Colomar <alx@kernel.org>

c: Add _Countof operator [PR117025]

This operator is similar to sizeof but can only be applied to an array,
and returns its number of elements.

FUTURE DIRECTIONS:

-  We should make it work with array parameters to functions,
   and somehow magically return the number of elements of the array,
   regardless of it being really a pointer.

Link: <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3550.pdf>
Link: <https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117025>
Link: <https://inbox.sourceware.org/gcc/M8S4oQy--3-2@tutanota.com/T/>
Link: <https://inbox.sourceware.org/gcc-patches/20240728141547.302478-1-alx@kernel.org/T/#t>
Link: <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3313.pdf>
Link: <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3325.pdf>
Link: <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3369.pdf>
Link: <https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3469.htm>
Link: <https://github.com/llvm/llvm-project/issues/102836>
Link: <https://thephd.dev/the-big-array-size-survey-for-c>
Link: <https://thephd.dev/the-big-array-size-survey-for-c-results>
Link: <https://stackoverflow.com/questions/37538/#57537491>

PR c/117025

gcc/ChangeLog:

* doc/extend.texi: Document _Countof operator.

gcc/c-family/ChangeLog:

* c-common.h (enum rid): Add RID_COUNTOF.
(c_countof_type): New function prototype.
* c-common.def (COUNTOF_EXPR): New tree.
* c-common.cc (c_common_reswords): Add RID_COUNTOF entry.
(c_countof_type): New function.

gcc/c/ChangeLog:

* c-tree.h (in_countof): Add global variable declaration.
(c_expr_countof_expr): Add function prototype.
(c_expr_countof_type): Add function prototype.
* c-decl.cc (start_struct, finish_struct): Add support for
_Countof.
(start_enum, finish_enum): Add support for _Countof.
* c-parser.cc (c_parser_sizeof_expression): New macro.
(c_parser_countof_expression): New macro.
(c_parser_sizeof_or_countof_expression): Rename function and add
support for _Countof.
(c_parser_unary_expression): Add RID_COUNTOF entry.
* c-typeck.cc (in_countof): Add global variable.
(build_external_ref): Add support for _Countof.
(record_maybe_used_decl): Add support for _Countof.
(pop_maybe_used): Add support for _Countof.
(is_top_array_vla): New function.
(c_expr_countof_expr, c_expr_countof_type): New functions.

gcc/testsuite/ChangeLog:

* gcc.dg/countof-compile.c: New test.
* gcc.dg/countof-vla.c: New test.
* gcc.dg/countof-vmt.c: New test.
* gcc.dg/countof-zero-compile.c: New test.
* gcc.dg/countof-zero.c: New test.
* gcc.dg/countof.c: New test.

Suggested-by: Xavier Del Campo Romero <xavi.dcr@tutanota.com>
Co-authored-by: Martin Uecker <uecker@tugraz.at>
Acked-by: "James K. Lowden" <jklowden@schemamania.org>
Signed-off-by: Alejandro Colomar <alx@kernel.org>

Fortran: Fix c_associated argument checks.

PR fortran/120049

gcc/fortran/ChangeLog:

* check.cc (gfc_check_c_associated): Use new helper functions.
Only call check_c_ptr_1 if optional c_ptr_2 tests succeed.
(check_c_ptr_1): Handle only c_ptr_1 checks.
(check_c_ptr_2): Expand checks for c_ptr_2 and handle cases
where there is no derived pointer in the gfc_expr and check
the inmod_sym_id only if it exists.
* misc.cc (gfc_typename): Handle the case for BT_VOID rather
than throw an internal error.

gcc/testsuite/ChangeLog:

* gfortran.dg/pr120049_a.f90: Update test directives.
* gfortran.dg/pr120049_b.f90: Update test directives.
* gfortran.dg/pr120049_2.f90: New test.

Co-Authored-By: Steve Kargl <kargl@gcc.gnu.org>

Fortran: fix parsing of type parameter inquiries of substrings [PR101735]

Handling of type parameter inquiries of substrings failed due to either
parsing issues or reference chains not being followed or handled properly.

PR fortran/101735

gcc/fortran/ChangeLog:

* expr.cc (find_inquiry_ref): If an inquiry reference applies to
a substring, use that, and calculate substring length if needed.
* primary.cc (extend_ref): Also handle attaching to end of
reference chain for appending.
(gfc_match_varspec): Discriminate between arrays of character and
substrings of them. If a substring is taken from a character
component of a derived type, get the proper typespec so that
inquiry references work correctly.
(gfc_match_rvalue): Handle corner case where we hit a seemingly
dangling '%' and missed an inquiry reference. Try another match.

gcc/testsuite/ChangeLog:

* gfortran.dg/inquiry_type_ref_7.f90: New test.

c++, coroutines: Fix typos in TRUTH_ANDIF_EXPRs.

These were typoed to TRUTH_AND_EXPR (and then that got copied).

gcc/cp/ChangeLog:

* coroutines.cc (cp_coroutine_transform::build_ramp_function):
Replace TRUTH_AND_EXPR with TRUTH_ANDIF_EXPR in three places.

Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>

Enable afdo testing on AMD Zen3+

contrib/ChangeLog:

* gen_autofdo_event.py: Add support for AMD Zen 3 and
later CPUs.

gcc/ChangeLog:

* config/i386/gcc-auto-profile: regenerate.

Remove dead code in auto-profile.cc

This code to track what locations were used when reading the auto-fdo profile
appears to have been dead since the initial commit, so remove it.

gcc/ChangeLog:

* auto-profile.cc (function_instance::mark_annotated): Remove.
(function_instance::total_annotated_count): Remove.
(autofdo_source_profile::mark_annotated): Remove.
(afdo_set_bb_count): Do not mark annotated locations.
(afdo_annotate_cfg): Likewise.

Fix IPA-SRA issue with reverse SSO on specific pattern

IPA-SRA generally works fine in the presence of reverse Scalar_Storage_Order
by propagating the relevant flag onto the newly generated MEM_REFs.  However
we have been recently faced with a specific Ada pattern that it does not
handle correctly: the 'Valid attribute applied to a floating-point component
of an aggregate type with reverse Scalar_Storage_Order.

The attribute is implemented by a call to a specific routine of the runtime
that expects a pointer to the object so, in the case of a component with
reverse SSO, the compiler first loads it from the aggregate to get back the
native storage order, but it does the load using an array of bytes instead
of the floating-point type to prevent the FPU from fiddling with the value,
which yields in the .original dump file:

  *(character[1:4] *) &F2b = VIEW_CONVERT_EXPR<character[1:4]>(item.f);

Of course that's a bit convoluted, but it does not seem that another method
would be simpler or even work, and using VIEW_CONVERT_EXPR to toggle the SSO
is supposed to be supported in any case (unlike aliasing or type punning).

The attached patch makes it work.  While the call to storage_order_barrier_p
from IPA-SRA is quite natural (the regular SRA has it too), the tweak to the
predicate itself is needed to handle the scalar->aggregate conversion, which
is admittedly awkward but again without clear alternative.

gcc/
* ipa-sra.cc (scan_expr_access): Also disqualify storage order
barriers from splitting.
* tree.h (storage_order_barrier_p): Also return false if the
operand of the VIEW_CONVERT_EXPR has reverse storage order.

gcc/testsuite/
* gnat.dg/sso19.adb: New test.
* gnat.dg/sso19_pkg.ads, gnat.dg/sso19_pkg.adb: New helper.

diagnostics: rework experimental-html output [PR116792]

This patch reworks the HTML output from the option
  -fdiagnostics-add-output=experimental-html
so that for source quoting and path printing, rather than simply adding
the textual output inside <pre> elements, it breaks up the output
into HTML tags reflecting the structure of the output, using CSS, SVG
and a little javascript to help the user navigate the diagnostics and
the events within any paths.

This uses ideas from the patch I posted in:
  https://gcc.gnu.org/pipermail/gcc-patches/2020-November/558603.html
but reworks the above patch so that:
* rather than printing source to a pretty_printer, the HTML is created
  by building a DOM tree in memory, using a new xml::printer class.  This
  should be less error-prone than the pretty_printer approach, since it
  ought to solve escaping issues.  Instead of a text vs html boolean,
  the code is generalized via templates with to_text vs to_html sinks.
  This templatization applies both to path-printing and
  diagnostic-show-locus.cc.
* the HTML output can have multiple diagnostics and multiple paths rather
  than just a single path.  The javascript keyboard controls now cycle
  through all diagnostics and all events within them

An example of the output can be seen at:
  https://dmalcolm.fedorapeople.org/gcc/2025-05-27/malloc-1.c.html
where the keys "j" and "k" cycle through diagnostics and events
within them.

gcc/ChangeLog:
PR other/116792
* diagnostic-format-html.cc: Define INCLUDE_STRING.
Include "xml.h", "xml-printer.h", and "json.h".
(html_generation_options::html_generation_options): New.
(namespace xml): Move decls to xml.h and convert from using
label_text to std::string.
(xml::text::write_as_xml): Reimplement indentation so it is done
by this node, rather than the parent.
(xml::node_with_children::add_text): Convert from label_text to
std::string.  Consolidate runs of text into a single node.
(xml::document::write_as_xml): Reimplement indentation.
(xml::element::write_as_xml): Reimplement indentation so it is
done by this node, rather than the parent.  Convert from
label_text to std::string.  Update attribute-printing to new
representation to preserve insertion order.
(xml::element::set_attr): Convert from label_text to std::string.
Record insertion order.
(xml::raw::write_as_xml): New.
(xml::printer::printer): New.
(xml::printer::push_tag): New.
(xml::printer::push_tag_with_class): New.
(xml::printer::pop_tag): New.
(xml::printer::set_attr): New.
(xml::printer::add_text): New.
(xml::printer::add_raw): New.
(xml::printer::push_element): New.
(xml::printer::append): New.
(xml::printer::get_insertion_point): New.
(html_builder::add_focus_id): New.
(html_builder::m_html_gen_opts): New field.
(html_builder::m_head_element): New field.
(html_builder::m_next_diag_id): New field.
(html_builder::m_ui_focus_ids): New field.
(make_div): Convert from label_text to std::string.
(make_span): Likewise.
(HTML_STYLE): New.
(HTML_SCRIPT): New.
(html_builder::html_builder): Fix indentation.  Add
"html_gen_opts" param.  Initialize new fields.  Reimplement
using xml::printer.  Optionally add style and script tags.
(class html_path_label_writer): New.
(html_builder::make_element_for_diagnostic): Convert from
label_text to std::string. Set "id" on "gcc-diagnostic" and
"gcc-message" <div> elements; add the latter to the focus ids.
Use diagnostic_context::maybe_show_locus_as_html rather than
html_builder::make_element_for_source.  Use print_path_as_html
rather than html_builder::make_element_for_path.
(html_builder::make_element_for_source): Drop.
(html_builder::make_element_for_path): Drop.
(html_builder::make_element_for_patch): Convert from label_text to
std::string.
(html_builder::make_metadata_element): Likewise.  Use
xml::printer.
(html_builder::make_element_for_metadata): Convert from label_text
to std::string.
(html_builder::emit_diagram): Expand comment.
(html_builder::flush_to_file): Write out initializer for
"focus_ids" into javascript.
(html_output_format::html_output_format): Add param
"html_gen_opts" and use it to initialize m_builder.
(html_file_output_format::html_file_output_format): Likewise, to
initialize base class.
(make_html_sink): Likewise, to pass to ctor.
(selftest::test_html_diagnostic_context::test_html_diagnostic_context):
Set up html_generation_options.
(selftest::html_buffered_output_format::html_buffered_output_format):
Add html_gen_opts param.
(selftest::test_simple_log): Add id attributes to expected text
for "gcc-diagnostic" and "gcc-message" elements.  Update
whitespace for indentation fixes.
(selftest::test_metadata): Update whitespace for indentation
fixes.
(selftest::test_printer): New selftest.
(selftest::test_attribute_ordering): New selftest.
(selftest::diagnostic_format_html_cc_tests): Call the new
selftests.
* diagnostic-format-html.h (struct html_generation_options): New.
(make_html_sink): Add "html_gen_opts" param.
(print_path_as_html): New decl.
* diagnostic-path-output.cc: Define INCLUDE_MAP.  Add includes of
"diagnostic-format-html.h", "xml.h", and "xml-printer.h".
(path_print_policy::path_print_policy): Add ctor.
(path_print_policy::get_diagram_theme): Fix whitespace.
(struct stack_frame): New.
(begin_html_stack_frame): New function.
(end_html_stack_frame): New function.
(emit_svg_arrow): New function.
(event_range::print): Rename to...
(event_range::print_as_text): ...this.  Update call to
diagnostic_start_span.
(event_range::print_as_html): New, based on the above, but ported
from pretty_printer to xml::printer.
(thread_event_printer::print_swimlane_for_event_range): Rename
to...
(thread_event_printer::print_swimlane_for_event_range_as_text):
...this.  Update for renaming of event_range::print to
event_range::print_as_text.
(thread_event_printer::print_swimlane_for_event_range_as_html):
New.
(print_path_summary_as_text): Update for "_as_text" renaming.
(print_path_summary_as_html): New.
(print_path_as_html): New.
* diagnostic-show-locus.cc: Add defines of INCLUDE_MAP and
INCLUDE_STRING.  Add includes of "xml.h" and "xml-printer.h".
(struct char_display_policy): Replace "m_print_cb" with
"m_print_text_cb" and "m_print_html_cb".
(struct to_text): New.
(struct to_html): New.
(get_printer): New.
(default_diagnostic_start_span_fn<to_text>): New.
(default_diagnostic_start_span_fn<to_html>): New.
(class layout): Update "friend class layout_printer;" for
template.
(enum class margin_kind): New.
(class layout_printer): Convert into a template.
(layout_printer::m_pp): Replace field with...
(layout_printer::m_sink): ...this.
(layout_printer::m_colorizer): Drop field in favor of a pointer
in the "to_text" sink.
(default_print_decoded_ch): Convert into a template.
(escape_as_bytes_print): Likewise.
(escape_as_unicode_print): Likewise.
(make_char_policy): Update to use both text and html callbacks.
(layout_printer::print_gap_in_line_numbering): Replace with...
(layout_printer<to_text>::print_gap_in_line_numbering): ...this
(layout_printer<to_html>::print_gap_in_line_numbering): ...and
this.
(layout_printer::print_source_line): Convert to template, using
m_sink.
(layout_printer::print_leftmost_column): Likewise.
(layout_printer::start_annotation_line): Likewise.
(layout_printer<to_text>::end_line): New.
(layout_printer<to_html>::end_line): New.
(layout_printer::print_annotation_line): Convert to template,
using m_sink.
(class line_label): Add field m_original_range_idx.
(layout_printer<to_text>::begin_label): New.
(layout_printer<to_html>::begin_label): New.
(layout_printer<to_text>::end_label): New.
(layout_printer<to_html>::end_label): New.
(layout_printer::print_any_labels): Convert to template, using
m_sink.
(layout_printer::print_leading_fixits): Likewise.
(layout_printer::print_trailing_fixits): Likewise.
(layout_printer::print_newline): Drop.
(layout_printer::move_to_column): Convert to template, using
m_sink.
(layout_printer::show_ruler): Likewise.
(layout_printer::print_line): Likewise.
(layout_printer::print_any_right_to_left_edge_lines): Likewise.
(layout_printer::layout_printer): Likewise.
(diagnostic_context::maybe_show_locus_as_html): New.
(diagnostic_source_print_policy::diagnostic_source_print_policy):
Update for split of start_span_cb into text vs html variants.
(diagnostic_source_print_policy::print): Update for use of
templates; use to_text.
(diagnostic_source_print_policy::print_as_html): New.
(layout_printer::print): Convert to template, using m_sink.
(selftest::make_element_for_locus): New.
(selftest::make_raw_html_for_locus): New.
(selftest::test_layout_x_offset_display_utf8): Update for use of
templates.
(selftest::test_layout_x_offset_display_tab): Likewise.
(selftest::test_one_liner_caret_and_range): Add test coverage of
HTML output.
(selftest::test_one_liner_labels): Likewise.
* diagnostic.cc (diagnostic_context::initialize): Update for split
of start_span_cb into text vs html variants.
(default_diagnostic_start_span_fn): Move to
diagnostic-show-locus.cc, converting to template.
* diagnostic.h (class xml::printer): New forward decl.
(diagnostic_start_span_fn): Replace typedef with "using",
converting to a template.
(struct to_text): New forward decl.
(struct to_html): New forward decl.
(get_printer): New decl.
(diagnostic_location_print_policy::print_text_span_start): New
decl.
(diagnostic_location_print_policy::print_html_span_start): New
decl.
(class html_label_writer): New.
(diagnostic_source_print_policy::print_as_html): New decl.
(diagnostic_source_print_policy::get_start_span_fn): Replace
with...
(diagnostic_source_print_policy::get_text_start_span_fn): ...this
(diagnostic_source_print_policy::get_html_start_span_fn): ...and
this
(diagnostic_source_print_policy::m_start_span_cb): Replace with...
(diagnostic_source_print_policy::m_text_start_span_cb): ...this
(diagnostic_source_print_policy::m_html_start_span_cb): ...and
this.
(diagnostic_context::maybe_show_locus_as_html): New decl.
(diagnostic_context::m_text_callbacks::m_start_span): Replace
with...
(diagnostic_context::m_text_callbacks::m_text_start_span): ...this
(diagnostic_context::m_text_callbacks::m_html_start_span): ...and
this.
(diagnostic_start_span): Update for template change.
(diagnostic_show_locus_as_html): New inline function.
(default_diagnostic_start_span_fn): Convert to template.
* doc/invoke.texi (experimental-html): Add "css" and "javascript"
keys.
* opts-diagnostic.cc (html_scheme_handler::make_sink): Likewise.
* selftest-diagnostic.cc
(selftest::test_diagnostic_context::start_span_cb): Update for
template changes.
* selftest-diagnostic.h
(selftest::test_diagnostic_context::start_span_cb): Likewise.
* xml-printer.h: New file.
* xml.h: New file, based on material in diagnostic-format-html.cc,
but using std::string rather than label_text.
(xml::element::m_key_insertion_order): New field.
(struct xml::raw): New.

gcc/fortran/ChangeLog:
PR other/116792
* error.cc (gfc_diagnostic_start_span): Update for diagnostic.h
changes.

gcc/testsuite/ChangeLog:
PR other/116792
* gcc.dg/html-output/missing-semicolon.c: Add ":javascript=no" to
html output.
* gcc.dg/html-output/missing-semicolon.py: Move repeated
definitions into lib/htmltest.py.
* gcc.dg/plugin/diagnostic_group_plugin.cc: Update for template
changes.
* gcc.dg/plugin/diagnostic-test-metadata-html.c: Add
":javascript=no" to html output.  Add
"-fdiagnostics-show-line-numbers".
* gcc.dg/plugin/diagnostic-test-metadata-html.py: Move repeated
definitions into lib/htmltest.py.  Add checks of annotated source.
* gcc.dg/plugin/diagnostic-test-paths-2.c: Add ":javascript=no" to
html output.
* gcc.dg/plugin/diagnostic-test-paths-2.py: Move repeated
definitions into lib/htmltest.py.  Add checks of execution path.
* gcc.dg/plugin/diagnostic-test-paths-4.c: Add
-fdiagnostics-add-output=experimental-html:javascript=no.  Add
invocation of diagnostic-test-paths-4.py.
* gcc.dg/plugin/diagnostic-test-paths-4.py: New test script.
* gcc.dg/plugin/diagnostic-test-show-locus-bw-line-numbers.c: Add
-fdiagnostics-add-output=experimental-html:javascript=no.  Add
invocation of diagnostic-test-show-locus.py.
* gcc.dg/plugin/diagnostic-test-show-locus.py: New test script.
* lib/htmltest.py: New test support script.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

diagnostics: split path-printing into a new diagnostic-path-output.cc

No functional change intended.

gcc/ChangeLog:
* Makefile.in (OBJS-libcommon): Add diagnostic-path-output.o.
* diagnostic-path-output.cc: New file, taken from material in
diagnostic-path.cc.
* diagnostic-path.cc: Drop includes of
"diagnostic-macro-unwinding.h", "intl.h", "gcc-rich-location.h",
"diagnostic-color.h", "diagnostic-event-id.h",
"diagnostic-label-effects.h", "pretty-print-markup.h",
"selftest.h", "selftest-diagnostic.h",
"selftest-diagnostic-path.h", "text-art/theme.h", and
"diagnostic-format-text.h".
(class path_print_policy): Move to diagnostic-path-output.cc.
(class path_label): Likewise.
(can_consolidate_events): Likewise.
(class per_thread_summary): Likewise.
(struct event_range): Likewise.
(struct path_summary): Likewise.
(per_thread_summary::interprocedural_p): Likewise.
(path_summary::path_summary): Likewise.
(write_indent): Likewise.
(base_indent): Likewise.
(per_frame_indent): Likewise.
(class thread_event_printer): Likewise.
(print_path_summary_as_text): Likewise.
(class element_event_desc): Likewise.
(diagnostic_text_output_format::print_path): Likewise.
(selftest::path_events_have_column_data_p): Likewise.
(selftest::test_empty_path): Likewise.
(selftest::test_intraprocedural_path): Likewise.
(selftest::test_interprocedural_path_1): Likewise.
(selftest::test_interprocedural_path_2): Likewise.
(selftest::test_recursion): Likewise.
(class selftest::control_flow_test): Likewise.
(selftest::test_control_flow_1): Likewise.
(selftest::test_control_flow_2): Likewise.
(selftest::test_control_flow_3): Likewise.
(selftest::assert_cfg_edge_path_streq): Likewise.
(ASSERT_CFG_EDGE_PATH_STREQ): Likewise.
(selftest::test_control_flow_4): Likewise.
(selftest::test_control_flow_5): Likewise.
(selftest::test_control_flow_6): Likewise.
(selftest::control_flow_tests): Likewise.
(selftest::diagnostic_path_cc_tests): Likewise, renaming
accordingly.
* selftest-run-tests.cc (selftest::run_tests): Update for
move of path-printing selftests.
* selftest.h (selftest::diagnostic_path_cc_tests): Replace decl
with...
(selftest::diagnostic_path_output_cc_tests): ...this.

Signed-off-by: David Malcolm <dmalcolm@redhat.com>

s390x: Fix bootstrap.

A typo in the mnemonic attribute caused a failed bootstrap. Not sure
how that passed the bootstrap done before committing.

gcc/ChangeLog:

* config/s390/vector.md (*vec_extract<mode>): Fix mnemonic.

Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>

tree-optimization/117965 - phiprop validity checking is too strict

The PR shows that when using std::clamp from the C++ standard library
and there is surrounding code using exceptions then phiprop can fail
to simplify the code so phiopt can turn the clamping into efficient
min/max operations.

The validation code is needlessly complicated, stemming from the
time we had memory-SSA with multiple virtual operands. The following
simplifies this, thereby fixing this issue.
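
As a rough illustration of the shape phiprop deals with (a hand-written
sketch, not the testcase from the PR; both function names are
hypothetical): a clamp written with references or pointers first
selects an address and then loads through it, whereas the form phiopt
can turn into min/max selects the value directly.

```cpp
#include <cassert>

// Selects an address on each path and loads once at the end, i.e. a
// load through a PHI of addresses.  This mirrors how a clamp that
// returns a reference looks after inlining.
int
clamp_via_pointer (int v, int lo, int hi)
{
  assert (lo <= hi);
  const int *p = &v;
  if (v < lo)
    p = &lo;
  else if (hi < v)
    p = &hi;
  return *p;
}

// The same computation with the load moved onto each path, which is
// roughly what phiprop produces: a PHI of values instead of a PHI of
// addresses.
int
clamp_via_values (int v, int lo, int hi)
{
  assert (lo <= hi);
  int r = v;
  if (v < lo)
    r = lo;
  else if (hi < v)
    r = hi;
  return r;
}
```

Both functions return the same result for every input; only the second
form exposes the min/max pattern directly.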

PR tree-optimization/117965
* tree-ssa-phiprop.cc (phivn_valid_p): Remove.
(propagate_with_phi): Pass in virtual PHI node from BB,
rewrite load motion validity check to require the same
virtual use along all paths.

* g++.dg/tree-ssa/pr117965-1.C: New testcase.
* g++.dg/tree-ssa/pr117965-2.C: Likewise.

asf: Fix calling of emit_move_insn on registers of different modes [PR119884]

This patch uses `lowpart_subreg` for the base register initialization,
instead of zero-extending it. We had tried this solution before, but
we were leaving undefined bytes in the upper part of the register.
This shouldn't be happening as we are supposed to write the whole
register when the load is eliminated. This was occurring when having
multiple stores with the same offset as the load, generating a
register move for all of them, overwriting the bit inserts that were
inserted before them.

In order to overcome this, we are removing redundant stores from the
sequence, i.e. stores that write to addresses that will be overwritten
by stores that come after them in the sequence. We are using the same
bitmap that is used for the load elimination check, to keep track of
the bytes that are written by each store.
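
The idea can be sketched outside of RTL as follows. This is a minimal
model under stated assumptions, not the code in
avoid-store-forwarding.cc: `toy_store` and `drop_redundant_stores` are
hypothetical names, store offsets are taken relative to the start of
the load, and every store is assumed to lie within the load's bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of a store: it writes SIZE bytes starting at
// OFFSET, measured from the first byte of the load.
struct toy_store
{
  std::size_t offset;
  std::size_t size;
};

// Walk the stores from last to first, recording written bytes in a
// bitmap.  A store that contributes no new byte is fully overwritten
// by later stores and is dropped.  *COVERS_LOAD reports whether the
// surviving stores write every byte of the load.
std::vector<toy_store>
drop_redundant_stores (const std::vector<toy_store> &stores,
                       std::size_t load_size, bool *covers_load)
{
  std::vector<bool> written (load_size, false);
  std::vector<toy_store> kept;
  for (auto it = stores.rbegin (); it != stores.rend (); ++it)
    {
      assert (it->offset + it->size <= load_size);
      bool adds_new_byte = false;
      for (std::size_t b = it->offset; b < it->offset + it->size; ++b)
        if (!written[b])
          {
            written[b] = true;
            adds_new_byte = true;
          }
      if (adds_new_byte)
        kept.insert (kept.begin (), *it); // preserve program order
    }
  *covers_load = true;
  for (bool w : written)
    if (!w)
      *covers_load = false;
  return kept;
}
```

For the sequence {0,4}, {0,2}, {2,2} feeding a 4-byte load, the first
store is dropped because the two later stores overwrite all of its
bytes.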

Also, we are now allowing the load to be eliminated even when there
are overlaps between the stores, as there is no obvious reason not to;
we only need the stores to cover all of the load's bytes.

Bootstrapped/regtested on AArch64 and x86_64.

PR rtl-optimization/119884

gcc/ChangeLog:

* avoid-store-forwarding.cc (process_store_forwarding):
Use `lowpart_subreg` for the base register initialization
and remove redundant stores from the store/load sequence.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr119884.c: New test.

sbitmap: Add bitmap_all_bits_in_range_p function

This patch adds the `bitmap_all_bits_in_range_p` function in sbitmap,
which checks if all the bits in a range are set.

Helper function `bitmap_bit_in_range_p` has also been added, in order
to be used by `bitmap_all_bits_in_range_p` and
`bitmap_any_bit_in_range_p`. When the function's `any_inverted`
parameter is true, the function checks if any of the bits in the range
is unset, otherwise it checks if any of them is set.

Function `bitmap_any_bit_in_range_p` has been updated to call
`bitmap_bit_in_range_p` with the `any_inverted` parameter set to
false, retaining its previous functionality.

Function `bitmap_all_bits_in_range_p` calls `bitmap_bit_in_range_p`
with `any_inverted` set to true and returns the negation of the
result, i.e. true if all the bits in the range are set.
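
The relationship between the three functions can be sketched on a toy
bitmap. This is an illustration only: `toy_bitmap` and the `toy_`
functions are hypothetical stand-ins for the sbitmap versions, and the
range is assumed to be inclusive at both ends.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an sbitmap: one bool per bit.
using toy_bitmap = std::vector<bool>;

// Shared helper.  With ANY_INVERTED false it answers "is any bit in
// [START, END] set?"; with ANY_INVERTED true it answers "is any bit
// in [START, END] unset?".
static bool
toy_bit_in_range_p (const toy_bitmap &bmap, std::size_t start,
                    std::size_t end, bool any_inverted)
{
  assert (start <= end && end < bmap.size ());
  for (std::size_t i = start; i <= end; ++i)
    if (bmap[i] != any_inverted)
      return true;
  return false;
}

bool
toy_any_bit_in_range_p (const toy_bitmap &bmap, std::size_t start,
                        std::size_t end)
{
  return toy_bit_in_range_p (bmap, start, end, false);
}

// "All bits set" is the negation of "any bit unset".
bool
toy_all_bits_in_range_p (const toy_bitmap &bmap, std::size_t start,
                         std::size_t end)
{
  return !toy_bit_in_range_p (bmap, start, end, true);
}
```

Sharing one helper keeps the range-walking logic in a single place;
the real implementation operates on whole words rather than single
bits.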

gcc/ChangeLog:

* sbitmap.cc (bitmap_any_bit_in_range_p):
Call and return the result of `bitmap_bit_in_range_p` with the
`any_inverted` parameter set to false.
(bitmap_bit_in_range_p): New function.
(bitmap_all_bits_in_range_p): New function.
* sbitmap.h (bitmap_all_bits_in_range_p): New function.

sbitmap: Rename bitmap_bit_in_range_p to bitmap_any_bit_in_range_p

This patch renames `bitmap_bit_in_range_p` to
`bitmap_any_bit_in_range_p` to better reflect its purpose.

gcc/ChangeLog:

* sbitmap.cc (bitmap_bit_in_range_p): Renamed the function.
(bitmap_any_bit_in_range_p): New function name.
(bitmap_bit_in_range_p_checking): Renamed the function.
(bitmap_any_bit_in_range_p_checking): New function name.
(test_set_range): Updated function calls to use the new name.
(test_bit_in_range): Likewise.
* sbitmap.h (bitmap_bit_in_range_p): Renamed the function.
(bitmap_any_bit_in_range_p): New function name.
* tree-ssa-dse.cc (live_bytes_read):
Updated function call to use the new name.

[RISC-V] Add andi+bclr synthesis

So this patch from Shreya adds the ability to use andi + a series of bclr insns
to synthesize a logical AND, much like we're doing for IOR/XOR using ori+bset
or their xor equivalents.
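
A minimal sketch of the decomposition, assuming the low 11 bits of the
mask go into a negative 12-bit andi immediate and every cleared bit
from bit 11 upward gets its own bclr.  `and_plan`, `plan_andi_bclr`
and `apply_plan` are hypothetical names; the real synthesize_and also
weighs costs and alternative sequences, which this ignores.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical description of an andi + bclr sequence.
struct and_plan
{
  int64_t andi_imm;                 // sign-extended 12-bit immediate
  std::vector<unsigned> bclr_bits;  // one bclr per entry
};

and_plan
plan_andi_bclr (uint64_t mask)
{
  and_plan plan;
  // Keep the mask's low 11 bits and force bits 11..63 to one, giving
  // a value in [-2048, -1] that andi can encode.
  plan.andi_imm = static_cast<int64_t> (mask | ~UINT64_C (0x7ff));
  // Every remaining zero bit of the mask needs an explicit bclr.
  for (unsigned bit = 11; bit < 64; ++bit)
    if (!((mask >> bit) & 1))
      plan.bclr_bits.push_back (bit);
  return plan;
}

// Simulate the sequence on a 64-bit value.
uint64_t
apply_plan (uint64_t x, const and_plan &plan)
{
  assert (plan.andi_imm >= -2048 && plan.andi_imm <= -1);
  uint64_t r = x & static_cast<uint64_t> (plan.andi_imm);
  for (unsigned bit : plan.bclr_bits)
    r &= ~(UINT64_C (1) << bit);
  return r;
}
```

The sequence is only attractive when few upper bits are clear, since
its length is one andi plus one bclr per cleared upper bit.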

This would regress from a code quality standpoint if we didn't make some
adjustments to a handful of define_insn_and_split patterns in the riscv backend
which support the same kind of idioms.

Essentially we turn those define_insn_and_split patterns into the simple
define_splits they always should have been.  That's been the plan since we
started down this path -- now is the time to make that change for a subset of
patterns.  It may be the case that when we're finished we may not even need
those patterns.  That's still TBD.

I'm aware of one minor regression in xalan.  As seen elsewhere, combine
reconstructs the mask value, uses mvconst_internal to load it into a reg then
an and instruction.  That looks better than the operation synthesis, but only
because of the mvconst_internal little white lie.

This patch does help in a variety of places.  It's fairly common in gimple.c
from 502.gcc to see cases where we'd use bclr to clear a bit, then set the
exact same bit a few instructions later.  That was an artifact of using a
define_insn_and_split -- it wasn't obvious to combine that we had two
instructions manipulating the same bit.  Now that is obvious to combine and the
redundant operation gets removed.

This has spun in my tester with no regressions on riscv32-elf and riscv64-elf.
Hopefully the baseline for the tester has stepped forward 🙂

gcc/
* config/riscv/bitmanip.md (andi+bclr splits): Simplified from
prior define_insn_and_splits.
* config/riscv/riscv.cc (synthesize_and): Add support for andi+bclr
sequences.

Co-authored-by: Jeff Law <jlaw@ventanamicro.com>

libstdc++: Fix some names.cc test failures on AIX

libstdc++-v3/ChangeLog:

* testsuite/17_intro/names.cc [_AIX] (n): Undefine.
* testsuite/experimental/names.cc [_AIX] (ptr): Undefine.

libstdc++: Fix test failures for 32-bit AIX

With -maix32 (the default) we only have 16-bit wchar_t so these tests
fail. The debug.cc one is because we use -fwide-exec-charset=UTF-32BE
which tries to encode each wide character as four bytes in a 2-byte
wchar_t. The format.cc one is because the clown face character can't be
encoded in a single 16-bit wchar_t.
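
To see why a single 16-bit unit is not enough, here is a small sketch
of UTF-16 encoding.  It assumes the character in question is U+1F921
(CLOWN FACE); `encode_utf16` is a hypothetical helper and not part of
the testsuite.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode one Unicode scalar value as UTF-16 code units.
std::vector<uint16_t>
encode_utf16 (uint32_t cp)
{
  // Surrogate code points and values above U+10FFFF are not valid
  // scalar values.
  assert (cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
  if (cp < 0x10000)
    return { static_cast<uint16_t> (cp) };
  // Anything outside the Basic Multilingual Plane needs a surrogate
  // pair: 10 bits in the high surrogate, 10 bits in the low one.
  cp -= 0x10000;
  return { static_cast<uint16_t> (0xD800 + (cp >> 10)),
           static_cast<uint16_t> (0xDC00 + (cp & 0x3FF)) };
}
```

U+1F921 lies above U+FFFF, so it encodes as the two units 0xD83E
0xDD21 and cannot be held in one 16-bit wchar_t.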

libstdc++-v3/ChangeLog:

* testsuite/std/format/debug.cc: Disable for targets with 16-bit
wchar_t.
* testsuite/std/format/functions/format.cc: Use -DUNICODE for
targets with 32-bit wchar_t.
(test_unicode) [UNICODE]: Only run checks when UNICODE is
defined.

libstdc++: Regenerate include/Makefile.in

libstdc++-v3/ChangeLog:

* include/Makefile.in: Regenerate.

RISC-V: Add test for vec_duplicate + vxor.vv combine case 1 with GR2VR cost 0, 1 and 2

Add asm dump check test for vec_duplicate + vxor.vv combine to vxor.vx,
with the GR2VR cost being 0, 1 and 2.

The following test suite passed for this patch:
* The rv64gcv full regression test.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i16.c: Add asm check
for vxor.vx combine.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-4-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-5-u8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-i8.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-6-u8.c: Ditto.

Signed-off-by: Pan Li <pan2.li@intel.com>